Reverse Intro

Table of contents

  • Introduction
  • Command line syntax
  • General concepts
  • Standard adaptors
  • The default adaptor
  • Compilation and installation
  • Examples

    Introduction

    reverse is a preprocessor that turns programming languages such as C, C++, and Java into powerful templating languages.

    The main idea is to have two independent languages and mix them by embedding one language into the other. The main language of the input file is called the source language (SL), the other one the embedded language (EL). SL text is translated into EL statements, which are output in between the embedded EL statements.


    Example: Hello World

      Hello reverse world {@
        for (int i=1;i<=5;i++) <@ !($i$)@>
      @}
          
            Hello reverse world !(1) !(2) !(3) !(4) !(5)
          
    This example illustrates a hello-world program for reverse. In this example, the source language is plaintext, and the embedded language is C++. The plaintext is translated into C++ output statements. This is the standard reverse mode plaintext_cpp.
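
    As a rough illustration, the C++ produced from this template might be shaped like the sketch below. This is an assumption about the output, not the literal code that reverse emits: the names helloReverseWorld and helloReverseWorldText and the explicit out parameter are hypothetical, and the treatment of whitespace and newlines around the tags is simplified.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical shape of the code generated from the hello-world template.
// Each plaintext (SL) block becomes an output statement, the embedded C++
// (EL) is copied through, and $i$ (ELExp) becomes an output of the value.
void helloReverseWorld(std::ostream &out) {
  out << "Hello reverse world";   // SL block
  for (int i = 1; i <= 5; i++) {  // EL block
    out << " !(";                 // SL block
    out << i;                     // ELExp block
    out << ")";                   // SL block
  }
}

// Convenience wrapper so the result can be inspected as a string.
std::string helloReverseWorldText() {
  std::ostringstream buffer;
  helloReverseWorld(buffer);
  return buffer.str();
}
```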


    Command line syntax

    Currently reverse supports the following command line options:

          reverse [-i] [adaptor] [inputfile]
        

    Meanings of the command line options:

    -i - Enables inverse processing mode. Usually processing a file is started in SL context. When inverse mode is enabled, file processing is started in EL context. When you want to use reverse as a language preprocessor, you will probably want it to work in inverse mode.
    adaptor - The name of the adaptor that shall be used to process the files. If nothing is specified, the default adaptor will be used. The available adaptors are described in the section on standard adaptors.
    inputfile - The name of the file that shall be processed. If nothing is specified, standard input will be processed.

    General concepts

    The reverse parser

    The reverse parser is implemented as a DFA with some additional variables. It parses the input stream and uses the current adaptor to recognize operators. When an operator is encountered, the context is switched and some output may be produced. In any context, only the applicable operators are recognized. For example, in SL context the reverse embedding open operator (rembo) is not recognized but treated like normal text.

    Operators that would be recognized by the parser may be escaped by preceding them with the escape sequence. Any operator preceded by an escape sequence is treated like normal text. The escape sequence itself, when not followed by an operator, is not treated specially. If you want the literal text of the escape sequence before an operator, you have to put two escape sequences before the operator.
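
    The escape rules can be sketched as a small scanner. This is a minimal illustration under stated assumptions, not the actual reverse parser: it knows a single tag, the function name markTags is hypothetical, and a recognized tag is simply replaced by the marker <TAG> so the effect is visible.

```cpp
#include <string>

// Sketch of the escape rules for one tag. Both tag and esc are assumed to be
// non-empty. A recognized tag is replaced by "<TAG>"; an escaped tag and an
// escaped escape sequence are copied literally without the leading escape.
std::string markTags(const std::string &in, const std::string &tag,
                     const std::string &esc) {
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    if (in.compare(i, esc.size(), esc) == 0) {
      const std::size_t next = i + esc.size();
      if (in.compare(next, tag.size(), tag) == 0) {
        out += tag;  // escaped tag: plain text
        i = next + tag.size();
      } else if (in.compare(next, esc.size(), esc) == 0) {
        out += esc;  // escaped escape sequence: one literal escape
        i = next + esc.size();
      } else {
        out += esc;  // escape without a following tag is not special
        i = next;
      }
    } else if (in.compare(i, tag.size(), tag) == 0) {
      out += "<TAG>";  // unescaped tag is recognized
      i += tag.size();
    } else {
      out += in[i++];
    }
  }
  return out;
}
```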

    The parser splits the input file into blocks. Each block contains data of exactly one context. The following example illustrates how an input is split into blocks.


    Example: How the input is split into blocks

              Hello reverse world{@
                for (int i=1;i<=5;i++) <@ !($i$)@>
              @}
            

    Generated blocks

    Block data                       Block context
    'Hello reverse world'            SL
    '\n for (int i=1;i<=5;i++) '     EL
    ' !('                            SL
    'i'                              ELExp
    ')'                              SL
    '\n'                             EL
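
    The splitting shown in the table can be approximated with a context stack. The sketch below is an assumption-laden illustration rather than the real parser: it hard-codes the plaintext_cpp tags {@ @}, <@ @> and $ $, ignores escapes, hints, comments and trimmed tags, and the names Ctx, Block and splitBlocks are hypothetical.

```cpp
#include <string>
#include <vector>

enum class Ctx { SL, EL, ELExp };

struct Block {
  std::string data;
  Ctx ctx;
};

// Splits the input into blocks of exactly one context each.
std::vector<Block> splitBlocks(const std::string &in) {
  std::vector<Block> blocks;
  std::vector<Ctx> stack{Ctx::SL};  // processing starts in SL context
  std::string cur;
  // Stores the text collected so far under the current context.
  auto flush = [&]() {
    if (!cur.empty()) blocks.push_back({cur, stack.back()});
    cur.clear();
  };
  auto at = [&](std::size_t pos, const std::string &tag) {
    return in.compare(pos, tag.size(), tag) == 0;
  };
  std::size_t i = 0;
  while (i < in.size()) {
    const Ctx c = stack.back();
    if (c == Ctx::SL && at(i, "{@")) {
      flush(); stack.push_back(Ctx::EL); i += 2;
    } else if (c == Ctx::EL && at(i, "@}")) {
      flush(); stack.pop_back(); i += 2;
    } else if (c == Ctx::EL && at(i, "<@")) {
      flush(); stack.push_back(Ctx::SL); i += 2;
    } else if (c == Ctx::SL && stack.size() > 1 && at(i, "@>")) {
      flush(); stack.pop_back(); i += 2;
    } else if (c == Ctx::SL && in[i] == '$') {
      flush(); stack.push_back(Ctx::ELExp); ++i;
    } else if (c == Ctx::ELExp && in[i] == '$') {
      flush(); stack.pop_back(); ++i;
    } else {
      cur += in[i++];  // ordinary character of the current block
    }
  }
  flush();
  return blocks;
}
```

    Only tags that apply to the current context are tested, so a tag such as <@ that appears in SL context is copied as ordinary text.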

    Context operators:

    embo - This tag is used in SL context and starts a nested EL context
    embc - This tag closes the current EL context
    embosl - Like embo, but spaces before this tag will be ignored. See the trimmed tags example.
    embcsl - Like embc, but a blank line directly after this tag will be ignored. See the trimmed tags example.
    rembo - This tag is used in EL context and opens a nested SL context
    rembc - This tag closes the current SL context
    rembosl - Like rembo, but if the following block contains a leading blank line, it will be cut off. This tag may improve readability of indented code. See the trimmed tags example.
    rembcsl - Like rembc, but trailing whitespace of the last block is removed. See the trimmed tags example.
    vembo - Used in SL context to open ELExp context
    vembc - Closes current ELExp context
    hintb - Used in both SL and EL context to open hint context
    hinte - Closes hint context
    slcommb - Used in SL context to open SL-comment or in SL-comment context to open nested comment, if the adaptor supports comment nesting
    slcomme - Closes SL-comment or nested SL-comment
    elcommb - Used in EL context to open EL-comment or in EL-comment context to open nested comment, if the adaptor supports nested comments
    elcomme - Closes EL-comment or nested EL-comment
    esc - If the escape sequence precedes any tag (including the escape sequence itself), this tag loses its special meaning and is considered part of the current block
    chString - This tag causes the adaptor to be switched. The rest of the line of that tag is treated like a command line for reverse. You can specify any options affecting adaptors and configuration. You cannot switch input files or display the help message using this command.

    Contexts

    A reverse file is split into contexts. A context is a section of the file delimited by tags. The content of the file is processed according to its context. The basic operation of reverse with regard to contexts is independent of the actual SL/EL combination.

    SL context - This is the context for the source language. All blocks in this context are translated into EL and then into the output language (OL) according to the current adaptor's translation rules, and then output.
    EL context - This context contains the embedded EL statements. Blocks of this context are translated into OL and then output.
    ELExp context - This context contains embedded EL expressions. They are translated into EL and then OL and output. The idea of expressions in a templating language is to generate code that outputs the expression's value.
    Hint context - Via this context, configuration data is submitted to the current reverse adaptor. The format of the configuration data depends entirely on the current adaptor, but the default format is name=value.
    Comment context - Blocks in this context are ignored. This context is intended for comments on reverse level. Such comments are useful for commenting out code that contains reverse tags.

    Adaptors

    The general behaviour of reverse is independent of the actual SL/EL combination. An adaptor encapsulates everything that is specific to the SL/EL combination. This includes, among other things, the translation rules and the operators.

    Translators

    A translator is a program that translates blocks of one language into another. In reverse, SL, EL and ELExp are the languages that need to be translated. Translators may also be chained; for example, the translators SL->EL and EL->OL may be combined to form an SL->OL translator. An adaptor contains three translators.

    SL2EL - This translator translates source language to embedded language. When used to generate templating languages, SL will usually be plaintext, and EL will be some programming language like C++ or Java. The SL->EL translator will convert the plaintext block into a statement that outputs the text in the block (cout<< in C++, or print in Java).
    ELExp2EL - This translator translates an embedded expression to EL. When used in a templating language, a statement will be generated that outputs the value of the expression. For example "$i$" would be converted to "cout<<i;"
    EL2OL - This translator is applied to all EL blocks (regardless of whether they originally come from EL, SL or ELExp) to convert them into the block that is finally output. Usually this translator is the identity translator, i.e. it does nothing but pass its input on to its output unchanged.
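
    The three translators and their chaining can be sketched as plain functions. This is a minimal illustration assuming a plaintext/C++ combination that writes to cout; the names Translator, quoteForCpp, sl2el, elexp2el, el2ol and chain are hypothetical and do not reflect the actual reverse classes.

```cpp
#include <functional>
#include <string>

using Translator = std::function<std::string(const std::string &)>;

// Quotes a plaintext block so it can sit inside a C++ string literal.
std::string quoteForCpp(const std::string &text) {
  std::string out;
  for (char c : text) {
    switch (c) {
      case '\\': out += "\\\\"; break;
      case '"':  out += "\\\""; break;
      case '\n': out += "\\n";  break;
      default:   out += c;
    }
  }
  return out;
}

// SL2EL: a plaintext block becomes a C++ output statement.
std::string sl2el(const std::string &text) {
  return "cout<<\"" + quoteForCpp(text) + "\";";
}

// ELExp2EL: an embedded expression becomes a statement printing its value.
std::string elexp2el(const std::string &expr) {
  return "cout<<" + expr + ";";
}

// EL2OL: the identity translator.
std::string el2ol(const std::string &el) { return el; }

// Chains two translators, e.g. SL->EL and EL->OL into SL->OL.
Translator chain(Translator first, Translator second) {
  return [first, second](const std::string &in) { return second(first(in)); };
}
```

    With these definitions, chain(sl2el, el2ol) acts as an SL->OL translator and chain(elexp2el, el2ol) as an ELExp->OL translator.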

    Tags

    The adaptor defines what the tags used as context operators look like. Currently, the adaptor simply contains a list of strings that represent the tags. No tag may begin with another tag, because the parser matches tags as soon as possible. For example, the tags {@ and {@@ may not occur together if they are applicable to the same context, because the parser would never recognize {@@, but would always interpret the beginning of this tag as {@ followed by an @-character.
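
    A tag list can be checked against this rule with a simple pairwise prefix test. The sketch below is only an illustration: the name hasPrefixConflict is hypothetical, and it assumes that all tags passed in are applicable to the same context.

```cpp
#include <string>
#include <vector>

// Returns true if some tag begins with another tag from the same list
// (identical tags count as a conflict as well).
bool hasPrefixConflict(const std::vector<std::string> &tags) {
  for (std::size_t a = 0; a < tags.size(); ++a) {
    for (std::size_t b = 0; b < tags.size(); ++b) {
      if (a != b && tags[b].compare(0, tags[a].size(), tags[a]) == 0) {
        return true;  // tags[b] starts with tags[a]
      }
    }
  }
  return false;
}
```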

    EL block and indentation control

    Reverse needs to know about the statement grouping syntax of the embedded language, because it has to group generated statements. Currently it is assumed that there is a begin and an end symbol for a statement group (e.g. {,} in C++ or Java, or BEGIN,END in Pascal). These symbols are stored as strings.

    For better readability of the generated code, the code is indented at each block. To control indentation, there is an indentation factor that controls how many spaces are used to indent at each level. If the indentation factor is 0, no indentation is done at all.
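
    The interplay of the group symbols and the indentation factor might look like the following sketch. The function emitGroup and its parameters are hypothetical; bbeg and bend merely borrow the hint names used by the default adaptor for the begin and end symbols.

```cpp
#include <string>
#include <vector>

// Wraps statements in a statement group at the given nesting level.
// With indentFactor 0, no indentation is produced at all.
std::string emitGroup(const std::vector<std::string> &statements, int level,
                      int indentFactor, const std::string &bbeg,
                      const std::string &bend) {
  const std::string outer(static_cast<std::size_t>(level * indentFactor), ' ');
  const std::string inner(
      static_cast<std::size_t>((level + 1) * indentFactor), ' ');
  std::string out = outer + bbeg + "\n";
  for (const std::string &s : statements) {
    out += inner + s + "\n";  // statements sit one level deeper
  }
  out += outer + bend + "\n";
  return out;
}
```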

    Comment control

    Comment delimiters are handled just like usual context operator tags. Their values are stored as strings.

    Hint processing

    Whenever the parser finishes a hint block, a method of the adaptor is called with the hint block as parameter. The default implementation checks whether the hint block has the format "name=value". If the hint block matches this format, a special method is called. Otherwise, a fallback method for unrecognized hints is called. Any hints that are not recognized by the adaptor are treated as an error.
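
    The default name=value check could be sketched as follows. The struct AssignmentHint and the function parseHint are hypothetical names; splitting at the first = and not trimming whitespace are assumptions, since the exact rules are not spelled out here.

```cpp
#include <string>

struct AssignmentHint {
  bool matched;       // true if the hint has the form name=value
  std::string name;
  std::string value;
};

// Splits a hint block at the first '='. A hint without '=' or with an empty
// name does not match and would go to the fallback for unrecognized hints.
AssignmentHint parseHint(const std::string &hint) {
  const std::size_t eq = hint.find('=');
  if (eq == std::string::npos || eq == 0) {
    return AssignmentHint{false, "", ""};
  }
  return AssignmentHint{true, hint.substr(0, eq), hint.substr(eq + 1)};
}
```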

    Line number generation

    Many languages support associating the line numbers of generated code with the original line numbers (e.g. the #line directive in C). If the embedded language supports this feature, reverse can associate any OL code generated from EL or ELExp contexts with the line numbers of the input file. For this purpose, the adaptor must implement the getPositionSetCode() and getPositionResetCode() methods. These methods are called with a position and generate the code that associates the following block with this position.
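
    For C or C++ as embedded language, the position code would typically be a #line directive. The free function below is a hypothetical stand-in for what a getPositionSetCode() implementation might return; the real method signature is not shown in this document, and file names containing quotes or backslashes are not escaped in this sketch.

```cpp
#include <string>

// Builds a #line directive that associates the following generated code
// with the given line of the given input file.
std::string positionSetCode(int line, const std::string &file) {
  return "#line " + std::to_string(line) + " \"" + file + "\"\n";
}
```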

    Configuration files

    Configuration files contain hints that are passed to adaptors. In many cases it is handy to use a configuration file rather than specifying the hints inside the source file.

    Configuration file syntax

    A configuration file contains hints. Every line of the configuration file is processed as a hint. Lines that begin with a hash sign (#) are ignored.
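
    Reading such a file can be sketched in a few lines. The function names are hypothetical, and the sketch follows the description literally: only lines starting with # are dropped, so blank lines are passed on as (empty) hints.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns every line of the stream as a hint, skipping '#' comment lines.
std::vector<std::string> readConfigHints(std::istream &in) {
  std::vector<std::string> hints;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] == '#') continue;  // comment line
    hints.push_back(line);
  }
  return hints;
}

// Convenience wrapper for configuration text held in memory.
std::vector<std::string> readConfigHintsFromText(const std::string &text) {
  std::istringstream in(text);
  return readConfigHints(in);
}
```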

    Configuration file search paths

    Any configuration file that is specified on the command line is searched for in several directories, in the following order.

    . - First the file is searched in the current directory
    ~/.reverse - Then the file is searched in the reverse configuration directory if it exists
    $REVERSE_CONFIG - The REVERSE_CONFIG environment variable may contain a list of directories separated by colons (:). These are searched next.
    $PATH - Finally all directories listed in the PATH environment variable are searched
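
    The search order above can be expressed as a function that builds the list of candidate directories. This is a sketch under assumptions: the names splitColonList and configSearchDirs are hypothetical, the environment values are passed in as plain strings, and the check whether ~/.reverse actually exists is left out.

```cpp
#include <string>
#include <vector>

// Splits a colon-separated list, skipping empty entries.
std::vector<std::string> splitColonList(const std::string &list) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (start <= list.size()) {
    std::size_t end = list.find(':', start);
    if (end == std::string::npos) end = list.size();
    if (end > start) parts.push_back(list.substr(start, end - start));
    start = end + 1;
  }
  return parts;
}

// Directories in search order: ".", ~/.reverse, $REVERSE_CONFIG, $PATH.
std::vector<std::string> configSearchDirs(const std::string &home,
                                          const std::string &reverseConfig,
                                          const std::string &path) {
  std::vector<std::string> dirs;
  dirs.push_back(".");
  dirs.push_back(home + "/.reverse");
  for (const std::string &d : splitColonList(reverseConfig)) dirs.push_back(d);
  for (const std::string &d : splitColonList(path)) dirs.push_back(d);
  return dirs;
}
```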

    The Reverse configuration directory

    Reverse will check for the directory .reverse in your home directory. If it exists, it has some special meaning. If there is a file named name_default.cfg, this file will be interpreted as config file for the adaptor name. Whenever this adaptor is loaded, this configuration file will be processed. This way you can alter settings for some adaptors globally.

    Adaptor modes

    The .reverse directory may also have a subdirectory name. This subdirectory then contains all so called modes for the adaptor name. A mode is an ordinary configuration file. The only difference is how it is loaded. You may specify a mode behind the adaptor name separated by a colon (:). So, for example, if there is a file ~/.reverse/name/mode.cfg, you can specify the adaptor name:mode. This will use the adaptor name and load the configuration file.
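
    Resolving an adaptor:mode argument to its configuration file might be sketched like this. AdaptorSpec, parseAdaptorSpec and modeConfigPath are hypothetical names, and the home directory is passed in explicitly instead of being expanded from ~.

```cpp
#include <string>

struct AdaptorSpec {
  std::string adaptor;
  std::string mode;  // empty if no mode was given
};

// Splits "name:mode" at the first colon.
AdaptorSpec parseAdaptorSpec(const std::string &spec) {
  const std::size_t colon = spec.find(':');
  if (colon == std::string::npos) return AdaptorSpec{spec, ""};
  return AdaptorSpec{spec.substr(0, colon), spec.substr(colon + 1)};
}

// Returns <home>/.reverse/<adaptor>/<mode>.cfg, or "" if no mode was given.
std::string modeConfigPath(const std::string &home, const AdaptorSpec &spec) {
  if (spec.mode.empty()) return "";
  return home + "/.reverse/" + spec.adaptor + "/" + spec.mode + ".cfg";
}
```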

    The default adaptor has several modes that completely describe its behavior.

    Standard adaptors

    Currently the following standard adaptors are available with reverse. Two of them generate C++ code: one generates output statements to an ostream class (like cout), the other generates statements that append the output to a string, a rope, or anything else that supports a +=(char*) and +=(string const &) operator. A third adaptor, plaintext_java, generates Java code.

    plaintext_cpp

    Overview

    This adaptor turns C++ into a templating language. Its source language is plaintext that is output to an ostream. The embedded language is C++. This adaptor is most useful in inverse mode, where it can be used in code generators to generate the output code while keeping the sources clean and avoiding the overhead of printf() or other output statements. The following example illustrates the usefulness of this adaptor in inverse mode.


    Example: plaintext_cpp adaptor in inverse mode

    Illustrates the benefits of the plaintext_cpp adaptor over plain C++
      void outputClass(CClass *cls) {
        cout<<"class "<<cls->name<<" :public "<<cls->baseClassName<<endl;
        cout<<"{"<<endl;
        for (CClass::FieldList::const_iterator i=cls->getFields().begin();i!=cls->getFields().end();++i) {
          cout<<"  "<<(*i)->datatype<<" "<<(*i)->name<<";"<<endl;
        }
        cout<<"}"<<endl;
      }
              
      void outputClass(CClass *cls) {
        [@
          class $cls->name$ :public $cls->baseClassName$
          {
            {@
              for (CClass::FieldList::const_iterator i=cls->getFields().begin();i!=cls->getFields().end();++i)
              [@
                $(*i)->datatype$ $(*i)->name$;
              @]
            @}
          }
        @]
      }
              
    Both code fragments implement a simple, hypothetical code generator that outputs C++ code declaring a class with a base class and some fields. The first fragment is written in pure C++, while the second does the same using the reverse preprocessor in inverse mode with the plaintext_cpp adaptor. The first fragment is nearly unreadable because of the overhead of the output statements. The second fragment is much clearer to read once you get used to the context operator tags ([@ @], $ $, {@ @} and <@ @>).

    Syntax

    The plaintext_cpp mode uses tags that contain the @-character. In standard C++, this character is unused, so it should cause few problems. The tags and their meanings are the following:

    {@ @}, (@ @) - These are the opening and closing embedding operators. Everything between these tags is interpreted as embedded C++ code. The second pair of operators is the trimmed version. The trimmed version will ignore leading whitespace and a trailing blank line in SL.
    <@ @>, [@ @] - These are the reverse embedding operators used to embed plaintext sections in C++ code. The difference between <@ @> and [@ @] is that the [@ @] operator will cut off a leading blank line and all trailing whitespace in the last line, while the <@ @> operator will output its content exactly. The [@ @] operator helps in writing code that is clearer to read.
    $ $ - The two $-signs are used to embed a C++ expression into plaintext. The value of the expression will be output. For this purpose, the << operator of the ostream class is used, so the expression type should be compatible with this operator.
    @@ @@ - Everything delimited by double-@s is passed as a hint to the adaptor. The supported hints are listed below.
    /@ @/ - The /@ @/ tags delimit a block comment on reverse level. These delimiters can be used in SL and EL context.
    \ - The backslash serves as the escape character. Any tag preceded by a backslash is interpreted as normal input.

    Hints

    outname=varname - Sets the name of the variable containing the output stream. This may be any valid C++ expression that evaluates to an object of ostream-type or of any other type that supports the <<(char *) and <<(any type used in embedded expressions) operators. The default output stream is cout.
    positionHints={true|false} - Controls the generation of #line directives. If this is set to true, #line directives are generated that associate the generated code for any EL and ELExp blocks with the original source file.

    pt_cppstr

    This adaptor is similar to the plaintext_cpp adaptor, but the output is not written to a stream. Instead it is appended to a string or any other object that provides the operation +=(type). In addition, a global function type etoa(stype) must be defined for every type stype that occurs in embedded expressions.

    The tags are the same as in plaintext_cpp mode

    The only supported hint is varname. Its value is a C++ expression that evaluates to the object the output shall be appended to. The default value is result. Usually you will use some string variable here.
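
    To make the requirements concrete, the sketch below shows possible etoa overloads and the kind of code that might be generated for the hello-world template with the default variable name result. The overload set and the function renderHello are assumptions for illustration; the exact generated statements are not specified in this document.

```cpp
#include <string>

// Hypothetical etoa overloads, one per type used in embedded expressions.
std::string etoa(int value) { return std::to_string(value); }
std::string etoa(const std::string &value) { return value; }
std::string etoa(const char *value) { return value; }

// Possible shape of generated code: every block is appended to `result`.
std::string renderHello() {
  std::string result;
  result += "Hello reverse world";  // SL block
  for (int i = 1; i <= 5; i++) {    // EL block
    result += " !(";                // SL block
    result += etoa(i);              // ELExp block, converted via etoa
    result += ")";                  // SL block
  }
  return result;
}
```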

    plaintext_java

    This adaptor resembles the plaintext_cpp adaptor. It has the same syntax, but it generates Java code using the System.out.print() method by default. You can specify the name of the method and extra arguments with the hints outname and outExtraArgs. If outExtraArgs is not empty, it will be appended as extra arguments to the specified method. For example, with outname="My.out.myPrint" and outExtraArgs="1,2,true", a generated statement might look like this: My.out.myPrint("Output-Text",1,2,true);.

    The default adaptor

    VSTL - Very Simple Templating Language

    Overview

    VSTL is a very simple templating language that can be used to describe text transformations. In reverse, VSTL may be used to describe translators and position hint generators. It is used in the default adaptor. With its help, a mode for the default adaptor can refine its translator and/or position hint generator.

    Syntax and semantics

    VSTL is an expression based language. Each VSTL program is an expression. The base elements of expressions are constants and variables. A constant may be a string or an integer. Strings are enclosed in either double or single quotes, and the escape character can be used much like in C. For example '"' or "\"". Integers are also specified like in C, for example as 100 or 0x64 or 0144. Variables are just names that follow the same rules as C identifiers, i.e. they must begin with a letter or _, and may only contain letters, digits and _.

    The nesting elements in a VSTL expression are function calls and lists. A function call is also written similarly to C. A call of function name is written as name(1,"Hello",3). In this example, the parameters would be 1, "Hello" and 3. If the parameter list is empty, the parentheses must still be written, otherwise the function would be interpreted as a variable name. Lists are just a collection of elements of various types. A list is written as {"A",1,3}. The empty list is {}. You may specify any expression inside a list, so you could for example write {"a",substr("Hello",1,3),add(7,9)}.

    Converting to boolean

    VSTL has no boolean datatype, but boolean operations. These will interpret any datatype as boolean. An integer is considered true if it is not 0, a string and a list if they are not empty. If a function shall return a boolean value, always an integer with a value of 0 or 1 is returned.
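
    The boolean interpretation can be sketched with a small value type. VstlValue and toBoolean below are hypothetical and only model the three datatypes and the rule stated above; they say nothing about how reverse represents VSTL values internally.

```cpp
#include <string>
#include <vector>

// Minimal model of a VSTL value: an integer, a string or a list.
struct VstlValue {
  enum class Kind { Integer, String, List };
  Kind kind = Kind::Integer;
  long integer = 0;
  std::string text;
  std::vector<VstlValue> items;

  static VstlValue ofInteger(long v) {
    VstlValue r; r.kind = Kind::Integer; r.integer = v; return r;
  }
  static VstlValue ofString(const std::string &s) {
    VstlValue r; r.kind = Kind::String; r.text = s; return r;
  }
  static VstlValue ofList(const std::vector<VstlValue> &l) {
    VstlValue r; r.kind = Kind::List; r.items = l; return r;
  }
};

// Integers are true if non-zero, strings and lists if non-empty.
// The result is always the integer 0 or 1.
int toBoolean(const VstlValue &v) {
  switch (v.kind) {
    case VstlValue::Kind::Integer: return v.integer != 0 ? 1 : 0;
    case VstlValue::Kind::String:  return v.text.empty() ? 0 : 1;
    case VstlValue::Kind::List:    return v.items.empty() ? 0 : 1;
  }
  return 0;
}
```

    Note that under this rule the string '0' counts as true, because it is not empty.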

    Built in functions

    VSTL comes with a set of built-in functions. They are available from any VSTL expression. To describe the functions, we use a signature notation for the parameters and return type. Consider a function fname. The signature might be: i fname(ils+). That means that an integer is returned, and the function expects an integer, a list and any positive number of strings as arguments. fname(1,{},'','','') would match this signature, while fname('') would not. The datatypes are s, i, l (string, integer, list). Additionally, the asterisk character (*) at the end of a parameter list stands for any number of parameters. The question mark (?) stands for one parameter of any datatype, and a datatype character followed by a plus sign (+) stands for any positive number of parameters of this datatype.
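
    The notation can be checked mechanically. In the sketch below, the argument types of a call are given as a string of the letters s, i and l, so fname(1,{},'','','') becomes "ilsss". The function matchesSignature is hypothetical, and it matches + greedily without backtracking, which is an assumption that suffices for signatures where + is last or followed by a different type.

```cpp
#include <string>

// Checks a parameter signature such as "ils+" against the argument types
// of a call. '?' matches any type, "X+" one or more arguments of type X,
// and a trailing '*' any number of remaining arguments.
bool matchesSignature(const std::string &sig, const std::string &args) {
  std::size_t a = 0;
  for (std::size_t p = 0; p < sig.size(); ++p) {
    const char t = sig[p];
    if (t == '*') return p + 1 == sig.size();  // only valid at the end
    const bool plus = p + 1 < sig.size() && sig[p + 1] == '+';
    auto fits = [t](char arg) { return t == '?' || t == arg; };
    if (a >= args.size() || !fits(args[a])) return false;
    ++a;
    if (plus) {
      while (a < args.size() && fits(args[a])) ++a;  // greedy repetition
      ++p;                                           // skip the '+'
    }
  }
  return a == args.size();
}
```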

    Built-in functions

    Signature                   Description
    i add(i+)                   Returns the sum of all arguments.
    sub/div/mul/mod (ii)        Returns the difference/quotient/product/remainder of the two arguments.
    and/or/xor (??)             Returns the logical and/or/xor of its arguments according to their boolean interpretations.
    not (?)                     Returns the negation of the argument.
    eq/neq (??)                 Returns true if the two arguments are equal/not equal. Arguments with different datatypes are considered equal if their string representations are equal.
    lt/gt/leq/geq (ii), (ss)    Compares integers or strings.
    if (??), (???)              Returns the second argument if the first one is true. Otherwise the third argument is returned if present, else void.
    idx (il)                    Returns the ith element of the list. If the index is out of range, this function call fails.
    substr (si), (sii)          Returns the substring of the first argument from the index specified by the second argument and with the length specified by the third. If the third argument is omitted, the substring from the specified index to the end is returned.
    subst (sss)                 The second string is interpreted as a list of characters. Each character in the first string that occurs in the second string is replaced by the corresponding character in the third string, e.g. subst('Hello','eo','oe') will return 'Holle'.
    subst (ssl)                 Works like subst(sss), but the replacements for the characters are written as a list. This way a substitute may be more than one character long. For example subst('<tag>','<>',{'&lt;','&gt;'}) returns '&lt;tag&gt;'.
    s toString(?)               Converts the argument to a string.
    i toInteger(?)              Converts the argument to an integer. Returns 0 on failure.
    i toBoolean(?)              Converts the argument to an integer that is 1 or 0 according to the boolean equivalent of the argument.
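
    As an illustration of the two subst variants, the following C++ sketch mimics their described behaviour. It is not the VSTL implementation; leaving a character unchanged when it has no counterpart in the replacement list is an assumption.

```cpp
#include <string>
#include <vector>

// subst(ssl): each character of `input` that occurs in `from` is replaced
// by the list element at the same position in `to`.
std::string subst(const std::string &input, const std::string &from,
                  const std::vector<std::string> &to) {
  std::string out;
  for (char c : input) {
    const std::size_t pos = from.find(c);
    if (pos != std::string::npos && pos < to.size()) {
      out += to[pos];
    } else {
      out += c;  // no replacement defined for this character
    }
  }
  return out;
}

// subst(sss): the third string is treated as a list of single characters.
std::string subst(const std::string &input, const std::string &from,
                  const std::string &to) {
  std::vector<std::string> parts;
  for (char c : to) parts.push_back(std::string(1, c));
  return subst(input, from, parts);
}
```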

    Hints of the default adaptor

    Tag-names

    The following hints are assignments of the form name=value. With these hints, all tags that are interpreted by the parser can be defined. The valid names are listed below; they should be self-explanatory.

    • embo
    • embc
    • embosl
    • embcsl
    • rembo
    • rembc
    • rembosl
    • rembcsl
    • vembo
    • vembc
    • elcommb
    • elcomme
    • slcommb
    • slcomme
    • esc
    • adString
    • elCommNesting [true or false]
    • slCommNesting [true or false]

    EL block generation

    The following hints are assignment hints that describe the block opening and closing commands of EL and the indentation factor.

    • bbeg
    • bend
    • indent

    Adaptor/mode description

    The desc hint is used for the description of the mode. Its value should be a short textual description of the mode.

    Hint to VSTL reflection

    The mode can define assignment hint names. Assignment hints with these names are read, the value is remembered and is available as a VSTL-variable with the same name.

    The vstlArgName=name or vstlArgName=name:initValue hint defines a VSTL-variable name with the initial value initValue. If initValue is not specified, the initial value is "". The variable is always of type string.

    Translators

    Translators are described as VSTL expressions. When the expression is evaluated, the variable SOURCE is defined and contains the input string for the translator. The result of the VSTL-expression is taken as the output.

    The hint names the code can be assigned to are el2ol_translator, sl2el_translator and elexp2el_translator.

    Position hint generation

    Position hint generation is also handled with VSTL expressions. The variables LINE, FILE and COLUMN are defined and hold the position for which the hint code shall be generated.

    The hint names for the position hint code are positionSetCode and positionResetCode.

    Compilation and installation

    See README file in the reverse module directory for step-by-step compilation instructions.

    Examples

    Simple class generator

    The following two code fragments do the same thing. The first is written in pure C++, the second is written using reverse (inverse plaintext_cpp adaptor).

      void outputClass(CClass *cls) {
        cout<<"class "<<cls->name<<" :public "<<cls->baseClassName<<endl;
        cout<<"{"<<endl;
        for (CClass::FieldList::const_iterator i=cls->getFields().begin();i!=cls->getFields().end();++i) {
          cout<<"  "<<(*i)->datatype<<" "<<(*i)->name<<";"<<endl;
        }
        cout<<"}"<<endl;
      }
          
      void outputClass(CClass *cls) {
        [@
          class $cls->name$ :public $cls->baseClassName$
          {
            {@
              for (CClass::FieldList::const_iterator i=cls->getFields().begin();i!=cls->getFields().end();++i)
              [@
                $(*i)->datatype$ $(*i)->name$;
              @]
            @}
          }
        @]
      }
          

    Both code fragments implement a simple, hypothetical code generator that outputs C++ code declaring a class with a base class and some fields. The first fragment is written in pure C++, while the second does the same using the reverse preprocessor in inverse mode with the plaintext_cpp adaptor. The first fragment is nearly unreadable because of the overhead of the output statements. The second fragment is much clearer to read once you get used to the context operator tags ([@ @], $ $, {@ @} and <@ @>).

    Html table output

    The following example illustrates how reverse can be used to generate HTML code with a C++ program. The code fragment below outputs an HTML page containing some tables that represent the content of the three-dimensional array R. This example also uses the inverse plaintext_cpp adaptor.

      [@
        <html>
          <head></head>
          <body>
          
      @]
      
      for (int k=0;k<=NUMSTATES;k++) {
        [@
          <table border="1">
            <tr>
              <th>R<sup>$k$</sup></th>
    
              {@for (int j=0;j<NUMSTATES;j++) [@
                  <th>j=$j+1$</th>
                @]  
              @}
            </tr>
            
            {@
              for (int i=0;i<NUMSTATES;i++) [@
                <tr>
                  <th>i=$i+1$</th>
                  {@
                    for (int j=0;j<NUMSTATES;j++) [@
                      <td>$R[k][i][j]$</td>
                    @]
                  @}
                </tr>
              @]
            @}
          </table>
        @]
      }
    
      [@
          </body>
        </html>
      @]
          
          

    Using trimmed tags to improve readability