parser

Parser generator


FIXME: Give lexemes as an extra argument to Parser?
FIXME: Rename second argument to parse method to "tokens"?
FIXME: Make start_token an optional argument to parse? (swap with
  token list) and have it default to the first non-terminal?


A parser is created by

    p = Parser {grammar}

and called with

    result = p:parse (start_token, token_list[, from])

where start_token is the non-terminal at which to start parsing in
the grammar, token_list is a list of tokens of the form

    {ty = "token_type", tok = "token_text"}

and from is the token in the list from which to start (the default
value is 1).

The output of the parser is a tree, each of whose
nodes is of the form:

    {ty = symbol, node_1 = tree_1, node_2 = tree_2, ... [, list]}

where each node_i is a symbolic name, and list is the list of
trees returned if the corresponding token was a list token.

A grammar is a table of rules of the form

    non-terminal = {production_1, production_2, ...}

plus a special item

    lexemes = Set {"class_1", "class_2", ...}

Each production gives a form that a non-terminal may take. A
production has the form

    production = {"token_1", "token_2", ...,
                  [action][,abstract]}

A production

  * must not start with the non-terminal being defined (it must not
    be left-recursive)
  * must not be a prefix of a later production in the same
    non-terminal

Each token may be

  * a non-terminal, i.e. a token defined by the grammar
     * an optional symbol is indicated by the suffix "_opt"
     * a list is indicated by the suffix "_list", and may be
       followed by "_" (default is no separator)
  * a lexeme class
  * a string to match literally

The parse tree for a literal string or lexeme class is the string
that was matched. The parse tree for a non-terminal is a table of
the form

   {ty = "non_terminal_name", tree_1, tree_2, ...}

where the tree_i are the parse trees for the corresponding
terminals and non-terminals.

An action is of the form

    action = function (tree, token, pos)
      ...
      return tree_
    end

It is passed the parse tree for the current node, the token list,
and the current position in the token list, and returns a new parse
tree.

An abstract syntax rule is of the form

    name = {i_1, i_2, ...}

where i_1, i_2, ... are numbers. This results in a parse tree of
the form

    {ty = "name"; tree_i_1, tree_i_2, ...}

If a production has no abstract syntax rule, the result is the
parse node for the current node.


Parser constructor (deals with abstract syntax rules)


Parser:parse: Parse a token list
  start: the token at which to start
  token: the list of tokens
returns
  tree: parse tree