The power of the top-down tools can be extended simply by increasing the amount of lookahead used in parsing decisions; doing so is advocated by some researchers in spite of the slightly degraded performance of the resulting parser. My research led me to formulate the complementary problem: namely, given the power of bottom-up parsers – LR(1) grammars can describe every conceivable deterministic context-free language – is there a way to make the corresponding parser generators easier to use? The answer, I believe, is yes.
Rather than devising yet another subset of context-free grammars (CFGs) that could be parsed in an efficient – or otherwise ingenious – manner, I was led to reconsider the nature of CFGs themselves. I discovered that CFGs, while more than adequate for describing simple languages (e.g., arithmetic expressions), simply do not "scale up" to complex content such as programming or mark-up languages. When attempting to describe such content with a CFG, it is very easy to introduce an unintended ambiguity – and extremely difficult to remove it.
As an alternative to CFGs, I therefore propose a formalism called an ordered context-free grammar (OCFG). In general, OCFGs can describe the same set of context-free languages (CFLs) as ordinary CFGs can; however, they can generally do so in a much more intuitive fashion, and with significantly fewer unintended ambiguities. This is because OCFGs impose a partial order on the productions they contain, and hence on the possible derivations of sentences (terminal strings) in the grammar. Presented with an ambiguity, one can usually resolve it by selecting a derivation – and its corresponding parser action – that comes "first." Furthermore, some derivations that are perfectly legal for context-free grammars may be disallowed in OCFGs if they violate the precedence and associativity properties of the productions. (For example, a parse tree for arithmetic expressions in which the (+) node is a direct descendant of the (*) node would normally be considered invalid.)
At the same time, OCFGs are sufficiently similar to ordinary CFGs to
leverage well-known techniques during parser construction. The TM
parser generator, available for download below, can be thought of as generalization
of the algorithm found in bison or yacc. The table-driven
bottom-up parser generated by
is essentially identical to those produced by the LALR(1) tools. In its
present (i.e., version 1.x) form,
handles a subset of OCFGs known as LALRP(1) (which stands for "LALR(1)
with precedence") and allows productions to be arranged into linear orders.
A forthcoming
release, described in a separate technical
report, will allow arbitrary partial orders, and may also be extended
to handle LRP(1) ("LR(1) with precedence") grammars if LALRP(1) proves
insufficient.
The following items are available for download (the usual disclaimers
apply):
![]() |
Ordered Context-Free
Grammars![]() This paper describes the theoretical underpinnings of ordered context-free
grammars (OCFGs), and also contains some discussion of a forthcoming (2.x)
release of the |
![]() |
Presentation Slides
about OCFGs![]() This slide set describes motivations behind creating ordered context-free
grammars (OCFGs) and touches upon some of their properties, as well as |
![]() |
More
Intuitive Bottom-Up Parsing![]() This paper describes the forthcoming |
![]() |
The ![]() ![]() Follow this link to obtain the latest distribution.
|