Precedence-Based Parsing

It has often been said that parsing is one of the most thoroughly studied and best understood areas of computer science. This may indeed be true on a theoretical level, but does not seem to translate into the availability of easy-to-use tools benefitting software practitioners. Indeed, my very "stumbling" into this research area resulted from frustration with existing parser generators. There seems to exist an inevitable trade-off between their ease-of-use (true for "top-down," LL(1) tools such as CoCo and ANTLR) and expressive power (true for "bottom-up," LR(1)/LALR(1) tools such as yacc).

The power of the top-down tools can be extended simply by increasing the amount of lookahead used in parsing decisions; doing so is advocated by some researchers in spite of the slightly degraded performance of the resulting parser. My research led me to formulate the complementary problem: namely, given the power of bottom-up parsers – LR(1) grammars can describe every conceivable deterministic context-free language – is there a way to make the corresponding parser generators easier to use? The answer, I believe, is yes.

Rather than devising yet another subset of context-free grammars (CFGs) that could be parsed in an efficient – or otherwise ingenious – manner, I was led to reconsider the nature of CFGs themselves. I discovered that CFGs, while more than adequate for describing simple languages (e.g., arithmetic expressions), simply do not "scale up" to complex content such as programming or mark-up languages. When attempting to describe such content with a CFG, it is very easy to introduce an unintended ambiguity – and extremely difficult to remove it.

As an alternative to CFGs, I therefore propose a formalism called an ordered context-free grammar (OCFG). In general, OCFGs can describe the same set of context-free languages (CFLs) as ordinary CFGs can; however, they can generally do so in a much more intuitive fashion, and with significantly fewer unintended ambiguities. This is because OCFGs impose a partial order on the productions they contain, and hence on the possible derivations of sentences (terminal strings) in the grammar. Presented with an ambiguity, one can usually resolve it by selecting a derivation – and its corresponding parser action – that comes "first." Furthermore, some derivations that are perfectly legal for context-free grammars may be disallowed in OCFGs if they violate the precedence and associativity properties of the productions. (For example, a parse tree for arithmetic expressions in which the (+) node is a direct descendant of the (*) node would normally be considered invalid.)

At the same time, OCFGs are sufficiently similar to ordinary CFGs to leverage well-known techniques during parser construction. The BerthaTM parser generator, available for download below, can be thought of as generalization of the algorithm found in bison or yacc. The table-driven bottom-up parser generated by Bertha is essentially identical to those produced by the LALR(1) tools. In its present (i.e., version 1.x) form, Bertha handles a subset of OCFGs known as LALRP(1) (which stands for "LALR(1) with precedence") and allows productions to be arranged into linear orders. A forthcoming Bertha release, described in a separate technical report, will allow arbitrary partial orders, and may also be extended to handle LRP(1) ("LR(1) with precedence") grammars if LALRP(1) proves insufficient.


The following items are available for download (the usual disclaimers apply):
 
OCFG Technical Report Ordered Context-Free GrammarsGet Acrobat Reader (ICS Technical Report No. 99–18. Released: 26-Apr-1999. Modified: 25-Jan-2000.)

This paper describes the theoretical underpinnings of ordered context-free grammars (OCFGs), and also contains some discussion of a forthcoming (2.x) release of the BerthaTM parser generator. The tech report now contains bookmarks and internal/external hyperlinks for ease of navigation.
 

OCFG Slides Presentation Slides about OCFGsGet Acrobat Reader (Created: 30-Apr-1999. Modified: 16-Jun-1999.)

This slide set describes motivations behind creating ordered context-free grammars (OCFGs) and touches upon some of their properties, as well as BerthaTM. Formalism has been kept to a minimum here.
 

OCFG Technical Report More Intuitive Bottom-Up ParsingGet Acrobat Reader (ICS Technical Report No. 99–48. Released: 28-Oct-1999. Modified: 10-Nov-1999.)

This paper describes the forthcoming BerthaTM 2.x parser generator, how its input syntax allows the crafting of arbitrary partial orders among productions, and why it is both more convenient and strictly more powerful than tools like yacc
 

This Way to Bertha(tm)! The BerthaTM Parser GeneratorGet Bertha!

Follow this link to obtain the latest distribution.
 



HOME PAGE
(C) Copyright 1998-2002 by Ziemowit Laski. All Rights Reserved.