40

From this wikipedia page:

The fundamental difference between context-free grammars and parsing expression grammars is that the PEG's choice operator is ordered. If the first alternative succeeds, the second alternative is ignored. Thus ordered choice is not commutative, unlike unordered choice as in context-free grammars and regular expressions. Ordered choice is analogous to soft cut operators available in some logic programming languages.

Why does PEG's choice operator short circuits the matching? Is it because to minimize memory usage (due to memoization)?

I'm not sure what the choice operator is in regular expressions but let's suppose it is this: /[aeiou]/ to match a vowel. So this regex is commutative because I could have written it in any of the 5! (five factorial) permutations of the vowel characters? i.e. /[aeiou]/ behaves the same as /[eiaou]/. What is the advantage of it being commutative? (c.f. PEG's non-commutativity)

The consequence is that if a CFG is transliterated directly to a PEG, any ambiguity in the former is resolved by deterministically picking one parse tree from the possible parses. By carefully choosing the order in which the grammar alternatives are specified, a programmer has a great deal of control over which parse tree is selected.

Is this saying that PEG's grammar is superior to CFG's?

3
  • "Superior"? What are you criteria for "superior"?
    – Gabe
    Mar 31, 2011 at 14:02
  • 1
    For the commutativity, think of (air|airplane) trying to match the word airplane.
    – xanatos
    Mar 31, 2011 at 14:47
  • It looks like you're confusing the concepts of choice operator and character class. In regular expressions character classes are delimited with square brackets [aeiou] while the choice operator is the pipe character |. In PEG the choice operator is instead the slash character /. Aug 11, 2014 at 10:56

3 Answers 3

62

A CFG grammar is non-deterministic, meaning that some input could result in two or more possible parse-trees. Though most CFG-based parser-generators have restrictions on the determinability of the grammar. It will give a warning or error if it has two or more choices.

A PEG grammar is deterministic, meaning that any input can only be parsed one way.

To take a classic example; The grammar

if_statement := "if" "(" expr ")" statement "else" statement
              | "if" "(" expr ")" statement;

applied to the input

if (x1) if (x2) y1 else y2

could either be parsed as

if_statement(x1, if_statement(x2, y1, y2))

or

if_statement(x1, if_statement(x2, y1), y2)

A CFG-parser would generate a Shift/Reduce-conflict, since it can't decide if it should shift (read another token), or reduce (complete the node), when reaching the "else" keyword. Of course, there are ways to get around this problem.

A PEG-parser would always pick the first choice.

Which one is better is for you to decide. My opinion is that often PEG-grammars is easier to write, and CFG grammars easier to analyze.

1
  • Could you provide an example of such a CFG grammar (with 2 parse trees)? Apr 4, 2011 at 5:47
4

I think you're confusing CFG with LR and with ambiguity. Grammars are not deterministic/nondeterministic, though their parsers may be. An ambiguous grammar is still CFG if it complies with the definition, and a deterministic parser can be built for it doing what PEG does.

2
  • 1
    No, CFGs are sometimes ambiguous because their "choice" operator has no precedence, so if a given string matches both options in the "choice", then you have an ambiguity. The "choice" in PEGs have first-match-wins precedence, so there is no ambiguity because the leftmost option necessarily wins. Jan 15, 2013 at 1:46
  • 4
    No. A CFG may be ambiguous because all options are equally valid. A CFG is ambiguous when the same phrase can be generated by different sequences of productions. In LL and LR, ambiguity means that a parser/recognizer has no way to know which sequence of productions (which syntax tree) corresponds to a given phrase. PEG solves the ambiguity problem by ranking productions according to the order in which they are declared. It tells parses that the right syntax tree is the first syntax tree.
    – Apalala
    Jan 15, 2013 at 10:53
1

PEGs and CFGs are two different ways of specifying a language. If you write a parser by hand, chances are very good that you will write a so-called recursive descent parser. A recursive descent parser will automatically resolve any ambiguities in your grammar, but does so silently and likely not in the way you would have wanted. The problem with this is that you never find out that there were ambiguities that got automatically resolved, unless you thoroughly test your parser. PEGs are basically a formalization of recursive descent parsers, and so come with this problem. For examples of this problem see How does backtracking affect the language recognized by a parser?, and https://cs.stackexchange.com/questions/143480/dragon-book-4-4-5-exercise/143975.

CFGs have a lot of theory to back them up, but PEGs not so much. The set of languages that can be encoded by CFG and those that can be encoded by PEG partially overlap, but neither encompasses the other.

For a more thorough review of this I recommend the excellent essay Which Parsing Approach?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.