=head1 NAME Parse::Marpa::Doc::Parse_Terms - Standard Parsing Terms used in the Marpa Documents =head1 DESCRIPTION This document is intended as a reminder of the standard vocabulary of parsing. I put B of terms in boldface, for easy skimming. A reader who feels comfortable with parsing terminology, can skip this document entirely. All the definitions here are consistent with at least some of the textbook definitions, and are in that sense standard. But no effort is made to cover the full range of standard meaning, or even to give the most common meaning. The focus is on the meaning of the terms as used in the Marpa documentation. If you've never looked into parsing before, you'll probably find this document too terse. As an introduction, I reccommend L. It's available on-line. L is also an excellent place to start. =head2 Basic terms A B is a set of rules. The B describe a set of strings of B. A string of symbols is often called a B. The rules of a grammar are often called B. =head2 Stages of Parsing A B is a program that determines whether its B is one of the symbol strings in the set described by the rules of a grammar. A B is a program which finds the structure of the input according to the rules of a grammar. The term B is used in a strict and a loose sense. B in the loose sense means all the phases of finding a grammar's structure, including a separate recognition phase if the parser has one. (Marpa does.) If a parser has phases, B refers specifically to the phase that finds the structure of the input. When the Marpa documents use the term B in its strict sense, they will speak explicitly of "parsing in the strict sense". Otherwise, B will mean parsing in the loose sense. Parsers often use a B to convert B, usually B, into a series of B. Each token represents a B of the grammar and has a B. The series of symbols represented by the series of tokens becomes the B seen by the recognizer. The B is also called the B. A lexical analyzer is often called a B or a B, and B is often called B or B. =head2 Productions A standard way of describing rules is Backus-Naur Form, or B. In one common way of writing BNF, a production looks like this. =begin Marpa::Test::Display: ## skip display =end Marpa::Test::Display: Expression ::= Term Factor The equivalent in Marpa's MDL language looks like this: =begin Marpa::Test::Display: ## skip display =end Marpa::Test::Display: Expression: Term, Factor. In the production above, C, C and C are symbols. A production consists of a B and a B. In a B, like those Marpa parses, the left hand side of a production is always a symbol string of length 1. The right hand side of a production is a symbol string of zero or more symbols. In the example, C is the left hand side, and C and C are right hand side symbols. Left hand side and right hand side are often abbreviated as B and B. If the rhs of a production has no symbols, the production is called an B or an B. Any symbol which is allowed to occur in the symbol string input is called a B symbol. If the symbols in a symbol string are all terminals, that symbol string is also called a B. =head2 Derivations A B of a derivation, or B, is made by taking a symbol string and any production in the grammar whose lhs occurs in that symbol string, and replacing any occurrence of the lhs symbol in the symbol string with the rhs of the production. For example, if C, C, C, C, and C are symbols, and =begin Marpa::Test::Display: ## skip 2 displays =end Marpa::Test::Display: X: B, C. is a production, then A X D -> A B C D is a derivation step, with "C" as its beginning and "C" as its end or result. We say that the symbol string "C" B the symbol string "C". A B is a sequence of derivation steps. The B of a derivation is its length in steps. A symbol string B another if and only if there is a derivation of length 1 from the first symbol string to the second string. Every symbol string is said to derive itself in a derivation of length 0. Such a zero length derivation is a B. If a derivation is not trivial or direct, that is, if it has more than one step, then it is an B derivation. A derivation which is not trivial, that is, one which has one or more steps, is a B derivation. Where the symbol string beginning a derivation consists of a single symbol, we often say that symbol B the symbol string which results from the derivation. We say that the beginning symbol B, B, B or B produces the symbol string if the length of the derivation is respectively, 0, greater than 0, 1, or greater than 1, just as we do when we say a symbol string derives another symbol string. When a symbol produces or derives a symbol string, we also say that the symbol B the symbol string, or that the symbol string B the symbol. In any parse, one symbol is distinguished as the B. The parse of an input is B if and only if the start symbol produces the input sentence according to the grammar. =head2 Nulling The B of a symbol string is the number of symbols in it. The zero length symbol string is called the B. The empty string can be considered to be a sentence, in which case it is the B. A string of one or more symbols is B. A derivation which produces the empty string is a B. A derivation from the start symbol which produces the empty string is a B. If in a particular grammar, a symbol has a null derivation, it is a B. If, in a particular grammar, the only sentence produced by a symbol is the empty sentence, it is a B. All nulling symbols are nullable symbols. If a symbol is not nullable, it is B. If a symbol is not nulling, it is B. In any instance where a symbol produces the empty string, it is said to be B, or to be a B. =head2 Other Special Cases If any derivation from the start symbol uses a rule, that rule is called B or B. A rule that is not accessible is called B or B. If any derivation which results in a sentence uses a rule, that rule is said to be B. A rule that is not productive is called B. A simple case of an unproductive rule is one whose rhs contains a symbol which is not a terminal and not on the lhs of any other rule. A B rule is one which is inaccessible or unproductive. A symbol is B if it appears in a reachable production. A symbol is B if it appears on the lhs of a productive rule, or if it is a nullable symbol. If a symbol is not reachable or not accessible, it is B or B. If a symbol is not productive, it is B. If any symbol in the grammar non-trivially produces a symbol string containing itself, the grammar is said to be B. If any symbol non-trivially produces a symbol string with itself on the left, the grammar is said to be B. If any symbol non-trivially produces a symbol string with itself on the right, the grammar is said to be B. A non-trivial derivation from any symbol to the symbol string of length one containing the original symbol is a cycle. Put less formally, any non-trivial derivation, which starts from a symbol and ends with a symbol string containing nothing but the same symbol, completely unchanged, is a cycle. A grammar which contains no cycles is B. A B in one which is cycle-free and has no useless rules or empty productions. =head2 Structure The structure of a parse can be represented by as a series of derivation steps from the start symbol to the input. Another way to represent structure is as a B. Every symbol used in the parse is represented by a B of the parse tree. Wherever a production is used in the parse, its lhs is represented by a B and the rhs symbols are represented by B. The start symbol is the B of the tree. Terminals and symbols on the left hand side of empty productions are B. If a node is not a B, it is an B. A B is one that represents a nulled symbol. If, for a given grammar and a given input, more than one derivation tree is possible, we say that parse is B. If any parse from a grammar is ambiguous, the grammar is B. The node at the root of the tree is the B. Any node with a symbol being used as a terminal is a B. The B of a node is its distance from the root. The root node is at depth 0. If a parent node is at depth C, its children are at depth C. As with depths in bodies of water, the concepts of B and B have the sense opposite to that of the numerical values. The node at depth 0 is higher than any other node. Parent nodes are higher than their children. =head2 Semantics In real life, the structure of a parse is usually a means to an end. Grammars usually have a B associated with them, and what the user actually wants is the B of the parse according to the semantics. The tree representation is especially useful when evaluating a parse. Every node which represents a terminal symbol has a value associated with it on input. Non-null inner nodes take their semantics from the production whose lhs they represent. Nulled nodes are often deal with in special ways. Their handling in Marpa is described L. The semantics for a production describe how to calculate the value of the node which represents the lhs (the parent node) from the values of zero or more of the nodes which represent the rhs symbols (child nodes). Values are computed recursively, bottom-up. The value of a parse is the value of its start symbol. =head1 SUPPORT See the L in the main module. =head1 AUTHOR Jeffrey Kegler =head1 LICENSE AND COPYRIGHT Copyright 2007 - 2008 Jeffrey Kegler This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. =cut