=head1 NAME Parse::Marpa::Doc::Diagnostics - Diagnostics =head1 DESCRIPTION This document describes techniques for debugging Marpa parses and grammars. It also lists those Marpa methods and Marpa options whose main use is in tracing and debugging. =head2 Basic Debugging Techniques If parsing failed before or at the end of input, first look at the place in the input where parsing was exhausted. That, along with inspection of the input and the grammar, is often enough to spot the problem. But, typically, you'll have already tried that before consulting this document. Next, you should make sure that Marpa's C option is turned on. It's on by default, so you probably have already done that, too. You should also turn off Marpa's C option, which is on by default. When the C option is on, Marpa "strips" its objects of data that is not needed for subsequent processing. This saves time and memory, but data that is not needed for processing can be extremely valuable for debugging. When the C option is left on, many of Marpa's diagnostics methods will return partial information or no information at all. When a problem is not obvious, the first thing I do is turn on the C option. This tells you which tokens the lexer is looking for and which ones it thinks it found. If the problem is in lexing, C tells you the whole story. Even if the problem is in the grammar, which tokens the lexer is looking for is a clue to what the recognizer is doing. That is because Marpa uses predictive lexing and only looks for tokens that could result in a successful parse. It sometimes helps to look carefully at the output of C and C, to check if anything there is clearly not right or not what you expected. =head2 Advanced Techniques Next, depending on where in the process you're having problems, you might want to turn on some of the more helpful traces. C will show you the actions as they are being finalized. C traces the initialization of, and the iteration of, the parse tree. C traces the values of the nodes as they are pushed on, and popped off, the evaluation stack. After an evaluation, C will show the entire tree. For a complete investigation of a parse, do the following: =over 4 =item * Turn off the C Marpa option. By default, it is on. =item * Make sure the C option is turned on. It is on by default. =item * Run C on the precomputed grammar. =item * Run C on the precomputed grammar. =item * Run C on the precomputed grammar. =item * Turn on C before input. =item * Run C on the recognizer. =item * Run C in verbose mode on the evaluator after it is created. =item * Run C in verbose mode on the evaluator after each call of the C method. =item * Turn on C so you can see the values as they pushed onto the evaluation stack, popped off it, calculated and pushed back on. =back Note that when the input text to the grammar is of any length, the outputs from C, C, C, C, and C can be lengthy. You'll want to work with short inputs if at all possible. The L has example outputs from the C, C, C, and C methods, and explains how to read them. =head1 OPTIONS These are Marpa options. Unless otherwise stated, the Marpa options are valid for all methods which accept Marpa options as named arguments ( C, C, C, C, C, and C). All options are useful at any point in the parse, unless otherwise stated. Trace output goes to the trace file handle. =over 4 =item academic The academic option turns off all grammar rewriting. The resulting grammar is useless for recognition and parsing. The purpose of the C argument is allow the testing of Marpa's precomputations against examples from textbooks. This is handy for testing the internals. An exception is thrown if the user attempts to create a recognizer from a grammar marked academic. The C option cannot be set in the recognizer or after the grammar is precomputed. =item strip The value is a Boolean. If true, Marpa "strips" its objects when they contain data that is not needed for further processing. This saves space and time. This is the default behavior. If C is set to false, all data in Marpa's objects, even data no longer needed for processing, is left in place for the entire life of the object. This leftover data can be very important if you're debugging. A grammar is stripped when it is precomputed. A recognizer is stripped when the end of input is recognized. Turning C off after the end of input has been recognized will have no effect. =item trace_actions Traces actions as they are compiled. Little or no knowledge of Marpa internals required. This option is useless once the recognizer has been created. Setting it after that point will result in a warning. =begin undocumented: # =item trace_completions # Traces each Earley set as it is completed. # May be set at any point. # Add warning if set after recognition? # I find it better to wait until evaluation time, # when there's more information # and then use C. # Requires knowledge of Marpa internals. =end undocumented: =item trace_iterations Traces creation of, and iteration of, the parse tree. Knowledge of Marpa internals very useful. May usefully be set at any point in the parse. =item trace_lex A shorthand for setting both C and C. Very useful, and can be interpreted with limited knowledge of Marpa internals. Because Marpa uses predictive lexing, this can give you an idea not only of how lexing is working, but also of what the recognizer is looking for. May be set at any point in the parse, but will be useless if set after input is complete. =item trace_lex_matches Traces every successful match in lexing. Can be interpreted with little knowledge of Marpa internals. May be set at any point in the parse, but will be useless if set after input is complete. =item trace_lex_tries Traces every attempted match in lexing. Can be interpreted with little knowledge of Marpa internals. Usually not useful without C. C turns on both. May be set at any point in the parse, but will be useless if set after input is complete. =item trace_priorities Traces the priority setting of each QDFA state. Requires knowledge of Marpa internals. The priorities are set during precomputation. A trace message warns the user if he sets C after that point. =item trace_rules Traces rules as they are added to the grammar. Useful, but you may prefer the C method. Doesn't require knowledge of Marpa internals. A trace message warns the user if he sets this option when rules have already been added. If a user adds rules using the C named argument, and uses the C named argument in the same call, it will take effect B the processing of the C option, which is probably not what he intended. To be effective C must be set in a method call prior to the one with the C option. =item trace_values Takes as its value an integer zero or greater, which sets the debugging level. A debugging level of zero means no tracing of values. A level of one or more turns on tracing of values as they are pushed onto the evaluation stack, popped off it, and calculated. If the debugging level is 3 or more, the entire evaluation stack is dumped at every step in the evaluation. Very helpful. Knowledge of Marpa internals is helpful, but not required. May usefully be set at any point. =back =head1 METHODS =head2 Static Method =head3 show_location =begin Marpa::Test::Display: ## next display in_file($_, 't/equation_s.t'); =end Marpa::Test::Display: my $recce = Parse::Marpa::Recognizer->new( { grammar => $grammar } ); =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: my $fail_location = $recce->text( \$text_to_parse ); if ( $fail_location >= 0 ) { print {*STDERR} Parse::Marpa::show_location( 'Parsing failed', \$text_to_parse, $fail_location, ) or Carp::croak "print to STDERR failed: $OS_ERROR"; exit 1; } A utility routine helpful for creating messages about problems parsing text. Takes three arguments, all required. The first argument must be a string containing a I. The second argument must be a reference to a string containing the I that was being parsed. The third argument must be an integer, and will be interpreted as a character I within that string. C returns a multi-line string. The first, header, line contains the I. The second line is the line from the I being parsed which contains the character I. The third line contains an ASCII "caret" symbol pointing to the position of the offset in the second line. =head2 Grammar Methods =head3 inaccessible_symbols =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); for my $symbol ( @{ $grammar->inaccessible_symbols() } ) { say 'Inaccessible symbol: ', $symbol; } Returns the plumbing names of the inaccessible symbols. Not useful before the grammar is precomputed. Used for test scripts. For debugging and tracing, the C option is usually the most convenient way to obtain the same information. =head3 show_NFA =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); print $grammar->show_NFA() or Carp::croak "print failed: $OS_ERROR"; Returns a multi-line string listing the states of the NFA with the LR(0) items and transitions for each. Not useful before the grammar is precomputed. Not really helpful for debugging grammars and requires very deep knowledge of Marpa internals. =head3 show_QDFA =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); print $grammar->show_QDFA() or Carp::croak "print failed: $OS_ERROR"; Returns a multi-line string listing the states of the QDFA with the LR(0) items, NFA states, and transitions for each. Not useful before the grammar is precomputed. Very useful in debugging, but requires knowledge of Marpa internals. =head3 show_accessible_symbols =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); say 'Accessible symbols: ', $grammar->show_accessible_symbols(); Returns a one-line string with the plumbing names of the accessible symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging. =head3 show_nullable_symbols =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); say 'Nullable symbols: ', $grammar->show_nullable_symbols(); Returns a one-line string with the plumbing names of the nullable symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging. =head3 show_nulling_symbols =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); say 'Nulling symbols: ', $grammar->show_nulling_symbols(); Returns a one-line string with the plumbing names of the nulling symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging. =head3 show_problems =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); print $grammar->show_problems() or Carp::croak "print failed: $OS_ERROR"; Returns a string describing the problems a grammar had in the precomputation phase. For many precomputation problems, Marpa does not immediately throw an exception. This is because there are often several problems with a grammar. Throwing an exception on the first problem would force the user to fix them one at a time -- very tedious. If there were no problems, returns a string saying so. This method is not useful before precomputation. An exception is thrown if the user attempts to stringify, or to create a parse from, a grammar with problems. The string returned by C will be part of the exception's error message. =head3 show_productive_symbols =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); say 'Productive symbols: ', $grammar->show_productive_symbols(); Returns a one-line string with the plumbing names of the productive symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging. =head3 show_rules =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); print $grammar->show_rules() or Carp::croak "print failed: $OS_ERROR"; Returns a string listing the rules, each commented as to whether it was nullable, nulling, unproductive, inaccessible, empty or not useful. If a rule had a non-zero priority, that is also shown. Often useful and much of the information requires no knowledge of the Marpa internals to interpret. C shows a rule as not useful ("C") if it decides not to use it for any reason. Rules marked "C" include not just the ones called useless in standard parsing terminology (inaccessible and unproductive rules) but also any rule which is replaced by one of Marpa's grammar rewrites. =head3 show_symbols =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); print $grammar->show_symbols() or Carp::croak "print failed: $OS_ERROR"; Returns a string listing the symbols, along with whether they were nulling, nullable, unproductive or inaccessible. Also shown is a list of rules with that symbol on the left hand side, and a list of rules which have that symbol anywhere on the right hand side. Often useful and much of the information requires no knowledge of the Marpa internals to interpret. =head3 unproductive_symbols =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: $grammar->precompute(); for my $symbol ( @{ $grammar->unproductive_symbols() } ) { say 'Unproductive symbol: ', $symbol; } Given a precomputed grammar, returns the plumbing names of the unproductive symbols. Not useful before the grammar is precomputed. Used in test scripts. For debugging and tracing, the C option is usually a more convenient way to obtain the same information. =head2 Recognizer Method =head3 show_earley_sets =begin Marpa::Test::Display: ## next display in_file($_, 't/equation_s.t'); =end Marpa::Test::Display: my $recce = Parse::Marpa::Recognizer->new( { grammar => $grammar } ); =begin Marpa::Test::Display: ## next 2 display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: my $fail_location = $recce->text( \$text_to_parse ); print $recce->show_earley_sets or Carp::croak "print failed: $OS_ERROR"; Returns a multi-line string listing every Earley item in every Earley set. =head2 Evaluator Method =head3 show_bocage =begin Marpa::Test::Display: ## next display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: print $evaler->show_bocage($show_bocage_verbosity) or Carp::croak "print failed: $OS_ERROR"; Returns a multi-line string describing the bocage for an evaluator. The first line gives the name of the Perl package in which Marpa runs the actions for that evaluator, and a count of the parses derived so far from the bocage. The bocage follows. The bocage is given in pre-order, in the form of a grammar. Parse bocage grammars are similar to parse forest grammars. In the L, parse bocage grammars are described at length, using an example output from C. The optional I argument must be an integer greater than or equal to zero. For each parse bocage and-production, if I is set greater than zero, the LR(0) item corresponding to the and-production's and-node is shown, along with the and-node's argument count (or rule length), and an indication of whether or not there is a Perl closure for the and-node. In addition to and-productions, parse bocages grammars contain or-productions. The information in or-productions is redundant -- all of it is evident from the and-productions. Or-productions are shown only if I is set at 2 or more. =head3 show_tree =begin Marpa::Test::Display: ## next display in_file($_, 'bin/mdl'); =end Marpa::Test::Display: print $evaler->show_tree($show_tree_flag) or Carp::croak "print failed: $OS_ERROR"; When called after a successful call to the C method, C returns a multi-line string describing the parse tree produced in the C call. The tree is listed in pre-order. For each parse tree node, its depth and and-node are given. Also, for the current choice of and-node at that parse tree node, C gives the cause and predecessor or-nodes, if any; the rule; and the argument count (that is, the rule length). The optional I argument, if a true value, causes verbose output. In verbose output, the value of the node is given, when the tree node has a value. Also in verbose output, if the node tree has a Perl closure, that is indicated. If the C method was never called, or if the last call to C returned failure, the result returned by C is unpredictable. =head1 SUPPORT See the L in the main module. =head1 AUTHOR Jeffrey Kegler =head1 LICENSE AND COPYRIGHT Copyright 2007 - 2008 Jeffrey Kegler This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. =cut