This is an overview of the bootstrap process for Perl6 rules (aka
regular expressions).

[In the beginning...]

* PCRE - "Perl Compatible Regular Expressions"

  Provided initial pugs regex support.
  Currently used when you specify :perl5.
  Fast.  Available on all platforms.
  Limited to a subset of perl5 syntax and semantics.
  No support for perl6 style.
  Mature.  Some minor bugs.
  Significant additional development is not expected.

[Then parrot was linked to pugs...]

* PGE

  Intended to become the primary engine on parrot-based perl6.
  Requires a parrot, internal or external.
  Currently used when you don't specify :perl5.
  Currently supports a limited subset of perl6 regexes.

  PGE consists of a parser written in PIR (parrot assembly code),
  which builds a tree, and an emitter, which crawls the tree and
  generates a PIR implementation of the matcher.  The parser is a
  throwaway, intended to be replaced by a perl6-based parser.
  The parser is currently a bottleneck.

  [Is primary development taking place in parrot or pugs?]
  [FIXME - this section should be checked by someone familiar with PGE.]

* P6CRE

  Intended to serve as a temporary perl6 regexp parser to permit PGE
  codegen to zip ahead.

  Translates perl6 regexps into a combination of PCRE perl5 regexps
  and capture "repackers" which create a perl6-style match tree.
  A match tree can serve as a parse tree.  The match tree for a
  perl6-rules specification of perl6-rules can be walked, creating
  a P6CRE rx which can generate the same match tree.

  Limitations: NOT CHECKED IN; unfinished; glacial (especially rx
  generation); exposes PCRE semantics (no subrule left-recursion, odd
  constraints on right-recursion, etc); full perl6 regex support is
  not intended (no embedded code, ...).

[Then perl5 was linked to pugs...]

* Regexp::Parser

  Also intended to serve as a temporary perl6 regexp parser to permit
  PGE codegen to zip ahead.  And ... probably more?

  It's a version of CPAN's Regexp::Parser updated to understand perl6
  syntax.  Written in perl5.  Generates a perl5 object-based parse tree.

  Limitations: NOT CHECKED IN; not quite finished(?); parser only.

  [FIXME - this section should be reviewed by someone familiar with
  Regexp::Parser.]

* Unnamed perl5-based engine.

  Perl5 is now linkable, and it supports callbacks (which PGE and PCRE
  currently lack).  So we could use the perl5 regex engine with
  P6CRE-like regexp transliteration and capture repacking.

  This could provide fast perl6 regexps, with good backtracking, and
  callbacks (external aliases and partial embedded code support).
  Depending on the current state of PGE, this may or may not help folk
  work on large grammars (perl6, ruby, etc).

  The code generator should probably be written in perl5 for speed.
  Either Regexp::Parser or P6CRE might be used to parse.

  Risks:
    may be obsoleted by PGE - with parsing unstuck, PGE may begin to
      develop rapidly, providing basic regexes (risk high), and parrot
      codegen for pugs may happen rapidly, providing callbacks
      (risk low?high?; alternate mechanism?);
    platform availability of perl5 linkage (???);
    friction in the linkage (???);
    complexity (most of P6CRE development time was spent fighting pugs
      and perl6, rather than the domain, but still, P6CRE is
      unfinished, and the domain has lots of details to look after);
    may be obsoleted by Parsec combinators;
    depends on experimental regexp features which are apparently...
      unevenly supported across perl5 versions.

  Limitations: paper airplane - just an idea.

  A possibility: do a very rough initial version, and *check it in*
  (this being written by the P6CRE author who hasn't yet;).  Then,
  depending on how easy it is, and how much PGE is or isn't a
  bottleneck, we can flesh it out.

* Unnamed - compiling rules into Parsec combinators

  See hw2005.txt.
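
[Illustration - capture repacking]

  The "transliterate, then repack captures" idea described under P6CRE
  and the unnamed perl5-based engine can be sketched as follows.  This
  is only a rough illustration, not P6CRE code (which is not checked
  in).  It is written in Python rather than perl5 or perl6, the rule
  is translated by hand, and the "__" naming convention, the node
  layout (text/from/to/sub) and the names FLAT_RX, repack and match
  are all assumptions made up for this example.

```python
import re

# Hypothetical hand translation of a perl6-style rule
#     rule pair { <key> '=' <value> }
# into a flat perl5-style regex.  The "__" in a group name is an
# invented convention that records which subrule a capture belongs to.
FLAT_RX = re.compile(
    r"(?P<pair>(?P<pair__key>[a-z]+)=(?P<pair__value>[0-9]+))"
)


def repack(m):
    """Rebuild a nested match tree from the flat named captures of m."""
    root = {"text": m.group(0), "from": m.start(), "to": m.end(), "sub": {}}
    # Visit shallow names first so every parent node exists before
    # its children are attached.
    for name in sorted(m.re.groupindex, key=lambda n: n.count("__")):
        if m.group(name) is None:
            continue  # this group did not take part in the match
        parts = name.split("__")
        node = root
        for part in parts[:-1]:
            node = node["sub"][part]
        node["sub"][parts[-1]] = {
            "text": m.group(name),
            "from": m.start(name),
            "to": m.end(name),
            "sub": {},
        }
    return root


def match(text):
    """Search text with the flat regex; return a match tree or None."""
    m = FLAT_RX.search(text)
    return repack(m) if m else None


if __name__ == "__main__":
    tree = match("set foo=42;")
    print(tree["sub"]["pair"]["sub"]["key"]["text"])
```

  The underlying engine only ever sees a flat regex, so it keeps its
  own semantics (no subrule recursion in this sketch), which is the
  same kind of limitation noted for P6CRE above.  The tree shape is
  recovered purely from the capture names after the match.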