package re::engine::Lua;

use 5.010000;
use XSLoader ();

# All engines should subclass the core Regexp package
our @ISA = 'Regexp';

BEGIN {
    $VERSION = '0.03';
    XSLoader::load __PACKAGE__, $VERSION;
}

# Install this engine for the enclosing lexical scope.
sub import {
    $^H{regcomp} = ENGINE;
}

# Uninstall it, but only if this engine is the one currently installed.
sub unimport {
    delete $^H{regcomp}
        if $^H{regcomp} == ENGINE;
}

1;

__END__

=head1 NAME

re::engine::Lua - Lua regular expression engine

=head1 SYNOPSIS

    use re::engine::Lua;

    if ('Hello, world' =~ /Hello, (world)/) {
        print "Greetings, $1!";
    }

=head1 DESCRIPTION

Replaces perl's regex engine in a given lexical scope with the Lua 5.1 one.

See "Lua 5.1 Reference Manual", section 5.4.1 "Patterns", L.

=head2 Character Class:

A I<character class> is used to represent a set of characters.
The following combinations are allowed in describing a character class:

=over 4

=item B<I<x>> (where I<x> is not one of the I<magic characters> C<^$()%.[]*+-?>) represents the character I<x> itself.

=item B<.> (a dot) represents all characters.

=item B<%a> represents all letters.

=item B<%c> represents all control characters.

=item B<%d> represents all digits.

=item B<%l> represents all lowercase letters.

=item B<%p> represents all punctuation characters.

=item B<%s> represents all space characters.

=item B<%u> represents all uppercase letters.

=item B<%w> represents all alphanumeric characters.

=item B<%x> represents all hexadecimal digits.

=item B<%z> represents the character with representation 0.

=item B<%I<x>> (where I<x> is any non-alphanumeric character) represents the character I<x>.
This is the standard way to escape the magic characters.
Any punctuation character (even a non-magic one) can be preceded by a C<'%'>
when used to represent itself in a pattern.

=item B<[set]> represents the class which is the union of all characters in I<set>.
A range of characters may be specified by separating the end characters
of the range with a C<'-'>.
All classes C<%x> described above may also be used as components in I<set>.
All other characters in I<set> represent themselves.
For example, C<[%w_]> (or C<[_%w]>) represents all alphanumeric characters
plus the underscore, C<[0-7]> represents the octal digits, and C<[0-7%l%-]>
represents the octal digits plus the lowercase letters plus the C<'-'>
character.

The interaction between ranges and classes is not defined.
Therefore, patterns like C<[%a-z]> or C<[a-%%]> have no meaning.

=item B<[^set]> represents the complement of I<set>, where I<set> is interpreted as above.

=back

For all classes represented by single letters (C<%a>, C<%c>, etc.),
the corresponding uppercase letter represents the complement of the class.
For instance, C<%S> represents all non-space characters.

The definitions of letter, space, and other character groups depend on
the current locale. In particular, the class C<[a-z]> may not be equivalent
to C<%l>.

=head2 Pattern Item:

A I<pattern item> may be

=over 4

=item * a single character class, which matches any single character in the class;

=item * a single character class followed by C<'*'>, which matches 0 or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;

=item * a single character class followed by C<'+'>, which matches 1 or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;

=item * a single character class followed by C<'-'>, which also matches 0 or more repetitions of characters in the class.
Unlike C<'*'>, these repetition items will always match the I<shortest>
possible sequence;

=item * a single character class followed by C<'?'>, which matches 0 or 1 occurrence of a character in the class;

=item * C<%n>, for I<n> between 1 and 9; such an item matches a substring equal to the I<n>-th captured string (see below);

=item * C<%bxy>, where I<x> and I<y> are two distinct characters; such an item matches strings that start with I<x>, end with I<y>, and where the I<x> and I<y> are I<balanced>.
This means that, if one reads the string from left to right, counting I<+1>
for an I<x> and I<-1> for a I<y>, the ending I<y> is the first I<y> where
the count reaches 0.
For instance, the item C<%b()> matches expressions with balanced parentheses.

=back

=head2 Pattern:

A I<pattern> is a sequence of pattern items.
A C<'^'> at the beginning of a pattern anchors the match at the beginning
of the subject string.
A C<'$'> at the end of a pattern anchors the match at the end of the subject
string.
At other positions, C<'^'> and C<'$'> have no special meaning and represent
themselves.

=head2 Captures:

A pattern may contain sub-patterns enclosed in parentheses;
they describe I<captures>.
When a match succeeds, the substrings of the subject string that match
captures are stored (I<captured>) for future use.
Captures are numbered according to their left parentheses.
For instance, in the pattern C<"(a*(.)%w(%s*))">, the part of the string
matching C<"a*(.)%w(%s*)"> is stored as the first capture (and therefore has
number 1); the character matching C<"."> is captured with number 2, and the
part matching C<"%s*"> has number 3.

As a special case, the empty capture C<()> captures the current string
position (a number). For instance, if we apply the pattern C<"()aa()"> on
the string C<"flaaap">, there will be two captures: 3 and 5.
This special case is NOT SUPPORTED by re::engine::Lua.

A pattern cannot contain embedded zeros. Use C<%z> instead.

=head1 AUTHORS

FranE<ccedil>ois PERRAD

=head1 HOMEPAGE

The development is hosted at L.

=head1 COPYRIGHT

Copyright 2007-2008 FranE<ccedil>ois PERRAD.

This program is free software; you can redistribute it and/or modify it
under the same terms as Lua.

The code fragment from original Lua 5.1.3 is under an MIT license.
See the F file for details.

=cut
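
=head1 EXAMPLES

The pattern items described in L</DESCRIPTION> can be combined as in the
following minimal sketch. It is illustrative only: the subject strings are
invented for this example, and it assumes that the module is installed and
that captures populate C<$1>, C<$2>, ... in the same way as in the
L</SYNOPSIS>. The comments restate the Lua 5.1 pattern semantics described
above; the snippet has not been verified against this release of the module.

    use strict;
    use warnings;
    use re::engine::Lua;

    # Character classes and captures: %w+ is a run of alphanumeric
    # characters, %s* is optional whitespace, %d+ is a run of digits.
    if ('answer = 42' =~ /(%w+)%s*=%s*(%d+)/) {
        print "key=$1 value=$2\n";
    }

    # '-' is the shortest-match counterpart of '*'.
    if ('<b>bold</b>' =~ /<(.-)>/) {
        print "first tag: $1\n";
    }

    # %b() matches a span with balanced parentheses.
    if ('call(a, (b), c) done' =~ /(%b())/) {
        print "balanced: $1\n";
    }

    # %1 matches a copy of the first capture, here the opening quote.
    if (q{say "hi" now} =~ /(["'])(.-)%1/) {
        print "quoted: $2\n";
    }

Because the engine is installed through the lexical hints hash, C<no
re::engine::Lua;> in an inner block should restore the previous engine for
that block only.

=cut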