=head1 NAME XML::Compile::Translate::Reader - translate XML to HASH =head1 INHERITANCE XML::Compile::Translate::Reader is a XML::Compile::Translate =head1 SYNOPSIS my $schema = XML::Compile::Schema->new(...); my $code = $schema->compile(READER => ...); =head1 DESCRIPTION The translator understands schemas, but does not encode that into actions. This module implements those actions to translate from XML into a (nested) Perl HASH structure. =head2 Unsupported features =head1 METHODS =head2 Constructors $obj-EB(TRANSLATOR, OPTIONS) =over 4 See L =back $obj-EB(NAME) XML::Compile::Translate::Reader-EB(NAME) =over 4 See L =back =head2 Attributes =head2 Handlers XML::Compile::Translate::Reader-EB(ELEMENT|ATTRIBUTE|TYPE, OPTIONS) =over 4 See L =back =head1 DETAILS =head2 Translator options =head2 Processing Wildcards If you want to collect information from the XML structure, which is permitted by C and C specifications in the schema, you have to implement that yourself. The problem is C has less knowledge than you about the possible data. =head3 option any_attribute By default, the C specification is ignored. When C is given, all attributes which are fulfilling the name-space requirement added to the returned data-structure. As key, the absolute element name will be used, with as value the related unparsed XML element. In the current implementation, if an explicit attribute is also covered by the name-spaces permitted by the anyAttribute definition, then it will also appear in that list (and hence the handler will be called as well). Use L to write your own handler, to influence the behavior. The handler will be called for each attribute, and you must return list of pairs of derived information. When the returned is empty, the attribute data is lost. The value may be a complex structure. example: anyAttribute in a READER Say your schema looks like this: Then, in an application, you write: my $r = $schema->compile ( READER => pack_type('http://mine', 'el') , anyAttribute => 'ALL' ); # or lazy: READER => '{http://mine}el' my $h = $r->( <<'__XML' ); 42 everything __XML use Data::Dumper 'Dumper'; print Dumper $h; __XML__ The output is something like $VAR1 = { a => 42 , '{http://mine}a' => ... # XML::LibXML::Node with 42 , '{http://mine}b' => ... # XML::LibXML::Node with everything }; You can improve the reader with a callback. When you know that the extra attribute is always of type C, then you can do my $read = $schema->compile ( READER => '{http://mine}el' , anyAttribute => \&filter ); my $anyAttRead = $schema->compile ( READER => '{http://mine}non-empty' ); sub filter($$$$) { my ($fqn, $xml, $path, $translator) = @_; return () if $fqn ne '{http://mine}b'; (b => $anyAttRead->($xml)); } my $h = $r->( see above ); print Dumper $h; Which will result in $VAR1 = { a => 42 , b => 'everything' }; The filter will be called twice, but return nothing in the first case. You can implement any kind of complex processing in the filter. =head3 option any_element By default, the C definition in a schema will ignore all elements from the container which are not used. Also in this case C is required to produce C results. C will ignore all results, although this are being processed for validation needs. The C and C of C are ignored: the amount of elements is always unbounded. Therefore, you will get an array of elements back per type. =head2 Mixed elements [available since 0.86] ComplexType and ComplexContent components can be declared with the C<> attribute. This implies that text is not limited to the content of containers, but may also be used inbetween elements. Usually, you will only find ignorable white-space between elements. In this example, the C container is marked to be mixed: before 2 after Often the "mixed" option is bending one of both ways: either the element is needed as text, or the element should be parsed and the text ignored. The reader has various options to avoid the need of processing raw XML::LibXML nodes. With L set to =over 4 =item ATTRIBUTES (the default) a HASH is returned, the attributes are processed. The node is found as XML::LibXML::Element with the key '_'. Above example will produce $r = { id => 5, _ => $xmlnode }; =item TEXTUAL Like the previous, but now the textual representation of the content is returned with key '_'. Above example will produce $r = { id => 5, _ => ' before 2 after '}; =item STRUCTURAL will remove all mixed-in text, and treat the element as normal element. The example will be transformed into $r = { id => 5, b => 2 }; =item XML_NODE return the XML::LibXML::Node itself. The example: $r = $xmlnode; =item XML_STRING return the mixed node as XML string, just as in the source. Be warned that it is rather expensive: the string was parsed and then stringified again, which is costly for large nodes. Result: $r = ' before 2 after '; =item CODE reference the reference is called with the XML::LibXML::Node as first argument. When a value is returned (even undef), then the right tag with the value will be included in the translators result. When an empty list is returned by the code reference, then nothing is returned (which may result in an error if the element is required according to the schema) =back When some of your mixed elements need different behavior from other elements, then you have to go play with the normal hooks in specific cases. =head2 Schema hooks =head3 hooks executed before the XML is being processed The C hooks receives an XML::LibXML::Node object and the path string. It must return a new (or same) XML node which will be used from then on. You probably can best modify a node clone, not the original as provided by the user. When C is returned, the whole node will disappear. This hook offers a predefined C. example: to trace the paths $schema->addHook(path => qr/./, before => 'PRINT_PATH'); =head3 hooks executed as replacement Your C hook should return a list of key-value pairs. To produce it, it will get the XML::LibXML::Element, the translator settings as HASH, the path, and the localname. This hook has a predefined C, which will not process the found element, but simply return the string "SKIPPED" as value. This way, a whole tree of unneeded translations can be avoided. Sometimes, the Schema spec is such a mess, that XML::Compile cannot automatically translate it. I have seen cases where confusion over name-spaces is created: a choice between three elements with the same name but different types. Well, in such case you may use L to translate a part of your tree. Simply use XML::LibXML::Simple qw/XMLin/; $schema->addHook ( type => ...bad-type-definition... , replace => sub { my ($xml, $args, $path, $type) = @_; ($type => XMLin($xml, ...)); } ); =head3 hooks for post-processing, after the data is collected The data is collect, and passed as second argument after the XML node. The third argument is the path. Be careful that the collected data might be a SCALAR (for simpleType). Return a HASH or a SCALAR. C may work, unless it is the value of a required element you throw awy. This hook also offers a predefined C. Besides, it has C, C, and C, which will result in additional fields in the HASH, respectively containing the CODE which was processed, the element names, and the attribute names. The keys start with an underscore C<_>. =head2 Typemaps In a typemap, a relation between an XML element type and a Perl class (or object) is made. Each translator back-end will implement this a little differently. This section is about how the reader handles typemaps. =head3 Typemap to Class Usually, an XML type will be mapped on a Perl class. The Perl class implements the C method as constructor. $schema->typemap($sometype => 'My::Perl::Class'); package My::Perl::Class; ... sub fromXML { my ($class, $data, $xmltype) = @_; my $self = $class->new($data); ... $self; } Your method returns the data which will be included in the result tree of the reader. You may return an object, the unmodified C<$data>, or C. When C is returned, this may fail the schema parser when the data element is required. In the simpelest implementation, the class stores its data exactly as the XML structure: package My::Perl::Class; sub fromXML { my ($class, $data, $xmltype) = @_; bless $data, $class; } # The same, even shorter: sub fromXML { bless $_[1], $_[0] } =head3 Typemap to Object An other option is to implement an object factory: one object which creates other objects. In this case, the C<$xmltype> parameter can come of use, to have one object spawning many different other objects. my $object = My::Perl::Class->new(...); $schema->typemap($sometype => $object); package My::Perl::Class; sub fromXML { my ($object, $xmltype, $data) = @_; return Some::Other::Class->new($data); } This object factory may be a very simple solution when you map XML onto objects which are not under your control; where there is not way to add the C method. =head3 Typemap to CODE The light version of an object factory works with CODE references. $schema->typemap($t1 => \&myhandler); sub myhandler { my ($backend, $data, $type) = @_; return My::Perl::Class->new($data) if $backend eq 'READER'; $data; } # shorter $schema->typemap($t1 => sub {My::Perl::Class->new($_[1])} ); =head3 Typemap implementation Internally, the typemap is simply translated into an "after" hook for the specific type. After the data was processed via the usual mechanism, the hook will call method C on the class or object you specified with the data which was read. You may still use "before" and "replace" hooks, if you need them. Syntactic sugar: $schema->typemap($t1 => 'My::Package'); $schema->typemap($t2 => $object); is comparible to $schema->typemap($t1 => sub {My::Package->fromXML(@_)}); $schema->typemap($t2 => sub {$object->fromXML(@_)} ); with some extra checks. =head1 SEE ALSO This module is part of XML-Compile distribution version 0.94, built on August 26, 2008. Website: F All modules in this suite: L, L, L, L, L, L, L. Please post questions or ideas to the mailinglist at F For life contact with other developers, visit the C<#xml-compile> channel on C. =head1 LICENSE Copyrights 2006-2008 by Mark Overmeer. For other contributors see ChangeLog. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See F