=head1 NAME

XML::Compile::Schema - Compile a schema into CODE

=head1 INHERITANCE

 XML::Compile::Schema
   is a XML::Compile

 XML::Compile::Schema is extended by
   XML::Compile::Cache

=head1 SYNOPSIS

 # compile tree yourself
 my $parser = XML::LibXML->new;
 my $tree   = $parser->parse...(...);
 my $schema = XML::Compile::Schema->new($tree);

 # get schema from string
 my $schema = XML::Compile::Schema->new($xml_string);

 # get schema from file
 my $schema = XML::Compile::Schema->new($filename);

 # adding schemas
 $schema->addSchemas($tree);

 # three times the same: well-known url, filename in schemadir, url
 $schema->importDefinitions('http://www.w3.org/2001/XMLSchema');
 $schema->importDefinitions('2001-XMLSchema.xsd');
 $schema->importDefinitions(SCHEMA2001);  # from ::Util

 # alternatively
 my @specs  = ('one.xsd', 'two.xsd', $schema_as_string);
 my $schema = XML::Compile::Schema->new(\@specs);  # ARRAY!

 # see what types are defined
 $schema->printIndex;

 # create and use a reader
 use XML::Compile::Util qw/pack_type/;
 my $elem = pack_type 'my-namespace', 'my-local-name';
                # $elem eq "{my-namespace}my-local-name"
 my $read = $schema->compile(READER => $elem);
 my $data = $read->($xmlnode);
 my $data = $read->("filename.xml");

 # when you do not know the element type beforehand
 use XML::Compile::Util qw/type_of_node/;
 my $elem   = type_of_node $xml->documentElement;
 my $reader = $reader_cache{$elem}               # either exists
          ||= $schema->compile(READER => $elem); #   or create
 my $data   = $reader->($xmlmsg);

 # create and use a writer
 my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
 my $write  = $schema->compile(WRITER => '{myns}mytype');
 my $xml    = $write->($doc, $hash);
 my $result = $doc->setDocumentElement($xml);

 # show result
 print $xml->toString;

 # to create the type nicely
 use XML::Compile::Util qw/pack_type/;
 my $type   = pack_type 'myns', 'mytype';
 print $type;  # shows  {myns}mytype

 # using a compiled routines cache
 use XML::Compile::Cache;   # separate distribution
 my $schema = XML::Compile::Cache->new(...);

 #
 # Error handling tricks with Log::Report
 use Log::Report mode => 'DEBUG';   # enable debugging
 dispatcher SYSLOG => 'syslog';     # errors to syslog as well
 try { $reader->($data) };          # catch errors in $@

=head1 DESCRIPTION

This module collects knowledge about one or more schemas.  The most
important method provided is L<compile()|/"Compilers">, which can create
XML file readers and writers based on the schema information and some
selected element or attribute type.

Various implementations use the translator, and more can be added
later:

=over 4

=item C<< $schema->compile('READER', ...) >> translates XML to HASH

The XML reader produces a HASH from a XML::LibXML::Node tree or an
XML string.  Those represent the input data.  The values are checked.
An error is produced when a value or the data-structure is not according
to the specs.

The CODE reference which is returned can be called with anything
accepted by L<dataToXML()|XML::Compile>.

example: create an XML reader

  my $msgin  = $rules->compile(READER => '{myns}mytype');
  # or  ...  = $rules->compile(READER => pack_type('myns', 'mytype'));

  my $xml    = $parser->parse_file("some-xml.xml");
  my $hash   = $msgin->($xml);

or

  my $hash   = $msgin->('some-xml.xml');
  my $hash   = $msgin->($xml_string);
  my $hash   = $msgin->($xml_node);

=item C<< $schema->compile('WRITER', ...) >> translates HASH to XML

The writer produces schema compliant XML, based on a Perl HASH.  To get
the data encoding correct, you are required to pass a document object
in which the XML nodes may get a place later.

example: create an XML writer

  my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
  my $write  = $schema->compile(WRITER => '{myns}mytype');
  my $xml    = $write->($doc, $hash);
  print $xml->toString;

alternative

  my $write  = $schema->compile(WRITER => 'myns#myid');

=item C<< $schema->template('XML', ...) >> creates an XML example

Based on the schema, this produces an XML message as example.  Schemas
are usually so complex that people lose the overview.
This example may put you back on track, and can be used as a starting
point for creating the XML version of the message.

=item C<< $schema->template('PERL', ...) >> creates a Perl example

Based on the schema, this produces a Perl HASH structure (a bit
like the output by Data::Dumper), which can be used as a template
for creating messages.  The output contains documentation, and is
usually much clearer than the schema itself.

=back

Be warned that the B<schema itself is NOT VALIDATED>; you can develop
schemas which do work well with this module, but are not valid according
to W3C.  In many cases, however, the translator will refuse to accept
mistakes: mainly because it cannot produce valid code.

=head1 METHODS

=head2 Constructors

XML::Compile::Schema-E<gt>B<new>([XMLDATA], OPTIONS)

=over 4

Details about many name-spaces can be organized with only a single
schema object (actually, the data is administered in an internal
L<XML::Compile::Schema::NameSpaces> object).

The initial information is extracted from the XMLDATA source.  The
XMLDATA can be anything which is acceptable to C<importDefinitions()>,
which is everything accepted by L<dataToXML()|XML::Compile> or an ARRAY
of those things.

You can specify the hooks before you define the schemas the hooks
work on: all schema information and all hooks are only used when
the readers and writers get compiled.

 Option              --Defined in     --Default
 hook                                   undef
 hooks                                  []
 ignore_unused_tags
 key_rewrite                            []
 schema_dirs           XML::Compile     undef
 typemap                                {}

. hook => ARRAY-WITH-HOOKDATA | HOOK

=over 4

See L<addHook()|/"Accessors">.  Adds one HOOK (HASH).

=back

. hooks => ARRAY-OF-HOOK

=over 4

See L<addHooks()|/"Accessors">.

=back

. ignore_unused_tags => BOOLEAN|REGEXP

=over 4

(WRITER) Usually, a C<mistake> warning is produced when a user provides
a data structure which contains more data than is needed for the XML
message which is created; this will show structural problems.  However,
in some cases, you may want to play tricks with the data-structure and
therefore disable this precaution.

With a REGEXP, you can have more control.  Only keys which do match the
expression will be ignored silently.
Other keys (usually typos and other mistakes) will get reported.
See L

=back

. key_rewrite => HASH|CODE|ARRAY-of-HASH-and-CODE

=over 4

Translate XML keys into different Perl keys.  See L</Key rewrite>.

=back

. schema_dirs => DIRECTORY|ARRAY-OF-DIRECTORIES

. typemap => HASH

=over 4

HASH of Schema type to Perl object or Perl class.  See L</Typemaps>,
the serialization of objects.

=back

=back

=head2 Accessors

$obj-E<gt>B<addHook>(HOOKDATA|HOOK|undef)

=over 4

HOOKDATA is a LIST of options as key-value pairs, HOOK is a HASH with
the same data.  C<undef> is ignored.  See L<new(hook)|/"Constructors">
and L</Schema hooks> below.

=back

$obj-E<gt>B<addHooks>(HOOK, [HOOK, ...])

=over 4

Add multiple hooks at once.  These must all be HASHes.  See
L<new(hooks)|/"Constructors"> and L<addHook()|/"Accessors">.
C<undef> values are ignored.

=back

$obj-E<gt>B<addKeyRewrite>(CODE|HASH, CODE|HASH, ...)

=over 4

Add new rewrite rules to the existing list (initially provided with
L<new(key_rewrite)|/"Constructors">).  The whole list of rewrite rules
is returned.

The last added set of rewrite rules will be applied first.  See
L</Key rewrite>.

=back

$obj-E<gt>B<addSchemaDirs>(DIRECTORIES|FILENAME)

XML::Compile::Schema-E<gt>B<addSchemaDirs>(DIRECTORIES|FILENAME)

=over 4

See L<XML::Compile>

=back

$obj-E<gt>B<addSchemas>(XML, OPTIONS)

=over 4

Collect all the schemas defined in the XML data.  The XML parameter
must be a XML::LibXML node, therefore it is advised to use
C<importDefinitions()>, which has a much more flexible way to specify
the data.

 Option    --Default
 filename    undef
 source      undef

. filename => FILENAME

=over 4

Explicitly state from which file the data is coming.

=back

. source => STRING

=over 4

An indication where this schema data was found.  If you use
L<dataToXML()|XML::Compile> in LIST context, you get such an indication.

=back

=back

$obj-E<gt>B<addTypemap>(PAIR)

=over 4

Synonym for L<addTypemaps()|/"Accessors">.

=back

$obj-E<gt>B<addTypemaps>(PAIRS)

=over 4

Add new XML-Perl type relations.  See L</Typemaps>.

=back

$obj-E<gt>B<hooks>

=over 4

Returns the LIST of defined hooks (as HASHes).

=back

=head2 Compilers

$obj-E<gt>B<compile>(('READER'|'WRITER'), TYPE, OPTIONS)

=over 4

Translate the specified ELEMENT (found in one of the read schemas) into
a CODE reference which is able to translate between XML-text and a HASH.
When the TYPE is C<undef>, an empty LIST is returned.
The indicated TYPE is the starting-point for processing in the
data-structure, a toplevel element or attribute name.  The name must
be specified in C<{url}name> format, where the url is the name-space.
An alternative is the C<url#id> which refers to an element or type with
the specific C<id> attribute value.

When a READER is created, a CODE reference is returned which needs
to be called with XML, as accepted by L<dataToXML()|XML::Compile>.
Returned is a nested HASH structure which contains the data contained
in the XML.  The transformation rules are explained below.

When a WRITER is created, a CODE reference is returned which needs
to be called with an XML::LibXML::Document object and a HASH, and
returns a XML::LibXML::Node.

Most options below are B<explained in more detail> in the manual-page
L<XML::Compile::Translate>, which implements the compilation.

 Option                          --Default
 any_attribute                     undef
 any_element                       undef
 attributes_qualified
 check_occurs
 check_values
 default_values
 elements_qualified
 hook                              undef
 hooks                             undef
 ignore_facets
 ignore_unused_tags
 include_namespaces
 interpret_nillable_as_optional
 key_rewrite                       []
 mixed_elements                    'ATTRIBUTES'
 namespace_reset
 output_namespaces                 undef
 path
 permit_href
 prefixes                          {}
 sloppy_integers
 typemap                           {}
 use_default_namespace
 validation

. any_attribute => CODE|'TAKE_ALL'|'SKIP_ALL'

=over 4

[0.89] In general, C<anyAttribute> schema components cannot be handled
automatically.  If you need to create or process anyAttribute
information, then read about wildcards in the DETAILS chapter of the
manual-page for the specific back-end.
Before release 0.89 this option was named C<anyAttribute>, which will
still work.

=back

. any_element => CODE|'TAKE_ALL'|'SKIP_ALL'

=over 4

[0.89] In general, C<any> schema components cannot be handled
automatically.  If you need to create or process any information, then
read about wildcards in the DETAILS chapter of the manual-page for the
specific back-end.
Before release 0.89 this option was named C<anyElement>, which will
still work.

=back

. attributes_qualified => BOOLEAN

=over 4

When defined, this will overrule the C<attributeFormDefault> flags in
all schemas.
When not qualified, the XML will neither produce nor process prefixes
on attributes.

=back

. check_occurs => BOOLEAN

=over 4

Whether code will be produced to do bounds checking on elements and
blocks which may appear more than once.  When the schema says that
maxOccurs is 1, then that element becomes optional.  When the schema
says that maxOccurs is larger than 1, then the output is still always
an ARRAY, but now of unrestricted length.

=back

. check_values => BOOLEAN

=over 4

Whether code will be produced to check that the XML fields contain
the expected data format.

Turning this off will improve the processing speed significantly, but is
(of course) much less safe.  Do not turn it off when you expect data
from external sources: validation is a crucial requirement for XML.

=back

. default_values => 'MINIMAL'|'IGNORE'|'EXTEND'

=over 4

How to treat default values as provided by the schema.
With C<IGNORE> (the writer default), you will see exactly what is
specified in the XML or HASH.  With C<EXTEND> (the reader default), the
default and fixed values are shown in the result.  C<MINIMAL> removes
all fields which are the same as the default setting, which simplifies
the result.
See L</Default Values>.

=back

. elements_qualified => C<TOP>|C<ALL>|C<NONE>|BOOLEAN

=over 4

When defined, this will overrule the C<elementFormDefault> flags in
all schemas.  When C<TOP> is specified, at least the top-element will
be name-space qualified.  When C<ALL> or a true value is given, then all
elements will be used qualified.  When C<NONE> or a false value is
given, the XML will neither produce nor process prefixes on the
elements.

The C<form>
attributes will be respected, except on the top element when C<TOP> is
specified.  Use hooks when you need to fix name-space use in more subtle
ways.

=back

. hook => HOOK|ARRAY-OF-HOOKS

=over 4

Define one or more processing hooks.  See L</Schema hooks> below.
These hooks are only active for this compiled entity, whereas
L<addHook()|/"Accessors"> and L<addHooks()|/"Accessors"> can be used to
define hooks which are used for all results of
L<compile()|/"Compilers">.  The hooks specified with the C<hook> or
C<hooks> option are run before the global definitions.

=back

. hooks => HOOK|ARRAY-OF-HOOKS

=over 4

Alternative for option C<hook>.

=back

. ignore_facets => BOOLEAN

=over 4

Facets influence the formatting and range of values.  This does not
come cheap, so can be turned off.  It affects the restrictions set
for a simpleType.  The processing speed will improve, but validation
is a crucial requirement for XML: please do not turn this off when the
data comes from external sources.

=back

. ignore_unused_tags => BOOLEAN|REGEXP

=over 4

Overrules what is set with L<new(ignore_unused_tags)|/"Constructors">.

=back

. include_namespaces => BOOLEAN

=over 4

Indicates whether the WRITER should include the prefix to namespace
translation on the top-level element of the returned tree.  If not,
you may continue with the same name-space table to combine various
XML components into one, and add the namespaces later.

=back

. interpret_nillable_as_optional => BOOLEAN

=over 4

Found in the schema wild-life: people who think that nillable means
optional.  Not too hard to fix.  For the WRITER, you still have to
state NIL explicitly, but the elements are not constructed.  The READER
will output NIL when the nillable elements are missing.

=back

. key_rewrite => HASH|CODE|ARRAY-of-HASH-and-CODE

=over 4

Add key rewrite rules to the front of the list of rules, as set by
L<new(key_rewrite)|/"Constructors"> and
L<addKeyRewrite()|/"Accessors">.  See L</Key rewrite>

=back

. mixed_elements => CODE|PREDEFINED

=over 4

What to do when mixed schema elements are to be processed.  Read
more in the L</DETAILS> section below.

=back

.
namespace_reset => BOOLEAN

=over 4

Use the same prefixes in C<prefixes> as with some other compiled piece,
but reset the counts to zero first.

=back

. output_namespaces => HASH|ARRAY-of-PAIRS

=over 4

Pre release 0.87 name for the C<prefixes> option.

=back

. path => STRING

=over 4

Prepended to each error report, to indicate the location of the
error in the XML-Schema tree.

=back

. permit_href => BOOLEAN

=over 4

When parsing SOAP-RPC encoded messages, the elements may have a C<href>
attribute, pointing to an object with C<id>.  The READER will return the
unparsed, unresolved node when the attribute is detected, and the
SOAP-RPC decoder will have to discover and resolve it.

=back

. prefixes => HASH|ARRAY-of-PAIRS

=over 4

Can be used to pre-define prefixes for namespaces (for 'WRITER' or
key rewrite), for instance to reserve common abbreviations like C<soap>
for external use.  Each entry in the hash has as key the namespace uri.
The value is a hash which contains C<uri>, C<prefix>, and C<used>
fields.  Pass a reference to a private hash to catch this index.
An ARRAY with prefix, uri PAIRS is simpler.

  prefixes => [ mine => $myns, two => $twons ]
  prefixes => { $myns => 'mine', $twons => 'two' }

  # the previous is short for:
  prefixes => { $myns => { uri => $myns, prefix => 'mine', used => 0 }
              , $twons => { uri => $twons, prefix => 'two', ...} };

=back

. sloppy_integers => BOOLEAN

=over 4

The C<decimal> and C<integer> types must support at least 18 digits,
which is larger than Perl's 32 bit internal integers.  Therefore, the
implementation will use Math::BigInt objects to handle them.  However,
often a simple C<int> type would have sufficed, but the XML designer
was lazy.  A long is much faster to handle.  Set this flag to use C<int>
as fast (but imprecise) replacements.

Be aware that C<Math::BigInt> and C<Math::BigFloat> objects are nearly
but not fully transparent in mimicking the behavior of Perl's ints and
floats.  See their respective manual-pages.
Especially when you wish for some performance, you should optimize
access to these objects to avoid expensive copying, which is exactly
the spot where the differences are.

You can also improve the speed of Math::BigInt by installing
Math::BigInt::GMP.  Add C<< use Math::BigInt try => 'GMP'; >> to the
top of your main script to get more performance.

=back

. typemap => HASH

=over 4

Add this typemap to the relations defined by
L<new(typemap)|/"Constructors"> or L<addTypemaps()|/"Accessors">

=back

. use_default_namespace => BOOLEAN

=over 4

[0.91] When mixing qualified and unqualified namespaces, the use of a
default namespace (a name-space without prefix) can be quite confusing.
Therefore, by default, all qualified elements will have an explicit
prefix.

=back

. validation => BOOLEAN

=over 4

XML messages must be validated, to lower the chance of abuse.  However,
of course, it costs performance which is only partially compensated by
fewer checks in your code.  This flag overrules the C<check_values>,
C<check_occurs>, and C<ignore_facets>.

=back

=back

$obj-E<gt>B<dataToXML>(NODE|REF-XML-STRING|XML-STRING|FILENAME|FILEHANDLE|KNOWN)

=over 4

See L<XML::Compile>

=back

$obj-EB