=head1 NAME

XML::Compile::Schema - Compile a schema into CODE

=head1 INHERITANCE

 XML::Compile::Schema
   is a XML::Compile

 XML::Compile::Schema is extended by
   XML::Compile::Cache

=head1 SYNOPSIS

 # compile tree yourself
 my $parser = XML::LibXML->new;
 my $tree   = $parser->parse...(...);
 my $schema = XML::Compile::Schema->new($tree);

 # get schema from string
 my $schema = XML::Compile::Schema->new($xml_string);

 # get schema from file
 my $schema = XML::Compile::Schema->new($filename);

 # adding schemas
 $schema->addSchemas($tree);

 # three times the same: well-known url, filename in schemadir, url
 $schema->importDefinitions('http://www.w3.org/2001/XMLSchema');
 $schema->importDefinitions('2001-XMLSchema.xsd');
 $schema->importDefinitions(SCHEMA2001);  # from ::Util

 # alternatively
 my @specs  = ('one.xsd', 'two.xsd', $schema_as_string);
 my $schema = XML::Compile::Schema->new(\@specs);  # ARRAY!

 # see what types are defined
 $schema->printIndex;

 # create and use a reader
 use XML::Compile::Util qw/pack_type/;
 my $elem   = pack_type 'my-namespace', 'my-local-name';
                # $elem eq "{my-namespace}my-local-name"
 my $read   = $schema->compile(READER => $elem);
 my $data   = $read->($xmlnode);
 my $data   = $read->("filename.xml");

 # when you do not know the element type beforehand
 use XML::Compile::Util qw/type_of_node/;
 my $elem   = type_of_node $xml->documentElement;
 my $reader = $reader_cache{$elem}               # either exists
          ||= $schema->compile(READER => $elem); # or create
 my $data   = $reader->($xmlmsg);

 # create and use a writer
 my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
 my $write  = $schema->compile(WRITER => '{myns}mytype');
 my $xml    = $write->($doc, $hash);
 my $result = $doc->setDocumentElement($xml);

 # show result
 print $xml->toString;

 # to create the type nicely
 use XML::Compile::Util qw/pack_type/;
 my $type   = pack_type 'myns', 'mytype';
 print $type;  # shows  {myns}mytype

 # using a compiled routines cache
 use XML::Compile::Cache;   # separate distribution
 my $schema = XML::Compile::Cache->new(...);

 # Error handling tricks with Log::Report
 use Log::Report mode => 'DEBUG';  # enable debugging
 dispatcher SYSLOG => 'syslog';    # errors to syslog as well
 try { $reader->($data) };         # catch errors in $@

=head1 DESCRIPTION

This module collects knowledge about one or more schemas.  The most
important method provided is L<compile()|/"Compilers">, which can create
XML file readers and writers based on the schema information and some
selected element or attribute type.

Various implementations use the translator, and more can be added
later:

=over 4

=item C<< $schema->compile('READER'...) >> translates XML to HASH

The XML reader produces a HASH from a XML::LibXML::Node tree or an
XML string.  Those represent the input data.  The values are checked.
An error is produced when a value or the data-structure does not
conform to the specs.

The CODE reference which is returned can be called with anything
accepted by L<dataToXML()|XML::Compile/"Compilers">.

example: create an XML reader

 my $msgin  = $rules->compile(READER => '{myns}mytype');
 # or  ...  = $rules->compile(READER => pack_type('myns', 'mytype'));

 my $xml    = $parser->parse_file("some-xml.xml");
 my $hash   = $msgin->($xml);

or

 my $hash   = $msgin->('some-xml.xml');
 my $hash   = $msgin->($xml_string);
 my $hash   = $msgin->($xml_node);

=item C<< $schema->compile('WRITER', ...) >> translates HASH to XML

The writer produces schema compliant XML, based on a Perl HASH.  To get
the data encoding right, you are required to pass a document object
in which the XML nodes may get a place later.

example: create an XML writer

 my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
 my $write  = $schema->compile(WRITER => '{myns}mytype');
 my $xml    = $write->($doc, $hash);
 print $xml->toString;

alternative

 my $write  = $schema->compile(WRITER => 'myns#myid');

=item C<< $schema->template('XML', ...) >> creates an XML example

Based on the schema, this produces an XML message as example.  Schemas
are usually so complex that people lose overview.
This example may put you back on track, and can be used as a starting
point for creating the XML version of the message.

=item C<< $schema->template('PERL', ...) >> creates a Perl example

Based on the schema, this produces a Perl HASH structure (a bit
like the output of Data::Dumper), which can be used as a template for
creating messages.  The output contains documentation, and is usually
much clearer than the schema itself.

=back

Be warned that the B<schema itself is NOT VALIDATED>; you can develop
schemas which do work well with this module, but are not valid according
to W3C.  In many cases, however, the translator will refuse to accept
mistakes: mainly because it cannot produce valid code.

=head1 METHODS

=head2 Constructors

XML::Compile::Schema-E<gt>B<new>([XMLDATA], OPTIONS)

=over 4

Details about many name-spaces can be organized with only a single
schema object (actually, the data is administered in an internal
L<XML::Compile::Schema::NameSpaces> object).

The initial information is extracted from the XMLDATA source.  The
XMLDATA can be anything accepted by
L<importDefinitions()|/"Administration">, which is everything accepted by
L<dataToXML()|XML::Compile/"Compilers"> or an ARRAY of those things.
You may also add any OPTION accepted by L<addSchemas()|/"Accessors"> to
guide the understanding of the schema.  When no XMLDATA is provided, you
can add it later with L<importDefinitions()|/"Administration">.

You can specify the hooks before you define the schemas the hooks
work on: all schema information and all hooks are only used when
the readers and writers get compiled.

 Option              --Defined in     --Default
 block_namespace                        []
 hook                                   undef
 hooks                                  []
 ignore_unused_tags
 key_rewrite                            []
 schema_dirs           XML::Compile     undef
 typemap                                {}

. block_namespace => NAMESPACE|TYPE|HASH|CODE|ARRAY

=over 4

See L<blockNamespace()|/"Accessors">

=back

. hook => ARRAY-WITH-HOOKDATA | HOOK

=over 4

See L<addHook()|/"Accessors">.  Adds one HOOK (HASH).

=back

. hooks => ARRAY-OF-HOOK

=over 4

See L<addHooks()|/"Accessors">.

=back

. ignore_unused_tags => BOOLEAN|REGEXP

=over 4

(WRITER) Usually, a C<unused tags> warning is produced when a user
provides a data structure which contains more data than is needed for
the XML message which is created; this will show structural problems.
However, in some cases, you may want to play tricks with the
data-structure and therefore disable this precaution.

With a REGEXP, you can have more control.  Only keys which match
the expression will be ignored silently.  Other keys (usually typos
and other mistakes) will get reported.  See L

=back

. key_rewrite => HASH|CODE|ARRAY-of-HASH-and-CODE

=over 4

Translate XML element local-names into different Perl keys.
See L</"Key rewrite">.

=back

. schema_dirs => DIRECTORY|ARRAY-OF-DIRECTORIES

. typemap => HASH

=over 4

HASH of Schema type to Perl object or Perl class.  See L</"Typemaps">,
the serialization of objects.

=back

=back

=head2 Accessors

$obj-E<gt>B<addHook>(HOOKDATA|HOOK|undef)

=over 4

HOOKDATA is a LIST of options as key-value pairs, HOOK is a HASH with
the same data.  C<undef> is ignored.  See L<addHooks()|/"Accessors"> and
L</"Schema hooks"> below.

=back

$obj-E<gt>B<addHooks>(HOOK, [HOOK, ...])

=over 4

Add multiple hooks at once.  These must all be HASHes.  See
L</"Schema hooks"> and L<addHook()|/"Accessors">.  C<undef> values are
ignored.

=back

$obj-E<gt>B<addKeyRewrite>(PREDEF|CODE|HASH, ...)

=over 4

Add new rewrite rules to the existing list (initially provided with
L<new(key_rewrite)|/"Constructors">).  The whole list of rewrite rules
is returned.

C<PREFIXED> rules will be applied first.  Special care is taken that the
prefix will not be called twice.  The last added set of rewrite rules
will be applied first.  See L</"Key rewrite">.

=back

$obj-E<gt>B<addSchemaDirs>(DIRECTORIES|FILENAME)

XML::Compile::Schema-E<gt>B<addSchemaDirs>(DIRECTORIES|FILENAME)

=over 4

See L<XML::Compile/"Accessors">

=back

$obj-E<gt>B<addSchemas>(XML, OPTIONS)

=over 4

Collect all the schemas defined in the XML data.  The XML parameter
must be a XML::LibXML node, therefore it is advised to use
L<importDefinitions()|/"Administration">, which has a much more flexible
way to specify the data.

 Option                 --Default
 attribute_form_default
 element_form_default
 filename                 undef
 source                   undef
 target_namespace

. attribute_form_default => 'qualified'|'unqualified'

. element_form_default => 'qualified'|'unqualified'

=over 4

Overrule the default as found in the schema.  Many old schemas (like
WSDL11 and SOAP11) do not specify the correct default element form in
the schema but only in the text.

=back

.
filename => FILENAME

=over 4

Explicitly state from which file the data is coming.

=back

. source => STRING

=over 4

An indication where this schema data was found.  If you use
L<dataToXML()|XML::Compile/"Compilers"> in LIST context, you get such
an indication.

=back

. target_namespace => NAMESPACE

=over 4

Overrule (or set) the target namespace in the schema.

=back

=back

$obj-E<gt>B<addTypemap>(PAIR)

=over 4

Synonym for L<addTypemaps()|/"Accessors">.

=back

$obj-E<gt>B<addTypemaps>(PAIRS)

=over 4

Add new XML-Perl type relations.  See L</"Typemaps">.

=back

$obj-E<gt>B<blockNamespace>(NAMESPACE|TYPE|HASH|CODE|ARRAY)

=over 4

Block all references to a NAMESPACE or full TYPE, as if they do not
appear in the schema.  This is especially useful if the schema includes
references to old (deprecated) versions of itself which are not being
used.  It can also be used to block inclusion of huge structures which
are not used, for increased compile performance, or to avoid buggy
constructs.

These values can also be passed with
L<new(block_namespace)|/"Constructors"> and
L<compile(block_namespace)|/"Compilers">.

=back

$obj-E<gt>B<hooks>

=over 4

Returns the LIST of defined hooks (as HASHes).

=back

$obj-E<gt>B<useSchema>(SCHEMA, [SCHEMA])

=over 4

Pass a L<XML::Compile::Schema> object, or extensions like
L<XML::Compile::Cache>, to be used as definitions as well.  First,
elements are looked up in the current schema definition object.  If not
found, the other provided SCHEMA objects are checked in the order in
which they were added.

Searches for definitions do not recurse into schemas which are used
by the used schema.

example: use other Schema

 my $wsdl = XML::Compile::WSDL->new($wsdl);
 my $geo  = Geo::GML->new(version => '3.2.1');
 # both $wsdl and $geo extend XML::Compile::Schema
 $wsdl->useSchema($geo);

=back

=head2 Compilers

$obj-E<gt>B<compile>(('READER'|'WRITER'), TYPE, OPTIONS)

=over 4

Translate the specified ELEMENT (found in one of the read schemas) into
a CODE reference which is able to translate between XML-text and a HASH.
When the TYPE is C<undef>, an empty LIST is returned.

The indicated TYPE is the starting-point for processing in the
data-structure, a toplevel element or attribute name.  The name must
be specified in C<{url}name> format, where the url is the name-space.
An alternative is the C<url#id> which refers to an element or type with
the specific C<id> attribute value.

When a READER is created, a CODE reference is returned which needs
to be called with XML, as accepted by
L<dataToXML()|XML::Compile/"Compilers">.  Returned is a nested HASH
structure which contains the data contained in the XML.  The
transformation rules are explained below.

When a WRITER is created, a CODE reference is returned which needs
to be called with an XML::LibXML::Document object and a HASH, and
returns a XML::LibXML::Node.

Many OPTIONS below are B<explained in more detail> in the manual-page
L<XML::Compile::Translate>, which implements the compilation.

 Option                         --Default
 abstract_types                   'ERROR'
 any_attribute                    undef
 any_element                      undef
 any_type
 attributes_qualified
 block_namespace                  []
 check_occurs
 check_values
 default_values
 elements_qualified
 hook                             undef
 hooks                            undef
 ignore_facets
 ignore_unused_tags
 include_namespaces
 interpret_nillable_as_optional
 key_rewrite                      []
 mixed_elements                   'ATTRIBUTES'
 namespace_reset
 output_namespaces                undef
 path
 permit_href
 prefixes                         {}
 sloppy_floats
 sloppy_integers
 typemap                          {}
 use_default_namespace
 validation

. abstract_types => 'ERROR'|'IGNORE'|'ACCEPT'

=over 4

How to handle the use of abstract types.  Of course, they should not be
used, but sometimes they accidentally are.  When set to C<ERROR>, an
error will be produced whenever an abstract type is encountered.
C<IGNORE> will ignore the existence of abstract types.  C<ACCEPT> will
ignore the fact that the types are abstract, and treat them as
non-abstract types.

=back

. any_attribute => CODE|'TAKE_ALL'|'SKIP_ALL'

=over 4

[0.89] In general, C<anyAttribute> schema components cannot be handled
automatically.  If you need to create or process anyAttribute
information, then read about wildcards in the DETAILS chapter of the
manual-page for the specific back-end.
[pre-0.89] this option was named C<anyAttribute>, which will still work.

=back

. any_element => CODE|'TAKE_ALL'|'SKIP_ALL'

=over 4

[0.89] In general, C<any> schema components cannot be handled
automatically.
If you need to create or process any information, then read about
wildcards in the DETAILS chapter of the manual-page for the specific
back-end.
[pre-0.89] this option was named C<anyElement>, which will still work.

=back

. any_type => CODE

=over 4

[1.07] How to handle "anyType" type elements.  Depends on the backend.

=back

. attributes_qualified => BOOLEAN

=over 4

When defined, this will overrule the C<attributeFormDefault> flags in
all schemas.  When not qualified, the XML will neither produce nor
process prefixes on attributes.

=back

. block_namespace => NAMESPACE|TYPE|HASH|CODE|ARRAY

=over 4

See L<blockNamespace()|/"Accessors">.

=back

. check_occurs => BOOLEAN

=over 4

Whether code will be produced to do bounds checking on elements and
blocks which may appear more than once.  When the schema says that
maxOccurs is 1, then that element becomes optional.  When the schema
says that maxOccurs is larger than 1, then the output is still always
an ARRAY, but now of unrestricted length.

=back

. check_values => BOOLEAN

=over 4

Whether code will be produced to check that the XML fields contain
the expected data format.

Turning this off will improve the processing speed significantly, but is
(of course) much less safe.  Do not turn it off when you expect data
from external sources: validation is a crucial requirement for XML.

=back

. default_values => 'MINIMAL'|'IGNORE'|'EXTEND'

=over 4

How to treat default values as provided by the schema.  With C<IGNORE>
(the writer default), you will see exactly what is specified in the XML
or HASH.  C<EXTEND> (the reader default) will show the default and
fixed values in the result.  C<MINIMAL> removes all fields which are
the same as the default setting, which simplifies the result.
See L.

=back

. elements_qualified => C<TOP>|C<ALL>|C<NONE>|BOOLEAN

=over 4

When defined, this will overrule the namespace use on elements in
all schemas.  When C<TOP> is specified, at least the top-element will
be name-space qualified.  When C<ALL> or a true value is given, then
all elements will be used qualified.
When C<NONE> or a false value is given, the XML will not produce or
process prefixes on the elements.

The C<form> attributes will be respected, except on the top element when
C<TOP> is specified.  Use hooks when you need to fix name-space use in
more subtle ways.  With C<element_form_default>, you can correct the
name-space behavior of whole schemas.

=back

. hook => HOOK|ARRAY-OF-HOOKS

=over 4

Define one or more processing hooks.  See L</"Schema hooks"> below.
These hooks are only active for this compiled entity, where
L<addHook()|/"Accessors"> and L<addHooks()|/"Accessors"> can be used to
define hooks which are used for all results of
L<compile()|/"Compilers">.  The hooks specified with the C<hook> or
C<hooks> option are run before the global definitions.

=back

. hooks => HOOK|ARRAY-OF-HOOKS

=over 4

Alternative for option C<hook>.

=back

. ignore_facets => BOOLEAN

=over 4

Facets influence the formatting and range of values.  This does not
come cheap, so it can be turned off.  It affects the restrictions set
for a simpleType.  The processing speed will improve, but validation is
a crucial requirement for XML: please do not turn this off when the
data comes from external sources.

=back

. ignore_unused_tags => BOOLEAN|REGEXP

=over 4

Overrules what is set with L<new(ignore_unused_tags)|/"Constructors">.

=back

. include_namespaces => BOOLEAN

=over 4

Indicates whether the WRITER should include the prefix to namespace
translation on the top-level element of the returned tree.  If not,
you may continue with the same name-space table to combine various
XML components into one, and add the namespaces later.  No namespace
definition can be added when the production rule produces an attribute.

=back

. interpret_nillable_as_optional => BOOLEAN

=over 4

Found in the schema wild-life: people who think that nillable means
optional.  Not too hard to fix.  For the WRITER, you still have to state
NIL explicitly, but the elements are not constructed.  The READER will
output NIL when the nillable elements are missing.

=back

. key_rewrite => HASH|CODE|ARRAY-of-HASH-and-CODE

=over 4

Add key rewrite rules to the front of the list of rules, as set by
L<new(key_rewrite)|/"Constructors"> and
L<addKeyRewrite()|/"Accessors">.  See L</"Key rewrite">

=back

. mixed_elements => CODE|PREDEFINED

=over 4

What to do when mixed schema elements are to be processed.
Read more in the L</"DETAILS"> section below.

=back

. namespace_reset => BOOLEAN

=over 4

Use the same prefixes in C<prefixes> as with some other compiled piece,
but reset the counts to zero first.

=back

. output_namespaces => HASH|ARRAY-of-PAIRS

=over 4

[Pre-0.87] name for the C<prefixes> option.  Deprecated.

=back

. path => STRING

=over 4

Prepended to each error report, to indicate the location of the
error in the XML-Schema tree.

=back

. permit_href => BOOLEAN

=over 4

When parsing SOAP-RPC encoded messages, the elements may have an
C<href> attribute, pointing to an object with C<id>.  The READER will
return the unparsed, unresolved node when the attribute is detected,
and the SOAP-RPC decoder will have to discover and resolve it.

=back

. prefixes => HASH|ARRAY-of-PAIRS

=over 4

Can be used to pre-define prefixes for namespaces (for 'WRITER' or
key rewrite), for instance to reserve common abbreviations like C<soap>
for external use.  Each entry in the hash has as key the namespace uri.
The value is a hash which contains C<uri>, C<prefix>, and C<used>
fields.  Pass a reference to a private hash to catch this index.
An ARRAY with prefix, uri PAIRS is simpler.

 prefixes => [ mine => $myns, two => $twons ]
 prefixes => { $myns => 'mine', $twons => 'two' }

 # the previous is short for:
 prefixes => { $myns  => { uri => $myns, prefix => 'mine', used => 0 }
             , $twons => { uri => $twons, prefix => 'two', ... } };

=back

. sloppy_floats => BOOLEAN

=over 4

The float types of XML are all quite big, and support NaN, INF, and
-INF.  Perl's normal floats do not, and therefore Math::BigFloat is
used.  This, however, is slow.

When true, you will crash on any value which is not understood by
Perl's default float... but run much faster.  See also
C<sloppy_integers>.

=back

. sloppy_integers => BOOLEAN

=over 4

The XML C<integer> data-types must support at least 18 digits, which is
larger than Perl's 32 bit internal integers.  Therefore, the
implementation will use Math::BigInt objects to handle them.  However,
often a simple C<int> type would have sufficed, but the XML designer
was lazy.
A long is much faster to handle.  Set this flag to use C<int> as fast
(but imprecise) replacements.

Be aware that C<Math::BigInt> and C<Math::BigFloat> objects mimic the
behavior of Perl's ints and floats nearly, but not fully,
transparently.  See their respective manual-pages.  Especially when you
wish for some performance, you should optimize access to these objects
to avoid expensive copying, which is exactly the spot where the
differences are.

You can also improve the speed of Math::BigInt by installing
Math::BigInt::GMP.  Add C<< use Math::BigInt try => 'GMP'; >> to the
top of your main script to get more performance.

=back

. typemap => HASH

=over 4

Add this typemap to the relations defined by
L<new(typemap)|/"Constructors"> or L<addTypemaps()|/"Accessors">

=back

. use_default_namespace => BOOLEAN

=over 4

[0.91] When mixing qualified and unqualified namespaces, the use of
a default namespace can be quite confusing: a name-space without prefix.
Therefore, by default, all qualified elements will have an explicit
prefix.

=back

. validation => BOOLEAN

=over 4

XML messages must be validated, to lower the chance of abuse.  However,
of course, it costs performance which is only partially compensated by
fewer checks in your code.  This flag overrules the C<check_values>,
C<check_occurs>, and C<ignore_facets>.

=back

=back

XML::Compile::Schema-E<gt>B<dataToXML>(NODE|REF-XML-STRING|XML-STRING|FILENAME|FILEHANDLE|KNOWN)

=over 4

See L<XML::Compile/"Compilers">

=back

$obj-EB