=head1 NAME

XML::LibXML::Document - DOM Document Class

=head1 SYNOPSIS

  use XML::LibXML;

  $dom = XML::LibXML::Document->new( $version, $encoding );
  $dom = XML::LibXML::Document->createDocument( $version, $encoding );
  $strEncoding = $doc->encoding();
  $doc->setEncoding($new_encoding);
  $strVersion = $doc->version();
  $doc->standalone;
  $doc->setStandalone($numvalue);
  my $compression = $doc->compression;
  $doc->setCompression($ziplevel);
  $docstring = $dom->toString($format);
  $state = $doc->toFile($filename, $format);
  $state = $doc->toFH($fh, $format);
  $document->toStringHTML();
  $bool = $dom->is_valid();
  $dom->validate();
  $root = $dom->documentElement();
  $dom->setDocumentElement( $root );
  $element = $dom->createElement( $nodename );
  $element = $dom->createElementNS( $namespaceURI, $qname );
  $text = $dom->createTextNode( $content_text );
  $comment = $dom->createComment( $comment_text );
  $attrnode = $doc->createAttribute($name [,$value]);
  $fragment = $doc->createDocumentFragment();
  $attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] );
  $cdata = $dom->createCDATASection( $cdata_content );
  my $pi = $doc->createProcessingInstruction( $target, $data );
  my $entref = $doc->createEntityReference($refname);
  $dtd = $document->createInternalSubset( $rootnode, $public, $system);
  $dtd = $document->createExternalSubset( $rootnode, $public, $system);
  $document->importNode( $node );
  $document->adoptNode( $node );
  my $dtd = $doc->externalSubset;
  my $dtd = $doc->internalSubset;
  $doc->setExternalSubset($dtd);
  $doc->setInternalSubset($dtd);
  my $dtd = $doc->removeExternalSubset();
  my $dtd = $doc->removeInternalSubset();

=head1 DESCRIPTION

The Document Class is in most cases the result of a parsing process. But
sometimes it is necessary to create a Document from scratch. The DOM Document
Class provides functions that conform to the DOM Core naming style.

It inherits all functions from I<XML::LibXML::Node> as specified in the DOM specification.
This makes it possible to access nodes other than the root element at document
level, a I<DTD> for example. Support for these nodes is limited at the moment.

While nodes are generally bound to a document in the DOM concept, it is
suggested that one should always create a node not bound to any document.
There is no need to actually include the node in the document, but once the
node is bound to a document, it is quite safe to assume that all strings have
the correct encoding. If an unbound text node with an ISO encoded string is
created (e.g. with $CLASS->new()), the I<toString> function may not return the
expected result.

All this seems like a limitation as long as UTF8 encoding is assured. If ISO
encoded strings come into play, it is much safer to use the node creation
functions of B<XML::LibXML::Document>.

=head2 Methods

=over 4

=item B<new>

Alias for createDocument().

=item B<createDocument>

The constructor for the document class. As parameters it takes the version
string and (optionally) the encoding string. Simply calling
B<createDocument>() will create the document:

  <?xml version="your version" encoding="your encoding"?>

Both parameters are optional. The default value for B<$version> is I<1.0>. If
the B<$encoding> parameter is not set, the encoding will be left unset, which
means UTF8 is implied.

Calling B<createDocument>() without any parameter will result in the following
declaration:

  <?xml version="1.0"?>

Alternatively one can call this constructor directly from the XML::LibXML
class level, to avoid some typing. This has no effect on the class of the
instance, which is always XML::LibXML::Document.

  my $document = XML::LibXML->createDocument( "1.0", "UTF8" );

is therefore a shortcut for

  my $document = XML::LibXML::Document->createDocument( "1.0", "UTF8" );

=item B<encoding>

Returns the encoding string of the document.

  my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
  print $doc->encoding; # prints ISO-8859-15

Optionally this function can be accessed by B<actualEncoding> or
B<getEncoding>.

=item B<setEncoding>

From time to time it is useful to change the effective encoding of a document.
This method provides the interface to manipulate the encoding of a document.
Note that this function has to be used very carefully: one cannot simply
convert one encoding into any other, since some (or even all) characters may
not exist in the new encoding. XML::LibXML will not test whether the operation
is allowed or possible for the given document. The only switch assured to work
is to UTF8.

=item B<version>

Returns the version string of the document.

B<getVersion> is an alternative form of this function.

=item B<standalone>

This function returns the numerical value of the standalone attribute of a
document's XML declaration. It returns B<1> if standalone="yes" was found,
B<0> if standalone="no" was found and B<-1> if standalone was not specified
(the default on creation).

=item B<setStandalone>

Through this method it is possible to alter the value of a document's
standalone attribute. Set it to B<1> to set standalone="yes", to B<0> to set
standalone="no", or to B<-1> to remove the standalone attribute from the XML
declaration.

=item B<compression>

libxml2 can read documents directly from gzipped files. In this case the
compression variable is set to the compression level of that file (0-8). If
XML::LibXML parsed a different source or the file was not compressed, the
returned value will be B<-1>.

=item B<setCompression>

If one intends to write the document directly to a file, it is possible to set
the compression level for a given document. This level can be in the range
from 0 to 8. If XML::LibXML should not try to compress, use B<-1> (default).

Note that this feature will B<only> work if libxml2 is compiled with zlib
support and toFile() is used for output.

=item B<toString>

B<toString> is a deparsing function, so the DOM tree can be translated into a
string, ready for output.

The optional B<$format> parameter sets the indenting of the output. This
parameter is expected to be an I<integer> value that specifies whether
indentation should be used.
The format parameter can have three different values if it is used:

If $format is 0, then the document is dumped as it was originally parsed.

If $format is 1, libxml2 will add ignorable whitespace, so that the content of
the nodes is easier to read. Existing text nodes will not be altered.

If $format is 2 (or higher), libxml2 will act as with $format == 1, but it
will also add a leading and a trailing line break to each text node.

libxml2 uses a hardcoded indentation of 2 space characters per indentation
level. This value cannot be altered at runtime.

B<NOTE>: XML::LibXML::Document::toString returns the data in the document
encoding rather than UTF8!

=item B<toFile>

This function is similar to toString(), but it writes the document directly to
a file on the filesystem. This is very useful if one needs to store large
documents.

The format parameter has the same behaviour as in toString().

=item B<toFH>

This function is similar to toString(), but it writes the document directly to
a filehandle or a stream.

The format parameter has the same behaviour as in toString().

=item B<toStringHTML>

B<toStringHTML> deparses the tree to a string as HTML. With this method
indenting is automatic and managed by libxml2 internally.

=item B<is_valid>

Returns either TRUE or FALSE depending on whether the DOM tree is a valid
document or not.

You may also pass in an XML::LibXML::Dtd object, to validate against an
external DTD:

  if (!$dom->is_valid($dtd)) {
      warn("document is not valid!");
  }

=item B<validate>

This is an exception throwing equivalent of is_valid. If the document is not
valid it will throw an exception containing the error. This allows much better
error reporting than the plain true or false result of is_valid.

Again, you may pass in a DTD object.

=item B<documentElement>

Returns the root element of the document. A document can have just one root
element to contain the document's data.

Optionally one can use B<getDocumentElement>.

=item B<setDocumentElement>

This function enables you to set the root element for a document. The function
supports the import of a node from a different document tree.
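The following is a minimal sketch, not taken from the original documentation, of how setDocumentElement() and documentElement() work together with the creation and serialization functions described in this section. The element names C<root> and C<greeting> and the text C<hello> are illustrative assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Build a document from scratch, passing version and encoding explicitly.
my $doc = XML::LibXML->createDocument( "1.0", "UTF-8" );

# "root" and "greeting" are hypothetical names chosen for this sketch.
my $root = $doc->createElement("root");
$doc->setDocumentElement($root);

# Nodes created through the document are bound to it from the start.
my $child = $doc->createElement("greeting");
$child->appendChild( $doc->createTextNode("hello") );
$root->appendChild($child);

# documentElement() returns the element installed above.
print $doc->documentElement->nodeName, "\n";    # prints root

# Serialize the tree; a format value of 1 asks libxml2 to indent the output.
print $doc->toString(1);
```

Creating every node through the document, rather than with the class constructors, follows the advice in the DESCRIPTION about keeping string encodings consistent.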

=item B<createElement>

This function creates a new Element Node bound to the DOM with the name
I<$nodename>.

=item B<createElementNS>

This function creates a new Element Node bound to the DOM with the name
I<$nodename> and placed in the given namespace.

=item B<createTextNode>

An equivalent of B<createElement>, but it creates a B<Text Node> bound to the
DOM.

=item B<createComment>

An equivalent of B<createElement>, but it creates a B<Comment Node> bound to
the DOM.

=item B<createAttribute>

Creates a new Attribute node.

=item B<createDocumentFragment>

This function creates a DocumentFragment.

=item B<createAttributeNS>

Creates an Attribute bound to a namespace.

=item B<createCDATASection>

Similar to createTextNode and createComment, this function creates a
CDataSection bound to the current DOM.

=item B<createProcessingInstruction>

Creates a processing instruction node.

Since the name of this method is quite long, one may use its short form
B<createPI>.

=item B<createEntityReference>

If a document has a DTD specified, one can create entity references by using
this function. If one wants to add an entity reference to the document, this
reference has to be created by this function.

An entity reference is unique to a document and cannot be passed to other
documents as other nodes can.

B<NOTE:> Text content containing something that looks like an entity reference
will not be expanded to a real entity reference unless it is a predefined
entity.

  my $string = "&foo;";
  $some_element->appendText( $string );
  print $some_element->textContent; # prints "&foo;"

=item B<createInternalSubset>

This function creates and adds an internal subset to the given document.
Because the function automatically adds the DTD to the document, there is no
need to add the created node explicitly to the document.

  my $document = XML::LibXML::Document->new();
  my $dtd = $document->createInternalSubset( "foo", undef, "foo.dtd" );

will result in the following XML document:

  <?xml version="1.0"?>
  <!DOCTYPE foo SYSTEM "foo.dtd">

By setting the public parameter it is possible to set PUBLIC DTDs on a given
document.
For example:

  my $document = XML::LibXML::Document->new();
  my $dtd = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );

will cause the following declaration to be created on the document:

  <?xml version="1.0"?>
  <!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN">

=item B<createExternalSubset>

This function is similar to I<createInternalSubset()>, but this DTD is
considered to be external and is therefore not added to the document itself.
Nevertheless it can be used for validation purposes.

=item B<importNode>

If a node is not part of a document, it can be imported to another document.
As specified in the DOM Level 2 Specification, the node will not be altered or
removed from its original document (I<< $node->cloneNode(1) >> will get called
implicitly).

B<NOTE:> Don't try to use importNode() to import subtrees that contain an
entity reference, even if the entity reference is the root node of the
subtree. This will cause serious problems in your program. This is a
limitation of libxml2 and not of XML::LibXML itself.

=item B<adoptNode>

If a node is not part of a document, it can be imported to another document.
As specified in the DOM Level 3 Specification, the node will not be altered,
but it will be removed from its original document.

After a document has adopted a node, the node, its attributes and all its
descendants belong to the new document. Because the node no longer belongs to
the old document, it will be unlinked from its old location first.

B<NOTE:> Don't try to use adoptNode() to import subtrees that contain entity
references, even if the entity reference is the root node of the subtree. This
will cause serious problems in your program. This is a limitation of libxml2
and not of XML::LibXML itself.

=item B<externalSubset>

If a document has an external subset defined, it will be returned by this
function.

B<NOTE> Dtd nodes are not ordinary nodes in libxml2. The support for these
nodes in XML::LibXML is still limited. In particular, one may not want to use
common node functions on doctype declaration nodes!

=item B<internalSubset>

If a document has an internal subset defined, it will be returned by this
function.

B<NOTE> Dtd nodes are not ordinary nodes in libxml2.
The support for these nodes in XML::LibXML is still limited. In particular,
one may not want to use common node functions on doctype declaration nodes!

=item B<setExternalSubset>

B<EXPERIMENTAL!>

This method sets a DTD node as an external subset of the given document.

=item B<setInternalSubset>

B<EXPERIMENTAL!>

This method sets a DTD node as an internal subset of the given document.

=item B<removeExternalSubset>

B<EXPERIMENTAL!>

If a document has an external subset defined, it can be removed from the
document by using this function. The removed DTD node will be returned.

=item B<removeInternalSubset>

B<EXPERIMENTAL!>

If a document has an internal subset defined, it can be removed from the
document by using this function. The removed DTD node will be returned.

=back

=head1 AUTHOR

Matt Sergeant, Christian Glahn

=head1 SEE ALSO

XML::LibXML, XML::LibXML::DOM, XML::LibXML::Element, XML::LibXML::Text,
XML::LibXML::Attr, XML::LibXML::Comment, XML::LibXML::DocumentFragment,
XML::LibXML::Dtd

=head1 VERSION

1.53