NAME
    XHTML::Util - (alpha software) powerful utilities for common but
    difficult to nail HTML munging.

VERSION
    0.99_08

SYNOPSIS
     use strict;
     use warnings;
     use XHTML::Util;

     my $xu = XHTML::Util
         ->new(\"This is naked\n\ntext for making into paragraphs.");
     print $xu->enpara, $/;

     # <p>This is naked</p>
     #
     # <p>text for making into paragraphs.</p>

     $xu = XHTML::Util
         ->new(\"<blockquote>Quotes should probably have paras.</blockquote>");
     print $xu->enpara("blockquote");

     $xu = XHTML::Util
         ->new(\'Something.');
     print $xu->strip_tags('a');

     # Something.

DESCRIPTION
    You can pass CSS expressions to most of the methods. E.g., to only
    enpara the contents of div tags with a class of "enpara" --

Quotes should probably have paras.
#
I can do HTML when I'm paying attention.
Or I need to for some reason.
Oh, I stopped paying attention... What happens here? Or here? I'd like to see this in a paragraph so it's legal markup.
now
this
should
not be touched!
I meant to do that.
With "XHTML::Util->enpara":

Is this a paragraph
or two?
I can do HTML when I'm paying attention.
Or I need to for some reason.
Oh, I stopped paying attention... What happens here? Or here?
I'd like to see this in a paragraph so it's legal markup.
now
this
should
not be touched!
I meant to do that.
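
The basic idea behind the sample above, turning blank-line-separated text into paragraphs, can be sketched in plain Perl. This is only an illustration and not the module's implementation: "enpara_sketch" is a hypothetical helper, it assumes the input is plain text with no markup at all, and the choice to turn single newlines into line breaks is an assumption of the sketch. Handling text that already contains some HTML, as in the sample, is the hard part and is why the module works on a parsed document instead.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of XHTML::Util: wraps blank-line-separated
# chunks of plain text in <p> tags. Assumes the input contains no markup.
sub enpara_sketch {
    my ($text) = @_;
    my @paras;
    for my $chunk ( split /\n\s*\n/, $text ) {
        $chunk =~ s/\A\s+|\s+\z//g;    # trim surrounding whitespace
        next unless length $chunk;      # skip empty chunks
        # Escape the characters that are special in XHTML text.
        $chunk =~ s/&/&amp;/g;
        $chunk =~ s/</&lt;/g;
        $chunk =~ s/>/&gt;/g;
        # Assumption of this sketch: a single newline becomes a line break.
        $chunk =~ s{\n}{<br/>\n}g;
        push @paras, "<p>$chunk</p>";
    }
    return join "\n\n", @paras;
}

print enpara_sketch("This is naked\n\ntext for making into paragraphs."), "\n";
```

Because the sketch escapes every angle bracket, it would mangle input that mixes naked text with real tags; treat it as a picture of the goal rather than a substitute for the method.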
    parser
        The XML::LibXML parser object used to parse (X)HTML.

    doc
        The XML::LibXML::Document object created from input.

    root
        The documentElement of the XML::LibXML::Document object.

    text
        The "textContent" of the root node.

    head
        The head element.

    body
        The body element. Note there is always an implicit head and body,
        even with fragments, because libxml creates them; well, we ask it
        to do so.

    as_fragment
        Returns the original (intent-wise) fragment, or the elements within
        the body if starting with a full document.

    as_string
        Stringified version of the object. If the object was created from
        an HTML fragment, a fragment will be returned.

    debug
        Yep. 1-5 with higher giving more info to STDERR.

    is_document
        Returns true if the originally parsed item was a full HTML
        document.

    is_fragment
        Returns true if the originally parsed item was a fragment.

    clone
    same_same
        Takes another XHTML::Util object or a valid argument to create one.
        Attempts to determine if the resulting object is the same as the
        calling object. E.g.,

         print $xu->same_same(\"OH HAI") ? "Yepper!\n" : "Noes...\n";

    tags
        Returns a list of all known HTML tags. Please ignore this method.
        I'm not sure it's a good idea, well named, or will remain.

    selector_to_xpath
        This wraps "selector_to_xpath" in HTML::Selector::XPath. Not really
        meant to be used but exposed in case you want it.

         print $xu->selector_to_xpath("form[name='register'] input[type='password']");
         # //form[@name='register']//input[@type='password']

TO DO
    I think the default doc should be \"". There is no reason to jump
    through that hoop if wanting to build up something from scratch.

    Finish spec and tests. Get it running solid enough to remove alpha
    label.

    Generalize the argument handling.

    Provide optional setting or methods for returning nodes instead of
    serialized content.

    Improve document/head related handling/options.

    I can see this being easier to use functionally. I haven't decided on
    the argspec or method-->sub approach for that yet. I think it's a good
    idea.

BUGS AND LIMITATIONS
    All input should be UTF-8 or at least safe to run decode_utf8 on.
    Regular Latin character sets, I suspect, will be fine but extended sets
    will probably give garbage or unpredictable results; guessing.

    This will wreck XML and probably XHTML with a custom DTD too. It uses
    HTML::Tagset's conception of what valid tags are. This is not optimal
    but it is easier than DTD handling. It might improve to more automatic
    detection in the future.

    I have used many of these methods and snippets in many projects and I'm
    tired of recycling them. Some are extremely useful and, at least in the
    case of "enpara", better than any other implementation I've been able
    to find in any language.

    That said, a lot of the code herein is not well tested, or at least not
    well tested in this incarnation. Bug reports and good feedback are
    adored.

SEE ALSO
    XML::LibXML, HTML::Tagset, HTML::Entities, HTML::Selector::XPath,
    HTML::TokeParser::Simple, CSS::Tiny.

    CSS W3Schools,