=head1 TITLE

[DRAFT] Synopsis 26 - Perl Documentation

=head1 AUTHORS

    Brian Ingerson
    Sam Vilain

=head1 VERSION

    Maintainer: Sam Vilain
    Date: 9 Apr 2005
    Last Modified: 13 Apr 2005

This document attempts to describe the documentation capabilities of
Perl 6. It assumes familiarity with Perl 5 and the Pod (Plain Old
Documentation) format.

=head1 OVERVIEW

Throughout this document, the term "Perldoc" will be used as the
generic term to describe Perl Documentation rather than the original
name "Pod". Pod now refers to a specific I of Perldoc, as well as a I.

With Perldoc, there's more than one way to do documentation (TMTOWTDD).

This document covers the following major areas:

=over 4

=item Perldoc Containment in C<...>

This covers how Perl syntax is distinguished from Perldoc syntax, with
sections on: Perl 5, Perl 6, "raw" files (eg, C<.pod> / C<.kwid>
files), and custom Perldoc containment.

=item Perl 6 Pod Syntax

This section describes the syntax of POD.

=over 4

=item Pod, by example

A walk through all of the Perl 6 Pod syntax.

=item Changes from Perl 5 Pod

There are slight changes to the Perl 5 Pod structure, to make it
consistent and unambiguous. This section summarises them with
side-by-side examples.

=back

=item Syntax Dialects

Various Perldoc syntax formats can map to the same PDTD. This way,
someone can use an alternate 'dialect' in their Perl code, and tools
that deal with Pod don't need to do anything special to be able to
support it, other than by using Perldoc.

=item Escaping and Embedding Dialects

To allow everything from inline tests to multimedia documentation,
Perldoc dialect syntaxes will normally provide mechanisms for escaping
markup as actual content, and for embedding other dialects (fragments,
or special nodes) within themselves.

=item Semantic Dialects

Semantic Dialects affect the structure of the document, and cover
"opaque" objects and alternate document structures not necessarily
fully interoperable with tools expecting pure PodDT.

=item The Kwid Syntax Dialect

Kwid is a completely new syntax based on experience from more modern
internet social communication.

=over 4

=item Kwid Basics

How to use Kwid, with side-by-side POD comparison.

=back

=item Perldoc Object Model (PDOM)

The document object model that drives Perldoc, POD and dialects.

=over 4

=item Core model (PDOM)

Navigating the document tree structure, and passing trees around as
event streams.

=item Perldoc type definitions (PDTD)

Validating documents.

=item Pod Document Type (PodDT)

Linking documents with PDTDs, and mixing document types.

=item Perldoc Linking (PLink)

Referring to other documents, and extracting information from POD at
runtime via the C<%*DOC> variable.

=back

=back

=head1 Perldoc Containment in C<...>

The first thing to tackle is how various interpreters (including the
Perl interpreter) distinguish which characters in a file or stream are
actual Perl code and which ones are Perldoc.

=head2 Perldoc Containment in Perl 5

To not break anything, Perldoc attempts to stay within the same bounds
as those imposed by the Perl 5 interpreter, namely that a section of
Perldoc begins with a line matching the regexp:

    /^=\w.*$/m

and ends with the next line matching:

    /^=cut.*$/m

Pod has a generic identifier to start a Pod section:

    /^=pod.*$/m

Note that the C<=pod> and C<=cut> lines are not considered part of the
Pod, but simply as containment markers. However, Pod also allows a
section to begin with any of a number of block identifiers, as long as
the line starts with an equals sign. So the line:

    =head2 Something To Say

acts not only as a containment starting marker, but also as part of
the content (a heading).

=head2 Perldoc Containment in Perl 6

This draft currently keeps exactly the same restrictions as the
Perldoc Containment in Perl 5, for two reasons:

=over

=item #

Backwards compatibility with Perl 5. There is no reason why Perldoc
dialects and tools cannot be used with Perl 5 today, without any need
to change the interpreter.

=item #

To gain acceptance with the Perl 6 design team, by not asking for
anything special to accomplish its goals. That said, the containment
rules could and should be made smarter by the Perl 6 team.

=back

However, it should be noted that most POD processors and Perldoc will
treat whitespace differently. With Perl 5 POD, lone whitespace was
often considered content; with Perldoc, this is never the case.

=head2 File Containment

The above describes how to divine the Perl from the Doc, which assumes
they are intertwingled in a Perl source code file. Documentation can
also live in a file by itself.

Perldoc considers files ending with C<.pod> to be documentation in the
Pod dialect and files ending with C<.kwid> to be in the Kwid dialect,
etc. A Perldoc parser can look to the file extension for a dialect
hint, if no other clue is provided.

This implies that lines like:

    =pod
    =kwid
    =doc
    =cut

are not necessary in pure Perldoc files. In fact, in a Kwid file, they
would just be plain text.

=head2 Perldoc Containment in $language

Parsers that extract documentation out of files of any format can
easily be added, by writing a custom parser. The custom parser's job
is to read a stream of characters, lines or paragraphs (at its option)
and output Perldoc DOM events, which can be used to build a PDOM tree
or fed straight into an output processor.

If these events are found to conform to the PodDT (described later),
then no further programming is necessary to treat completely alien
source formats, like PHPDoc or JavaDoc, as if they were Perldoc.

=head1 Perl 6 POD Syntax

This section details the new Perl 6 POD syntax.

=head2 Pod, by example

B: content to be based on the below comments, but a section here would
remove the dependency that the user must be familiar with Perl 5 and
its POD.

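
The Perl 5 containment rules given earlier (a section opens on a line
matching C</^=\w.*$/m> and closes on the next line matching
C</^=cut.*$/m>) can be sketched as a small line splitter. The Python
below is only an illustration under stated assumptions, not part of
Perldoc: the function name C<split_containment> and its C<(kind, line)>
output are hypothetical, and a stray C<=cut> outside a section is
simply dropped.

```python
import re

# The three patterns are taken from the containment rules in this document.
DOC_START = re.compile(r"^=\w.*$")    # e.g. "=pod" or "=head2 Something"
DOC_END = re.compile(r"^=cut.*$")     # closes the current Perldoc section
POD_MARKER = re.compile(r"^=pod.*$")  # generic opener, not content


def split_containment(source):
    """Yield ("code", line) or ("doc", line) for each line of `source`.

    Hypothetical helper: "=pod" and "=cut" lines are treated as pure
    containment markers and are not yielded at all.
    """
    in_doc = False
    for line in source.splitlines():
        if DOC_END.match(line):
            in_doc = False          # marker only, emit nothing
            continue
        if not in_doc and DOC_START.match(line):
            in_doc = True
            if POD_MARKER.match(line):
                continue            # "=pod" is a marker, not content
        yield ("doc" if in_doc else "code", line)


if __name__ == "__main__":
    sample = "my $x = 1;\n=head2 Something To Say\n\nSome text.\n=cut\nprint $x;\n"
    for kind, line in split_containment(sample):
        print(kind, repr(line))
```

Note that the C<=head2> line is yielded as documentation, matching the
rule that such a line is both a starting marker and part of the
content.
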
In summary, with Pod, when you write something like:

    =head1 foo

    content

This is converted by the I into document stream events:

    - start_element("head1")
    - characters("foo")
    - end_element("head1")
    - start_element("para")
    - characters("content")
    - end_element("para")

So, the C<=begin> and C<=end> syntax simply allows you to place 'raw'
events in the stream; the above could be written explicitly as:

    =begin head1

    foo

    =end

    =begin para

    content

    =end

Which would have generated the same series of events, though with
slightly different properties to indicate the actual source form that
was used.

The exact rules will be detailed here once the Perldoc proof of
concept explores what works.

=head2 Changes from Perl 5 POD

In general, Perl 6 POD is backwards compatible with Perl 5 Pod.

This gives the Pod dialect a slight advantage in being able to start a
section with actual content. Any other dialect that wanted this
feature would need to have similar block markup.

Perldoc extends the notion of containment while still fitting inside
the Perl5/Pod restrictions. Perldoc offers a generic starting marker
of:

    =doc

This is a dialect agnostic version of the traditional:

    =pod

which is still valid in Perldoc but is a shortcut for:

    =doc.pod

and

    =doc.kwid

is long for:

    =kwid

The term "doc" is more readily understood by those readers not
familiar with Pod or Kwid.

If C<=doc> has no dialect qualifier, it is assumed to be the dialect
of the previous section. If there is no previous section, the dialect
should be autodetected.

All of the text following the C<=doc> marker but on the same line is
considered to be the first line of the actual content. This allows
Kwid to do something like this:

    =doc - Something
      Some interesting point.
    =cut

to be a synonym for:

    =doc.kwid
    - Something
      Some interesting point.
    =cut

which is semantically equivalent to Pod's

    =item Something

    Some interesting point.
    =cut

Also, in order to make POD more consistent, the following minor
details will change:

=over

=item Allow named hyperlinks

Pod allows this syntax:

    L

for named links to other documents. It should also allow:

    L

for named hyperlinks.

=item C<=over> and C<=back> will be deprecated.

These markers are ambiguous as indenters and list markers. Instead we
will have:

    =begin list
    =end list

with special syntax to make them less verbose.

=back

For more information on how the Pod dialect might change, see L.

If the Pod dialect is changed significantly by the Perl 6 design team,
it is suggested that there remain a legacy Perl 5 dialect. Hopefully
the legacy dialect would be called "Pod", and the improved version
something else. "Mod"??

=head1 SYNTAX DIALECTS

In the spirit of TMTOWTDI, Perldoc allows an author to choose the
documentation syntax they prefer without needing to worry whether
downstream processes and tools will be able to use it properly. These
variations of syntax are referred to as I.

=head2 Background and Rationale

Pod was created in a time before modern day phenomena like wikis
existed. Wikis are similar to Pod in that they ask authors to write
content prose and structural/formatting markup in an all text format
that is simpler and less foreboding than HTML. Then some program
converts the text into a nicely readable format like HTML.

Wiki syntax comes in dozens of varieties, but the main theme is "make
the unformatted text feel as close as possible to the formatted text,
because most of the people using wikis will not be technical".

Normal non-programmer folk aren't all that good at picking out cryptic
markup from content. And while the authors of most Perldoc are very
technical, some of them wonder why they can't just use the friendlier
markup.

=head2 Other Dialects

"Pod" is now the Perldoc dialect that looks exactly like Pod.

"Kwid" is one Perldoc dialect that takes the best ideas from the
various wiki syntaxes that correspond to ideas in the Pod model.

Other dialects should be created by people who are fond of neither Pod
nor Kwid. An XML dialect would be trivial to define since the PDOM can
be thought of as being an XML schema. Likewise an HTML dialect would
be useful as a formal syntax for creating Pod from HTML. A WYSIWYG
Perldoc editor could be thought of as just another dialect.

=head1 ESCAPING AND EMBEDDING SYNTAXES

In addition to providing syntactical constructs for all the nodes of
the PDOM, a Perldoc dialect must provide forms for escaping plain text
and embedding other Perldoc syntax dialects.

=head2 Escaping

Escaping means to mark characters which are semantically part of the
content of the document but might otherwise be construed as markup.
Here are some examples where the first line is ambiguous or wrong and
the following line(s) fix it.

In the Pod dialect:

    =head1 Not a heading
    Ehead1 Not a heading

    This equation C<a > b>
    This equation C<a E<gt> b>
    This equation C<< a > b >>

    This is verbatim X<<< >>>X
    This is verbatim XE<< >>>X

    Perldoc(tm) is fun
    PerldocE is fun

In the Kwid dialect:

    = Not a heading
    \= Not a heading

    Not a link: [title|page/section]
    Not a link: \[title|page/section]

    This is not huggy but should be *bold*/italic/
    This is not huggy but should be {*bold*}{/italic/}

    This is verbatim {*zyz*}
    This is verbatim {\*zyz*}
    This is verbatim { {*zyz*} }

    Perldoc(tm) is fun
    Perldoc™ is fun

Note that none of these is purported to be elegant, but a complete
syntax requires such mechanisms.

=head2 Embedding Syntaxes

Each dialect must have a mechanism to switch parsing to another
dialect and back again. Using Pod and Kwid again, here is an example
of Pod embedding Kwid:

    =doc.pod

    =head2 Here is a list

    =begin kwid

    * one
    * two
    * three

    =end kwid

    That was a B<list>!
    =cut

and here is the opposite:

    =doc.kwid

    == Here is a list

    .pod

    =over

    =item * one

    =item * two

    =item * three

    =back

    ..pod

    That was a *list*!

    =cut

=head1 SEMANTIC DIALECTS

Semantic dialects add to the I in some non-conforming way. These are
normally called I. For instance, they might add a single element that
has a link to an image (or inlined image data). To something expecting
normal PodDT, these nodes are effectively "opaque".

Any parser is free to construct documents that do not conform to
PodDT. During a process known as I, dialect-specific handlers can
mutate the document structure in well-defined but flexible ways,
decide to "remove" themselves from the stream, or even raise a fatal
error.

It is also possible to embed portions of alternate document types in
POD. The 'legacy' C<=for>, C<=begin> and C<=end> markers are one way
to do this, for instance:

    =begin testing

    my $foo=Foo->new("doobie"=>"doob-doo");
    ok($foo->drew("you"), "blue", "schmoo");

    =end testing

This will put a node of name C<testing> into the Perldoc tree. This
does not conform to the PodDT, so it must be a dialect. Unknown
sections like this are ignored (that is, removed) during conformance
if no document type handler (POD Dialect plug-in) can be found.

This means that various users of POD can continue to simply put their
items into the POD, marked appropriately, and expect to be able to
pull them out sensibly.

In fact, it's even easier to get them out in Perl 6; assuming that the
above block is in the current source file, it could be accessed as:

    %*POD{'//testing'}

This syntax is detailed further in the section, L.

A non-backwards compatible mechanism is also supplied to mark "inline"
POD styles as belonging to an alternate semantic dialect (document
type):

    Z:image:

In the above example, a node with name "image", containing a text node
with content "image.jpg" is created.

Kwid also defines syntax for I level extensions.

    This was written: {date: 2005-04-09}

So dates can have extension processors to do fancy things.

Other syntax dialects are free to specify their own weird and
wonderful ways of including these martian nodes, or not at all if they
don't care.

B the details of the above embedding syntax are still subject to
exploration by proof of concept.

=head2 Example: Tables

Tables are a prime example of the use of this sort of construct. While
too unwieldy to impose on every tool, tables are useful in many
documentation applications. So there will be an extension that handles
tables.

Such an extension could be coded without writing a parser using POD
tokens like so:

    =begin table
    =begin row
    =begin cell
    =end
    =end
    =end

If tools are not available at a particular stage of processing an
extension construct, that construct will be reported as an opaque
object by the PDOM. It is entirely possible that a further stage of
processing will be able to move the opaque object into some
representation dictated by the extension's schema.

However, by allowing these fragments to be specified using document
types that can be shared between semantic dialects, the scope for
interoperability between the tools increases.

=head1 THE KWID DIALECT

The Kwid dialect is more formally described here: L

The quickest way to explain Kwid is simply to show a side by side
Pod/Kwid cheat sheet:

    =head1 Big Thing                = Big Thing

    =head4 Small Thing              ==== Small Thing

    A paragraph of                  A paragraph of
    plain text.                     plain text.

        # verbatim                      # verbatim
        sub v {                         sub v {
            shift;                          shift;
        }                               }

    =over                           * foo
                                    * bar
    =item * foo

    =item * bar

    =back

    =over                           - foo
                                      Foo is free
    =item foo                       - bar
                                      Bar is he
    Foo is free

    =item bar

    Bar is he

    =back

    Something B<bold>!              Something *bold*!

    Something I<italic>!            Something /italic/!

    Some code C<E = M * C ^ 2>!     Some code `E = M * C ^ 2`!

    =begin opaque                   .opaque
    =end opaque                     ..opaque

    =for opaque                     .:opaque

This is just a small example to give you an idea.

Kwid is really nice for nested lists:

    * This
    ++ One
    --- One
        Is the /loneliest/ integer
    --- Won
        The Race
    ++ Two
    --- Two
        For the show
    --- Too
        Far from here
    * That
    * The Other

The above in Pod would be horribly long.

=head1 THE PERLDOC OBJECT MODEL (PDOM)

Perldoc is based on a Document Object Model. This model is designed to
assist interoperability between processing tools and use within
editors.

There are several distinct, clearly defined components with heritage
from Document Processing technologies, such as SGML, TeX and XML.
Correspondences to various technologies in these areas are mentioned
in this section so as to gel concepts cleanly.

=head2 Core model (PDOM)

This is very akin to the 'core' section of the W3C XML-DOM
specification, level 2. Basically nothing other than tree navigation
and attribute/element editing. It also serves as a standard for
passing trees around as events, similar in appearance to SAX (but
simple C<;)>).

The PDOM can be thought of as a tree of I. There are 3 kinds of nodes:

=over

=item Text Nodes

Leaf nodes containing content text.

=item Collection Nodes

Nodes that contain other nodes, and possibly a set of attributes.

=item Ignorable Nodes

Nodes for text in the syntax presentation that has no bearing on the
document's intended content, but must be preserved for applications
like editors and syntax highlighting. This typically includes extra
whitespace and throwaway comments.

=back

At this point, you might be looking at the above and thinking, "what
about 'opaque' types, like images?". Well, with Perldoc there is no
such thing as "Opaque". Anything (be it a node attribute, a fragment
or even an entire document structure) that has a different I than the
expected type can be considered "opaque" - after all, we don't know
what to do with it yet!

During a pre-formatting stage known as I, rules that map these
document types to Perl modules will allow Perldoc plug-ins to perform
arbitrary actions when these 'weird' nodes are encountered.

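
The three node kinds described above could be modelled roughly as
follows. This is a hedged Python sketch rather than the Perldoc
implementation: the class names C<Text>, C<Collection> and
C<Ignorable>, the C<attrs> field, and the helpers C<content_text> and
C<source_text> are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Text:
    """Leaf node holding content text."""
    text: str


@dataclass
class Ignorable:
    """Source text with no bearing on content (whitespace, comments)."""
    text: str


@dataclass
class Collection:
    """Node that contains other nodes, plus an optional set of attributes."""
    name: str
    children: List["Node"] = field(default_factory=list)
    attrs: Dict[str, str] = field(default_factory=dict)


Node = Union[Text, Ignorable, Collection]


def content_text(node):
    """Concatenate content text, skipping ignorable nodes."""
    if isinstance(node, Text):
        return node.text
    if isinstance(node, Ignorable):
        return ""
    return "".join(content_text(child) for child in node.children)


def source_text(node):
    """Concatenate all text, ignorable nodes included, as an editor would need."""
    if isinstance(node, Collection):
        return "".join(source_text(child) for child in node.children)
    return node.text


if __name__ == "__main__":
    para = Collection("para", [Text("content"), Ignorable("  \n")])
    print(repr(content_text(para)))
    print(repr(source_text(para)))
```

Keeping ignorable text in the tree, rather than discarding it at parse
time, is what lets a formatter and an editor share the same structure:
the former reads only the content, the latter can reproduce the source.
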
=head3 PDOM Serial API

Perldoc allows for SAX-style streaming parsing and emission of
documents. The serial API looks something like this:

    - start_document(title) - Start a new Perldoc document
    - end_document()        - End a Perldoc document
    - start_element(type)   - Start a new node
    - end_element(type)     - End a node
    - characters(text)      - Content text as unicode chars
    - ignorable(text)       - Non-content text

=head3 PDOM Random Access API

This API consists of functions that would likely look similar to
XML/DOM.

Larry has also stated that the documentation of a program will be
available through the global variable C<%*POD> which I would humbly
suggest be changed (or at least aliased) to C<%*DOC>. It has not yet
been determined how all the parts of the PDOM would be accessed
through this hash.

Perhaps the variable C<$*DOC> could hold a reference to the program's
PDOM object.

=head2 Perldoc type definitions (PDTD)

This is the "type" of the document structure (or fragment), and
defines what tags and attributes are allowed. There are various hooks
to allow virtually limitless styles of extensions within this
framework. This is analogous to SGML DTDs, etc.

This structure itself is specified in a refreshingly straightforward
manner, a sample of which is below:

    --- #YAML 1.0
    elements:
      perldoc: '(sect1|%block)*'
      %block: '(para|verbatim)'
      sect1: 'title?(sect1|%block)*'
      sect2: 'title?para*'
      %inline: '(strong|emphasis|link|#CHAR+)'
      para: '%inline*'
      title: '%inline*'
      verbatim: '#CHAR*'
      link: '%inline*'
    attr:
      link:
        target: '.*'
    ents:
      trade: "\u{2122}"
    ...

Note: the above is an early prototype and will almost certainly change
in details.

=head2 Pod Document Type (PodDT)

If the content you are writing is inline Perl documentation, there is
a strong chance that it is the POD PDTD (we'll use DT or I as a suffix
for these, for instance "PodDT"). Anything that is a true POD dialect
will be able to normalise to or produce PodDT when asked.

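
One way to read the sample PDTD shown earlier is as a set of regular
expressions over the names of each element's children. The Python
sketch below explores that reading. It is an assumption about how such
a definition might be interpreted, not the Perldoc validation
algorithm: the C<ELEMENTS> table is copied from the sample, while
C<model_to_regex>, C<children_ok>, the treatment of C<%name> entries as
reusable groups, and the convention of writing each text child as a
single C<#CHAR> token are all hypothetical.

```python
import re

# Content models copied from the sample PDTD. Names starting with "%" are
# treated here as reusable groups (an assumption about the notation).
ELEMENTS = {
    "perldoc": "(sect1|%block)*",
    "%block": "(para|verbatim)",
    "sect1": "title?(sect1|%block)*",
    "sect2": "title?para*",
    "%inline": "(strong|emphasis|link|#CHAR+)",
    "para": "%inline*",
    "title": "%inline*",
    "verbatim": "#CHAR*",
    "link": "%inline*",
}


def model_to_regex(name):
    """Compile the content model of `name` into a regex over child names."""
    model = ELEMENTS[name]
    # Expand "%group" references until none remain.
    while "%" in model:
        model = re.sub(r"%\w+", lambda m: ELEMENTS[m.group(0)], model)
    # Turn every child name into one token terminated by a comma, so that
    # "title?" means "an optional title child", not "an optional letter e".
    pattern = re.sub(
        r"#?\w+", lambda m: "(?:" + re.escape(m.group(0)) + ",)", model
    )
    return re.compile(pattern)


def children_ok(name, children):
    """Return True if the list of child names fits the model of `name`."""
    joined = "".join(child + "," for child in children)
    return model_to_regex(name).fullmatch(joined) is not None


if __name__ == "__main__":
    print(children_ok("sect1", ["title", "para", "verbatim"]))
    print(children_ok("sect1", ["para", "title"]))
```

Under this reading a C<sect1> may start with a C<title> but may not
contain one after a paragraph, which is the kind of structural rule a
conformance pass could check before handing nodes to a formatter.
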
Extensions above and beyond POD can be achieved in many ways, but they
all start with a fragment (or complete document) that has an alternate
document type.

The technique of marking up nodes and attributes as belonging to
alternate document types is comparable to B.

There are two categories of collection nodes:

=over

=item Block Nodes

These are nodes that correspond in nature to HTML C<< <div> >>s. They
represent things like I, I, I and I.

=item Phrase Nodes

These are nodes that correspond in nature to HTML C<< <span> >>s. They
represent things like I, I, I and I.

=back

Each node has a I that indicates what type of data it holds. The
following is a list of nodes that exist in the PDOM model:

    - heading1_block
    - heading2_block
    - heading3_block
    - heading4_block
    - paragraph_block
    - verbatim_block
    - comment_block
    - opaque_block
    - unordered_list
    - ordered_list
    - definition_list
    - list_item
    - item_term
    - item_definition
    - bold_phrase
    - italic_phrase
    - code_phrase
    - file_phrase
    - opaque_phrase
    - document_link
    - hyper_link
    - plain_text

PDOM defines both a serial API and a random access API for accessing
document content.

=head2 Perldoc Linking (PLink)

Perldoc Linking covers two angles:

=over

=item 1

References to and/or inclusions of other documentation in the Perldoc.

=item 2

Referring to the Perldoc from the code, for instance via C<%*DOC>.

=back

This section covers XPath-style specification of document points, as
well as traditional heading-links.