=head1 TITLE

[DRAFT] Synopsis 26 - Documentation

=head1 AUTHORS

Ingy döt Net

Damian Conway

=head1 VERSION

  Maintainer: Ingy döt Net
  Date: 9 Apr 2005
  Last Modified: 27 Mar 2006

This document describes the documentation capabilities of Perl 6. It
assumes familiarity with Perl 5 and its POD (Plain Old Documentation)
format.

Throughout this document, the term "Perldoc" will be used as the generic
term to describe Perl Documentation. "Pod" now refers to a specific
I<dialect> of Perldoc.

With Perl 6, there's more than one way to document it. See L to cut to
the chase and read information on the many ways in which you can
annotate your code with Perldoc.

=head1 THE PERLDOC OBJECT MODEL (PDOM)

Perldoc has a Document Object Model. That is, all Perl documentation in
any dialect is modeled according to a schema. There are also standard
Perl runtime APIs for accessing, generating, and transforming the
content of documents.

Perldoc allows multiple documentation dialects, but requires that they
all be parsed down to a single internal representation. This information
can then be exposed or transformed in a consistent manner, which
facilitates the creation of powerful Perldoc tools.

The information model (known as the Perldoc Object Model or PDOM) is
very nearly a superset of the one that Perl 5's L<"perlpod"> implicitly
defines.

A PDOM representation can be thought of as a tree of I<nodes>. There are
four kinds of nodes:

=over

=item Text Nodes

Leaf nodes containing content text.

=item Collection Nodes

Nodes that contain other nodes.

=item External Nodes

Leaf nodes that represent something that is not part of the PDOM, but
may be resolved by some other process at some other time. This might
include I, I, I or I. External nodes are typically handled by PDOM
extensions, described later on.

=for discussion
I am not sure that this distinction between internal and external nodes is useful.
I'd like to think of nodes as having various roles, some of which are relevant to the current process and some of which are not. In particular, for some purposes a node should be able to be both code and documentation. =for historical_explanation much of the early design on Perldoc came from a few principles; 1. incorporating information from Code into the Documentation (a la OODoc) should happen through a plug-in for sanity 2. the PDOM tree can have arbitrary weird stuff as inserted by the parser modules, and it just knows how to transform itself to a standard tree (via interfering with events during the processing stage), so that people postprocessing actual documents can just deal with a set of B 3. the structure should be able to be "flattened" in a sane way without losing information, for instance output to XML, both in the unprocessed (where special nodes may exist) and processed (where they may not exist) stages. This must be possible from the unprocessed stage; it started out as "flat" text. This is why, for instance, you see modules referred to as "Parser" modules. They parse only, and all the rest of their actions must wait until the processing stage. =item Ignorable Nodes Nodes corresponding to text in the syntax presentation that has no bearing on the document's intended content, but must be preserved for applications like editors and syntax highlighting. This typically includes non-significant whitespace and throwaway comments. =for discussion Likewise, "ignorable" is just an absence of interest, but Damian's "sufficient advanced magic" demonstrates that one person's ignorable is another person's "please don't ignore". We need to make sure that any chunk of text can be annotated with multiple roles, or we at least need a fast way to declare that a particular kind of chunk fills a particular set of roles. 
=for historical_explanation
These were only ever meant to hold chunks discarded from the input
stream, specifically for the benefit of syntax-highlighting editors,
which are interested only in the first parsing phase and not the
processing phase. If you want something that is "don't ignore", then
use a different node type.

=back

There are two categories of collection nodes:

=over

=item Block Nodes

These are nodes that correspond to HTML C<< <div> >>s. They represent
things like I, I, and I.

=item Phrase Nodes

These are nodes that correspond to HTML C<< <span> >>s. They represent
things like I or I text, I, and I.

=back

Each node has a I<type> that indicates what kind of data it holds. The
following is a list of standard node types that exist in the PDOM model:

    - Document
      - External (intervening code or other non-Perldoc text)

=for discussion
Wait a minute, it isn't important to identify code as "code" to the
document? What if some of the code wants to be included in the document
because the clearest way to spec some aspects might be with the code
itself? (Plus any time you can get away with that, you prevent the doc
from getting out of sync with the code by definition.) It seems like
we're falling into a fixed class-based hierarchy in Perldoc just about
the time we're escaping it with the Perl 6 type system...

=for historical_explanation
We wanted to make sure that you needed to jump through some pretty big
hoops (i.e. a plugin) before a parse error can break your documentation.

      - Block
        - Heading (a section heading)

=for discussion
Hmm, maybe the real block wants to be the section, and the heading is
its label? A heading seems to be falling more into the separator mold
rather than the container mold. Maybe that's okay, but we should talk
about it. (I guess I'm sensitive to this because Perl 5 has statement
separator opcodes when the doc model wants statement containers, and
that turned out to be painful for p5-to-p5.) I guess there's a sense in
which a heading is a block within a block, but you'd sometimes like to
treat everything this heading controls as a single object, and the
separator model doesn't provide that.

=for reference
We talked about this once when considering using something neutral for
the standard document model, like DocBook. DocBook uses this model -
such as Foo ...

        - Paragraph (regular text to be formatted)
        - Verbatim (preformatted regular text)

=for discussion
A weakness of current pod is that there's no "slightly formatted"
verbatim. It'd be nice to have a semi-verbatim paragraph with a single
hard-to-use-accidentally escape, much like '\qq[...]' in P6. Again,
role-based behaviors might make this easier to thread through other
names or conventions within a particular lexical scope.

        - Annotation (output as a footnote or other elaboration)

=for discussion
This seems to be a role specifying disposition of the current text
rather late in processing, but this should be orthogonal to other roles.
For example, you could have something that's simultaneously an insertion
and an annotation. Maybe there are better examples. Anyway, I just get
the feeling we're describing several different snapshots of trees at
different stages of processing, and maybe they're not all the same type
of tree.

        - Insertion (documentation included from another file)

=for discussion
There's a fundamental inconsistency here, and we can't have it both
ways, at least not easily. Is this document view before or after the
processors? Annotation is talking about future processing. Insertion is
talking about past processing. Or does "Insertion" mean a hyperlink? The
current verbiage, plus the existence of "Link" below, seems to point to
the former.
But if some preprocessor has got ahold of the text before we even start in on it, why do we give a rip? (Not intended to be a rhetorical question.) - Module (the specification of a parsing extension) =for discussion Only parsing? I think this view is too limited under a role-based view. How do we define what new documentation roles can do? - Data (a data section for the surrounding code) =for discussion What if we want to apply some filtering role before the program sees it. =for historical_explanation As explained above, programs interested in standard form output without these things get transformed first by a processing stage. - External (unknown Perldoc block to be handled externally) =for discussion This strikes me as a figure/ground violation. I don't really believe in "miscellanous" except by process of elimination, which implies some process has already happened. =for historical_explanation This is the single node type used by parsers that want to do something special with the document at processing time. - Item (a list item) =for discussion What's the outer list block called? How do we distinguish different list roles? Do lists support lexically scoped declarations? - Phrase - Plaintext (typically set in roman) - Strong (typically set in bold) - Emphasis (typically set in italic) - Code (typically set in fixed-width) - Link (an internal or external cross-reference) - Nonbreaking (preserves and doesn't break whitespace) - Entity (Unicode codepoint or XML entity) - Indexer (index entry, never shown inline) - External (unknown Perldoc phrase to be handled externally) Documents contain one or more blocks, which in turn contain one or more phrases. Blocks may also contain nested blocks and phrases may contain nested phrases. =for discussion While stongly implying it, this doesn't actually come out and say that it is illegal to put a block inside a phrase. Is it? (And should it be? Seems like kind of an artificial distinction. 
What is fundamental about the scoping of phrasal roles that prohibits
blocks? I don't think XML forces a distinction like this, though I
believe it allows you to define your tags to force the distinction...)

=for discussion
This should not be precluded by the PDOM itself, but the standard POD
doctype (called PodDT in the S26draft-mugwump.pod -
http://search.cpan.org/src/AUTRIJUS/Perl6-Pugs-6.2.11/docs/AES/S26draft-mugwump.pod)
should not allow this in conformant documents.

=head1 PDOM API

PDOM defines both a serial API and a random access API for accessing
document content.

=head2 PDOM Serial API

Perldoc allows for SAX-style streamed parsing and emission of documents.
You can register callbacks on the various subclasses of Perldoc nodes,
which will be triggered whenever a component of the corresponding type
is parsed.

Each node class also defines standard C<.begin>, C<.end>, C<.for>, and
C<.content> methods that support serial generation of Perldoc. For
example:

    PD::Doc.begin($title);
    PD::Para.begin;
    PD::Para.content($text1);
    PD::Para.end;
    PD::Item.content($bullet, $text2);
    PD::Item.content($bullet, $text3);
    PD::Item.content($bullet, $text4);
    PD::Para.for($text5);
    PD::Doc.end;

=for discussion
There seems to be an implicitly curried stream parameter threaded
through here. Might be better to make it an explicit object so you can
generate two streams of events at once? Then the curried form can be
defined in terms of the non-curried form. Define the class and let the
user curry it into different modules in different scopes if they like.

=head2 PDOM Random Access API

This API consists of functions that would likely look similar to an
XML/DOM API.

=for discussion
This seems kinda hand wavey. If we can't use the XML/DOM API directly,
we need to be clear on why and/or where it's bogus or suboptimal.

=for an_explanation
The DOM API contains lots of CamelCase methods, is not very flexible,
makes it awkward to create new nodes, etc.
SAX also contains a large number of methods that just seemed like backwards compatibility cruft. The documentation of a program will be available through the global variable C<%*DOC>. It has not yet been determined how all the parts of the PDOM would be accessed through this hash. Perhaps the variable C<$*DOC> could hold a reference to the program's PDOM object. =for discussion What's a "program"? Each compilation unit probably thinks of its own PDOM as a document, and those can't all be named %*DOC because that namespace is shared. At best each module could have bits of %*DOC, and then you have to figure out how to name each module's bits as a key. That's partly why we've reserved the = twigil to indicate the current compilation unit's document, though that's of course a lexical (file) scope rather than a module scope. Maybe you can get at another namespace's pdom via THAT::MODULE<%=foo> if we assume there's some mapping of module names to their associated file scope (though that's slightly problematic for reopened packages). But anyway, if we can delay that leap to after we know which package we're talking about, then if any name mangling has to be done, it can all be automatic, rather than guessing at how the keys of %*DOC are formed from package names. =for discussion If there's a singular top-level object, it should probably be $*DOC anyway. %*DOC.lookatme is going to introspect that hash rather than the document object. =for historical_explanation I think I used $*DOC because I saw $*POD in early design documents. Perhaps it should be $?POD a la $?FILE =head1 SYNTAX CONTAINMENT =for discussion Ouch, the cognitive dissonance. We just shifted from anti-syntactic PDOM to syntactic bikeshedding... Various interpreters (including the Perl interpreter) need to distinguish which characters in a file or stream are actual Perldoc and which are irrelevant (typically because they're Perl code). 
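As a rough preview of the rule spelled out just below (an equals sign in column 1, immediately followed by an identifier other than C<cut>), here is a minimal Python sketch of a line classifier. The function name C<opens_perldoc> and the approximation of "Unicode identifier" as a letter or underscore followed by word characters are assumptions made for illustration; they are not part of any Perldoc API.

```python
import re

# Assumption: an "identifier" is a letter or underscore followed by word
# characters. Python's \w is Unicode-aware for str patterns.
_DIRECTIVE = re.compile(r"^=([^\W\d]\w*)")


def opens_perldoc(line: str) -> bool:
    """Return True if this line would start a section of Perldoc."""
    match = _DIRECTIVE.match(line)
    # "=cut" ends a region, so it never opens one.
    return match is not None and match.group(1) != "cut"
```

Because the check looks only at the start of a single line, a scanner built on it can find documentation regions without parsing their contents, which is the laziness the historical note in this section asks for.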
=for discussion
Maybe a use or =use should be able to change the default /^^=/ syntax
somehow.

Regardless of the code or data in which it's embedded, every section of
Perldoc begins with a line whose first character (in column 1) is C<=>,
followed immediately by any Unicode identifier sequence, except
C<'cut'>. In other words, any line that matches the Perl 6 regex:

    /^^ = <ident> <!{$<ident> eq 'cut'}>/

indicates the start of a section of Perldoc.

=for discussion
I'd like to kill =cut as dead as possible, but no deader.

=for historical_explanation
A lot of this section was based around the notion that you really don't
want to have to parse the documentation POD to tell where it starts and
ends, you see. That way the actual conversion from source to a $?DOC
style object can be lazy and not hold up parsing of a source file.

Each section of Perldoc ends at either:

=for discussion
Nit: this "either" has three possibilities. You see, subconsciously
you're thinking we can get rid of =cut. :-)

=over

=item *

The Perldoc directive matching the directive that opened the section.
For example: at C<=end verbatim> if the section started with
C<=begin verbatim>.

=item *

The first blank line, if the opening directive was not a C<=begin>,
C<=doc>, C<=pod>, or C<=kwid>. A blank line is a line that is either
empty or that contains only whitespace characters.

=item *

The special directive C<=cut>

=back

=for discussion
For that matter, I don't much like =pod, =doc, or =kwid either...

=head2 Explicit documentation regions

Perldoc also provides a directive that explicitly indicates the start of
an extended piece of documentation:

    =doc

=for discussion
Don't see the need for a =doc declarator. Why not unify with =use, and
make modules powerful enough to do the general syntactic and role-based
switching.
For example:

    =doc pod
    =doc kwid
    =doc markdown
    =doc javadoc

If C<=doc> has no dialect qualifier, it is assumed to be the same
dialect as the previous C<=doc> region specified (it is an error if the
first C<=doc> doesn't specify a dialect).

=for discussion
I think bare =doc/=cut is just =begin/=end spelled badly. All we need to
do is say that =use scopes to the end of the current =begin/=end block,
if any, or the rest of the file if not.

Normally a section of Perldoc finishes after the closing delimiter of
its directive, and the Perl interpreter goes back to parsing code:

    sub foo {
    =begin item *
    The foo() subroutine has no useful purpose. It's provided for
    backwards compatibility only. Use the do_foo() sub instead.
    =end item
        goto &do_foo;
    }

However, when a documentation region begins with a C<=doc> directive,
the documentation continues until the next C<=cut> directive is
encountered. That is, if the previous example had been written:

    sub foo {
    =doc pod
    =begin item *
    The foo() subroutine has no useful purpose. It's provided for
    backwards compatibility only. Use the do_foo() sub instead.
    =end item
        goto &do_foo;
    }

then the last two lines would be treated as Pod, not Perl.

=for discussion
=begin without =end can do that too.

=head2 Code-specific regions

Perldoc defines two special directives that relate to the non-Perldoc
components of a file: C<=DATA> and C<=END>.

=for discussion
Why do they have to be special? Data is not used often enough to warrant
short huffman coding. And what if you want something to be
simultaneously data and a Perldoc component? To me, data is just a
Perldoc component with a name or role that only the program is (usually)
interested in.

The C<=DATA> directive creates a Perl DATA block within your document.
Any Perl code in the document can access this data via the special
C<%=DATA> variable.
DATA blocks can be labelled (unlabelled DATA blocks have an empty string as their implicit label): =begin DATA Fibonacci 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169 =end DATA my $fib_seq = %DATA{Fibonacci}; DATA blocks are included in the internal representation of a Perldoc document. They are treated as a specific type of L. The C<=END> directive indicates the termination of the non-Perldoc component of a file. Everything after that line until the end of the file is parsed as Perldoc. In a sense, C<=END> is like C<=doc>, except that the Perldoc region it introduces is not terminated by a C<=cut> (or anything else except end-of-file). =for discussion more like: =begin END suppressing the warning of missing =end END =head2 Standard Perldoc dialects =begin question-for-damian Why do we need this? If: =doc means: =doc kwid after a previous `=doc kwid`, then we can just do: =doc - Something Some interesting point. =cut and we don't need to /bless/ Pod and Kwid as standard. =end question-for-damian =begin answer-from-larry In my mental model that's just =use kwid =begin - Something Some interesting point. =end =end answer-from-larry The two standard Perldoc dialects (L and L) have abbreviated markers: =pod =kwid These are exactly equivalent in effect to: =doc pod =doc kwid including the requirement for a C<=cut> to terminate their documentation region. However, any text following a C<=pod> or C<=kwid> marker on the same line is considered to be the first line of the actual content. This allows Kwid to do something like this: =kwid - Something Some interesting point. =cut which is semantically equivalent to: =doc kwid - Something Some interesting point. 
    =cut

Note that the C<=doc>, C<=cut>, C<=pod>, and C<=kwid> lines are not
considered part of the Perldoc information (they are not seen by the API
or included in an internal representation). They are simply containment
markers.

=for discussion
This seems like a relatively useless feature to me.

=head2 File Containment

The above descriptions explain how to distinguish the Perl from the Doc,
which assumes they are intertwingled in a Perl source code file.
Documentation can also live in a file by itself.

Perldoc considers files ending with C<.pod> to be documentation in the
Pod dialect and files ending with C<.kwid> to be in the Kwid dialect,
etc. A Perldoc parser can look to the file extension for a dialect hint,
if no other clue is provided.

=for discussion
    #!/usr/bin/pod
    #!/usr/bin/kwid
vs
    =use pod
    =use kwid
as "first thing in file", or even P6-ish file-scoped declarator:
    =use pod;
    =use kwid;

This implies that the lines like:

    =doc
    =pod
    =kwid
    =cut

are not necessary in pure Perldoc files. In fact, in a Kwid file, they
would just be plain text.

=for discussion
Grumble, I don't think any of those are necessary in Pod files either...

=head1 SYNTAX DIALECTS

In the spirit of TMTOWTDI, Perldoc allows an author to choose a
documentation syntax that best suits them, without needing to worry
whether downstream processes and tools will be able to use it properly.
These variations of syntax are referred to as I<dialects>.

"Pod" is the Perldoc dialect that has evolved from Perl 5's "Plain Ol'
Documentation" markup.

"Kwid" is a Perldoc dialect that has evolved from wiki markup syntaxes.

Other dialects may be created or adapted by people who are fond of
neither Pod nor Kwid.

An XML dialect would be trivial to define since the PDOM can be thought
of as being an XML schema. Likewise an HTML dialect might be useful as a
formal syntax for creating Pod from HTML.

A WYSIWYG Perldoc editor could also be thought of as just another
dialect.
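The file-extension hint described under File Containment could be sketched as follows. This is an illustration only: the function name C<dialect_hint>, the C<explicit> parameter standing in for "some other clue", and the lookup table are all hypothetical, and only the C<.pod> and C<.kwid> mappings come from the text above.

```python
from pathlib import PurePath
from typing import Optional

# Only these two mappings are stated in the text; others would be additions.
EXTENSION_DIALECTS = {".pod": "pod", ".kwid": "kwid"}


def dialect_hint(filename: str, explicit: Optional[str] = None) -> Optional[str]:
    """Guess a dialect name, preferring an explicit clue over the extension."""
    if explicit:
        # Any other clue (for example a dialect named in the file) wins.
        return explicit
    # Fall back to the extension; unknown extensions give no hint at all.
    return EXTENSION_DIALECTS.get(PurePath(filename).suffix.lower())
```

Returning no hint for an unknown extension, rather than defaulting to Pod, leaves the decision to whatever layer knows more about the file.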
=head1 The Pod Dialect I is an evolution of Perl 5's Plain Ol' Documentation markup. Compared to Perl 5 POD, Perldoc's Pod dialect is much more uniform, somewhat more compact, and slightly more expressive. =head2 General syntactic structure Pod blocks are specified using I. Every Pod block directive may be written in any of three equivalent forms: I, I, and I. =head3 Delimited blocks Delimited block directives are delimited by C<=begin> and C<=end> markers, both of which are followed by the typename of the directive. Typenames that are entirely upper-case or entirely lower-case are reserved for Perldoc. External formatters (including users!) should use only mixed-case typenames. =for discussion oops The C<=begin> marker may also be followed by a multi-word label for the block (which is used in different ways by different types of blocks). The general syntax is: =begin BLOCK_TYPE OPTIONAL LABEL TEXT BLOCK CONTENTS =end BLOCK_TYPE For example: =begin Table Table of Contents Constants 1 Variables 10 Subroutines 33 Everything else 57 =end Table =begin item Name The applicant's full name =end item =begin item Contact The applicant's contact details =end item Note that everything between the C<=begin> line and the C<=end> line is considered to be the contents of the block. No blanks lines are required around the directives, and blank lines within the contents are treated as part of the contents. =for discussion I think we need to reserve some kind of modifier/adverbial syntax: =begin Table :foo :title =head3 Paragraph blocks Paragraph block directives are introduced by a C<=for> marker and terminated by the next Pod directive or the first blank line (which is I considered part of the block data). The C<=for> marker is followed by the name of the directive and an optional label. 
The general syntax is: =for BLOCK_TYPE OPTIONAL LABEL TEXT BLOCK DATA For example: =for Table Table of Contents Constants 1 Variables 10 Subroutines 33 Everything else 57 =for item Name The applicant's full name =for item Contact The applicant's contact details =begin note-to-damian I'm nervous about the ambiguity of =for. =for item Blah Blah Blah Blah Blah Blah Blah Blah Blah Blah Blah Blah Where does the item element end and the description begin. Might be wrap-fragile if you base it on newline. For example: =head1 Surely this extra long wrapped line is really just one long header line and: This next paragraph doesn't start until here. =end note-to-damian Once again, blank lines are not required around the directive (this is a universal feature of Pod). =for discussion Unified handling of adverbials might help with this. =head3 Unlabelled blocks Unlabelled block directives are introduced by an C<'='> sign followed immediately by the typename of the directive. The rest of the line is treated as block data, rather than a block label. The content terminates at the next Pod directive or the first blank line (which is not part of the block data). The general syntax is: =BLOCK_TYPE BLOCK DATA MORE BLOCK DATA For example: =table Constants 1 Variables 10 Subroutines 33 Everything else 57 =item Name: The applicant's full name =item Contact: The applicant's contact details Note that C<=begin>, C<=end>, C<=for>, and C<=cut> markers are always considered to be intrinsic keywords, not unlabelled block markers. Hence, you cannot specify unlabelled blocks of a type named C, C, or C. Instead you have to write: =begin cut This is a C type block (whatever I is!) =end cut =for discussion So basically, =foo bar is simply sugar for =for foo bar The C<=pod> and C<=doc> markers are not labelled forms either (because they are not terminated at the next blank line, but rather by a C<=cut> directive). Nevertheless they I each have fully delimited forms as well: =begin pod ... 
=end pod =begin doc pod ... =end doc These forms may be nested (see L). =head3 Block equivalence The three equivalent block specifications (delimited, paragraph, and unlabelled) are treated identically by the documentation model, so you can use whichever form is most convenient for a particular documentation task. In the descriptions that follow, the unlabelled form will generally be used, but should be read as standing for all three forms equally. For example, although L<"Headings"> shows only: =head1 TOP LEVEL HEADING this automatically implies that you could also write that directive as: =for head1 TOP LEVEL HEADING or: =begin head1 TOP LEVEL HEADING =end head1 if you prefer. =head2 Encoding =begin question-for-damian Do we really want to go there? I would like say that all Perldoc documents are encoded in utf8 (or utf16 with a BOM). And further, that you cannot switch mid-document, yagni concatenation arguments to the contrary. As a meta point, I would like to take the chance to simplify things where possible, with a few good hardcoded rules. Rules are easier to relax than stiffen later on. This also makes tools easy to implement. If the YAML project taught me one hard lesson, this one is crucial. =end question-for-damian You can specify the encoding used in a particular document (or portion thereof) using the C directive: =encoding ShiftJIS =encoding ISO-8859-5 =encoding EBCDIC The specified encoding is used from the start of the next line in the document. If a second C directive is encountered, the current encoding changes again after that point. Note, however, that the second encoding directive must itself be encoded using the first encoding scheme. =for discussion I'm inclined to agree with ingy. =head2 Blocks Pod provides notations for specifying all the standard block types in the PDOM... 
=head3 Headings Pod provides an unlimited number of levels of heading: =head1 A TOP LEVEL HEADING =head2 A Second Level Heading =head3 A third level heading =head86 A "Missed it by I much!" heading Pod formatters are only required to provide distinct renderings for the first four levels of heading. Headings at levels without distinct renderings are typically rendered like the lowest distinctly rendered level. =head4 Numbered headings =for discussion This is ugly, and fails to factor policy. Need something like: =use head1 N. =use head2 N. =use head3 (N) To create hierarchical numbered headings, use the C<< NZ<><> >> formatting code (see L). For example: =head1 N<>. The Problem =head1 N<>. The Solution =head2 N<>. Analysis =head3 (N<>) Overview =head3 (N<>) Details =head2 N<>. Design =head1 N<>. The Implementation produces: =over B<1. The Problem> B<2. The Solution> =over B<2.1. Analysis> =over B<(2.1.1) Overview> B<(2.1.2) Details> =back B<2.2. Design> =back B<3. The Implementation> =back =head3 Ordinary paragraph blocks Ordinary paragraph blocks consist of text that is to be formatted into your document at the currently level of nesting, with whitespace squeezed, lines filled, and any special inline mark-up (see L) applied. Ordinary paragraphs consist of one or more lines of text, each of which starts with a non-whitespace character at column 1. The paragraph is terminated by the first blank line. For example: This is an ordinary paragraph. Its text will be squeezed and short lines filled. This is another ordinary paragraph. Its text will also be squeezed and short lines filled. Ordinary paragraphs do not require an explicit marker or delimiters, but there I an explicit C marker available if you wish to use it: =para This is an ordinary paragraph. Its text will be squeezed and short lines filled. and likewise the longer C<=for> and C<=begin>/C<=end> forms. For example: =begin para This is an ordinary paragraph. Its text will be squeezed and short lines filled. 
=end para Note that, when any form of explicit C directive is used, the text no longer has to begin at column 1 because leading whitespace is automatically removed. =for discussion =use para :indent(4n) =head3 Verbatim blocks Verbatim blocks are used to specify pre-formatted text, which should be rendered without rejustification, without squeezing, and without applying any inline formatting codes. Typically these blocks are used to show examples of code, data, or I/O, and are set using a fixed-width font. A verbatim block is specified as one or more lines of text, each of which starts with a whitespace character. The block is terminated by a blank line. For example: This I paragraph introduces a following B block: hi la re ly ow e T s l p a x t a h n r wi pe ac ss he There is also an explicit C directive, which allows verbatim text to start at the first column and to contain blank lines: The C subroutine adds feedback: =begin verbatim sub loud_update ($who, $status) { say "$who -> $status."; silent_update($who, $what); } =end verbatim =for discussion =use verbatim :escape{'<<<','>>>'} =head3 Lists Lists in Pod are specified as a series of C directives. No special list directives or other delimiters are required to enclose the entire list. For example: The seven suspects are: =item * Happy =item * Dopey =item * Sleepy =item * Bashful =item * Sneezy =item * Grumpy =item * Doc Lists may be nested, using the C<=item1>, C<=item2>, C<=item3>, etc. directives. Note that C<=item> is just an abbreviation for C<=item1>: =item1 * Animal =item2 - Vertebrate =item2 - Invertebrate =item1 * Mineral =item2 - Solid =item2 - Liquid =item2 - Gas which produces: =over =over =item * Animal =over =item - Vertebrate =item - Invertebrate =back =item * Mineral =over =item - Solid =item - Liquid =item - Gas =back =back =back It is I an error for nested C<=item2>, C<=item3>, etc. directives to appear without a preceding higher level C directive. 
Any missing "outer" directives are implied, and treated as being empty. This is useful to create lists that are indented with respect to the current text: There are four commonly encountered forms of matter: =item2 * Solids =item2 * Liquids =item2 * Gases =item2 * Plasmas which would be rendered as: =over There are four commonly encountered forms of matter: =over =over =item * Solids =item * Liquids =item * Gases =item * Plasmas =back =back =back This also provides a convenient way of creating block quotes: We will now consider Shakespeare's most famous soliloquy: =begin item2 To be, or not to be--that is the question: Whether 'tis nobler in the mind to suffer The slings and arrows of outrageous fortune... =end item2 =head4 Multi-paragraph list items Use the delimited form of the C directive to specify items that contain multiple paragraphs. For example: Let's consider some common proverbs: =begin item * The rain in Spain falls mainly on the plain. This is a common myth and an unconscionable slur on the Spanish people, the majority of whom are extremely attractive. =end item =begin item * The early bird gets the worm. In deciding whether to become an early riser, it is worth considering whether you would actually enjoy annelids for breakfast. =end item As you can see, folk wisdom is often of dubious value. which produces: =over Let's consider some common proverbs: =over =item * The rain in Spain falls mainly on the plain. This is a common myth and an unconscionable slur on the Spanish people, the majority of whom are extremely attractive. =item * The early bird gets the worm. In deciding whether to become an early riser, it is worth considering whether you would actually enjoy annelids for breakfast. =back As you can see, folk wisdom is often of dubious value. =back =head4 Ordered lists =for discussion Also ugly. Should be unified with =head numeration. A leading =use item N. 
could imply =over and =back much like =item Alternately we might be talking about some kind of macro definition of headings and list items. Ordered lists can be created using the C<< NZ<><> >> formatting code (see L): =item N<>. Visito =item2 N<>. Veni =item2 N<>. Vidi =item2 N<>. Vici to produce: =over =over 1. Visito 1.1. Veni 1.2. Vidi 1.3. Vici =back =back =head4 Definition lists To create term/definition lists, specify the term as the label of the item and the definition as its contents. =for item MAD, I Affected with a high degree of intellectual independence. =for item MEEKNESS, I Uncommon patience in planning a revenge that is worth while. =for item MORAL, I Conforming to a local and mutable standard of right. Having the quality of general expediency. =head4 Unordered lists To create unordered lists, use any non-alphanumeric character(s) as the label of an item (or first word of an unlabelled item): =item * Reading =for item * Writing =begin item * 'Rithmatic =end item =for discussion =over =use item * =item ... =for item ... =begin item ... =end item =back =head3 Inserted Perldoc You can incorporate sections of documentation drawn from other files using the C<=insert> directive. This directive takes one or more URLs or shellish file globs as its block data, parses the resulting file(s) for Perldoc, and then adds the resulting internal representation(s) of the file contents to the current internal Perldoc representation that's being built. 
C<=insert> directives are handy for breaking out standard components of your documentation set into reusable "modules": =head1 COPYRIGHT =insert file:/shared/docs/std_copyright.pod =head1 DISCLAIMER =insert file:/shared/docs/std_disclaimer.pod or for incorporating documentation from helper modules: =head1 EXTENSIONS The following extensions are currently available: =begin insert glob:lib/perl6/MyModule/Plugins/*.pm glob:/usr/local/lib/perl6/MyModule/Plugins/*.pm =end insert =for discussion Possibly these should just be "eager" links, and unified syntactically. It bothers me that this mechanism seems to preclude inclusion of anything below the paragraph level. =head3 External blocks Directives whose names are not recognized as Pod built-ins are assumed to be destined for external formatters or parser plug-ins. For example: =begin Table The Other Guys Superhero | Secret Identity | Superpower ---------------|-----------------|------------------------------ The Shoveller | Eddie Stevens | King Arthur's singing shovel Blue Raja | Geoffrey Smith | Master of cutlery Mr Furious | Roy Orson | Ticking time bomb of fury The Bowler | Carol Pinnsler | Haunted bowling ball =end Table =for Xhtml =Image http://www.perlfoundation.org/images/perl_logo_32x104.png External blocks are converted by the Perldoc parser to Perldoc::Block::External objects. The resulting object's C<.typename> method retrieves the name of the block type: C<'table'>, C<'XHTML'>, C<'image'>, etc. The object's C<.contents> method retrieves the block's (unformatted) data. By default, Perldoc formatters ignore external blocks that they do not recognize. =for discussion Or rather, they don't ignore them--they never see them, because those chunks didn't respond when the PDOM was asking for the string of chunks responding to the particular role. There is no "External" role, just lack of an appropriate internal role, where the list of "internal" roles is extended dynamically by =use, not precanned. 
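The default behaviour described in this section (a formatter renders the external blocks it recognizes and skips the rest) can be sketched as follows. The sketch is in Python rather than Perl 6 and is only an illustration under stated assumptions: C<ExternalBlock> and C<render> are hypothetical names, the handler table is invented for the example, and only the C<typename> and C<contents> accessors correspond to anything in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExternalBlock:
    """Hypothetical stand-in for a Perldoc::Block::External object."""
    typename: str   # name of the block type, e.g. 'table'
    contents: str   # the block's raw, unformatted data


def render(blocks: List[ExternalBlock],
           handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Render the blocks this formatter recognizes; skip all others."""
    output = []
    for block in blocks:
        handler = handlers.get(block.typename)
        if handler is None:
            continue  # unrecognized external block: ignored by default
        output.append(handler(block.contents))
    return output


# Illustrative data only: one block type has a handler, the other does not.
blocks = [
    ExternalBlock("table", "Superhero | Secret Identity"),
    ExternalBlock("image", "logo.png"),
]
rendered = render(blocks, {"table": lambda data: "[table] " + data})
```

Here only the C<table> block reaches the output; the C<image> block is dropped without an error because no handler was registered for it.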
=head3 Modular blocks Although external blocks are normally ignored, Perldoc provides a mechanism whereby you can specify how particular external blocks are handled: the C<=use> directive. Specifying a C<=use> causes a Perldoc processor to load the corresponding plug-in module at that point. Plug-ins can use L to change the way subsequent Perldoc is parsed (even to the extent of installing new parsing rules or a new grammar), or they can simply provide handlers for specific types of L. For example: =comment Install the Table plugin to render the following table... =use Table =begin Table The Other Guys Superhero | Secret Identity | Superpower ---------------|-----------------|------------------------------ The Shoveller | Eddie Stevens | King Arthur's singing shovel Blue Raja | Geoffrey Smith | Master of cutlery Mr Furious | Roy Orson | Ticking time bomb of fury The Bowler | Carol Pinnsler | Haunted bowling ball =end Table The C<=use> statement causes the Perldoc processor immediately to look for a module named C and to load it. More generally, the first word of any C<=use> block is appended to a standard plugin prefix (C) and passed as the first argument of a Perl 6 C. If the processor is unable to load the requested module, it should (but is not required to) issue a warning. However, in all cases it must continue processing the remaining Perldoc. The processor may also elect to change the way it handles subsequent external blocks (for example, if a C<=use Table> fails, it may choose to convert subsequent C<=Table> blocks to C<=verbatim> blocks so that at least some form of the information is presented).
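The loading rules above (prefix the first word of the C<=use> data, try to load, warn on failure, keep going, optionally fall back to verbatim) can be sketched in Python. This is a minimal illustration, not a Perldoc API: the prefix value C<perldoc_plugin_>, the function names, and the use of Python's import machinery in place of a Perl 6 module load are all assumptions made for the example.

```python
import importlib
import warnings

# Hypothetical prefix, used only for this illustration.
PLUGIN_PREFIX = "perldoc_plugin_"


def load_plugin(use_data, prefix=PLUGIN_PREFIX):
    """Try to load the plug-in named by the first word of a =use block.

    Returns the loaded module, or None if it could not be loaded. A
    failure only produces a warning, so the caller can carry on with
    the remaining document.
    """
    name = use_data.split()[0]
    try:
        return importlib.import_module(prefix + name)
    except ImportError as err:
        warnings.warn(f"Couldn't load Perldoc module '{name}': {err}")
        return None


def effective_typename(typename, loaded):
    """Treat a block as 'verbatim' when its plug-in is not available."""
    return typename if typename in loaded else "verbatim"
```

A processor built this way would call C<load_plugin> when it meets a C<=use>, record the result under the plug-in's name, and consult C<effective_typename> for each later external block, so that a failed C<=use Table> still leaves the table data visible as verbatim text.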
You can use fully and partially specified module names (as in a Perl 6 C statement): =use Table::Html-1.2.1-(Any) and even pass arguments: =use Image :Jpeg prefix=>'http://dev.perl.org' This last example would result in a C statement something like: require $STD_MODULE_PREFIX ~ 'Image', eval q{ :Jpeg prefix=>'http://dev.perl.org' } or warn "Couldn't load Perldoc module 'Image' at $LOCATION\n"; =for discussion There needs to be a way to declare a MyTable that does Table|Foo|Bar|Baz. (Or we at least need to allow | notation for the table itself). =head3 Comments Comments are Pod blocks that are completely ignored by any formatter. They are included in any internal representation of Pod, and accessible via the Perldoc APIs, but should never be rendered in any way. Comments are useful for meta-documentation (documenting the documentation): =comment Add more here about the algorithm and for temporarily removing parts of a document: =item N<>. Retreat to remote Himalayan monastery =item N<>. Learn the ancient mysteries of space and time =item N<>. Achieve enlightenment =begin comment =item N<>. Prophet! =end comment Note that, since the Perl interpreter ignores all Perldoc, C blocks can also be used as (nestable!) block comments in Perl 6: # This is a Perl 5 style # code comment # spanning multiple lines =begin comment This is a Perl 6 style delimited code comment spanning multiple lines =end comment Note that, unlike Perl 5, no C<=cut> marker is required after a block comment in code (or after any other Pod block directive for that matter) unless the comment is also inside an explicit C<=doc> or C<=pod> region. =for discussion Preceding paragraph can just die. =begin #idea Comments should have more than one identifier. I think I'd prefer a rule that standard roles never match /^_/ or some such. Then you can comment out any =begin/=end block merely by putting _ on its name. Alternately, you mark the beginning with some special character: =begin #foo ...
=end which degenerates to =begin # ... =end =end =head2 Formatting codes Formatting codes provide a way to add inline mark-up to a piece of text within the contents of a (non-verbatim) block. All Pod formatting codes consist of a capital letter, followed immediately by a set of angle brackets, which contain the text or data to which the formatting code is to be applied. If the contents of the angle brackets include an unbalanced angle bracket, you can use either "French brackets" or multiple angle brackets as the delimiters. For example: The Perl 5 heredoc syntax was: C« <>> =head3 Typesetting specifiers =for discussion I had to read that BZ<><> idiom several times before I understood it. I think a V«B<>» would read much better. V<< B<> >> would also be nice if it stripped the spaces. Anyway, I really hate all the empty X<> variants... =for discussion Y'know, what we really need often is a C<> variant that does verbatim, since it's really code most of the time. If C<> did verbatim by default we could say C<< B<> >>. Maybe relegate current C<> behavior to T<> or some such. The C<< BZ<><> >> formatting code specifies that the contained text is to be set in a 'strong' style (typically B). The C<< IZ<><> >> formatting code specifies that the contained text is to be set in an 'emphatic' style (typically I). The C<< CZ<><> >> formatting code specifies that the contained text is to be set in a 'code' style (typically C). These three codes may be arbitrarily nested and formatters should endeavour to convey that nesting accurately. For example, something like: =for discussion I always wonder what the difference is between strong and emphatic, *other* than that one is typically rendered bold, and the other italic. These seem to be metaphors in search of a real meaning... (I always think of bold as "emphatic", and italic as just "strange".) Perhaps their only real meaning is to serve as shibboleths to the markup community.
:) =over C<< IZ<>, she thought, IZ<> mystery BZ<> solved at last!> >> =back should produce: =over I, she thought, I Marie Celeste I solved at last!> =back with the nested italics switching back to roman in the traditional manner. =head3 Links All kinds of links, filenames, and cross-references (both internal and external) are specified with the C<< LZ<><> >> code. The link specification consists of a I terminated by a colon, followed by an I (in the scheme's preferred syntax), followed by an I beginning with a C<#>. All three components are optional. Standard schemes include: =over =item C and C A standard URL. For example: This module needs the LAME library (available from L) =item C A filename on the local system. =item C A link to the system man pages. For example: This module implements the standard Unix L facilities. =item C A link to some other Perldoc documentation. For example: You may wish to use L to view the results. See also: L. =back If the scheme specifier is omitted, it is assumed to be C. To refer to a specific section within a webpage, manpage, or Perldoc document, add the name of that section after the main link, separated by a C<#>. For example: Also see L and L To refer to a section of the current document, omit both the scheme and the external address: This mechanism is described under L<#Special Features> below. Normally a link is presented as some rendered version of the link specification itself. However, you can specify an alternate presentation by prefixing the link with the required text and a vertical bar. For example: This module needs the L. You could also write the code L =head3 Hierarchical ordinals The C<< NZ<><> >> formatting code is converted into the ordinal number of the innermost surrounding block (i.e. its ordinal position relative to the most recent higher-level block construct of the same type). For example: =head1 N<>. The Beginning =head2 N<>. 
The Very Beginning =item1 (N<>) The void =item2 [N<>] Formlessness =item2 [N<>] Contentlessness =item1 (N<>) The explosion =item1 (N<>) The expansion =head1 N<>. The Middle Bit would be rendered as: =over B<1. The Beginning> =over B<1.1. The Very Beginning> =over =item (1) The void =over =item [1.1] Formlessness =item [1.2] Contentlessness =back =item (2) The explosion =item (3) The expansion =back =back B<2. The Middle Bit> =back Every type of block maintains a separate ordinal counter. C<< NZ<><> >> codes appearing in blocks that possess hierarchical relationships (such as C and C blocks) produce multi-part ordinals, in which each component is the current ordinal value for the next level of "outer" structure. By default each component is separated by a period, but this may be changed by specifying a format for the code within its angle brackets. For example: =item1 N. By-laws =item2 N. Statutory by-laws =item3 N Governance =item3 N Elections =item3 N Meetings =item2 N. Executive by-laws =item2 N. Elective by-laws =item1 N. Statutes =for discussion Urque. Much prefer: =use item1 N. =use item2 Na. =use item3 Na(R). =item1 By-laws =item2 Statutory by-laws =item3 Governance =item3 Elections =item3 Meetings =item2 Executive by-laws =item2 Elective by-laws =item1 Statutes or if you don't like to use use for that: =pre item1 N. =pre item2 Na. =pre item3 Na(R). which could be generalized to prefix any named =foo. (Can still have N<> if we need it, but it sure is ugly. Did I mention it's ugly?) would be rendered as: =over 1. By-laws =over 1a. Statutory by-laws =over 1aZ<>(I) Governance 1aZ<>(II) Elections 1aZ<>(III) Meetings =back 1b. Executive by-laws 1c. Elective by-laws =back 2. Statutes =back Within the C<< NZ<><> >> formatting code: =over =item * Each C is replaced by the next ordinal component as a decimal number =item * Each C is replaced by the next ordinal component as an uppercase alphabetic letter (C, C, C, etc.) 
=item * Each C is replaced by the next ordinal component as a lowercase alphabetic letter (C, C, C, etc.) =item * Each C is replaced by the next ordinal component as an uppercase Roman numeral (C, C, C, etc.) =item * Each C is replaced by the next ordinal component as a lowercase Roman numeral (C, C, C, etc.) =item * Any non-alphabetic character is reproduced verbatim =back If there are fewer specifiers than levels of ordinals, any extra ordinals revert to period-separated decimals. =head3 Non-breaking text Any text enclosed in a C<< SZ<><> >> code is formatted normally, except that every whitespace character in it is treated as non-breaking. For example: Instead, you should consider using a Perl 6 S $datum {...}>> loop. would be formatted like so: =over =over =item Instead, you should consider using a Perl 6 =item C<<< S<< for @data -> $datum {...} >> >>> loop. =back =back rather than: =over =over =item Instead, you should consider using a Perl 6 C<<< for @data >>> =item C<<< -> $datum {...} >>> loop. =back =back Note that excess whitespace in an C<< SZ<><> >> code is still squeezed. Use a C<< VZ<><> >> code (see L) to preserve it. =head3 Entities To include named Unicode or XML entities, use the C<< EZ<><> >> code. If the contents are not a number, they are interpreted as a Unicode character name, or (failing that) as an XML entity. For example: Perl 6 makes considerable use of E and E. or, equivalently: Perl 6 makes considerable use of E and E. If the contents of the C<< EZ<><> >> are a number, that number is treated as the decimal Unicode value for the desired codepoint. For example: Perl 6 makes considerable use of E<171> and E<187>. You can also use explicit binary, octal, decimal, or hexadecimal numbers: Perl 6 makes considerable use of E<0b10101011> and E<0b10111011>. Perl 6 makes considerable use of E<0o253> and E<0o273>. Perl 6 makes considerable use of E<0d171> and E<0d187>. Perl 6 makes considerable use of E<0xAB> and E<0xBB>.
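The resolution order described above (a number first, then a Unicode character name, then an XML entity) can be sketched in Python. This is an illustration under stated assumptions rather than a definition of the C<E> code: C<resolve_entity> is a hypothetical name, and Python's HTML entity table is used as a stand-in for "XML entity" lookup.

```python
import unicodedata
from html.entities import name2codepoint

# Radix prefixes listed in the text; a bare number is decimal.
RADIX = {"0b": 2, "0o": 8, "0d": 10, "0x": 16}


def resolve_entity(spec):
    """Return the character denoted by the contents of one E<> code."""
    spec = spec.strip()
    if spec.isascii() and spec.isdigit():
        return chr(int(spec))                 # plain decimal codepoint
    base = RADIX.get(spec[:2])
    if base is not None:
        try:
            return chr(int(spec[2:], base))   # explicit-radix codepoint
        except ValueError:
            pass  # not a usable number; fall through to name lookup
    try:
        return unicodedata.lookup(spec)       # Unicode character name
    except KeyError:
        # Assumption: the HTML entity table stands in for XML entities.
        return chr(name2codepoint[spec])
```

For instance, C<resolve_entity("0xAB")> and C<resolve_entity("171")> both yield the left guillemet (U+00AB), matching the numeric examples above. A name found in neither table raises C<KeyError>; a real formatter would need to decide how to report that.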
Multiple entities can be specified in a single C<< EZ<><> >> code, separated by commas: Perl 6 makes considerable use of E. =head3 Indexing terms Anything enclosed in an C<< XZ<><> >> code is an index entry, which is typically ignored by most Perldoc formatters, unless they are building an index for a document. An C<< XZ<><> >> code and its contents are never actually rendered in a document. =for discussion Sounds like a role to me...what if you also want to autoindex the headings or items? We need to define the roles that each of these letters fills and allow users to add to or delete from them, much like P6's user-defined quotes. =head3 Annotations Anything enclosed in an C<< AZ<><> >> code is an inline annotation. For example: Use a C loop instead.A loop is far more powerful than its Perl 5 predecessor.> Different formatters may render such annotations in a variety of ways: as footnotes, as endnotes, as sidebars, as pop-ups, as expandable tags, etc. They are never, however, rendered as unmarked in-line text. So the previous example might be rendered as: =over Use a C loop instead.* =back and later: =over B =over * The Perl 6 C loop is far more powerful than its Perl 5 predecessor. =back =back =for discussion What if you want to annotate multiple paragraphs? =head3 External formatting codes Perldoc extensions and plug-ins can create their own formatting codes, using the C<< MZ<><> >> code. An C<< MZ<><> >> code must start with a colon-terminated scheme specifier. The rest of the enclosed text is treated as the contents of the formatting code. For example: =heading1 Overview of the M class External formatting codes are internally represented by a Perldoc::Phrase::External object, whose C<.typename> method returns the scheme specifier minus its terminating colon (e.g. C<'Metadata'>), and whose C<.contents> method returns the remainder of the raw enclosed text (e.g. S<< C<' $?CLASS.name '> >>).
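The split between scheme specifier and contents described above can be sketched in Python. This is a minimal illustration only: C<ExternalPhrase> and C<parse_external_code> are hypothetical names standing in for Perldoc::Phrase::External and whatever the parser actually does, and the sample input is inferred from the C<'Metadata'> and C<' $?CLASS.name '> values quoted in the text.

```python
from typing import NamedTuple


class ExternalPhrase(NamedTuple):
    """Hypothetical stand-in for a Perldoc::Phrase::External object."""
    typename: str   # scheme specifier without its terminating colon
    contents: str   # remainder of the raw enclosed text, untouched


def parse_external_code(raw):
    """Split the text inside an M<> code at its first colon."""
    scheme, colon, rest = raw.partition(":")
    if not colon or not scheme:
        raise ValueError("M<> code must start with a scheme specifier")
    return ExternalPhrase(typename=scheme, contents=rest)


# Sample input inferred from the values quoted in the text.
phrase = parse_external_code("Metadata: $?CLASS.name ")
```

Note that the contents keep their surrounding whitespace: nothing after the colon is trimmed or interpreted, which leaves all further processing to the plug-in that claims the scheme.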
=head3 Verbatim text The C<< VZ<><> >> formatting code treats everything inside it as being verbatim. Specifically, it treats embedded formatting codes as literal text and does not squeeze any whitespace. For example: The B>> formatting code disarms other codes like C, B<> and C<>>>. The hash entry C>> contains the applicant's full name. Note, however, that the C<< VZ<><> >> code only changes the way its contents are parsed, I<not> the way they are rendered. That is, the contents are still wrapped and formatted like plain text, and the effects of any formatting codes surrounding the C<< VZ<><> >> code are still applied to its contents. For example, the previous example is rendered: =over The B<<< VZ<>E> >>> formatting code disarms other codes like C<< IZ<><>, BZ<><> and CZ<><> >>. The hash entry C<< %NAMEZ<> >> contains the applicant's full name. =back =head1 The Kwid Dialect Kwid is a completely new syntax based on experience from modern internet social communication. =for Ingy [Insert summary here. Maybe copy the above Pod summary and adapt?] =head1 COMPARISON OF POD AND KWID Here is a side-by-side comparison of some of the major features of Pod and Kwid: =head1 Big Thing = Big Thing =head4 Small Thing ==== Small Thing A paragraph of A paragraph of plain text. plain text. # verbatim # verbatim sub v { sub v { shift; shift; } } =item * foo * foo =item * bar * bar =item2 N<> barber ++ barber =item2 N<> bard ++ bard Something B! Something *strong*! Something I! Something /emphatic/! Some code C! Some code `E = M * C ^ 2`! Some V> markup Some \*escaped\* markup =begin Section_type .Section_type =end Section_type ..Section_type =for Section_type .:Section_type =head1 EMBEDDING DIALECTS In addition to providing syntactical constructs for all the nodes of the PDOM, a Perldoc dialect must provide a mechanism for switching the parser to another dialect, and back again.
To embed another dialect in Pod, just use the delimited form of the C<=doc> directive: =pod =head2 Here is a list =begin doc kwid * one * two * three =end doc That was a B<list>! =cut To embed other dialects in Kwid, do the same thing (in Kwid syntax, of course): =kwid == Here is a list .pod =item * one =item * two =item * three ..pod That was a *list*! =cut =head1 PDOM EXTENSIONS All Perldoc dialects and tools are required to support all of the core constructs defined in the PDOM schema. It is assumed that data in any dialect should be able to round-trip semantically when converted to any other dialect and back. It is also intended that there will be extension libraries (a.k.a. "plug-ins") to add syntax parsing, schema definition, and formatting/conversion capabilities for various constructs that fall outside of the core PDOM. Tables are a prime example. While too unwieldy to impose on every tool, tables are useful in many documentation applications. So there will be an extension that handles tables. If tools are not available at a particular stage of processing an extension construct, that construct will be reported as an external object by the PDOM. It is entirely possible that a further stage of processing will be able to move the external object into some representation dictated by the extension's schema. Pod and Kwid define marker syntax for I level extensions. =begin foo .foo ... ... =end foo ..foo Both dialects also define syntax for I level extensions. This Pod was written: M This Kwid was written: {date: 2005-04-09}