Usually, there's one converter for every target format in this model. This is no strict rule - you may want to implement features which are not provided by an already existing \C converter, so feel free to implement your own version. Nevertheless, to avoid confusion, it may be worth trying to cooperate with the original author before you start. Maybe you can join their team.

In this document, it is assumed that you are familiar with the \PP language and its elements, which are explained in ...

=The architecture of a converter

All \PP converters are basically built the same way.

\LOCALTOC{type=linked}

==The base design

To relieve converter authors, the CPAN distribution \C<\PP::Package> provides a framework to write converters. The simple idea is that because all converters have to process \PP sources (and should do this the same way), there's no need to implement this parsing again and again. So the framework provides a \I<parser> which reads the sources and generates data which contain the source contents. Please have a look at the following image.

\IMAGE{src="pp-src-stream-e.png"}

The parser reads \PP sources and checks them for integrity. Valid sources are translated into intermediate data which is called a "stream", so all converters will be fed with correct input. The parser is provided by the framework class \B<\PP::Parser>. It implements the \PP base language definition to recognize paragraphs, macros, variables, tags, and so on.

Once we have the intermediate data, there's another job all converters need to perform the same way: these data need to be processed as well. It seems to be a good idea to encapsulate this processing in another general interface. This relieves converter authors even more, freeing them from the need to deal with the details of the stream implementation (which may occasionally change).

So there's another framework class called \B<\PP::Backend>. Its objects can walk through the stream, calling user defined functions to provide its elements.
And these callbacks are the place where the target format is produced.

\IMAGE{src="pp-stream-result-e.png"}

With this framework, a converter author can focus just on target format generation. That's the part naturally most interesting to them.

==The language implementation

Let's go back to the parser. It was said that it implements the "\PP base language definition to recognize paragraphs, macros, variables, tags, and so on". Does this mean that the complete language is implemented there?

No, that's not the case. Instead, an important part is left to the converter author to make the design as flexible as possible. This part is the definition of tags.

Tags are very converter specific. They usually reflect a feature of the target format or a special feature the converter author wants to provide. Hyperlinks, for example, are essential when converting to HTML. They can be used in PDF as well. But if you are writing a converter to \I, they might be useless. Or: one author wants to provide footnotes, while another one does not.

Implementing all desired tags in the parser would make the converter framework very inflexible and hard to maintain. Such an approach could end up with a huge, difficult to maintain or even unusable tag library. All tag implementation details of all converters would need to be well coordinated. So that's no real alternative.

Instead, \PP (or its \I definition) only defines the \I of a tag (and reserves a small set of tags implementing base features like tables or image integration). \I So you are free to define all the tags you want, and you can modify this set without changes to the framework. A definition currently includes the tag name, option and body declarations, and controls how the parser handles tag occurrences. It's even possible to hook into the parser's tag processing in various ways.

Tags are defined by \I<Perl modules>, for a simple reason: I looked for a way to make their usage as easy as possible.
And what could be easier than to write something like use \PP::Tags::\RED; \? Hardly anything. But there are even more advantages. Defining tags by modules provides a simple way to \I definitions, to publish them in a central tag repository (CPAN) and to use them in various converters.

\PP even offers a way to tell the parser "We do not implement the tags of target language SuperDooper, but please treat them as tags anywhere (we will ignore them in the backend running subsequently)." - which makes it easy to process one and the same source with numerous converters defining completely different tag sets.

==The whole picture

So there are two main tasks to perform when writing a converter: define the tags you want to use, and write backend callbacks which generate one or more documents in the target format. These pieces are then put together by an application that loads the tag definitions, runs the parser and calls the backend (which invokes your callbacks).

The following chapters will describe this work in detail.

=Tag definition

It's up to you to define the meaning of your tags.

Tags are usually used to mark up text. This may be a logical markup (index entry, code sequence, ...) or a formatting one (bold, italics, ...), for example. \RED<\\B><something> marks "something" to be formatted bold. The pp2html tag \RED<\\X> declares index entries like \RED<\\X>.

Note that the common (and recommended) way of markup is to expect the marked text part in the tag's body. However, it is also possible to declare begin and end tags which enclose the marked parts, like the builtin \C<\\TABLE> and \C<\\END_TABLE> do. This makes it possible to enclose even empty lines (and therefore several paragraphs).

  \\TABLE
  Column   | Column
  contents | contents
  \\END_TABLE

Note that a tag does not necessarily need to have a body part. \C<\\END_TABLE>, for example, has none.

Depending on the tag meaning (or "semantics"), a tag may need options. These are parameters passed to the tag, specifying how it shall be evaluated.
Tag options can be optional or mandatory. The \\IMAGE tag uses options to specify which file should be loaded, as in

  \\IMAGE{src="image.png"}

As a general rule, tag options control tag processing, while the tag body contains parts of the document. Keep in mind that your tags might be processed by \I converters as well which do not handle them. In such a case, only the tag body will remain a visible part of the source.

The same is true vice versa: theoretically, the image tag could use the tag's \I<body> as well to declare the image file:

  \\IMAGE\RED<<image.png\>>

But if a converter ignores \\IMAGE, this would result in the \I "image.png" which will usually make no sense to a reader. So, when you design your tags, make sure that nothing of them remains visible in the result in case they are ignored.

==Finding tag names

New tag names can be freely chosen, with two exceptions. First, certain tag names are already used (and therefore reserved) by the base system:

@|
tag | description
\BC<\\B>, \BC<\\C>, \BC<\\HIDE>, \BC<\\I>, \BC<\\IMAGE>, \BC<\\READY>, \BC<\\REF>, \BC<\\SEQ> | Base tags defined by \BC<\PP::Tags::Basic>. By convention, \I converters support these tags. (The list might be incomplete, please look at the latest version of the module.)
\BC<\\TABLE>, \BC<\\END_TABLE> | construct tables
\BC<\\EMBED>, \BC<\\END_EMBED> | embed other languages into a \PP source, e.g. to directly include parts in the target format, or to call Perl code which produces \PP on the fly
\BC<\\INCLUDE> | loads additional files which are made part of the source (in various ways)

Second, please have a look at existing converters and \I tags. It might confuse users if one and the same tag name has completely different meanings in different converters. So if your preferred name is already used, please invent another one.

On the other hand, it may be the intention to support "foreign" tags as well, in a way that fits into your target format.
In this case, the "foreign" names (and their syntax) \I to be used, of course.

All tag names are made of uppercase letters. Underscores and digits are allowed as well. The parser does not recognize a tag if its name does not match these rules.

==Tag option conventions

You are free to invent whatever option names you prefer. Well, almost. There are a few simple conventions:

* Options \I (the documented ones ;-) should not begin or end with an underscore.

* \I evaluated by the parser begin and end with \I<one> underscore. They are made known to the user, and this convention distinguishes them from the tag's own options. \C<\\REF>'s option \C<_cnd_> is an example.

* Information intended to be used \I (to pass information between various tag hooks or to the backend) begins and ends with \I underscores.

That's all to take care of here.

==Writing a tag module

Now when your \GREEN are designed, you need to define them \I in the \BC<\PP::Tags> namespace and make it a subclass of \BC<\PP::Tags>:

  \GREEN<# declare a tag declaration package>
  package PerlPoint::Tags::New;

  \GREEN<# declare base "class">
  use base qw(PerlPoint::Tags);

The base module \BC<\PP::Tags> contains a special \C<import()> method which arranges that the parser learns new tag definitions when a tag module is loaded by \C<use>. \BC<\PP::Tags> is provided as part of the converter framework \BC<\PP::Package>.

It is recommended to have a "top level" tag declaration module for each \PP converter, so there could be a \C>, a \C>, \C>, a \C> and so on. (These modules of course may simply \XREF{name="Integrating foreign tags"} if appropriate.)

To complete the intro, configure variable handling:

  \GREEN<# pragmata>
  use strict;
  use vars qw(%tags %sets);

\C<%tags> and \C<%sets> are important variables used by convention. They will be explained in the next sections.

===Tag definition

Now the tags can actually be declared. Tag declarations are expected in a global hash named \BC<%tags>.
Each key is the name of a tag, while the tag descriptions are nested structures stored as the related values.

  \GREEN<# tag declarations>
  %tags=(
         \RED<EMPHASIZE> => {...},
         \RED<COLORIZE>  => {...},
         \RED<FONTIFY>   => {...},
         ...
        );

Please note that there are no tag namespaces. Although Perl modules are used to define the tags, tags declared by various \C share one "global scope", because a \PP document author simply uses all tag names the same way, regardless of where they were defined. This means that different tags should be \I different.

Each tag description consists of several parts:

\LOCALTOC{type=linked}

Most of these parts are optional.

====Base definition

What the parser basically needs to know about a tag is whether it takes options and a body, because this influences parsing directly. If a tag has no body but the parser looks for one, a parsing error might occur for no real reason, or bodylike source parts immediately following the tag would be misinterpreted.

Providing the necessary information is simple. Here's the example from the last section again, expanded by the related details.

  \GREEN<# tag declarations>
  %tags=(
         EMPHASIZE => {
                       \GREEN<# options>
                       \RED<options =\> TAGS_OPTIONAL>,
                       \GREEN<# don't miss the body!>
                       \RED<body =\> TAGS_MANDATORY>,
                      },
         COLORIZE  => {...},
         FONTIFY   => {},
         ...
        );

This is easy to understand. The \C