The SDF Document Development System

Ian Clatworthy, Research Architect, Mincom Pty Ltd
25 May 1999


SDF (Simple Document Format) is a freely available document development system which generates high quality outputs in a variety of formats from a single source. The output formats supported include HTML, PostScript, PDF, man pages, POD, LaTeX, SGML, MIF, RTF, Windows help and plain text.

SDF documents are simple to create and maintain, minimising the time spent on documentation. In particular, SDF directly supports the creation and maintenance of large, on-line documentation systems (including intranets) via centralised hypertext management and rule-based hypertext generation.

SDF has been completely developed in Perl, a popular and highly portable scripting language. Like Perl, SDF is freely available for commercial and non-commercial use.

1. Overview

1.1. Introduction

SDF is a document publishing system which aims to solve some common problems that many software organisations encounter with documentation:

  1. How do we support multiple formats (and produce high quality output for each)?
  2. How do we reduce the time it takes to create documents?
  3. How do we maintain a large on-line documentation system?
  4. How can we generate and easily maintain hypertext links?
  5. How do we keep the documentation up to date with the code?
  6. How can we centrally manage corporate formatting standards?

The basic design principles are:

  1. Documents should be specified in a logical manner.
  2. Make common things easy.
  3. Give power users full control when they need it.
  4. The architecture should be open and easy to extend.

The key feature of SDF is the division of responsibility:

SDF consists of the following key components:

Unlike SGML, XML, HTML and many other markup languages, the SDF language has been designed to be author-friendly rather than parser-friendly. As a result, most SDF documents look quite similar to plain text email, making them easy to write and easy to read.

1.2. A Sample SDF Document

A sample SDF document is shown below:

    # Build the title
    !define DOC_NAME           "GalaxyBuilder"
    !define DOC_TYPE           "Discussion Paper"
    !define DOC_AUTHOR         "Joe Bloggs"
    !build_title
    H1: Introduction
    After extensive market research, I believe there is
    an excellent opportunity for us to develop software
    for the I<galaxy construction industry>. Potential
    customers include:
    * NASA
    * European Community
    * China
    * Japan.
    Note: The proposed name of the software package to be
    developed is [[DOC_NAME]]. If you want to suggest a
    better name, send email to {{}}.
    H2: Software Requirements
    The key requirements are:
    ^ support for the design and simulation of galaxies
      containing up to:
      - 1000 large planets, or
      - 5000 small planets
    + the package needs to be easy to use
    + the package needs to be well documented.
    H2: Project Team
    Exploding galaxies will be B<very> bad for business,
    so we need the best team possible for this project:
    !block table
    Person          Role
    Mary Jones      Project Manager
    Hans Blass      Architect
    Bill Smith      Software Engineer
    !endblock

1.3. A Brief Explanation

Comments begin with a # character as the first non-whitespace character on a line.

Macros are embedded commands which begin with a ! as the first non-whitespace character on a line. The define macro is used to set variables. The value of a variable can be embedded in paragraph text by using the [[...]] syntax.

The DOC_NAME and DOC_TYPE variables are used by the build_title macro which creates:

Paragraphs can be tagged in different ways. For the vast majority of SDF documents, the only tags used are:

Tag      Meaning
H1:      level 1 heading
H2:      level 2 heading
*        item in level 1 bulleted list
-        item in level 2 bulleted list
^        first item in level 1 ordered list
+        next item in level 1 ordered list
>        fixed-width, verbatim text
Note:    note

Phrases can also be tagged in several ways. Any phrase can be tagged using the syntax:

   {{XYZ:text}}

where XYZ is the tag. For single, uppercase character tags like I (Italics) and B (Bold), POD-style syntax is also supported:

   X<text>

where X is the tag.

Tables can be specified using the table filter, typically in combination with the block and endblock macros. The first row is the headings. Remaining rows are data.

1.4. Generating Output Formats

The sdf command is used to convert SDF to other formats. The general syntax is:

   sdf [options] file ...

If an extension is not given (and a file is not found with that name), an extension of sdf is assumed.

The most commonly used option is the -2 option. For example, to convert mydoc.sdf to HTML and PostScript, the respective commands are:

   sdf -2html mydoc
   sdf -2ps mydoc

These commands create files called mydoc.html and mydoc.ps respectively.

To convert mydoc.sdf to a set of HTML topics, the command is:

   sdf -2topics mydoc

This creates the following files:

By default, topics are created whenever a level 1 heading is encountered or a file is explicitly included. The -n option can be used to specify a different level for splitting into topics, e.g.

  sdf -2topics -n2 mydoc
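
The splitting rule can be pictured with a short Python sketch. It is an illustration only, not SDF's implementation: the function name split_topics is hypothetical, only headings written as Hn: lines are considered, explicitly included files are ignored, and it is assumed that -n2 starts a new topic at every heading of level 1 or 2.

```python
import re

HEADING = re.compile(r"^H([1-6]):\s*(.*)$")

def split_topics(lines, split_level=1):
    """Group lines into topics, starting a new topic at each heading
    whose level is <= split_level (hypothetical helper)."""
    topics = []
    current = []
    for line in lines:
        match = HEADING.match(line.strip())
        # A qualifying heading closes the topic collected so far.
        if match and int(match.group(1)) <= split_level and current:
            topics.append(current)
            current = []
        current.append(line)
    if current:
        topics.append(current)
    return topics

doc = [
    "H1: Introduction",
    "Some text.",
    "H2: Software Requirements",
    "More text.",
    "H2: Project Team",
    "Even more text.",
]
print(len(split_topics(doc)))                 # default: split at H1 only
print(len(split_topics(doc, split_level=2)))  # like -n2: split at H1 and H2
```

With the sample lines above, the default setting yields a single topic and split_level=2 yields three.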

1.5. Requirements

SDF requires the following:

1.6. Architecture

The architecture of SDF is given in the diagram below.

2. Alternative Systems

2.1. WYSIWYG Tools

WYSIWYG (What-You-See-Is-What-You-Get) tools are great for creating small to medium-sized:

However, the WYSIWYG approach is inefficient when it comes to creating and maintaining large documentation systems, particularly if you want high quality paper-based and on-line outputs.

The reasons are:

  1. WYSIWYG is meaningless when it comes to supporting multiple formats. For example:
    • paper documents are formatted on fixed-size pages; on-line documents are formatted as topics displayed in user-sized windows
    • different browsers may display a given HTML document differently.
  2. For good-looking results, the formatting rules need to be tuned for each format. For example, a certain phrase may be:
    • a hypertext jump within HTML
    • emphasised using italics for paper-based formats
    • a pop-up window within Windows help.
  3. The content often needs to be tuned for each format. For example, you may wish to describe a given procedure in different ways:
    • in detail for a paper-based document
    • terse for on-line help.
  4. Centralised management of formatting rules and hypertext generation is essential for minimising the cost and maximising the quality of a large documentation system. By making it too easy for individual authors to customise formatting, WYSIWYG tools often make it harder for workgroups to centrally manage formatting!

Nevertheless, WYSIWYG tools are often used in combination with SDF when they save time. For example, diagrams can be created in most packages and imported into SDF documents.


2.2. SGML and XML

The SGML/XML approach of specifying documents semantically is an extremely powerful one and SDF uses the same approach whenever possible. However, as SDF does not use document structure rules and DTDs, it is much simpler than SGML. SDF is also more readable than SGML, so high-cost authoring tools are not needed on every desktop, making SDF much cheaper to implement than SGML.

Like SDF, XML has built on SGML's good ideas but minimised the overall complexity. However, XML has retained SGML's unfriendly appearance.

2.3. POD

In many ways, the system closest to SDF is POD (Plain Old Documentation) which is widely used in the Perl community. Like SDF, POD:

Currently, SDF has several advantages over POD:

  1. SDF supports more features, e.g.:
    • tables
    • figures
    • formatting within example text
  2. SDF is more extensible as sites can add their own:
    • paragraph and phrase styles
    • paragraph and phrase attributes
    • filters, macros, etc.

Furthermore, versions 2.000beta10 and later of SDF are POD friendly:

As a result, POD users can use SDF or migrate to SDF when POD isn't powerful enough. Refer to SDF for POD Users for further details.

3. Language Overview

3.1. Basic Concepts

The basic concepts within SDF documents are:

Concept       Description
paragraph     one or more lines of text
phrase        a section of text within a paragraph
style         the type of a document, paragraph, phrase or table (e.g. H1)
macro         a command embedded in a document (e.g. !define)
variable      a named value (e.g. DOC_NAME)
filter        a rule to use when processing certain sections of text (e.g. table)
attribute     a named parameter of a paragraph, phrase or filter (e.g. jump)
expression    a literal or expression to evaluate (e.g. "Ian Clatworthy").

Further details about these are given below.

3.2. Paragraphs

Paragraphs have the following format:


Leading and trailing whitespace on lines is generally ignored. Paragraphs are separated by:

For normal paragraphs, simply specify the text on one or more lines. For example:

   I like products which are simple to use and
   do what I expect. We should encourage engineers
   to design more products with these qualities.

A paragraph can be given a style using the following syntax:

   style: text

Some commonly-used paragraph styles are:

Style       Description
N           normal paragraph (the default)
H1 .. H6    chapter heading at level 1-6
A1 .. A6    appendix heading at level 1-6
P1 .. P6    plain heading at level 1-6
Note        a single paragraph note
E           fixed-width (example) text

For example:

   Note: Life is too short to drink bad wine.

The result is:

Note: Life is too short to drink bad wine.

3.3. Special styles

For certain styles, the following syntax is also supported:

   special_style line1

The special styles available are:

Style          Description
>              fixed-width, verbatim paragraph
. .. ......    paragraph or plain list item at level 1-6
* .. ******    unordered list at level 1-6
- .. -----     unordered list at level 2-6
& .. &&&&&&    enumerated list at level 1-6
^ .. ^^^^^^    first entry in an ordered list at level 1-6
+ .. ++++++    next entry in an ordered list at level 1-6

For example:

^ fruits:
  - peach
  - banana
+ vegetables:
  - potato
  - carrots.

The result is:

  1. fruits:
    • peach
    • banana
  2. vegetables:
    • potato
    • carrots.
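
As a rough illustration of how the two tagging forms fit together, the following Python sketch classifies the first line of a paragraph. It is not SDF's parser: the function name classify and the returned tuples are hypothetical, only the styles listed in the tables above are recognised, and a special-style prefix is assumed to be followed by whitespace.

```python
import re

# Named styles from the paragraph style table, written as "style: text".
KNOWN_STYLE = re.compile(r"^(N|E|Note|[HAP][1-6]):\s*(.*)$")
# Special-style prefixes: a run of one repeated character, then whitespace.
SPECIAL = re.compile(
    r"^(>|\.{1,6}|\*{1,6}|-{1,5}|&{1,6}|\^{1,6}|\+{1,6})\s+(.*)$")

KIND = {">": "verbatim", ".": "plain", "*": "unordered", "-": "unordered",
        "&": "enumerated", "^": "ordered, first", "+": "ordered, next"}

def classify(line):
    """Return (style, level, text) for the first line of a paragraph."""
    stripped = line.strip()
    match = KNOWN_STYLE.match(stripped)
    if match:
        return (match.group(1), None, match.group(2))
    match = SPECIAL.match(stripped)
    if match:
        prefix, text = match.group(1), match.group(2)
        char = prefix[0]
        if char == ">":
            return (KIND[char], None, text)
        # A single "-" already denotes level 2, so dashes are offset by one.
        level = len(prefix) + (1 if char == "-" else 0)
        return (KIND[char], level, text)
    return ("N", None, stripped)

for line in ["H1: Introduction", "^ fruits:", "  - peach", "Plain text."]:
    print(classify(line))
```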

3.4. Phrases

A phrase is a section of text within a paragraph enclosed in the symbols {{ and }}. Like paragraphs, phrases are optionally tagged with a style.

The commonly-used phrase styles are:

Tag            Description                        Sample Output
1              1st level emphasis (default)       emphasis 1
2              2nd level emphasis                 emphasis 2
3              3rd level emphasis                 emphasis 3
N              normal                             some normal text
I              italic                             some italic text
B              bold                               some bold text
U              underline                          some underline text
EX             example                            some example text
EMAIL          email address
F (or FILE)    Filename                           myfile.sdf
SECT           Section                            Paragraphs
URL            Uniform Resource Locator
DOC            document title                     SDF User Guide
REF            reference (document code)          MTR-SDF-0002
ORG            organisation                       Mincom
PRD            product                            MIMS
E (or CHAR)    escape (i.e. special character)    ©
S              spaces are non-breaking            section 2.1
IMPORT         name of a figure to import

For single character, uppercase phrase styles, POD's [A-Z]<..> syntax can be used. For example:

E:  I<Please> reply to {{}} quickly.

The result is:

Please reply to quickly.

This example also illustrates the advantage of specifying documents in a logical manner:

3.5. Types vs Classes

A type (e.g. EMAIL) simply marks a phrase as a logical entity. Rules may be defined for processing these types (e.g. for generating hypertext). Refer to Rule-based Hypertext Generation, later.

A class (e.g. DOC) is a special kind of type where the entity must be a member of a predefined set. As a result, SDF can validate the entry name, look up a hypertext jump and do other clever things (like replace the text with a longer name).

Rules can also be defined for processing classes, although hypertext jumps are often defined for each entity in the tables which define the known entities. Refer to Centralised Hypertext Management, later.

3.6. Special Phrases

Special phrases are used for entering things like:

3.7. Macros

A macro is a command which can be embedded within SDF. Macros begin with an exclamation mark (!) or equals sign (=) as the first non-whitespace character on a line. !-style macros terminate at the end of the line, unless explicitly continued using a backslash character at the end of the line. =-style macros terminate at the next blank line (as in POD).

Some examples are:

  !define DOC_NAME   "The SDF Document Development System"
  !define DOC_AUTHOR "Ian Clatworthy"

Some commonly-used macros are:

Macro                           Description
init variables                  initialise variables
define variable [expression]    set a variable's value
build_title                     build a title page
block filter                    begin a block of text
endblock                        end a block of text
include file[; filter]          include another file
import file[; parameters]       import a figure

3.8. Variables

A variable is a named value. Document-wide settings are controlled in SDF using variables. Likewise, authors can define and access their own variables. In either case, the value of a variable can be referenced in a paragraph by delimiting it with the special symbols [[ and ]].

For example:

  !define MY_EMAIL ''
  My electronic mail address is [[MY_EMAIL]].

The result is:

My electronic mail address is
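
The [[...]] substitution can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not SDF's code: the function name expand_variables is hypothetical, and leaving unknown names untouched is a choice made here for illustration (SDF's actual behaviour for undefined variables is not described in this paper).

```python
import re

def expand_variables(text, variables):
    """Replace each [[NAME]] with its value; leave unknown names untouched."""
    def lookup(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\[\[([A-Za-z_][A-Za-z0-9_]*)\]\]", lookup, text)

variables = {"DOC_NAME": "GalaxyBuilder", "DOC_TYPE": "Discussion Paper"}
print(expand_variables("The proposed name is [[DOC_NAME]].", variables))
```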

Some commonly used system variables are:

Variable      Description
DOC_NAME      the title, excluding the type (e.g. SDF)
DOC_TYPE      the title type (e.g. User Guide)
DOC_AUTHOR    the author
DOC_TOC       the number of heading levels in the table of contents
DOC_START     the time processing of the document started

The define macro is usually used to set variables. However, variables beginning with OPT_ need to be set before processing of the document begins. The init macro is used on the top line of a document to set these variables. For example:

   !init OPT_STYLE="manual"

3.9. Formats

As the value of date-time variables is the number of seconds since January 1, 1970, a format is typically applied to them when they are embedded in text. For example:

  The date is [[DATE:DOC_START]].

The predefined formats available include:

Format     Description
FULL       complete date-time format
TIME       time only
DATE       date only
CONCISE    concise date only
YEAR       4-digit year

New formats can be created and the definitions of existing formats can be changed. Furthermore, a format can be applied to any variable (or embedded Perl expression), although the date-time formats are obviously only useful when applied to date-time values.
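
To make the idea of applying a format to a date-time value concrete, here is a small Python sketch. The strftime patterns in FORMATS are placeholders chosen for this illustration (SDF's real format definitions may well differ), the function name expand is hypothetical, and UTC is assumed when converting seconds to a date.

```python
import re
import time

# Hypothetical patterns; SDF's actual format definitions may differ.
FORMATS = {
    "DATE": "%Y-%m-%d",
    "TIME": "%H:%M:%S",
    "YEAR": "%Y",
}

def expand(text, variables):
    """Expand [[NAME]] and [[FORMAT:NAME]] references in text."""
    def replace(match):
        fmt, name = match.group(1), match.group(2)
        value = variables[name]
        if fmt is None:
            return str(value)
        # Date-time values are seconds since 1 January 1970 (UTC assumed).
        return time.strftime(FORMATS[fmt], time.gmtime(value))
    return re.sub(r"\[\[(?:([A-Z]+):)?([A-Z_]+)\]\]", replace, text)

variables = {"DOC_START": 927590400}  # 25 May 1999, 00:00:00 UTC
print(expand("The date is [[DATE:DOC_START]].", variables))
print(expand("The year is [[YEAR:DOC_START]].", variables))
```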

3.10. Document styles

The general type of a document can be controlled by either:

The available document styles include:

Style       Purpose
document    a normal document
manual      a manual
paper       a technical paper
fax         a facsimile
memo        a memorandum
minutes     minutes of a meeting

It is relatively simple to create new document styles by inheriting details from an existing one.

3.11. Filters

A filter controls how a block of text is interpreted. The text is usually delimited by block and endblock macros.

For example, tables are usually defined via the table filter:

!block table
Option  Description
-h      display help
-o      specify the output extension
!endblock

The result is:

Option    Description
-h        display help
-o        specify the output extension
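
The "first row is the headings, remaining rows are data" convention can be sketched in Python as follows. This is not a description of SDF's TBL parser: the function name parse_table is hypothetical, headings are assumed to be single words, and each column is assumed to start at the position where its heading starts.

```python
import re

def parse_table(lines):
    """Parse whitespace-aligned rows; the first row supplies the headings."""
    heading_line = lines[0]
    # Column boundaries are taken from where each heading word begins.
    starts = [m.start() for m in re.finditer(r"\S+", heading_line)]
    ends = starts[1:] + [None]
    headings = heading_line.split()
    rows = []
    for line in lines[1:]:
        cells = [line[s:e].strip() for s, e in zip(starts, ends)]
        rows.append(dict(zip(headings, cells)))
    return rows

table = [
    "Option  Description",
    "-h      display help",
    "-o      specify the output extension",
]
for row in parse_table(table):
    print(row)
```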

Other macros also support filters. These include:

Some of the commonly-used filters are:

Filter      Description
table       the lines are a table in SDF's TBL format
example     the lines are example paragraphs
title       used to build a title block for memos, faxes, etc.
topics      include files as sub-topics
appendix    replace H1 styles with A1, etc.
plain       replace H1 styles with P1, etc.

For example, the following macro includes another SDF file and formats it as an appendix:

   !include "tips.sdf"; appendix

Note: The appendix and plain filters enable authors to construct topics without needing to worry about how those topics will be used, e.g. a topic may be a chapter in one document and an appendix in another!
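
The effect of the appendix and plain filters on heading styles can be pictured with a short Python sketch. The function name restyle_headings is hypothetical and the sketch only rewrites Hn: tags at the start of a line; it is not how SDF implements these filters.

```python
import re

def restyle_headings(lines, prefix):
    """Rewrite Hn: heading tags as <prefix>n: tags; other lines are unchanged."""
    return [re.sub(r"^H([1-6]):", prefix + r"\1:", line) for line in lines]

topic = ["H1: Tips", "Keep paragraphs short.", "H2: Tables", "Use the table filter."]
print(restyle_headings(topic, "A"))  # as the appendix filter would tag it
print(restyle_headings(topic, "P"))  # as the plain filter would tag it
```

The topic itself is written once with H-style headings; only the including document decides which variant is applied.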

3.12. Attributes

Paragraphs and phrases can be given attributes via the syntax:

  style"["attributes"]" text

where the syntax of attributes is:

  name1["="expression1]";" name2["="expression2] ...

Attributes are used for specifying custom formatting, hypertext targets and jumps, indexing information, etc.
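
A minimal Python sketch of reading an attribute list is given below. It is illustrative only: the function name parse_attributes is hypothetical, attributes given without a value are assumed to mean "true", values are kept as strings, and quoted expressions containing semicolons are not handled.

```python
def parse_attributes(spec):
    """Parse 'name1=expr1; name2' into a dict; bare names map to True."""
    attributes = {}
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, expression = part.partition("=")
        # Strip optional surrounding quotes from the expression.
        value = expression.strip().strip("'\"") if sep else True
        attributes[name.strip()] = value
    return attributes

print(parse_attributes("align=Center; notoc; size=12"))
```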

Some commonly used attributes are:

Name      Purpose
align     Left, Right, Center, or Full (i.e. justified)
notoc     take this heading out of the table of contents
family    font family
size      font size
id        hypertext target tag
jump      URL (Uniform Resource Locator) to jump to

For convenience, phrase attributes can be applied to paragraphs.

For example:

  Life is too short to drink bad wine.

The result is:

Life is too short to drink bad wine.

4. Web Publishing Features

4.1. Centralised Hypertext Management

SDF has a generic object management feature which enables:

  1. the definition of objects in configuration files
  2. the use of those objects in documents.

For example, a configuration file may have the following declaration of products:

  !block products; data
  Name     Jump

If an object with a Jump attribute is specified in a document, SDF will automatically include a hypertext jump to the appropriate place. For example, the objects above could be specified in a document like this (PRD is the style used to specify a product object):

  The user documentation for {{PRD:MIMS}} is now
  produced using {{PRD:SDF}}.

SDF has built-in support for the following classes of objects:

  1. references/documents
  2. terms/definitions
  3. organisations
  4. products.

Additional classes of objects can be easily added, so workgroups can centrally manage all sorts of URLs: mail addresses, FTP sites and so on.
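
The lookup that object management performs can be approximated with the Python sketch below. Everything in it is an assumption made for illustration: the PRODUCTS table stands in for a configuration file, the example.com URLs are placeholders rather than real addresses, and the function name link_products and the generated HTML are hypothetical.

```python
import re

# Hypothetical product table; the names come from the example above,
# but the URLs are placeholders, not real addresses.
PRODUCTS = {
    "MIMS": {"Jump": "http://www.example.com/mims/"},
    "SDF": {"Jump": "http://www.example.com/sdf/"},
}

def link_products(text, products):
    """Turn {{PRD:name}} phrases into HTML links when a Jump is known."""
    def render(match):
        name = match.group(1)
        entry = products.get(name)
        if entry is None:
            # A class member must be one of the predefined objects.
            raise KeyError("unknown product: " + name)
        jump = entry.get("Jump")
        if jump:
            return '<A HREF="%s">%s</A>' % (jump, name)
        return name
    return re.sub(r"\{\{PRD:([^}]+)\}\}", render, text)

print(link_products("Produced using {{PRD:SDF}}.", PRODUCTS))
```

Because the jump lives in one table, changing a product's URL there updates every document that mentions the product.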

4.2. Rule-based Hypertext Generation

SDF provides a generic, rule-based feature called event processing for automating tasks like hypertext generation, index generation and custom formatting.

The on macro allows you to execute an arbitrary piece of Perl code when an event occurs during document conversion. Its syntax is:

 !on type pattern; [id]; action


The types supported and the Perl symbols available in the respective actions include:

Type         Symbols
paragraph    $style, $text, %attr, &ParaPrepend, &ParaAppend
phrase       $style, $text, %attr
macro        $name, $args
filter       $name, $params
table        $style, %param

Some examples are:

  # Make every heading a hypertext target named itself
  !on paragraph 'H\d';; $attr{'id'} = $text

  # If the target is HTML, add a line above level 1 headings
  !if $var{'OPT_TARGET'} eq 'html'
  !on paragraph 'H1';; &PrependText("Line:")
  !endif
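
The general shape of event processing (a table of rules, each fired when its type and pattern match) can be sketched in Python. This is a simplified model rather than SDF's mechanism: the Rules class and its method names are hypothetical, actions are Python callables instead of Perl code, and patterns are assumed to match the whole style name.

```python
import re

class Rules:
    """A tiny rule table: (event type, compiled style pattern, action)."""

    def __init__(self):
        self.rules = []

    def on(self, event_type, pattern, action):
        self.rules.append((event_type, re.compile(pattern), action))

    def fire(self, event_type, style, text, attr):
        # Run every rule whose type matches and whose pattern
        # matches the whole style name.
        for rule_type, pattern, action in self.rules:
            if rule_type == event_type and pattern.fullmatch(style):
                action(style, text, attr)
        return attr

rules = Rules()
# Counterpart of: !on paragraph 'H\d';; $attr{'id'} = $text
rules.on("paragraph", r"H\d", lambda style, text, attr: attr.update(id=text))

print(rules.fire("paragraph", "H1", "Introduction", {}))
print(rules.fire("paragraph", "N", "Some text.", {}))
```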

4.3. Embedded Perl Scripting

As systems built by embedding scripts within HTML are typically easier to customise than systems built by generating HTML from scripts, embedded scripting is now a well established web-publishing solution.

Likewise, Perl embedded in SDF can be a powerful combination providing this same flexibility with some additional benefits:

To embed a block of Perl code, the script filter is used. For example:

    !block script
    for $i ('a' .. 'z') {
        print "$i";
    }
    !endblock

To embed an expression within paragraph text, the [[..]] syntax is used. For example:

  Hello [["wor" . "ld"]]

Note: If the expression is a single word, it is assumed to be a variable name; otherwise, it is treated as a Perl expression.

For single line scripts, the script macro can be used. For example:

  !script $next_version = $var{'VERSION'} + 1

As shown above, SDF variables are available within Perl expressions via the var associative array.

4.4. Inline HTML

As HTML is constantly evolving and contains features which SDF doesn't explicitly support (e.g. frames), it is occasionally necessary to directly embed native HTML. To do this, use the inline filter. For example:

    !block inline
    My name is <B>Bill</B>.
    !endblock

Inlined HTML is ignored when PostScript is being generated. Likewise, if text for another format is inlined (by using the target parameter to the inline filter, say), it is ignored when HTML is generated.

If you want to use embedded expressions (enclosed in [[ and ]]) and macros within the inline text, add the expand parameter like this:

    # If the SHOW_DATE variable is true, show the date, otherwise the time
    !block inline; expand
        !if SHOW_DATE
            The date is [[DATE:DOC_START]].
        !else
            The time is [[TIME:DOC_START]].
        !endif
    !endblock

Likewise, you can use the INLINE phrase style within a paragraph to embed HTML within a paragraph. For example:

  My name is {{INLINE:<B>Bill</B>}}.

5. Other Special Features

SDF supports a number of special features which collectively make it useful for building large documentation systems like intranets and software manuals.

These features are:

Feature                         Description
Modules and Libraries           an extensible way of packaging and reusing configuration information
Reusable Topics                 topics can be defined without needing to know how they are used
Conditional Text                inclusion and exclusion of certain text
Extracting Documentation        extraction of documentation embedded in source code
Programming Language Support    attractive formatting of source code
Multiple Looks                  packaged document-wide presentation styles

Further details about these are given below.

5.1. Modules and Libraries

A module is a normal SDF document which contains reusable information. As SDF has an open architecture, a module can provide:

A library is a directory containing modules with a common theme. Importantly, a library can inherit information from other libraries, making it easy to:

  1. create new libraries, and
  2. maximise the reuse of configuration information.

5.2. Reusable Topics

Documentation systems can be developed as a set of topics, where each topic uses H-style headings, starting at H1. A topic does not generally need to know how it will be used - each document including a given topic can:

For example:

 !include "types.sdf"

 H1: Subroutines
 !block topics
 Topic
 create
 delete
 !endblock

 !include "errors.sdf"; appendix

In this example, the results are:

  1. types.sdf will be included into the document unmodified
  2. the documentation for each subroutine (such as create.sdf and delete.sdf) will be included as sub-sections in the Subroutines chapter, and the heading levels will be modified accordingly
  3. errors.sdf will be included as an appendix.

Note: Providing tabular data to a filter is a common idiom in SDF. In this case, the only column provided is the Topic column.

5.3. Conditional Text

Sections of text can be conditionally included or excluded using the following macros:

   !if condition
   !elsif condition
   !else
   !endif

These macros allow you to tune the output for different audiences and different target formats.
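
How such macros select text can be modelled with a small Python sketch. It is a simplification, not SDF's evaluator: the function name select_lines is hypothetical, and each condition is assumed to be a single variable name that is looked up in a dictionary (real conditions can be richer expressions).

```python
def select_lines(lines, variables):
    """Keep only lines in active branches of !if / !elsif / !else / !endif."""
    output = []
    # Each stack frame: [parent_active, branch_taken, currently_active]
    stack = []
    for line in lines:
        words = line.strip().split(None, 1)
        macro = words[0] if words else ""
        arg = words[1] if len(words) > 1 else ""
        active = stack[-1][2] if stack else True
        if macro == "!if":
            value = active and bool(variables.get(arg))
            stack.append([active, value, value])
        elif macro == "!elsif":
            frame = stack[-1]
            value = frame[0] and not frame[1] and bool(variables.get(arg))
            frame[1] = frame[1] or value
            frame[2] = value
        elif macro == "!else":
            frame = stack[-1]
            frame[2] = frame[0] and not frame[1]
            frame[1] = True
        elif macro == "!endif":
            stack.pop()
        elif active:
            output.append(line)
    return output

source = ["!if SHOW_DATE", "The date.", "!else", "The time.", "!endif", "Done."]
print(select_lines(source, {"SHOW_DATE": True}))
print(select_lines(source, {"SHOW_DATE": False}))
```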

5.4. Extracting Documentation

Software documentation can be embedded within comments in source code (and text-based data files) and extracted by the sdfget utility. The name, order and formatting of extracted sections can be controlled via report files.

Consider the following scenarios:

Creating the necessary reports and mapping the extraction commands to SDF macros is relatively easy.

5.5. Programming Language Support

Blocks of source code can be nicely formatted via the lang parameter to the example filter. For example:

!block example; lang='Perl'
sub hello {
    local($planet) = @_;

    # Output a nice message
    print "hello $planet!\n";
}
!endblock

If SDF encounters an unknown filter which is defined as a programming language, the example filter is implicit. Therefore, the above example can be simplified to:

!block Perl
sub hello {
    local($planet) = @_;

    # Output a nice message
    print "hello $planet!\n";
}
!endblock

In either case, the result is:

sub hello {
    local($planet) = @_;

    # Output a nice message
    print "hello $planet!\n";
}

There is built-in support for numerous languages, including Perl, C, C++, Java, Delphi and CORBA IDL. New language definitions can be easily added (vgrind definitions are used).

Pretty printing of source code is directly supported by sdf's -P option. For example:

  sdf -2ps -Psh myscript
  sdf -2ps -P myapp.c
  sdf -2ps -P -N5

The language to use can be specified as a parameter. The default language is derived from the extension of the file. The -N option adds line numbers at the frequency given.

5.6. Multiple Looks

The overall appearance of a document can be controlled by either:

The available looks include:

It is relatively simple to create new looks by inheriting details from an existing one.

Note: At this time (January 98), multiple looks are only supported for paper documentation generated via FrameMaker. Multiple look support will hopefully be added to other SDF output drivers during 1998.

6. Other Issues

6.1. Why should I use SDF?

The common reasons for using SDF are:

  1. it provides a single-source solution for high quality PostScript, HTML, Windows help, etc.
  2. it saves time
  3. it is highly portable
  4. it is free.

6.1.1. High quality outputs from a single source

As SDF can be used to specify a logical definition of a document, it is reasonably good at maintaining quality across the different output formats it supports.

6.1.2. Saving time

SDF often saves time in a number of ways:

  1. developers can focus on document content and leave the formatting to SDF, i.e. minimal effort is required for good formatting
  2. developers can use their favourite text editor, rather than switching between two different tools for source code and documents
  3. documentation can be extracted or generated from source code
  4. hypertext can be automatically generated using event processing or object management.

6.1.3. High portability

SDF documents are plain text files which can be created on any platform. The underlying technology is Perl, a highly portable scripting language available on all commonly used platforms.

6.1.4. SDF is free

SDF is freely available for commercial and non-commercial use. However, like most software packages, there are costs to be considered when using SDF. These include:

  1. training costs
  2. the cost of converting existing documents, if you wish to do this.

6.2. Converting Documents to SDF

For POD documents, SDF contains a pod2sdf conversion program.

For other formats, the easiest way to convert existing documents is usually to convert them to plain text and add the necessary markup. For Word users, an RTF-to-SDF converter is available.

In any case, remember that:

  1. most document formats are physical markup languages, and
  2. the main benefits of SDF come from specifying logical documents.

Therefore, regardless of which conversion tools you use, expect to make some manual changes.

6.3. Further Information

The SDF home page URL is From the home page, you can:

SDF is also available using anonymous ftp. The URL of the base directory is The latest release is available in a variety of formats.

The following mailing lists are available:

To subscribe to these lists, send email to and/or for instructions on using factotum, the majordomo variant that manages these lists. In short, send email to with a message body of subscribe sdf-users or subscribe sdf-bugs.

7. Summary

By using a logical document approach, where content and formatting are largely separated, SDF delivers high-quality Web-based and paper documentation from a single source. Importantly, the SDF language is easier for (most!) humans to read and write than alternatives like SGML or XML.

In addition, SDF lowers the cost of document production by providing a number of features for managing large documentation systems. These features include:

  1. rule-based hypertext generation, index generation and formatting
  2. centralised hypertext management
  3. an extensible way of packaging and reusing configuration information
  4. reusable topics
  5. conditional text, and
  6. extraction of documentation embedded in source code.

If all else fails, Perl scripts can be directly embedded in SDF documents to generate the output required!

Finally, as SDF is freely available to everyone, the costs associated with using SDF are relatively low.


Thanks to Tim Hudson for being SDF's biggest fan. Tim has assisted in design, fixed bugs, added features, promoted SDF to much of its current user base and generally helped with SDF whenever time permitted. Thanks also to Chris Moran for maintaining the SDF Home Page, assisting in design, reviewing documents, etc. Many other Mincom colleagues have assisted with SDF, particularly David Cox and Craig Willis. Thanks to everyone involved.

Thanks also to my former Leeds and Northrup Australia colleagues, particularly Tom Beale, Craig Gibbings and Greg Birnie, for inspiring my initial interest in literate programming.

The initial version of the mif2rtf program is Copyright 1992 by Convex Computer Corporation, Richardson, Texas. (The full copyright notice and permission notice is included in the SDF distribution.)


FrameMaker, FrameViewer, Acrobat, PDF and PostScript are trademarks of Adobe Systems Incorporated.

Delphi is a trademark of Borland International, Incorporated.

Windows is a trademark of Microsoft Corporation.