dtdtree

dtdtree outputs the content hierarchy tree (in ASCII) of SGML elements defined in a DTD.


Usage

dtdtree is invoked from the command line as follows:

% dtdtree [options] elementname elementname ...

Any strings after, and not part of, command-line options are treated as the elements (elementname) to output trees for. If no elements are specified, then the tree(s) for the top-most element(s) defined in the DTD are printed.

The following options are available:

-catalog filename

Use filename as the file for mapping public identifiers and external entities to system files. If -catalog is not specified, "catalog" is used as the default filename. See Resolving External Entities for more information.

-dtd filename

Use filename as the SGML DTD to parse. Otherwise, read from standard input.

-help

Print a brief usage description. No other action is performed.

-level #

Set the prune level of the content hierarchy tree to #. Defaults to 15.

-treefile filename

Output element content tree(s) to filename. Otherwise, dtdtree prints to standard output.

-verbose

Output messages to standard error describing what dtdtree is doing. This option is mainly for debugging purposes.


dtdtree Output

The tree shows the overall content hierarchy for an element. Content hierarchies of descendants are also shown. An element is pruned if it already exists at a higher (or equal) level, or if the maximum depth has been reached. The string "..." is appended to an element if it has been pruned due to pre-existence at a higher (or equal) level. The content of a pruned element can be determined by searching for the complete tree of the element (i.e., the entry without "..."). Elements pruned because the maximum depth has been reached do not have "..." appended.

Example:

     |__section+)
         |_(effect?, ...
         |__title, ...
         |__toc?, ...
         |__epc-fig*,
         |   |_(effect?, ...
         |   |__figure,
         |   |   |_(effect?, ...
         |   |   |__title, ...
         |   |   |__graphic+, ...
         |   |   |__assoc-text?)

Note

Pruning must be done to avoid a combinatorial explosion. It is common for DTDs to define content hierarchies of infinite depth. Even with a predefined maximum depth, the generated tree can become very large.
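
The pruning rules described in this section can be sketched in Python. This is an illustration only, not dtdtree's actual code: the names shallowest_levels and render_tree are hypothetical, the content model is reduced to a plain mapping from an element to its child elements, and connectors, occurrence indicators and group delimiters are left out. The sketch also assumes that an element's full subtree is printed at its first occurrence at its shallowest level.

```python
from collections import deque


def shallowest_levels(root, children, max_depth):
    """Breadth-first pass: record the shallowest level of each element."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if level[name] >= max_depth:
            continue  # nothing below the depth limit is ever printed
        for child in children.get(name, []):
            if child not in level:
                level[child] = level[name] + 1
                queue.append(child)
    return level


def render_tree(root, children, max_depth=15):
    """Return the lines of an indented tree, pruning repeated elements."""
    level = shallowest_levels(root, children, max_depth)
    expanded = set()
    lines = []

    def visit(name, depth):
        indent = "    " * depth
        has_content = bool(children.get(name))
        if not has_content or depth >= max_depth:
            # Leaf, or cut off by the depth limit: no "..." marker.
            lines.append(indent + name)
        elif name in expanded or depth > level[name]:
            # The full subtree appears at a higher (or equal) level.
            lines.append(indent + name + " ...")
        else:
            expanded.add(name)
            lines.append(indent + name)
            for child in children[name]:
                visit(child, depth + 1)

    visit(root, 0)
    return lines


# Hypothetical content model, used only for this example.
children = {
    "doc": ["section", "appendix"],
    "section": ["title", "para", "section"],
    "appendix": ["title", "section"],
    "title": ["#PCDATA"],
    "para": ["#PCDATA"],
}
print("\n".join(render_tree("doc", children)))
```

In this example, section and title are expanded once, under doc and under the first section respectively. Their later occurrences (section inside itself, and both inside appendix) are printed as "section ..." and "title ...". With max_depth=1, section and appendix are printed without "..." because they are cut off by the depth limit rather than by repetition.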

Since the output tree is static, the inclusion and exclusion sets of elements are treated specially. Inclusion and exclusion elements inherited from ancestors are not propagated down to determine which elements are printed, but special markup is presented at a given element if there exist inclusion or exclusion elements from ancestors. Inclusions and exclusions are not propagated down because of the pruning: since an element may occur in multiple contexts -- and have different ancestral inclusions and exclusions in effect -- the entry without "..." may be the only place to see the content hierarchy of the element.

Example:

    D1
     |  {+} idx needbegin needend newline
     | 
     |_(head,
     |   | {A+} idx needbegin needend newline
     |   |  {-} needbegin needend
     |   | 
     |   |_(((#PCDATA |
     |   |____((acro |
     |   |       | {A+} idx needbegin needend newline
     |   |       | {A-} needbegin needend
     |   |       | 
     |   |       |_(((#PCDATA |
     |   |       |____((super | ...
     |   |       |______sub)))*)) ...

Ignoring the lines starting with {}'s, one gets the content hierarchy of an element as defined by the DTD, without regard to where it may occur in the overall structure. The {} lines give additional information about the element with respect to its existence within a specific context. For example, when an ACRO element occurs within D1,HEAD, it can contain IDX and NEWLINE elements, in addition to its normal content, due to inclusions from ancestors. However, it cannot contain NEEDBEGIN and NEEDEND, regardless of its defined content, since one or more ancestors exclude them.

Note
Exclusions override inclusions. If an element occurs in both an inclusion set and an exclusion set, the exclusion takes precedence. Therefore, in the above example, NEEDBEGIN and NEEDEND are excluded from ACRO.
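
The precedence rule can be expressed with set arithmetic. The following Python sketch is only an illustration: the function name allowed_in_context is hypothetical, and it deliberately ignores every other SGML content-model rule (order, occurrence, required elements).

```python
def allowed_in_context(own_content, ancestor_inclusions, ancestor_exclusions):
    """Elements that may occur inside an element in one specific context.

    Inclusions from ancestors are added to the element's own content,
    then exclusions are removed, so an exclusion always wins.
    """
    allowed = set(own_content) | set(ancestor_inclusions)
    return allowed - set(ancestor_exclusions)


# ACRO within D1,HEAD, using the {A+} and {A-} lines of the example tree.
acro = allowed_in_context(
    own_content={"#PCDATA", "super", "sub"},
    ancestor_inclusions={"idx", "needbegin", "needend", "newline"},
    ancestor_exclusions={"needbegin", "needend"},
)
print(sorted(acro))
```

The result contains idx and newline in addition to ACRO's own content, but neither needbegin nor needend.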

Explanation of the {} keys:

{+}
The list of inclusion elements defined by the current element. Since this is part of the content model of the element, the inclusion subelements are printed as part of the content hierarchy of the current element after the base content model. Subelements that are inclusions will have {+} appended to the subelement entry.
{A+}
The list of inclusion elements due to ancestors. This is listed as reference to determine the content of an element within a given context. None of the ancestral inclusion elements are printed as part of the content hierarchy of the element.
{-}
The list of exclusion elements defined by the current element. Since this is part of the content model of the element, any subelement in the content model that would be excluded will have {-} appended to the subelement listing.
{A-}
The list of exclusion elements due to ancestors. This is listed as reference to determine the content of an element within a given context. None of the ancestral exclusion elements have any effect on the printing of the content hierarchy of the current element.

Resolving External Entities

The mapping from external entities to system files may be defined via the -catalog command-line option. The catalog lets you map public identifiers to system identifiers (files), or entity names to system identifiers.

Catalog Syntax

The syntax of a catalog is a subset of SGML catalogs (as defined in SGML Open Draft Technical Resolution 9401:1994).

A catalog contains a sequence of the following types of entries:

PUBLIC public_id system_id

This maps public_id to system_id.

ENTITY name system_id

This maps a general entity whose name is name to system_id.

ENTITY %name system_id

This maps a parameter entity whose name is name to system_id.

Syntax Notes

Example catalog file:

        -- ISO public identifiers --
PUBLIC "ISO 8879-1986//ENTITIES General Technical//EN"            iso-tech.ent
PUBLIC "ISO 8879-1986//ENTITIES Publishing//EN"                   iso-pub.ent
PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"  iso-num.ent
PUBLIC "ISO 8879-1986//ENTITIES Greek Letters//EN"                iso-grk1.ent
PUBLIC "ISO 8879-1986//ENTITIES Diacritical Marks//EN"            iso-dia.ent
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN"                iso-lat1.ent
PUBLIC "ISO 8879-1986//ENTITIES Greek Symbols//EN"                iso-grk3.ent 
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 2//EN"                ISOlat2
PUBLIC "ISO 8879-1986//ENTITIES Added Math Symbols: Ordinary//EN" ISOamso

        -- HTML public identifiers and entities --
PUBLIC "-//IETF//DTD HTML//EN"                                    html.dtd
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML"          ISOlat1.ent
ENTITY "%html-0"                                                  html-0.dtd
ENTITY "%html-1"                                                  html-1.dtd
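
The three entry types can be read with a small tokenizer. The Python sketch below is an illustration under stated assumptions, not the parser used by dtdtree: the names tokenize and parse_catalog are hypothetical, keywords are matched case-insensitively, unrecognized tokens are skipped, and when an identifier is mapped twice the first entry is kept.

```python
import re

TOKEN = re.compile(r"""
    --.*?--        # comment delimited by double hyphens
  | "([^"]*)"      # double-quoted literal
  | '([^']*)'      # single-quoted literal
  | (\S+)          # bare token such as a keyword or a filename
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Yield (token, was_quoted) pairs, skipping comments."""
    for match in TOKEN.finditer(text):
        double_quoted, single_quoted, bare = match.groups()
        if double_quoted is not None:
            yield double_quoted, True
        elif single_quoted is not None:
            yield single_quoted, True
        elif bare is not None:
            yield bare, False
        # otherwise the match was a comment


def parse_catalog(text):
    """Return (public, general, parameter) mappings to system identifiers."""
    public, general, parameter = {}, {}, {}
    tokens = list(tokenize(text))
    i = 0
    while i < len(tokens):
        word, quoted = tokens[i]
        keyword = None if quoted else word.upper()
        if keyword in ("PUBLIC", "ENTITY") and i + 2 < len(tokens):
            name, system_id = tokens[i + 1][0], tokens[i + 2][0]
            if keyword == "PUBLIC":
                public.setdefault(name, system_id)
            elif name.startswith("%"):
                parameter.setdefault(name[1:], system_id)  # parameter entity
            else:
                general.setdefault(name, system_id)        # general entity
            i += 3
        else:
            i += 1  # assumption: ignore anything that is not a known entry
    return public, general, parameter


sample = '''
        -- HTML public identifiers and entities --
PUBLIC "-//IETF//DTD HTML//EN"    html.dtd
ENTITY "%html-0"                  html-0.dtd
'''
public, general, parameter = parse_catalog(sample)
print(public)
print(parameter)
```

For the sample above, the public identifier "-//IETF//DTD HTML//EN" maps to html.dtd and the parameter entity html-0 maps to html-0.dtd.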

Environment Variables

The following envariables (i.e., environment variables) are supported:

P_SGML_PATH

This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. For example, if a system identifier is not an absolute pathname, then the paths listed in P_SGML_PATH are used to find the file.

SGML_CATALOG_FILES

This envariable is a colon (semi-colon for MSDOS users) separated list of catalog files to read. If a file in the list is not an absolute path, then the file is searched for in the paths listed in P_SGML_PATH and SGML_SEARCH_PATH.

SGML_SEARCH_PATH

This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. This envariable serves the same function as P_SGML_PATH. If both are defined, paths listed in P_SGML_PATH are searched before any paths in SGML_SEARCH_PATH.

The use of P_SGML_PATH is for compatibility with earlier versions. SGML_CATALOG_FILES and SGML_SEARCH_PATH are supported for compatibility with James Clark's nsgmls(1).

Note
When searching for a file via P_SGML_PATH and/or SGML_SEARCH_PATH, if the file is not found in any of the paths, then the current working directory is searched.

Note
The file specified by -catalog is read before any files specified by SGML_CATALOG_FILES.
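
The lookup order described in this section can be sketched as follows. This Python sketch is only an illustration of the documented order, not dtdtree's code: the function names search_dirs, find_file and catalog_files are hypothetical, and os.pathsep is assumed to stand in for the colon or semi-colon separator.

```python
import os


def search_dirs(environ=None, pathsep=os.pathsep):
    """Directories to search, in order: P_SGML_PATH, SGML_SEARCH_PATH, cwd."""
    environ = os.environ if environ is None else environ
    dirs = []
    for var in ("P_SGML_PATH", "SGML_SEARCH_PATH"):
        value = environ.get(var, "")
        dirs.extend(d for d in value.split(pathsep) if d)
    dirs.append(os.curdir)  # the current working directory is tried last
    return dirs


def find_file(name, environ=None, pathsep=os.pathsep):
    """Resolve a catalog file or system identifier to an existing file."""
    if os.path.isabs(name):
        return name if os.path.isfile(name) else None
    for directory in search_dirs(environ, pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def catalog_files(catalog_option="catalog", environ=None, pathsep=os.pathsep):
    """Catalogs in reading order: the -catalog file, then SGML_CATALOG_FILES."""
    environ = os.environ if environ is None else environ
    extra = environ.get("SGML_CATALOG_FILES", "")
    return [catalog_option] + [f for f in extra.split(pathsep) if f]
```

For example, find_file("iso-lat1.ent") returns the first existing match in the search order, or None if the file is not found anywhere.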


Availability

This software is part of the perlSGML package; see http://www.oac.uci.edu/indiv/ehood/perlSGML.html


Author

Earl Hood
ehood@medusa.acs.uci.edu
Copyright © 1997