Package Format for the IMSS Release Manager System
Last updated: 12/22/98 1:46 PM
Purpose and Scope
This document is a basic guide to the format of archival packages used by the IMSS Release Management System. The components of the Release Management System, described elsewhere, manipulate these packages and deploy them on servers. Ordinarily the packages are assembled and transmitted to the servers by tools developed within IMSS for that purpose. However, changes in the way web sites are developed and maintained call for the ability to deploy content from hosts outside IMSS, and this document explains the structure of the release packages so that such hosts can produce them.
Package Format and Contents
The packages are distributed in the UNIX "tar" (tape archive) format. The tar application is commonplace on UNIX systems and should be present on any version of the UNIX operating system. There are also applications available for Microsoft platforms that can generate the format.
What matters is the archive’s contents and their directory structure. For these examples, assume a project named "test" exists. There are no restrictions on the name of the archive file itself; the release manager server assigns it a name when it is received.
Every file path in the archive must begin with the designated project name. A project’s contents are kept under that name in the development area on internal development servers, and the files are deployed to an area of the web server document tree that uses the same name. For example:
This archive contains three files, all under the directory whose name corresponds to the project. The third file is of particular interest: "weblist" is a special file that the release management system uses to validate and deploy the content. It is discussed in more detail later.
When a package is released for a project, whether by the developer tools, the web form, or a scripted solution developed by external agencies, it has the project name associated with it. That project name must match the top-level directory in the archive. The three-file archive above could not be released under the project name "imss", because that name would not match the archive’s contents.
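A release script could check the project-name rule before sending a package. The following is a minimal sketch using Python’s standard tarfile module; the function name misplaced_members is hypothetical, and it applies the rule strictly, so a member stored as "./test/index.html" would be reported as misplaced.

```python
import tarfile


def misplaced_members(archive_path, project):
    """Return member names that do not sit under the project's top-level directory."""
    bad = []
    # "r:" opens a plain tar only, matching the current no-compression requirement.
    with tarfile.open(archive_path, mode="r:") as tar:
        for name in tar.getnames():
            top = name.split("/", 1)[0]
            if top != project:
                bad.append(name)
    return bad
```

For the three-file "test" archive, calling misplaced_members with project "test" should return an empty list, while project "imss" would report every member.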
At present, the archive must be provided uncompressed and unencrypted. Support for compression (specifically LZW and Deflate as implemented in zlib, the algorithms used by the UNIX compress and gzip utilities respectively) is in the planning stage. Support for encrypted transmission is also being designed, using either SSL through the web server or existing ciphers such as IDEA or DES, and possibly public-key algorithms as well. Because the benefits of encrypted transmission have not yet been demonstrated, current effort is focused on compression, to shorten transmission time.
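On a host without the IMSS tools, an uncompressed package that follows these rules might be built with Python’s standard tarfile module, as in this sketch. The function name and its arguments are illustrative assumptions; the essential points are that mode "w" writes a plain, uncompressed tar and that arcname makes every stored path begin with the project name, wherever the files live on the local disk.

```python
import tarfile


def build_package(project_dir, project, archive_path):
    """Write an uncompressed tar in which every path starts with the project name."""
    # Mode "w" produces a plain tar; compression is not yet accepted by the servers.
    with tarfile.open(archive_path, mode="w") as tar:
        # arcname renames the local directory to "<project>" inside the archive,
        # so members come out as "<project>/index.html", "<project>/weblist", etc.
        tar.add(project_dir, arcname=project)
    return archive_path
```

The local directory does not need to be named after the project; for example, build_package("/home/me/site", "test", "test.tar") would store /home/me/site/index.html as "test/index.html".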
The Special File weblist
Important note: this description of the file’s format applies only to IMSS-managed release servers. The software used by the corporate site (www.hp.com) is different, and so is the format of its contents listing, even though that file is also called "Weblist" (with a capital W).
The weblist file is essentially the manifest, or table of contents, for the project being released. Each line that is neither blank nor a comment refers to one file in the project. Such a line either carries deployment information for a file that is new or updated, or is a deletion directive for a file that has been removed from the project or made obsolete.
Consider the sample weblist shown below:
# test/weblist - written by stage for randyr
Doc index.html /test
Fig test.gif /test
Mp2 test.map /test
Doc test1.html /test
Doc test2.html /test
The first line is a comment. Comment lines are those whose first non-whitespace character is a hash mark ("#"). Comment lines and blank lines are ignored when the package contents are processed.
The remaining lines each consist of three parts: a type, a source, and a destination.
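These line rules can be expressed as a small parser. The sketch below assumes that fields are separated by whitespace and that paths contain no spaces, which the sample suggests but this document does not state; WeblistEntry and parse_weblist are hypothetical names.

```python
from typing import List, NamedTuple


class WeblistEntry(NamedTuple):
    kind: str         # the type field, e.g. "Doc", "Fig", "Mp2", "Bin", "OBS"
    source: str       # path relative to the project directory in the archive
    destination: str  # target directory; never includes the file name


def parse_weblist(text: str) -> List[WeblistEntry]:
    """Parse weblist text into entries, skipping blank and comment lines."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Ignore blank lines and lines whose first non-whitespace char is "#".
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        entries.append(WeblistEntry(*fields))
    return entries
```

Applied to the sample weblist, it yields five entries, the first being WeblistEntry(kind='Doc', source='index.html', destination='/test').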
The first field, the type, is used primarily to distinguish executable content from non-executable content. For permissions, only the value "Bin" is significant, and letter case does not matter. Files of type "Bin" receive read and execute permission; all other files receive read permission only. Whatever permissions the file had when it was recorded in the archive are ignored.
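The permission rule reduces to a case-insensitive comparison against "Bin". This document does not say which users receive the permissions, so the sketch below assumes read (plus execute for "Bin") for owner, group and others, giving modes 0o555 and 0o444; treat those values and the function names as placeholders.

```python
import os
import stat

# Assumed modes: the document specifies read+execute vs. read-only, not for whom.
BIN_MODE = (stat.S_IRUSR | stat.S_IXUSR | stat.S_IRGRP | stat.S_IXGRP
            | stat.S_IROTH | stat.S_IXOTH)                  # 0o555
READ_MODE = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH      # 0o444


def mode_for_type(kind: str) -> int:
    """Only "Bin" (in any letter case) gets execute permission."""
    return BIN_MODE if kind.lower() == "bin" else READ_MODE


def apply_permissions(path: str, kind: str) -> None:
    """Overwrite whatever mode the file carried in the archive."""
    os.chmod(path, mode_for_type(kind))
```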
The remaining types are currently informational only. They could become significant if the architecture of the web hosts changed so that images and HTML were treated differently. For now, the only other values in use are "Map" or "Mp2" for files ending in ".map", which are treated as image maps; "Doc" for HTML files; and "Fig" for everything else. The one other type that is handled specially is "OBS", which marks a file that is no longer in use and should be deleted from the server. If the file does not exist (most likely because it was already marked OBS in an earlier release), no error is reported and the entry is simply skipped.
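A script that generates weblist entries outside IMSS might pick the informational type from the file name and handle "OBS" entries tolerantly, as sketched below. Both helpers are hypothetical; treating ".htm" as HTML is an added assumption, and "Bin" is deliberately never guessed, since whoever writes the weblist has to state which files are executable.

```python
import os


def suggest_type(filename: str) -> str:
    """Pick an informational type from the file name (never returns "Bin")."""
    name = filename.lower()
    if name.endswith(".map"):
        return "Mp2"   # image maps; "Map" is also in use
    if name.endswith((".html", ".htm")):
        return "Doc"   # HTML documents (".htm" is an assumption)
    return "Fig"       # everything else


def remove_obsolete(path: str) -> bool:
    """Delete a file named by an OBS entry; a missing file is not an error."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Probably removed by an earlier OBS entry: pass over it silently.
        return False
```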
Support for Java servlets and applets is currently being designed. The type identifiers will likely be "Jvs" for servlets and "Jva" for applets, but because the design is not final, these types should not yet be used in any weblist file.
The second field, the source, specifies where the file is located in the archive. The path is relative to the project name, since the project name is already a fixed part of the archive structure. The first entry in the sample weblist has a source of simply "index.html". Because the project in this example is "test", the file’s actual path in the archive is "test/index.html"; the leading "test/" is dropped in the weblist, leaving just "index.html". The sample weblist has no CGI examples, but CGI components involve several additional considerations, covered in a separate section below.
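Converting between the source field and the member path in the archive is a matter of adding or removing the "<project>/" prefix. A minimal sketch with hypothetical function names:

```python
import posixpath


def archive_member(project: str, source: str) -> str:
    """Full archive path of a weblist source, e.g. "test/index.html"."""
    return posixpath.join(project, source)


def weblist_source(project: str, member: str) -> str:
    """Strip the leading "<project>/" from an archive member path."""
    prefix = project + "/"
    if not member.startswith(prefix):
        raise ValueError(f"{member!r} is not under project {project!r}")
    return member[len(prefix):]
```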
The third field, the destination, determines where the file should be moved. If the type is "Bin", the destination path is relative to the root of the server’s CGI area; for all other types it is relative to the documents area. The actual location of these areas is server-specific, and neither the archive nor the developers need to be concerned with it. CGI content is subject to additional path conventions, described below. The destination does not include the file name, because the release manager already knows the name from the second field. The file is moved into the destination area, and its access permissions are then set according to its type.
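Putting these rules together, the deployed location of an entry is the CGI or document root, then the destination directory, then the file name taken from the source. In this sketch doc_root and cgi_root are hypothetical stand-ins for the server-specific areas, and resolve_target is an illustrative name.

```python
import posixpath


def resolve_target(kind, source, destination, doc_root, cgi_root):
    """Compute where one weblist entry is deployed on the server."""
    # "Bin" (any letter case) is relative to the CGI area; all else to the docs area.
    root = cgi_root if kind.lower() == "bin" else doc_root
    # The destination names a directory; the file name comes from the source field.
    rel_dir = destination.lstrip("/")
    return posixpath.join(root, rel_dir, posixpath.basename(source))
```

With a hypothetical doc_root of "/www/docs", the sample entry "Doc index.html /test" would resolve to "/www/docs/test/index.html".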
CGI Content and Path Translations
When files are released as CGI content, their destination paths generally undergo some translation. This is because earlier web hosts were home to many separate projects (sometimes dozens), so some layer of separation was needed to keep two projects from clashing over ownership of a name such as "mailform.pl". More recently configured hosts use virtual hosting to associate a unique web address with a single project; on those hosts the CGI translation is optional.
The translation essentially swaps the positions of "cgi-bin" and the project name. Within the CVS repositories, a project’s CGI area is a subdirectory of the main project directory. On an external host, however, CGI content lives under a single "cgi-bin" directory that contains one directory per project. To achieve this, the following translations are made to the destination paths of all items of type "Bin" in the weblist file:
Note that these translations apply only to the destination path. The file is therefore still picked up and archived from within the CVS repository as expected, while the release tool determines the file’s final destination based on the host it is being released to.
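The core swap, applied to a destination path, might look like the following. This document does not reproduce the full list of host-specific translations, so the sketch models only the exchange of the project name and "cgi-bin" at the start of the path; the function name is hypothetical.

```python
def translate_cgi_destination(destination: str, project: str) -> str:
    """Swap "<project>/cgi-bin" to "cgi-bin/<project>" at the start of a path.

    Any destination that does not begin with "/<project>/cgi-bin" is
    returned unchanged.
    """
    parts = destination.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == project and parts[1] == "cgi-bin":
        parts[0], parts[1] = "cgi-bin", project
        return "/" + "/".join(parts)
    return destination
```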
Creating CGI Entries in weblist
With that in mind, a "Bin" entry in a weblist file depends on the target host. For project "test" on host "www.interactive.hp.com", a CGI script "test.pl" appears in the list as:
Bin cgi-bin/test.pl /cgi-bin/test
Remembering that the source (second field) is already relative to "test", and that destinations do not include the file name, the path is transformed as follows:
/test/cgi-bin/test.pl # The starting point
/cgi-bin/test/test.pl # Swap test and cgi-bin
/cgi-bin/test # Drop the file (leaf) component of path
The final value becomes the third field, with the original path (minus the leading project name) as the second field and the file type as the first.
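The three steps above can be automated when generating weblist lines for a host that uses the translation, such as the www.interactive.hp.com example. This is a sketch with a hypothetical function name; it assumes the script lives under the project’s cgi-bin directory, as in the example.

```python
import posixpath


def make_bin_entry(project: str, source: str) -> str:
    """Build a weblist "Bin" line by swapping the project name and cgi-bin."""
    full = "/" + posixpath.join(project, source)   # /test/cgi-bin/test.pl
    parts = full.strip("/").split("/")
    if len(parts) < 3 or parts[1] != "cgi-bin":
        raise ValueError("expected a source under cgi-bin/")
    parts[0], parts[1] = parts[1], parts[0]        # /cgi-bin/test/test.pl
    swapped = "/" + "/".join(parts)
    destination = posixpath.dirname(swapped)       # /cgi-bin/test
    return f"Bin {source} {destination}"
```

For project "test" and source "cgi-bin/test.pl", this produces the line "Bin cgi-bin/test.pl /cgi-bin/test" shown above.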