PApp - creating applications for the World Wide Web [Talk]

(c) 2000 Marc Lehmann <schmorp@schmorp.de>
20 September 2000


1. PApp - what is it?

PApp (which is simply short for "Perl APPlication") is basically a collection of perl modules, aimed at seasoned perl programmers, that allows one to create large applications for stateless protocols (like HTTP or WAP). It is possible to implement a simple but complete content management system in a few hundred lines of papp code.


2. About this document/presentation

The presentation I will give at the LinuxWorld Expo will explain a lot of technical details not included in this document. The slides for that presentation will be available at http://www.goof.com/pcg/marc/docs.html shortly after the LinuxWorld Expo.


3. What was the motivation for creating PApp?

The original motivation for writing PApp came when our company (nethype GmbH) started to implement a highly interactive website that required at least limited forms of content management.

Creating large or even medium-sized applications for the Web using CGI is a tedious task. Preserving state (a.k.a. session tracking and/or user tracking) in particular is a major headache, both for the programmer and for the security advisor.

PApp solves these and a lot of other problems we didn't originally envision by providing a generic and easy API.


4. Features of PApp / Advantages over other solutions

While PApp could be mistaken for yet another CGI wrapper, this is not the case. PApp concentrates on providing an application-centric view rather than a web-page-centric one. This means that there can be one web page per file (as with CGI), but it is also possible to put multiple pages into a single file or to distribute them between many files. PApp pages, called modules, are often so short that it makes a lot of sense to group related pages together. Of course, the good old include mechanism is still there (although it is not named include).

4.1. State management / Persistence

At your option (you want this!), PApp can manage persistent variables for a single session or user. This means that variables stay persistent for the whole session almost automatically. The mechanism for marking variables is very generic: state variables (called state keys in PApp parlance) can be marked as session-dependent (the default), user-dependent (persistent across session borders, also called preference items), local to a page or a group of pages, or any mixture thereof. Interesting planned extensions include transactions and transaction-dependent state keys.

An example of a user-dependent state key is the language the user selected last. A typical session-dependent variable is the flag that records whether the user has authenticated herself. A local variable could hold data from a multi-page transaction.
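
The scoping rules can be illustrated with a small plain-perl model. This is only a sketch under stated assumptions: the names set_key, get_key and %scope_of are made up for illustration and are not part of the PApp API, and in-memory hashes stand in for the state database.

```perl
use strict;
use warnings;

# Hypothetical model of scoped state keys; NOT the real PApp API.
my %session_state;   # session id => { key => value }
my %user_state;      # user id    => { key => value } (preference items)

# Scope declared per key; keys not listed here default to 'session'.
my %scope_of = (lang => 'user', authenticated => 'session');

# Store a value in the store that matches the key's scope.
sub set_key {
    my ($sid, $uid, $key, $value) = @_;
    my $scope = $scope_of{$key} // 'session';
    if ($scope eq 'user') { $user_state{$uid}{$key}    = $value }
    else                  { $session_state{$sid}{$key} = $value }
    return $value;
}

# Fetch a value from the store that matches the key's scope.
sub get_key {
    my ($sid, $uid, $key) = @_;
    my $scope = $scope_of{$key} // 'session';
    return $scope eq 'user' ? $user_state{$uid}{$key}
                            : $session_state{$sid}{$key};
}

# User u42 picks a language and logs in during session s1.
set_key('s1', 'u42', lang          => 'de');
set_key('s1', 'u42', authenticated => 1);
```

In this model, a later session s2 of the same user still sees lang, but has to authenticate again because authenticated is stored per session.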

4.2. Security

A common bug in CGI scripts is passing sensitive data in so-called hidden fields of web forms, which are hidden from the casual user but of course open to attacks. With PApp this is very unnatural: the state data never leaves the server. Instead, an encrypted cookie (128-bit Twofish) is used, usually encoded in the URL (not to be confused with the cookies Netscape implements). Compromising the server key gives access to other sessions (similar to a broken caching proxy), but still does not make it possible to change the data.
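
The principle (state stays on the server, the client only carries an opaque token) can be sketched in a few lines of plain perl. Note the assumptions: this sketch uses an HMAC from the core Digest::SHA module to make the token tamper-evident, which is a stand-in for, not a reimplementation of, the Twofish-encrypted cookie described above. The names save_state, load_state and $server_key are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# ASSUMPTION: an HMAC stands in for PApp's Twofish-encrypted cookie.
my $server_key = 'replace-with-a-secret-server-key';   # hypothetical
my %state_db;                                          # stand-in for the state database
my $next_id = 1;

# Keep the state on the server; return only an opaque token for the URL.
sub save_state {
    my ($state) = @_;
    my $id = $next_id++;
    $state_db{$id} = $state;
    return $id . '-' . hmac_sha256_hex($id, $server_key);
}

# Return the stored state, or undef if the token is malformed or forged.
sub load_state {
    my ($token) = @_;
    my ($id, $mac) = $token =~ /\A(\d+)-([0-9a-f]{64})\z/
        or return undef;
    return undef unless $mac eq hmac_sha256_hex($id, $server_key);
    return $state_db{$id};
}
```

A client that edits the id inside the token cannot produce a matching MAC without the server key, so load_state rejects the request. The data itself is never sent to the client, so there is nothing for it to edit.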

In general, the design of PApp makes security the first priority, not only by careful design of the network/server protocol but also by providing easy and standardized methods for common tasks.

4.3. Session/User-tracking

Session and user tracking are done automatically by PApp. The application can react to session starts if necessary (e.g. by redirecting the user to a start page when the data on the accessed page is no longer available, or by initializing state keys on session start). The same is possible when a new user starts a new session, of course. PApp defines a session as a tree with the page that started the session (i.e. one requested without, or with an invalid, session cookie) as its root.
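
The tree view of a session can be modelled as follows. This is a minimal sketch, assuming that every page view gets its own id and remembers the view it was reached from. The names new_view and session_root are hypothetical and only illustrate the idea.

```perl
use strict;
use warnings;

# Hypothetical model: every page view is a node that points at the view
# it was reached from.
my %parent_of;       # view id => parent view id (undef for a session root)
my $next_id = 1;

# Register a page view. A missing or unknown previous id starts a new session.
sub new_view {
    my ($prev_id) = @_;
    my $id = $next_id++;
    $parent_of{$id} = (defined $prev_id && exists $parent_of{$prev_id})
                    ? $prev_id
                    : undef;
    return $id;
}

# The session a view belongs to is identified by the root of its tree.
sub session_root {
    my ($id) = @_;
    $id = $parent_of{$id} while defined $parent_of{$id};
    return $id;
}
```

A user who goes back and follows a different link simply creates a second branch below the same parent. Both branches lead to the same root and therefore belong to the same session.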

User tracking beyond sessions is currently done using the HTTP cookie mechanism. Care has been taken to do this sensibly, however: session data from cookies is ignored, and a user without (or with disabled) cookies will not be flooded with cookie requests more than once or twice a day. This is another example of how PApp can easily adapt to users.

4.4. User administration

PApp manages users per server, not per application. Applications can use an access rights system similar to the unix user/group mechanism to administer their users, but can also implement their own system. PApp identifies every user by a unique user id with optional attributes like name, password, group and preferences.

4.5. I18n

I18n is short for internationalization, which, in the context of PApp, means multiple-language support. PApp does this using language tagging, string scanning and a generic translation editor. The I18n model of PApp is more general than the widely known gettext model implemented by GNU and Sun, among others. The target language is chosen using the user's preferred language and protocol-specific data (e.g. the HTTP Accept-Language header).

Every source file can use a different language if necessary (the language format allows a finer-grained distribution of languages, but this is not yet implemented). PApp can scan for strings in papp sources, text/html/xml files and even database fields (e.g. you can declare a single database table row as English, to be translated, or as mixed language, to be scanned for language tags). Every application specifies the destination languages it wants to support. A translation editor (an example application "delivered" with PApp) can be used by translators to translate as-yet-untranslated messages, and updates can be done on the fly.

I18n is as easy as writing __"Translate this" (the tagging syntax is reminiscent of the widely used _"message" syntax in C) in your documents or program.
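
How a target language might be chosen and a tagged string looked up can be sketched in plain perl. Everything here is an assumption made for illustration: the catalog layout and the functions pick_language and translate are hypothetical and do not reflect how PApp stores or resolves translations internally.

```perl
use strict;
use warnings;

# Hypothetical translation catalog: source string => { language => text }.
my %catalog = (
    'Hello' => { de => 'Hallo', fr => 'Bonjour' },
);
my @supported = qw(en de fr);   # destination languages of the application

# Pick a language: the stored user preference first, then the
# Accept-Language header ordered by q-value, then the first supported one.
sub pick_language {
    my ($user_pref, $accept_language) = @_;
    my @ranked;
    for my $item (split /\s*,\s*/, $accept_language // '') {
        my ($tag, $q) = $item =~ /\A([A-Za-z-]+)(?:\s*;\s*q=([0-9.]+))?/
            or next;
        push @ranked, [lc($tag), $q // 1];
    }
    my @wanted = map { $_->[0] } sort { $b->[1] <=> $a->[1] } @ranked;
    unshift @wanted, $user_pref if defined $user_pref;
    for my $lang (@wanted) {
        (my $primary = $lang) =~ s/-.*//;     # "de-at" falls back to "de"
        for my $cand ($lang, $primary) {
            return $cand if grep { $_ eq $cand } @supported;
        }
    }
    return $supported[0];
}

# Look up a message; untranslated messages fall back to the source text.
sub translate {
    my ($lang, $msg) = @_;
    my $entry = $catalog{$msg} or return $msg;
    return $entry->{$lang} // $msg;
}
```

Falling back to the source text means that a page stays usable while translators are still working on it.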

4.6. Unicode / Multi-charset ability

Internally, PApp supports only two datatypes: binary and text. Binary is usually used for images or similar data, while text should be used for html (or xml or wml...) pages. Relatively recent fixes to the Unicode standard have mostly pacified the objections raised by a lot of cultural groups against this standard, so PApp encodes all text using Unicode.

The internal representation is independent of the output encoding or even the output character set. You can opt to output iso-8859-1 text or, if the user wants something else, another charset and encoding like iso-2022-jp. This selection can be hardwired by the program or can be based on the user's preferences or protocol data (e.g. the HTTP Accept-Charset header), similar to the selection of the target language.
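
The separation between internal representation and output encoding can be demonstrated with perl's core Encode module. This is a sketch of the general idea, not of PApp's own output layer. The helper encode_for and the way it treats the list of accepted charsets are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Internal Unicode strings, written with escapes so this source stays ASCII.
my $text    = "Gr\x{fc}\x{df}e";              # German, fits into iso-8859-1
my $jp_text = "\x{65e5}\x{672c}\x{8a9e}";     # Japanese, does not

# The same internal string in two different output encodings.
my $latin1 = encode('iso-8859-1', $text);     # 5 bytes
my $utf8   = encode('UTF-8',      $text);     # 7 bytes
my $jis    = encode('iso-2022-jp', $jp_text); # 7-bit bytes with escape sequences

# Hypothetical helper: encode for the first accepted charset we know,
# falling back to UTF-8. Returns the charset name and the encoded bytes.
sub encode_for {
    my ($string, @accepted) = @_;
    for my $charset (@accepted, 'UTF-8') {
        next unless Encode::find_encoding($charset);
        return ($charset, encode($charset, $string));
    }
}
```

The application code only ever handles $text and $jp_text. Which bytes go over the wire is decided at the very end, for example from the charsets the client announced.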

4.7. Speed

Speed was a major concern when designing the API. Although implemented to a large part in perl, PApp aims at providing speed similar to bare in-server scripts (like Apache::Registry-based scripts). At least for non-trivial ("real-world") pages, PApp pages should be very similar in performance, but of course cannot beat Apache::Registry. However, features like I18n come at basically no cost, in both programming time and runtime. This often leads to correctly tagged applications even when multi-language support was not originally a goal of the application ("because it's so easy to do").

4.8. Scalability

PApp applications easily scale to many servers if a single server cannot sustain the load. The only limitation is that there must be a single database server managing the state keys, which usually accounts for only a very small share of the processing power a PApp application consumes.

4.9. XML

PApp's support for XML is two-fold. First, PApp uses XML for its own papp source format, which specifies the basic layout of an application. The decision to use XML was not easy, as XML can be regarded as a format designed for machines (but still decipherable and writable by humans, if necessary), while humans are generally better served by SGML.

However, XML is not only used for internal source files but is also fully supported as a text format. Together with its evil brother PXML, which is basically XML (text) with embedded perl code (or vice versa), it allows PApp to apply stylesheets (XSLT) to web pages either at compilation time or at runtime. PApp applications can be written fully in XML, and only the output stylesheet decides whether to actually output HTML, WML (for mobile phones) or XML (for XML-capable browsers, if sensible).

As an added gimmick, PApp can dynamically fetch XML or PXML data (or code!) from other sources at runtime. Content management systems usually want to store pages (and/or layout) inside a database. An example of how to implement this is short enough to fit into the manpage for the PApp::XML module.

4.10. Protocol- & Layout-independence

Together with stylesheets, PApp applications can be written with only minimal dependencies on layout or target protocol. It is possible to write applications that only provide "modules" and leave it to the final stylesheet to decide how they are rendered. Since PApp supports this implicitly, layout- and protocol-independent applications become possible.

Together with the I18n model, this enables the almost complete separation of translation, programming, design and environment. I18n comes at almost no cost. Layout separation, of course, requires thought on the programmer's side, and protocol independence requires stylesheets that translate the internal XML representation into the target "language".

4.11. Platform-independence

Although the main target of PApp is Apache/mod_perl, an interface based on CGI (or similar mechanisms) is possible, although slower. PApp applications are indifferent to the environment (i.e. it is easy to write an application that runs both under CGI and inside Apache).

4.12. Database support

Serious web applications without database support are, of course, impossible. Therefore PApp provides a lot of conveniences for applications: each application (and sub-application) can define a default database connection which is persistent, cached and checked (like all database connections in PApp). SQL support is compatible with the underlying DBI interface, but PApp programmers rarely have to resort to that API. PApp automatically caches prepared SQL statements (allowing the query optimiser to do its work only once). Since a code excerpt is better than a thousand marketing words, here is an actual example:

<:
   # run the query and bind the two result columns to $id and $name
   my $st = sql_exec \my($id, $name),
                     "select id, name from user where name like ?",
                     $S{name};
:>
<table><tr><th>ID</th><th>Name</th></tr>
<:
   # emit one table row per result row
   while ($st->fetch) {
      ?><tr><td>$id</td><td>$name</td></tr><:
   }
:>
</table>

This displays all id/name pairs from the user table whose name matches the pattern in $S{name}, formatted as an HTML table.

PApp currently uses MySQL for internal state management (MySQL is very fast for the task at hand). Applications are free to use any database they want, of course.

4.13. Persistent helper objects

As mentioned earlier, database connections are persistent. PApp also provides a number of "unusual" persistent objects. For example, it is possible to tell PApp that a given callback needs to be called after a specific URL has been clicked, independent of the target page (i.e. the target page does not need to know anything about who called it). Another helper object is the persistent SQL row object, which maps SQL rows into perl hashes. The following code excerpt (which uses the editform library that is part of PApp, and is fully language-tagged) displays the id and name fields of a table in a freely editable (HTML) form. Updates to the database are done automatically, so the example is complete:

<:
   my $row = new PApp::DataRef 'DB_row',
                               table => "user",
                               where => [id => $userid];

   # pre-set name
   $row->{name} ||= "<username>";

   ef_begin;
   :><br>__"ID:"   <?ef_string \$row->{id}  ,  5:><:
   :><br>__"Name:" <?ef_string \$row->{name}, 20:><:
   ef_submit __"Update";
   ef_end;
:>

4.14. Debuggability

Since PApp is written by its main users (or, conversely, the main users of PApp also develop it), debugging support is relatively strong. When a fatal error occurs, PApp is usually able to deliver a complete backtrace of the program, including "interesting objects", and it features a powerful exception mechanism to gather information from all stages (low- to high-level) of a request. It is possible to store the backtrace and error information in a database, providing the user with a nice error message while mailing the administrator about the incident. The information saved by PApp allows the programmer to reproduce the error situation precisely (as far as possible), but usually the URL suffices to uniquely identify the precise state of the session.
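
The general pattern (catch the fatal error, keep the backtrace and the state, show the user something friendly) can be sketched with perl's core Carp module. This is not PApp's exception mechanism. The function run_request and the in-memory @error_log, which stands in for the error database, are hypothetical.

```perl
use strict;
use warnings;
use Carp qw(longmess);

my @error_log;    # stand-in for the error database (hypothetical)

# Run a request handler. On a fatal error, record the message, a backtrace
# and a copy of the state, and return a friendly page instead of dying.
sub run_request {
    my ($url, $state, $handler) = @_;
    my ($page, $trace);

    # Capture the call stack at the point where die() is called.
    local $SIG{__DIE__} = sub { $trace = longmess($_[0]) };

    my $ok = eval { $page = $handler->($state); 1 };
    return $page if $ok;

    push @error_log, {
        url   => $url,
        error => "$@",
        trace => $trace,
        state => { %$state },     # shallow copy of the state keys
    };
    return "Sorry, something went wrong. The administrator has been notified.";
}
```

Because the record contains both the URL and the state keys, a programmer can later replay the failing request. Mailing the administrator would hook in where the record is pushed.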

Schemes where the user could add comments to such a "coredump" or even browse the data structures interactively (given correct access rights) should be possible, but are not implemented so far (this would be done using a specialized PApp application, presumably called the "error browser").

4.15. "Web-Widgets"

The newest addition to PApp (and therefore not final in its implementation) is reusable components, also dubbed Web-Widgets, similar to the reusable GUI objects named "widgets". An example of such a component would be a standard "forum" widget. In one of our projects we use the same forum widget to provide "web chat", "small ads" and the "news!" page, all with the same code but using different stylesheets to customize the layout.

A PApp application is, in some sense, a large state machine. This is a limitation of stateless protocols which is hard, but not impossible, to circumvent (if only perl had efficient continuations). Any application can be embedded into other applications while retaining its own state, its own state machine and, of course, its own set of variables / state keys. Standard PApp applications are not much more than a single page with an HTML header/footer, an authentication check and an embedded "Web-Widget".

4.16. Logging

Logging is a required part of any serious business. Gathering statistical data is a basic requirement today. PApp saves every state key to a database for each page impression/request (usually between 300 and 900 bytes). This saved information contains everything needed to recreate a given page except the actual program code. When session/state data is being expired (necessary to put a sensible bound on the size of the state database), PApp conveniently allows applications to gather statistical data from individual "hits", using almost the same environment as at runtime. Expiring and data gathering can, of course, be done separately if necessary.
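
The combination of per-hit state records and statistics gathering at expiry time can be sketched with the core Storable module. This is only an illustration under stated assumptions: log_hit, expire and the in-memory @state_db are hypothetical names, and PApp's actual record format and database schema are not shown here.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @state_db;   # stand-in for the state database: one frozen record per hit

# Save the complete set of state keys for one page impression.
sub log_hit {
    my ($page, $state) = @_;
    push @state_db, {
        time => time,
        blob => freeze({ page => $page, state => $state }),
    };
    return scalar @state_db;
}

# Expire records older than $max_age seconds. Each expired hit is handed
# to the application's $gather callback before it is dropped.
sub expire {
    my ($max_age, $gather) = @_;
    my $cutoff = time - $max_age;
    my @keep;
    for my $rec (@state_db) {
        if ($rec->{time} < $cutoff) { $gather->(thaw($rec->{blob})) }
        else                        { push @keep, $rec }
    }
    @state_db = @keep;
    return scalar @keep;
}
```

A callback such as sub { $hits{ $_[0]{page} }++ } would count page impressions per page while the old records are thrown away.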

4.17. PApp is Free Software

Although we are not sure whether we'll publish all versions of PApp under the GPL, or which modules from our own applications might become standard components (like the forum), we are determined to make PApp free software. The principal PApp architect (me!) has written a lot of free software modules, works on quite a few free software programs (like GCC or The Gimp) and is determined to make PApp as free and as powerful as possible.


5. Disadvantages

PApp is not actually a revolution, judged by its components. It does, however, draw a lot of functionality and ideas into a single, well-contained package. Nevertheless, there are quite a few reasons NOT to use PApp, or at least not to use PApp YET.


A. References

A.1. Apology

I'd like to apologize for any typos or other mistakes in this document. It was written in a single session without access to a spellchecker and so has not been debugged yet ;)