BuzzSaw - Design

The following sections give a high-level overview of the design of the BuzzSaw log processing framework. The implementation is based on the design philosophy described in the introductory section of the documentation.

The entire BuzzSaw system reduces to two specific tasks: importing data and generating reports. The whole system revolves around a central database in which all necessary data is stored.

The Database

All events of interest are stored in the database. PostgreSQL was chosen because of its excellent feature set, reliability and scalability. It was clear from the outset that the system could eventually store a very large number of log messages (and associated derived data), so scalability and speed are of particular concern.

A full description of the database schema is given elsewhere. The high-level view is that each log message of interest is recorded as an event. Associated with each event is a set of zero or more tags and zero or more pieces of extra_info. An event is split into fields representing the date/time, hostname, user, program, process ID (PID) and the full message. Tags are simple labels applied to an event (e.g. auth_failure) whereas extra information entries have both an arbitrary name and a value (e.g. source_address). Many of these fields, and combinations of fields, are indexed to improve query times.

The BuzzSaw interface to the database (see the BuzzSaw::DB module for full details) is built using the Perl DBIx::Class object-relational mapper, an excellent module which makes it easy to handle complex queries. For speed, a few parts of the code base use raw SQL statements via the standard DBI module, but only where absolutely essential.

The implementation of various internal processes relies on PostgreSQL functions and triggers, which means that BuzzSaw currently works only with PostgreSQL. Having said that, porting those features to the procedural language of another database engine should not require a great deal of work.

Importing

The import process is driven by the BuzzSaw::Importer Perl module. It reads through the log messages from each data source. If an event has not previously been stored in the database, it is parsed and the event data is passed through the stack of filters. If any filter declares an interest in an event, the event is stored at the end of the process. Additionally, any filter can attach tags and associated extra information even if it does not declare an interest in the event being stored.
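
As a rough illustration, the heart of the import loop might look something like the following sketch (the method names here are assumptions for illustration, not the actual BuzzSaw::Importer API):

    # Hypothetical sketch of the import loop; all method names are
    # illustrative rather than the real BuzzSaw::Importer API.
    foreach my $source (@sources) {
        while ( defined( my $line = $source->next_entry ) ) {
            next if $db->already_seen($line);    # skip previously stored events

            my %event = $source->parser->parse_line($line);

            my ( $interesting, @tags ) = apply_filters( \%event, @filters );

            $db->store_event( \%event, \@tags ) if $interesting;
        }
    }

(A possible implementation of apply_filters is sketched in the Filtering section below.)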

Data Sources

The importer process can have any number of data sources. A data source is any implementation of the BuzzSaw::DataSource Moose role. The data source is required to deliver log messages one at a time to the importer process.
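
For illustration, a minimal data source might look broadly like the following (a sketch only; the role definition, with a single required next_entry method, is an assumption rather than the real BuzzSaw::DataSource interface):

    # Sketch of a BuzzSaw::DataSource-style role plus a trivial
    # implementation; the names are assumptions for illustration.
    package Sketch::DataSource;
    use Moose::Role;

    requires 'next_entry';    # return the next log message, or undef when done

    package Sketch::DataSource::ArrayRef;
    use Moose;
    with 'Sketch::DataSource';

    has entries => ( is => 'ro', isa => 'ArrayRef[Str]', required => 1 );
    has _index  => ( is => 'rw', isa => 'Int', default => 0 );

    sub next_entry {
        my ($self) = @_;
        return if $self->_index >= scalar @{ $self->entries };
        my $line = $self->entries->[ $self->_index ];
        $self->_index( $self->_index + 1 );
        return $line;
    }

    1;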

Currently there is only the BuzzSaw::DataSource::Files Perl module. This module can search through a hierarchy of directories and find files which match a POSIX or Perl regular expression. As well as standard text files, it supports opening files which are compressed with gzip or bzip2. When a file is opened, a lock is recorded in the database to avoid multiple processes working on the same data concurrently. When the reading of a file is complete, its name is recorded in the database along with the SHA-256 checksum of the file contents; this helps avoid reprocessing files which have been seen previously.
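
The checksum check could be done along the following lines (a sketch using the core Digest::SHA module; the processed_file table and its columns are hypothetical, not the real BuzzSaw schema):

    # Sketch of checksum-based deduplication; the table and column
    # names are hypothetical.
    use Digest::SHA ();

    sub file_already_processed {
        my ( $dbh, $path ) = @_;

        my $checksum = Digest::SHA->new(256)->addfile($path)->hexdigest;

        my ($count) = $dbh->selectrow_array(
            'SELECT COUNT(*) FROM processed_file WHERE checksum = ?',
            undef, $checksum );

        return $count > 0;
    }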

Parsing

Each data source requires a parser module which implements the BuzzSaw::Parser Moose role. The parser module is used to split a log entry into separate parts, e.g. date, program, pid, message. Mostly this is a matter of handling the particular date/time format used in the log entry. The parser module is called on every log message, so it is expected to be fast.

Currently there is only the BuzzSaw::Parser::RFC3339 Perl module. This handles date/time stamps which are formatted according to the guidelines in RFC3339 (e.g. 2013-03-28T11:57:30.025350+00:00).
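
For illustration, the timestamp on such a line could be matched with a regular expression along these lines (a simplified sketch; the real parser module is more thorough):

    # Simplified sketch of splitting an RFC3339-stamped log line.
    my $line = '2013-03-28T11:57:30.025350+00:00 myhost sshd[1234]: example message';

    if (
        $line =~ m{^(\d{4})-(\d{2})-(\d{2})T            # date
                    (\d{2}):(\d{2}):(\d{2})(\.\d+)?     # time
                    (Z|[+-]\d{2}:\d{2})\s+(.+)$}x
        )
    {
        my ( $year, $month, $day, $hour, $min, $sec ) = ( $1, $2, $3, $4, $5, $6 );
        my $rest = $9;    # hostname, program[pid] and message still to be split
    }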

Filtering

After a log message has been parsed into various fields as an event, it is passed through a stack of filters. All events go through the filter stack in the same sequence, so it is possible for one filter to make decisions based on the results of previous filters. If one or more filters declare an interest in an event it will be stored; it is not possible for a filter to overturn a positive vote from any previous filter.
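
A minimal sketch of this voting scheme (the names are illustrative):

    # Sketch of the filter voting scheme: one positive vote is enough
    # to have an event stored and later filters cannot overturn it,
    # but every filter still runs so that it can contribute tags.
    sub apply_filters {
        my ( $event, @filters ) = @_;

        my $interesting = 0;
        my @tags;

        for my $filter (@filters) {
            my ( $vote, @extra ) = $filter->check($event);
            $interesting ||= $vote;    # once true, stays true
            push @tags, @extra;        # collected regardless of the vote
        }

        return ( $interesting, @tags );
    }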

A filter is an implementation of the BuzzSaw::Filter Moose role. Currently there are the following filters: Cosign, Kernel, Sleep, SSH and UserClassifier. Most of them are straightforward filters which examine events and, where appropriate, declare an interest along with some tags or other information. The UserClassifier module is slightly different in that it never declares an interest; it just adds extra details when the userid field has been set by an earlier filter in the stack (e.g. Cosign or SSH). For that reason it is typically added last in the stack.
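
For example, a trivial filter in this style might look like the following (a sketch; the check method name matches the earlier sketches and is an assumption, not the real BuzzSaw::Filter interface):

    # Sketch of a trivial BuzzSaw::Filter-style module which tags
    # authentication failures; the names are illustrative.
    package Sketch::Filter::AuthFailure;
    use Moose;

    sub check {
        my ( $self, $event ) = @_;

        if ( $event->{message} =~ m/authentication failure/i ) {
            return ( 1, 'auth_failure' );    # interested, with a tag
        }

        return (0);    # no interest, no tags
    }

    1;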

Reporting

The reporting process is driven by the BuzzSaw::Reporter Perl module. This module has a record of reports which should be generated on an hourly, daily, weekly or monthly basis. It can be run in two modes: either it is limited to a specific set of reports (e.g. only the hourly ones), or it runs all jobs of all types which have not been run recently enough. In the latter case, a weekly job which has not been run for 8 days would be run immediately. A record is kept of when each report was last run.
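
The "run everything which is due" mode might be implemented roughly as follows (a sketch; the schedule bookkeeping methods and the monthly approximation are hypothetical):

    # Sketch of running all overdue reports; the method names
    # are illustrative rather than the real BuzzSaw::Reporter API.
    my %period_in_seconds = (
        hourly  => 3600,
        daily   => 86_400,
        weekly  => 7 * 86_400,
        monthly => 30 * 86_400,
    );

    for my $report (@reports) {
        my $last_run = $db->last_run_time( $report->name ) // 0;

        if ( time() - $last_run >= $period_in_seconds{ $report->schedule } ) {
            $report->generate();
            $db->set_last_run_time( $report->name, time() );
        }
    }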

A report selects all events which have certain tags and which occurred within a specified time period. The ordering of the retrieved event records can be controlled.
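
With DBIx::Class such a selection might look something like this (a sketch; the result source, column and relationship names are hypothetical, not the real BuzzSaw schema):

    # Sketch of selecting tagged events within a time period;
    # the schema names are hypothetical.
    my @events = $schema->resultset('Event')->search(
        {
            'tags.name'  => { -in      => ['auth_failure'] },
            'me.logtime' => { -between => [ $start, $end ] },
        },
        {
            join     => 'tags',
            order_by => { -asc => 'me.logtime' },
        },
    );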

A report can be generated using the generic BuzzSaw::Report module or, more typically, by implementing a specific sub-class which specifies the names of the relevant tags, the time period of interest, the name of the template to be used, and so on. For convenience, when using a sub-class, most of these attributes have sensible defaults based on the name of the Perl module.
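
Such a sub-class might look broadly like this (a sketch; the attribute names are assumptions rather than the real BuzzSaw::Report API):

    # Sketch of a report sub-class which overrides the defaults;
    # the attribute names are assumptions for illustration.
    package Sketch::Report::AuthFailure;
    use Moose;
    extends 'BuzzSaw::Report';

    has '+tags'     => ( default => sub { ['auth_failure'] } );
    has '+period'   => ( default => 'daily' );
    has '+template' => ( default => 'auth_failure.tt' );

    1;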

A sub-class of the BuzzSaw::Report module can override specific parts of the process to do additional complex processing beyond the straightforward selection of events and subsequent printing of the raw data. For example, the Kernel report carries out extra analysis of the kernel logs to collate events which are associated with particular types of problem (e.g. an out-of-memory error or a kernel panic).

A report is generated by passing the events, along with any results from additional processing, to a template which is handled using the Perl Template Toolkit. A report can simply be printed to stdout or sent via email to multiple recipients.
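
As an illustration, rendering a set of events through a template might look like this (a sketch using the Template Toolkit directly; the template text, variable and field names are illustrative):

    # Sketch of rendering events with the Template Toolkit;
    # the template text and variable names are illustrative.
    use Template;

    my $tt = Template->new() or die Template->error();

    # Indented heredocs (<<~) need Perl 5.26 or later.
    my $template = <<~'END_TEMPLATE';
        Matching events: [% events.size %]
        [% FOREACH event IN events -%]
        [% event.logtime %] [% event.hostname %] [% event.message %]
        [% END -%]
        END_TEMPLATE

    my $output;
    $tt->process( \$template, { events => \@events }, \$output )
        or die $tt->error();

    print $output;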