BuzzSaw - Database

Schema

At the heart of the design of the BuzzSaw database schema is a representation of a log message as an event, associated with each event are zero or more tags and zero or more pieces of extra_info.

The database schema also contains tables which are used for tracking data sources to help avoid parsing the sources multiple times. There are also tables available which can be used to provide locking facilities so that multiple importer processes can be run concurrently without risk of conflicts.

Events

Each log message is modelled as an event and is stored in the table of that name. The attributes for the event table are as follows:

Name Type Nullable Purpose
id Integer. Must be unique. No Primary Key
raw String (max length 1000 chars). Yes The full original log message.
digest Base64 encoded string (max length 200 chars). Must be unique. No The Base64 encoded digest of the full original log message.
logtime Timestamp. No The full timestamp (date and time) with timezone for the log message. Must not be null.
logdate Date No The date part of the timestamp, automatically extracted by a database trigger.
hostname String (max length 100 chars). No The name of the host from which the log message originated.
message String (max length 1000 chars). Yes The extracted log message (timestamp, hostname, program, etc. have been removed).
program String (max length 100 chars). Yes The name of the program (e.g. sshd) which generated the log message. Not always recorded in the log message.
pid Integer Yes The process ID of the program which generated the log message. Not always recorded in the log message.
userid String (max length 20 chars). Yes The ID of the user associated with this log message. Often this is not recorded in the log message. It might be a string username or an integer UID.

If you need to do queries based on dates and you are not interested in the times then you definitely want to use the logdate column, it is much faster.

Tags

Associated with each event are zero or more tags. A tag is a simple label (e.g. auth_failure, segfault) which describes some aspect of the log message. The attributes for the tag table are as follows:

Name Type Nullable Purpose
id Integer. Must be unique. No Primary Key
name String (max length 20 chars). No The name of the label.
event Integer. No Foreign key reference to the event table

Extra Information

Associated with each event are zero or more pieces of extra information. Extra information has both an arbitrary name and a value which describes some aspect of the log message (e.g. the source_address of an SSH login). The attributes for the extra_info table are as follows:

Name Type Nullable Purpose
id Integer. Must be unique. No Primary Key
name String (max length 20 chars). No The name of the extra information.
val String (max length 100 chars). No The value of the extra information.
event Integer. No Foreign key reference to the event table

Note that when old entries in the events table are anonymised that all extra information associated with an event is deleted. Contrast this with the tags table where they are all kept, it is assumed that all tags are "safe".

The BuzzSaw::DB API

When you need to make arbitrary buzzsaw database queries you could just use SQL (with or without the perl DBI interface) and that will give you everything you need. However there is a handy API based on the excellent Perl DBIx::Class module which can make creating complex queries much simpler. This API is exposed via the BuzzSaw::DB module. It can be used like this:

   use BuzzSaw::DB;

   my $db = BuzzSaw::DB->new( name => 'logdb',
                              user => 'fred',
                              pass => 'letmein' );

   my $schema = $db->schema;

   my @events = $schema->resultset('Event')->search( { hostname => 'foo' } );

More typically you will not want to specify the user credentials in the script itself so the better approach is to use a configuration file. That can be done like this:

   use BuzzSaw::DB;

   my $db = BuzzSaw::DB->new_with_config();

   my $schema = $db->schema;

The default configuration file is /etc/buzzsaw/db.yaml but it can be overridden like so:

   use BuzzSaw::DB;

   my $db = BuzzSaw::DB->new_with_config( configfile => 'buzzsaw_db.yaml' );

   my $schema = $db->schema;

The format of the configuration file must be YAML. It is a simple hash of key-value pairs which map to the various attributes in the BuzzSaw::DB module (e.g. user, pass, name, host, port).

As shown above, to get access to the DBIx::Class schema object you need to use the schema accessor. There are result set classes available for all DB tables (named Event, Tag, ExtraInfo, Log and CurrentProcessing). These result sets are used for querying the entries in the table. The parameters of an entry in the result set can be easily queried through the accessors. Here's an example:

   my $schema = $db->schema;

   my @events = $schema->resultset('Event')->search( { hostname => 'foo' } );

   for my $event (@events) {
      my $time = $event->logtime;
      my $program = $event->program;

      print "$time $program\n";
   }

The full details for each result set can be found by looking at the perl docs for the modules, they are in the BuzzSaw::DB::Schema::Result namespace (e.g. BuzzSaw::DB::Schema::Result::Event).

Thanks to DBIx::Class it is possible to easily retrieve the data associated with an event which is stored in a foreign table (e.g. tags and extra_info). Here is an example of retrieving the tags for an event:

  for my $event (@events) {

    my @tags = $event->search_related('tags')->all;

  }