=pod

=head1 NAME

KiokuDB::Tutorial - Getting started with L<KiokuDB>

=head1 INSTALLATION

The easiest way to install L<KiokuDB> along with a number of backends is
L<Task::KiokuDB>.

L<KiokuDB> depends on L<Moose> and a few other modules out of the box, but no
specific storage module.

L<KiokuDB> is a frontend to several backends, much like L<DBI> uses DBDs to
connect to actual databases.

For development and testing you can use the L<KiokuDB::Backend::Hash> backend,
which is an in-memory store. For production, the recommended backends are
L<KiokuDB::Backend::DBI> and L<KiokuDB::Backend::BDB>.

See below for instructions on getting L<KiokuDB::Backend::BDB> installed.

=head1 CREATING A DIRECTORY HANDLE

A L<KiokuDB> directory is the main object through which all work is done.

The simplest directory that is ready for use can be created like this:

    use KiokuDB;
    use KiokuDB::Backend::Hash;

    my $dir = KiokuDB->new(
        backend => KiokuDB::Backend::Hash->new
    );

We will revisit other, more interesting backend configurations later in this
document, but for now this will do.

You can also use DSN strings to connect to the various backends:

    KiokuDB->connect("hash");

    KiokuDB->connect("dbi:SQLite:dbname=foo", create => 1);

    KiokuDB->connect("bdb:dir=foo", create => 1);

You can also use a configuration file:

    KiokuDB->connect("/path/to/my_db.yml");

which is just a YAML file:

    ---
    # these are basically the arguments for 'new'
    backend:
      class: KiokuDB::Backend::DBI
      dsn: dbi:SQLite:dbname=/tmp/test.db
      create: 1

=head1 USING THE DBI BACKEND

During this tutorial we will be using the DBI backend for two reasons. The
first is L<DBI>'s ubiquity. The second is that it is easy to look behind the
scenes, which lets us demonstrate more clearly what L<KiokuDB> is doing.

That said, the examples will work exactly the same with all backends.
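Because every backend sits behind the same directory API, switching backends is mostly a matter of the DSN passed to C<connect>. As a rough illustration only, here is a hypothetical toy that maps the DSN strings shown earlier to the backend classes they name. C<parse_dsn> and the C<%backend_for> table are made up for this tutorial and are not how L<KiokuDB> parses DSNs internally.

```perl
use strict;
use warnings;

# Hypothetical mapping from the DSN prefixes used in this tutorial to the
# backend classes they name (a toy, not KiokuDB's own DSN handling).
my %backend_for = (
    hash => "KiokuDB::Backend::Hash",
    dbi  => "KiokuDB::Backend::DBI",
    bdb  => "KiokuDB::Backend::BDB",
);

# Returns (backend class, argument string) for a DSN,
# or ("config", $path) for something that looks like a YAML file.
sub parse_dsn {
    my ($dsn) = @_;
    return ( config => $dsn ) if $dsn =~ /\.ya?ml\z/;

    my ( $prefix, $rest ) = split /:/, $dsn, 2;
    my $class = $backend_for{ lc $prefix }
        or die "unknown backend prefix '$prefix'\n";

    # DBI needs the whole "dbi:Driver:..." string; other backends get the rest.
    return ( $class, lc($prefix) eq "dbi" ? $dsn : ( $rest // "" ) );
}
```

For example, C<parse_dsn("bdb:dir=foo")> returns C<KiokuDB::Backend::BDB> and C<dir=foo>, while a C<.yml> path is treated as a configuration file.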
First we create C<$dir>:

    my $dir = KiokuDB->connect(
        "dbi:SQLite:dbname=kiokudb_tutorial.db",
        create => 1, # this causes the tables to be created
    );

Note that if you are connecting with a username and password, you need to
specify these as named arguments:

    my $dir = KiokuDB->connect(
        $dsn,
        user     => $user,
        password => $password,
    );

=head1 INSERTING OBJECTS

Let's start by defining a simple class using L<Moose>:

    package Person;
    use Moose;

    has name => (
        isa => "Str",
        is  => "rw",
    );

We can instantiate it:

    my $obj = Person->new( name => "Homer Simpson" );

and insert the object into the database as follows:

    my $scope = $dir->new_scope;

    my $homer_id = $dir->store($obj);

This is a very trivial use of L<KiokuDB>, but it illustrates a few important
things.

First, no schema is necessary. L<KiokuDB> uses L<Moose> to introspect your
object without needing to predefine anything like tables.

Second, every object in the database has an ID. If you don't choose an ID for
an object, L<KiokuDB> will assign a UUID instead. This ID is like a primary
key in a relational database. You can also specify an ID instead of letting
one be generated:

    $dir->store( homer => $obj );

Third, all L<KiokuDB> operations need to be performed within a B<scope>. The
scope is not really doing anything important in this simple example, but it
becomes necessary when cycles and weak references are in use. We will look
into that in more detail later.

=head1 LOADING OBJECTS

Now that Homer has been inserted into the database, we can fetch him using
the ID we got from C<store>:

    my $homer = $dir->lookup($homer_id);

Assuming that C<$scope> and C<$obj> are still in scope, C<$homer> and C<$obj>
will actually be the same object:

    use Scalar::Util qw(refaddr);

    # this is true:
    refaddr($homer) == refaddr($obj)

This is because L<KiokuDB> tracks which objects are "live" in the B<live
object set> (L<KiokuDB::LiveObjects>). If the object wasn't already in
memory, L<KiokuDB> would have fetched it from the backend instead.
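The live object set can be pictured with a small core-Perl sketch. C<ToyDirectory> below is hypothetical and far simpler than L<KiokuDB::LiveObjects>: it keeps a weak reference to each live object keyed by ID, plus a "scope" array holding strong references so those objects stay alive. A C<lookup> for a live ID returns the very same instance. Otherwise the object is re-inflated from a stored copy, which here is only a shallow hash copy standing in for real serialization.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);    # used by the usage example below

# Hypothetical toy directory: it only mimics the live-object idea.
package ToyDirectory {
    use Scalar::Util qw(weaken);

    sub new {
        return bless { storage => {}, live => {}, scope => [], next_id => 1 }, shift;
    }

    # Remember an object as live: a weak entry keyed by ID, plus a strong
    # reference held by the current scope so the object stays alive.
    sub _register {
        my ( $self, $id, $obj ) = @_;
        $self->{live}{$id} = $obj;
        weaken( $self->{live}{$id} );
        push @{ $self->{scope} }, $obj;
        return $id;
    }

    # store($obj) generates an ID; store($id => $obj) uses the given one.
    sub store {
        my $self = shift;
        my ( $id, $obj ) = @_ == 2 ? @_ : ( "obj-" . $self->{next_id}++, $_[0] );
        # A shallow copy of the hash stands in for real serialization.
        $self->{storage}{$id} = { class => ref $obj, data => {%$obj} };
        return $self->_register( $id, $obj );
    }

    sub lookup {
        my ( $self, $id ) = @_;
        # Already live: hand back the very same instance.
        return $self->{live}{$id} if defined $self->{live}{$id};
        my $entry = $self->{storage}{$id} or return;
        # Otherwise inflate a fresh object from the stored copy.
        my $obj = bless { %{ $entry->{data} } }, $entry->{class};
        $self->_register( $id, $obj );
        return $obj;
    }

    # Leaving a scope drops its strong references.
    sub clear_scope { $_[0]{scope} = []; return }
}
```

Storing an object and looking it up again yields the same reference address. Once the caller releases the object and C<clear_scope> drops the scope's references, the weak entry becomes C<undef> and C<lookup> builds a fresh object from storage.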
=head1 WHAT WAS STORED

Let's peek into the database:

    % sqlite3 kiokudb_tutorial.db
    SQLite version 3.4.0
    Enter ".help" for instructions
    sqlite>

The database schema has two tables, C<entries> and C<gin_index>:

    sqlite> .tables
    entries    gin_index

C<gin_index> is used for more complex queries, and we'll get back to it at the
end of the tutorial.

For now let's just have a closer look at C<entries>:

    sqlite> .schema entries
    CREATE TABLE entries (
      id varchar NOT NULL,
      data blob NOT NULL,
      class varchar,
      root boolean NOT NULL,
      tied char(1),
      PRIMARY KEY (id)
    );

The main columns are C<id> and C<data>. In L<KiokuDB> every object has an ID,
which serves as a primary key, and a BLOB of data associated with it.

Since the DBI backend serializes to JSON by default, we can examine the data
directly. First let's set C<sqlite3>'s output mode to C<line>, which is easier
to read with large columns:

    sqlite> .mode line

and select the data from the table:

    sqlite> select id, data from entries;
       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     data = {"__CLASS__":"Person","data":{"name":"Homer Simpson"},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}

As you can see, the C<name> attribute is stored under the C<data> key inside
the blob, and the object's class is stored under C<__CLASS__>. The C<data>
column contains all of the data necessary to recreate the object. All the
other columns exist only for searches. Later on you'll also see how to create
user defined columns.

When using L<KiokuDB::Backend::BDB> the on-disk format is just a hash of C<id>
to C<data> with no additional columns.

=head1 OBJECT RELATIONSHIPS

Let's extend the C<Person> class to hold some more interesting data than just
a C<name>:

    package Person;

    has spouse => (
        isa      => "Person",
        is       => "rw",
        weak_ref => 1,
    );

This new C<spouse> attribute will hold a reference to another person object.
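The C<< weak_ref => 1 >> option matters because spouses point at each other. The sketch below uses L<Scalar::Util>'s C<weaken> directly on plain hash-based objects and counts destructor calls. C<Spouse> and C<make_couple> are hypothetical helpers, not part of this tutorial's C<Person> class. With strong references the cycle is never freed, while with weakened references both objects are destroyed as soon as they go out of scope.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Counts destructor calls so we can see when objects are freed.
my $destroyed = 0;

# Hypothetical minimal class standing in for Person.
package Spouse {
    sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub DESTROY { $destroyed++ }
}

# Builds a married couple inside a sub, so both lexicals go out of scope on
# return. With $weak true, each back reference is weakened like weak_ref => 1.
sub make_couple {
    my ($weak) = @_;
    my $homer = Spouse->new( name => "Homer Simpson" );
    my $marge = Spouse->new( name => "Marge Simpson" );

    $homer->{spouse} = $marge;
    $marge->{spouse} = $homer;

    if ($weak) {
        weaken( $homer->{spouse} );
        weaken( $marge->{spouse} );
    }
    return;
}
```

Calling C<make_couple(0)> leaves the destructor count at 0 (the pair leaks until global destruction), and a subsequent C<make_couple(1)> brings it to 2.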
Let's first create and insert another object:

    my $marge_id = $dir->store(
        Person->new( name => "Marge Simpson" ),
    );

Now that we have both objects in the database, let's link them together:

    {
        my $scope = $dir->new_scope;

        my ( $marge, $homer ) = $dir->lookup( $marge_id, $homer_id );

        $marge->spouse($homer);
        $homer->spouse($marge);

        $dir->store( $marge, $homer );
    }

Now we have created a persistent B<object graph>, that is, several objects
which point to each other.

The reason C<spouse> has the C<weak_ref> option is so that this circular
structure will not leak.

When the objects are updated in the database, L<KiokuDB> sees that their
C<spouse> attributes contain references, and each relationship is encoded in
storage using the target object's unique ID.

To load the graph, we can do something like this:

    {
        my $scope = $dir->new_scope;

        my $homer = $dir->lookup($homer_id);

        print $homer->spouse->name; # Marge Simpson
    }

    {
        my $scope = $dir->new_scope;

        my $marge = $dir->lookup($marge_id);

        print $marge->spouse->name; # Homer Simpson

        refaddr($marge) == refaddr($marge->spouse->spouse); # true
    }

When L<KiokuDB> loads the initial object, all the objects it depends on are
loaded as well. The C<spouse> attribute contains a reference to another
object (by ID), and this link is resolved at inflation time.

=head2 The purpose of C<new_scope>

This is where C<new_scope> becomes important. As objects are inflated from
the database, they are pushed onto the live object scope, which increases
their reference count.

If this was not done, by the time C<$homer> was returned from C<lookup> his
C<spouse> attribute would have been cleared, because there would be no other
reference to Marge.
This demonstrates why:

    sub get_homer {
        my $homer = Person->new( name => "Homer Simpson" );
        my $marge = Person->new( name => "Marge Simpson" );

        $homer->spouse($marge);
        $marge->spouse($homer);

        return $homer;

        # at this point $homer and $marge go out of scope
        # $homer has a refcount of 1 because it's the return value
        # $marge has a refcount of 0, and gets destroyed
        # the weak reference in $homer->spouse is cleared
    }

    my $homer = get_homer();

    $homer->spouse; # this returns undef

By using this idiom:

    {
        my $scope = $dir->new_scope;

        # do all KiokuDB work in here
    }

you are ensuring that the objects live at least as long as is necessary.

In a web application context you usually create one new scope per request.
In fact, L<Catalyst::Model::KiokuDB> does this automatically.

=head1 REFERENCES IN THE DATABASE

Now that we have an object graph in the database, let's have another look at
what's inside.

    sqlite> select id, data from entries;
       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     data = {"__CLASS__":"Person","data":{"name":"Homer Simpson","spouse":{"$ref":"05A8D61C-6139-4F51-A748-101010CC8B02.data"}},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}

       id = 05A8D61C-6139-4F51-A748-101010CC8B02
     data = {"__CLASS__":"Person","data":{"name":"Marge Simpson","spouse":{"$ref":"201C5B55-E759-492F-8F20-A529C7C02C8B.data"}},"id":"05A8D61C-6139-4F51-A748-101010CC8B02","root":true}

You'll notice that the C<spouse> field holds a JSON object with a C<$ref>
field inside it, containing the UUID of the target object.

When data is loaded, L<KiokuDB> queues up references to unloaded objects and
then loads them in order to materialize the memory resident object graph.

If you're curious about why the data is represented this way, this format is
called C<JSPON>, or JavaScript Persistent Object Notation.

When using L<KiokuDB::Backend::BDB>, the L<KiokuDB::Entry> and
L<KiokuDB::Reference> objects are serialized with their L<Storable> hooks
instead.

=head1 OBJECT SETS

More complex relationships (not necessarily 1 to 1) are usually easy to model
with L<KiokuDB>.
Let's extend the C<Person> class to add such a relationship:

    package Person;

    has children => (
        does => "KiokuDB::Set",
        is   => "rw",
    );

L<KiokuDB::Set> objects are L<KiokuDB> specific wrappers for L<Set::Object>.

    use KiokuDB::Util qw(set);

    my @kids = map { Person->new( name => $_ ) } qw(maggie lisa bart);

    my $set = set(@kids);

    $homer->children($set);

    $dir->store($homer);

The C<set> convenience function creates a new L<KiokuDB::Set::Transient>
object. A transient set is one which started its life in memory (as opposed
to a set that was loaded from the database).

The C<weak_set> convenience function also exists. It creates a transient set
that uses L<Set::Object::Weak> internally to help avoid circular structures
(for instance, if each child also had an attribute holding a set of its
parents).

The set object behaves pretty much like a normal L<Set::Object>:

    my @kids = $dir->lookup($homer_id)->children->members;

The main difference is that sets coming from the database are deferred by
default, that is, their members are not loaded until they are actually
needed (here, when C<members> is called to fill C<@kids>).

This allows large object graphs to exist in the database while only being
partially loaded, without breaking the encapsulation of user objects. This
behavior is implemented in L<KiokuDB::Set::Deferred> and
L<KiokuDB::Set::Loaded>.

The set object is optimized to make most operations defer loading. For
instance, if you intersect two deferred sets, only the members of the
intersection set will need to be loaded.

=head1 THE TYPEMAP

Storing an object with L<KiokuDB> involves passing it to
L<KiokuDB::Collapser>, the object that "flattens" objects into
L<KiokuDB::Entry> objects before they are inserted into the backend.

The collapser uses a L<KiokuDB::TypeMap> object that tells it how objects of
each type should be collapsed. During retrieval the same typemap is used to
reinflate the entries back into working objects.

Trying to store an object that is not in the typemap is an error.
The reason behind this is that it doesn't make sense to store every type of
object. For instance, database handles need a live socket, and objects based
on XS modules hold an internal pointer stored as an integer, whose address
won't be valid the next time the object is loaded. Even though the majority
of objects are safe to serialize, a small bit of unreported fragility is
usually enough to create large, hard to debug problems.

An exception to this rule is L<Moose> based objects, because they have
sufficient meta information available through L<Moose>'s powerful reflection
support to be safely serialized.

Additionally, the standard backends provide a default typemap for common
objects (L<DateTime>, L<Path::Class>, etc.), which by default is merged with
any custom typemap you pass to L<KiokuDB>.

So, in order to get L<KiokuDB> to store objects that are not L<Moose> based,
you can do something like this:

    my $dir = KiokuDB->new(
        backend       => $backend,
        allow_classes => [qw(My::Object)],
    );

which is shorthand for:

    my $dir = KiokuDB->new(
        backend => $backend,
        typemap => KiokuDB::TypeMap->new(
            entries => {
                "My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
            },
        ),
    );

L<KiokuDB::TypeMap::Entry::Naive> is a typemap entry that performs naive
collapsing of the object, by simply walking it recursively.

When the collapser encounters an object, it asks L<KiokuDB::TypeMap> for a
collapsing routine based on the class of the object.

This lookup is typically performed by C<ref>, not using inheritance, because
a typemap entry that is safe to use with a superclass isn't necessarily safe
to use with a subclass. If you B<do> want inherited entries, specify
C<isa_entries>:

    KiokuDB::TypeMap->new(
        isa_entries => {
            "My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
        },
    );

If no normal (C<ref> keyed) entry is found for an object, the isa entries are
searched for a superclass of that object. Subclass entries are tried before
superclass entries. The result of this lookup is cached, so it only happens
once per class.
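The lookup order described above can be sketched as follows. C<ToyTypeMap> is a hypothetical stand-in for L<KiokuDB::TypeMap> whose entries are plain strings rather than entry objects. It tries an exact C<ref> match first, then walks the object's class hierarchy from the most specific class upwards (via C<mro::get_linear_isa>) looking for an isa entry, and caches the result per class.

```perl
use strict;
use warnings;
use mro;

# Hypothetical toy resolver; entries are plain strings for illustration.
package ToyTypeMap {
    sub new {
        my ( $class, %args ) = @_;
        return bless {
            entries     => $args{entries}     || {},
            isa_entries => $args{isa_entries} || {},
            cache       => {},
        }, $class;
    }

    # Resolve once per class, then serve the answer from the cache.
    sub resolve {
        my ( $self, $object ) = @_;
        my $class = ref $object or die "not an object\n";
        return $self->{cache}{$class} //= $self->_find($class);
    }

    sub _find {
        my ( $self, $class ) = @_;

        # 1. Exact match on the class name, i.e. what ref() returns.
        return $self->{entries}{$class} if exists $self->{entries}{$class};

        # 2. isa entries, walking from the object's own class upwards,
        #    so a subclass entry wins over a superclass entry.
        for my $super ( @{ mro::get_linear_isa($class) } ) {
            return $self->{isa_entries}{$super}
                if exists $self->{isa_entries}{$super};
        }
        die "no typemap entry for $class\n";
    }
}
```

With isa entries for both C<My::Base> and C<My::Child>, a C<My::Grandchild> object resolves to the C<My::Child> entry, because subclass entries are tried first, while a plain C<My::Base> object still gets its exact C<ref> entry.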
=head2 Typemap Entries

If you want custom serialization, you can specify hooks to collapse and
expand your object:

    KiokuDB::TypeMap::Entry::Callback->new(
        collapse => sub {
            my $object = shift;

            ...

            return @some_args;
        },
        expand => sub {
            my ( $class, @some_args ) = @_;

            ...

            return $object;
        },
    );

These hooks are called as methods: C<collapse> on the object being
collapsed, and C<expand> on its class. Either hook can also be given as a
method name instead of a code reference. For instance, the L<Path::Class>
related typemap ISA entry is:

    'Path::Class::Entity' => KiokuDB::TypeMap::Entry::Callback->new(
        intrinsic => 1,
        collapse  => "stringify",
        expand    => "new",
    );

The C<intrinsic> flag is discussed in the next section.

Another option for typemap entries is
L<KiokuDB::TypeMap::Entry::Passthrough>, which is appropriate when you know
the backend's serialization can handle that data type natively. This is the
case, for example, if your object has a L<Storable> hook which you know is
appropriate (e.g. it contains no sub objects that need to be collapsible) and
your backend uses L<Storable>. L<DateTime> is an example of a class with such
Storable hooks:

    'DateTime' => KiokuDB::TypeMap::Entry::Passthrough->new( intrinsic => 1 )

=head2 Intrinsic vs. First Class

In L<KiokuDB> every object is normally assigned an ID, and if the object is
shared by several objects this relationship will be preserved.

However, for some objects this is not the desired behavior. These are objects
that represent values, like L<DateTime> objects, L<Path::Class> entries,
L<URI> objects, etc.

L<KiokuDB> can be asked to collapse such objects B<intrinsically>, that is,
instead of creating a new L<KiokuDB::Entry> with its own ID for the object,
the object gets collapsed directly into its parent's structures.

This means that shared references that are collapsed intrinsically will be
loaded back from the database as two distinct copies, so updates to one will
not affect the other.
For instance, when we run the following code:

    use Path::Class;

    my $path = file(qw(path to foo));

    $obj_1->file($path);

    $obj_2->file($path);

    $dir->store( $obj_1, $obj_2 );

the following is true while the data is being inserted, but it will no
longer be true once C<$obj_1> and C<$obj_2> are loaded back from the
database:

    refaddr($obj_1->file) == refaddr($obj_2->file)

This is because C<$obj_1> and C<$obj_2> each got their own copy of C<$path>.

This behavior is usually more appropriate for objects that aren't mutated,
but are instead cloned and replaced, and for which creating a first class
entry in the backend with its own ID is undesirable.

=head2 The Default Typemap

Each backend comes with a default typemap, with some built in entries for
common CPAN modules' objects. L<KiokuDB::TypeMap::Default> contains more
details.

=head1 SIMPLE SEARCHES

Most backends support an inefficient but convenient simple search, which
scans the entries and matches fields.

If you want to make use of this API, we suggest using
L<KiokuDB::Backend::DBI>, since there simple searching is implemented using an
SQL C<WHERE> clause, which is much more efficient (you do have to set up the
columns manually, though).

Calling the C<search> method with a hash reference as the only argument
invokes the simple search functionality, returning a L<Data::Stream::Bulk>
with the results:

    my $stream = $dir->search({ name => "Homer Simpson" });

    while ( my $block = $stream->next ) {
        foreach my $object ( @$block ) {
            # $object->name eq "Homer Simpson"
        }
    }

This exact API is intentionally still underdefined. In the future it is
intended to be compatible with L<DBIx::Class> 0.09's syntax.

=head2 DBI SEARCH COLUMNS

In order to make use of the simple search API we need to configure columns
for our DBI backend.
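Why a dedicated column helps can be sketched without a database. In the toy below, C<insert_entry>, C<scan_search> and C<column_search> are hypothetical helpers, not backend APIs. Each entry is a JSON blob like the C<data> column shown earlier. A scan has to decode every blob to match fields, while a C<name> value extracted at write time can answer the query directly, much as an SQL C<WHERE> clause on a real column does.

```perl
use strict;
use warnings;
use JSON::PP;

# Toy store: each entry is a JSON blob, like the 'data' column shown earlier.
my $json = JSON::PP->new->canonical;
my %blobs;          # id => JSON text
my %name_column;    # name => [ids], a stand-in for an extra 'name' column
my $decodes = 0;    # how many blobs had to be deserialized

sub insert_entry {
    my ( $id, $class, %fields ) = @_;
    $blobs{$id} = $json->encode( { __CLASS__ => $class, id => $id, data => {%fields} } );
    # Extracting the column at write time is what makes lookups cheap later.
    push @{ $name_column{ $fields{name} } }, $id if defined $fields{name};
}

# Simple search by scanning: every blob is decoded and compared.
sub scan_search {
    my (%query) = @_;
    my @hits;
    for my $id ( sort keys %blobs ) {
        my $data = $json->decode( $blobs{$id} )->{data};
        $decodes++;
        push @hits, $id
            unless grep { !defined $data->{$_} || $data->{$_} ne $query{$_} } keys %query;
    }
    return @hits;
}

# Column search: answered from the extracted column, no blobs touched.
sub column_search {
    my ($name) = @_;
    return sort @{ $name_column{$name} || [] };
}
```

With two entries stored, a scan for C<Homer Simpson> decodes both blobs, while the column lookup returns the same ID without decoding anything.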
Let's create a 'name' column to search by:

    my $dir = KiokuDB->connect(
        "dbi:SQLite:dbname=foo",
        columns => [
            # specify extra columns for the 'entries' table
            # in the same format you pass to DBIC's add_columns

            name => {
                data_type   => "varchar",
                is_nullable => 1, # probably important
            },
        ],
    );

You can either alter the schema manually, or use C<kioku dump> to back up
your data, delete the database, connect with C<< create => 1 >> and then use
C<kioku load>.

To populate this column we'll need to load Homer and update him:

    {
        my $s = $dir->new_scope;
        $dir->update( $dir->lookup( $homer_id ) );
    }

And this is what it looks like in the database:

       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     name = Homer Simpson

=head1 GETTING STARTED WITH BDB

The most mature backend for L<KiokuDB> is L<KiokuDB::Backend::BDB>. It
performs very well, and supports many features, like transactions and
L<Search::GIN> integration for customized indexing of your objects.

L<KiokuDB::Backend::DBI> is newer and not as well tested, but also supports
transactions and L<Search::GIN> based queries. It performs quite well too,
but isn't as fast as L<KiokuDB::Backend::BDB>.

=head2 Installing L<KiokuDB::Backend::BDB>

L<KiokuDB::Backend::BDB> needs the L<BerkeleyDB> module and a recent version
of Berkeley DB itself.

BerkeleyDB (the library) normally installs into a different directory than
the one L<BerkeleyDB> (the module) looks in, so adding a symbolic link from
one to the other should make installation easy.

Once you have L<BerkeleyDB> installed, L<KiokuDB::Backend::BDB> should
install without problems and you can use it with L<KiokuDB>.

=head2 Using L<KiokuDB::Backend::BDB>

To use the BDB backend we must first create the storage. To do this the
C<create> flag must be passed:

    use KiokuDB::Backend::BDB;
    use Path::Class;

    my $backend = KiokuDB::Backend::BDB->new(
        manager => {
            home   => Path::Class::Dir->new(qw(path to storage)),
            create => 1,
        },
    );

The BDB backend uses L<BerkeleyDB::Manager> to do a lot of the L<BerkeleyDB>
gruntwork. The L<BerkeleyDB::Manager> object will be instantiated using the
arguments provided in the C<manager> attribute.

Now that the storage is created we can make use of this backend, much like
before:

    my $dir = KiokuDB->new( backend => $backend );

Subsequent opens will not require the C<create> argument to be true, but it
doesn't hurt.
This C<connect> call is equivalent to the above:

    my $dir = KiokuDB->connect( "bdb:dir=path/to/storage", create => 1 );

=head1 TRANSACTIONS

Some backends (ones which do the L<KiokuDB::Backend::Role::TXN> role) can be
used with transactions. If you are familiar with L<DBIx::Class> this should
feel very familiar:

    $dir->txn_do(sub {
        $dir->store($obj);
    });

This will create a backend level transaction, and all changes to the
database are committed if the block executes cleanly. If any error occurs,
the transaction is rolled back, and the changes will not be visible to
subsequent reads.

Note that L<KiokuDB> does B<not> touch live instances, so if you do
something like

    $dir->txn_do(sub {
        my $scope = $dir->new_scope;

        $obj->name("Dancing Hippy");
        $dir->store($obj);

        die "an error";
    });

the C<name> attribute is B<not> rolled back; it is simply the C<store>
operation that gets reverted.

Transactions nest properly, and with most backends they generally increase
write performance as well.

=head1 QUERIES

L<KiokuDB::Backend::BDB::GIN> is a subclass of L<KiokuDB::Backend::BDB> that
provides L<Search::GIN> integration.

L<Search::GIN> is a framework to index and query objects, inspired by
Postgres' internal GIN API. GIN stands for Generalized Inverted Indexes.

Using L<Search::GIN>, arbitrary search keys can be indexed for your objects,
and these objects can then be looked up using queries.

For instance, one of the pre-canned searches L<Search::GIN> supports out of
the box is class indexing. Here, instead, let's use
L<Search::GIN::Extract::Callback> to do custom indexing of our objects:

    use KiokuDB::Backend::BDB::GIN;
    use Search::GIN::Extract::Callback;
    use Search::GIN::Query::Manual;

    my $dir = KiokuDB->new(
        backend => KiokuDB::Backend::BDB::GIN->new(
            extract => Search::GIN::Extract::Callback->new(
                extract => sub {
                    my ( $obj, $extractor, @args ) = @_;

                    if ( $obj->isa("Person") ) {
                        return {
                            type => "person",
                            name => $obj->name,
                        };
                    }

                    return;
                },
            ),
        ),
    );

    $dir->store( @random_objects );

To look up the objects, we use a manual key lookup query. The C<type> value
must match what the extractor returned:

    my $query = Search::GIN::Query::Manual->new(
        values => {
            type => "person",
        },
    );

    my $stream = $dir->search($query);

The result is a L<Data::Stream::Bulk> object that represents the search
results.
It can be iterated as follows:

    while ( my $block = $stream->next ) {
        foreach my $person ( @$block ) {
            print "found a person: ", $person->name, "\n";
        }
    }

Or even more simply, if you don't mind loading the whole result set into
memory:

    my @people = $stream->all;

L<Search::GIN> is very much in its infancy, and is very under documented.
However it does work for simple searches such as this and contains pre-canned
solutions like L<Search::GIN::Extract::Class>.

In short, it works today, but watch this space for new developments.
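To make the indexing and iteration above more concrete, here is a toy inverted index in the spirit of L<Search::GIN>. C<ToyIndex> and C<ToyStream> are hypothetical and do not reproduce the actual APIs of Search::GIN or Data::Stream::Bulk. An extractor callback returns key/value pairs for each object. A manual query returns the objects indexed under all of the requested pairs, handed out in blocks by a stream with C<next> and C<all> methods.

```perl
use strict;
use warnings;

# Hypothetical toy inverted index, loosely modelled on the ideas above.
package ToyIndex {
    sub new {
        my ( $class, %args ) = @_;
        return bless { extract => $args{extract}, index => {}, objects => {} }, $class;
    }

    # Index the object under every key/value pair its extractor returns.
    sub store {
        my ( $self, $id, $obj ) = @_;
        $self->{objects}{$id} = $obj;
        my $keys = $self->{extract}->($obj) || {};
        while ( my ( $k, $v ) = each %$keys ) {
            $self->{index}{"$k:$v"}{$id} = 1;
        }
    }

    # Manual query: IDs present under *all* requested key/value pairs.
    sub search {
        my ( $self, %values ) = @_;
        my @sets = map { $self->{index}{"$_:$values{$_}"} || {} } sort keys %values;
        my @ids  = grep { my $id = $_; !grep { !$_->{$id} } @sets }
                   sort keys %{ $sets[0] || {} };
        # Block size 2 is an arbitrary choice for the sketch.
        return ToyStream->new( [ map { $self->{objects}{$_} } @ids ], 2 );
    }
}

# Bulk stream: next() returns array refs of up to $size items, then undef.
package ToyStream {
    sub new {
        my ( $class, $items, $size ) = @_;
        return bless { items => [@$items], size => $size }, $class;
    }

    sub next {
        my $self = shift;
        return unless @{ $self->{items} };
        return [ splice @{ $self->{items} }, 0, $self->{size} ];
    }

    # Drain the remaining blocks into one flat list.
    sub all {
        my $self = shift;
        my @all;
        while ( my $block = $self->next ) {
            push @all, @$block;
        }
        return @all;
    }
}
```

Searching for C<< type => "person" >> matches every object for which the extractor returned that pair, and adding C<< name => "Homer" >> narrows the intersection to a single object.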