=head1 NAME

DBIx::DataModel::Doc::Design - Architecture and design principles

=head1 DOCUMENTATION CONTEXT

This chapter is part of the C<DBIx::DataModel> manual.

=over

=item *

L

=item *

DESIGN

=item *

L

=item *

L

=item *

L

=item *

L

=item *

L

=back

This chapter covers the basic architecture of C<DBIx::DataModel>, and
the main motivating principles for proposing yet another ORM. Read it
if you are currently evaluating whether C<DBIx::DataModel> is suitable
for your context, or if you want to globally understand how it works.
Skip it and jump to the L chapter if you want to directly start using
the framework.

=head1 GENERAL ARCHITECTURE

=head2 Classes

The following picture shows the class hierarchy:

                      FRAMEWORK CLASSES
                      =================
      +-----------------+      +----------------------------+
      | DBIx::DataModel |      | DBIx::DataModel::Statement |
      +-----------------+      +----------------------------+

                  +-----------------------+
                  | DBIx::DataModel::Base |
                  +-----------------------+
                       /             \
                      /               \
  +-------------------------+   +-------------------------+
  | DBIx::DataModel::Schema |   | DBIx::DataModel::Source |
  +-------------------------+   +-------------------------+
      |                                 /         \
      |                                /           \
      |          +------------------------+   +-----------------------+
      |          | DBIx::DataModel::Table |   | DBIx::DataModel::View |
      |          +------------------------+   +-----------------------+
      |                      |                              /
  ====|======================|=============================/========
      |                      |     APPLICATION CLASSES    /
      |                      |     ===================   /
      |                      |                          /
 +----------+      +-------------------+               /
 | MySchema |      | MySchema::Table_n |-+            /
 +----------+      +--+----------------+ |-+         /
                      +--+-------\-------+ |        /
                         +--------\--\-----+       /
                                   \  \  \        /
                                 +---------------------+
                                 | auto_generated_view +-+
                                 +--+------------------+ |-+
                                    +--+-----------------+ |
                                       +-------------------+

The top half of the picture represents the parent classes distributed
with C<DBIx::DataModel>. The bottom half represents derived classes
created for a given application.
Most objects created during the lifetime of the application will be
either instances of those application-specific classes (tables and
views), or instances of the C<DBIx::DataModel::Statement> class.

The entry class L<DBIx::DataModel> is just a façade interface to
L<DBIx::DataModel::Schema>. The helper class
L<DBIx::DataModel::Statement> implements the C<select> method.

Subclasses of L<DBIx::DataModel::Schema> are created by the C<Schema>
method in C<DBIx::DataModel>; in most cases only one such class will
be needed, unless the application talks to several databases
simultaneously.

Subclasses of L<DBIx::DataModel::Table> represent tables in the
database and are created by the C<Table> method in
L<DBIx::DataModel::Schema>.

Subclasses of L<DBIx::DataModel::View> represent specific SQL queries,
in particular queries that join several tables. They may be created
explicitly by calling the C<View> method in
L<DBIx::DataModel::Schema>; but in most cases they will be indirectly
created through calls to the C<join> method.

C<View> subclasses use multiple inheritance: they inherit first from
L<DBIx::DataModel::View>, but also from the supplied list of
I<parent tables>. As a result, instances of such views can exploit all
role methods of their parent tables.

=head2 Instances

Data rows retrieved from the database are encapsulated as instances of
the application-specific C<Table> and C<View> subclasses. Methods in
those objects are either various ways to navigate through the
associations in the database and retrieve related rows, or methods to
modify the data.

A request to the database is encapsulated as an instance of
L<DBIx::DataModel::Statement>. This instance has methods for preparing
the SQL query, binding parameters to it, executing the query, and
getting at the resulting data rows. Statement instances are usually
short-lived and confined to specific internal parts of the
application, while table or view instances are usually transmitted to
the presentation layers of the application, in order to exploit the
data within reports, forms, etc. Data rows know from which source they
were created, because they are blessed into table or view classes;
but they do not know from which statement they were queried.

In contrast with some other ORMs, C<DBIx::DataModel::Schema>
subclasses have no runtime instances: all information is within the
schema subclass. This design is discussed below in the L section.

=head2 Polymorphic methods

Methods C<join> and C<select> are polymorphic: they can be called on
several kinds of objects in order to get at data rows.

=head3 Polymorphic C<select>

The C<select> method, when applied to a table or a view, is a class
method that generates a statement object, and returns either that
object, or something generated by that object (data rows, SQL code, or
a low-level C<DBI> handle). Users can control the return value through
the C<-resultAs> parameter.

When applied to an already existing statement object, C<select> is
meant to start from some kind of data source, and yield either
immediate data rows or some intermediate object that later will
produce data rows.

=head1 STATEMENT OBJECTS

The following section is about I<statement objects> and their role in
the general C<DBIx::DataModel> architecture.

=head2 Difference between views and statements

Both views and statements encapsulate SQL SELECT queries, so some
clarification is in order.

A view is a I<subclass> of C<DBIx::DataModel::View>, and therefore
also a subclass of C<DBIx::DataModel::Source>. Data rows retrieved
from that source become instances of the view. The view usually
encapsulates a database join, and the table classes corresponding to
the joined tables are also parent classes for the view (multiple
inheritance). This means that instances of the view inherit all parent
methods for manipulating columns, navigating through associations,
etc. The view may include a WHERE clause to restrict the database
query, but this is very infrequent: the main purpose of a view is to
encapsulate a join.

By contrast, a statement is an I<instance> of
C<DBIx::DataModel::Statement>, most frequently without any
subclassing, that represents a particular request to a particular data
source (a data source is a subclass of C<DBIx::DataModel::Source>,
i.e. either a table or a view), and technically encapsulates a
reference to a C<DBI> statement handle (sth).
The statement object goes through a I<lifecycle> with the following
steps: assembling query clauses, generating the SQL, binding values,
preparing the database statement, executing the database query,
retrieving the results and blessing them into appropriate classes.
Statements also have pagination methods to walk through the results in
chunks of several data rows.

With respect to the generated SQL, we could say in short that a view
represents the FROM clause of the SQL, while a statement represents
all other clauses (list of columns, WHERE, ORDER BY, GROUP BY, etc.).

=head2 Stepwise building of the SQL query

=head3 Principle

A statement object can accumulate requirements in several steps,
before generating the actual database query. Therefore it is a
collaborative platform where various independent software components
can contribute to various parts of the final SQL.

=head3 Example

This is useful for example when doing something like

  # create a statement with initial conditions on the department
  my $statement = $department->join(qw/activities employee/);

  # add a date condition (from config file or CGI params or whatever)
  my $date = get_initial_date_from_some_external_source();
  $statement->refine(-where => {d_begin => {">" => $date}});

  # now issue the SQL query
  my $rows = $statement->select(-columns => [qw/d_begin lastname firstname/]);

This code generates the following SQL:

  SELECT d_begin, lastname, firstname
  FROM   activity INNER JOIN employee
                  ON activity.emp_id=employee.emp_id
  WHERE  dpt_id  = $department->{dpt_id}
    AND  d_begin > $date

Behind the scenes, the C<join> method first created a view
representing the database join between C<activity> and C<employee>;
then it created a statement object that would query that view with an
initial condition on C<dpt_id>. The C<refine> call added a second
condition on C<d_begin>. Finally the C
and C subclasses, all information in C is set by "compile-time
methods" (methods starting with an uppercase letter) and then stays
immutable: joins, primary keys, etc.; so there is no potential
conflict.

=head2 Why no accessor methods for columns ?

The philosophy of C<DBIx::DataModel> is that a record is nothing more
than a blessed hashref, where hash keys are column names and hash
values are column values. So the recommended way of accessing the data
is through the hashref API: this allows you to exploit all common Perl
idioms, like

  my @column_names = keys %$row;                 # inspect hash keys
  s/^\s+// foreach values %$row;                 # remove leading spaces in all columns
  print @{$row}{qw/col1 col2 col3/};             # print a slice
  ($row->{col1}, $row->{col2}) = ($row->{col2}, $row->{col1}); # swap values
  @{$row}{qw/col1 col2/} = @{$row}{qw/col2 col1/};             # idem

Now if you insist, there is the C<Autoload> method which will give you
column accessors. As the name suggests, this relies on Perl's AUTOLOAD
mechanism, and therefore will be a bit slower than generating all
accessors explicitly at compile time (through L or something similar).
Pre-compiling accessor methods is not possible in C<DBIx::DataModel>,
because column names are never known in advance: two instances of the
same Table do not necessarily hold the same set of columns, depending
on what was requested when doing the C<select>.

=head2 Serialization

C<DBIx::DataModel> includes support for the standard L<Storable>
serialization / deserialization methods C<freeze> and C<thaw>, so
records and record trees can be written into files or sent to other
processes. Dynamic subclasses for database joins are re-created on the
fly during deserialization through C<thaw>. However, there is no
support for serializing database connections (this would be hazardous,
and also insecure because serialization data would contain database
passwords). Therefore the process performing deserialization is
responsible for opening the database connection by its own means,
before calling the C<thaw> method.
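
The serialization round trip described above can be sketched with the
core C<Storable> module alone. This is only an illustration under
stated assumptions: the class C<My::Demo::Employee> and its
C<fullname> method are hypothetical stand-ins for an
application-specific table class, no database or C<DBIx::DataModel>
class is involved, and the re-creation of join subclasses is not
shown. It merely demonstrates that a record, being a blessed hashref,
keeps both its data and its class through C<freeze> and C<thaw>.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for an application table class (assumption:
# a real application would get this class from the framework).
package My::Demo::Employee;

sub fullname {
  my $self = shift;
  return "$self->{firstname} $self->{lastname}";
}

package main;

# A record is just a blessed hashref: keys are column names.
my $row = bless {emp_id => 1, firstname => 'Johann', lastname => 'Bach'},
                'My::Demo::Employee';

my $frozen = freeze($row);   # serialize into a byte string
my $copy   = thaw($frozen);  # rebuild an independent copy, still blessed

print ref($copy), "\n";      # class name survives the round trip
print $copy->fullname, "\n"; # so methods of the class still apply
```

The frozen string could equally be written to a file or sent to
another process; that process must have the record's class loaded,
and must open any database connection itself before thawing.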

=head1 TO DO

Here are some points that hopefully will be improved in a future
release of C<DBIx::DataModel>:

  - 'hasInvalidColumns' : should be called automatically before
    insert/update ?
  - 'validate' record handler (not only column handlers)
  - 'normalize' handler : for ex. transform empty string into null
  - walk through WHERE queries and apply 'toDB' handler (not obvious!)
  - decide what to do with multiple inheritance of role methods in
    Views; use NEXT ?
  - maybe it is not a good idea to modify data in place when
    performing inserts or updates; should perhaps clone the arguments.
  - more extensive and more organized testing
  - add support for UPDATE/DELETE ... WHERE ...
  - add PKEYS keyword in -columns, will be automatically replaced by
    names of primary key columns of the touched tables
  - design API for easy dynamic association of objects without
    dealing with the keys
  - remove spouse example from doc (because can't have same table
    twice in roles)
  - quoting
  - DBI catalog and schema
  - dbiPrepareMethod as argument to select()
  - pre/post callbacks: support arrays of handlers, refine(..) adds
    to the array
  - refine(-orderBy => ..) should add to the ordering
  - warn about missing FK in object when trying to follow a join
  - reflection methods (list of roles, etc.)
  - update with subtrees (insert/update on dependent records.
    Quid: delete?)
  - auto-unjoin (API for partitioning columns into subobjects).