DAS Workshop 2010 ProServer Tutorial

Part 2

Andy Jenkinson, EMBL-EBI, 7th April 2010

Overview

Now that you have completed Part 1 of the tutorial and so are familiar with running and configuring ProServer, let us imagine you wish to run a DAS server from some annotations generated in your lab. The data are in a custom file format, for which you will now write your own SourceAdaptor plugin.

SourceAdaptor plugins

Each data source in ProServer is backed by an object that is a subclass of Bio::Das::ProServer::SourceAdaptor. Note that each data source gets its own instance, and thus the same class/module could be used independently by different data sources. In addition to subclassing Bio::Das::ProServer::SourceAdaptor, each adaptor must exist in the Bio::Das::ProServer::SourceAdaptor namespace (e.g. Bio::Das::ProServer::SourceAdaptor::myplugin):

package Bio::Das::ProServer::SourceAdaptor::myplugin;

use strict;
use base qw(Bio::Das::ProServer::SourceAdaptor);

# ... code ...

1;

Implementing commands

The contract of a SourceAdaptor is to provide the data for a DAS query in a data structure that the ProServer core can understand. In simple terms, this is done by implementing a single method for each DAS command. For example, most DAS sources implement the DAS features command, which is done by creating a build_features method in the adaptor. This method is passed a hashref containing the request arguments, and is expected to return a list of hashrefs, each representing a single annotation. This pseudocode gives an example:

sub build_features {
  my ($self, $args) = @_;

  # e.g. /das/mysource/features?segment=X:1,100
  my $segment = $args->{'segment'}; # X
  my $start   = $args->{'start'};   # 1
  my $end     = $args->{'end'};     # 100

  return (
    {
      # feature 1 ...
    },
    {
      # feature 2 ...
    },
    # etc
  );
}

The 'init' method

The init method is called by ProServer when each data source's adaptor object is created, and is useful for initialising your plugin and setting metadata. One of its uses is to tell ProServer which DAS commands your data source supports. ProServer needs this information before it will forward DAS requests to the plugin. You can declare support for the features command like this:

sub init {
  my $self = shift;
  $self->{'capabilities'} = {'features' => '1.1'};
}

Creating a new SourceAdaptor

Your goal is to build a plugin to serve annotations from a custom file. The example file exons.txt contains a few exons annotated onto human chromosome 5, with fields separated by pipe ("|") characters. The data were taken from the Ensembl database.

Create the adaptor file

Using the code templates in the preceding section, create a new adaptor module in the Perl library path. You may find it easiest to do this within ProServer's "lib" directory:

cd ~/Bio-Das-ProServer
touch lib/Bio/Das/ProServer/SourceAdaptor/myplugin.pm

Fill the file with the template code from the preceding sections (the package declaration, the init method and the build_features method) using a text editor.

Read the file contents

You now have the shell of a new plugin, but so far it will not do very much. Your plugin will need to read the contents of the file line by line in order to extract its information.
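As a minimal sketch of reading a file line by line in plain Perl, the helper below returns the non-empty lines of a text file. The name read_lines is hypothetical and not part of ProServer; where you store exons.txt, and how the plugin learns that path, are left to you.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ProServer): returns the non-empty lines
# of a text file, with trailing newlines removed.
sub read_lines {
  my ($path) = @_;

  open my $fh, '<', $path or die "Cannot open $path: $!";

  my @lines;
  while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;   # skip blank lines
    push @lines, $line;
  }

  close $fh or die "Cannot close $path: $!";
  return @lines;
}
```

Inside build_features you could then loop over read_lines($path), where $path is wherever you saved exons.txt. Re-reading the file on every request keeps the sketch simple; for a larger file you might prefer to read it once and cache the result.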

Create annotation hashrefs

Next we need to translate the contents of the file into DAS features, and select those that overlap the queried region of sequence (given by the segment, start and end parameters). Take a look at the file contents. Each line of our file describes an exon, along with the transcript and gene to which it belongs:

|5|ENSG00000153404|140373|190087|ENST00000398036|140373|157131|ENSE00001648483|156888|157131|

Here we have the following fields:

  1. chromosome
  2. gene ID
  3. gene start
  4. gene end
  5. transcript ID
  6. transcript start
  7. transcript end
  8. exon ID
  9. exon start
  10. exon end
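The field layout above could be turned into a parser along the following lines. This is only a sketch: the helper name parse_exon_line and the hash keys (gene_id, exon_start and so on) are our own choices for illustration, not anything ProServer requires.

```perl
use strict;
use warnings;

# Field names in file order; these hash keys are our own choice.
my @FIELD_NAMES = qw(
  chromosome
  gene_id       gene_start       gene_end
  transcript_id transcript_start transcript_end
  exon_id       exon_start       exon_end
);

# Hypothetical helper: turns one line of exons.txt into a hashref.
sub parse_exon_line {
  my ($line) = @_;
  chomp $line;

  # Each line starts and ends with a pipe, so strip those first.
  $line =~ s/^\|//;
  $line =~ s/\|$//;

  my @values = split /\|/, $line;
  die "Expected 10 fields, got " . scalar(@values) . ": $line"
    if @values != @FIELD_NAMES;

  my %record;
  @record{@FIELD_NAMES} = @values;   # hash slice: pair names with values
  return \%record;
}

my $record = parse_exon_line(
  '|5|ENSG00000153404|140373|190087|ENST00000398036|140373|157131|ENSE00001648483|156888|157131|'
);
print "$record->{'exon_id'} spans $record->{'exon_start'}-$record->{'exon_end'}\n";
# prints "ENSE00001648483 spans 156888-157131"
```

Stripping the leading and trailing pipes before splitting avoids an empty first field, which would otherwise shift every column by one.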

Firstly, the chromosome field is our DAS segment ID. Note, however, that each line actually contains three DAS annotations: one for the exon, one for the transcript, and one for the gene. The gene and transcript annotations are duplicated on different lines. The build_features method must return an array containing one hashref for each annotation that overlaps the query segment. Each hashref should contain the following information:

Now modify the plugin to do the following:

  1. parse each field
  2. check whether the gene, transcript or exon overlaps the query segment
  3. filter duplicate transcripts and genes
  4. create hashrefs for each annotation
  5. return an array of the selected hashrefs
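One possible shape for steps 2 to 5 is sketched below as two plain Perl subroutines. Several things here are assumptions rather than ProServer facts: the helper names overlaps and select_features are hypothetical, the input records are assumed to be hashrefs keyed as chromosome, gene_id, gene_start, gene_end and likewise for transcript and exon, and the feature hash keys (id, type, start, end) should be checked against the ProServer documentation.

```perl
use strict;
use warnings;

# True if the closed interval [$start, $end] overlaps the query range.
# An undefined query start or end is treated as "whole segment".
sub overlaps {
  my ($start, $end, $q_start, $q_end) = @_;
  return 0 if defined $q_end   && $start > $q_end;
  return 0 if defined $q_start && $end   < $q_start;
  return 1;
}

# Hypothetical helper: takes parsed records (hashrefs keyed by field name)
# plus the query, and returns one feature hashref per distinct annotation.
sub select_features {
  my ($records, $segment, $q_start, $q_end) = @_;

  my %seen;       # feature IDs already handled (filters duplicates)
  my @features;

  for my $rec (@{$records}) {
    next if $rec->{'chromosome'} ne $segment;

    for my $type (qw(gene transcript exon)) {
      my $id    = $rec->{"${type}_id"};
      my $start = $rec->{"${type}_start"};
      my $end   = $rec->{"${type}_end"};

      next if $seen{$id}++;   # genes and transcripts repeat across lines
      next if !overlaps($start, $end, $q_start, $q_end);

      # Hash keys are an assumption; check them against the ProServer docs.
      push @features, {
        'id'    => $id,
        'type'  => $type,
        'start' => $start,
        'end'   => $end,
      };
    }
  }

  return @features;
}
```

In build_features this could be called as select_features(\@records, $args->{'segment'}, $args->{'start'}, $args->{'end'}), with @records built from the file however you choose to parse it. Tracking seen IDs in a hash is what prevents the same gene or transcript from being returned once per exon line.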

Create an INI file

Your plugin is now complete. To test it you will need a configuration file. Save the following as eg/mysource.ini:

[mysource]
adaptor = myplugin
state   = on

Rebuild and run the server

Now rebuild ProServer, and run it with the new configuration:

./Build
eg/proserver -x -c eg/mysource.ini

And see if it works:

http://localhost:9000/das/mysource/features?segment=5:144942,155558