=pod

=head1 NAME

Tutorial - Hands-on tutorial for using the Bio::NEXUS module.

=head1 DESCRIPTION

Tutorial to get started using the Bio::NEXUS module.

=head2 INTRODUCTION

The NEXUS file format standard of Maddison, et al. (1997) is designed to represent sets of data, including character data (e.g., molecular sequence alignments, morphological character sets), trees, assumptions about models and methods, meta-information such as comments, and so on. L<Bio::NEXUS> is an object-oriented Perl application programming interface (API) for the NEXUS file format. Accordingly, Bio::NEXUS provides methods for managing character data, trees, assumptions, meta-information, and so on, via the NEXUS format.

This tutorial provides a quick introduction to developing applications that carry out basic manipulations with data sets in NEXUS files, as well as importing data sets from (and exporting to) foreign data formats (using BioPerl and L<Bio::NEXUS>).

You may wish to continue reading, and to complete the tutorial exercises, I<if>

=over

=item *

you have a set of data (e.g., a sequence alignment) that you want to manipulate or analyze in some way (the data do not need to be in NEXUS format: we will show you how to import it)

=item *

you want to write your own scripts (applications) in order to achieve automation, flexibility and control

=item *

you know (or are willing to learn) how to program in Perl (one of the easiest computer languages)

=back

=head3 Structure of the tutorial

This tutorial is organised into seven sections:

=over

=item *

This introduction, explaining the content and requirements

=item *

A quick start to using Perl and L<Bio::NEXUS> on your system

=item *

Exercises involving basic manipulations

=item *

Examples of converting to and from foreign file formats using BioPerl

=item *

An advanced example

=item *

A brief introduction to using some tools built with L<Bio::NEXUS> (nexplot.pl, nextool.pl, and Nexplorer)

=item *

Information on where to go from here

=back

=head3 Requirements

Bio::NEXUS naturally requires Perl, but
it does not require any non-standard Perl modules. To carry out the tutorial exercises below, you must have a UN*X (or UN*X work-alike, or Windows) shell, an installation of Perl, and an installation of Bio::NEXUS. For the format conversion exercises, you also need an installation of BioPerl (see www.bioperl.org, or simply run the command "perl -MCPAN -e'install Bundle::Bio'"). For the advanced exercise, there are additional requirements as described below.

If you have tried to install these things and they do not work, the most likely cause is that you do not have permission to do the default system-wide installation, or that you have not issued the correct commands for a custom user-specific installation. See the Bio::NEXUS installation guide for further information.

=head3 Notation

The following conventions are used in this tutorial.

=over

=item *

C<system$> is the command prompt for the shell running in your terminal window.

=item *

C<Fixed-width font> is used for Perl code and for output produced in the shell. It is also used for shell commands shown after the command prompt C<system$>.

=item *

I<Italic font> is used for shell commands that are NOT shown after the command prompt.

=back

=head2 Getting started with your UN*X shell, Perl, and Bio::NEXUS

Before getting started with Bio::NEXUS methods, open a terminal window and check a few things using your shell (i.e., a UNIX or UNIX work-alike shell, or a Windows shell).

=over

=item *

Check for Perl. If it isn't installed, have your system administrator install it.

    system$ perl -e 'print "hello!\n" '
    hello!

=item *

Check for Bio::NEXUS. If it isn't installed, read the Bio::NEXUS installation document.

    system$ perl -MBio::NEXUS -e 'print "hello!\n" '
    hello!

=item *

Check that nextool.pl and nexplot.pl are in your $PATH (if not, read the installation docs).

    system$ nextool.pl -h
    (this should result in a page of command-line options)
    system$ nexplot.pl -h
    (this should result in a page of command-line options)

=item *

Execute commands saved in a file

    system$ echo ' print "hello!\n"; ' > my_commands.pl
    system$ perl my_commands.pl
    hello!

=item *

Make an executable script (note where the semi-colon and >> symbol are used)

    system$ echo '#!/usr/bin/env perl' > my_script.pl
    system$ echo 'print "hello!\n"; ' >> my_script.pl
    system$ echo 'exit;' >> my_script.pl
    system$ chmod +x my_script.pl
    system$ cat my_script.pl
    #!/usr/bin/env perl
    print "hello!\n";
    exit;
    system$ ./my_script.pl
    hello!

=back

=head2 Basic manipulations with trees, OTU sets, and characters

=head3 Creating the example1.nex NEXUS file used in several exercises

=over

=item I

For the first few exercises, we will use a sample NEXUS file with a taxa block, a characters block and a trees block. Please create a file named "example1.nex" containing the following text:

=item example1.nex

    #NEXUS

    BEGIN TAXA;
        DIMENSIONS ntax=4;
        TAXLABELS A B C D;
    END;

    BEGIN CHARACTERS;
        DIMENSIONS ntax=4 nchar=25;
        FORMAT DATATYPE=protein;
        MATRIX
        A IKKGANLFKTRCAQCHTVEKDGGNI
        B LKKGEKLFTTRCAQCHTLKEGEGNL
        C STKGAKLFETRCKQCHTVENGGGHV
        D LTKGAKLFTTRCAQCHTLEGDGGNI
        ;
    END;

    BEGIN TREES;
        TREE my_tree = (((A:1,B:1):1,D:0.5):1,C:2)root;
    END;

Use the "cat" command to check the file (this should reproduce the text given above):

    system$ cat example1.nex

=item I

NEXUS files can have many different types of blocks. Each block contains commands; for example, the TAXA block has two possible commands, DIMENSIONS and TAXLABELS. Some further blocks and commands will be introduced in the examples below. A complete presentation of the NEXUS standard is given by Maddison, D.R., D.L. Swofford, and W.P. Maddison (1997), "NEXUS: an extendible file format for systematic information" (I<Systematic Biology> 46: 590-621).

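The block-and-command structure described above can be illustrated with a minimal sketch in plain Perl. It deliberately does not use the Bio::NEXUS API: C<list_blocks> is a hypothetical helper written only for this illustration, and it assumes a simple file in which every block ends with C<END;> and no comment contains a semicolon. For real work, Bio::NEXUS handles the full standard.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::NEXUS): returns one
# [ block_name, [ command_names ] ] pair per block found in the text.
sub list_blocks {
    my ($text) = @_;
    my @blocks;

    # A block runs from "BEGIN <name>;" to the next "END;".
    while ( $text =~ /\bBEGIN\s+(\w+)\s*;(.*?)\bEND\s*;/gis ) {
        my ( $name, $body ) = ( uc $1, $2 );

        # Commands end with a semicolon; the first word names the command.
        my @commands = map { /^\s*(\w+)/ ? uc $1 : () } split /;/, $body;
        push @blocks, [ $name, \@commands ];
    }
    return @blocks;
}

# Two of the blocks from example1.nex, embedded so the sketch is self-contained.
my $nexus_text = <<'END_NEXUS';
#NEXUS
BEGIN TAXA;
    DIMENSIONS ntax=4;
    TAXLABELS A B C D;
END;
BEGIN TREES;
    TREE my_tree = (((A:1,B:1):1,D:0.5):1,C:2)root;
END;
END_NEXUS

for my $block ( list_blocks($nexus_text) ) {
    my ( $name, $commands ) = @$block;
    print "$name: @$commands\n";
}
```

Saved to a file and run with perl, this prints one line per block: the block name followed by the names of its commands.
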
=back

=head3 Renaming some or all OTU names

=over

=item I

We often need to rename OTUs systematically for the sake of compatibility.

=item I