LVB Manual – LVB phylogeny program, version 3.0 Beta
COPYRIGHT
Part of this document is based on PHYLIP documentation (see ACKNOWLEDGEMENTS).
The PHYLIP component of this document:
© Copyright 1986-2000 by the University of Washington. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.
The remainder of this document:
© Copyright 2003-2012 by Daniel Barker.
© Copyright 2013 by Daniel Barker and Maximilian Strobl.
Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.
lvb seeks parsimonious trees from an aligned nucleotide data matrix. It uses heuristic searches consisting of simulated annealing followed by hill-climbing. In contrast to the more usual heuristic searches used to find parsimonious trees (e.g. stepwise addition followed by hill-climbing), simulated annealing can ‘jump out’ of local optima. Especially with large, complex data matrices, the simulated annealing heuristic may run faster and/or find a shorter tree.
CITING LVB
Please cite the following paper if you use LVB:
Barker, D. 2004. LVB: Parsimony and simulated annealing in the search for phylogenetic trees. Bioinformatics, 20, 274-275.
The following may also be relevant:
LVB. https://eggg.st-andrews.ac.uk/lvb.
Barker, D. 1999. Simulated annealing in the Search for Phylogenetic Trees. PhD Thesis, University of Edinburgh.
Barker, D. 1997. LVB 1.0: Reconstructing Evolution with Parsimony and Simulated Annealing (Edinburgh: Daniel Barker).
RUNNING LVB
lvb is a command-line program.
lvb reads the alignment file from the current directory (folder) and writes its main output to a file in the current directory. The user is prompted for the matrix format, the interpretation of gaps in the alignment, the type of simulated annealing heuristic searches to run (with a sensible default), the seed for the pseudorandom number generator (with a sensible default), and whether bootstrap replicates are required (by default, no). Answers are entered using the keyboard.
lvb logs progress information and errors to the screen.
Mac OS X
The Apple Mac OS X version of LVB runs under OS X 10.7 (Lion) on 64-bit Intel-based hardware. It is also expected to run on more recent versions of OS X.
After downloading, extract lvb from the file lvb_3_0_BETA_macos.tar.gz. Once this is done, you may launch it from the Terminal command-line. Terminal is usually found in the Applications/Utilities folder. If lvb is on your desktop, you may launch it by typing the following commands in Terminal:
cd Desktop
./lvb
If lvb is in a directory in your PATH environment variable, it should be accessible in Terminal from any location, as lvb.
Raspberry Pi
The Raspberry Pi version of LVB runs under Raspbian Linux for Raspberry Pi.
After downloading, extract lvb from the file lvb_3_0_BETA_raspi.tar.gz. Once this is done, you may launch it from a terminal command-line. A suitable terminal in Raspbian is LXTerminal, usually found in the Menu under Accessories. If lvb is on your desktop, you may launch it by typing the following commands in LXTerminal:
cd Desktop
./lvb
If lvb is in a directory in your PATH environment variable, it should be accessible in LXTerminal from any location, as lvb.
Other Linux and UNIX
After downloading, compile lvb from the source code (see COMPILING LVB). Once this is done, it may be launched as for Mac OS X or Raspberry Pi. The only difference may be in the mechanism to start a terminal window or remote connection.
Other Systems
It should be possible to compile and run lvb on Windows and many other operating systems, if you have a C compiler. The details will vary, but to help you get started see COMPILING LVB.
INPUT
Keyboard (standard input)
Keyboard input is case-independent. So, for example, where the instructions below suggest you type I, typing i will have the same effect.
Matrix format
lvb can read matrices in PHYLIP 3.6 interleaved or PHYLIP 3.6 sequential format. These are described in the section on infile.
When prompted for the data matrix format, type I or S followed by RETURN for ‘interleaved’ or ‘sequential’, respectively.
Treatment of gaps
See the table under Bases for a list of base codes allowed by lvb.
A gap represented by the letter ‘O’ in the data matrix is always treated as a character state in its own right (fifth state). lvb can treat gaps represented by ‘-’ in either of the following ways:
Fifth state
‘-’ is treated as equivalent to ‘O’.
Unknown
‘-’ is treated as equivalent to ‘?’, i.e., as an ambiguous site that may contain ‘A’ or ‘C’ or ‘G’ or ‘T’ or ‘O’.
When prompted for the treatment of ‘-’, type U or F followed by RETURN for ‘unknown’ or ‘fifth state’, respectively.
‘Fifth state’ may give excessive weight to multi-site gaps, since each affected base position will be counted as one event.
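The two treatments can be pictured as a choice of which character states a gap symbol is allowed to stand for. The following Python sketch is only an illustration of the rules above, not code from lvb; the names gap_states and forced_gap_sites are hypothetical.

```python
# Illustration only: hypothetical helpers, not part of lvb.
BASES = frozenset("ACGT")
FIFTH = frozenset("O")  # the gap as a character state in its own right


def gap_states(symbol, treatment):
    """Return the states a gap-like symbol may stand for.

    treatment is 'F' (fifth state) or 'U' (unknown), mirroring the
    answers accepted at the prompt; both are case-independent.
    """
    treatment = treatment.upper()
    if treatment not in ("F", "U"):
        raise ValueError("treatment must be 'F' (fifth state) or 'U' (unknown)")
    symbol = symbol.upper()
    if symbol == "O":
        return FIFTH                  # always the fifth state
    if symbol == "?":
        return BASES | FIFTH          # always fully ambiguous
    if symbol == "-":
        # '-' follows the user's choice: like 'O' or like '?'
        return FIFTH if treatment == "F" else BASES | FIFTH
    raise ValueError(f"not a gap symbol: {symbol!r}")


def forced_gap_sites(sequence, treatment):
    """Count sites whose only possible state is the fifth state."""
    return sum(
        1
        for ch in sequence
        if ch in "-?Oo" and gap_states(ch, treatment) == FIFTH
    )
```

With ‘fifth state’, a three-site gap such as the one in AC---GT contributes three sites fixed to the gap state, which is the source of the extra weight mentioned above. With ‘unknown’, the same three sites are left ambiguous.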
Cooling schedule
When prompted for the cooling schedule, press RETURN for the default or enter G or L for ‘geometric’ or ‘linear’, respectively.
‘Geometric’ causes lvb to run rapidly and usually gives results of good quality. (In the simulated annealing heuristic search, the relation between one level of the ‘temperature’ and the next is set to exponential decay.) This is the default.
‘Linear’ causes lvb to run more slowly and may give results of even better quality. (The relation between one level of the ‘temperature’ and the next is set to linear decrease.)
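The difference between the two schedules can be sketched in a few lines of Python. This is a generic illustration of exponential decay versus linear decrease, not lvb's own code; the starting temperature, ratio and decrement used here are assumptions chosen for the example, and lvb's actual values are not stated in this manual.

```python
# Illustration only: parameter values are assumptions, not lvb's.

def geometric_schedule(t0, ratio, steps):
    """Temperatures t0, t0*ratio, t0*ratio**2, ... (exponential decay)."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return [t0 * ratio ** k for k in range(steps)]


def linear_schedule(t0, decrement, steps):
    """Temperatures t0, t0-decrement, t0-2*decrement, ... floored at 0."""
    if decrement <= 0.0:
        raise ValueError("decrement must be positive")
    return [max(t0 - k * decrement, 0.0) for k in range(steps)]
```

For example, geometric_schedule(1.0, 0.5, 4) gives 1.0, 0.5, 0.25, 0.125, while linear_schedule(1.0, 0.25, 4) gives 1.0, 0.75, 0.5, 0.25. The geometric schedule drops quickly at first and then ever more slowly; the linear schedule drops by the same amount at each level.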
Random number seed
When prompted for the random number seed, press RETURN for the default or enter an integer in the range 0 to 900000000 inclusive.
The default value is taken from the system clock and hence will vary from one analysis to the next, changing every second. The default is usually appropriate.
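To record a seed for a reproducible analysis, it can help to check it against the permitted range before typing it in. The sketch below is a hypothetical helper, not part of lvb. In particular, the way it derives a default from the clock (whole seconds folded into the permitted range) is an assumption made for illustration; this manual does not say how lvb computes its default.

```python
import time

MAX_SEED = 900000000  # upper limit given in this manual


def parse_seed(answer, now=None):
    """Return a seed from the user's answer, or a clock-based default.

    An empty answer stands for pressing RETURN. The clock-based
    default below is an assumption for illustration only.
    """
    answer = answer.strip()
    if answer == "":
        if now is None:
            now = time.time()
        # Whole seconds, so the value changes once per second.
        return int(now) % (MAX_SEED + 1)
    seed = int(answer)  # raises ValueError if not an integer
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in the range 0 to {MAX_SEED}")
    return seed
```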
Bootstrapping
When prompted for the number of bootstrap replicates, enter the number of replicates required. If bootstrapping is not required, enter the number 0 or just press RETURN.
lvb allows any number of replicates from 1 to 1000000 inclusive. For each replicate, a bootstrap sample of sites in the alignment is generated and analyzed.
For an alignment matrix of m sites, each bootstrap replicate contains m sites, randomly sampled with replacement from the originals. Compared to the original alignment, it is likely that some sites are left out, some are present once, and others are present twice or more. In lvb the probability of including a site is equal for all sites, irrespective of whether the site varies or is constant.
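The resampling described above can be sketched as follows. This is a generic illustration of bootstrap sampling of alignment columns in Python, not lvb's implementation; the function names and the use of Python's random module are assumptions for the example.

```python
import random


def bootstrap_columns(m, rng):
    """Draw m column indices from range(m), uniformly, with replacement."""
    return [rng.randrange(m) for _ in range(m)]


def bootstrap_replicate(sequences, rng):
    """Build one bootstrap replicate of an alignment.

    sequences maps each name to an aligned sequence of equal length.
    The same sampled columns are used for every sequence, so each
    resampled site is still a whole column of the alignment.
    """
    if not sequences:
        raise ValueError("no sequences")
    m = len(next(iter(sequences.values())))
    if any(len(seq) != m for seq in sequences.values()):
        raise ValueError("sequences are not all the same length")
    columns = bootstrap_columns(m, rng)
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in sequences.items()
    }
```

A call such as bootstrap_replicate({"a": "ACGT", "b": "TGCA"}, random.Random(1)) returns two sequences of four sites each. Because every column has the same chance of being drawn each time, constant and variable sites are treated alike.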
The most parsimonious tree(s) for each replicate are output. There will be at least one tree for each replicate. If the search for any replicate found more than one equally parsimonious tree, all are output and the number of trees will exceed the number of replicates. Generation of a consensus from all trees will over-represent those replicates for which more trees were found. If each bootstrap replicate finds a single tree, this is not an issue.
infile
The data matrix must be in a file called infile. lvb expects this file to contain a single nucleotide matrix in PHYLIP 3.6 format.
Layout
The simplest type of data matrix file looks something like this:
6 13
Archaeopt CGATGCTTAC CGC
HesperorniCGTTACTCGT TGT
BaluchitheTAATGTTAAT TGT
B. virginiTAATGTTCGT TGT
BrontosaurCAAAACCCAT CAT
B.subtilisGGCAGCCAAT CAC
The first line of the input file contains the number of sequences and the number of characters (sites). These are in free format, separated by blanks. The information for each sequence follows, starting with a ten-character sequence name (which can include blanks and some punctuation marks), and continuing with the characters for that sequence.
The name should come right at the start of the line, without any preceding blanks or tabs. It should be ten characters in length, filled out to the full ten characters by trailing blanks if shorter. Any printable ASCII/ISO character is allowed in the name, except for parentheses ‘(’ and ‘)’, square brackets ‘[’ and ‘]’, colon ‘:’, semicolon ‘;’ and comma ‘,’. If you forget to extend the names to ten characters in length by blanks, an error message will result.
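When preparing a matrix by script, the name rules above can be checked before the file is written. The Python sketch below is a hypothetical helper, not part of lvb or PHYLIP. Rejecting names longer than ten characters (rather than truncating them) and rejecting names that begin with a blank are choices made for this example.

```python
NAME_WIDTH = 10
FORBIDDEN = set("()[]:;,")  # characters not allowed in names


def pad_name(name):
    """Return name filled out to ten characters with trailing blanks."""
    if not name or name[0] in " \t":
        raise ValueError("name must start at the beginning of the line")
    if len(name) > NAME_WIDTH:
        raise ValueError(f"name longer than {NAME_WIDTH} characters: {name!r}")
    bad = FORBIDDEN.intersection(name)
    if bad:
        raise ValueError(f"forbidden characters in name: {sorted(bad)}")
    if not name.isprintable():
        raise ValueError(f"non-printable character in name: {name!r}")
    return name.ljust(NAME_WIDTH)
```

For example, pad_name("Turkey") returns the six letters followed by four blanks, and pad_name("Salmo gair") is returned unchanged because it already fills the ten-character field.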
The biological characters (bases or gaps) are each a single ASCII character, sometimes separated by blanks.
The sequences can continue over multiple lines. When this is done the sequences must be either in interleaved format or sequential format. In sequential format all of one sequence is given, possibly on multiple lines, before the next starts. In interleaved format the first part of the file should contain the first part of each of the sequences, then possibly a line containing nothing but a carriage-return character, then the second part of each sequence, and so on. Only the first parts of the sequences should be preceded by names. The name must be on the same line as the first character of the data for that sequence. Here is a hypothetical example of interleaved format:
5 42
Turkey    AAGCTNGGGC ATTTCAGGGT
Salmo gairAAGCCTTGGC AGTGCAGGGT
H. SapiensACCGGTTGGC CGTTCAGGGT
Chimp     AAACCCTTGC CGTTACGCTT
Gorilla   AAACCCTTGC CGGTACGCTT
GAGCCCGGGC AATACAGGGT AT
GAGCCGTGGC CGGGCACGGT AT
ACAGGTTGGC CGTTCAGGGT AA
AAACCGAGGC CGGGACACTC AT
AAACCATTGC CGGTACGCTT AA
while in sequential format the same sequences would be:
5 42
Turkey    AAGCTNGGGC ATTTCAGGGT
GAGCCCGGGC AATACAGGGT AT
Salmo gairAAGCCTTGGC AGTGCAGGGT
GAGCCGTGGC CGGGCACGGT AT
H. SapiensACCGGTTGGC CGTTCAGGGT
ACAGGTTGGC CGTTCAGGGT AA
Chimp     AAACCCTTGC CGTTACGCTT
AAACCGAGGC CGGGACACTC AT
Gorilla   AAACCCTTGC CGGTACGCTT
AAACCATTGC CGGTACGCTT AA
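The two layouts differ only in the order in which the pieces of each sequence are written. The following Python sketch writes the same records either way. It is an illustration, not a tool supplied with lvb; the function names, the default of 20 bases per line and the grouping into blocks of ten are assumptions for the example.

```python
NAME_WIDTH = 10  # width of the name field


def _chunks(text, size):
    """Split text into consecutive pieces of at most size characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def _spaced(part):
    """Separate a run of bases into blocks of ten for readability."""
    return " ".join(_chunks(part, 10))


def _check(records):
    """Validate (name, sequence) records and return the site count."""
    if not records:
        raise ValueError("no sequences")
    length = len(records[0][1])
    if length == 0:
        raise ValueError("empty sequences")
    for name, seq in records:
        if len(name) > NAME_WIDTH:
            raise ValueError(f"name too long: {name!r}")
        if len(seq) != length:
            raise ValueError(f"length mismatch for {name!r}")
    return length


def to_sequential(records, per_line=20):
    """All of one sequence is written before the next one starts."""
    length = _check(records)
    lines = [f"{len(records)} {length}"]
    for name, seq in records:
        parts = _chunks(seq, per_line)
        lines.append(name.ljust(NAME_WIDTH) + _spaced(parts[0]))
        lines.extend(_spaced(part) for part in parts[1:])
    return "\n".join(lines) + "\n"


def to_interleaved(records, per_line=20):
    """First part of every sequence (with names), then the second parts, etc."""
    length = _check(records)
    lines = [f"{len(records)} {length}"]
    split = [_chunks(seq, per_line) for _, seq in records]
    for (name, _), parts in zip(records, split):
        lines.append(name.ljust(NAME_WIDTH) + _spaced(parts[0]))
    for k in range(1, len(split[0])):
        lines.extend(_spaced(parts[k]) for parts in split)
    return "\n".join(lines) + "\n"
```

The interleaved writer deliberately emits no blank lines between groups, and neither writer leaves spaces at the end of a line.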
If each sequence only occupies one line in the matrix file, there is no difference between sequential and interleaved format and lvb can read the file in either way. Other than this special case, it is important not to read an interleaved matrix as sequential or a sequential matrix as interleaved. A BADBASE error message often indicates that the wrong format has been specified.
Note that a portion of a sequence like this:
300 AAGCGTGAAC GTTGTACTAA TRCAG
is perfectly legal, assuming that the sequence name has gone before and is filled out to full length by blanks. The above digits and blanks will be ignored, the sequence being taken as starting at the first base symbol (in this case an A). This should enable you to use output from many multiple-sequence alignment programs with only minimal editing.
lvb may have difficulties with spaces at the end of lines. The symptoms of this problem are that lvb complains about a BADBASE, and you can find no other cause for this complaint. The problem may be avoided by deleting any spaces at the end of lines.
In interleaved format the present version of lvb may sometimes have difficulties with the blank lines between groups of lines, and if so you might want to retype those lines, making sure that they have only a carriage-return and no blank characters on them, or you may perhaps have to eliminate them. The symptoms of this problem are that lvb complains that the sequences are not properly aligned, and you can find no other cause for this complaint.
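Both of these whitespace problems can be dealt with by a small clean-up step before running lvb. The Python sketch below is a hypothetical helper, not something supplied with lvb: it removes spaces and tabs at the ends of lines, which also empties lines that contain only blanks, and can optionally drop blank lines altogether.

```python
def clean_matrix_text(text, drop_blank=False):
    """Strip trailing blanks from every line of a matrix file.

    Lines containing only blanks become truly empty. With
    drop_blank=True, empty lines are removed altogether.
    """
    cleaned = [line.rstrip(" \t") for line in text.splitlines()]
    if drop_blank:
        cleaned = [line for line in cleaned if line]
    if not cleaned:
        return ""
    return "\n".join(cleaned) + "\n"
```

It is safest to apply this to a copy of the matrix and to keep the original file until the cleaned version has been read successfully.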
Bases
The sequences may contain A’s, G’s, C’s and T’s (or U’s, which lvb treats as equivalent to T’s). Each ASCII character in the sequence must be one of the letters A, B, C, D, G, H, K, M, N, O, R, S, T, U, V, W, X, Y, ?, or - (a period is not allowed, because it is used in different senses in different programs). Blanks will be ignored, and so will numerical digits.
These characters can be either upper or lower case, because the algorithms convert all input characters to upper case (which is how they are treated). The characters constitute the IUPAC (IUB) nucleic acid code plus some slight extensions. They enable input of nucleic acid sequences taking full account of any ambiguities in the sequence.
For further information on ‘-’, see Treatment of gaps.
Symbol: Meaning:
A Adenine
G Guanine
C Cytosine
T Thymine
U Uracil (treated as T by lvb)
Y pYrimidine (C or T)
R puRine (A or G)
W 'Weak' (A or T)
S 'Strong' (C or G)
K 'Keto' (T or G)
M 'aMino' (C or A)
B not A (C or G or T)
D not C (A or G or T)
H not G (A or C or T)
V not T (A or C or G)
N aNy base (A or C or G or T)
X any base (A or C or G or T)
? unknown (A or C or G or T or O)
O gap
- gap (O; alternatively, A or C or G or T or O)
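The table above can be written down as a lookup from each symbol to the set of states it may represent, which also gives a simple way to find characters that would trigger a BADBASE error. The Python sketch below is an illustration built from the table, not code from lvb; the names CODES, states and find_bad_symbols are hypothetical. The symbol ‘-’ is accepted as legal but is left out of the lookup, because its meaning depends on the gap treatment chosen at the prompt.

```python
# Symbol -> possible states, transcribed from the table in this manual.
CODES = {
    "A": "A", "G": "G", "C": "C", "T": "T",
    "U": "T",                     # treated as T
    "Y": "CT", "R": "AG", "W": "AT", "S": "CG", "K": "TG", "M": "CA",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "X": "ACGT",
    "?": "ACGTO",                 # unknown, including the gap state
    "O": "O",                     # gap as a fifth state
}
LEGAL = set(CODES) | {"-"}


def states(symbol):
    """Return the set of states for a symbol other than '-'."""
    symbol = symbol.upper()
    if symbol == "-":
        raise ValueError("'-' depends on the chosen treatment of gaps")
    if symbol not in CODES:
        raise ValueError(f"illegal symbol: {symbol!r}")
    return frozenset(CODES[symbol])


def find_bad_symbols(sequence):
    """List (position, character) for every illegal character.

    Blanks and digits are skipped, and case is ignored, as
    described in the text.
    """
    return [
        (i, ch)
        for i, ch in enumerate(sequence)
        if not (ch == " " or ch.isdigit() or ch.upper() in LEGAL)
    ]
```

For example, find_bad_symbols("AAGC 300 T.RCAG") reports only the period, since the blanks and the digits 300 are ignored.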
OUTPUT
Screen (standard output)
lvb logs its version, details of the analysis, indication of progress and any errors encountered to the standard output, which is usually the screen.
Without bootstrapping, the rearrangement number (iteration) of the search and the current tree length are logged every 50000 trees. During simulated annealing, the tree length can go up as well as down. LVB keeps and outputs the shortest trees encountered during its search. The length of this tree or trees is logged to the screen near the end of the analysis.
With bootstrapping, the replicate number is logged, along with the number of rearrangements tried, the number of trees found and the length of trees found for that replicate.
outtree
Without bootstrapping, the file outtree contains the most parsimonious tree or trees found.
With bootstrapping, outtree contains the most parsimonious tree or trees found for each replicate. Results for the replicates are given in order so, for example, if 40 trees were found for the first replicate, these are the first 40 trees in outtree.
Trees use a subset of the ‘Newick standard’ tree format. This is accepted by many other programs.
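Because each Newick tree ends with a semicolon, and semicolons are not allowed in sequence names, the trees in outtree can be separated by splitting on that character. The Python sketch below is a hypothetical post-processing helper, not part of lvb. It assumes the per-replicate tree counts have been noted from the screen log, since the trees appear in replicate order.

```python
def split_newick(text):
    """Return the Newick trees found in text, one string per tree."""
    trees = []
    for chunk in text.split(";"):
        tree = chunk.strip()
        if tree:
            trees.append(tree + ";")
    return trees


def group_by_replicate(trees, counts):
    """Group trees by replicate, given the number found per replicate.

    counts[i] is the number of trees reported for replicate i.
    """
    if sum(counts) != len(trees):
        raise ValueError("counts do not add up to the number of trees")
    groups = []
    start = 0
    for n in counts:
        groups.append(trees[start:start + n])
        start += n
    return groups
```

For example, if the log reports 2 trees for the first replicate and 1 for the second, group_by_replicate(trees, [2, 1]) returns the first two trees as one group and the third as another.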
Trees may be converted to graphics files using the drawtree program of the PHYLIP package. They may also be viewed and printed using Mesquite.
Without bootstrapping, if more than one equally parsimonious tree is found, these may be combined in various ways using consense in the PHYLIP package. With bootstrapping, consense is useful to generate the majority rule consensus tree.
Output trees are unrooted and branch lengths are not given. Trees may be rooted with the retree program of the PHYLIP package. Trees may also be rooted and branch lengths (under various models of character state change) may be obtained by importing the tree and data matrix into Mesquite.
COMPILING LVB
lvb is available at the LVB Web page as ready-to-run software for Apple Mac OS X and for Raspbian Linux on the Raspberry Pi.
For other platforms, or if you wish to modify the source code, you will have to compile lvb. It is written in ANSI C and is expected to compile and run on a variety of operating systems.
Assuming your system is UNIX-like, uses GNU make and has Perl installed, follow the instructions below. If using a non-UNIX-like system such as Windows, the instructions below will require adjustment.
Unpacking the source code
Assuming lvb_3_0_BETA_source.tar.gz is in the current directory, enter the following command:
tar xzvf lvb_3_0_BETA_source.tar.gz
This gives you a main directory lvb_3_0_BETA with two subdirectories, LVB_MAIN and PHYLIP_FOR_LVB.
Compiler options
By default, LVB is built using compiler options which make sense for GNU C (gcc). To use other compiler options, edit the file LVB_MAIN/Makefile before compiling.
Compilation
Now, assuming you begin in the lvb_3_0_BETA directory, the following sequence of commands will build lvb and test it:
cd LVB_MAIN
make
make test
Results of the above commands are:
- A report on the tests, which is sent to the screen. All tests should pass. Any failure may indicate that lvb won’t work properly on your system.
- A stand-alone executable file, lvb. This is all that is required to run the program.
- Internal documentation of the LVB program, consisting of HTML files in the directory docs_programmer (see below).
After changing the source code or Makefile, it is safer to always make again from scratch.
Documentation
The main documentation (i.e. this file) is lvb_manual.htm in the LVB_MAIN directory.
Internal documentation will be of interest to people who wish to modify or re-use the source code of LVB. During a successful build, documentation in docs_programmer/ is automatically extracted from POD-format comments within the LVB source code. The internal documentation is incomplete and out of date.
Documentation of PHYLIP code within LVB is given separately, in PHYLIP_FOR_LVB/README_phylip_code_in_lvb.rtf. This PHYLIP code should not be used to build PHYLIP itself, as it contains modifications specifically for LVB. PHYLIP proper may be built by downloading its source code from the PHYLIP Web page.
SUPPORT AND REGISTRATION
Please send questions and bug reports to:
To be placed on an email list to receive information on new versions, please email [email protected] with subject ‘Register as LVB user’.
ACKNOWLEDGEMENTS
lvb contains portions of PHYLIP 3.6a. This allows lvb to read PHYLIP-format matrix files. Also, most of the above documentation for infile is taken from the PHYLIP 3.6a manual. I wish to thank Joe Felsenstein for making PHYLIP freely available, and for advising on how to re-use it in lvb.
SEE ALSO
https://eggg.st-andrews.ac.uk/lvb