Software
- Next-Generation Sequence Alignment Software
- Computational Gene Finding
- Genome Assembly and Large-Scale Genome Alignment
- Sequence Analysis Tools
- Variant Analysis Tools
- Webservers and Databases
Next-generation sequence alignment software
|
||
| Bowtie |
Bowtie2 was released in October 2011. Bowtie is an ultrafast, memory-efficient short read aligner that aligns short DNA sequences to the human genome at a rate of about 25 million reads per hour on a typical workstation with 2 GB of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.1 GB for the human genome. | |
| TopHat |
A spliced alignment system for RNA-seq experiments. TopHat finds known and novel exon-exon splice junctions and is extremely fast due to its use of the Bowtie2 aligner. The latest release, TopHat2, runs with either Bowtie1 or Bowtie2 and includes new algorithms that significantly enhance TopHat's sensitivity, particularly in the presence of pseudogenes. TopHat2 includes TopHat-Fusion as an option. | |
| TopHat-Fusion |
TopHat-Fusion is an enhanced version of TopHat with the ability to align reads across fusion points, which result from the breakage and re-joining of two different chromosomes, or from rearrangements within a chromosome. | |
| Cufflinks |
A transcript assembler and abundance estimator for RNA-seq data. Cufflinks assembles transcripts from the alignments produced by TopHat, including novel isoforms, and quantitates those transcripts. | |
| CloudBurst |
A program for highly sensitive short read mapping using MapReduce. CloudBurst, developed by Michael Schatz (now at Cold Spring Harbor Laboratory), uses Hadoop, an open-source version of Google's parallel computing software MapReduce, to efficiently parallelize the short read mapping problem across dozens or hundreds of computers. This enables CloudBurst to execute highly sensitive read mappings with any number of mutations or indels. | |
| Crossbow |
Crossbow is a scalable software pipeline for whole genome resequencing analysis. It combines Bowtie, an ultrafast and memory-efficient short read aligner, and SoapSNP, an accurate genotyper, within Hadoop to distribute and accelerate the computation across many nodes. In the Crossbow paper, we used it to analyze 35x coverage of a human genome in 3 hours for about $100 using a 40-node, 320-core cluster rented from Amazon's EC2 utility computing service. | |
| EDGE-pro | EDGE-pro is a program for estimating gene expression from bacterial RNA-seq. EDGE-pro uses Bowtie2 for alignment but, unlike TopHat and Cufflinks, does not allow spliced alignments. It also handles overlapping genes, a common phenomenon in bacteria that is largely absent in eukaryotes. | |
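The Bowtie entry above mentions a Burrows-Wheeler index. As a rough illustration of the general idea only (this is not Bowtie's implementation, and the function names `bwt_index` and `count_matches` are hypothetical), the following Python sketch builds a Burrows-Wheeler transform for a tiny reference and counts exact occurrences of a read by backward search. It uses a naive suffix sort and uncompressed occurrence tables, so it is suitable only for toy inputs.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy Burrows-Wheeler index for exact matching (illustrative only)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Naive suffix array: acceptable for a toy string, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (text[-1] is the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(bwt)


def count_matches(index, pattern):
    """Count exact occurrences of pattern using backward search."""
    first, occ, n = index
    lo, hi = 0, n  # current range of matching rows in the sorted suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = bwt_index("ACGTACGT")
print(count_matches(index, "ACG"))
```

The search touches the index once per read character, independent of reference length, which is the property that makes this family of indexes attractive for short read alignment. A production aligner would additionally compress the tables and tolerate mismatches.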
Computational Gene Finding
|
||
| Glimmer | A system that uses interpolated Markov models to find genes in microbial DNA. Used to annotate hundreds (possibly thousands) of bacterial, archaeal, and viral genomes. Current version is 3.02. | |
| GlimmerHMM |
A Generalized Hidden Markov Model gene-finder which makes use of the techniques implemented previously by GlimmerM. | |
| GeneSplicer |
A fast system for detecting splice sites in genomic DNA of various eukaryotes. | |
| SIM4CC | An accurate and efficient program to align cDNA sequences (mRNAs, ESTs) to genomic sequences, specifically designed for cross-species alignment. | |
| sim4db / leaff | Fast high-throughput spliced alignment (sim4, sim4cc) and sequence indexing. | |
Genome assembly and large-scale genome alignment |
||
| MUMmer | A system for aligning whole genomes, chromosomes, and other very long DNA sequences. New (May 2008): see how to use MUMmer to align Solexa reads to the human genome. | |
| High-throughput sequence alignment using Graphics Processing Units (GPUs). Uses a technique called general-purpose GPU programming (GPGPU programming) to harness the extreme parallelism of GPUs for non-graphics tasks. In this application, hundreds of query sequences are simultaneously aligned to a reference sequence, giving an order-of-magnitude speedup over the same alignment on the CPU. | ||
| AMOS Assembler project | This is a set of tools, libraries, and freestanding genome assemblers, all open source. AMOS is also an open consortium that includes TIGR, the University of Maryland, The Karolinska Institutet, and the Marine Biological Laboratory. | |
| ABBA |
Assembly Boosted By Amino acid sequence (ABBA) is a comparative gene assembler that uses amino acid sequences from predicted proteins to help build a better assembly. See the journal paper. Link for installation and more information. | |
| AMOScmp |
A comparative genome assembler that uses one genome as a reference on which to assemble another, closely related species. See the journal paper here. | |
| MINIMUS |
A small, lightweight assembler for small jobs such as assembling a viral genome, assembling a set of reads that match a single gene, or other tasks that don't require the complex infrastructure of a large-genome assembler. | |
| Hawkeye |
A visual analytics tool for genome assembly analysis and validation, designed to aid in identifying and correcting assembly errors. All levels of the assembly data hierarchy are made accessible to users, along with summary statistics and common assembly metrics. A ranking component guides investigation towards likely mis-assemblies or interesting features to support the task at hand. Can be used to interactively analyze assemblies from many popular assemblers on your desktop computer. See the journal paper here. | |
| AutoEditor | A tool for correcting sequencing and basecaller errors using sequence assembly and chromatogram data. On average AutoEditor corrects 80% of erroneous base calls, with an accuracy of 99.99%. | |
| Celera Assembler |
A whole genome assembler originally developed at Celera Genomics for the assembly of the human genome. Celera Assembler is now an open-source project at SourceForge. The code is actively maintained by researchers at CBCB and the Venter Institute (formerly known as TIGR, The Institute for Genomic Research). | |
| Quake | A software package to detect and correct substitution sequencing errors in WGS data sets with deep coverage. | |
| FLASH | A fast, accurate system to increase the length of reads by overlapping and merging mate pairs from fragments shorter than twice the read length. | |
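The FLASH entry above describes merging mate pairs whose fragment is shorter than twice the read length. A minimal sketch of that idea follows. It is not FLASH's actual algorithm: it ignores base qualities, and the function names and the `min_overlap` and `max_mismatch_ratio` thresholds are illustrative assumptions. The second read is reverse-complemented, every candidate overlap is scored by its mismatch fraction, and the pair is merged at the best-scoring overlap.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a read pair into one longer sequence, or return None.

    Thresholds are illustrative assumptions, not taken from the source.
    """
    mate = reverse_complement(read2)  # bring read2 onto read1's strand
    best = None  # (mismatch ratio, overlap length)
    # Try overlaps from longest to shortest; on ties the longer overlap wins.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail = read1[-olen:]
        head = mate[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        ratio = mismatches / olen
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None  # no acceptable overlap; leave the pair unmerged
    # Keep read1 through the overlap, then append the rest of the mate.
    return read1 + mate[best[1]:]


fragment = "ACGTTGCAAGGCTTAACCGGATCGATCGTA"
r1 = fragment[:20]
r2 = reverse_complement(fragment[10:])
print(merge_pair(r1, r2) == fragment)
```

In the overlap region this sketch simply keeps the bases from the first read. A real tool would typically use quality scores to decide between disagreeing bases.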
Other sequence analysis tools |
||
| BRCA gene testing |
A computational screening test that takes the raw DNA sequence data from a whole-genome sequence of an individual human and tests for each of 68 known mutations in the BRCA1 and BRCA2 genes. | |
| rddChecker | A program for determining sites of RNA-DNA differences (RDDs) and candidate RNA editing sites from RNA-seq data. | |
| ELPH | A motif finder based on Gibbs sampling that can find ribosome binding sites, exon splicing enhancers, or regulatory sites. | |
| Insignia | A comprehensive system for finding unique DNA sequences that can be used to identify any bacterial or viral species or strain. Currently has over 13,000 species and strains in its database. | |
| A highly accurate program that finds rho-independent transcription terminators in bacterial genomes. The site includes a database with pre-computed predictions for hundreds of species. | ||
| Software and a database of operons covering a large number of prokaryotic genomes. Described in M. Pertea et al., Nucl. Acids Res. 37 (2009), D479-D482. | ||
| SEE ESE | An online tool for identifying exon splicing enhancers (ESEs) in Arabidopsis and Drosophila. | |
| RepeatFinder | An older system for finding and characterizing repetitive sequences in complete and partial genomes. | |
| PhymmBL | A one-stop system for taxonomically classifying metagenomic short reads. | |
| Scimm | A tool for unsupervised clustering of metagenomic sequences using interpolated Markov models. | |
Variant Analysis Tools |
||
| CHASM and SNVBox | Software to predict the functional significance of somatic missense mutations observed in the genomes of cancer cells, and a database of pre-computed features of all possible amino acid substitutions at every position of the annotated human exome. | |
| CRAVAT | Cancer-related analysis of variants toolkit. Web tool for functional predictions and annotations of both somatic and germline variants. | |
| muPIT | Web tool for interactive structural annotations and visualizations of non-synonymous variants and mutations on proteins. | |
| LS-SNP/PDB | Web tool for structural annotations and visualizations of missense variants in dbSNP. | |
Other web servers and databases |
||
| ARDB | Antibiotic Resistance Genes Database (new in early 2009). | |
| Web servers for displaying alignments and annotations of bacterial genomes. | ||
| A collection of links to external sequence analysis programs. | ||

