Contributors: Celeste J. Brown, Audra K. Johnson, James A. Foster, Larry J. Forney
The high throughput sequence analysis (HiTSA) program identifies and groups closely related sequences. First, HiTSA culls incorrect and low-quality sequences, then compares the valid sequences to those in databases using BLAST. It then aligns the valid sequences and their best matches using ClustalW, and clusters the sequences by similarity using the neighbor-joining algorithm. HiTSA uses freely available software and databases, and runs under UNIX on either single-processor or cluster computers.
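The clustering step can be illustrated with a minimal neighbor-joining sketch. This is not HiTSA's code: it assumes a symmetric matrix of pairwise distances keyed by sequence-name pairs, and uses uncorrected p-distances computed from aligned sequences as one possible input. The names `p_distance` and `neighbor_joining` are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of differing sites, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Build an unrooted tree (Newick string) from pairwise distances.

    names: taxon labels (at least three); dist: {(a, b): distance} for each
    pair, given in either order.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three sequences")
    d = {}
    for a, b in combinations(names, 2):
        value = dist[(a, b)] if (a, b) in dist else dist[(b, a)]
        d[(a, b)] = d[(b, a)] = value
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r(a): total distance from a to every other active node.
        r = {a: sum(d[(a, b)] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # Branch lengths from a and b to their new internal node u.
        la = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[(a, b)] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        for k in nodes:
            if k not in (a, b):
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(a, k)] + d[(b, k)] - d[(a, b)])
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Join the last three nodes to a single central node.
    x, y, z = nodes
    lx = 0.5 * (d[(x, y)] + d[(x, z)] - d[(y, z)])
    ly = 0.5 * (d[(x, y)] + d[(y, z)] - d[(x, z)])
    lz = 0.5 * (d[(x, z)] + d[(y, z)] - d[(x, y)])
    return f"({x}:{lx:.4f},{y}:{ly:.4f},{z}:{lz:.4f});"
```

For distances that fit a tree exactly (additive distances), neighbor joining recovers that tree; for real sequence data the result is an approximation whose quality depends on the distance measure chosen.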
The parameter file makes the pipeline adaptable and flexible: users can specify their own sequence database and reference sequences, among other options.
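The format of HiTSA's parameter file is not described here, so the following is only a sketch assuming a simple `key = value` layout with `#` comments. The function `read_params` and keys such as `database` and `reference_sequences` are hypothetical.

```python
def read_params(lines):
    """Parse hypothetical 'key = value' lines into a dict of strings.

    Blank lines and anything after '#' are ignored.
    """
    params = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {number}: expected 'key = value', got {raw!r}")
        key, value = (part.strip() for part in line.split("=", 1))
        params[key] = value
    return params
```

Passing an open file works directly, e.g. `with open("run.params") as fh: params = read_params(fh)`; converting values to numbers is left to the caller.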
Bad sequences are removed based on their percentage of Ns (undetermined bases) and their direction; amplification primers are removed from the good sequences, and their orientation is corrected if necessary.
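The culling and trimming step might look like the following sketch. It assumes exact primer matches (no mismatches or degenerate bases), treats a read whose forward primer is found in neither orientation as having an undeterminable direction, and uses a placeholder 2% N threshold. `clean_read` and its parameters are hypothetical, not HiTSA's actual interface.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def percent_n(seq: str) -> float:
    """Percentage of undetermined bases (N); an empty read counts as 100%."""
    return 100.0 * seq.upper().count("N") / len(seq) if seq else 100.0


def clean_read(seq, fwd_primer, rev_primer, max_percent_n=2.0):
    """Return the primer-trimmed, forward-oriented read, or None to cull it."""
    seq, fwd_primer = seq.upper(), fwd_primer.upper()
    if percent_n(seq) > max_percent_n:
        return None
    # In a forward-oriented amplicon the reverse primer appears as its reverse complement.
    rc_rev = reverse_complement(rev_primer.upper())
    # Try the read as given, then reverse-complemented (orientation correction).
    for candidate in (seq, reverse_complement(seq)):
        start = candidate.find(fwd_primer)
        if start == -1:
            continue
        body = candidate[start + len(fwd_primer):]
        end = body.find(rc_rev)
        return body[:end] if end != -1 else body
    return None  # direction could not be determined
```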
The sequence database is searched with BLAST to identify sequences that are similar to the good sequences.
Summary of BLAST search.
The best match from BLAST for each of the good sequences.
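Picking the best match per sequence can be sketched by parsing BLAST tabular output. This assumes the BLAST+ `-outfmt 6` layout (twelve tab-separated columns ending in `evalue` and `bitscore`) and ranks hits by bit score; HiTSA may have used a different BLAST version or output format, and `best_hits` is a hypothetical helper.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Map each query ID to its highest-bit-score hit (a dict of column values)."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        query = rec["qseqid"]
        if query not in best or float(rec["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = rec
    return best
```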
The reference sequences form a phylogenetic framework, allowing different communities to be compared.
Contains:
Defines the common start and end points of sequences.
Can be fed into Statgen for statistical analysis or used for other purposes.
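Finding the common start and end points of aligned sequences can be sketched as follows, assuming equal-length gapped sequences from the ClustalW alignment with `-` as the gap character. The shared region begins at the latest first residue and ends at the earliest last residue across all sequences. `common_region` and `trim_alignment` are hypothetical names.

```python
def common_region(aligned):
    """Return (start, end) column indices, end exclusive, shared by all sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    starts, ends = [], []
    for s in aligned:
        stripped = s.lstrip("-")
        if not stripped:
            raise ValueError("sequence consists only of gaps")
        starts.append(len(s) - len(stripped))  # first non-gap column
        ends.append(len(s.rstrip("-")))        # one past the last non-gap column
    start, end = max(starts), min(ends)
    if start >= end:
        raise ValueError("sequences do not overlap")
    return start, end


def trim_alignment(aligned):
    """Cut every aligned sequence to the region covered by all of them."""
    start, end = common_region(aligned)
    return [s[start:end] for s in aligned]
```

Internal gaps inside the shared region are kept, so trimmed sequences stay aligned column for column.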
We are currently working on making HiTSA available on SourceForge.