






High Throughput Sequence Analysis

Contributors: Celeste J. Brown, Audra K. Johnson, James A. Foster, Larry J. Forney

The high throughput sequence analysis (HiTSA) program identifies and groups closely related sequences. First, HiTSA culls incorrect and low-quality sequences, then compares the valid sequences to those in databases using BLAST. It then aligns the valid sequences and their best matches using ClustalW, and clusters the sequences by similarity using the neighbor-joining algorithm. HiTSA uses freely available software and databases, and runs under UNIX on either single-processor or cluster computers.
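The culling step can be illustrated with a minimal sketch. This is not HiTSA's own code. It assumes that quality is judged by the fraction of ambiguous N bases, that primers are matched exactly at the read ends, and that a read with no forward primer in either orientation is discarded. The names `clean_read` and `max_n_fraction`, and the 5% default threshold, are hypothetical.

```python
from typing import Optional

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fraction_n(seq: str) -> float:
    """Fraction of bases that are N; an empty read counts as all N."""
    return seq.count("N") / len(seq) if seq else 1.0


def clean_read(seq: str, forward_primer: str, reverse_primer: str,
               max_n_fraction: float = 0.05) -> Optional[str]:
    """Return the oriented, primer-trimmed read, or None if it is culled.

    Assumption: the threshold and the exact-match primer test are
    illustrative choices, not values taken from HiTSA.
    """
    seq = seq.upper()
    if fraction_n(seq) > max_n_fraction:
        return None  # too many ambiguous bases

    fwd = forward_primer.upper()
    # On the forward strand, the read ends with the reverse complement
    # of the reverse primer.
    rev_rc = reverse_complement(reverse_primer.upper())

    # Try the read as given, then its reverse complement.
    for candidate in (seq, reverse_complement(seq)):
        if candidate.startswith(fwd):
            trimmed = candidate[len(fwd):]
            if rev_rc and trimmed.endswith(rev_rc):
                trimmed = trimmed[:-len(rev_rc)]
            return trimmed
    return None  # no forward primer found in either orientation
```

For example, with the made-up primers `ACGT` and `TTGA`, the read `ACGTGGGCCCATTCAA` and its reverse complement both reduce to `GGGCCCAT`.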

  • Parameter File
  • The parameter file makes the pipeline adaptable and flexible. The user can specify their own database and reference sequences, as well as other options.

  • Bad Sequences and Good Sequences
  • Bad sequences are removed based on their percentage of Ns and their direction. Amplification primers are removed from the good sequences, and orientation is corrected if necessary.

  • Sequence Database
  • The sequence database is searched by BLAST to identify sequences that are similar to the good sequences.

  • Summary Table
  • A summary of the BLAST search results.

  • Best Match Sequences
  • The best match from BLAST for each of the good sequences.

  • Reference Sequences
  • The reference sequences form a phylogenetic framework, allowing different communities to be compared.

  • Aligned Sequences
  • Contains:

    • good original sequences
    • their best BLAST matches
    • reference sequences.

  • Alignment Points
  • Defines the common start and end points of sequences.

  • Distance Matrix and Newick Tree
  • These can be fed into Statgen for statistical analysis or used for other purposes.
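To make the last two outputs concrete, here is a minimal sketch of how a distance matrix and a Newick tree can be derived from aligned sequences. It is an illustration only. The distance measure HiTSA uses is not stated here, so the sketch assumes a simple p-distance (the proportion of differing sites, skipping gapped columns). The neighbor-joining routine is a textbook version, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(aligned: Dict[str, str]) -> Tuple[List[str], List[List[float]]]:
    """Return sequence names and their pairwise p-distance matrix."""
    names = list(aligned)
    rows = [[p_distance(aligned[r], aligned[c]) for c in names] for r in names]
    return names, rows


def neighbor_joining(names: List[str], matrix: List[List[float]]) -> str:
    """Build a Newick string by neighbor joining (needs at least two taxa)."""
    labels = {i: name for i, name in enumerate(names)}  # node id -> Newick fragment
    dist = {i: {j: matrix[i][j] for j in labels if j != i} for i in labels}
    next_id = len(names)
    while len(labels) > 2:
        n = len(labels)
        totals = {i: sum(dist[i].values()) for i in labels}
        # Join the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            ((a, b) for a in labels for b in labels if a < b),
            key=lambda p: (n - 2) * dist[p[0]][p[1]] - totals[p[0]] - totals[p[1]],
        )
        d_ij = dist[i][j]
        len_i = 0.5 * d_ij + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = d_ij - len_i
        new = next_id
        next_id += 1
        # Distances from the new internal node to every remaining node.
        dist[new] = {}
        for k in labels:
            if k not in (i, j):
                d_new = 0.5 * (dist[i][k] + dist[j][k] - d_ij)
                dist[new][k] = d_new
                dist[k][new] = d_new
        merged = f"({labels[i]}:{len_i:.4f},{labels[j]}:{len_j:.4f})"
        for old in (i, j):
            del labels[old]
            del dist[old]
            for row in dist.values():
                row.pop(old, None)
        labels[new] = merged
    a, b = labels  # the two remaining nodes
    half = dist[a][b] / 2  # arbitrary root placement on the last branch
    return f"({labels[a]}:{half:.4f},{labels[b]}:{half:.4f});"
```

A call such as `neighbor_joining(*distance_matrix(aligned))`, where `aligned` maps sequence names to equal-length aligned strings, returns a Newick string that tree viewers and downstream statistics packages can read. The sketch does not correct negative branch lengths, which neighbor joining can produce on noisy data.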

    We are currently working on putting HiTSA up on SourceForge.