BCB 404 Special Topic: COMPUTING FOR BIOINFORMATICS
This course will teach basic computing skills to biologists and others who are interested in bioinformatics. It is a four-week course: classroom sessions run from 9am to 5pm during the first two weeks of the Summer 2009 semester (May 18 to May 29), and the last two weeks give students time to complete their projects. The course format combines lectures with exercises that reinforce the lecture content. We will start with an introduction to the IBEST Bioinformatics Core facilities, then cover the basics of Unix operating systems, followed by shell and Perl scripting. The course will introduce several general bioinformatics computing tools, such as EMBOSS and BioPerl. In the second week, we will cover running programs on the cluster computers and using the statistical package R for data analysis. Pass/Fail grades will be based on completion of assigned exercises and a final project, which should preferably relate to the student’s research. The recommended text is Beginning Perl for Bioinformatics by James Tisdall (O’Reilly and Associates, Inc.).
Please contact Celeste Brown (celesteb@uidaho.edu) if you have any questions about the course.
Grading
Grades are based upon completion of assigned exercises and a final project.
Please email me the complete path to any program you wish to have evaluated, along with the path to any files needed to run it. Be sure to include the computer the program runs on (styx/acheron or fourtytwo).
Ideally, give me the complete command (with full paths) so that all I have to do is copy and paste it into the terminal. Be sure your files have 755 permissions (chmod 755) so I can run them.
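A minimal sketch of preparing a script for evaluation, assuming a Bash shell on the course machines. The helper name `share_script` is hypothetical: it sets mode 755 (rwxr-xr-x) and prints the script's absolute path so it can be pasted straight into the email.

```bash
# Hypothetical helper: make a script runnable by others (mode 755 = rwxr-xr-x)
# and print its absolute path for pasting into an email.
share_script() {
  local file=$1
  chmod 755 "$file" || return 1
  # Resolve the directory portably instead of relying on GNU "readlink -f".
  printf '%s/%s\n' "$(cd "$(dirname "$file")" && pwd)" "$(basename "$file")"
}
```

For example, `share_script myproject.pl` prints something like `/mnt/home/<you>/myproject.pl`. Keep in mind that the grader also needs execute (search) permission on every directory along that path, not just on the file itself.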
For the final projects, this information is due by Friday, June 12, BUT if you want to give it to me sooner, that would be fabulous. The same information for the exercises is also due by Friday, June 12, but I would prefer to get it during the first week of June.
For the R section of the class, fill out the sections in 03_r_graphics_questions.R and complete both mini-projects. Put the three R files in a directory on Styx and email me (shunter at gmail dot com) the path to your files.
Schedule
| Topic | Week | Lecture | Exercises |
|---|---|---|---|
| IBEST Core facilities | 1 | Mon AM | |
| Unix environment | 1 | Mon AM | Mon PM |
| Shell scripting | 1 | Tues AM | Tues AM |
| EMBOSS | 1 | Tues PM | Tues PM |
| Perl | 1 | Wed, Thurs, Fri AM | Wed, Thurs, Fri PM |
| BioPerl (BioRuby, Biopython) modules | 1 | Fri AM | Fri PM |
| Cluster computing | 2 | Tues, Wed | Tues, Wed |
| R language | 2 | Thurs, Fri | Thurs, Fri |
Day 1
Course Overview
THE MODEL - Bioinformatics: Writing Software for Genome Research
THE REALITY - See Schedule
Introduction to IBEST Computing Facilities
- Computational Facilities
  - Solaris machines (on the way out)
  - Linux servers (on the way out)
  - Clusters
- The IBEST WIKI (information at your fingertips)
- The IBEST Online Support Center (help from Sys Admins)
- Bioinformatics Coordinator (help with running programs)
- Programs on the IBEST computers
Introduction to Unix/Linux/MacOSX
- Unix tutorial: Tutorials 1-6 and 8
- The file "science.txt" is in /mnt/home/celesteb/BCB404/
- A few really useful Unix commands
Exercises
- Use fastacmd to extract P52202 from the nr database
- BLAST P52202 against the nr database; use the -m flag (e.g. -m 8) to get tabular output
- Use awk and sed to get just the gi numbers of the top 10 hits
- Use a “for” loop and fastacmd to retrieve the sequences of the top 10 hits from the nr database
- Concatenate these sequences
- Run muscle to align the sequences, with output in ClustalW format
- Open the aligned sequences in ClustalX and check the alignment.
- Check the documentation for fastacmd. Was there an easier way to get the sequences?
- What would you do to get JUST the top 10 sp (Swiss-Prot) entries from your BLAST hits?
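One possible way through the middle of this exercise, sketched under assumptions: legacy `blastall` tabular output (`-m 8` or `-m 9`), subject IDs of the form `gi|12345|...` in column 2, and `fastacmd` able to find the nr database. It uses awk only (no sed), and the function names `top_gis` and `fetch_top_hits` are hypothetical. Because tabular output lists hits best-first for a single query, the first N distinct subjects are the top N hits; an HSP-level table can list the same subject more than once, hence the de-duplication.

```bash
# Print the first N distinct gi numbers from legacy BLAST tabular output.
# Column 2 is the subject ID, e.g. gi|12345|sp|P52202|...
top_gis() {
  local table=$1 max=${2:-10}
  awk -F'\t' -v max="$max" '
    $1 ~ /^#/ { next }                 # skip -m 9 comment lines
    {
      split($2, f, "|")                # f[1] is "gi", f[2] is the gi number
      if (f[1] == "gi" && !(f[2] in seen)) {
        seen[f[2]] = 1
        print f[2]
        if (++n >= max) exit
      }
    }' "$table"
}

# Fetch each top hit with fastacmd and concatenate them into one FASTA file.
# Assumes fastacmd can locate the nr database (e.g. via BLASTDB or .ncbirc).
fetch_top_hits() {
  local table=$1 out=$2 gi
  for gi in $(top_gis "$table"); do
    fastacmd -d nr -s "$gi"
  done > "$out"
}
```

For example, after `blastall -p blastp -d nr -i P52202.fasta -m 8 -o P52202.tab`, running `fetch_top_hits P52202.tab top10.fasta` and then `muscle -in top10.fasta -out top10.aln -clw` (MUSCLE 3.x syntax) would cover the remaining steps.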
Day 2
BASH scripts
- Why learn scripting? (Bash and Perl are both scripting languages.)
- First you need a text editor, such as nano.
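As a first taste of Bash scripting, here is a small sketch that loops over its arguments, tests each one, and reports how many sequences each FASTA file contains (each record starts with a `>` header line). It is written as a function named `count_seqs` (a hypothetical name); saved in a file such as `count_seqs.sh` with a final `count_seqs "$@"` line and made executable with `chmod 755`, it becomes a standalone script.

```bash
#!/usr/bin/env bash
# Report how many sequences each FASTA file contains, one line per file.
count_seqs() {
  local file
  for file in "$@"; do
    if [ ! -r "$file" ]; then
      echo "cannot read $file" >&2
      continue
    fi
    # Count header lines: every FASTA record begins with ">".
    printf '%s\t%s\n' "$file" "$(grep -c '^>' "$file")"
  done
}
```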
EMBOSS
Day 3
Chapter 4 Sequences and Strings
- Protein sequence in /mnt/home/celesteb/BCB404/NM_021964fragment.pep
- Exercises 4.2, 4.3, 4.4, 4.6
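The transcription and reverse-complement ideas from Chapter 4 can also be tried in the shell. A minimal sketch, assuming the sequence is passed as one string without line breaks and contains only A, C, G and T (IUB ambiguity codes are not handled); it uses the standard `tr` and `rev` utilities.

```bash
# DNA -> RNA: replace every T with U, preserving case.
transcribe() {
  printf '%s\n' "$1" | tr 'Tt' 'Uu'
}

# Reverse complement: reverse the string, then swap A<->T and C<->G.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}
```

For example, `revcomp ATGCAA` reverses to `AACGTA` and complements that to `TTGCAT`.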
Chapter 5 Motifs and Loops
- Exercises 5.1, 5.3, 5.7, 5.8
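Motif searching can be written as an explicit loop. The sketch below uses a sliding window rather than a regular expression: it compares every motif-length substring with the motif and prints each 1-based start position, so overlapping matches are all reported. The function name `motif_positions` is hypothetical, and matching is exact and case-sensitive.

```bash
# Print the 1-based start of every (possibly overlapping) exact match of a motif.
motif_positions() {
  local seq=$1 motif=$2 i
  local m=${#motif}
  for (( i = 0; i + m <= ${#seq}; i++ )); do
    # ${seq:i:m} is the m-character window starting at 0-based offset i.
    if [[ "${seq:i:m}" == "$motif" ]]; then
      echo $(( i + 1 ))
    fi
  done
}
```

For example, `motif_positions AAAA AA` prints 1, 2 and 3, one per line.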
Day 4
Chapter 6 Subroutines and Bugs
- Exercises 6.1, 6.2, 6.7
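For comparison with Perl subroutines, here is a shell function sketch (the name `count_base` is hypothetical). Arguments arrive as `$1`, `$2`, and so on, and `local` keeps variables private to the function, roughly as `my` does inside a Perl subroutine.

```bash
# Count how many times one base occurs in a DNA string.
count_base() {
  local dna=$1 base=$2
  local only=${dna//[^$base]/}   # delete every character that is not $base
  echo "${#only}"                # the length of what remains is the count
}
```

Because `dna` is declared `local`, calling `count_base` does not overwrite a variable named `dna` in the calling script. When a script misbehaves, running it as `bash -x script.sh` prints each command as it executes, which is often the quickest way to find the bug.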
Chapter 8 The Genetic Code
- Exercises 8.1, 8.2, 8.5
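Translation needs a codon-to-amino-acid lookup. Instead of writing out all 64 codons one by one, the sketch below indexes into the standard genetic code (NCBI translation table 1) stored as a single 64-letter string in TCAG order. With T=0, C=1, A=2, G=3, codon xyz maps to position 16x + 4y + z. The function name `translate` is hypothetical; RNA input is accepted, incomplete trailing codons are ignored, and codons with non-ACGT letters become X.

```bash
# Translate DNA (or RNA) into protein with the standard genetic code.
# The 64-letter string lists amino acids for codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
translate() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      aa = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
      idx["T"] = 0; idx["C"] = 1; idx["A"] = 2; idx["G"] = 3
    }
    {
      dna = toupper($0)
      gsub(/U/, "T", dna)               # accept RNA input too
      prot = ""
      for (i = 1; i + 2 <= length(dna); i += 3) {
        c1 = substr(dna, i, 1); c2 = substr(dna, i + 1, 1); c3 = substr(dna, i + 2, 1)
        if ((c1 in idx) && (c2 in idx) && (c3 in idx))
          prot = prot substr(aa, 16 * idx[c1] + 4 * idx[c2] + idx[c3] + 1, 1)
        else
          prot = prot "X"               # ambiguous or non-ACGT codon
      }
      print prot
    }'
}
```

For example, `translate ATGTGGTAA` gives `MW*` (ATG is Met, TGG is Trp, TAA is a stop codon).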
Chapter 9 Restriction Maps and Regular Expressions
- Exercises 9.1, 9.2, 9.4
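A restriction map is essentially a list of positions where a recognition-site pattern matches. A minimal sketch, assuming the site is given as an awk extended regular expression in uppercase; an N (any base) can be written as `.`, so a site such as GCCNNNNNGGC becomes `GCC.....GGC`. After each hit, the search resumes one base past the match start, so overlapping sites are also reported. The function name `site_positions` is hypothetical, and the site pattern must be non-empty.

```bash
# Print the 1-based start of every match of a recognition-site regex in a DNA string.
site_positions() {
  printf '%s\n' "$1" | awk -v re="$2" '{
    seq = toupper($0); offset = 0
    while (match(seq, re)) {
      print offset + RSTART             # position in the original sequence
      offset += RSTART
      seq = substr(seq, RSTART + 1)     # resume one base after this match start
    }
  }'
}
```

For example, `site_positions GAATTCAAGAATTC GAATTC` reports EcoRI sites at positions 1 and 9.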
Day 5
BioPerl Modules
- A helpful tutorial for getting started
- Get a copy of the BioPerl chapter from my directory: /mnt/home/celesteb/BCB404/examples/ch09.pdf
Exercises
- Use Bio::Seq to extract P52202 from the nr database
- BLAST P52202 against the swissprot database using the appropriate BioPerl module
- Extract the top ten sequences from the nr database
- Concatenate these sequences and run muscle using Perl's system function
Day 6
Cluster Exercise
- Pass the FASTA sequence for P52202 to a script that will:
  - BLAST P52202 against the nr database using mpiblast
  - Extract the top ten sequences from the nr database using fastacmd in a qsub command
  - Concatenate these sequences and run clustalw-mpi using qsub
  - Use RAxMLHPC to infer a phylogenetic tree
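The first cluster step might be submitted as a batch job like the one below. This is a sketch under loudly stated assumptions, not the IBEST setup: it assumes a PBS/Torque scheduler (so `#PBS` directives and `$PBS_O_WORKDIR` apply), that `mpiblast` is on the PATH and accepts blastall-style options, that nr has been formatted for mpiBLAST, and that 8 processes is a reasonable size. The function name `make_blast_job` is hypothetical; check the IBEST wiki for the real queue settings.

```bash
# Print a PBS/Torque job script for the mpiBLAST step (hypothetical settings).
make_blast_job() {
  local query=$1 nprocs=${2:-8}
  local name=${query%.*}                # P52202.fasta -> P52202
  cat <<EOF
#!/bin/bash
#PBS -N blast_${name}
#PBS -l nodes=1:ppn=${nprocs}
cd "\$PBS_O_WORKDIR"
mpirun -np ${nprocs} mpiblast -p blastp -d nr -i ${query} -o ${name}.blast -m 8
EOF
}
```

For example, `make_blast_job P52202.fasta > blast.pbs` followed by `qsub blast.pbs`. The later steps (fastacmd, clustalw-mpi, RAxML) could then be submitted as separate jobs that wait for this one, e.g. with PBS's `qsub -W depend=afterok:<jobid>`.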
To use Celeste's databases, create symbolic links to them from your own database directory (shown here as /my/db):
    cd /my/db
    for i in /mnt/home/celesteb/db/nr* ; do ln -s "$i" "$(basename "$i")" ; done

