
or how I learned to stop freaking out and love the data deluge

This document is a chatty, blog-like entry about some of the stuff I went through while analyzing some of the Forney lab data. Many of the best things I've learned came from looking over someone's shoulder while they did their voodoo. My hope is that this page lets you look over my shoulder while I use the IBEST systems to analyze some real data.

The scenario: the Forney lab gets a ton of 454 sequence data. A BIG TON. More like a tonne. I want to learn how to analyze exactly this sort of data. I get a copy of the data. What do I do now?

My rough plan of attack is:

Tip on getting started: your biggest problem will be keeping track of all the files and remembering what you have done to each one. I recommend keeping a README file in every directory that documents what you've done in that directory. It also really helps to know some basic bash scripting tricks (like for ... do ... done loops). My howto page has some tips.
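Here is a minimal sketch of the README habit, written in Python rather than bash. `log_step` appends a dated note to a directory's README, and `process_all` loops over matching files the way a bash `for ... do ... done` loop would, then logs what it touched. Both function names are hypothetical, and the `*.sff` pattern is only an assumption about how the raw 454 files are named; swap in whatever your files actually look like.

```python
import datetime
from pathlib import Path


def log_step(directory, message):
    """Append a dated one-line note to README in `directory` (created if missing)."""
    readme = Path(directory) / "README"
    stamp = datetime.date.today().isoformat()
    with readme.open("a") as fh:
        fh.write(f"{stamp}: {message}\n")
    return readme


def process_all(directory, pattern="*.sff"):
    """Visit every file matching `pattern`, then record what was done in README.

    This is the Python analogue of a bash `for f in *.sff; do ...; done` loop.
    The default pattern is an assumption about file naming, not a requirement.
    """
    done = []
    for path in sorted(Path(directory).glob(pattern)):
        # Placeholder: the real per-file work (conversion, trimming, ...) goes here.
        done.append(path.name)
    log_step(directory, f"processed {len(done)} file(s) matching {pattern}: {', '.join(done)}")
    return done
```

Running `process_all("run1")` twice leaves two dated lines in `run1/README`, so each directory carries its own history of what was done to it.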

For what it's worth, I am also building a howto page with lots of tricks and tips.

Here are shortcuts to:

Note: I am beginning to think "we" really need to put all the data in a real database first, so that we can pull out individual items (like the 14-character name, its 10-character equivalent, sequences, alignments, quality scores, etc.) and never lose the relationships between them. A database would also make on-the-fly analysis between arbitrary subgroups of the data easier, and would make it simpler to archive a project once it is finished. Perhaps I'll have time to work on this someday.
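A minimal sketch of what such a database might look like, using Python's built-in `sqlite3` module. The table and column names (`reads`, `alignments`, `name14`, `name10`, and so on) and all the helper functions are hypothetical; the point is only that every sequence, quality string, and alignment hangs off a single read row, so the 14-character name and its 10-character equivalent cannot drift apart.

```python
import sqlite3

# Hypothetical schema: one row per read, with alignments pointing back at it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reads (
    id     INTEGER PRIMARY KEY,
    name14 TEXT NOT NULL UNIQUE,  -- 14-character name
    name10 TEXT NOT NULL UNIQUE,  -- its 10-character equivalent
    seq    TEXT NOT NULL,         -- bases
    qual   TEXT NOT NULL          -- one quality score per base, space-separated
);
CREATE TABLE IF NOT EXISTS alignments (
    read_id INTEGER NOT NULL REFERENCES reads(id),
    target  TEXT NOT NULL,        -- what the read was aligned against
    aligned TEXT NOT NULL         -- the gapped sequence from that alignment
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce alignments -> reads links
    conn.executescript(SCHEMA)
    return conn


def add_read(conn, name14, name10, seq, quals):
    """Store one read; both names and all per-base scores go in together."""
    if len(name14) != 14 or len(name10) != 10:
        raise ValueError("expected a 14-character name and a 10-character equivalent")
    if len(seq) != len(quals):
        raise ValueError("need exactly one quality score per base")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO reads (name14, name10, seq, qual) VALUES (?, ?, ?, ?)",
            (name14, name10, seq, " ".join(str(q) for q in quals)),
        )
    return cur.lastrowid


def add_alignment(conn, name14, target, aligned):
    """Attach an alignment to an existing read, looked up by its long name."""
    row = conn.execute("SELECT id FROM reads WHERE name14 = ?", (name14,)).fetchone()
    if row is None:
        raise KeyError(name14)
    with conn:
        conn.execute(
            "INSERT INTO alignments (read_id, target, aligned) VALUES (?, ?, ?)",
            (row[0], target, aligned),
        )


def lookup(conn, name10):
    """Return (name14, seq, quals) for a short name, or None if it is unknown."""
    row = conn.execute(
        "SELECT name14, seq, qual FROM reads WHERE name10 = ?", (name10,)
    ).fetchone()
    if row is None:
        return None
    name14, seq, qual = row
    return name14, seq, [int(q) for q in qual.split()]


def alignments_for(conn, name10):
    """All (target, aligned) pairs for a read, found via its short name."""
    return conn.execute(
        "SELECT a.target, a.aligned FROM alignments AS a "
        "JOIN reads AS r ON r.id = a.read_id WHERE r.name10 = ? ORDER BY a.rowid",
        (name10,),
    ).fetchall()
```

With this layout, a read known only by its short name still leads back to its full name, sequence, quality scores, and alignments, and subgroup questions become ordinary SQL joins. Passing a file path to `open_db` instead of `":memory:"` gives a single file that could be archived with the finished project.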

(last major update james 14:30, 17 June 2009 (PDT))

Retrieved from "http://www.ibest.uidaho.edu/wiki/index.php/Experiment_in_brain_dumping"

This page was last modified 21:30, 17 June 2009.