GS559: Introduction to Statistical and Computational Genomics (Winter 2011)
Instructors:
Jim Thomas, jht@uw.edu
Elhanan Borenstein, elbo@uw.edu
Schedule: Tues. and Thurs., 3:30-4:50, Hitchcock 220. First class Jan. 4, last class Mar. 10.
News:
» The final exam will have two parts: the first will focus on the bioinformatics topics covered in class and the second on programming. You can use one cheat sheet for the first part. The second part will be open book (basically, any resource you want to use is ok). Because exam time is short, both parts will consist of brief, very simple questions.
» Problem Set 8 is posted.
» Problem Set 4 answers are posted.
» NOTE: Question 4 in Problem Set 7 is not a challenge question. It is worth 20 points (as is now indicated), which count toward the 100-point total.
» Problem Set 7 is posted.
» Problem Set 6 is posted.
» Per your requests, assignments are due by the start of the following Thursday class (rather than Tuesday class). It is still strongly recommended to start working on the problem sets as early as you can.
» You are welcome (and encouraged) to submit your assignments electronically (e.g., as a text or Word document via email).
» Problem Set 5 is posted.
» Problem Set 4 Question 2 was not well conceived; you can skip it.
» Problem Set 3 answers are posted. Suggestion - even if you think you got everything correct, read my answers carefully because I've added comments and other information that is helpful.
» Problem Set 4 is posted.
Assignments:
You are welcome to talk to classmates about principles for solving problems, but do NOT solve specific problems together. In many ways, the problem solving is where you will learn the most for this class, especially the programming.
TIP: Google your programming problem. For example, "python string search" will get you relevant information on how to search a string pretty easily. Try it for those cryptic error messages, too.
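As an illustration of what a "python string search" query turns up, here is a minimal sketch using three built-in string operations (`in`, `str.find`, and `str.count`). The sequence and motif are made-up examples, not taken from any problem set.

```python
# Hypothetical DNA sequence and motif, used only for illustration.
sequence = "ATGCGTACGTTAGC"
motif = "ACG"

# "in" answers the yes/no question: does the motif occur anywhere?
found = motif in sequence

# find() returns the index of the first occurrence, or -1 if there is none.
first_index = sequence.find(motif)
missing_index = sequence.find("GGGG")

# count() returns the number of non-overlapping occurrences.
n_cg = sequence.count("CG")

print(found, first_index, missing_index, n_cg)
```

Note that `find` signals "not found" with -1 rather than an error, so check for that value before using the result as an index.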
All problem sets are due by the start of class on the date listed. Grades will come 80% from problem sets and 20% from one final exam. There will be no mid-term exams.
Test/Demo Files
The following files are used in some of the in-class exercises and demos.
ko.txt
reaction.txt
genome.txt
organisms
enzyme.txt
warandpeace.txt
crispian.txt
sonnet.txt
scores.txt
seq_names.txt
small.fasta (these are text files despite the .fasta extension)
large.fasta
speech.txt
matrix.txt
Lectures and Reading:
# | Date | Lecture Topic | Programming Topic | Reading
1 | 01/04 | Overview of course. Introduction to sequence comparison. BLAST, alignment scoring (PDF, PP) | Introduction to Python. Interpreter, objects, types, variables, command line (PDF, PP) | [1, 2]
2 | 01/06 | Sequence alignment - dynamic programming (PDF, PP) | Strings (PDF, PP) |
3 | 01/11 | Sequence alignment (PDF, PP) | Numbers, lists, tuples (PDF, PP) |
4 | 01/13 | Sequence alignment - protein score matrices (PDF, PP) | File input-output, if-then-else (PDF, PP) |
5 | 01/18 | Sequence alignment - significance of similarity scores (PDF, PP) | For loops (PDF, PP) |
6 | 01/20 | Significance of similarity scores continued (PDF, PP) | While loops and review of programming (PDF, PP) |
7 | 01/25 | Whole genome alignments, Sequence trees - introduction (PDF, PP) | More on loops, Programming efficiently (PDF, PP) |
8 | 01/27 | Sequence trees - distance trees (PDF, PP) | Dictionaries (hash maps) (PDF, PP) | [3]
9 | 02/01 | Parsimony (PDF, PP) | Functions (PDF, PP) |
10 | 02/03 | Small parsimony (PDF, PP) | Functions as arguments, sorting (PDF, PP) |
11 | 02/08 | Gene ontology and functional enrichment (PDF, PP) | More on functions, modules (PDF, PP) |
12 | 02/10 | Gene set enrichment analysis (PDF, PP) | Recursion (PDF, PP) | [4]
13 | 02/15 | Gene expression: Clustering (PDF, PP) | Regular expressions (PDF, PP) |
14 | 02/17 | Gene expression: K-means clustering (PDF, PP) | More regular expressions (PDF, PP) |
15 | 02/22 | Biological networks; Dijkstra's algorithm (PDF, PP) | Classes and objects (PDF, PP) |
16 | 02/24 | Degree distribution and network motifs (PDF, PP) | More on classes and objects (PDF, PP) |
17 | 03/01 | Gene prediction (PDF, PP) | Exceptions (PDF, PP) |
18 | 03/03 | Artificial neural networks (PDF, PP) | More on classes, Biopython (PDF, PP) |
19 | 03/08 | Project (PDF, PP) | |
20 | 03/10 | Final Exam | |
References:
Electronic access to journals is generally free from on-campus computers. For off-campus access, follow the "[offcampus]" links or look at the library "proxy server" instructions.
[1] Noble, WS, "A quick guide to organizing computational biology projects." PLoS Comput. Biol. 5 (2009) e1000424. PMID: 19649301 [Offcampus]
[2] Dudley, JT and Butte, AJ, "A quick guide for developing effective bioinformatics programming skills." PLoS Comput. Biol. 5 (2009) e1000589. PMID: 20041221 [Offcampus]
[3] How dictionaries work (aka hash tables or hash maps)
[4] Subramanian et al., "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles." PNAS 102(43) (2005)
Python Resources:
General
Regular Expressions
"RegExPal" (For JavaScript rather than Python, but similar and quite handy. Try it!)
Biopython
Python Books
Learning Python by Mark Lutz. O'Reilly (Very comprehensive. Much is accessible to beginners.)
Dive Into Python 3 by Mark Pilgrim. (Another online book. Based on Python 3, so some differences, and more advanced, but also free.)
Bioinformatics Books
» Biological sequence analysis: probabilistic models of proteins and nucleic acids, R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Cambridge. (Excellent reference; a classic.)
» Inferring Phylogenies, Joseph Felsenstein, Sinauer, 2004. (Excellent reference on this topic.)
» Introduction to Computational Genomics: A Case Studies Approach, Cristianini, Nello & Hahn, Matthew, Cambridge, 2007.
» An Introduction to Bioinformatics Algorithms, Neil C. Jones & Pavel A. Pevzner, 2004.
» Bioinformatics: Sequence and Genome Analysis, David W. Mount, Cold Spring Harbor Laboratory Press.
» Python for Bioinformatics, Sebastian Bassi, CRC Press, 2010. (A little too advanced as a programming book for beginners, but fine now that you're experienced.)
» Python for Bioinformatics, Jason Kinser, Jones and Bartlett, 2009. (Ditto.)
James H. Thomas
Department of Genome Sciences
University of Washington
|
Elhanan Borenstein
Department of Genome Sciences
University of Washington