Developing Algorithms for Large-scale Biosequence Comparison, Genome Annotation, and Proteomics

Jeremy D. Buhler, Ph.D.

Department of Genetics
Keywords: bioinformatics, genome analysis, computational biology, annotation

Detecting local similarities among biosequences is fundamental to assigning identity, historical context, and putative function to parts of genomes. Efficient and sensitive similarity search algorithms such as BLAST are available to the biological community today, but these tools must be adapted to meet the emerging needs of genomics. Specific directions for improvement include large-scale alignment across three or more genomes at once, automatic specialization of similarity search tools to work better on particular organisms or types of sequence (coding DNA, UTRs, repeats, etc.), and efficient handling of exponentially growing genomic sequence databases.

My lab explores new algorithmic techniques for designing biosequence similarity search tools. We seek to anticipate the needs of the biological community for new tools, to establish firm theoretical foundations for the design of existing tools, and to discover techniques that make new classes of search problems feasible. For example, we have developed the PROJECTION algorithm for finding conserved DNA sequence motifs, which can detect motifs that are inaccessible to standard motif-finding software. We have also studied how to automatically optimize BLAST-like seeded alignment heuristics to exploit known statistical properties of alignments. Such properties include the relative rates of different types of substitutions as well as nonuniform patterns of substitution (e.g., non-conservation of third codon positions in coding DNA).
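To make the idea of a seeded heuristic tuned to substitution patterns concrete, here is a minimal Python sketch of seed matching with a spaced seed. It is an illustration under stated assumptions, not the lab's software: the function name `seed_hits`, the mask strings, and the two short sequences are all hypothetical, and the mask `110110` is simply one hand-picked pattern that ignores every third position, not an optimized seed.

```python
from collections import defaultdict

def seed_hits(a, b, mask):
    """Return (i, j) pairs where a[i:] and b[j:] agree at every '1' position of mask.

    mask is a string of '1' (position must match) and '0' (position ignored).
    Hypothetical helper for illustration only.
    """
    care = [k for k, c in enumerate(mask) if c == "1"]
    span = len(mask)

    # Index every window of sequence a by the letters at the "care" positions.
    index = defaultdict(list)
    for i in range(len(a) - span + 1):
        key = "".join(a[i + k] for k in care)
        index[key].append(i)

    # Look up every window of sequence b in that index.
    hits = []
    for j in range(len(b) - span + 1):
        key = "".join(b[j + k] for k in care)
        for i in index.get(key, ()):
            hits.append((i, j))
    return hits

# Two made-up coding fragments that differ only at third codon positions.
a = "GCTGAAAAACTG"  # GCT GAA AAA CTG
b = "GCCGAGAAGCTC"  # GCC GAG AAG CTC

contiguous = seed_hits(a, b, "1111")    # four consecutive matching positions
spaced = seed_hits(a, b, "110110")      # four matching positions, third ones skipped
print(contiguous)
print(spaced)
```

Both seeds require four matching positions, yet on this toy pair the contiguous seed finds no hits, while the spaced seed finds one at each in-frame codon boundary where the mask still fits. In a real tool each hit would then be extended and scored; that step is omitted here.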

We are also interested in challenging search problems from other bioinformatics domains, such as protein identification by mass spectrometry.
