The SSAHA Trace Server
Zemin Ning, Will Spooner, Adam Spargo, Mark Rae, Steven Leonard, Tony Cox
The Wellcome Trust Sanger Institute
Various genome projects have led to the creation of many large biological databases. The total size of
available DNA sequence data, for example, is estimated at approximately 200 GB, including WGS and clone
reads, finished sequences, RefSeq, etc. Designing services that make all of these data searchable in a fast,
sensitive and flexible way poses significant challenges in both algorithm development and hardware
architecture. In this poster, we outline a system with the potential to accomplish this
challenging but extremely worthwhile task. The search engine is SSAHA2, a package combining SSAHA
(Sequence Search and Alignment by Hashing Algorithm) with cross_match, developed by Phil Green at the
University of Washington. Matching seeds of a few k-mer words are detected by the SSAHA algorithm; both
query and subject sequences are then trimmed according to the locations of the matching seeds and passed
to cross_match for full alignment. Platform-independent client/server code has been developed to handle data
input and alignment output across multiple machines. The aim is to provide a near real-time (under 10
seconds) search service over a clustered 200 GB database. The hardware requirement for the server system
is either seven 4-CPU Linux boxes with 16 GB of RAM each or four boxes with 32 GB of RAM each.
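To make the seed-and-trim pipeline concrete, the following Python sketch mimics SSAHA-style seeding: non-overlapping k-mer words from the subject are hashed, overlapping query words are looked up, and each matching seed defines a window that would be handed to cross_match for full alignment. The k-mer size, trimming margin, and all function names here are illustrative assumptions, not the published SSAHA2 interface.

from collections import defaultdict

def build_hash(subject, k=12):
    # SSAHA-style hash table: non-overlapping k-mer word -> subject positions
    table = defaultdict(list)
    for i in range(0, len(subject) - k + 1, k):
        table[subject[i:i + k]].append(i)
    return table

def find_seeds(query, table, k=12):
    # Slide over the query one base at a time, collecting (query, subject) hits
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in table.get(query[i:i + k], ())]

def cut_around_seed(query, subject, seed, margin=200):
    # Trim both sequences around a matching seed; the trimmed pair would then
    # be passed to cross_match for the full alignment
    qi, sj = seed
    return (query[max(0, qi - margin):qi + margin],
            subject[max(0, sj - margin):sj + margin])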
Agent-oriented Approach to DNA Computing
Grace Steele and Vojislav Stojkovic
Morgan State University, Baltimore, MD 21251
Leonard Adleman's work has been quite instrumental to our study of DNA computing. Theoretically, problems must
be solved at three levels: agent, agent-DNA, and DNA. The agent and agent-DNA levels can be implemented on
von Neumann machines with standard processors, operating systems, and agent programming languages. The agent-DNA
level can be viewed as internal assembly programming that serves as an interface between agents and DNA strands; a
standard interface is not possible and has to be developed for each problem. The DNA level can be implemented
on DNA (chemical) computers or simulated on von Neumann machines, and may be seen as machine programming:
the chemical reactions between DNA strands can be interpreted as the execution of machine code. Practically
(because at present we do not have DNA (chemical) computers), we have to solve problems at a combined agent
level, using an agent as an abstract computational unit, i.e. a processor. In the Traveling Salesman Problem,
where Adleman represented graph nodes by DNA strands, we use agents. Current Mac G4 hardware limitations allow us
to work with only about 10,000 agents, implemented in the Easel programming language. This is far from the billions
of DNA sequences available in a test tube, but still sufficient to demonstrate the basic idea and provide a solution to the problem.
We propose that in the near future it will be possible to join von Neumann and DNA computers into a functional
super biocomputer. Using the recommended approach, our research focuses on database search, theorem proving
using the principle of resolution, cryptography, communication security, disease detection, prediction, etc.
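As a toy illustration of the agent-for-strand substitution (a hypothetical Python stand-in; the implementation described above uses Easel), each agent below randomly assembles a path through a small directed graph, much as Adleman's DNA strands anneal, and a filter keeps only paths that visit every node exactly once. The graph and all names are invented for the example.

import random

# Each agent plays the role of one DNA strand, growing a random path
EDGES = {0: [1, 2], 1: [2, 3], 2: [1, 3], 3: [0]}  # example directed graph
N, START, END = 4, 0, 3

def agent_walk(max_len):
    # One agent: extend a path edge by edge, like random strand ligation
    path = [START]
    while len(path) < max_len:
        nxt = random.choice(EDGES[path[-1]])
        path.append(nxt)
        if nxt == END:
            break
    return path

def keep(path):
    # Analogue of Adleman's filtering steps: correct endpoints, every node once
    return path[0] == START and path[-1] == END and sorted(path) == list(range(N))

agents = [agent_walk(N) for _ in range(10000)]  # ~10^4 agents, the scale reported above
solutions = [p for p in agents if keep(p)]
print(solutions[:3])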
Accelerating the Kernels of BLAST with an Efficient PIM (Processor-In-Memory) Architecture
Jung-Yup Kang, Sandeep Gupta, and Jean-Luc Gaudiot
University of Southern California
BLAST is a widely used tool to search for similarities in protein and DNA sequences. However, the kernels of
BLAST are not efficiently supported by general-purpose processors because of the special computational
requirements of the kernels. The kernels involve a large amount of computation with a high degree of
potential parallelism, which general-purpose processors can exploit only to a very limited extent. The
kernels handle operands that are small (one byte) and are not efficiently manipulated by general-purpose processors.
The kernels entail only simple operations, whereas current general-purpose processors devote a significant
proportion of their chip area to supporting complex operations, such as floating-point operations. Finally, the
kernels perform a large number of memory accesses, which translates into severe latency penalties. In this paper,
we propose an efficient PIM (Processor-In-Memory) architecture to execute the kernels of BLAST effectively. We propose
not only to reduce memory latencies and increase memory bandwidth, but also to execute the operations
inside the memory where the data are located. We also propose to execute the operations in parallel by dividing
the memory into small segments and having each of these segments execute operations concurrently. Our
simulation results show that our computing paradigm provides a 242X performance improvement for the execution
of the kernels and a 12X performance improvement for the overall execution of BLAST.
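A rough software analogue of the segmented-execution idea is sketched below in Python, with a thread pool standing in for the per-segment compute logic; in the proposed architecture this inner loop would run inside the DRAM segment holding the bytes, so the pool only mimics the concurrency. The word length, segment size, and function names are illustrative assumptions.

from concurrent.futures import ThreadPoolExecutor

W = 3  # BLAST word length for protein searches

def scan_segment(segment, base, words):
    # One memory segment: compare every one-byte-aligned W-mer against the
    # query word set and report the global positions of seed hits
    return [base + i
            for i in range(len(segment) - W + 1)
            if segment[i:i + W] in words]

def find_hits(db, words, seg_size=1 << 20):
    # Carve the database into segments that overlap by W - 1 bytes, so no
    # word straddling a boundary is missed, and scan all segments concurrently
    starts = range(0, len(db), seg_size)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(
            lambda s: scan_segment(db[s:s + seg_size + W - 1], s, words),
            starts)
        return [hit for part in parts for hit in part]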