A FAST LOCAL SEARCH ALGORITHM USING HISTOGRAM FEATURES FOR DNA SEQUENCE DATABASE
Authors: QIU CHEN, KOJI KOTANI, FEIFEI LEE, TADAHIRO OHMI
DNA sequence search is an important topic in bioinformatics algorithm development. However, searching a large DNA sequence database usually takes considerable computational time. In this paper, we propose an efficient hierarchical DNA sequence search algorithm that improves search speed while maintaining accuracy. For a given query DNA sequence, a fast local search algorithm using histogram features is first applied as a filtering mechanism before the sequences in the database are scanned. Overlapping processing is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity are thereby excluded from the subsequent search. The Smith-Waterman algorithm is then applied to each remaining sequence. Experimental results on GenBank sequence data show that the proposed algorithm, which combines histogram information with the Smith-Waterman algorithm, is more efficient for DNA sequence search.
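The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the histogram features, the similarity measure, the window layout, or the scoring scheme. The sketch assumes single-nucleotide count histograms, histogram intersection as the similarity measure, windows of query length that overlap by half, a filter threshold of 0.8, and Smith-Waterman scores of +2 (match), -1 (mismatch), and -1 (linear gap). All function names are hypothetical.

```python
from collections import Counter


def base_histogram(seq):
    """Count A, C, G, T in seq (assumed feature: single-nucleotide counts)."""
    counts = Counter(seq.upper())
    return [counts[b] for b in "ACGT"]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals the window length for identical histograms."""
    return sum(min(x, y) for x, y in zip(h1, h2))


def best_window_similarity(query, target, step=None):
    """Best normalized histogram similarity over overlapping windows of target."""
    w = len(query)
    if w == 0:
        raise ValueError("query must not be empty")
    step = step or max(1, w // 2)  # assumed overlap: half a window
    qh = base_histogram(query)
    last_start = max(len(target) - w, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of target is covered
    best = 0.0
    for start in starts:
        wh = base_histogram(target[start:start + w])
        best = max(best, histogram_intersection(qh, wh) / w)
    return best


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def hierarchical_search(query, database, threshold=0.8):
    """Filter by histogram similarity, then score the survivors with Smith-Waterman."""
    hits = []
    for name, seq in database.items():
        if best_window_similarity(query, seq) < threshold:
            continue  # low-similarity sequence: skip the expensive alignment
        hits.append((name, smith_waterman_score(query, seq)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    db = {"s1": "TTTTACGTACGTTTTT", "s2": "TTTTTTTTTTTTTTTT"}
    print(hierarchical_search("ACGTACGT", db))
```

In this sketch the histogram stage costs time linear in the target length per sequence, whereas the alignment stage is quadratic, so any speed-up comes from how many sequences the filter discards. A fixed threshold can also reject a true match that straddles two windows, which is presumably the situation the overlapping step is meant to mitigate.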