Constraint-Based Measures for DNA Sequence Mining using Group Search Optimization Algorithm
Authors: Kuruva Lakshmanna, Neelu Khare
In this paper, we propose a three-step DNA sequence mining algorithm, called 3s-DNASM, that combines PrefixSpan, length and width constraints, and group search optimization. The complete mining process comprises the following steps: 1) applying the PrefixSpan algorithm, 2) applying length and width constraints, and 3) optimal mining via group search optimization (GSO). We first present the concept of PrefixSpan, which detects frequent DNA sequences. Based on the resulting prefix tree, length and width constraints are applied to handle restrictions. Finally, we adopt the group search optimization (GSO) algorithm to ensure the completeness of the mining result. The experiments are carried out on a DNA sequence dataset, and the evaluation showed that the 3s-DNASM system is effective for DNA sequence mining. The simulation results show that with min_support = 4, the 3s-DNASM system mined only 29 patterns, whereas PrefixSpan mined about 2168 patterns.
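To make the first step and the min_support threshold concrete, here is a minimal sketch of PrefixSpan-style pattern growth over DNA strings. It is not the authors' implementation. It assumes each sequence is a string of single nucleotides, that a pattern is a (not necessarily contiguous) subsequence, and that support is the number of input sequences containing the pattern. The `max_length` parameter is a hypothetical stand-in for the length constraint. The width constraint and the GSO step are not modeled, because the abstract does not define them.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def prefixspan(
    sequences: List[str],
    min_support: int,
    max_length: Optional[int] = None,
) -> Dict[str, int]:
    """Mine frequent subsequences of DNA strings with PrefixSpan-style growth.

    Support = number of input sequences that contain the pattern as a
    subsequence. `max_length` is a hypothetical stand-in for the paper's
    length constraint (assumption, not taken from the source).
    """
    results: Dict[str, int] = {}

    def mine(prefix: str, projected: List[Tuple[int, int]]) -> None:
        # `projected` is the projected database: (sequence index, suffix start).
        if max_length is not None and len(prefix) >= max_length:
            return
        # Count each symbol at most once per projected suffix (i.e. per sequence).
        counts: Dict[str, int] = defaultdict(int)
        for sid, start in projected:
            for symbol in set(sequences[sid][start:]):
                counts[symbol] += 1
        for symbol in sorted(counts):
            support = counts[symbol]
            if support < min_support:
                continue  # infrequent extension: prune this branch
            pattern = prefix + symbol
            results[pattern] = support
            # Build the new projection: the suffix after the first occurrence.
            new_projected = []
            for sid, start in projected:
                pos = sequences[sid].find(symbol, start)
                if pos != -1:
                    new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine("", [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    dna = ["ACGT", "AGT", "ACT"]
    print(prefixspan(dna, min_support=2, max_length=2))
```

Each recursive call scans only the suffixes that follow the current prefix, which is the pattern-growth idea behind PrefixSpan. Lowering `max_length` or raising `min_support` shrinks the output in the same spirit as the constraint step described above. For example, dropping `max_length` on the toy input adds the length-3 patterns `ACT` and `AGT`.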