A degenerate or indeterminate string on an alphabet is a sequence of non-empty subsets of . Given a degenerate string t of length n and its Burrows–Wheeler transform we present a new method for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet . Furthermore, it is a hybrid pattern matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search time complexity is O(qm2) for counting the number of occurrences and O(qm2 + occ) for reporting the found occurrences where occ is the number of occurrences of the pattern in t. Experimental results show that our method performs well in practice.
Reference:
Daykin, J.W. (et.al). 2019. Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform. Information Processing Letters, (v)147, pp 82-87.
Daykin, J., Groult, R., Guesnet, Y., Lecroq, T., Lefebvre, A., Leonard, M., ... Watson, B. A. (2019). Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform. http://hdl.handle.net/10204/11385
Daykin, JW, R Groult, Y Guesnet, T Lecroq, A Lefebvre, M Leonard, L Mouchard, E Prieur-Gaston, and Bruce A Watson "Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform." (2019) http://hdl.handle.net/10204/11385
Daykin J, Groult R, Guesnet Y, Lecroq T, Lefebvre A, Leonard M, et al. Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform. 2019; http://hdl.handle.net/10204/11385.