Comparison of genomic data via statistical distribution.

Title: Comparison of genomic data via statistical distribution.
Publication Type: Journal Article
Year of Publication: 2016
Authors: Amiri, Saeid, and Dinov, Ivo D.
Journal: J Theor Biol
Volume: 407
Pagination: 318-27
Date Published: 2016 Oct 21
ISSN: 1095-8541
Abstract

Sequence comparison has become an essential tool in bioinformatics because highly homologous sequences usually imply significant functional or structural similarity. Traditional sequence analysis techniques are based on preprocessing and alignment, which facilitate the measurement and quantitative characterization of genetic differences, variability, and complexity. However, recent developments in next-generation and whole-genome sequencing technologies give rise to new challenges related to measuring similarity and capturing rearrangements of large segments of the genome. This work illustrates different methods recently introduced for quantifying sequence distances and variability. Most alignment-free methods rely on counting words, which are small contiguous fragments of the genome. Our approach instead considers the locations of nucleotides in the sequences and relies more on appropriate statistical distributions. The results of this technique, which compares sequences by extracting information and comparing matching fidelity and location regularization information, are very encouraging, particularly for classifying mutation sequences.

DOI: 10.1016/j.jtbi.2016.07.032
Alternate Journal: J. Theor. Biol.
PubMed ID: 27460589
PubMed Central ID: PMC5361063
Grant List: P30 AG053760 / AG / NIA NIH HHS / United States
U54 EB020406 / EB / NIBIB NIH HHS / United States
P20 NR015331 / NR / NINR NIH HHS / United States
P30 DK089503 / DK / NIDDK NIH HHS / United States
P50 NS091856 / NS / NINDS NIH HHS / United States