Title: A bioinformatics analysis of the cell line nomenclature.
Publication Type: Journal Article
Year of Publication: 2008
Authors: Sarntivijai, Sirarat; Ade, Alexander S.; Athey, Brian D.; States, David J.
Date Published: 2008 Dec 1
Keywords: Cell Line; Computational Biology; Databases, Factual; MEDLINE; Terminology as Topic

MOTIVATION: Cell lines are used extensively in biomedical research, but the nomenclature describing cell lines has not been standardized. The problems are both linguistic and experimental. Many ambiguous cell line names appear in the published literature. Users of the same cell line may refer to it in different ways, and cell lines may mutate or become contaminated without the knowledge of the user. As a first step towards rationalizing this nomenclature, we created a cell line knowledgebase (CLKB) with a well-structured collection of names and descriptive data for cell lines cultured in vitro. The objectives of this work are: (i) to assist users in extracting useful information from biomedical text and (ii) to highlight the importance of standardizing cell line names in biomedical research. This CLKB contains a broad collection of cell line names compiled from ATCC, Hyper CLDB and MeSH. In addition to names, the knowledgebase specifies relationships between cell lines. We analyze the use of cell line names in biomedical text. Issues include ambiguous names, polymorphisms in the use of names and the fact that some cell line names are also common English words. Linguistic patterns associated with the occurrence of cell line names are analyzed. Applying these patterns to find additional cell line names in the literature identifies only a small number of additional names. Annotation of microarray gene expression studies is used as a test case. The CLKB facilitates data exploration and comparison of different cell lines in support of clinical and experimental research.

AVAILABILITY: The web ontology file for this cell line collection can be downloaded at http://www.stateslab.org/data/celllineOntology/cellline.zip.

Alternate Journal: Bioinformatics
PubMed ID: 18849319
PubMed Central ID: PMC2639272
Grant List: R01 LM008106 / LM / NLM NIH HHS / United States
U54 DA021519 / DA / NIDA NIH HHS / United States