Natural Language Processing (NLP) & Text Mining

The NLP & Text Mining Group

The NLP and Text Mining (TM) Group has consistently produced high-quality research outputs, attracted significant funding and trained outstanding PhD students. Its roots lie in the pioneering research in NLP conducted between 1980 and 2000 at the Centre for Computational Linguistics of UMIST (one of the two founding universities of the University of Manchester). Since 2004, the Group has focussed its activities on the interplay of NLP and TM. Its pre-eminence in TM was recognised in 2004 by the award of major funding from JISC/BBSRC/EPSRC to set up the world’s first publicly-funded National Centre for Text Mining (NaCTeM), which immediately became an international centre of text mining expertise.

National Centre for Text Mining (NaCTeM)

The National Centre for Text Mining (NaCTeM) is the first publicly-funded text mining centre in the world. We provide text mining services in response to the requirements of the UK academic community. Text mining offers a solution to the challenges of the ‘data deluge’, information overload and information overlook.

NaCTeM has developed text mining services and service exemplars for the UK academic community. Our services are underpinned by a number of generic natural language processing tools:

  • TerMine is a Term Management System which identifies key phrases in text.
  • RobotAnalyst is a tool to minimise the human workload involved in the study identification phase of systematic reviews.
  • Thalia is a semantic search engine for PubMed abstracts.
  • AcroMine is an acronym dictionary which can be used to find distinct expanded forms of acronyms from MEDLINE.
  • Kleio is an advanced information retrieval system providing knowledge enriched searching for biomedicine.
  • FACTA+ is a MEDLINE search engine for finding associations between biomedical concepts.
  • History of Medicine (HOM) is a semantic search system over historical medical archives.
  • APLenty is an annotation tool for creating high-quality sequence labelling datasets using active and proactive learning.
  • Paladin is a document classification annotation web application which supports active and proactive learning.
  • MEDIE uses semantic search to retrieve biomedical correlations from MEDLINE.
  • Info-PubMed uses a gene/protein dictionary and deep parsing to understand protein interactions.

Lead Researchers in NLP & Text Mining