Text Analytics

An alert reader will make connections between seemingly unrelated facts to generate new ideas or hypotheses. However, the burgeoning of published text means that even the most avid reader cannot hope to keep up with all the reading in a field, let alone adjacent fields. Nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature.

Text Mining offers solutions to the problem of data deluge by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. In Text Mining, software is developed that analyses large collections of documents to discover previously unknown information. The information might be relationships or patterns that are buried in the document collection and which would otherwise be extremely difficult, if not impossible, to discover.

Text Mining Research Group

The Text Mining group, led by Prof Sophia Ananiadou, is the research group that encompasses all text mining activities within the School of Computer Science. Their research combines methods from computational linguistics (e.g. shallow parsing, local grammar modelling), knowledge representation (ontologies) and intensive data mining (feature selection, classification and clustering).

National Centre for Text Mining (NaCTeM)

Led by Prof Sophia Ananiadou, NaCTeM sits within the Text Mining Research Group. It is the first publicly-funded text mining centre in the world. NaCTeM has developed text analytics tools, search services and service exemplars for the UK academic community. Some of the exemplars produced are TerMine, AcroMine and MEDIE. NaCTeM is part of META-NET, the largest European Network of Excellence, which is dedicated to building the technological foundations of a multilingual European information society.

External collaborations:

NaCTeM is currently collaborating with a number of external partners on research projects.


Prof. Sophia Ananiadou

Sophia is the Director of NaCTeM and leads the Text Mining Research Group. She has led the development of the text mining tools and services currently used in NaCTeM with the aim of providing scalable text mining services: information extraction, intelligent searching, association mining, etc. Her main contributions are in the area of natural language processing, and in particular computational terminology and biomedical text mining. Her work in computational terminology and term recognition led to the development of the C-value method for automatic term recognition, which has been adopted as a standard method internationally.

She has been a principal investigator on a number of projects, such as EMPATHY, which aimed to support metabolic pathway model curation through the integration of text mining methodologies into a pathway reconstruction platform. In collaboration with the University of Liverpool and the National Institute for Health and Care Excellence, she was the principal investigator for Mining for Public Health, which aims to conduct novel research in text mining and machine learning to transform the way in which evidence-based public health reviews are conducted.

John McNaught

John is a lecturer in the School of Computer Science and the Deputy Director of NaCTeM. He has worked on aspects of machine translation (MT), specifically on MT software design, on sublanguage-based MT, and on computational dictionaries. Multilingual issues and sublanguage concerns also led him to develop strong interests in computational terminology and the representation of special knowledge. He also became involved in various language engineering standardisation initiatives such as EAGLES and ISLE. These focused particularly on issues of reusability of language resources such as text corpora and electronic dictionaries, and design for reuse. Recently, the reusability issue has been exercising researchers in the context of semantic web ontologies.

Prof Jun-ichi Tsujii

Before taking up the post of Director at the Artificial Intelligence Research Centre (AIST), J. Tsujii was a professor at the University of Tokyo in Japan as well as a professor at the University of Manchester in the UK. He was appointed as the first director of the National Centre for Text Mining (NaCTeM) in the UK in 2005, and he is now the scientific advisor of NaCTeM and a part-time professor at the School of Computer Science. His research achievements include the development of the HPSG-based parser (Enju), its application to pathway extraction from text, and construction of the GENIA corpus. The GENIA corpus has been used as one of the gold standard corpora for tasks in bio-text mining such as event extraction, named entity recognition, and pathway extraction.

He has received a number of awards, including the IBM Faculty Award (2005), the Achievement Award of the Japan Society for Artificial Intelligence (2008) and Fellow of the Information Processing Society of Japan (2010). He received the Medal of Honor with Purple Ribbon from the Japanese government (2010) for his contributions to bio-text mining and machine translation, as well as the Funai Achievement Award (2014) and the Okawa Prize (2016), and he was named an ACL Fellow (2014). He was President of the ACL (Association for Computational Linguistics) in 2006, and is a permanent member and chair of the ICCL (International Committee on Computational Linguistics).

Dr Tingting Mu

Tingting is a member of NaCTeM and the text mining research group. As a part of her work, she focuses on developing advanced mathematical modelling and large-scale optimisation techniques to (1) simulate human intelligence and (2) analyse real-world complex data. For (1), she aims at constructing effective machine learning models to automate tasks such as matching, recognition, prediction, ranking, inference, characterisation, language and vision understanding. For (2), she develops algorithms to discover latent structure and extract information from large-scale, noisy and unstructured data, e.g., text, image, video, signal and network data to support development of text mining systems and other related research areas such as bioinformatics. During her research she has developed a number of software tools for MATLAB.

Dr Riza Batista-Navarro

Riza is a Lecturer at the School of Computer Science of the University of Manchester, and a member of NaCTeM and the text mining research group. She obtained her PhD in Computer Science from the University of Manchester, with Biomedical Text Mining as her specialisation. During her time at Manchester, Riza has conducted research into using Natural Language Processing to extract meaningful information from numerous scientific documents. Along with using NLP in her work, she optimises machine learning algorithms to derive meaning from texts. 

She will soon begin research as part of a Newton Fund project on social media analytics in the Philippines. The project aims to automatically analyse social media posts to detect and extract the emotions of the poster and find signs of potential distress. She will work alongside two universities and an NGO within the Philippines to help identify those who might be experiencing mental distress so that the correct assistance can be provided for them.

Dr Goran Nenadic

Goran’s research focuses on unstructured data science, specifically on making sense of large-scale free text data by combining rule-based and data-intensive approaches. His work mainly aims at engineering deep features to train machine-learning algorithms to process free-text documents. He is based in the School of Computer Science, and affiliated with the Manchester Institute of Biotechnology (MIB) and The Farr Institute’s Health eResearch Centre (HeRC). His recent and current research projects (funded by NIHR, EPSRC, BBSRC, Wellcome, AZ, Pfizer) include large-scale extraction and curation of biomedical information from the literature (including processing table data) and understanding patient free-text data. He has worked with a number of local hospitals, charities and industry on unlocking evidence contained in clinical narratives and healthcare social media in many areas including oncology, stroke, rheumatology, veterinary science, mental health, radiology, complementary medicine and pain management. His team has also worked on semi-automated anonymisation of clinical free text and has taken a major part in many clinical text mining challenges. Goran leads the UK healthcare text analytics network (Healtex), which has been funded by EPSRC to identify the main challenges in processing healthcare free text. He is also the Editor-in-Chief of the Journal of Biomedical Semantics.

In addition to biomedicine, Goran is also interested in integrative data analytics that combines multi-modal data streams to uncover new patterns in other domains (e.g. Connected Health Cities focusing on linking wellbeing and citizen data; the EPSRC-funded HOME-Offshore project to analyse renewable energy data signals).