Principal Investigator: Professor Niels Peek
This programme of work will focus on methodology for learning individual-level risk prediction models from complex data, and on a substantive application of that methodology.
Advancing methodology for predictive healthcare
Health data, for example from electronic health records (EHRs) or from other sources such as wearables or the Internet, are subject to complex data generation processes that existing methods for risk prediction do not address. We will focus on three key issues, as described below.
We will develop approaches that exploit the fact that data from complementary sources (e.g. data collected through EHRs and data collected through a wearable device) are subject to different and perhaps complementary biases. Thus, we can exploit these different ‘windows’ on the underlying health of the population.
We will build more advanced models that allow ‘what-if’ prediction modelling using messy observational data. This will also allow consideration of dynamic treatment allocation. For example, the mean duration of adherence to preventive therapy with statins is 18 months, but this might vary with disease stage, symptoms and perceived risk of adverse outcomes. As such, there are opportunities for potential-outcome clinical prediction models (CPMs) to inform the optimal timing of treatment initiation, so that patients are most likely to adhere during the period in which they will receive maximum benefit.
Research is currently ongoing at the University of Manchester to measure the generalisability of risk prediction models across multiple heterogeneous clinical sites. Initial results have found substantial variability in the predictive performance of QRISK between different clinical sites. The objective will be to develop and implement methods that adjust for this heterogeneity in risk prediction models.
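As an illustration of what measuring between-site variability in predictive performance can look like, the sketch below computes a concordance (C) statistic separately for each clinical site. It is a minimal example under stated assumptions, not the method used in the Manchester research: the record layout `(site, predicted_risk, outcome)`, the site names and the toy values are all hypothetical, and the C statistic is only one of several possible performance measures.

```python
from collections import defaultdict


def c_statistic(risks, outcomes):
    """Concordance: probability that a randomly chosen case has a higher
    predicted risk than a randomly chosen non-case (ties count as 0.5)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        return None  # undefined when a site has no cases or no non-cases
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def c_statistic_by_site(records):
    """Group (site, predicted_risk, outcome) records by site and
    compute the C statistic within each site."""
    by_site = defaultdict(lambda: ([], []))
    for site, risk, outcome in records:
        by_site[site][0].append(risk)
        by_site[site][1].append(outcome)
    return {site: c_statistic(r, y) for site, (r, y) in by_site.items()}


# Hypothetical toy data: (site, predicted risk, observed outcome).
records = [
    ("site_A", 0.9, 1), ("site_A", 0.8, 1), ("site_A", 0.2, 0), ("site_A", 0.1, 0),
    ("site_B", 0.9, 1), ("site_B", 0.3, 1), ("site_B", 0.6, 0), ("site_B", 0.1, 0),
]
per_site = c_statistic_by_site(records)
for site, c in sorted(per_site.items()):
    print(site, c)
```

In this toy data the same set of predicted risks discriminates perfectly at one site and less well at the other, which is the kind of between-site spread that heterogeneity-adjusted methods would aim to account for.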
Case study: risk prediction models for prevention of, and early intervention in, Alzheimer’s disease (AD) and other dementias, and for the identification of high-risk groups.
The key objectives are:
- Producing a risk prediction model for AD based on lifestyle (e.g. exercise, diet) and genetic factors (combined SNP panels which help to define genetic predisposition). This will build on the models for informative presence discussed previously.
- Utilising counterfactual approaches to identify causal and modifiable factors that can refine the model and the specific interventions required, further exploiting Mendelian randomisation to infer causal structure.
- Utilising machine learning and AI approaches to identify pre-diagnostic and pre-disease changes in radiological image data.
To develop the model, we have approved access to the UK Biobank dataset, in which around 15,000 people have dementia. To validate our models, we will have access to data from Manchester-based longitudinal studies of ageing and from other studies held in the Dementias Platform UK (DPUK). We will establish links with the Turing ‘data science for mental health’ special interest group.