Developing machine learning-enabled experimental design, model building and scientific discovery in particle physics

Principal investigator: Darren Price

Bid to the ATI Data Science for Science Programme Stream

Experimental searches for 'dark matter' (DM) are challenging and time-consuming. Without compelling theoretical guidance as to the nature of DM, it is critical that future searches consider all experimentally and theoretically allowed models, and are guided towards the experimental measurements that give the greatest chance of discovery. With this in mind, I propose a new research direction that develops unsupervised learning techniques and applies them to the large datasets produced at CERN and other particle physics laboratories, in order to build and test theories against data with minimal theoretical and experimental prejudice.

Executive summary

The ultimate goal of this proposal will be the proof-of-principle use of unsupervised machine learning to enable the automation of scientific hypothesis testing. A key outcome of this research is to identify the limitations and challenges that need to be overcome to extend this proof-of-principle study beyond the specific search for dark matter that frames the study, and enable future research to generalise this to other use cases in particle physics and the wider scientific community.
To this end, this bid focuses on two milestones:
1. Development of new software tools to characterise DM with data from multiple experiments.
2. The proof-of-principle use of unsupervised learning tools for scientific discovery.

Future directions enabled by this funding

The studies enabled by this proposal will be scientifically valuable in their own right, but will also serve as the seed for many future possibilities. This research will provide the first global search for dark matter across multiple, diverse datasets from all relevant experimental measurements. It will also provide a new and proven unsupervised learning framework for model-building and experimental design in particle physics, one that can be further developed with new datasets in future and applied to other data-intensive scientific fields that rely on testing models which provide quantitative predictions. Within particle physics, I would expect to apply for future funding from STFC and the European Research Council to fund the embedding of these techniques in the offline data analysis of future dark matter search and measurement programmes, and the expansion of this work to encompass other puzzles in particle physics.

Development of networks/connections

This research will enable new connections to be developed with other institutes and international research collaborations studying the application of machine learning to particle physics research. These include the NYU Center for Data Science, which is particularly interested in applications of unsupervised and semi-supervised machine learning to particle physics, and with which I intend to collaborate. This funding will create new engagement from the UK with the inter-experimental machine learning (IML) working group, an international organisation of physicists and computer scientists working together on common problems and solutions to research challenges with machine learning, which particularly welcomes collaboration with people working in other domains. This funding will enable Turing-funded research connections to this organisation.