Spatial Learning: Applications in Structure Based Drug Design

Principal Investigator: Professor Ross King

We propose to research how best to represent three-dimensional (3D) spatial relationships for machine learning. This is a key problem in AI as we live in a three-dimensional world. It has been our experience that the best way to attack an abstract problem, such as representing 3D spatial information, is to first tackle a specific concrete problem, and then to generalise to the abstract problem. We therefore propose to focus initially on the problem of representing the binding of drugs to protein active sites.

We propose to (1) compare and contrast two complementary approaches to representing 3D space, relational learning and rough paths, using the specific problem of representing the binding of drugs to protein active sites; and (2) integrate relational learning and rough path machine learning for structure-based quantitative structure-activity relationship (QSAR) learning.

Relational learning

The project will focus on the development of probabilistic relational learning methods for learning three-dimensional spatial relationships. This approach differs from standard machine learning in that it uses first-order predicate logic as the basic representational language, in contrast to propositional logic, which is the basis of machine learning methods such as neural networks, random forests, and support-vector machines. A proposition is a statement about a whole object that is either true or false, which makes the representation of 3D spatial relationships difficult without a standard reference frame. Relational learning is well suited to representing chemical structure, as the standard human-chemist representation of chemical structure is as 3D relationships between atoms and bonds.
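As an illustration of the relational view, the following minimal Python sketch stores a molecule as facts about atoms and bonds and evaluates one first-order-style rule over them. The three-atom fragment, its coordinates, and the names ATOMS, BONDS, bonded, rule_holds and rotate_z90 are hypothetical and chosen only for this example; it is not the probabilistic learning method proposed here. The rule refers only to relationships between atoms, so its truth value does not depend on a reference frame.

```python
import itertools
import math

# Hypothetical toy fragment: atom(id, element, xyz) facts and bond(id1, id2) facts.
ATOMS = {
    "a1": ("O", (0.0, 0.0, 0.0)),
    "a2": ("C", (1.2, 0.0, 0.0)),
    "a3": ("N", (1.9, 1.1, 0.0)),
}
BONDS = {("a1", "a2"), ("a2", "a3")}


def bonded(a, b):
    """True if a bond fact exists between atoms a and b (in either order)."""
    return (a, b) in BONDS or (b, a) in BONDS


def rule_holds(atoms, max_dist):
    """Evaluate the relational rule
    exists A, B, C: atom(A, O), atom(B, C), bond(A, B), atom(C, N), dist(A, C) <= max_dist
    """
    for a, b, c in itertools.permutations(atoms, 3):
        if (atoms[a][0] == "O" and atoms[b][0] == "C" and bonded(a, b)
                and atoms[c][0] == "N"
                and math.dist(atoms[a][1], atoms[c][1]) <= max_dist):
            return True
    return False


def rotate_z90(atoms):
    """Rotate every atom by 90 degrees about the z axis."""
    return {k: (elem, (-y, x, z)) for k, (elem, (x, y, z)) in atoms.items()}


if __name__ == "__main__":
    print(rule_holds(ATOMS, 2.5))              # rule on the original coordinates
    print(rule_holds(rotate_z90(ATOMS), 2.5))  # same rule after rotating the molecule
```

A fixed-length propositional feature vector of raw coordinates would change under the same rotation, whereas the relational rule gives the same answer before and after it.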

The Signature of a path

Rough path theory arose as a generalisation of classical control theory to the case where the control is a highly oscillatory signal; it enables the deterministic treatment of stochastic differential equations driven by signals rougher than semimartingales. In mathematics it has contributed to Professor Martin Hairer’s Fields Medal-winning work on regularity structures for stochastic partial differential equations (SPDEs). Its impact in data science is now also growing.

We propose to extend our work on the effective representation of curves through their signature to general surfaces in d-dimensional Euclidean space. We propose to define the iterated signature of a d-dimensional surface recursively, using the fact that a d-dimensional surface can be thought of as a path taking values in (d-1)-dimensional surfaces. The iterated signature feature of a surface offers potential dimension reduction in representing the surface. This raises many interesting mathematical questions and suggests possible applications.
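To make the signature of a curve concrete, the sketch below computes the first two signature levels of a piecewise-linear path in 3D with NumPy, using Chen's identity to append one straight segment at a time. It covers the existing curve case only, not the proposed extension to surfaces, and the function name signature_level2 and the example path are hypothetical.

```python
import numpy as np


def signature_level2(path):
    """Return the level-1 and level-2 signature terms of a piecewise-linear path.

    path: array-like of shape (n_points, d), the vertices of the path in order.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)        # level 1: total increment of the path
    s2 = np.zeros((d, d))   # level 2: iterated integrals S^{ij}
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending a straight segment with increment delta:
        # a segment alone has level 1 = delta and level 2 = (delta ⊗ delta) / 2.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2


if __name__ == "__main__":
    # Hypothetical 3D path: one unit step along x, then y, then z.
    path = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
    s1, s2 = signature_level2(path)
    print(s1)
    print(s2)
    print(0.5 * (s2 - s2.T))  # antisymmetric part (Lévy areas)
```

The level-1 term records only the net displacement, while the antisymmetric part of the level-2 term captures the order in which the coordinates move, which is the kind of geometric information a fixed set of summary statistics tends to lose.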