Data Science Club 14/07/16

14th July 2016

Atlas Rooms, Kilburn Building

This meeting featured an introduction to the University of Manchester Data Science Institute, an overview of the data science work of the Cathie Marsh Centre, and a presentation on Hadoop from the UK Data Service. These were followed by presentations from University of Manchester staff and researchers on their use of Hadoop and on the new UManSysProp facility here at Manchester.

Agenda

Magnus Rattray

Faculty of Life Sciences,
The University of Manchester

Introduction to the University of Manchester Data Science Institute

Peter Smyth

UK Data Service

"The challenges of building and populating a secure but accessible big data environment for researchers in the Social Sciences and related disciplines."

Building a Hadoop cluster is the easy part. The hard part comes when we try to ingest data of all shapes, sizes and structures into a fully searchable data lake that provides previews, dashboards or samples of dataset combinations. Where necessary, the most stringent security requirements must be adhered to, while less secure data remains easily accessible to all and the processing environment stays robust and scalable. This talk is about the UKDS Hadoop system, the progress made and the obstacles envisaged.

Mihaly Berekmeri

School of Computer Science, University of Manchester

"From RDBMS to Hadoop: a case study"

What can be done when a traditional database becomes increasingly sluggish? Do Hadoop environments provide a solution? In this talk I present our experience in porting a traditional RDBMS to IBM BigSQL and Hadoop, its challenges and its results. As well as a significant cost reduction, initial tests show a query speedup of two orders of magnitude.

Rachel Gibson

CMIST, University of Manchester

"Overview in Developments in Data Science at Cathie Marsh Institute for Social Research (CMIST)"

David Topping

School of Earth, Atmospheric & Environmental Sciences, University of Manchester

"New database on properties of molecules and mixtures linked to open source informatics suite"

As part of an NERC International Opportunities Fund project led by D Topping, researchers from across the US and EU will be converging later this year to decipher why we often cannot agree on measured properties of molecules and mixtures important for atmospheric science. This will involve development of a new community database with links to a new open source informatics package that enables predictions of such properties, already hosted here at the University (UManSysProp). We want to encourage uptake of, and contributions to, these facilities from any field across campus that relies on physico-chemical properties. If you want to find out more, please visit our existing site: http://umansysprop.seaes.manchester.ac.uk/