BBC Internship: Interesting stories that can be told from complex datasets


The Data Science Institute is pleased to announce that BBC News has two internship positions available for UoM PhD candidates. Please see below the Project Brief for one of the internships:

Research to be undertaken:


-          to analyse a series of complex datasets (some owned or held by the BBC, others in the public domain or belonging to third parties) in order to highlight stories or patterns of interest.

-          examples of datasets to investigate and questions to pose are:

Key research challenges to address:

  1. The prescription pattern for opioid painkillers over the last decade: there’s been a big uptick in the US and it would be interesting to see what’s happening in the UK.

-          Can we work out:

-          Number of prescriptions as a total per year for the last 10 or 20 years

-          Geographic distribution of those prescriptions – so which areas have the highest and lowest levels of opioid painkiller prescribing

-          Can we get numbers on OTC meds containing opioids?

-          If possible I would also be really interested in working out the number of prescriptions going through online pharmacies.

  2. Forced Marriage: The girls missing from the data?


-          Every summer, teenage girls with Asian or Middle Eastern heritage go missing from British schools to be forced into marriage by their families.

But nobody knows how many girls are definitely victims - by definition this is a crime that is concealed within families and difficult to track. We know from official data sources that the government's forced marriage unit was involved in more than 1,200 potential cases last year. 

-          However, every school is required to collect data on its pupils, including ethnicity, nationality and from last year, country of birth. 

-          That data will obviously show year-on-year changes in the number of children in a target group who are in school in the summer term and the following autumn term. 

-          Question: Could that data, with the assistance of other data sets that help to control for other factors, give us an estimate for children who have apparently disappeared from the data for no clear reason? 

-          Theoretical Example: based on trying to calculate how many girls of Pakistani heritage disappear from Bradford 

(these numbers are entirely fictional)

-          British Pakistani population of Bradford:                          100,000 (20% of area's population) 

-          Number of teen Brit-Pakistani girls in schools:                 10,000

-          Number of target girls in Year 11:                                      2,000

-          Number of Year 11 girls who go on to Y12:                        1,400

-          Number of girls removed from the school roll                  (600) 

-          Work to be done:

1) Cross reference this data with regional data on requests for help to the national Forced Marriage Unit 

2) Control for various assumed factors such as migratory patterns

-          Number of Y12 girls in target group who cannot be accounted for:        200

-          Why this idea?

-          Down the years, a number of experts in this field have suggested to me and to others in government that this would be the best method of tracking what's going on - but as far as I am currently aware, it has never been implemented. If government has tried, it has not said so.

-          I have no doubt it would be a challenging exercise and it would involve a degree of careful thought before it could even be attempted.

-          It may be that it's not doable at all and it may also need some considerable FOI effort to get it off the ground - but I think we ought to at least scope it out. 
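
The arithmetic in the theoretical Bradford example above can be written out as a short sketch. This is only an illustration of the subtraction steps using the brief's entirely fictional numbers; it is not a proposed method. The names `CohortCounts`, `removed_from_roll` and `unaccounted_for` are hypothetical, and the figure of 400 "explained" leavers is an assumption implied by the example (600 removed from the roll, 200 left unaccounted for), standing in for whatever the cross-referencing and control steps would actually produce.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    """Fictional counts for one target group in one area."""
    year11_girls: int       # target-group girls on the roll in Year 11 (summer term)
    year12_girls: int       # of those, girls on the roll in Year 12 (autumn term)
    explained_leavers: int  # leavers explained by other factors (assumed input)


def removed_from_roll(counts: CohortCounts) -> int:
    """Girls on the Year 11 roll who do not appear on the Year 12 roll."""
    return counts.year11_girls - counts.year12_girls


def unaccounted_for(counts: CohortCounts) -> int:
    """Leavers remaining after subtracting those explained by other factors."""
    removed = removed_from_roll(counts)
    if counts.explained_leavers > removed:
        raise ValueError("explained leavers cannot exceed total leavers")
    return removed - counts.explained_leavers


# Fictional figures from the brief; 400 is the assumed number of leavers
# explained by controls such as migratory patterns (600 - 200).
bradford = CohortCounts(year11_girls=2000, year12_girls=1400, explained_leavers=400)

print(removed_from_roll(bradford))  # 600
print(unaccounted_for(bradford))    # 200
```

In practice the hard part is the `explained_leavers` input, which would depend on the control datasets and any data obtained through FOI requests; the sketch only makes the bookkeeping explicit.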

Other databases to analyse and questions to answer:

National Pupil Database – several possible ideas here, all of them at an early stage.

Performance of poor white boys

Special needs exclusions

Impact of new grammar schools – one for the long term?

Which subjects are being studied at GCSE in which areas.

A speculative idea - forced marriages – is it possible to identify whether pupils are being dropped from the database because they have been forced into marriage?

Health/Social care

Data on GP prescribing – look at mapping the areas with the most antibiotic prescribing, so you could show the antibiotic capital of the UK

Or is it possible to use that data to show the use of other drugs, such as statins or Ritalin?

Is it possible to look at the per capita funding of social care for the elderly over the past couple of decades to see how that has changed?


With elections and referendums surrounding us at every turn, it may be interesting to see whether any students are interested in looking at the changing demographics of voting. No doubt there will soon be more information from the British Election Study.

Key deliverables:

-          Stories, facts and patterns emerging from the data that could be used to feed News stories.

Potential uses of outcomes from internship (to give intern a feel for onward journey of results):

-          BBC News and editorial commissioning

Measures of success:

-          Progress towards stated aims above