Data Language Project Background

The range of data definitions and terms in use across the sector increases the cost and complexity of reporting and makes it harder to compare and use data from different sources. Even commonly used words like ‘course’ are interpreted in a variety of ways across the information landscape.

In early 2013 HEDIIP undertook a project with some of the key data-processing organisations in HE to begin developing a lexicon of terms and a thesaurus, to help understand the data language currently in use across these organisations. The accompanying report set out the issues behind this work and made recommendations for the future development of a higher education data language.

In May 2015 HEDIIP published the New Landscape report, setting out the key elements required to achieve the broader objectives of the HEDIIP Programme. One of these elements is the ‘development of a standard dataset with agreed definitions that are used by all key data collectors.’ The report established that:

‘The biggest consistent consumption of resource within HEPs in this area is around collecting and transforming their raw data, which is collected in certain formats for different purposes, to fit the format and structure of the collection they are submitting. With a standard dataset, common definitions and a clear structure around the data collection, and only one main collection, HEPs will be able to reduce significantly the workload on their data processing teams.’

This key element was therefore taken forward by the HEDIIP Programme as a Data Language project to achieve sector agreement on a standard student dataset and data definitions.