data.ac.uk - The UK’s Higher Ed Open Data Project

This year at IWMW 14 (a fantastic HE conference held each year in the UK), I attended a talk given by Chris Gutteridge and learned of a huge HE-specific open data project: data.ac.uk.

data.ac.uk is an Open Data resource which aims to provide a single reference site for several data sets which are created, maintained and shared by the UK higher education community. The data is provided free of charge for all to use.

It is hoped that the project will encourage the HE community to share, utilize, update and generate demand for open data.

data.ac.uk is broken down into a number of smaller sites, or portals, which are designed to deliver the various data sets in human-readable and often searchable form.

Currently the main focus of data.ac.uk is equipment.data, a national equipment portal that aggregates information on existing UK HE research equipment into a searchable resource. It may be used to locate research equipment or faculty staff across the UK, and its primary goal is to facilitate both knowledge and equipment sharing.

Imagine a university which needs a highly specialized piece of research hardware for just a few weeks or months. The cost of purchasing the unit may be prohibitively high, but by searching equipment.data they may easily find another university willing to share theirs, and some expertise, for the duration of the project.

equipment.data currently receives data from 30+ higher education institutions and contains almost 7,500 information points for all kinds of equipment, from Affymetrix GeneChip microarray scanners to ZEISS High-Resolution 3D X-ray Tomography Microscope Systems (whatever they are!).

A smaller portal, observatory.data.ac.uk, scrapes the almost 3,000 working .ac.uk websites, primarily to compile listings (and usage statistics) of the various social media outlets being used by higher education.

Additionally, the observatory gathers statistics on the web server technologies and platforms in use (Apache, IIS, etc.), including their versions. Though perhaps not immediately useful, it is certainly interesting, and the dominance of Apache over IIS is always nice to see!
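To give a feel for this kind of tally, here is a minimal Python sketch that counts web server products from a list of HTTP Server header values. It is not how the observatory itself works (its implementation is not described here); the function names and the sample header values are made up for illustration.

```python
from collections import Counter


def server_product(server_header):
    """Return the product part of a Server header value.

    For example 'Apache/2.4.7 (Ubuntu)' gives 'Apache'. Missing or empty
    headers are reported as 'unknown'.
    """
    value = (server_header or "").strip()
    if not value:
        return "unknown"
    # The product is the first whitespace-separated token, before any '/version'.
    first_token = value.split()[0]
    return first_token.split("/")[0]


def tally_servers(server_headers):
    """Count how many sites report each server product."""
    return Counter(server_product(header) for header in server_headers)


# Hypothetical header values, as they might be collected from a handful of sites.
sample_headers = ["Apache/2.4.7 (Ubuntu)", "Microsoft-IIS/7.5", "Apache", None, "nginx/1.4.6"]
for product, count in tally_servers(sample_headers).most_common():
    print(product, count)
```

Grouping on the product name alone keeps the headline comparison simple; a version breakdown would need the part after the slash as well.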

This all sounds great, so how do we get on board?

All higher education institutions contributing to this project do so by providing an Organisation Profile Document, or OPD. The OPD is a file which is either referenced on the homepage of the institution's site or located at a specific URI.
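As a rough illustration of the "referenced on the homepage" route, the Python sketch below scans a page's HTML for link elements with a given rel value and returns their href targets. The rel value "openorg", the helper names and the example URL are assumptions made for this sketch; the documentation at http://opd.data.ac.uk is the place to confirm the exact mechanism.

```python
from html.parser import HTMLParser


class OpdLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel contains a given token."""

    def __init__(self, rel_token):
        super().__init__()
        self.rel_token = rel_token.lower()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; a bare attribute has value None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if self.rel_token in rel_tokens and href:
            self.hrefs.append(href)


def find_opd_links(html, rel_token="openorg"):
    """Return the hrefs of matching <link> elements (rel_token is an assumed value)."""
    finder = OpdLinkFinder(rel_token)
    finder.feed(html)
    return finder.hrefs


# Hypothetical homepage fragment for a made-up institution.
homepage = (
    '<html><head><link rel="stylesheet" href="/site.css">'
    '<link rel="openorg" href="https://www.example.ac.uk/opd.ttl" /></head></html>'
)
print(find_opd_links(homepage))
```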

An OPD is a simple data file which authoritatively describes the organization, listing its official name and logos, its contact details, web pages, official social media accounts, equipment listings and open access documents. 

This OPD is then periodically read and parsed by data.ac.uk and all information contained within is added or updated in their data stores. 

The OPD file is an RDF file in Turtle format (Terse RDF Triple Language) - don't worry, I'd never heard of it either! This format is ideal for expressing such data and allows HE institutions to provide all this information, relatively easily, in a machine-readable fashion.
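To show roughly what Turtle looks like, here is a small Python sketch that writes a few facts about an organization as Turtle text. It is only an illustration: it uses general-purpose FOAF terms (foaf:Organization, foaf:name, foaf:homepage, foaf:logo, foaf:phone) rather than the exact vocabulary a real OPD requires, which is documented at http://opd.data.ac.uk, and the function names and the example institution are invented.

```python
def turtle_literal(text):
    """Quote a Python string as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def build_profile_turtle(org_uri, name, homepage, logo=None, phone=None):
    """Return a minimal Turtle description of an organization (illustrative only)."""
    lines = [
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<{org_uri}> a foaf:Organization ;",
    ]
    properties = [
        ("foaf:name", turtle_literal(name)),
        ("foaf:homepage", f"<{homepage}>"),
    ]
    if logo:
        properties.append(("foaf:logo", f"<{logo}>"))
    if phone:
        # FOAF expects phone numbers as tel: URIs.
        properties.append(("foaf:phone", f"<tel:{phone}>"))
    body = [f"    {predicate} {obj}" for predicate, obj in properties]
    # ';' separates statements about the same subject, '.' ends the block.
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines) + "\n"


# Made-up institution used purely as sample input.
print(build_profile_turtle(
    "https://www.example.ac.uk/#org",
    "Example University",
    "https://www.example.ac.uk/",
    logo="https://www.example.ac.uk/logo.png",
))
```

Each line inside the block is one statement about the organization, which is why the format stays readable even as more details are added.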

The file can look a little confusing at first, but detailed "How to" documentation and a syntax checker for your OPD file are available at http://opd.data.ac.uk so don’t get scared, it’s actually pretty simple to put together.

(If any Terminalfour clients have created a template to share this data we'd be happy to link it here for others to use!)

The benefits of Open Data, particularly within the realm of this project, are almost endless. Both the principle and the practice of making data available for anyone to use free of charge will have a major impact on everything around us.

Though this project is one of many, we'd like to encourage all UK HE institutions to take part and see just how far it can grow.

It would be fantastic to see similar projects grow in other markets, in particular the US, where the sheer number of HE organizations is bound to produce some very interesting data.