LOD-a-lot: A Single-File Enabler for Data Science

Research & Innovation

Many Data Scientists make use of Linked Open Data. However, most restrict their analyses to one or two datasets (often DBpedia). One reason for this lack of variety in dataset use has been the complexity and cost of running large-scale triple stores, graph stores or property graph databases. With Header, Dictionary, Triples (HDT) and Linked Data Fragments (LDF), the cost of publishing Linked Data has been significantly reduced. Still, Data Scientists who wish to run large-scale analyses need to query many LDF endpoints and integrate the results themselves.

Using recent innovations in data storage, compression and dissemination, we are able to compress (a large subset of) the LOD Cloud into a single file. We call this file LOD-a-lot. Because it is just one file, LOD-a-lot can be easily downloaded and shared. It can be queried locally or through an LDF endpoint. In this paper we identify several categories of use cases that previously required an expensive and complicated setup, but that can now be run over a cheap and simple LOD-a-lot file.

LOD-a-lot does not expose the same functionality as a full-blown database suite: it mainly offers Triple Pattern Fragments, that is, lookups of all triples that match a single subject-predicate-object pattern. Despite this limitation, this paper shows that a surprisingly wide collection of Data Science use cases can be performed over a LOD-a-lot file. For these use cases LOD-a-lot significantly reduces the cost and complexity of doing Data Science.
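To make the triple pattern interface concrete, the following is a minimal sketch in Python of what such a lookup amounts to. It is not the LOD-a-lot, HDT or LDF API: the toy data, the `ex:` names and the functions `match_pattern`, `fragment` and `cities_with_country` are all hypothetical, and the data lives in an in-memory list rather than a compressed file. A real fragment typically reports an estimated match count; the sketch returns an exact one.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical toy data standing in for the contents of a large file.
TRIPLES = [
    Triple("ex:Amsterdam", "rdf:type", "ex:City"),
    Triple("ex:Amsterdam", "ex:country", "ex:Netherlands"),
    Triple("ex:Rotterdam", "rdf:type", "ex:City"),
    Triple("ex:Netherlands", "rdf:type", "ex:Country"),
]


def match_pattern(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching one pattern; None acts as a wildcard."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


def fragment(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
    offset: int = 0,
    limit: int = 2,
) -> Tuple[List[Triple], int]:
    """Return one page of matches plus the total number of matches."""
    matches = list(match_pattern(triples, subject, predicate, obj))
    return matches[offset:offset + limit], len(matches)


def cities_with_country(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Join two patterns on the client: (?c rdf:type ex:City) and (?c ex:country ?n)."""
    result = []
    for city in match_pattern(triples, predicate="rdf:type", obj="ex:City"):
        for link in match_pattern(triples, subject=city.subject, predicate="ex:country"):
            result.append((city.subject, link.obj))
    return result
```

Anything beyond a single pattern, such as the join in `cities_with_country`, is composed by the client out of several lookups. This is the sense in which the interface is more limited than a full query engine while still supporting many analyses.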

Speakers: