Towards Querying Heterogeneous Federations of Interlinked Knowledge Graphs

June 28, 2022 by Julia Holze
Olaf Hartig, Senior Associate Professor at the Department of Computer and Information Science of Linköping University

Olaf Hartig is a Senior Associate Professor at the Department of Computer and Information Science of Linköping University. Additionally, he is an Amazon Scholar working with the Neptune graph database team at Amazon Web Services. Olaf is interested in problems related to the management of databases and knowledge, with a focus on graph data and data that is distributed over multiple autonomous and/or heterogeneous sources. He received the 2019 SWSA Ten-Year Award for a research paper that pioneered the idea of traversal-based query execution, as well as querying Linked Data on the Web in general. For his PhD thesis on the foundations of Linked Data queries, he was honored with the SWSA Distinguished Dissertation Award in 2015. At this year's SEMANTiCS he will give the keynote at DBpedia Day 2022 in Vienna.

You have been working on different topics like Semantic Web, Linked Data, graph data, data quality and many more. In which of these areas is your main focus in your current studies?

Olaf Hartig: It is a bit of a mixture of many of these things. The overall theme of my research is virtual data integration over heterogeneous data sources, and I consider Semantic Web standards such as RDF, SPARQL, OWL, and SHACL an excellent basis for this work. Virtual data integration is about scenarios in which bringing data from multiple sources together into a central repository is not a suitable or desired option, and yet one wants to be able to answer queries or run analyses for which the data from these sources needs to be combined. Heterogeneity of the data sources can occur in many dimensions, such as the underlying data models, the schemas, and the data access interfaces. For instance, one of the concrete problems that we are currently working on is how the capabilities and limitations of different data access interfaces can be considered, and perhaps leveraged, in a systematic manner by query processors in such virtual data integration settings. Another, related strand of research that I am currently following is about interoperability between the two prevalent graph data and knowledge graph technology stacks, the RDF-based one and the Property Graphs-based one. The RDF-star extension to RDF that I have been working on provides a suitable building block towards achieving such interoperability.
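
Editorial illustration (not part of the interview): the idea of combining data at query time across sources with different access interfaces can be sketched in a few lines of Python. Everything here is a hypothetical assumption made for illustration, including the class names `TriplePatternSource` and `LookupOnlySource`, the `ex:` identifiers, and the sample data; it is a toy sketch, not the query processing approach used by Olaf Hartig's group.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


class TriplePatternSource:
    """Hypothetical source whose interface can answer single triple patterns."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._triples = list(triples)

    def match(
        self, s: Optional[str], p: Optional[str], o: Optional[str]
    ) -> Iterator[Triple]:
        # None acts as a wildcard for that position of the pattern.
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


class LookupOnlySource:
    """Hypothetical source that only supports lookups by entity identifier."""

    def __init__(self, records: Dict[str, Dict[str, str]]) -> None:
        self._records = records

    def lookup(self, entity_id: str) -> Optional[Dict[str, str]]:
        return self._records.get(entity_id)


def authors_with_affiliation(
    papers: TriplePatternSource, people: LookupOnlySource
) -> List[Tuple[str, str, str]]:
    """Join both sources at query time; nothing is copied into a central store."""
    results = []
    for paper, _, author in papers.match(None, "ex:author", None):
        # The limited interface allows only one lookup per author binding.
        record = people.lookup(author)
        if record is not None:
            results.append((paper, author, record["affiliation"]))
    return results


# Made-up sample data for the two autonomous sources.
papers = TriplePatternSource([
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:paper2", "ex:author", "ex:bob"),
    ("ex:paper2", "ex:year", "2009"),
])
people = LookupOnlySource({
    "ex:alice": {"affiliation": "University A"},
})

rows = authors_with_affiliation(papers, people)
print(rows)
```

In this sketch the join strategy is dictated by what each interface can do: the first source can enumerate all matches for a pattern, while the second can only be probed with a known identifier, so the pattern source has to drive the join.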

You received the SWSA Ten-Year Award for your 2009 paper about executing SPARQL queries over the Web of Linked Data. Where do you see yourself and your work 10 years from now?

Olaf Hartig: 10 years feels like an eternity. It is hard to make a guess. Anyway, I still love to work on research questions and fundamental problems related to data management in the context of knowledge representation and the Web. So, I can see myself still working on such questions 10 years from now, and also helping PhD students do successful research on such topics.

What are the hot topics our communities should pay attention to when it comes to the advancement of computational semantics? How can DBpedia Day and the SEMANTiCS conference help to make it happen?

Olaf Hartig: There are a lot of discussions around the so-called modern data stack these days, which essentially is about leveraging and piping together cloud services for data warehousing and, ultimately, data analytics and business intelligence. However, from what I have seen, these approaches do little to explicitly capture the semantics of the data. At least, knowledge graphs are making an appearance, but mostly in the context of data cataloging. I think there are opportunities both for building on a modern data stack approach in semantics-enabled application scenarios and for extending such approaches with semantics-related capabilities.

Another, somewhat orthogonal idea that often shows up in discussions around the modern data stack is that of the data mesh. The core of this idea is to move away from centralized data governance within enterprises towards a more federated, domain-focused approach in which datasets and data services are treated more like products, similar to software projects. While it should be clear that knowledge graphs and semantic technologies are highly suitable for data cataloging in this context, I think that the ontology engineering community can also play a leading role in developing methodologies for data modeling in such a decentralized context, where enterprise-wide shared concepts need to emerge alongside the concepts specific to particular data products. Events such as DBpedia Day and the SEMANTiCS conference can help by putting such topics on the agenda.