Global domain knowledge graphs to bring intelligence to data integration

September 07, 2018 by Stefan Summesberger

Ontotext is a gold sponsor of SEMANTiCS 2018. The company is based in Sofia, Bulgaria, and was founded in 2000 by Atanas Kiryakov. The BBC, the Financial Times and Standard & Poor’s are just three organizations on the long list of Ontotext’s satisfied customers. We had a brief talk with founder Atanas, who shared some very interesting insights about the areas Ontotext works in, its signature solution GraphDB, domain knowledge modelling and his talk at SEMANTiCS.

What are the focal areas of Ontotext’s business?

We are best known as the developer of GraphDB™ – a leading semantic database engine. Our second major business line is the development of enterprise knowledge management solutions that involve big knowledge graphs.

We also have a third growing business line: managed services. Many of our customers are happy to transfer to us the full responsibility for operating the IT systems that we have developed for them. Often, this is more efficient than providing support to the customer’s own IT team. Ontotext’s operations team directly executes service level agreements determined by the business requirements for availability, response time, regular updates, data quality, text mining accuracy, etc.

What kind of solutions are you developing with knowledge graphs?

At a technology level, most of our solutions represent some sort of enterprise content management or data management. The fancy parts are the analytics, visualizations and insights. But before that comes the heavy lifting – data cleaning, data integration and information extraction from text.

At the end of the day, all big enterprises struggle to get better access to their information assets, which are typically scattered across different systems – be it for decision-making, sales, customer support or other purposes. In some businesses, risk management and regulatory compliance are of tremendous importance. For instance, banks use our technology for anti-money-laundering and regulatory reporting. Pharma companies apply it to handle inquiries from agencies like the FDA about drug side effects faster and better.

The biggest global players are the early adopters of such solutions based on semantic technology. They come to us when they have tried the best mainstream data management and content management technologies and they have recognized that they need semantics to further increase their efficiency and competitiveness. I believe that this technology will soon become accessible to a much broader set of businesses. That’s what Ontotext is working for – to democratize semantic technology.

In which sectors do you see the biggest demand for semantic technology?

We see plenty – from financial services to pharma and government. But let me mention two that we have been focusing on recently: publishing and market intelligence.

One typical business application is content packaging and re-use for publishers. Imagine you are a media company like the BBC, a business newspaper like the FT or a scientific publisher like Elsevier. They all sit on top of enormous volumes of content, but find it hard to compete against providers of free content. To make things worse, web platforms such as Facebook and Google take away a substantial part of their profits. It is a matter of survival for publishers to make their content easier to discover, to better engage their readers with personalized recommendations and to sell tailor-made data feeds to businesses. This is what Ontotext’s solutions do for them.

Another business application is data enrichment and linking for market intelligence. Imagine you are a business information agency that wants to build a company profiling service for Latin America. Company data providers don’t offer good coverage for this region – they provide decent information about companies with revenues below $50M only in developed markets. Here, Ontotext offers a solution where we integrate the available company data from different sources into a big knowledge graph and use this graph to analyze local business news and extract more company information. This new information is precious: it gives such agencies a competitive edge. We enrich the knowledge graph with it, and this is the final product – a unique body of knowledge about a specific market.

What is special in the way you use knowledge graphs in such solutions?

The Ontotext Platform is a product that builds on top of our semantic database engine. Its key capability is text mining using big knowledge graphs. The most basic service is semantic tagging – one can also call it concept extraction, named entity recognition or semantic annotation. The result is rich metadata that interlinks documents or other unstructured content with the big knowledge graph. This semantic metadata boosts the performance of practically every content management activity: search, exploration, classification, recommendation and so on.
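
To make the idea of semantic metadata concrete, here is a minimal sketch of what semantic tagging output can look like as RDF, using Python and rdflib. The namespaces, property names and URIs are hypothetical placeholders, not Ontotext’s actual annotation schema; real pipelines often use vocabularies such as NIF or schema.org.

```python
# A minimal sketch of semantic tagging output as RDF. All namespaces
# and URIs are hypothetical placeholders.
from rdflib import Graph, Namespace, URIRef, Literal

EX = Namespace("http://example.org/annotation/")
KG = Namespace("http://example.org/kg/")

g = Graph()
doc = URIRef("http://example.org/docs/article-42")

# One annotation per recognized mention: where it occurs in the text
# and which knowledge-graph entity it resolves to.
ann = EX["mention-1"]
g.add((ann, EX.inDocument, doc))
g.add((ann, EX.surfaceForm, Literal("Paris")))
g.add((ann, EX.beginOffset, Literal(104)))
g.add((ann, EX.endOffset, Literal(109)))
g.add((ann, EX.resolvesTo, KG["Paris_France"]))  # disambiguated entity

print(g.serialize(format="turtle"))
```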

Our knowledge graphs provide a rich global context about the corresponding domain, be it international business, oil and gas, pharma or the WW2 Holocaust. We build domain-specific graphs that combine big volumes of open data with commercially available datasets to provide a global view on the domain. Then, on top of them, we layer text mining pipelines tuned to use these knowledge graphs and in this way achieve maximum accuracy for documents in the domain. We call this combination of a knowledge graph, text mining and other analytics tuned for the domain a domain knowledge model.

In most of the specific projects, we combine global domain knowledge graphs with the proprietary data of the enterprise. We also use ontologies to provide specific views, generalizations, reclassifications and abstractions on top of the general-purpose schemata. Each enterprise receives a knowledge graph that benefits from rich global knowledge, but uses its precious business wisdom to interpret and analyze the data.
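
As a rough illustration of layering ontologies over general-purpose schemata, the sketch below defines a hypothetical enterprise view class on top of generic company data and reclassifies matching entities with a SPARQL CONSTRUCT query. All names are made up for the example.

```python
# Layering a proprietary "view" class over a general-purpose schema:
# a CONSTRUCT query reclassifies matching entities into the enterprise
# view. All class and property names are made-up placeholders.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex:   <http://example.org/enterprise/> .
@prefix gen:  <http://example.org/general/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:StrategicSupplier rdfs:subClassOf gen:Company .

gen:acme    a gen:Company ; gen:annualOrderVolume 12000000 .
gen:smallco a gen:Company ; gen:annualOrderVolume 50000 .
""", format="turtle")

view = g.query("""
PREFIX ex:  <http://example.org/enterprise/>
PREFIX gen: <http://example.org/general/>
CONSTRUCT { ?c a ex:StrategicSupplier }
WHERE     { ?c a gen:Company ; gen:annualOrderVolume ?v .
            FILTER(?v > 1000000) }
""")
for triple in view:
    print(triple)  # gen:acme typed as ex:StrategicSupplier
```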

This sounds like an ambitious vision, but what are the technological challenges when you try to implement it?

There are three big challenges: merging data from various sources; recognizing concepts and entities in text; and keeping the knowledge graph up to date with changes coming from the different sources.

Matching concepts and entities across data sources and recognizing their mentions in texts require disambiguation of their meaning – for instance, being able to distinguish between Paris, the capital of France, Paris, Texas, and Paris Hilton. This comes easily to people, but computers cannot do it on their own, because they lack the broad awareness of entities and concepts that an average graduate has.

Therefore, we build big knowledge graphs and apply cognitive analytics to them to provide entity awareness – semantic fingerprints derived from rich entity descriptions. For instance, Ontotext’s Company Graph provides entity awareness about all locations on Earth as well as the most popular companies and people. We have also built such a knowledge graph in Life Sciences and Healthcare.
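
A toy version of this idea: give each candidate entity a fingerprint vector derived from its description and match the mention’s context against those fingerprints. The descriptions below are invented, and production systems use far richer, graph-derived features and learned embeddings, but the mechanics are the same.

```python
# Toy disambiguation via "entity awareness": each candidate entity gets
# a fingerprint built from its (made-up) description, and the mention
# context is matched against those fingerprints by cosine similarity.
from collections import Counter
import math

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

candidates = {
    "Paris_France": bow("capital city france seine europe government"),
    "Paris_Texas":  bow("city texas united states lamar county"),
    "Paris_Hilton": bow("american media personality businesswoman heiress"),
}

context = bow("the mayor announced new transport plans for the capital city")
best = max(candidates, key=lambda e: cosine(context, candidates[e]))
print(best)  # -> Paris_France
```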


What can we expect from the next version of GraphDB?

The next version will be released at the end of September. Its key new features are reconciliation and semantic vectors.

Reconciliation makes it easier for users to interlink their data with an existing knowledge graph. One scenario is to interlink the data between two proprietary sources, using a proprietary reconciliation service. Another is to use a public service to match your entities to DBpedia, Wikidata, GeoNames or another public data source. This is essential for enabling data architects to easily benefit from the links in linked data.
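
As a rough sketch of the public-service scenario, the snippet below does label-based matching against Wikidata’s public SPARQL endpoint from Python. It only illustrates the reconciliation idea – GraphDB’s own reconciliation feature has its own interface, which is not shown here.

```python
# Label-based reconciliation sketch against Wikidata's public SPARQL
# endpoint. Illustrative only; a real setup would use a dedicated
# reconciliation service and fuzzier matching than exact labels.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

def reconcile(label, lang="en"):
    query = f"""
    SELECT ?item WHERE {{
      ?item rdfs:label "{label}"@{lang} .
    }} LIMIT 5
    """
    resp = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "reconciliation-sketch/0.1",  # WDQS asks for a UA
        },
    )
    resp.raise_for_status()
    return [b["item"]["value"] for b in resp.json()["results"]["bindings"]]

print(reconcile("Sofia"))  # candidate Wikidata URIs to link local data to
```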

Semantic vectors allow one to use statistical techniques like embeddings on top of RDF graphs. This is where mainstream semantic technology meets modern machine learning-based AI techniques. We are planning to add more of this to GraphDB in the near future.
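
One way to picture this, in the spirit of approaches like RDF2Vec rather than as a description of GraphDB’s internals: generate random walks over an RDF graph and train word2vec on them, so that related resources end up close together in vector space. The sample file name below is a hypothetical placeholder.

```python
# Random-walk embeddings over an RDF graph, in the spirit of RDF2Vec.
# Walks are treated as "sentences" for word2vec, so related resources
# end up close in vector space. The file name is a placeholder.
import random
from rdflib import Graph
from gensim.models import Word2Vec

rdf = Graph()
rdf.parse("company-graph-sample.ttl")  # hypothetical sample file

# Adjacency list over (subject, predicate, object) edges.
edges = {}
for s, p, o in rdf:
    edges.setdefault(s, []).append((p, o))

def random_walk(start, depth=4):
    walk, node = [str(start)], start
    for _ in range(depth):
        if node not in edges:
            break
        p, o = random.choice(edges[node])
        walk += [str(p), str(o)]
        node = o
    return walk

walks = [random_walk(s) for s in edges for _ in range(10)]
model = Word2Vec(walks, vector_size=100, window=5, min_count=1, sg=1)
# model.wv.most_similar(<resource URI>) now suggests related entities.
```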

The newest version will also include a major performance improvement: small transactions on big repositories will be much quicker. We have also implemented faster SPARQL federation between repositories within the same GraphDB instance. This enables scenarios where a single database instance manages data with different ownership, access rights and update cycles in separate repositories, while still allowing them to be used efficiently together via federation. This is a big deal for multi-tenant cloud deployments.
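
A sketch of what such a federated query could look like from Python, assuming a local GraphDB instance with two repositories named master-data and news and a made-up data schema; the repository: SERVICE scheme follows GraphDB’s internal federation, but treat all identifiers here as placeholders.

```python
# Federated query sketch across two repositories in one GraphDB
# instance. Endpoint URL, repository names and schema are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:7200/repositories/master-data")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX ex: <http://example.org/>
SELECT ?company ?article WHERE {
  ?company a ex:Company .          # evaluated in the master-data repository
  SERVICE <repository:news> {      # sibling repository in the same instance
    ?article ex:mentions ?company .
  }
}
LIMIT 10
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["company"]["value"], row["article"]["value"])
```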

There are also several new capabilities that will help operations teams and make GraphDB easier to run in big enterprises. You will hear about them soon.


What is the USP of GraphDB?

GraphDB is the best graph database for master data and metadata management. When it comes to such applications, enterprises prefer RDF-based engines to those based on property graphs for an obvious reason – standards compliance. It is also important that RDF is designed for linking and merging data developed without centralized control – which is exactly the situation with the data silos in large organizations. The data diversity within a single organization with 200,000 employees is bigger than the diversity across the few thousand linked open data datasets we see on the LOD cloud diagram.

GraphDB has been designed from day one to deal with very big knowledge graphs – the global domain knowledge graphs I already mentioned. Since 2004 it has been optimized to index very big volumes of metadata and to efficiently serve hybrid queries combining structured constraints with full-text search, inference and graph analytics (see the sketch below). We don’t have real competitors for such scenarios: you either have an engine designed for this, or you get very awkward performance. The latter happens, for instance, when you implement RDF and SPARQL support on top of a document database. It works sort of OK with ontologies of hundreds of classes or vocabularies of thousands of terms, but falls apart when challenged with a knowledge graph of billions of facts.
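
To illustrate what a hybrid query looks like, here is a sketch combining full-text search with a structured graph constraint in one SPARQL query. The connector namespaces follow GraphDB’s Lucene connector documentation, but the index name, repository and data schema are hypothetical.

```python
# Hybrid SPARQL query sketch: full-text search plus structured graph
# constraints in one query. Namespaces follow GraphDB's Lucene
# connector; index name, repository and schema are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:7200/repositories/company-graph")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>
PREFIX ex: <http://example.org/>
SELECT ?company ?name WHERE {
  # Full-text part: companies whose indexed text matches "semantic"
  [] a con-inst:company_index ;
     con:query "semantic" ;
     con:entities ?company .
  # Structured part: restrict to companies headquartered in Bulgaria
  ?company ex:headquarteredIn ex:Bulgaria ;
           ex:name ?name .
}
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"])
```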


This is not your first time at SEMANTiCS Conference. Are there some stories that you associate with SEMANTiCS, some takeaways or leads?

Yes, last year we had a great time in Amsterdam – wonderful opportunities to grab a beer with partner technology vendors like Semantic Web Company and to get in touch with potential new clients from industry. There were also plenty of interesting meetings with research people. SEMANTiCS is unique in blending these three audiences so well.


You will be giving a talk at SEMANTiCS. What will be the topic?

The title of my talk is “Analytics on big knowledge graphs deliver entity awareness and help data linking”.

I will share our vision of how global domain knowledge graphs can bring intelligence to data integration, and how this enables decision-making based on both global data and proprietary knowledge.

I will use GraphDB to demonstrate analytics on a knowledge graph of 2 billion triples – our Company Graph, which combines several data sources and interlinks their entities with more than 1 million news articles. The demonstration includes several cognitive capabilities: importance ranking of nodes based on graph centrality; popularity ranking based on news mentions of a company and its subsidiaries; retrieval of similar nodes in a knowledge graph; and determining the distinguishing features of an entity.
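
For a flavour of the centrality-based importance ranking, here is a toy sketch that projects an RDF graph onto a directed graph and ranks nodes with PageRank. The sample file name is a placeholder; the real Company Graph is orders of magnitude larger and uses analytics tuned to the domain.

```python
# Toy importance ranking by graph centrality: project RDF triples onto
# a directed graph and run PageRank. The file name is a placeholder.
import networkx as nx
from rdflib import Graph

rdf = Graph()
rdf.parse("company-graph-sample.ttl")  # hypothetical sample file

g = nx.DiGraph()
for s, p, o in rdf:
    g.add_edge(str(s), str(o), predicate=str(p))

ranks = nx.pagerank(g)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{score:.4f}  {node}")
```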

I am looking forward to seeing you at my presentation at SEMANTiCS, on Thursday at 10:15.

Discuss the potential of domain knowledge models with Atanas at SEMANTiCS 2018. Register now!

About SEMANTiCS

The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, understand its benefits and know its limitations. Every year, SEMANTiCS attracts information managers, IT architects, software engineers and researchers from organisations ranging from NPOs, universities and public administrations to the largest companies in the world. http://www.semantics.cc