Mike Bergman is one of the leading evangelists for knowledge graphs and AI. As this year's SEMANTiCS conference focuses on exactly that topic, the time was right to get a 360-degree overview of it from Mike. Andreas Blumauer conducted the interview.
SEMANTiCS: You are a leader in the use, integration and exploitation of knowledge graphs in a wide range of industries and knowledge areas. In your opinion, which industries and areas are already the most developed, and which are currently catching up fastest when it comes to implementing knowledge graphs?
Mike: I think four different sectors are the most advanced in knowledge graphs. The most visible is the biomedical community, best exemplified by the Open Biological and Biomedical Ontology (OBO) Foundry [http://www.obofoundry.org/] initiative, among other biomed examples. They have dozens of ontologies or knowledge graphs and vocabularies that interoperate in this community, ranging from worms and ticks to diseases and anatomy, and everything in between. A second similar, but smaller and more diverse, example is what is happening in the museum and cultural heritage space, aided by the CIDOC-CRM common vocabulary.
A less visible, but pervasive, sector is intelligent virtual assistants and online search, exemplified by Google’s Knowledge Graph, Siri, Cortana, recommendation systems, and similar services from many major online information providers. Almost all of these systems took Wikipedia as their starting point, but have grown and enhanced it in their own ways to produce a bestiary of competing, proprietary knowledge graphs. The last advanced sector I will mention is the so-called three-letter agencies, the covert intelligence sector. Of course, we hear little about these applications, but they were some of the earliest, and today they are perhaps even more advanced than the virtual assistants from the tech companies.
Three sectors that have been a bit disappointing but are trying to catch up are financial services (and reporting), health care, and information services. We also have some great company examples in oil and gas and in manufacturing and fulfillment, but these are characterized more by individual company leaders than by entire sectors.
SEMANTiCS: Your recently published book 'A Knowledge Representation Practionary' can probably be called a 'Magnum Opus'. The focus is on the connection between Charles Sanders Peirce and his teachings with the latest AI technologies. This may sound rather theoretical to many potential readers and practice-oriented users. How do you counter that argument?
Mike: Well, I have sometimes called Peirce (pronounced “purse”), the 19th-century American logician, mathematician and polymath, the patron saint of knowledge representation. His writings, spanning 50 years, are still relevant to how we understand subsumption relationships and hierarchies, the fallibility of knowledge, natural language, and signs and the perception of the world. But, in practical terms, I see three areas of his contributions as the most relevant to today’s issues in knowledge-based artificial intelligence.
These are all essential considerations for understandable AI. My hope is that the study of Peirce will continue to surface important insights into knowledge representation and AI. No one ever said that modeling human intelligence would be easy or could be done without strong theoretical underpinnings. Peirce, in my view, provides the framework for making this happen.
SEMANTiCS: Recently, AI researchers and users have begun to link semantic technologies, knowledge graphs and ontologies with machine learning techniques. What you anticipated years ago and called 'Knowledge-based Artificial Intelligence', or KBAI for short, is now actually becoming real. What exactly are we talking about?
Mike: All of us have already had direct experience with KBAI. Anytime we use a recommender system (such as Amazon’s or Pandora’s) or we pose a query and then get an answer from an intelligent virtual assistant (such as Siri, Cortana, Alexa, Bixby, etc.), we are experiencing knowledge-based artificial intelligence. Entity recognition systems, such as when we extract key names and concepts from documents, or the infoboxes we get on Wikipedia or Google searches, are often additional examples of KBAI.
In essence, KBAI systems arrange concepts in logical relation to one another, and as these arrangements grow they take the form of a graph (a network of nodes connected by edges). They also use semantics that relate concepts hierarchically and attach various labels covering the variety of names (including in multiple languages) that we use to refer to those concepts. It is the combination of this graph structure with semantic understanding that gives us the ability to model and capture human language, which is how knowledge is expressed and conveyed. A knowledge graph is also known as an ontology. While connected graphs of many forms exist, such as ones that represent telecommunications networks or transit systems, what makes a knowledge graph special is its explicit modeling of human language and knowledge.
KBAI takes the idea of a knowledge graph one explicit step further. We add features and structure to the knowledge graph (by adding text and other characterizations to its nodes and edges) so that a computer can automatically extract subsets and focused corpora that can “feed” machine learners directly. A machine learner takes existing information and inductively infers new information. In supervised machine learning, labeled training sets of known instances are used to “train” models to recognize new, unknown instances. In unsupervised machine learning, chunks of relevant unlabeled information are segregated into logical clusters. KBAI can support both AI techniques.
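To make the graph-plus-labels idea concrete, here is a minimal Python sketch. The concept names, labels, and the plain-dictionary representation are hypothetical illustrations, not KBpedia's actual data model: nodes carry multilingual labels, and directed subclass edges supply the hierarchy.

```python
# Hypothetical toy concepts; the IDs and labels are illustrative, not taken from KBpedia.
labels = {
    "Animal": {"en": ["animal"], "de": ["Tier"]},
    "Mammal": {"en": ["mammal"], "de": ["Säugetier"]},
    "Dog": {"en": ["dog", "hound"], "de": ["Hund"]},
}

# Directed subclass edges: child -> list of parents (a concept may have several).
parents = {
    "Mammal": ["Animal"],
    "Dog": ["Mammal"],
}


def ancestors(concept):
    """Return every concept that subsumes `concept`, following subclass edges upward."""
    seen, stack = set(), list(parents.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def lookup(name):
    """Map a surface name, in any language, to the concept(s) it labels."""
    wanted = name.lower()
    return {
        concept
        for concept, by_lang in labels.items()
        for names in by_lang.values()
        if wanted in (n.lower() for n in names)
    }
```

Here `lookup("Hund")` returns `{"Dog"}`, and `ancestors("Dog")` returns `{"Mammal", "Animal"}`: the labels connect language to concepts, while the edges carry the hierarchical semantics.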
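A rough sketch of how a hierarchy with text attached to its nodes can feed a supervised learner: every concept below a chosen class becomes a (text, label) pair. The hierarchy, descriptions, and function names are hypothetical, and a real system would also pull in mapped instances rather than only concept descriptions.

```python
# Hypothetical toy hierarchy (parent -> children) with text attached to some nodes.
children = {
    "Organism": ["Plant", "Animal"],
    "Animal": ["Dog", "Cat"],
    "Plant": ["Oak", "Rose"],
}
descriptions = {
    "Dog": "domesticated canine kept as a pet or working animal",
    "Cat": "small carnivorous feline often kept as a pet",
    "Oak": "deciduous tree of the genus Quercus",
    "Rose": "flowering shrub of the genus Rosa",
}


def descendants(concept):
    """All concepts below `concept` in the hierarchy, in depth-first order."""
    out = []
    for child in children.get(concept, []):
        out.append(child)
        out.extend(descendants(child))
    return out


def training_set(target_classes):
    """Turn each class's slice of the graph into (text, label) pairs."""
    pairs = []
    for label in target_classes:
        for concept in descendants(label):
            if concept in descriptions:
                pairs.append((descriptions[concept], label))
    return pairs
```

Calling `training_set(["Animal", "Plant"])` yields four labeled examples; the labels come for free from the graph structure rather than from manual annotation.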
SEMANTiCS: Another comprehensive work that you and your team have recently published is KBpedia. This immense knowledge graph, available under CC BY 4.0, links and extends sources such as Wikipedia, Wikidata, schema.org, DBpedia and GeoNames. What tips can you give to potential users who are just starting to work with knowledge graphs? How can they benefit most quickly from KBpedia? And how does KBpedia differ from similar projects?
Mike: We designed KBpedia for two express purposes: to make AI using knowledge graphs cheaper and easier; and, to help disparate data interoperate. Though, as you say, KBpedia is large, with about 55,000 reference concepts, its design is simple and modular. We use Charles Peirce’s universal categories to design the upper KBpedia Knowledge Ontology, or KKO, which has fewer than 200 concepts. Underneath the KKO scaffolding we attach about 80 typologies that cover the diversity of human knowledge domains. These are where we map the contributing knowledge bases, such as Wikidata or Wikipedia (or any mapped knowledge base).
The ‘cores’ of these typologies do not overlap (are “disjoint” from) one another, which eases slicing and dicing the knowledge graph. The graph is also coherent and computable, which means one can reason over it and select various ‘slices’ of interest. Through the links to the contributing knowledge bases, one can then extract large sets of instances virtually automatically. These extracted slices with their mapped instances provide labeled training sets for supervised machine learning, eliminating the roughly 85% of effort typically required for this form of AI. KBpedia can do this for about 45,000 fine-grained entity types. (Unsupervised machine learning can also be staged using other methods.)
So, out of the box, KBpedia is powerful and makes creating entity recognition models or other classifiers cheaper than other approaches do. The typology design also means that new, domain-specific slices can be added rapidly, allowing the structure to be tailored quickly to specific industries or companies.
Each of the 55,000 concepts in KBpedia is also accompanied by rich labels and text descriptions, useful for search and tagging and for creating various word embedding models. The graph structure itself is also suited to various graph embedding models. All told, KBpedia offers an unparalleled set of features useful to machine learning. Its modular design and build methods also test for coherence and consistency when new concepts are added to the system. The entire structure can be modified and then rebuilt from scratch in less than an hour on a conventional workstation, enabling rapid modifications, testing, and updates.
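The payoff of disjoint typologies can be sketched in a few lines: if no concept sits in two typology cores, every mapped instance inherits exactly one unambiguous label. The typology names, concepts, and `ext:` instance identifiers below are hypothetical stand-ins, not KBpedia's real typologies or mappings.

```python
from itertools import combinations

# Hypothetical typology "cores": sets of reference concepts (names are illustrative).
typologies = {
    "Animals": {"Dog", "Cat"},
    "Plants": {"Oak", "Rose"},
    "Products": {"Phone", "Laptop"},
}

# Hypothetical mapping from external knowledge-base instances to reference concepts.
instance_type = {
    "ext:Rex": "Dog",
    "ext:Tom": "Cat",
    "ext:Old_Oak_7": "Oak",
    "ext:Pixel_3": "Phone",
}


def overlaps(typos):
    """Return pairs of typologies whose cores share a concept (disjointness violations)."""
    return [(a, b) for a, b in combinations(typos, 2) if typos[a] & typos[b]]


def label_instances(typos, mapping):
    """Label each mapped instance with the typology its concept belongs to."""
    concept_to_typo = {c: t for t, core in typos.items() for c in core}
    return {inst: concept_to_typo[c] for inst, c in mapping.items() if c in concept_to_typo}
```

With disjoint cores, `overlaps(typologies)` is empty and `label_instances` gives a clean training label per instance; if a concept appeared in two cores, the dictionary inversion would silently pick one, which is why checking `overlaps` first matters.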
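One way to picture a coherence test at build time is a check run before a new concept is accepted: its parents must exist, it must not create a subclass cycle, and it must not end up under two classes declared disjoint. This is an illustrative sketch with hypothetical data and function names, not KBpedia's actual build code.

```python
# Hypothetical structures: child -> set of parents, plus declared disjoint class pairs.
parents = {"Mammal": {"Animal"}, "Dog": {"Mammal"}, "Oak": {"Plant"}}
disjoint = {frozenset({"Animal", "Plant"})}


def ancestors(concept, graph):
    """All concepts reachable upward from `concept`; `seen` guards against cycles."""
    seen, stack = set(), list(graph.get(concept, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph.get(p, ()))
    return seen


def check_new_concept(name, new_parents, graph, disjoint_pairs):
    """Return the problems that adding `name` under `new_parents` would introduce."""
    problems = []
    known = set(graph) | {p for ps in graph.values() for p in ps}
    for p in new_parents:
        if p not in known:
            problems.append(f"unknown parent: {p}")
    trial = {**graph, name: set(new_parents)}  # tentative graph; original is untouched
    up = ancestors(name, trial)
    if name in up:
        problems.append("cycle")
    for pair in disjoint_pairs:
        if pair <= up:  # both disjoint classes would subsume the new concept
            problems.append("under disjoint classes: " + " & ".join(sorted(pair)))
    return problems
```

For example, adding `"Puppy"` under `"Dog"` passes with no problems, while placing a concept under both `"Dog"` and `"Oak"` is flagged because `Animal` and `Plant` are disjoint.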
SEMANTiCS: Semantic AI (or KBAI) is about executing algorithms and machine learning directly on graphs. What methods and technologies do you see in the forefront? Who has the best cards in hand to succeed in the market?
Mike: Let me answer these questions in reverse order. Currently, the major tech companies that have built bespoke knowledge graphs have the best cards. They have the resources and a head start in doing KBAI.
Over time, however, we saw that all of these players were essentially following the same path. They would start with the information in Wikipedia, add their own information, and apply massive dollar and person resources to massage the graph to get clean labels and to achieve their own AI purposes. To my knowledge, most, if not all, of the online virtual assistants and question-answering systems have been built, and duplicatively at that, with this labor-intensive approach.
Our idea with KBpedia was to provide a starter set open to any player. (While we originally developed and sold KBpedia as a proprietary system, we have now made it available as open source.) Rather than reinventing the wheel, we felt that a common starting basis would make creating KBAI systems open to more enterprises at a quicker and cheaper entry price.
As a result, I don’t know who the next winners will be. I’d like to hope it is some enterprise out there that sees the usefulness of KBpedia to its own enterprise KBAI interests and is able to build from that starting basis to achieve its aims.
However, I do know who that next player is not likely to be: me. Over the past decades I have built and grown a number of companies. Those days are now past. I am more interested in writing and helping others with more energy and fire-in-the-belly to go out to slay the dragons. I want to continue to hone KBpedia and my writings such that others may continue the battles.
SEMANTiCS: The application of Semantic Web technologies in companies is based on different requirements and assumptions than the Semantic Web in the WWW. For example, there are different views on data quality or governance. What does this mean for KBpedia or for software such as PoolParty Semantic Suite, which is mainly used in the corporate environment?
Mike: I’m glad you added this question, Andreas, at my request. The essence of using KBpedia for specific enterprise purposes is to capture the specific knowledge domain of that enterprise. While KBpedia, as is, provides a useful starting point, it is unlikely in its existing form to be fully responsive to the needs of any given enterprise.
Further, as you well know, most enterprises are not familiar with semantic technologies or query languages like SPARQL, even though they have rich relational data and SQL expertise. We find that we can train librarians or subject matter experts in the role and use of knowledge graphs, but it does take dedicated training and, frankly, better and easier-to-use interfaces than what we see with semantic technologies in the academic environment.
That is where a product like PoolParty comes in. PoolParty creates structures that are exactly akin to the typologies that are used in KBpedia. Further, through your years of client experience and refinement, you have put in place user interfaces and supporting tools and functions that make creating, managing, and updating those structures easier. I think there could be a very fruitful marriage of KBpedia and PoolParty useful to many client needs.
SEMANTiCS: Mike, the Semantic Web community is looking forward to your next projects. What can we expect, and in which direction are your plans going at the moment?
Mike: Well, as I mentioned, I no longer see my future as building my own companies. I will continue to consult where the client needs match my interests, but I am increasingly following my own lights and intellectual pursuits. Fortunately, prior successes grant me the freedom to pursue that path.
I have another book centered on Peirce on my horizon, which may or may not come to fruition. Much of my current time is researching that possible effort. In any case, my writing efforts will continue to include postings on my blog, which has been active now for fifteen years.
I remain committed to continuing to refine and improve KBpedia. There are always errors and misassignments that need to be surfaced and corrected.
As for substantive improvements, I very much want to expand the organization and coverage of properties in KBpedia. The system presently maps to nearly 5,000 Wikidata properties and covers all of the properties in the DBpedia ontology and schema.org. However, as in the Semantic Web effort as a whole, property coverage is not well organized and not widely used. I’d like to improve the formalization of property coverage and organization, a step I think is essential before true instance data interoperability using semantic technologies can occur. Today, it is rare to find a Semantic Web knowledge graph that goes beyond the most rudimentary properties.
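Why property organization gates instance-data interoperability can be shown with a small sketch: records from different sources only merge once their source-specific properties are mapped to shared ones. The mapping table and the record values are illustrative assumptions; the source property identifiers (Wikidata P569 and P19, DBpedia's `birthDate`/`birthPlace`, schema.org's `birthDate`) are real property names, but this particular mapping and the `normalize`/`merge` helpers are hypothetical.

```python
# Illustrative mapping from source-specific properties to one shared property each.
property_map = {
    "wd:P569": "birthDate",       # Wikidata: date of birth
    "dbo:birthDate": "birthDate",
    "schema:birthDate": "birthDate",
    "wd:P19": "birthPlace",       # Wikidata: place of birth
    "dbo:birthPlace": "birthPlace",
}


def normalize(record):
    """Rewrite keys to shared properties; unmapped keys are returned separately."""
    shared, unmapped = {}, {}
    for key, value in record.items():
        if key in property_map:
            # If two source keys map to the same property, keep the first value seen.
            shared.setdefault(property_map[key], value)
        else:
            unmapped[key] = value
    return shared, unmapped


def merge(*records):
    """Combine records from different sources into shared-property value sets."""
    merged = {}
    for record in records:
        shared, _ = normalize(record)
        for prop, value in shared.items():
            merged.setdefault(prop, set()).add(value)
    return merged
```

A Wikidata-style record and a DBpedia-style record describing the same entity collapse into one view under `birthDate` and `birthPlace`; anything left in `unmapped` is exactly the property coverage that still needs organizing.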
Another area of KBpedia interest is to expand its mappings. Census data and product classification schemes (such as the UNSPSC) are two near-term candidates.
A concluding area I would like to mention is the application of knowledge graphs to workflow modeling and management. There are many interesting challenges in knowledge graph technology as applied to operational and dynamic systems.
SEMANTiCS: Thank you, Mike, for this interview.
Michael K. Bergman is a senior principal for Cognonto Corporation and lead editor for the open-source KBpedia knowledge structure. For more than a decade, his AI3:::Adaptive Information blog has been a go-to resource on topics in semantic technologies, large-scale knowledge bases for machine learning, data interoperability, knowledge graphs and mapping, and fact and entity extraction and tagging. For the past twenty years, Mike has been an entrepreneur, Web scientist, and independent consultant. For the decade up to 2018, Mike was the CEO of Structured Dynamics LLC, which he co-founded with Fred Giasson. Mike has held C-level positions and was a founder of the prior companies Zitgist LLC, BrightPlanet Corporation, VisualMetrics Corporation, and TheWebTools Company. These companies provided notable market advances in semantic technologies, data warehousing, the deep Web, large-scale Internet databases, meta-search tools, and bioinformatics. Bergman began his professional career in the mid-1970s as a project director for the U.S. EPA for a major energy study called the Coal Technology Assessment. He later taught in the Graduate School of Engineering at the University of Virginia, where he was a fellow in the Energy Policies Study Center. He then joined the American Public Power Association in 1982, where he rose to director of energy research. APPA’s pioneering work with small computers sparked Bergman’s transition to information technologies. Before entering industry, Mike was a doctoral candidate at Duke University in population genetics.