AI approaches based on machine learning have become increasingly popular across all sectors. However, experience shows that AI initiatives often fail due to a lack of appropriate data or low data quality. Furthermore, state-of-the-art AI models are largely opaque and suffer from a lack of transparency and explainability. Semantic AI approaches combine methodology from statistical AI with symbolic AI based on semantic technologies such as knowledge graphs and natural language processing, while incorporating mechanisms for explainable AI. Semantic AI requires technical and organizational measures, which are implemented along the whole data lifecycle. While the individual aspects of semantic AI are being studied in their respective research communities, a dedicated community focusing on their combination is yet to be established. The proposed workshop intends to contribute to this endeavour.
Program Outline:
In many modern statistical approaches to AI, raw data is the preferred input for (machine learning) models. In some areas and in some cases, however, we struggle to find this raw form of data. One such area involves heterogeneous knowledge: entities, their attributes and internal relations. The Semantic Web community has invested decades of work on just this problem: how to use graphs to represent knowledge, in various domains, in as raw and as usable a form as possible, satisfying many use cases. To build scalable, transparent and explainable AI in the various domains where such heterogeneous knowledge is available, we need to collaborate with domain experts to develop a) relevant and high-quality knowledge graphs as well as b) appropriate data science and ML methods to constantly enrich and analyse these graphs. In this talk, I will give an overview of current collaborative work on this continuous enrichment of knowledge graphs, specifically in the domains of Digital Humanities and IoT, and will discuss some of the promises, challenges and solutions identified.
Victor de Boer is an Associate Professor of User-Centric Data Science at Vrije Universiteit Amsterdam and a Senior Research Fellow at the Netherlands Institute for Sound and Vision. His research focuses on data integration, semantic data enrichment and knowledge sharing using Linked Data technologies in various domains. These domains include Cultural Heritage, Digital Humanities and ICT for Development, where he collaborates with domain experts in interdisciplinary teams. He is currently involved in the European projects InTaVia and InterConnect as well as the Dutch national projects Pressing Matter, Clariah and Hybrid Intelligence.
In this paper, we report on our proposed approach towards a standardized description for systems combining machine learning (ML) components with techniques developed by the Semantic Web (SW) community (SWeMLS), which is one of the lessons learned from our large-scale survey (476 papers) on the topic. We elaborate on the key information that should be described for a SWeMLS and on selected methods to support its documentation.
Weak supervision (WS) is an alternative to standard supervised learning that overcomes the need for manually annotated training data. WS uses heuristics, knowledge repositories, or high-level constraints to obtain (lower quality) labels more efficiently or at higher levels of abstraction. This talk will give an overview of applications of WS to information extraction from text. Potential challenges of these approaches, and algorithmic solutions for overcoming them, will be highlighted. The talk will conclude with a high-level characterization of the software framework Knodle for weakly supervised learning with arbitrary neural networks.
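To make the idea of heuristic labeling concrete, here is a minimal sketch in Python. It is illustrative only and does not reflect Knodle's actual API; the label set, keyword rules, and function names are invented:

```python
# Minimal weak-supervision sketch: keyword heuristics assign noisy
# labels without manual annotation. Labels and rules are invented.

SPOUSE, EMPLOYER, ABSTAIN = 0, 1, -1  # hypothetical label set

def lf_spouse_keywords(sentence: str) -> int:
    """Labeling heuristic: marriage-related keywords suggest SPOUSE."""
    words = ("married", "wife", "husband")
    return SPOUSE if any(w in sentence.lower() for w in words) else ABSTAIN

def lf_employer_keywords(sentence: str) -> int:
    """Labeling heuristic: employment phrases suggest EMPLOYER."""
    phrases = ("works at", "hired by", "employee of")
    return EMPLOYER if any(p in sentence.lower() for p in phrases) else ABSTAIN

def weak_labels(sentences):
    """Apply all labeling functions; a downstream denoising step
    (as in frameworks like Knodle) reconciles the noisy votes."""
    lfs = (lf_spouse_keywords, lf_employer_keywords)
    return [[lf(s) for lf in lfs] for s in sentences]

print(weak_labels(["Alice married Bob in 2001.",
                   "Carol works at ACME Corp."]))
# -> [[0, -1], [-1, 1]]
```

The heuristics produce lower-quality labels than manual annotation, but they scale to arbitrarily large corpora; the learning problem then shifts to denoising their conflicting votes.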
Benjamin Roth is a professor in the area of deep learning & statistical NLP, leading the WWTF Vienna Research Group for Young Investigators "Knowledge-Infused Deep Learning for Natural Language Processing". Prior to this, he was an interim professor at LMU Munich. He obtained his PhD from Saarland University and did a postdoc at UMass, Amherst. His research interests are the extraction of knowledge from text with statistical methods and knowledge-supervised learning.
AI approaches based on machine learning (ML) have become increasingly popular across all sectors. However, experience shows that AI initiatives often fail due to a lack of appropriate data or low data quality. Furthermore, state-of-the-art AI models are largely opaque and suffer from a lack of transparency and explainability. This means that even if the underlying mathematical principles of these methods are understood, it is often unclear why a particular prediction has been made and whether meaningful and grounded patterns have led to it. Thus, there is a risk that the AI learns biases from the data or makes its decisions based on wrong or ambiguous information.
Semantic AI approaches combine methodology from ML-based AI and symbolic AI based on semantic technologies such as knowledge graphs (KG) as well as natural language processing (NLP), while incorporating mechanisms for explainable AI (XAI). Semantic AI requires technical and organizational measures, which are implemented along the whole data lifecycle. KGs, being one of the core elements of symbolic AI, provide a human-understandable and machine-processable way to model and reason over complex relationships between entities of interest. They also provide means for more automated data quality management.
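As a minimal sketch of what "human-understandable and machine-processable" means in practice, the following snippet builds a tiny KG with the rdflib library and queries it with SPARQL; the entities and the ex: vocabulary are invented for illustration:

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")  # invented example vocabulary
g = Graph()

# Facts readable by humans and processable by machines alike.
g.add((EX.Insulin, RDF.type, EX.Drug))
g.add((EX.Insulin, EX.treats, EX.Diabetes))
g.add((EX.Diabetes, RDFS.subClassOf, EX.MetabolicDisease))

# SPARQL: which drugs treat a condition classified as metabolic disease?
query = """
SELECT ?drug ?disease WHERE {
    ?drug a ex:Drug ;
          ex:treats ?disease .
    ?disease rdfs:subClassOf ex:MetabolicDisease .
}
"""
for row in g.query(query, initNs={"ex": EX, "rdfs": RDFS}):
    print(f"{row.drug} treats {row.disease}")
```

The same triples can be rendered for human inspection, validated against a schema, or traversed by a reasoner, which is what makes KGs useful for automated data quality management.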
The interaction of ML-based and KG-based approaches can manifest in different forms, including the provision of rich background information for the input data by interlinking it with existing external KGs, using the real-world domain knowledge in the KG to guide the training process of ML models, and enhancing causal inference to verify and/or assist the prediction. This close interconnection is capable not only of increasing the prediction performance of the AI model, but also of improving its robustness. Furthermore, the interaction between ML and KG leads to more understandable and interpretable predictions: ML approaches can be exploited to symbolise their intrinsic knowledge by generating new entities and relations within the KG, thereby extending it, and the KG can be leveraged to generate human-understandable explanations for automatically produced predictions.
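The first of these forms, enriching input data with background knowledge, can be sketched as follows. The mini KG, entity names, and one-hot feature encoding are hypothetical stand-ins for a link to an external KG such as Wikidata:

```python
# Toy sketch: augment raw ML features with background knowledge
# looked up in a KG. The KG here is an in-memory stand-in.

KG = {  # entity -> linked facts (types, relations)
    "aspirin":  {"type:Drug", "interactsWith:warfarin"},
    "warfarin": {"type:Drug", "class:Anticoagulant"},
}
VOCAB = sorted({fact for facts in KG.values() for fact in facts})

def kg_features(entity):
    """One-hot encode the KG facts linked to an entity mention."""
    facts = KG.get(entity, set())
    return [1.0 if f in facts else 0.0 for f in VOCAB]

def enrich(raw_features, entity):
    """Concatenate raw features with symbolic background features,
    so the ML model can exploit domain knowledge from the KG."""
    return list(raw_features) + kg_features(entity)

print(enrich([0.3, 0.7], "aspirin"))
```

Because the appended dimensions correspond to named facts in the KG, any weight the model places on them can later be read back in domain terms, which is the bridge to the explainability mechanisms discussed next.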
While the neuro-symbolic nature of semantic AI systems supports the understandability and interpretability of the produced predictions, more focused approaches originating from the field of XAI enable truly explainable decisions. XAI focuses on integrating explainability mechanisms into complex black-box models, either post hoc (on already trained models) or during training (self-learned explainability). The integration and close interaction of these mechanisms with the neuro-symbolic system is intended to overcome existing limitations of state-of-the-art XAI methods and to provide explanations that are formulated directly in a domain-specific language.
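The post-hoc variant can be illustrated with a simple permutation-based attribution whose output is phrased in domain terms via a concept mapping; the model, features, and concept labels below are all hypothetical:

```python
import random

def model(x):
    """Stand-in for an already trained black-box model."""
    return 0.8 * x[0] + 0.1 * x[1] - 0.05 * x[2]

# Mapping from opaque feature indices to domain-language KG concepts.
CONCEPTS = {0: "blood glucose level", 1: "age", 2: "body weight"}

def permutation_importance(X, n_repeats=20, seed=0):
    """Post-hoc explanation: shuffle one feature at a time and measure
    the mean absolute change in the model output."""
    rng = random.Random(seed)
    base = [model(x) for x in X]
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            permuted = [x[:j] + [c] + x[j + 1:] for x, c in zip(X, col)]
            total += sum(abs(model(p) - b)
                         for p, b in zip(permuted, base)) / len(X)
        scores.append(total / n_repeats)
    return scores

random.seed(1)
X = [[random.random() for _ in range(3)] for _ in range(50)]
scores = permutation_importance(X)
top = max(range(len(scores)), key=lambda j: scores[j])
print(f"The prediction is driven mainly by: {CONCEPTS[top]}")
```

Phrasing the attribution through the concept mapping rather than through raw feature indices is what turns a generic importance score into the kind of domain-specific explanation envisioned above.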