Learning ontology classes from text by clustering lexical substitutes derived from language models

Many tools for knowledge management and the Semantic Web presuppose an arrangement of instances into classes, i.e., an ontology. Creating such an ontology, however, is a labor-intensive task. In this paper, we present an unsupervised method for learning an ontology from text. We rely on pre-trained language models to generate lexical substitutes for given entities and then use matrix factorization to induce new classes and their entities. Our method differs from previous approaches in that (1) it captures the polysemy of entities; (2) it produces interpretable labels for the induced classes; (3) it does not require any particular structure of the text; and (4) it requires no re-training. We evaluate our method on the German and English WikiNER corpora and demonstrate improvements over state-of-the-art approaches.
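The two-step idea (substitutes first, factorization second) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the entity names, the substitute vocabulary, and the counts are hand-made toy values rather than language-model output, and non-negative matrix factorization with multiplicative updates is only one plausible choice, since the abstract does not say which factorization is used. The helper names `nmf`, `top_labels`, and `classes_of` are hypothetical.

```python
import random

# Toy data (assumed, not from the paper): rows are entities, columns are
# lexical substitutes, cells count how often a substitute was proposed.
ENTITIES = ["Paris", "Berlin", "Merkel", "Obama", "Washington"]
SUBSTITUTES = ["city", "capital", "town", "politician", "president", "chancellor"]
COUNTS = [
    [5, 4, 3, 0, 0, 0],
    [4, 5, 2, 0, 0, 0],
    [0, 0, 0, 5, 2, 4],
    [0, 0, 0, 4, 5, 1],
    [3, 2, 1, 2, 3, 0],  # "Washington" is used both as a place and as a person
]

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x k) times H (k x m)
    with multiplicative updates for the squared reconstruction error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), computed from the old H
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), computed from the old W
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        W = [[W[i][a] * VHt[i][a] / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(n)]
    # Rescale so each row of H sums to 1; W then holds comparable class masses.
    for a in range(k):
        s = sum(H[a]) + eps
        H[a] = [h / s for h in H[a]]
        for i in range(n):
            W[i][a] *= s
    return W, H

def top_labels(H, vocab, cls, n=2):
    """Return the n highest-weighted substitutes of a class as its label."""
    ranked = sorted(range(len(vocab)), key=lambda j: H[cls][j], reverse=True)
    return [vocab[j] for j in ranked[:n]]

def classes_of(row, min_share=0.2):
    """Return every class holding at least min_share of an entity's total weight."""
    total = sum(row)
    return [a for a, w in enumerate(row) if w / total >= min_share]

W, H = nmf(COUNTS, k=2)
for entity, row in zip(ENTITIES, W):
    print(entity, [top_labels(H, SUBSTITUTES, a) for a in classes_of(row)])
```

In this sketch the rows of `H` serve as class labels, because their largest entries are ordinary words, and an entity may belong to more than one class whenever its row in `W` spreads weight across several columns. The `min_share` threshold of 0.2 is an arbitrary illustrative value.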

