Unleash the Triple: Leveraging a corporate discovery interface. The OECD case

Industry

“O.N.E Sight”, a fully semantic reading assistant developed at the OECD, relies on a semantic layer to enable concept searching. It crawls millions of resources that are consistently tagged to enable the discovery of new facts. The use of corporate taxonomies and ontologies allows you to search in your preferred language and vocabulary without having to worry about the language the information is written in or the words contained in the content. Search in English, find results in French. Search for articles on agriculture in Australia and be prompted with information on coastal fishery in Sydney. The semantic layer also allows discovery algorithms to expand your search with ideas and suggestions, so that you can search for one thing and find something else of interest that you were not aware of. Search for agriculture in Australia and be prompted to discover content on agritourism in New South Wales.
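As a rough illustration of concept searching, the following Python sketch resolves a search term to a concept through its labels in any language, expands that concept to its narrower concepts, and returns documents tagged with any of them. It is a minimal sketch under stated assumptions, not the O.N.E Sight implementation: the triples, the predicate names (`label@en`, `label@fr`, `broader`), the concept identifiers and the document tags are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; not actual OECD data.
TRIPLES = [
    ("c:agriculture", "label@en", "agriculture"),
    ("c:agriculture", "label@fr", "agriculture"),
    ("c:agritourism", "label@en", "agritourism"),
    ("c:agritourism", "label@fr", "agritourisme"),
    ("c:fishery", "label@en", "coastal fishery"),
    ("c:fishery", "label@fr", "pêche côtière"),
    ("c:agritourism", "broader", "c:agriculture"),
    ("c:fishery", "broader", "c:agriculture"),
]

# Hypothetical documents, tagged with concept identifiers rather than words.
DOCS = {
    "doc-en-1": {"lang": "en", "tags": {"c:agriculture"}},
    "doc-fr-1": {"lang": "fr", "tags": {"c:agritourism"}},
    "doc-fr-2": {"lang": "fr", "tags": {"c:fishery"}},
    "doc-en-2": {"lang": "en", "tags": {"c:taxation"}},
}


def concepts_for(term):
    """Return the concepts whose label, in any language, matches the term."""
    term = term.casefold()
    return {
        s for s, p, o in TRIPLES
        if p.startswith("label@") and o.casefold() == term
    }


def narrower(concept):
    """Return the concept together with all of its transitive narrower concepts."""
    children = defaultdict(set)
    for s, p, o in TRIPLES:
        if p == "broader":
            children[o].add(s)
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def search(term):
    """Return ids of documents tagged with the term's concept or a narrower one."""
    wanted = set()
    for concept in concepts_for(term):
        wanted |= narrower(concept)
    return sorted(doc for doc, meta in DOCS.items() if meta["tags"] & wanted)
```

In this toy data, searching for "agriculture" also returns the French documents tagged with agritourism and coastal fishery, because matching happens on concept identifiers rather than on the words or language of the text.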

The talk and demonstration will highlight the development, at the Organisation for Economic Co-operation and Development, of “O.N.E Sight”, a fully semantic reading assistant that unleashes the power of triples. It is the result of three years of capacity building, development and cross-functional teamwork.
Analysts use “O.N.E Sight” to assist with the drafting of reports. They identify, view in context and extract relevant knowledge, regardless of language, from large volumes of structured and unstructured information drawn from internal or external sources. Content is enriched at fragment level by semantic services based on organisation-wide or domain-specific ontologies and on specific linguistic rules used to identify multilingual knowledge contained within texts. Analysts have reported time savings and the discovery of sources they would not otherwise have known of.
We will outline the project approach, the learning curve the team went through, and the intellectual and technical challenges faced. The issues that had to be addressed included new ways of handling information, silos, traditional text indexation, lack of text fragmentation and semantic links, reconciliation of semantic and textual searches, and representation issues, among others.
We will describe the long march towards semantic annotation and the emphasis placed on the quality of the tagging. This will include: i) development, maintenance and use of the OECD central taxonomies and ontologies in the semantic analysis tools, ii) hazards of semantics (fuzziness, context, acronyms and disambiguation), iii) creation of golden corpora, annotation quality testing and multi-view annotation graphs, and iv) development of tools to identify ‘knowledge nuggets’, such as socio-economic indicators, by tagging semantic relationships within texts. We will also describe the methodology used to develop these quality tagging applications, which consistently return high precision and recall (around 95%), making the tags reliable enough for use in a production environment.
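For readers less familiar with these measures, the sketch below shows one common way to compute precision and recall by comparing tagger output with a golden corpus. It is an illustrative sketch, not the OECD methodology: the annotation format (document id, start offset, end offset, concept) and the exact-match rule are assumptions, and the sample data is invented.

```python
def precision_recall(gold, predicted):
    """Compare two collections of annotations using exact matching.

    Each annotation is assumed to be a hashable tuple such as
    (doc_id, start_offset, end_offset, concept_id).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Precision: share of predicted tags that are also in the golden corpus.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of golden-corpus tags that the tagger found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented sample data for illustration only.
gold_tags = [
    ("doc1", 0, 11, "c:agriculture"),
    ("doc1", 25, 34, "c:australia"),
    ("doc2", 5, 17, "c:agritourism"),
    ("doc2", 40, 43, "c:gdp"),
]
tagger_output = [
    ("doc1", 0, 11, "c:agriculture"),
    ("doc1", 25, 34, "c:australia"),
    ("doc2", 5, 17, "c:agritourism"),
    ("doc2", 60, 66, "c:fishery"),  # spurious tag, absent from the golden corpus
]

precision, recall = precision_recall(gold_tags, tagger_output)
```

With this sample, three of the four predicted tags are correct and three of the four golden tags are found, so both measures come out at 0.75. A production evaluation would probably also need a policy for partially overlapping spans, which this sketch ignores.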

This discovery framework relies on the combined use of RDF triples and XML in a MarkLogic environment. It takes a full metadata-based approach that enables atomisation for searching, contextualisation of fragmented resources for viewing, and robust reasoning, with acceptable response times for rendering.
“O.N.E Sight” was developed by the Knowledge Management team in close collaboration with the subject matter experts and stakeholders who directly benefit from the application. The application is tailored to their needs and incorporates their feedback in an Agile way.
Future developments include adding new resources and semantic tags, building a mobile-friendly version and developing externally facing OECD semantic applications based on “O.N.E Sight”.

Speakers: