Let us introduce another leading expert in semantic technologies. Stephen Buxton is director of product management at MarkLogic, a pioneer in non-relational enterprise databases. How companies can profit from Big Data will be an important question at SEMANTiCS 2015, and NoSQL databases are essential to putting that data to work. Join Stephen at his conference talk.
This interview introduces you to the world of databases built for semantic solutions.
Slowly but surely, a paradigm shift is happening in enterprise databases: from relational to non-relational. The driver is the growing importance of Big Data. Could you explain the difference between the two approaches to data processing?
Stephen Buxton: We are entering an ‘any structure’ data era. With growing volumes of multi-structured data flooding into our datacentres, organisations are finding that the relational databases they have relied on for the last 30 years are too limiting and inflexible. An incredible 88 per cent of data globally is unstructured and therefore unsuited to relational databases, and IT departments that rely on relational databases are also struggling with some of the remaining 12 per cent, which is structured but heterogeneous. Relational databases start to lose their lustre when you need to dig deep inside the data to understand context, analyse details, and assemble customer reports and views. This is because the relational approach requires data to be formatted to fit into rows and columns. Whether it’s a customer, a derivative trade or a legal document, it has to be shoehorned into a shape that the underlying relational database system can represent. This can work, but it has clear limitations. For example, it assumes that organisations know up front the questions they will want to ask of their data down the line, which in many cases is simply unworkable.
Enterprise NoSQL databases offer a faster and more efficient way to manage data in a variety of formats: they don’t need rigid schemas or formats to be predefined before data is loaded. Unstructured data, or any data without a predefined data model, including social media, health records, financial documents, journals, videos and web pages, can be loaded into a NoSQL platform as is. And with some of today’s Enterprise NoSQL offerings, organisations do not have to give up ACID transactions or other enterprise qualities such as fine-grained entitlements, point-in-time recovery and high availability.
Semantic features are very important in MarkLogic’s offering. Could you describe your semantic components? Where are you heading with semantics in your product development?
Stephen Buxton: MarkLogic has dramatically transformed the semantics landscape. By offering enterprise features and a mix of documents, values and triples, we are enabling large organisations in sectors such as government, media, publishing, entertainment, financial services and pharmaceuticals to re-evaluate semantics and deploy our platform to great effect.
MarkLogic Semantics lets you store and manage facts and relationships as RDF triples and query them with the standard SPARQL query language. With Automatic Inference you can discover new facts and relationships within billions of triples. You can also simplify your data model: there’s no need to model every possible relationship, because you can leave it to the engine to infer them.
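The combination of asserted and inferred triples can be illustrated with a small in-memory model. This is a minimal pure-Python sketch, not MarkLogic’s API or its inference engine: it assumes a toy set of triples with hypothetical `ex:` names and applies two RDFS-style rules (subclass transitivity and type propagation) by forward chaining until nothing new is derived. Only the direct facts are asserted; the broader ones are inferred, which is the data-model simplification described above.

```python
from itertools import product

# Standard RDF/RDFS predicate names, written as plain strings.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Forward-chain two RDFS-style rules until no new triples appear:
    1. (A subClassOf B) and (B subClassOf C) imply (A subClassOf C)
    2. (x type A) and (A subClassOf B) imply (x type B)
    """
    facts = set(triples)
    while True:
        sub = [(s, o) for (s, p, o) in facts if p == SUBCLASS]
        new = set()
        # Rule 1: subclass transitivity.
        for (a, b), (b2, c) in product(sub, sub):
            if b == b2:
                new.add((a, SUBCLASS, c))
        # Rule 2: propagate rdf:type up the class hierarchy.
        for (x, p, a) in facts:
            if p == TYPE:
                for (a2, b) in sub:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts |= new


def match(facts, s=None, p=None, o=None):
    """Return triples matching one pattern; None acts as a wildcard,
    roughly like a single triple pattern in a SPARQL WHERE clause."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical data: only the direct facts are asserted.
asserted = {
    ("ex:Aphid", SUBCLASS, "ex:Insect"),
    ("ex:Insect", SUBCLASS, "ex:Pest"),
    ("ex:aphid42", TYPE, "ex:Aphid"),
}

facts = infer(asserted)
print(match(facts, s="ex:aphid42", p=TYPE))
# [('ex:aphid42', 'rdf:type', 'ex:Aphid'), ('ex:aphid42', 'rdf:type', 'ex:Insect'), ('ex:aphid42', 'rdf:type', 'ex:Pest')]
```

Forward chaining like this materialises every inferred triple up front; a production engine may instead apply rules at query time, trading storage for query cost.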
MarkLogic offers the only Enterprise NoSQL database platform that allows all types of data (e.g. documents, values and triples), along with their respective indexes, to sit in the same database so they can be loaded, updated and queried together. More than that, MarkLogic lets you intertwingle those data models, so you can do things like embed triples in a document or link to documents from triples. Combining triples with other data types enables richer applications than ever before, and this is being used in risk management, decision support, knowledge management, reference data management and many other use cases. It ultimately leads to faster, better decisions, and to increased revenues.
For the future, we’re looking at three broad areas to continue supporting our customers and giving them a robust experience with their data. First, Big Data just keeps getting bigger and more complex, and MarkLogic will stay ahead by offering more performance and scale with each release. Second, we’re looking at other interesting combinations of documents, values and triples; examples include bitemporal SPARQL for reference data and provenance tracking, and Geospatial SPARQL for rich location queries. Third, we’re building out the ecosystem around MarkLogic’s core capabilities with close partnerships and tooling, making it even easier to manage and integrate all types of data so customers can quickly build end-to-end solutions.
Please share your semantics project insights with us. How do your customers profit from implementing MarkLogic? What changes for them?
Stephen Buxton: BSI, the UK’s national standards body and a developer of national and international standards, has been a MarkLogic customer for over five years. Last year the organisation incorporated MarkLogic’s semantics module into its new Compliance Navigator application. This application, aimed at the medical devices market, helps BSI’s customers bring devices to market safely and effectively by discovering which standards and regulations apply to their particular products. It allows them to interpret and monitor changes to those documents throughout their life cycle and supports them in achieving compliance. In this scenario, medical device customers don’t always know what they should be searching for: each document type is available from different providers in varied formats, and limited searchability makes effective discovery a challenge. To resolve this, BSI turned to MarkLogic, applying an industry-recognised taxonomic classification and semantic enrichment across the content set to enable guided search. MarkLogic has helped BSI take real steps towards the organisation’s vision of supporting businesses in achieving their goals and leading the way in standards. With Compliance Navigator, powered by MarkLogic, BSI has a product of substantially different richness and depth, offering clients unparalleled discoverability, far beyond that of any existing service.
CABI, the not-for-profit intergovernmental organisation focused on improving people’s lives and livelihoods by solving problems in agriculture and the environment, turned to MarkLogic in 2015, impressed by the flexibility, scalability and agility of the software. The organisation’s relational database was too rigid for what the CABI team wants to do next: for example, testing hypotheses and providing early warning tools for the PlantWise programme, a vehicle for collecting, aggregating and publishing agricultural data. CABI cannot predict now which questions users might want to ask of this data in the future, so it needs to be able to throw its multi-format unstructured data (images, geospatial data and so on) into a bucket, along with ontologies, and then ask questions of the data as they arise. Before choosing MarkLogic, CABI was shown a proof of concept that demonstrated the art of the possible. The CABI team realised that Enterprise NoSQL gave them the tools to be innovative, and that the possibilities are endless. Looking ahead, CABI is very excited by the prospect of doing predictive analysis on the spread of particular crop diseases or on the impact of invasive species. The team has carried out some early investigations into how it can use semantics to achieve this; for example, if pest A attacks crop B in country C, what is the likelihood of it attacking crop D in country E, which has the same climate and soil types as country C?
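CABI’s example question maps naturally onto a graph-pattern query over triples. The sketch below is a hypothetical illustration in plain Python, not CABI’s or MarkLogic’s implementation: it assumes invented predicates (`ex:attacks`, `ex:observedIn`, `ex:grownIn`, `ex:climate`, `ex:soil`) and toy data, and it only enumerates candidate (crop, country) pairs by joining on climate and soil. Estimating an actual likelihood would need a statistical model on top of this.

```python
# All ex: names and predicates are hypothetical, invented for this sketch.
FACTS = {
    ("ex:pestA", "ex:attacks", "ex:cropB"),
    ("ex:pestA", "ex:observedIn", "ex:countryC"),
    ("ex:cropB", "ex:grownIn", "ex:countryC"),
    ("ex:countryC", "ex:climate", "ex:humidTropical"),
    ("ex:countryC", "ex:soil", "ex:clayLoam"),
    ("ex:countryE", "ex:climate", "ex:humidTropical"),
    ("ex:countryE", "ex:soil", "ex:clayLoam"),
    ("ex:cropD", "ex:grownIn", "ex:countryE"),
    ("ex:countryF", "ex:climate", "ex:arid"),
    ("ex:countryF", "ex:soil", "ex:sandy"),
    ("ex:cropD", "ex:grownIn", "ex:countryF"),
}


def objects(facts, s, p):
    """All o such that (s, p, o) is in facts."""
    return {o for (s2, p2, o) in facts if s2 == s and p2 == p}


def subjects(facts, p, o):
    """All s such that (s, p, o) is in facts."""
    return {s for (s, p2, o2) in facts if p2 == p and o2 == o}


def spread_candidates(facts, pest):
    """List (crop, country) pairs where the pest has not been observed but
    the country's climate and soil match a country where the pest attacks
    a locally grown crop. Enumerates candidates only; no likelihood."""
    seen_in = objects(facts, pest, "ex:observedIn")
    attacked = objects(facts, pest, "ex:attacks")
    countries = {s for (s, p, _) in facts if p == "ex:climate"}
    results = set()
    for home in seen_in:
        # "Pest A attacks crop B in country C": an attacked crop is grown here.
        if not attacked & subjects(facts, "ex:grownIn", home):
            continue
        climate = objects(facts, home, "ex:climate")
        soil = objects(facts, home, "ex:soil")
        for country in countries - seen_in:
            if (objects(facts, country, "ex:climate") == climate
                    and objects(facts, country, "ex:soil") == soil):
                for crop in subjects(facts, "ex:grownIn", country):
                    results.add((crop, country))
    return sorted(results)


print(spread_candidates(FACTS, "ex:pestA"))
# [('ex:cropD', 'ex:countryE')]
```

In a triple store, the same joins would typically be written as one SPARQL query with shared variables; the Python loops stand in for that only to keep the example self-contained.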
Thanks for sharing your insights with us.