Over the past decade, many organizations have invested heavily in data warehouses, data lakes and data lakehouses with the goal of integrating and accessing data at scale. Despite these new technical capabilities and strong leadership support, many organizations still struggle to understand their data, to extract meaningful insights from it, or to achieve economies of scale and alignment across their multiple data initiatives. As a result, many of these new projects fail to demonstrate sufficient business value.
To develop scientific and patent text mining tools for students, researchers, and patent experts, we need to understand their daily work as well as the data they work with. That data includes scientific literature, technical documents, and patent documents, and working with it requires a good understanding of the linguistic characteristics of each of these text genres. We believe scientific literature, technology, and data should be findable by everyone, not just by those who know where to look and how to search.