Jusletter IT

LEXIA: A Data Science Environment for Semantic Analysis of German Legal Texts

  • Authors: Bernhard Waltl / Florian Matthes / Tobias Waltl / Thomas Grass
  • Category: Articles
  • Region: Germany
  • Field of law: Legal Informatics
  • Collection: Conference Proceedings IRIS 2016
  • Citation: Bernhard Waltl / Florian Matthes / Tobias Waltl / Thomas Grass, LEXIA: A Data Science Environment for Semantic Analysis of German Legal Texts, in: Jusletter IT 25 February 2016
The analysis of legal data using information technology, more specifically text and data mining algorithms, has become very attractive in the field of legal informatics. In addition, legal science and practice involve data-, knowledge-, and time-intensive tasks, which have always been a focus of legal informatics. This paper contributes a data science environment that is particularly suited to legal texts, e.g. legislation and case law, but also contracts and patents. The environment consists of a reference architecture and a specific data model. Furthermore, it integrates an easily adaptable and extensible text mining engine that allows components to be reused. The baseline architecture for the text mining engine is Apache UIMA. The environment enables users to collaboratively specify linguistic and semantic structures; for this purpose, it uses an existing rule-based scripting language, namely Apache Ruta. This paper shows how the system can be used to unveil legal definitions in the German Civil Code (BGB), not only by finding them but also by determining which legal term is defined and how. This functionality gives structure to unstructured information, i.e., text, and thereby allows data scientists and legal experts to semantically investigate and explore legal texts.
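To illustrate what "finding a legal definition and determining which term is defined and how" can mean in practice, here is a minimal Python sketch. It is not the authors' implementation: LEXIA expresses such rules in Apache Ruta inside a UIMA pipeline, whereas this sketch swaps in plain regular expressions. The `Annotation` class is a hypothetical, simplified stand-in for UIMA's stand-off annotations (a type plus begin/end character offsets), and the two patterns (explicit "im Sinne des Gesetzes" definitions and parenthetical definitions) are assumptions chosen for illustration, modeled on §§ 90 and 121 BGB.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical, simplified stand-off annotation: a type name plus
    character offsets into the document text, loosely modeled on UIMA."""
    type: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


# Hypothetical rule 1: explicit definition, e.g. "X im Sinne des Gesetzes sind Y."
EXPLICIT = re.compile(
    r"(?P<term>[A-ZÄÖÜ]\w*) im Sinne (?:dieses|des) Gesetzes sind (?P<definiens>[^.]+)\."
)
# Hypothetical rule 2: parenthetical definition, e.g. "ohne schuldhaftes Zögern (unverzüglich)".
PARENTHETICAL = re.compile(r"\((?P<term>[A-Za-zÄÖÜäöüß]+)\)")


def find_definitions(text: str) -> list[Annotation]:
    """Return Definition annotations recording which term is defined and how."""
    found = []
    for m in EXPLICIT.finditer(text):
        found.append(Annotation("Definition", m.start(), m.end(), {
            "term": m.group("term"),
            "kind": "explicit",
            "definiens": m.group("definiens"),
        }))
    for m in PARENTHETICAL.finditer(text):
        found.append(Annotation("Definition", m.start(), m.end(), {
            "term": m.group("term"),
            "kind": "parenthetical",
        }))
    # Keep annotations in document order (ascending begin offset).
    return sorted(found, key=lambda a: a.begin)


if __name__ == "__main__":
    sample = ("Sachen im Sinne des Gesetzes sind nur körperliche Gegenstände. "
              "Die Anfechtung muss ohne schuldhaftes Zögern (unverzüglich) erfolgen.")
    for a in find_definitions(sample):
        print(a.features["term"], a.features["kind"], sample[a.begin:a.end])
```

Recording the definition type as a feature, rather than only marking the matched span, is what lets later steps ask not just where a definition occurs but which term it defines and by which linguistic pattern. A rule language such as Ruta serves the same purpose while letting legal experts edit the rules without touching pipeline code.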

Table of contents

  • 1. Introduction
  • 2. Research method and objectives: The legal domain – a challenge for data science
  • 3. Related work
  • 4. Reference architecture and data model
  • 4.1. Reference architecture for the data science environment
  • 4.2. Data Model, Data Storage and Access
  • 4.3. Text Mining Engine
  • 4.4. Importer and Exporter
  • 5. Unveiling semantic structures in laws
  • 5.1. Determining legal definitions in legal texts using Apache UIMA and Apache Ruta
  • 5.2. UIMA pipeline for semantic annotation of legal texts
  • 5.3. Accessing results and annotations through the user interface
  • 6. Conclusion, outlook, and future applications
  • 7. Acknowledgement
  • 8. Bibliography
