Jusletter IT

LEXIA: A Data Science Environment for Semantic Analysis of German Legal Texts (Podcast)

  • Author: Bernhard Waltl
  • Category: Podcasts
  • Region: Germany
  • Field of law: Legal Informatics
  • Citation: Bernhard Waltl, LEXIA: A Data Science Environment for Semantic Analysis of German Legal Texts (Podcast), in: Jusletter IT Flash 7. Dezember 2016
The analysis of legal data with information technology, in particular text and data mining algorithms, has attracted growing interest in the field of legal informatics. Legal science and practice involve data-, knowledge-, and time-intensive tasks, which have always been a focus of legal informatics. This paper contributes a data science environment that is particularly suited to legal texts, e.g. documents from legislation and case law, but also contracts and patents. The environment consists of a reference architecture and a specific data model. It also integrates an easily adaptable and extensible text mining engine that allows components to be reused. The baseline architecture for the text mining engine is Apache UIMA. The environment enables users to collaboratively specify linguistic and semantic structures, using an existing rule-based scripting language, namely Apache Ruta. This paper shows how the system can be used to uncover legal definitions in the German Civil Code (BGB): it not only finds them but also determines which legal term is defined and how. This functionality turns unstructured information, i.e. text, into structured information, which lets data scientists and legal experts semantically investigate and explore legal texts.
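The idea of detecting a legal definition and identifying both the defined term and the defining phrase can be illustrated with a minimal sketch. It is not the LEXIA implementation and does not use Apache UIMA or Apache Ruta; it substitutes a single Python regular expression for a rule. It assumes one common drafting pattern in German statutes, in which the defined term follows the defining phrase in parentheses. The names `PAREN_DEFINITION`, `Definition`, `find_definitions`, and `SAMPLE` are hypothetical, and the sample sentence is an illustrative sentence in the style of the BGB, not a verbatim quotation.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: a defining phrase followed by the defined term in
# parentheses. Digits are excluded from the term so that paragraph
# numbers such as "(1)" are not mistaken for defined terms.
PAREN_DEFINITION = re.compile(
    r"(?P<definiens>[^.;()]+?)\s*\((?P<term>[^()\d]+)\)"
)


@dataclass
class Definition:
    term: str       # the legal term being defined
    definiens: str  # the text preceding the parenthesis in the same clause


def find_definitions(text: str) -> list[Definition]:
    """Return every parenthetical definition candidate found in `text`."""
    return [
        Definition(
            term=match.group("term").strip(),
            definiens=match.group("definiens").strip(),
        )
        for match in PAREN_DEFINITION.finditer(text)
    ]


# Illustrative sentence (assumption, not a verbatim statute quotation).
SAMPLE = "Die Erklärung muss ohne schuldhaftes Zögern (unverzüglich) erfolgen."

if __name__ == "__main__":
    for definition in find_definitions(SAMPLE):
        print(f"{definition.term!r} <- {definition.definiens!r}")
```

A pattern this simple captures the whole clause before the parenthesis rather than only the defining phrase, and it would also match parenthetical remarks that are not definitions. Narrowing the match would require additional linguistic annotations, which is the kind of refinement a rule language operating on an annotation pipeline is meant to support.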
