Jusletter IT

Using transformers to multi-label long legal documents

  • Authors: Hanane El Aajati / Roderick Lucas / Radboud Winkels
  • Article type: AI & Law
  • Region: Netherlands
  • Legal fields: AI & Law
  • Collection: IRIS 2023 conference proceedings
  • DOI: 10.38023/673190ef-90b9-4627-bfa0-07245f33d20d
  • Suggested citation: Hanane el Aajati / Roderick Lucas / Radboud Winkels, Using transformers to multi-label long legal documents, in: Jusletter IT 30 March 2023
This paper describes experiments to find an efficient method for multi-labeling official legal documents using Transformers. Transformers are popular and powerful machine learning models because they are pre-trained on large amounts of data. A significant drawback is that most Transformers cannot process documents longer than 512 tokens. We address this issue by proposing a new method for multi-labeling long documents: summarize each long document first, then multi-label the summary with the bert-base-cased Transformer. This summarization method is compared with two existing methods: truncating documents after the first 512 tokens, and Longformer. The methods are evaluated on F1 score, and the results show that Longformer performs best. The summarization and truncation methods yield almost equal F1 scores. Although the summarization method did not perform well on the dataset used in this research, it could be promising for datasets with more structured documents.
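
To make the truncation baseline and the evaluation metric concrete, here is a minimal sketch in plain Python. It is not the authors' code. The function names `truncate` and `micro_f1`, the toy labels, and the whitespace tokenization are all hypothetical. The abstract does not say which F1 averaging was used, so micro-averaging is an assumption. A real BERT tokenizer produces WordPiece tokens and also reserves positions for special tokens within the 512-token limit.

```python
from typing import List, Set

MAX_TOKENS = 512  # input limit named in the abstract


def truncate(tokens: List[str], limit: int = MAX_TOKENS) -> List[str]:
    """Truncation baseline: keep only the first `limit` tokens."""
    return tokens[:limit]


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (document, label) decisions.

    Assumption: micro-averaging; the source only says "F1 score".
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: whitespace tokens stand in for WordPiece tokens.
doc = ("word " * 600).split()
short = truncate(doc)  # everything after token 512 is discarded

# Each document carries a set of labels (multi-label setting).
gold = [{"tax", "housing"}, {"health"}]
pred = [{"tax"}, {"health", "housing"}]
score = micro_f1(gold, pred)
```

The sketch shows why truncation can hurt on long documents: any label whose evidence appears only after token 512 is invisible to the classifier. Summarization and Longformer are two different ways of getting that later content into the model.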

Table of contents

  • 1. Introduction
  • 2. Background
  • 2.1. BERT
  • 2.2. Longformer
  • 2.3. Summarization
  • 2.4. The Data
  • 3. The Experiment
  • 4. Results
  • 5. Conclusions and Future Work
  • 6. References
