Index Thomisticus Treebank Resources

Description

The data for the Index Thomisticus Treebank and the Latin Dependency Treebank are available in the XML-based Prague Markup Language (PML) format. The PML files of the treebanks are organized by annotation layer, and the layers are linked to each other through stand-off annotation: raw text (words and punctuation); morphological layer (lemmatization and morphological tagging); analytical layer (surface syntactic annotation); tectogrammatical layer (semantic and pragmatic annotation). The analytical layer of the Index Thomisticus Treebank is also available in CoNLL format (proper names are assigned the "NP" value in the MISC field). The currently distributed release of the Index Thomisticus Treebank (analytical layer) includes 447,306 nodes in 26,831 sentences. These are taken from the Summa contra Gentiles (complete, all 4 books) and from the concordances of the lemma forma in the Summa contra Gentiles, the Scriptum super Sententiis Magistri Petri Lombardi and the Summa Theologiae (part).
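As a rough illustration of how the CoNLL export might be inspected, the following Python sketch counts sentences, nodes and proper names. It rests on assumptions not stated in this record: that the file uses a CoNLL-U-style layout (tab-separated columns, one token per line, blank lines between sentences, optional "#" comment lines), that MISC is the last column, and that "NP" appears there either alone or as one of several "|"-separated values. The function names and the sample rows are invented for the example and are not taken from the treebank.

```python
def read_sentences(lines):
    """Yield sentences as lists of token rows (each row is a list of columns)."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Skip comment lines (assumed CoNLL-U-style convention).
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def summarize(lines):
    """Count sentences, nodes, and nodes flagged "NP" in the last (MISC) column."""
    n_sentences = n_nodes = n_proper = 0
    for sentence in read_sentences(lines):
        n_sentences += 1
        n_nodes += len(sentence)
        # Assumption: MISC is the last column; "NP" may be one of several
        # "|"-separated values.
        n_proper += sum(1 for row in sentence if "NP" in row[-1].split("|"))
    return {"sentences": n_sentences, "nodes": n_nodes, "proper_names": n_proper}


# Invented toy rows for illustration only; not actual treebank data.
sample = [
    "1\tThomas\tThomas\t_\t_\t_\t2\t_\t_\tNP",
    "2\tdocet\tdoceo\t_\t_\t_\t0\t_\t_\t_",
    "3\t.\t.\t_\t_\t_\t2\t_\t_\t_",
    "",
    "1\tDeus\tdeus\t_\t_\t_\t2\t_\t_\t_",
    "2\test\tsum\t_\t_\t_\t0\t_\t_\t_",
]

print(summarize(sample))
```

To run the same count on a downloaded file, an open file handle can be passed in place of the sample list, for example summarize(open(path, encoding="utf-8")), where path is whichever CoNLL file was extracted from the archive. If the real files use a different column layout, the index of the MISC column would need adjusting.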

Downloads

https://itreebank.marginalia.it/doc/10-06-2020_all_resources_all_formats.zip

Publisher

Università Cattolica del Sacro Cuore, Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione (CIRCSE).

Type

Dataset

Primary Subjects

Linguistics, Education, culture and sport, Science and technology, Index Thomisticus.

Bibliographic References

https://itreebank.marginalia.it/view/publications.php

Landing Page

https://itreebank.marginalia.it/view/download.php

Data & Metadata Languages

English, Latin.

License

CC BY-NC-SA 3.0.

Digital Object

This record catalogues a digital scholarly object as an instance of the DCAT Dataset class. These objects contain not raw data but collections of information that has already been structured in some way. See the Documentation page for more information.

Permalink

http://purl.org/knot/data/thomisticus-resources