Index Thomisticus Treebank Resources
Description
The data for the Index Thomisticus Treebank and the Latin Dependency Treebank are available in the XML-based format defined by the Prague Markup Language (PML). The PML files of the treebanks are organized by annotation layer and linked to each other through stand-off annotation: raw text (words and punctuation marks); morphological layer (lemmatization and morphological tagging); analytical layer (surface syntactic annotation); tectogrammatical layer (semantic and pragmatic annotation). The analytical layer of the Index Thomisticus Treebank is also available in CoNLL format (proper names are assigned the "NP" value in the MISC field). The currently distributed release of the Index Thomisticus Treebank (analytical layer) includes 447,306 nodes in 26,831 sentences. These are taken from the Summa contra Gentiles (complete: 4 books) and from the concordances of the lemma forma in the Summa contra Gentiles, the Scriptum super Sententiis Magistri Petri Lombardi and the Summa Theologiae (in part).
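As an illustration of the "NP" marker in the MISC field, the following minimal Python sketch reads a CoNLL-style file and lists the tokens flagged as proper names. It rests on assumptions not confirmed by this record: that the files are tab-separated with ten columns as in CoNLL-U, that MISC is the last column, and that multiple MISC values (if any) are separated by "|". The function names and the two sample rows are invented for the example and are not taken from the treebank.

```python
from typing import Iterable, Iterator, List

Token = List[str]  # one tab-separated row of a CoNLL-style file


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of token rows; blank lines separate sentences."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # skip comment lines, if present
            continue
        sentence.append(line.split("\t"))
    if sentence:  # file may not end with a blank line
        yield sentence


def proper_names(sentence: List[Token], misc_index: int = 9,
                 marker: str = "NP") -> List[str]:
    """Return the word forms whose MISC column carries the marker.

    Assumption: column 2 (index 1) is the word form and column 10
    (index 9) is MISC, as in CoNLL-U.
    """
    return [
        tok[1]
        for tok in sentence
        if len(tok) > misc_index and marker in tok[misc_index].split("|")
    ]


# Invented sample rows, for illustration only (not real treebank data).
SAMPLE_ROWS = [
    ["1", "Aristoteles", "aristoteles", "_", "_", "_", "2", "_", "_", "NP"],
    ["2", "dicit", "dico", "_", "_", "_", "0", "_", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n\n"

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines()):
        print(proper_names(sent))
```

To run it on a real file from the distributed archive, pass an open file object to `read_sentences` instead of the sample, and adjust `misc_index` if the column layout differs from the one assumed here.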
Downloads
https://itreebank.marginalia.it/doc/10-06-2020_all_resources_all_formats.zip
Publisher
Università Cattolica del Sacro Cuore, Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione (CIRCSE).
Type
Primary Subjects
Linguistics, Education, culture and sport, Science and technology, Index Thomisticus.
Bibliographic References
Landing Page
Data & Metadata Languages
License
Digital Object
This record catalogues a digital scholarly object as an instance of the DCAT Dataset class. These objects do not include raw data but rather collections of information that has already been structured in some way. See the Documentation page for more information.
Permalink
http://purl.org/knot/data/thomisticus-resources