Index Thomisticus Treebank Resources
The data for the Index Thomisticus Treebank and the Latin Dependency Treebank are available in the XML-based format licensed by the Prague Markup Language (PML). The PML files of the treebanks are organized by annotation layer and linked to each other through stand-off annotation: raw text (words and punctuation); morphological layer (lemmatization and morphological tagging); analytical layer (surface syntactic annotation); tectogrammatical layer (semantic and pragmatic annotation). The analytical layer of the Index Thomisticus Treebank is also available in CoNLL format (proper names are assigned the "NP" value in the MISC field). The currently distributed release of the Index Thomisticus Treebank (analytical layer) includes 447,306 nodes in 26,831 sentences. These are taken from Summa contra Gentiles (entire: 4 books) and from the concordances of the lemma forma in Summa contra Gentiles, Scriptum super Sententiis Magistri Petri Lombardi and Summa Theologiae (part).
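As a rough illustration of working with the CoNLL export, the Python sketch below counts sentences and nodes and lists the tokens flagged as proper names. It rests on assumptions that are not confirmed by this record: that the file uses ten tab-separated columns with MISC as the last one (as in CoNLL-U), that sentences are separated by blank lines, and that "NP" appears as one of the pipe-separated MISC entries. The function names and the two sample sentences are invented for the example and are not taken from the treebank.

```python
from typing import Iterator, List

Token = List[str]  # one row of the file, split into its columns


def read_sentences(text: str) -> Iterator[List[Token]]:
    """Yield sentences as lists of token rows (assumes blank-line separators)."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            # Skip comment lines; split token lines on tabs.
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def proper_names(sentence: List[Token]) -> List[str]:
    """Return the word forms whose MISC column (assumed index 9) contains NP."""
    return [
        tok[1]
        for tok in sentence
        if len(tok) >= 10 and "NP" in tok[9].split("|")
    ]


# Invented sample rows for illustration only; not actual treebank content.
SAMPLE = (
    "1\tPetrus\tPetrus\t_\t_\t_\t2\tSb\t_\tNP\n"
    "2\tcurrit\tcurro\t_\t_\t_\t0\tPred\t_\t_\n"
    "\n"
    "1\thomo\thomo\t_\t_\t_\t2\tSb\t_\t_\n"
    "2\tcurrit\tcurro\t_\t_\t_\t0\tPred\t_\t_\n"
)

sentences = list(read_sentences(SAMPLE))
node_count = sum(len(s) for s in sentences)
names = [name for s in sentences for name in proper_names(s)]
print(len(sentences), node_count, names)
```

To apply this to the distributed file, read its contents into a string and pass it to `read_sentences`; if the actual column layout or the encoding of the "NP" flag differs, the index `9` and the membership test would need adjusting.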
Università Cattolica del Sacro Cuore, Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione (CIRCSE).
Primary Subjects
Linguistics, Education, culture and sport, Science and technology, Index Thomisticus.
Digital Object
This record catalogues a digital scholarly object as an instance of the DCAT Dataset class. These objects represent collections of information that has already been structured in some way rather than raw, unstructured data. See the Documentation page for more information.