De Gasperi's Corpus
Description
A collection of Alcide De Gasperi's public documents with gold and silver annotation.
The corpus of Alcide De Gasperi's public documents is a collection of 2,762 documents issued between 1901 and 1954. They were previously published in four volumes by Il Mulino but were not machine-readable. Our repository contains all documents in three formats: txt, XML and tab-separated. Raw txt files contain only the body of the documents and can be used directly to extract embeddings or topics. XML files include metadata that cover not only the title, the date and the place of publication, but also key concepts automatically extracted from each text (with the corresponding relevance score) and genre labels manually assigned by domain experts. Furthermore, the release includes silver annotation for lemma, part of speech, person names and place names with associated coordinates in a CoNLL-like format.
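As an illustration of how the CoNLL-like silver annotation might be consumed, the following is a minimal Python sketch. It assumes one token per line with tab-separated fields, blank lines between sentences, and `#` for comment lines. The column names in `ASSUMED_COLUMNS` and the function name `read_conll_like` are hypothetical; the actual column order and count should be checked against the released files before use.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

# Hypothetical column layout (an assumption, not taken from the release):
# adjust the names and order after inspecting a real file.
ASSUMED_COLUMNS = ("token", "lemma", "pos", "entity")


def read_conll_like(
    lines: Iterable[str],
    columns: Sequence[str] = ASSUMED_COLUMNS,
) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of {column name: value} dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Assumed convention: a blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Assumed convention: comment or metadata line, skipped.
            continue
        fields = line.split("\t")
        # zip() pairs names with values and silently drops any extra fields.
        sentence.append(dict(zip(columns, fields)))
    if sentence:
        yield sentence
```

Because the function accepts any iterable of lines, it can be fed an open file handle, for example `with open(path, encoding="utf-8") as handle: sentences = list(read_conll_like(handle))`, where `path` points to one extracted annotation file.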
Release Date
2019-07-16
Related Datasets
Downloads
https://github.com/StefanoMenini/De-Gasperi-s-Corpus/raw/master/conll-files.zip
https://github.com/StefanoMenini/De-Gasperi-s-Corpus/raw/master/txt-files.zip
https://github.com/StefanoMenini/De-Gasperi-s-Corpus/raw/master/xml-files.zip
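A minimal Python sketch of fetching one of the archives listed above and reading its text members into memory, using only the standard library. The helper names `fetch_archive` and `read_text_members` are hypothetical, and the UTF-8 encoding and the `.txt` suffix of the archive members are assumptions that should be verified against the actual files.

```python
import io
import zipfile
from typing import Dict
from urllib.request import urlopen

# Base URL and archive names as given in the Downloads field.
BASE_URL = "https://github.com/StefanoMenini/De-Gasperi-s-Corpus/raw/master/"
ARCHIVES = ("conll-files.zip", "txt-files.zip", "xml-files.zip")


def fetch_archive(name: str, timeout: float = 60.0) -> bytes:
    """Download one archive and return its raw bytes (requires network access)."""
    with urlopen(BASE_URL + name, timeout=timeout) as response:
        return response.read()


def read_text_members(
    data: bytes,
    suffix: str = ".txt",      # assumed member extension
    encoding: str = "utf-8",   # assumed file encoding
) -> Dict[str, str]:
    """Return {member name: decoded text} for matching files in a zip archive."""
    texts: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            # Skip directory entries and members with a different suffix.
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            texts[info.filename] = archive.read(info).decode(encoding)
    return texts
```

A typical call would be `documents = read_text_members(fetch_archive("txt-files.zip"))`. Keeping the download separate from the extraction means the second function can also be applied to an archive that has already been saved locally and read as bytes.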
Publisher
Creator
Type
Primary Subjects
Temporal Coverage
Geographical Coverage
Bibliographic References
Landing Page
Data & Metadata Languages
License
Digital Object
This record catalogues a digital scholarly object as an instance of the DCAT Dataset class. These objects represent collections of information that has already been structured in some way rather than raw, unstructured data. See the Documentation page for more information.
Permalink
http://purl.org/knot/data/alcide-corpus