Date(s) - 05/04/2018
3:00 pm - 5:00 pm
eLab, Mediastudies (BG1)
This edition of the CREATE Salon is dedicated to ‘heterogeneous archives’. Sofia Ares Oliveira will discuss the variety of historical documents in the Venice Time Machine project. Christian Olesen and Ivan Kisjes will share their experience working on EYE’s Jean Desmet Collection.
Sofia Ares Oliveira (École Polytechnique Fédérale de Lausanne)
Automatic information extraction from historical collections: the case of the 1808 Venetian cadaster
The presentation will report on our ongoing work to automatically process heterogeneous historical documents. After a quick overview of the general processing pipeline, a few examples will be described in more detail. Our recent progress in making large collections of digitised documents searchable will also be presented through the results of the automatic transcription of named entities in 18th-century Venetian fiscal documents. Finally, the case of the 1808 Venetian cadaster will be used to illustrate the general approach, and the results of processing the whole cadaster map will be shown.
Christian Olesen (University of Amsterdam), Ivan Kisjes (University of Amsterdam) and Kathleen Lotze (University of Amsterdam)
MIMEHIST: Annotating EYE’s Jean Desmet Collection in the CLARIAH MediaSuite
In our presentation, we will discuss our work in the research project MIMEHIST: Annotating EYE’s Jean Desmet Collection (2017-2018). The project is one of six pilots currently being carried out as part of building the Media Suite of CLARIAH, the national digital research infrastructure. MIMEHIST aims to process the 127,000-document archive using OCR and various types of document classification, both text-based and computer-vision-based. The goal is to organize the archive, which is currently organized only at the level of folders rather than documents, and to provide automated annotations that answer (in bulk) questions like: Who wrote letters to whom? Is a particular document a telegram, a letter or a scrap of notebook paper? Which films are mentioned in which documents?
The archive and films from EYE Filmmuseum’s Jean Desmet Collection will then be embedded in the Media Suite. By carrying out case studies on Desmet films and automatically transcribing data from Desmet’s business documents, the project will also test and develop the research infrastructure’s annotation environment. The presentation’s discussion of MIMEHIST will focus in particular on how the project builds on a scholarly tradition of enhanced and multimedia editions of archival film, detailing how the methods of those editions inform the development of the project’s case studies and the annotation tool’s functionalities, while highlighting the new perspectives this may produce on the Desmet Collection. We will also present the preliminary results of our tests with textual and computer vision document classification on the archive.