Date(s) - 15/11/2016
3:00 pm - 5:00 pm
eLab, Mediastudies (BG1)
The agenda centres on shared research infrastructure. Two topics will be discussed, followed by a brainstorm session:
1. Harmonisation of census data (Julia Noordegraaf and Jolanda Visser, UvA CREATE)
In the context of our current research project, CINEMAPS: A Data Driven Investigation of Cinema Markets in the Netherlands and Flanders 1950-1975, we work with a variety of socio-economic datasets that we need to harmonise in order to project and combine them on maps using GIS. In this presentation we introduce the project, discuss the datasets we use (on the location of cinema theatres, census data and other socio-economic data) and outline the harmonisation challenges we encounter.
2. From Text to Structured Data
The second topic concerns the extraction of structured data from scanned source material and/or unstructured (or semi-structured) texts. The digitisation of newspapers and other historical sources and the growing availability of texts in digital form have provided a wealth of material containing data that we would like to use in structured form. This session presents two different approaches for doing so.
Extracting structured data from scans (Ivan Kisjes, CREATE)
One example of a source for structured data is the so-called ‘Filmladders’: movie listings for the four big cities that appeared every week in several Dutch newspapers in the post-war period. Currently, no screening data are available for the post-WWII period, so within CREATE we have started to experiment with techniques for extracting the data from these scans (no OCR was available yet). Ivan Kisjes will present his method and evaluate the first results.
Extracting structured data from unstructured and semi-structured texts (Marieke van Erp, VU/CLARIAH work package 3, Textual Data/Linguistics)
Next, Marieke van Erp will present her research on extracting information from unstructured and semi-structured texts.
3. Brainstorm on relation to CLARIAH activities
For this final session, Jauco Noorzij (CLARIAH work package 2, Technology) will join us in a discussion of the possible relations between the presented projects. In particular, the aim is to sketch the first contours of a project around the extraction of structured data.