Editing and Annotating Data

Creating, Editing and (Re-)Using Data

CLARIAH-DE provides tools and services that enable the creation, editing, and (re-)use of data and resources, using established standards and principles.

When creating and publishing digital resources such as editions or text corpora, the choice of a standard and its consistent application are of great importance. They ensure that the resulting data can be processed with appropriate tools and integrated into larger corpora, which increases the data's visibility and findability as well as the possibilities for re-use (in line with the FAIR Data Principles). The modelling of data, from which its annotation derives, is determined by the material to be edited, one's own research interests, the scholarly discipline, and numerous other factors. Even established data formats such as the XML format of the Text Encoding Initiative (TEI) allow a great deal of variance, which makes both the development and application of standardized tools and the further use of the resulting data more difficult. One solution being tested and applied in the CLARIAH-DE project is the use of common TEI customizations, such as the base format of the German Text Archive (DTABf). Exchange or pivot formats such as this may not be able to represent all previously encoded information in its original depth after conversion, but they can be understood as a core data set and thus also add value for data curation.
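To make the idea of a core data set more concrete, the following minimal Python sketch reports which elements of a TEI document fall outside a small permitted subset, that is, which encoded information a conversion to a narrower exchange format might not carry over. The element list `ALLOWED`, the function name `elements_outside_subset` and the sample document are assumptions made for illustration only; the list is not the actual DTABf element inventory, and this is not a tool provided by CLARIAH-DE.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Official TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, illustrative subset of TEI element names.
# This is NOT the real DTABf element inventory.
ALLOWED = {
    "TEI", "teiHeader", "fileDesc", "titleStmt", "title",
    "publicationStmt", "sourceDesc", "text", "body",
    "div", "head", "p", "hi",
}

# Invented sample document, used only to demonstrate the check.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>Sample</title></titleStmt>
    <publicationStmt><p>Unpublished</p></publicationStmt>
    <sourceDesc><p>Born digital</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body><div>
    <head>Letter</head>
    <p>Dear <persName>Anna</persName>, <hi>greetings</hi>
       from <placeName>Berlin</placeName>.</p>
  </div></body></text>
</TEI>"""


def elements_outside_subset(tei_xml, allowed=ALLOWED):
    """Count elements that are not in the TEI namespace or not in `allowed`."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for element in root.iter():
        tag = element.tag
        # ElementTree writes namespaced tags as "{namespace}localname".
        if tag.startswith("{"):
            namespace, local_name = tag[1:].split("}", 1)
        else:
            namespace, local_name = "", tag
        if namespace != TEI_NS or local_name not in allowed:
            counts[local_name] += 1
    return counts


if __name__ == "__main__":
    for name, count in sorted(elements_outside_subset(SAMPLE).items()):
        print(f"{name}: {count} occurrence(s) outside the subset")
```

A report of this kind only shows where a document diverges from a chosen subset. Whether the affected markup is mapped to other elements, simplified or dropped remains an editorial decision.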

The following usage scenarios illustrate the possibilities that the merging of data and tools from CLARIN-D and DARIAH-DE offers. They also outline the importance of developing and establishing uniform standards and workflows as a prerequisite for successful merging.

The project addresses researchers who have already produced digital resources or intend to produce a digital resource, as well as researchers who want to work with appropriately prepared text corpora. Due to this broad scope, there are links to very different phases of the research data cycle (in particular, the creation, adoption and use of research data).

Preparing and Integrating Data

The preparation and integration of data enable its further use. A historian, for example, who has completed a digital edition conforming to the TEI Guidelines wants to find out how this data can be annotated and analysed linguistically with existing software applications, and how it can be made more visible by integrating it into a larger text corpus.

The information on existing digital editions compiled within the CLARIAH-DE project helps such researchers to identify which of these resources resemble their own material with regard to, among other factors, the basic editorial model or the material base, and thus points them to best practice examples. They also find exemplary ways to convert existing data into an exchange format, so that the software can be used with their own material. In addition, corresponding handouts that explain these processes and software tools in greater detail are available from a single source.

Data for Teaching

Data are also widely used in academic teaching. In an introductory course on "Digital Editing", a university lecturer in the field of Digital Humanities wants to offer their students as broad an overview as possible of existing digital editions, digital tools (TextGridLab etc.) and resources in general (lexicons, dictionaries etc.), as well as other software applications that help in analysing the available data.

The combination of curated resources and digital tools from the existing repositories of the DARIAH-DE and CLARIN-D networks within the CLARIAH-DE project, together with the associated standardisation and establishment of uniform workflows, can provide what is needed to conduct the course successfully: tools and compatible data are easy to find and centrally accessible, while extensive documentation and detailed instructions enable students to work independently.

Keywords:

annotation, edition, annotating data, editing data, TEI, DTABf