CLARIAH-DE offers central access to a wide range of data repositories. The resources contained therein can be used and combined in different contexts by researchers, teachers and students.
Teachers of German as a foreign language who are looking for listening-comprehension material to use in class can find audio and video files, along with transcriptions from various conversational contexts, in the Research and Teaching Corpus of Spoken German (FOLK). Researchers in literary studies and computational linguistics who need a collection of poems to process with digital linguistic tools can compile one from the TextGrid Repository and the German Text Archive (DTA) and analyse it directly via the connection to the Language Resources Switchboard. Students of German Studies who are looking for existing lexical resources on the topic of hate speech for a term paper can use the German lexical-semantic net GermaNet to analyse word relations and semantic fields. In addition, CLARIAH-DE offers access to multimodal data (e.g. audio and video recordings) as well as to other sources of digital cultural assets (photographs, objects, etc.) through the Database for Spoken German (DGD) of the IDS Mannheim and the DARIAH-DE Repository.
Regardless of the provider, the resources can be accessed centrally with a user account from a university or another research institution (via eduGAIN). Access is also possible from the European CLARIN network or via DARIAH-DE. This gives users straightforward access to resources and services.
The following examples illustrate the diversity of the research data offered in CLARIAH-DE.
- The German Reference Corpus (DeReKo) is the world’s largest collection of contemporary written German-language corpora and serves as an empirical basis for linguistic research. DeReKo is developed and maintained at the Leibniz Institute for the German Language (IDS). Researchers and students can access the data via the KorAP or COSMAS II search platforms.
- The TextGrid Repository offers, among other things, the “Digital Library” collection, which contains German and translated works of fiction and non-fiction by about 600 authors from the beginning of book printing to the early 20th century. The repository also contains a steadily growing number of texts and images (e.g. manuscripts) from various edition and digital humanities projects. Most texts are available in XML/TEI encoding as well as in plain text, which enables a wide range of subsequent uses. Researchers can download these data, create their own collections using the shelf function and process them directly via the connection to tools for digital analysis (e.g. the Language Resources Switchboard).
- The German Text Archive (DTA) offers a well-balanced basic stock of German-language texts from the early 16th to the early 20th century, spanning all disciplines and genres. The DTA core corpus consists of approximately 1500 works. In addition, the DTA integrates a variety of other texts from the mid-15th to the mid-20th century as DTA Extensions (DTAE). All documents are uniformly encoded according to the DTA Base Format (DTABf), a standard that is fully compliant with TEI. Further download formats are offered, and the corresponding page images are made available. The DTA text collection can be edited and analysed using established research tools.
- The Leipzig Corpora Collection and the German lexical-semantic net GermaNet make lexical resources available for research. GermaNet can be searched online, and the complete data can be licensed free of charge for academic research.
- The DARIAH-DE Repository is a long-term archive for research data in the humanities and cultural studies. It typically contains data from humanities research projects that have been made available to the public. These data can vary widely in format and content. To make them retrievable, data in the DARIAH-DE Repository are organised in collections and follow the FAIR principles. Because all data are freely accessible under Open Access, they can be found and reused across formats and disciplines. All published data are given a persistent identifier (DOI) via DataCite and can thus be found and cited permanently.
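
Because several of the collections above provide texts in TEI-encoded XML, a downloaded file can be processed with standard XML tooling. The following is a minimal sketch in Python, assuming a document in the standard TEI namespace whose verse lines are marked up with `<l>` elements. The sample document and the helper names `extract_title` and `extract_verse_lines` are hypothetical and only serve as illustration; real files from the repositories may be structured differently.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; the prefix "tei" is only a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample standing in for a downloaded TEI file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Poem</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg>
        <l>First line of verse</l>
        <l>Second line of <hi>verse</hi></l>
      </lg>
    </body>
  </text>
</TEI>"""


def extract_title(root):
    """Return the title from the TEI header, or an empty string if absent."""
    node = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    return "".join(node.itertext()).strip() if node is not None else ""


def extract_verse_lines(root):
    """Return the plain text of every <l> element inside <text>."""
    lines = []
    for line in root.iterfind(".//tei:text//tei:l", TEI_NS):
        # itertext() also collects text inside nested markup such as <hi>;
        # split/join normalises any whitespace introduced by indentation.
        lines.append(" ".join("".join(line.itertext()).split()))
    return lines


root = ET.fromstring(SAMPLE_TEI)
title = extract_title(root)
lines = extract_verse_lines(root)

print(title)
for text in lines:
    print(text)
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE_TEI)`. The plain-text lines obtained this way can then be passed on to other analysis tools.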