Finding Data

Searching and Finding Resources

CLARIAH-DE offers a unified, extensive search across a wide range of collections of research data, services and tools, all of which are made available by the scientific community.

Researchers need efficient ways to find data and tools suited to their research questions. To accommodate these differing interests and the variety of scholarly work, CLARIAH-DE provides a range of search applications for discovering structured, formatted data records made available by diverse institutions. These range from text corpora containing articles and books, through lexical resources and dictionaries, photographs, codices, maps and pamphlets, to audio and video files. There are also services for digital tools and for many different areas of application.

CLARIAH-DE provides three search applications, each explained in more detail below: the Generic Search, the Federated Content Search and the Virtual Language Observatory.

Generic Search

The Generic Search (GS) is an individually customizable search application that searches jointly through the descriptions (metadata) and the contents of data held in the respective archives. It is available for numerous collections and offers various visualizations of results. Associated data models allow resources to be explored efficiently in breadth as well as in depth. The generic concept of the GS, which is based on the Data Modelling Environment, makes it possible to represent and query practically any data model and at the same time facilitates reuse in different contexts, for example as a comprehensive search for the Marbach Weimar Wolfenbüttel Research Association.

User documentation is available.


Fig. 1: Facet-based search through GS collections using the example of “German texts about bees”

Federated Content Search

CLARIAH-DE also provides access to a search application optimized for finding citations in large-scale text collections or corpora. The Federated Content Search (FCS) performs a distributed search across hundreds of text collections and displays the results in a comprehensive fashion for easy further processing. More in-depth explanations can be found in a tutorial and a step-by-step application.

Fig. 2: Simultaneous search for “EU” in multiple corpora
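
The idea of a distributed search can be illustrated with a small sketch. It assumes that each text collection exposes an endpoint speaking the SRU protocol (the protocol the CLARIN FCS specification builds on) and that the same query is sent to every endpoint. The endpoint URLs and the helper names `build_search_url` and `federate` are hypothetical and chosen only for illustration; the sketch builds the request URLs but does not contact any server, and it is not the FCS implementation itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URLs, for illustration only (not real FCS endpoints).
ENDPOINTS = [
    "https://corpus-a.example.org/fcs",
    "https://corpus-b.example.org/fcs",
]


def build_search_url(endpoint: str, term: str, max_records: int = 10) -> str:
    """Build an SRU searchRetrieve request URL for a single endpoint."""
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.2",               # SRU protocol version (assumed here)
        "query": f'"{term}"',           # simple quoted term in CQL syntax
        "maximumRecords": max_records,  # cap on hits returned per endpoint
    }
    return f"{endpoint}?{urlencode(params)}"


def federate(term: str, endpoints=ENDPOINTS) -> dict:
    """Map every endpoint to the request URL that carries the same query."""
    return {endpoint: build_search_url(endpoint, term) for endpoint in endpoints}


if __name__ == "__main__":
    for endpoint, url in federate("EU").items():
        print(url)
```

A real client would fetch each URL, parse the XML responses and merge the hits into one result list; those steps are omitted here because they depend on the individual endpoints.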

Virtual Language Observatory (VLO)

To help researchers find data relevant to their research context, the Virtual Language Observatory (VLO) supports searching the descriptions of data (metadata) by means of search facets. Over a million resources can be found in the VLO. In most cases, the VLO redirects to other applications in which the resources can be processed further and visualized. A usage scenario shows how researchers can use the VLO, for example to find historic scientific textbooks and their metadata descriptions.

Fig. 3: Finding relevant resources for specific questions via the search facets of the VLO, using the example of “German language corpora”
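
Facet-based searching over metadata can be sketched in a few lines. The example below is a simplified illustration rather than the VLO's actual implementation: the records, the field names `language` and `resource_type`, and the helper functions are all hypothetical. It shows the two operations a facet search combines, narrowing a result set by selected facet values and counting how many records each facet value would yield.

```python
from collections import Counter

# Hypothetical metadata records, for illustration only.
RECORDS = [
    {"name": "Corpus A", "language": "German", "resource_type": "corpus"},
    {"name": "Lexicon B", "language": "German", "resource_type": "lexicon"},
    {"name": "Corpus C", "language": "English", "resource_type": "corpus"},
    {"name": "Corpus D", "language": "German", "resource_type": "corpus"},
]


def filter_by_facets(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(record[facet] for record in records if facet in record)


if __name__ == "__main__":
    hits = filter_by_facets(RECORDS, language="German", resource_type="corpus")
    print([record["name"] for record in hits])
    print(facet_counts(RECORDS, "language"))
```

Selecting the facets `language="German"` and `resource_type="corpus"` mirrors the "German language corpora" example: each additional facet narrows the result set, while the counts tell the user how many resources remain under each value.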

The results of the three search applications can be compared in a common user interface. This demonstrates how results differ when searching for simple words, names of datasets or other properties. In the course of the project, these search options will be continuously reviewed and evaluated, both with regard to the technical merging of the applications and with regard to the creation and integration of common user interfaces. At the same time, concepts for the collaboration of the search engines will be developed, to be implemented beyond the project's duration.

Keywords:

Search application, Finding data, Generic Search, Federated Content Search, Virtual Language Observatory, TechLab