Analysing and Processing Data

Data processing in Humanities research and related disciplines is as multifaceted as the research questions it serves. The CLARIAH-DE project makes accessible a wide range of tools that can be applied to a variety of these questions.

Finding Tools for Data-based Analyses

Given the abundance of data formats and types, researchers with access to research data face two questions: which tools can handle these data, and which research questions can be addressed with these data and tools? Researchers obtain such data, for example, through data citations in publications, through cooperation with other researchers or from data repositories. The Language Resource Switchboard (LRS), developed in CLARIAH-DE in cooperation with other national and European partners, offers researchers central, user-friendly access to a wide range of established tools. To suggest suitable tools, the LRS draws on information about file formats, descriptive metadata and taxonomies of processing and analysis tasks, and it shows a short functional description of each suggested tool. Many tools can be used with one’s own research data directly via the LRS, while others require a login or have to be installed locally by the user. The portfolio of tools is constantly being expanded.
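The tool-suggestion step can be pictured as a filter over a registry of tool descriptions. The following Python sketch is purely illustrative: the `Tool` record, the registry entries and `suggest_tools` are hypothetical, and it assumes that tools are matched on the input's MIME type and an ISO 639-3 language code. It does not reproduce the LRS's actual implementation or tool list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool description; real LRS entries carry richer metadata."""
    name: str
    task: str                 # task category from a (hypothetical) taxonomy
    mimetypes: frozenset      # input formats the tool accepts
    languages: frozenset      # ISO 639-3 codes; empty set = language-independent


# Hypothetical registry for illustration only, not the real LRS tool list.
REGISTRY = [
    Tool("TokenizerA", "annotation", frozenset({"text/plain"}), frozenset({"deu", "eng"})),
    Tool("AlignerB", "alignment", frozenset({"audio/wav"}), frozenset({"deu"})),
    Tool("ViewerC", "visualization", frozenset({"text/plain", "text/xml"}), frozenset()),
]


def suggest_tools(mimetype: str, language: str | None, registry=REGISTRY) -> dict:
    """Return the names of matching tools, grouped by task category."""
    matches: dict = {}
    for tool in registry:
        # The file format must be one the tool accepts.
        if mimetype not in tool.mimetypes:
            continue
        # Language-independent tools match any input; others need a language match.
        if tool.languages and language not in tool.languages:
            continue
        matches.setdefault(tool.task, []).append(tool.name)
    return matches
```

For a German plain-text file, `suggest_tools("text/plain", "deu")` returns the annotation and visualization entries, grouped by task, while an English audio file finds no match in this toy registry. Grouping by task mirrors the idea of presenting suggestions together with a short functional description.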

There are two common ways of accessing the LRS: (1) Besides offering data for download, repositories can provide an option to transfer the data directly to the LRS. In this case, the data are forwarded to the LRS, where a suitable tool can be selected. (2) Users with their own research data can open the LRS directly and upload these data. The LRS then offers a list of suitable tools for further processing.



Selection of Reference Tools for Data Processing

Many tools are directly accessible via CLARIAH-DE, for instance:

TextGrid is an open-source virtual research infrastructure that integrates tools and services for creating, editing, managing and publishing research data. It supports Humanities scholars who want to edit, store and publish their text-based research data in a sustainable environment. TextGridLab is optimized for XML/TEI modelling, e.g. for digital editions, and is used in teaching and in a range of projects for the collaborative creation and annotation of full texts in larger corpora. As a long-term archive, TextGridRep offers an extensive, searchable and reusable collection of freely accessible texts and images.

For processing spoken language, the services of the Bavarian Archive for Speech Signals (BAS) provide an extensive collection of tools. With WebMAUS, researchers can time-align transcriptions with audio signals; OCTRA supports the transcription of audio data; and ASR offers automatic speech recognition. These services can be used via a web interface by researchers at universities and other academic research institutions.

Together with the Datasheet Editor, the Geo-Browser enables the comparative visualization of data in relation to their geographical distribution at given points in time and across time spans. The tool is freely available online and plots data points by their geo-coordinates on a world map alongside a timeline. Both views offer interactive navigation for displaying details and relationships. The Geo-Browser has been used in various projects, e.g. to visualize the geographical distribution of love letters and Jewish gravestones or for the historical study of the Balkan Wars.

WebLicht is an application that allows users to enrich their own and reused textual data with linguistic annotations, e.g. grammatical information or the names of persons and places. Members of most German universities can use it directly and free of charge with the login credentials of their own university. WebLicht contains language processing tools such as tokenizers, part-of-speech taggers, parsers and named entity recognizers, which researchers can combine into processing chains that fit their research question. The resulting annotations can then be visualized in a suitable form, e.g. as a table or a tree, using the TüNDRA tool. WebLicht is thus well suited both to the automatic annotation of one’s own data and to the enrichment of existing data holdings. As an execution environment, WebLicht offers a wide range of tools, many of them for several languages.
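A processing chain is only valid if every tool receives the annotation layers it needs (a tagger needs tokens, a name recognizer may need tokens and tags). The following Python sketch illustrates that idea with toy components. All names (`Step`, `run_chain`, the lexicon and gazetteer) are hypothetical, the tagging is a simple dictionary lookup rather than a trained model, and the sketch does not reflect WebLicht's actual web services or data format.

```python
import re
from dataclasses import dataclass
from typing import Callable

Doc = dict  # annotation layers keyed by name, e.g. "text", "tokens", "pos"


@dataclass(frozen=True)
class Step:
    name: str
    requires: frozenset                 # layers that must exist before this step runs
    produces: str                       # layer this step adds
    run: Callable[[Doc], list]


def tokenize(doc: Doc) -> list:
    # Naive tokenizer: runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", doc["text"])


TOY_LEXICON = {"Goethe": "NE", "lebte": "VVFIN", "in": "APPR", "Weimar": "NE", ".": "$."}


def tag_pos(doc: Doc) -> list:
    # Toy part-of-speech lookup (STTS-style tags); unknown words get "UNK".
    return [TOY_LEXICON.get(tok, "UNK") for tok in doc["tokens"]]


TOY_GAZETTEER = {"Goethe": "PER", "Weimar": "LOC"}


def tag_names(doc: Doc) -> list:
    # Toy named-entity step: gazetteer lookup, restricted to proper nouns (NE).
    return [TOY_GAZETTEER.get(tok, "O") if pos == "NE" else "O"
            for tok, pos in zip(doc["tokens"], doc["pos"])]


def run_chain(text: str, steps: list) -> Doc:
    """Run the steps in order, refusing any step whose input layers are missing."""
    doc: Doc = {"text": text}
    for step in steps:
        missing = set(step.requires) - set(doc)
        if missing:
            raise ValueError(f"{step.name} needs layers {sorted(missing)}")
        doc[step.produces] = step.run(doc)
    return doc


CHAIN = [
    Step("tokenizer", frozenset({"text"}), "tokens", tokenize),
    Step("pos-tagger", frozenset({"tokens"}), "pos", tag_pos),
    Step("ner", frozenset({"tokens", "pos"}), "ner", tag_names),
]
```

Running `run_chain("Goethe lebte in Weimar.", CHAIN)` adds token, part-of-speech and name layers in turn, whereas a chain that starts with the tagger fails immediately because no token layer exists yet.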

Besides tools for analysis and processing, CLARIAH-DE also offers tools for working with lexical resources. GermaNet Rover can be used to search and visualize GermaNet, the wordnet for German, and the ASV-Toolbox is an application for examining written language, particularly from a lexical perspective.

Additional Tools and Manuals

Step-by-step tutorials for using tools and data are collected in a list of quick start guides. Further instructions can be found in a collection of guides and tutorials.

A comprehensive list of tools and services from the CLARIAH-DE network is also available.

Keywords:

Analysis, Tools / data analysis, data processing, tools, LRS, research data