Application Descriptions
Application descriptions for key point CLARIAH-DE AP2
Status of the Services
Liner2 (hosted by D4Science) UP
HTTP OK: HTTP/1.1 200 OK - 3151 bytes in 0.221 second response time
WebLicht POSTags Lemmas DE UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.103 second response time
Concraft -> Bartek UP
HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.079 second response time
WebLicht NamedEntities SL UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.072 second response time
OAI-PMH-31 DOWN
OAI-PMH CRITICAL: HTTPSConnectionPool(host='clarino.uib.no', port=443): Max retries exceeded with url: /oai?verb=Identify (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
Concraft -> Sentipejd UP
HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.134 second response time
WebLicht Dep Parsing NL ALPINO UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.116 second response time
CSTLemma (hosted by D4Science) UP
HTTP OK: HTTP/1.1 200 OK - 3055 bytes in 0.190 second response time
Concraft -> Bartek -> NicolasSummarizer UP
HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.142 second response time
OAI-PMH-43 WARNING
OAI-PMH WARNING: XSD validation failed
CLARIN-D project web site UP
HTTP OK: HTTP/1.1 200 OK - 65980 bytes in 0.242 second response time
CLARIN Centre Registry [UI][prod] UP
HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.060 second response time
OAI-PMH-19 WARNING
OAI-PMH WARNING: XSD validation failed
WebLicht Lemmas EN UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.069 second response time
CLARIN DS status proxy [prod] UP
HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.108 second response time
Voyant Tools UP
HTTP OK: HTTP/1.1 200 OK - 6508 bytes in 0.498 second response time
WebLicht Lemmas DE UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.067 second response time
CLARIN VCR [UI][prod] UP
HTTP OK: HTTP/1.1 200 OK - 2798 bytes in 0.039 second response time
WebLicht Const Parsing DE UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.066 second response time
Concraft->Spejd UP
HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.081 second response time
OAI-PMH-53 WARNING
OAI-PMH WARNING: XSD validation failed
SRU/CQL-23 WARNING
SRU/CQL WARNING: XSD validation failed
Automatic Transcription of Dutch Speech Recordings (Wav file) UP
HTTP OK: HTTP/1.1 200 OK - 7889 bytes in 0.134 second response time
WebLicht Const Parsing EN UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.070 second response time
WebLicht Advanced Mode UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.065 second response time
Automatic Transcription of Dutch Speech Recordings (MP3 file) UP
HTTP OK: HTTP/1.1 200 OK - 7889 bytes in 0.132 second response time
Concraft -> Nerf UP
HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.144 second response time
WebLicht POSTags Lemmas IT UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.065 second response time
CLARIN VLO [UI][prod] UP
HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.034 second response time
WebLicht Dep Parsing EN UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.070 second response time
DARIAH-DE Geo-Browser (KML) UP
HTTP OK: HTTP/1.1 200 OK - 9227 bytes in 0.086 second response time
WebLicht Morphology DE UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.067 second response time
Distanbol WARNING
HTTP WARNING: HTTP/1.1 400 - 243 bytes in 0.175 second response time
CMDI Explorer UP
HTTP OK: HTTP/1.1 200 OK - 1880 bytes in 0.182 second response time
WebLicht Tokenization TUR UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.070 second response time
HTTPS CLARIN-D project wiki UP
HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.016 second response time
Spacy (hosted by D4Science) - DE UP
HTTP OK: HTTP/1.1 200 OK - 3489 bytes in 0.211 second response time
NagVis access UP
HTTP OK: HTTP/1.1 302 Found - 1077 bytes in 0.023 second response time
BASWebService UP
clarin.phonetik.uni-muenchen.de
HTTP OK: HTTP/1.1 200 200 - 238465 bytes in 6.756 second response time
WebLicht POSTags Lemmas FR UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.090 second response time
NLP-HUB (multiple NER tools) UP
HTTP OK: HTTP/1.1 302 Found - 698 bytes in 0.197 second response time
WebLicht Morphology EN UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.070 second response time
WebLicht POSTags Lemmas EN UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.069 second response time
SRU/CQL-40 WARNING
SRU/CQL WARNING: XSD validation failed
HTTP UP
HTTP OK: HTTP/1.1 301 Moved Permanently - 552 bytes in 0.012 second response time
HTTP UP
HTTP OK: HTTP/1.1 301 Moved Permanently - 557 bytes in 0.071 second response time
CLARIN OAI-PMH Validator UP
HTTP OK: HTTP/1.1 200 OK - 588 bytes in 0.190 second response time
Automatic Transcription of Dutch Speech Recordings (Ogg file) UP
HTTP OK: HTTP/1.1 200 OK - 7889 bytes in 0.159 second response time
HTTP UP
HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.054 second response time
Sonatype Nexus UP
HTTP OK: HTTP/1.1 200 OK - 3032 bytes in 0.069 second response time
OAI-PMH-47 WARNING
OAI-PMH WARNING: XSD validation failed
Spacy (hosted by D4Science) - EN UP
HTTP OK: HTTP/1.1 200 OK - 3352 bytes in 0.240 second response time
WebLicht NamedEntities DE UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.068 second response time
WebLicht NamedEntities EN UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.103 second response time
LINDAT Translation UP
HTTP OK: HTTP/1.1 200 OK - 15531 bytes in 0.082 second response time
WebLicht Dep Parsing DE UP
HTTP OK: HTTP/1.1 302 Found - 1040 bytes in 0.069 second response time
Concraft -> DependencyParser UP
HTTP OK: HTTP/1.1 200 OK - 8217 bytes in 0.133 second response time
HTTP CLARIN-D project wiki UP
HTTP OK: HTTP/1.1 302 Found - 509 bytes in 0.041 second response time
IMS Fedora Commons UP
HTTP OK: HTTP/1.1 200 OK - 4465 bytes in 0.124 second response time
Handle retrieve /10932/00-017B-E190-A83E-6F01-5 UP
HTTP OK: HTTP/1.1 302 - 546 bytes in 0.618 second response time
Handle resolve /10932/00-017B-E190-A83E-6F01-5?noredirect UP
HTTP OK: HTTP/1.1 200 - 2223 bytes in 1.672 second response time
Data from monitoring.clarin.eu
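The HTTP status lines above look like the summaries produced by Nagios-style HTTP checks. The following is a minimal sketch that turns one such line into structured fields, assuming the summary format is exactly as shown. The regular expression is inferred from these lines, not taken from the monitoring software. Non-HTTP checks, such as the OAI-PMH and SRU/CQL validation messages, yield `None`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Inferred from lines such as
# "HTTP OK: HTTP/1.1 200 OK - 3151 bytes in 0.221 second response time"
# "HTTP WARNING: HTTP/1.1 400 - 243 bytes in 0.175 second response time"
HTTP_CHECK = re.compile(
    r"^HTTP (?P<state>OK|WARNING|CRITICAL): HTTP/(?P<version>[\d.]+) "
    r"(?P<code>\d{3})(?: (?P<reason>[^-]*?))? - (?P<size>\d+) bytes "
    r"in (?P<seconds>[\d.]+) second response time$"
)

@dataclass
class HttpCheck:
    state: str
    status_code: int
    size_bytes: int
    response_seconds: float

def parse_http_check(line: str) -> Optional[HttpCheck]:
    """Parse one HTTP check summary; return None for lines in any other format."""
    m = HTTP_CHECK.match(line.strip())
    if m is None:
        return None
    return HttpCheck(
        state=m["state"],
        status_code=int(m["code"]),
        size_bytes=int(m["size"]),
        response_seconds=float(m["seconds"]),
    )
```

The reason phrase is optional in the pattern because some lines omit it (for example "HTTP/1.1 400 -" for Distanbol).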
Our Service List
Alpino

Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files; untokenised files are tokenised automatically using ucto. The output is a zip file containing one XML file for each sentence in the input document.
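Below is a minimal sketch for inspecting the zip file returned by the Alpino web service. It assumes, for illustration, that each XML file follows the usual Alpino treebank layout:

- an `alpino_ds` root containing nested `node` elements;
- words on leaf nodes in a `word` attribute;
- each word's position in `begin`;
- the dependency relation in `rel`.

File names inside the archive are not specified here and are treated as arbitrary.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, List, Tuple

def read_alpino_zip(data: bytes) -> Dict[str, List[Tuple[str, str]]]:
    """Map each XML file in the archive to the (word, relation) pairs of its sentence."""
    result: Dict[str, List[Tuple[str, str]]] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".xml"):
                continue  # skip any non-XML members
            root = ET.fromstring(archive.read(name))
            # Assumed layout: leaf nodes carry the word, 'begin' is its position.
            leaves = [node for node in root.iter("node") if "word" in node.attrib]
            leaves.sort(key=lambda node: int(node.get("begin", "0")))
            result[name] = [(node.get("word"), node.get("rel")) for node in leaves]
    return result
```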
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: Dutch
- text/plain: plain text file
- alpino: output
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
- tok: output
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Gertjan van Noord (University of Groningen), Maarten van Gompel (webservice only, CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Apache Stanbol Enhancer

Apache Stanbol provides a set of reusable components for semantic content management. A number of EnhancementEngines extract features from the submitted content (for details, see https://stanbol.apache.org). The resulting RDF enhancements are returned in JSON format.
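Apache Stanbol's enhancer is typically called by POSTing plain text to an `/enhancer` endpoint. The sketch below only prepares such a request with Python's standard library; it does not send it. Two parts are assumptions:

- The base URL is a hypothetical placeholder, since the address of the hosted instance is not given here.
- Requesting `application/json` in the `Accept` header is based on the JSON output mentioned above.

```python
import urllib.request

def build_enhancer_request(base_url: str, text: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST of plain text to a Stanbol /enhancer endpoint.

    base_url is a placeholder; the Accept value is an assumption about the
    serialisation offered by the hosted instance.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/enhancer",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "application/json",  # ask for a JSON serialisation of the RDF result
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.loads`. Check the hosted instance's documentation for the exact endpoint and supported serialisations.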
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- application/json: JSON data
application type
Datenblatt (Fact sheet)
contact
- technical contact: acdh-tech@oeaw.ac.at, Matej Durco
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Apache Software Foundation (software), Austrian Centre for Digital Humanities (enhancement chains and configuration)
hoster
usage restrictions for individual users
countries supported
Colibri Core (folia+xml)

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way.
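The plain Python sketch below illustrates what n-grams and skipgrams are. It is not Colibri Core's API, which is far more memory-efficient.

- N-grams are contiguous token windows.
- A skipgram here keeps the first and last token of a window and replaces one or more inner tokens with a gap marker.

The `GAP` symbol and the `frequent_patterns` threshold are hypothetical choices for this illustration. Only fixed-size gaps are shown.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

GAP = "{*}"  # hypothetical gap marker for this sketch

def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Contiguous windows of n tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def skipgrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """n-token windows in which one or more inner tokens are replaced by GAP.

    The first and last token are always kept, so each gap has a fixed size
    and is anchored on both sides.
    """
    for window in ngrams(tokens, n):
        inner = range(1, n - 1)
        # Each non-zero bit mask selects which inner positions become gaps.
        for mask in range(1, 2 ** len(inner)):
            yield tuple(
                GAP if pos in inner and (mask >> (pos - 1)) & 1 else token
                for pos, token in enumerate(window)
            )

def frequent_patterns(tokens: List[str], n: int, threshold: int = 2) -> Dict[Tuple[str, ...], int]:
    """Count n-grams and skipgrams of length n; keep those seen at least `threshold` times."""
    counts = Counter(ngrams(tokens, n))
    counts.update(skipgrams(tokens, n))
    return {pattern: c for pattern, c in counts.items() if c >= threshold}
```

For example, on the tokens of "to be or not to be", the first 3-token skipgram is `("to", "{*}", "or")`. The only bigram occurring at least twice is `("to", "be")`.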
short description
documentation
Description of the target group and its size
formats and languages
- languages: Dutch, English, German, French, Spanish, Portuguese, Western Frisian
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
- Tadpole Columned Output Format
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Colibri Core (plain text)

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way.
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: Dutch, English, German, French, Spanish, Portuguese, Western Frisian
- text/plain: plain text file
- Tadpole Columned Output Format
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Collection Registry
The Collection Registry
- serves as a catalog of collections that emerged within research projects or that serve as a basis for them;
- links data models and collection descriptions for technical reuse by services such as search or analysis tools;
- also serves to manage collection descriptions. Besides digitally accessible collections, these can include analog, protected or offline collections.
The purpose of the Collection Registry is
- to describe distributed collections in one place and to process them together in other services (e.g. Generic Search, CosmoTool);
- to make visible collections that are otherwise difficult to find;
- to document one's own collections and make them demonstrable to other scientists;
- to manage relevant collections in the sense of an internal catalog.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- application/xml: XML file, schema
- json, application/xml
application type
Datenblatt (Fact sheet)
contact
- technical contact: tobias.gradl@uni-bamberg.de, Tobias Gradl (Developer)
version
application category
application subcategory
data communication encryption
privacy policy
authentication
hoster
usage restrictions for individual users
countries supported
Concraft

Morphosyntactic tagger for Polish based on constrained conditional random fields. Part of: Multiservice, a robust linguistic Web service for Polish.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- text/html: HTML file
- application/json: JSON data
- CoNLL format
- Visualization
application type
Datenblatt (Fact sheet)
contact
- technical contact: rjawor@amu.edu.pl, MultiService
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Computer Science, Polish Academy of Sciences, Poland
hoster
usage restrictions for individual users
countries supported
Concraft -> Bartek

A statistical tool chain for performing Coreference Resolution. Part of: Multiservice, a robust linguistic Web service for Polish.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- text/html: HTML file
- application/json: JSON data
- CoNLL format
- Visualization
application type
Datenblatt (Fact sheet)
contact
- technical contact: rjawor@amu.edu.pl, MultiService
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Computer Science, Polish Academy of Sciences, Poland
hoster
usage restrictions for individual users
countries supported
Concraft -> Bartek -> NicolasSummarizer

Java coreference-based summarization tool. Its creation was co-funded by the European Union from resources of the European Social Fund (Project PO KL 'Information technologies: Research and their interdisciplinary applications'). Part of: Multiservice, a robust linguistic Web service for Polish.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- text/html: HTML file
- application/json: JSON data
- CoNLL format
- Visualization
application type
Datenblatt (Fact sheet)
contact
- technical contact: rjawor@amu.edu.pl, MultiService
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Computer Science, Polish Academy of Sciences, Poland
hoster
usage restrictions for individual users
countries supported
Concraft -> DependencyParser

The Polish dependency parser is trained on the extended version of the Polish dependency treebank (Składnica zależnościowa) using one of two publicly available parsing systems: MaltParser or MateParser. MaltParser is a transition-based dependency parser with a deterministic parsing algorithm. It builds the dependency structure of an input sentence from a sequence of transitions (shift-reduce actions). Each transition is predicted by a classifier that learns from training data to choose the next transition given the parse history. MateParser, in turn, is a graph-based parser. It defines a space of well-formed candidate dependency trees for an input sentence, scores them with an induced parsing model, and selects the highest-scoring tree as the analysis of the input sentence. Part of: Multiservice, a robust linguistic Web service for Polish.
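The transition-based approach described for MaltParser can be sketched with the arc-standard system, one common shift-reduce transition system. In this minimal Python example, the trained classifier is replaced by a hypothetical oracle (`oracle_action`) that reads the correct transition off a known gold tree. This is also how training sequences for such a classifier are usually derived. The sketch assumes a projective tree, with token 0 as the artificial root.

```python
from typing import Dict, List, Tuple

def oracle_action(stack: List[int], heads: Dict[int, int], attached: Dict[int, int]) -> str:
    """Hypothetical stand-in for a trained classifier: derive the next transition
    from a gold tree (heads[dependent] = head, 0 = artificial root)."""
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        if second != 0 and heads[second] == top:
            return "LEFT-ARC"
        # RIGHT-ARC only after 'top' has collected all of its own dependents.
        n_dependents = sum(1 for head in heads.values() if head == top)
        if heads[top] == second and attached.get(top, 0) == n_dependents:
            return "RIGHT-ARC"
    return "SHIFT"

def parse(n_words: int, heads: Dict[int, int]) -> List[Tuple[int, int]]:
    """Arc-standard shift-reduce parsing of tokens 1..n_words.

    Returns (head, dependent) arcs; assumes 'heads' describes a projective tree.
    """
    stack, buffer = [0], list(range(1, n_words + 1))
    arcs: List[Tuple[int, int]] = []
    attached: Dict[int, int] = {}  # dependents already attached, per head
    while buffer or len(stack) > 1:
        action = oracle_action(stack, heads, attached)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
            continue
        if action == "LEFT-ARC":
            dependent = stack.pop(-2)  # second item becomes a dependent of the top
        else:  # RIGHT-ARC
            dependent = stack.pop()    # top becomes a dependent of the item below it
        head = stack[-1]
        arcs.append((head, dependent))
        attached[head] = attached.get(head, 0) + 1
    return arcs
```

For a three-word sentence whose verb (token 2) heads both other tokens, `parse(3, {1: 2, 2: 0, 3: 2})` yields the arcs `(2, 1)`, `(2, 3)` and `(0, 2)` in that order.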
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- text/html: HTML file
- application/json: JSON data
- CoNLL format
- Visualization
application type
Datenblatt (Fact sheet)
contact
- technical contact: rjawor@amu.edu.pl, MultiService
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Computer Science, Polish Academy of Sciences, Poland
hoster
usage restrictions for individual users
countries supported
Concraft -> Nerf

Statistical named entity recognition tool based on linear-chain conditional random fields. Part of: Multiservice, a robust linguistic Web service for Polish.
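In a linear-chain CRF such as Nerf, decoding is commonly done with the Viterbi algorithm. It finds the label sequence that maximises the summed emission and transition scores. Below is a minimal Python sketch with hand-set, hypothetical scores. A real model learns weights for many features, and the `B-PER`/`I-PER` labels are illustrative rather than Nerf's own tag set.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]

def viterbi(tokens: List[str], labels: List[str],
            emission: Scores, transition: Scores) -> List[str]:
    """Highest-scoring label sequence under additive emission + transition scores.

    emission[(token, label)] and transition[(previous_label, label)] are
    log-potentials; missing entries count as 0.0. "START" is a hypothetical
    initial state used for the first transition.
    """
    if not tokens:
        return []
    # best[i][y] = (score of the best path ending in label y at position i, previous label)
    best: List[Dict[str, Tuple[float, str]]] = []
    for i, token in enumerate(tokens):
        column: Dict[str, Tuple[float, str]] = {}
        for y in labels:
            emit = emission.get((token, y), 0.0)
            if i == 0:
                column[y] = (transition.get(("START", y), 0.0) + emit, "START")
                continue
            prev, score = max(
                ((p, best[i - 1][p][0] + transition.get((p, y), 0.0)) for p in labels),
                key=lambda candidate: candidate[1],
            )
            column[y] = (score + emit, prev)
        best.append(column)
    # Follow back-pointers from the best label at the last position.
    y = max(labels, key=lambda label: best[-1][label][0])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = best[i][y][1]
        path.append(y)
    return path[::-1]
```

In the test scores, "Kowalski" has a higher emission score as `B-PER` than as `I-PER`. The `B-PER` to `I-PER` transition bonus tips the decision, so the sequence "Jan Kowalski śpi" comes out as `B-PER I-PER O`. Without that bonus, "Kowalski" would be labelled `B-PER`.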
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- text/html: HTML file
- application/json: JSON data
- CoNLL format
- Visualization
application type
Datenblatt (Fact sheet)
contact
- technical contact: rjawor@amu.edu.pl, MultiService
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Computer Science, Polish Academy of Sciences, Poland
hoster
usage restrictions for individual users
countries supported
Concraft -> Sentipejd

A morphosyntactic tagger extended with a semantic category, expressing properties of positive or negative sentiment. Part of: Multiservice, a robust linguistic Web service for Polish.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- text/html: HTML file
- application/json: JSON data
- CoNLL format
- Visualization
application type
Datenblatt (Fact sheet)
contact
- technical contact: rjawor@amu.edu.pl, MultiService
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Computer Science, Polish Academy of Sciences, Poland
hoster
usage restrictions for individual users
countries supported
Concraft->Spejd

Tool for partial parsing and rule-based morphosyntactic disambiguation. Part of: Multiservice, a robust linguistic Web service for Polish.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- text/html: HTML file
- application/json: JSON data
- CoNLL format
- Visualization
application type
Datenblatt (Fact sheet)
contact
- technical contact: rjawor@amu.edu.pl, MultiService
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Computer Science, Polish Academy of Sciences, Poland
hoster
usage restrictions for individual users
countries supported
ConedaKOR
ConedaKOR facilitates the administration and presentation of academic object collections in the image-based cultural sciences and humanities. It allows users to store arbitrary documents and interconnect them through relationships. You can build huge semantic networks for an unlimited number of domains. ConedaKOR integrates a sophisticated ontology management tool with an easy-to-use media database.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
application type
network and security requirements
- memory required: 4GB
- processor: 2
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@coneda.net, info@wendig.io, info@daasi.de, info@de.dariah.eu, Moritz Schepp (Developer)
- subject matter contact: info@wendig.io
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
- Coneda UG in Frankfurt, GitHub
- Moritz Schepp
- Thorsten Wübbena [ORCID, VIAF, GND]
hoster
part of an application suite
usage restrictions for individual users
countries supported
COSMAS II
COSMAS II (Corpus Search, Management and Analysis System) is a database designed at the IDS for corpus-based research on language
- in extensive corpora (over 13 billion word forms, provided by the DEREKO project);
- in linguistically and structurally annotated corpora, e.g. by word class (over 1.7 billion nouns), headings, etc.;
- in user-defined corpus selections (based on up to eight bibliographic criteria);
- in corpora of different languages with custom tag sets, using an embedded graphical wizard;
- using numerous search, distance and range operators that allow users to formulate queries for simple to complex facts or grammatical patterns.
The results are
- summarized and sorted according to bibliographical criteria;
- evaluated by frequency measures in terms of their distribution;
- analysed, sorted and tabulated using a co-occurrence analysis;
- sorted, analysed and presented as KWIC lines and supporting documents;
- (if desired) reduced to a representative, manageable quantity by means of a random generator.
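Two of the result views listed above can be sketched in a few lines of Python: KWIC presentation and random reduction of a hit list. This illustrates the techniques only; it is not how COSMAS II implements them. The context `width`, the 30-character alignment and the optional `seed` are arbitrary choices for the sketch.

```python
import random
from typing import List, Optional

def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[str]:
    """Keyword-in-context lines: left context, [hit], right context (case-insensitive match)."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so that all hits line up in one column.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines

def sample_hits(lines: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Reduce a hit list to at most k randomly chosen lines, keeping their original order."""
    if len(lines) <= k:
        return list(lines)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in chosen]
```

Keeping the sampled hits in corpus order, rather than in draw order, makes the reduced list easier to compare with the full one.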
short description
documentation
Description of the target group and its size
formats and languages
- text/plain+cosmas2: COSMAS II query
- application/rtf
- text/plain: plain text file
Localization
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: cosmas2@ids-mannheim.de, https://www.ids-mannheim.de/cosmas2/
- subject matter contact: cosmas@ids-mannheim.de, https://www.ids-mannheim.de/cosmas2/
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
hoster
usage restrictions for individual users
CosmoTool
CosmoTool is a digital tool that combines biographical information from different sources into national and international movement profiles of historical personalities. The aim is to draw conclusions about characteristics and rules that can be regarded as international criteria. CosmoTool is based on the DARIAH-DE federation architecture and allows data to be extracted from unstructured text. CosmoTool is currently in the development phase and still offers limited functionality.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- application/xml: XML file
- json
- text/csv
- json
application type
Datenblatt (Fact sheet)
contact
- technical contact: tobias.gradl@uni-bamberg.de, Tobias Gradl (Developer)
version
application category
application subcategory
data communication encryption
privacy policy
authentication
hoster
usage restrictions for individual users
countries supported
CSTLemma (hosted by D4Science)

This is an experimental integration of a D4Science NLP processing service (CSTLemma). The CSTLemma Lemmatizer for English reduces all words in a text to their base form, the lemma.
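Lemmatization maps inflected forms to a base form. CSTLemma learns its affix rules from training data. This sketch instead uses a tiny, hand-written exception list and suffix rules, all hypothetical, purely to show the idea of rule-based suffix replacement.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data for illustration only; CSTLemma derives its rules from training data.
EXCEPTIONS: Dict[str, str] = {"was": "be", "were": "be", "went": "go", "mice": "mouse"}
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("sses", "ss"),  # classes -> class
    ("ies", "y"),    # studies -> study
    ("ing", ""),     # walking -> walk
    ("ed", ""),      # walked  -> walk
    ("s", ""),       # words   -> word
]
MIN_STEM = 2  # never strip a suffix if fewer than two characters would remain

def lemmatize(word: str) -> str:
    """Return a base form using exceptions first, then the first matching suffix rule."""
    w = word.lower()
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    # Rules are listed longest suffix first, so the most specific rule wins.
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: len(w) - len(suffix)] + replacement
    return w
```

Hand-written rules like these over-strip; for example, `need` becomes `ne`. That is one reason tools such as CSTLemma learn their rules from a lexicon instead.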
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/csv: tabular data, comma-separated values
application type
Datenblatt (Fact sheet)
contact
- technical contact: switchboard@clarin.eu, D4Science Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Bart Jongejan (tool), D4Science staff (WAR upload)
hoster
usage restrictions for individual users
countries supported
Cyril Belica: Kookkurrenzdatenbank CCDB
In a corpus-based empirical approach to linguistics, it is fundamentally important to develop a methodologically coherent procedure that makes it possible to systematically uncover, inventory, interpret and theoretically substantiate the emergent structures manifest in language use. As the empirical basis for this research project, the Programme Area Corpus Linguistics of the Leibniz Institute for the German Language built up a large collection of co-occurrence profiles for about 220,000 different lemmas. The collection is based on a corpus of written contemporary language of about 2.2 billion running words. For each lemma, it contains the results of up to five different co-occurrence analyses in the form of hierarchies of similar uses, with up to 100,000 usage examples per lemma and analysis.
Guided by the exploratory analysis of this language material, we strive to gain new insights into the structures, regularities, properties and functions of language. We currently focus on topics such as the similarity of co-occurrence profiles and semantic proximity, the interrelationships between local, lexical contexts and global, situational contexts, and various studies on quasi-synonymy.
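Co-occurrence profiles rest on counting which words occur together and testing whether they do so more often than chance. The minimal sketch below makes two assumptions:

- co-occurrence is counted at sentence level;
- Dunning's log-likelihood ratio (G²) serves as the association measure.

The CCDB's own context windows and statistics are not specified here and may differ.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

def cooccurrence_counts(sentences: Iterable[List[str]]) -> Tuple[Counter, Counter, int]:
    """Count, per sentence, each word type and each unordered pair of word types."""
    word_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    n_sentences = 0
    for sentence in sentences:
        n_sentences += 1
        types = sorted(set(sentence))
        word_freq.update(types)
        pair_freq.update(combinations(types, 2))  # pairs are stored in sorted order
    return word_freq, pair_freq, n_sentences

def log_likelihood(a: str, b: str, word_freq: Counter, pair_freq: Counter, n: int) -> float:
    """Dunning's G² for the 2x2 table of sentences containing a and/or b."""
    a, b = sorted((a, b))
    o11 = pair_freq[(a, b)]       # both words
    o12 = word_freq[a] - o11      # a without b
    o21 = word_freq[b] - o11      # b without a
    o22 = n - o11 - o12 - o21     # neither
    row = (o11 + o12, o21 + o22)
    col = (o11 + o21, o12 + o22)
    g2 = 0.0
    for o, i, j in ((o11, 0, 0), (o12, 0, 1), (o21, 1, 0), (o22, 1, 1)):
        if o > 0:  # 0 * log(0) is taken as 0
            expected = row[i] * col[j] / n
            g2 += o * math.log(o / expected)
    return 2.0 * g2
```

G² measures deviation from independence in either direction. In practice, one also checks whether the pair occurs more often than expected (here, whether `o11` exceeds its expected count) before treating it as a collocate.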
Through this website we would like to make parts of our thinking and experimenting platform in the sense of a "transparent laboratory" accessible to all interested colleagues.
short description
documentation
Description of the target group and its size
formats and languages
- text/plain; format-variant=ccdb: CCDB query
- image/svg+xml
- image/x-wmf
- text/html: HTML file
Localization
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: belica@ids-mannheim.de, http://corpora.ids-mannheim.de/ccdb/
- subject matter contact: belica@ids-mannheim.de, http://corpora.ids-mannheim.de/ccdb/
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
hoster
usage restrictions for individual users
D4Science NER (GATE's Annie)

This is an experimental integration of a D4Science NLP processing service (based on GATE's ANNIE). This service automatically identifies names of persons, locations and organizations, as well as monetary amounts and time and date expressions, in English texts.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: switchboard@clarin.eu, D4Science Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
D4Science staff
hoster
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Constituency Parsing DE
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Constituency Parsing EN
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Dependency Parsing DE
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Dependency Parsing EN
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Hyphenation DE
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Hyphenation EN
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Named Entity Recognition DE
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: Named Entity Recognition EN
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languagesEnglish
- text/plainplain text file
- text/xmlXML file
- text/csvtabular data, comma-separated values
application type
network and security requirements
- memory required4GB
- runtimeEnvironmentJava 1.8 or higher, 64bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: POS-Tagging and Lemmatization DE
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH DKPro-Wrapper: POS-Tagging and Lemmatization EN
The DARIAH DKPro Wrapper is a wrapper for DKPro Core, a tool for linguistic annotation.
short description
documentation
- User Guide (language: German)
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
network and security requirements
- memory required: 4GB
- runtime environment: Java 1.8 or higher, 64-bit
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
DARIAH-DE GeoBrowser
The DARIAH-DE Geo-Browser supports the comparative visualization of several queries and displays data in relation to both their geographic location and their associated points in time or time spans.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German, English
- text/csv: tabular data, comma-separated values
- application/vnd.google-earth.kml+xml
- application/vnd.google-earth.kmz
application type
Datenblatt (Fact sheet)
contact
- technical contact: funk@sub.uni-goettingen.de, veentjer@sub.uni-goettingen.de
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
- DARIAH-DE Association, Responsibilities
- Ubbo Veentjer
- Stefan Funk
hoster
- SUB, Göttingen, Germany
- GWDG, Göttingen, Germany
part of an application suite
usage restrictions for individual users
countries supported
DARIAH-DE Publikator
short description
documentation
Description of the target group and its size
formats and languages
Localization
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: support@de.dariah.eu
- subject matter contact: support@de.dariah.eu, https://de.dariah.eu
version
application category
application subcategory
privacy policy
authentication
Creators
hoster
- Göttingen State and University Library (SUB), Göttingen, Germany
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), Göttingen, Germany
usage restrictions for individual users
countries supported
DARIAH-DE Repository
The entry point for importing collections and data into the DARIAH-DE Repository is the DARIAH-DE Publikator, which allows you to prepare, manage, and finally import your collections into the DARIAH-DE Repository using your favorite internet browser.
short description
documentation
Description of the target group and its size
formats and languages
- application/xml+tei
- text/plain: plain text file
- application/epub+zip
- text/html: HTML file
- application/zip: zip archive
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: support@de.dariah.eu
- subject matter contact: support@de.dariah.eu, https://de.dariah.eu
version
application category
application subcategory
privacy policy
authentication
Creators
hoster
- Göttingen State and University Library (SUB), Göttingen, Germany
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), Göttingen, Germany
usage restrictions for individual users
countries supported
Data Modelling Environment (DME)
The Data Modelling Environment (DME) from DARIAH-DE is a tool for modelling and associating data. Its key features are its research-oriented focus and the underlying concepts for making domain knowledge explicit.
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: German
- text/xml: XML file
- text/json
- text/csv: tabular data, comma-separated values
- text/plain: plain text file
- text/xml: XML file
- text/json
- text/csv: tabular data, comma-separated values
- text/plain: plain text file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tobias.gradl@uni-bamberg.de, Tobias Gradl (Developer)
version
application category
application subcategory
data communication encryption
privacy policy
authentication
hoster
usage restrictions for individual users
countries supported
Deutsches Textarchiv

The German Text Archive (DTA) is the largest single corpus of historical New High German, covering the period from the 16th to the early 20th century and comprising more than 350 million tokens in 1.34 million digitized pages. Focusing mostly on (digitized) printed material, the DTA also includes a growing number of handwritten documents. Specialty sub-corpora include historical newspapers and other periodicals. The DTA as a whole covers a rich variety of fiction and non-fiction texts, the latter including academic as well as non-academic writing.
The DTA is composed of the so-called DTA-Kernkorpus (DTAK, “DTA Core Corpus”) with approximately 1500 first editions from the 16th through the 19th century. Additionally, the DTA-Erweiterungen (DTAE, “DTA Extensions”) module contains specialty corpora and individual texts which have been curated in the context of CLARIN-D and other projects. The full-text sources provided by digitization projects and other discipline-specific initiatives have been (manually or semi-automatically) converted to a TEI-compatible XML format conforming to the DTA-Basisformat (DTABf, “DTA Base Format”) guidelines, including extensive metadata on the original sources and data preparation. OCR texts in the DTA Core Corpus – as well as numerous additional text resources – have been manually corrected. A continuous quality assurance process is made possible by the collaborative web-based platform DTAQ, with around 2000 currently registered users. All DTA corpora are prepared for user consumption by automated computational linguistic analysis methods, including not only PoS-tagging and lemmatization, but also – among others – the orthographic normalization of historical spelling variants, allowing users to formulate queries in modern orthography.
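The orthographic normalization described above is what lets a query in modern spelling find historical variants. The sketch below illustrates only the general idea, assuming a tiny hand-written rule table; `RULES`, `normalize` and `matches` are hypothetical names, and the DTA's actual normalization is far more elaborate and is not reproduced here.

```python
# Hypothetical substitution rules (historical spelling -> modern spelling),
# chosen only to illustrate the idea; they are NOT the DTA's normalization rules.
RULES = [
    ("ſ", "s"),    # long s, e.g. "Weiſe" -> "weise"
    ("ey", "ei"),  # e.g. "seyn" -> "sein"
    ("th", "t"),   # e.g. "Thür" -> "tür"
]


def normalize(token: str) -> str:
    """Lowercase a token, then apply each substitution rule in order."""
    form = token.lower()
    for old, new in RULES:
        form = form.replace(old, new)
    return form


def matches(query: str, historical_token: str) -> bool:
    """True if a query in modern spelling matches a historical token.

    Both sides are normalized, so a rule that also fires on a modern word
    (e.g. "th" in "Theater") affects query and token alike.
    """
    return normalize(query) == normalize(historical_token)
```

For example, `matches("Tür", "Thür")` returns `True`. Normalizing both the query and the corpus token keeps the comparison symmetric, at the cost of sometimes conflating distinct words, a trade-off a real system has to handle with much richer methods.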
short description
documentation
Description of the target group and its size
formats and languages
Localization
application type
network and security requirements
- operating system: Linux
Datenblatt (Fact sheet)
contact
- technical contact: wiegand@bbaw.de, Frank Wiegand (Developer)
- subject matter contact: Alexander Geyken (head of the Digitales Wörterbuch der deutschen Sprache project) [GND]
version
application category
application subcategory
data communication encryption
privacy policy
authentication
hoster
usage restrictions for individual users
countries supported
Deutsches Textarchiv – Qualitätssicherung

Collaborative Quality Assurance in the German Text Archive (DTA): DTAQ (Deutsches Textarchiv - Qualitätssicherung) is a web-based application for finding, categorizing and correcting various types of errors in XML/TEI-annotated texts. Each user can adapt the DTAQ interface individually, so that different views of the digitized source material and of the text transcriptions can be set.
DTAQ can be used freely by everyone after registration.
short description
documentation
Description of the target group and its size
formats and languages
Localization
application type
network and security requirements
- operating system: Linux
Datenblatt (Fact sheet)
contact
- technical contact: wiegand@bbaw.de, Frank Wiegand (Developer)
- subject matter contact: Alexander Geyken (head of the Digitales Wörterbuch der deutschen Sprache project) [GND]
version
application category
application subcategory
data communication encryption
privacy policy
authentication
hoster
usage restrictions for individual users
countries supported
DGD – Datenbank für Gesprochenes Deutsch
The DGD is the Database for Spoken German ("Datenbank für Gesprochenes Deutsch"). To use the DGD, you need to register (it's free). The DGD's user interface is in German. We are sorry we cannot provide a localized interface for other languages.
The DGD gives registered users access to 34 corpora of spoken language from the Archive for Spoken German ("Archiv für Gesprochenes Deutsch", AGD). The corpora comprise:
- The Research and Teaching Corpus of Spoken German ("Forschungs- und Lehrkorpus Gesprochenes Deutsch", FOLK), a state-of-the-art corpus of spontaneous interaction data
- The GeWiss Corpus ("Gesprochene Wissenschaftssprache Kontrastiv") of academic speech
- Further interaction corpora, such as the Freiburger Korpus ("FR") and the corpus Dialogstrukturen ("DS")
- The large "historic" dialect corpora of German, most importantly the corpus German dialects ("Deutsche Mundarten", "Zwirner-Korpus", ZW) and its "satellite corpora" German dialects in Eastern Europe (OS), German dialects in the Black Forest region (SV), German dialects in south-west Germany (SW), German dialects in the GDR (DR)
- Other influential variation corpora for German, such as the corpus Basic German ("Deutsche Umgangssprachen", "Pfeffer-Korpus", PF) and the corpus Standard German ("Deutsche Standardsprache", "König-Korpus", KN), as well as the more recent corpus Deutsch Heute ("DH")
- Corpora on extra-territorial varieties of German ("speech islands") such as Michael Clyne's corpus on Australian German, a corpus on German in Russia, a corpus on German in Namibia and a corpus on Mennonite Low German in the Americas
- Anne Betten's corpora on the German of Emigrants to Israel ("Emigrantendeutsch in Israel", IS, ISW, ISZ)
- Norbert Dittmar's corpus on German reunification ("Berliner Wendekorpus", BW)
Altogether, the DGD contains more than 4,000 hours of audio and video recordings, and more than 12 million transcribed tokens. With a few exceptions, all transcriptions in the database are time-aligned with the recordings and annotated with lemma and part-of-speech information.
short description
documentation
Description of the target group and its size
formats and languages
- text/plain; format-variant=dgd: DGD corpus query
- text/csv: tabular data, comma-separated values
- application/xml; format-variant=elan-eaf: ELAN annotation file (*.eaf)
- application/xml; format-variant=exmaralda-exb: EXMARaLDA Basic transcription (*.exb)
application type
Datenblatt (Fact sheet)
contact
- technical contact: dgd@ids-mannheim.de
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
hoster
usage restrictions for individual users
DiaCollo

DiaCollo (pronounced /diːˈakəloʊ/, "dee-ah-kə-loh", analogous to the well-known juggling prop) is a tool for efficient extraction of diachronic collocations from an underlying text corpus. Unlike other collocation extractors such as DWDS Wortprofil, Sketch Engine, or the UCS toolkit, DiaCollo is suitable for extraction and analysis of diachronic collocation data, i.e. collocations whose significance depends on the date of their occurrence. By tracking changes in a word's typical collocates over time and applying J. R. Firth's famous principle that "you shall know a word by the company it keeps", DiaCollo can help to provide a clearer picture of diachronic changes in the word's usage, in particular those related to semantic shift.
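To make the notion of a diachronic collocation concrete, here is a minimal sketch that splits a dated corpus into time slices and ranks a target word's collocates within each slice by pointwise mutual information (PMI), counting co-occurrence within the same document. The function name, the slice size, the choice of PMI and the toy data are all assumptions for illustration; DiaCollo's own association measures and indexing are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def diachronic_collocates(docs, target, slice_size=50, top_n=3):
    """Rank collocates of `target` per time slice by PMI (illustrative only).

    docs: iterable of (year, tokens) pairs; each document counts as one
    co-occurrence window. Returns {slice_start: [(collocate, pmi), ...]}.
    """
    word_docs = defaultdict(Counter)  # slice -> word -> number of docs containing it
    pair_docs = defaultdict(Counter)  # slice -> word -> number of docs with target and word
    n_docs = Counter()                # slice -> number of docs
    for year, tokens in docs:
        start = year - year % slice_size
        types = set(tokens)
        n_docs[start] += 1
        word_docs[start].update(types)
        if target in types:
            pair_docs[start].update(types - {target})

    result = {}
    for start, pairs in pair_docs.items():
        n = n_docs[start]
        p_target = word_docs[start][target] / n
        scored = []
        for word, joint in pairs.items():
            p_word = word_docs[start][word] / n
            scored.append((word, math.log2((joint / n) / (p_target * p_word))))
        scored.sort(key=lambda item: (-item[1], item[0]))
        result[start] = scored[:top_n]
    return dict(sorted(result.items()))
```

On a toy corpus in which *broadcast* co-occurs with *seed* in 19th-century documents and with *radio* in 20th-century ones, the top-ranked collocate changes from one slice to the next, which is the kind of shift DiaCollo is designed to expose.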
short description
documentation
Description of the target group and its size
licences
formats and languages
Localization
application type
network and security requirements
- operating system: Linux
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: jurish@bbaw.de, Bryan Jurish (Developer) [GND]
- subject matter contact: Bryan Jurish (Linguist) [GND]
maintenance documentation
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
hoster
usage restrictions for individual users
countries supported
Distanbol

Distanbol analyses texts semantically. To do so, it passes the input text to an Apache Stanbol web service that executes an NLP chain yielding named entities, followed by entity linking on the text. The resulting enhancements are rendered as a human-readable HTML page. In short, Distanbol adds a human-readable rendering to the JSON-LD output produced by Stanbol.
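The final rendering step can be pictured as turning a list of linked entity mentions into HTML. A minimal sketch, assuming a simplified, hypothetical input structure (a list of dicts with character offsets, an entity type and a linked URI) rather than Stanbol's actual JSON-LD enhancement schema; `render_entities` is an illustrative name, not part of Distanbol.

```python
import html


def render_entities(text, entities):
    """Wrap each entity mention in a link to its URI (illustrative sketch).

    entities: list of dicts with "start", "end", "type" and "uri" keys
    (a hypothetical, simplified structure); mentions must not overlap.
    """
    parts = []
    pos = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        parts.append(html.escape(text[pos:ent["start"]]))  # plain text before the mention
        label = html.escape(text[ent["start"]:ent["end"]])
        parts.append(
            f'<a href="{html.escape(ent["uri"])}" title="{html.escape(ent["type"])}">{label}</a>'
        )
        pos = ent["end"]
    parts.append(html.escape(text[pos:]))  # remaining text after the last mention
    return "<p>" + "".join(parts) + "</p>"
```

All text, URIs and type labels are HTML-escaped, so input text containing `<` or `&` cannot break the generated page.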
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- application/xhtml+xml: XHTML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: acdh-tech@oeaw.ac.at, Matej Durco
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Apache Software Foundation (software), Austrian Centre of Digital Humanities (enhancement chains and configuration)
hoster
usage restrictions for individual users
countries supported
DTA-Basisformat
The DTABf was developed in accordance with the P5 Guidelines of the Text Encoding Initiative (TEI). Since the TEI Guidelines offer solutions for a very wide range of tagging requirements and are thus rather extensive and flexible, they are meant to be adapted to the individual needs of projects working with the TEI. For the DTA this was achieved by creating the DTABf, a subset of the TEI/P5 tagset which specifies not only fixed sets of elements but also their corresponding attributes and (where applicable) values. The DTABf tagset is fully conformant with the TEI/P5 Guidelines, i.e. the TEI tagset was only reduced, not extended in any way.
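Because the DTABf fixes which elements and attributes may be used, conformance to such a restricted tagset can be checked mechanically. The sketch below assumes a small, hypothetical excerpt of allowed elements and attributes (`ALLOWED`); it is not the DTABf specification, and it does not check attribute values or content models, which a real validation against the DTABf schema would also cover.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical excerpt of a restricted tagset: element -> allowed attributes.
# The real DTABf defines a much larger set, including permitted values.
ALLOWED = {
    "TEI": set(),
    "text": set(),
    "body": set(),
    "div": {"type", "n"},
    "head": set(),
    "p": {"rendition"},
    "hi": {"rendition"},
    "lb": set(),
}


def violations(xml_string):
    """Return (element, attribute-or-None) pairs that fall outside ALLOWED."""
    problems = []
    for el in ET.fromstring(xml_string).iter():  # document order, root included
        name = el.tag.replace(TEI_NS, "")
        if name not in ALLOWED:
            problems.append((name, None))  # element not in the subset at all
            continue
        for attr in el.attrib:
            if attr not in ALLOWED[name]:
                problems.append((name, attr))  # element allowed, attribute not
    return problems
```

Because the subset only removes elements and attributes and never adds new ones, every document that passes such a check is still valid TEI vocabulary.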
short description
documentation
Description of the target group and its size
API
formats and languages
Localization
application type
network and security requirements
- operating system: Linux
Datenblatt (Fact sheet)
contact
- technical contact: haaf@bbaw.de, Susanne Haaf-Dumont (Developer) [GND]
- subject matter contact: Alexander Geyken (head of the Digitales Wörterbuch der deutschen Sprache project) [GND]
version
application category
application subcategory
data communication encryption
privacy policy
authentication
hoster
usage restrictions for individual users
countries supported
EXMARaLDA
EXMARaLDA is a system for working with oral corpora on a computer. It consists of a transcription and annotation tool (Partitur-Editor), a tool for managing corpora (Corpus-Manager) and a query and analysis tool (EXAKT).
EXMARaLDA's features include, for instance:
- time-aligned transcription of digital audio or video
- flexible annotation with freely definable categories
- systematic documentation of a corpus through metadata
- flexible output of transcription data in various layouts and formats (notation, document)
- computer-assisted querying of transcription, annotation and metadata
- interoperability: EXMARaLDA works with XML-based data formats that allow data exchange with other tools (such as Praat, ELAN, Transcriber etc.) and enable flexible processing and sustainable use of the data.
EXMARaLDA is used by researchers worldwide in various contexts in which spoken language is analysed, including:
- conversation and discourse analysis,
- study of language acquisition and multilingualism,
- phonetics and phonology,
- dialectology and sociolinguistics.
EXMARaLDA was developed in the project "Computer assisted methods for the creation and analysis of multilingual data" at the Collaborative Research Center "Multilingualism" (Sonderforschungsbereich "Mehrsprachigkeit" – SFB 538) at the University of Hamburg. Since July 2011, EXMARaLDA has been developed further at the Hamburg Centre for Language Corpora, and since November 2011 in cooperation with the Archive for Spoken German at the Institute for the German Language in Mannheim.
short description
documentation
Description of the target group and its size
formats and languages
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/xml; format-variant=weblicht-tcf: File in the Text Corpus Format (*.tcf)
- application/xml; format-variant=exmaralda-exb: EXMARaLDA Basic transcription (*.exb)
- application/xml; format-variant=transcriber-trs: Transcriber annotation file (*.trs)
- application/xml; format-variant=folker-fln: FOLKER transcription (*.flk / *.fln)
- application/xml; format-variant=elan-eaf: ELAN annotation file (*.eaf)
- application/xml; format-variant=clan-cha: CHAT transcription file (*.cha)
- text/plain; format-variant=praat-textgrid: Praat TextGrid (*.textGrid)
- audio/mp3: MP3 Audio
- audio/ogg: OGG Audio
- audio/wav: WAV Audio
- video/mp4: MP4 Video
- audio/aiff: AIFF Audio
- audio/mpeg: MPEG Audio
- video/mpeg: MPEG Video
- video/ogg: OGG Video
- video/avi: AVI Video
- video/x-divx: DIVX Video
- video/mov: Quicktime Video
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/xml; format-variant=weblicht-tcf: File in the Text Corpus Format (*.tcf)
- application/xml; format-variant=exmaralda-exb: EXMARaLDA Basic transcription (*.exb)
- application/xml; format-variant=transcriber-trs: Transcriber annotation file (*.trs)
- application/xml; format-variant=folker-fln: FOLKER transcription (*.flk / *.fln)
- application/xml; format-variant=elan-eaf: ELAN annotation file (*.eaf)
- application/xml; format-variant=clan-cha: CHAT transcription file (*.cha)
- application/plain+praat: Praat TextGrid (*.textGrid)
- different video formats
application type
network and security requirements
- operating system: Windows, macOS, Linux
- runtime environment: Java (included in newer versions)
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: https://exmaralda.org/en/contact/
- subject matter contact: https://exmaralda.org/en/contact/
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- EXMARaLDA Developer Group, GitHub
- Thomas Schmidt (developer) [ORCID, GND]
- Kai Wörner (developer) [ORCID]
- Timm Lehmberg (developer)
- Hanna Hedeland (developer) [ORCID]
hoster
- Leibniz-Institut für Deutsche Sprache, Mannheim, Germany
- HZSK Hamburg, Hamburg, Germany
part of an application suite
usage restrictions for individual users
FoLiA-stats

N-gram frequency list generation on FoLiA input.
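The basic operation is counting contiguous n-grams over the word tokens of a document. A minimal sketch, assuming a simplified reading of FoLiA in which each word is a `<w>` element with a `<t>` text child in the FoLiA namespace; it ignores FoLiA features such as multiple text classes and sentence boundaries, and it is not the FoLiA-stats implementation. `folia_tokens` and `ngram_frequencies` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

FOLIA = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA}


def folia_tokens(xml_string):
    """Collect the text of every <w> element's <t> child (simplified FoLiA reading)."""
    root = ET.fromstring(xml_string)
    return [w.findtext("f:t", default="", namespaces=NS) for w in root.iter(f"{{{FOLIA}}}w")]


def ngram_frequencies(tokens, n):
    """Count contiguous n-grams (as tuples) over a flat token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

A word frequency list corresponds to `n=1`; lemma-based lists would read lemma annotations instead of the token text.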
short description
documentation
Description of the target group and its size
formats and languages
- languages: Dutch, generic
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
- wordfreqlist
- lemmafreqlist
- lemmaposfreqlist
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Ko van der Sloot (TiCC, Tilburg University)
hoster
usage restrictions for individual users
countries supported
Fowlt (plain text)

Fowlt is a free online context-sensitive spelling checker for English. It follows the setup of the Dutch spelling checker Valkuil.net. Valkuil and Fowlt differ from typical spelling checkers, which mostly compare every word against a built-in dictionary and flag a word as an error if no match is found: Fowlt is context-sensitive and takes the surrounding words of every word into account. It relies on language models, which are created by training machine learning software (TiMBL and WOPR) on large amounts of text.
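The difference between dictionary lookup and context-sensitive checking shows up with confusible words that are both valid dictionary entries, such as *their* and *there*. The sketch below swaps in a plain bigram-count model for the memory-based TiMBL/WOPR classifiers that Fowlt actually uses; the confusible sets, function names and training sentences are hypothetical.

```python
from collections import Counter

# Hypothetical confusible sets: every member is a correctly spelled word,
# so a dictionary lookup alone can never flag a mix-up.
CONFUSIBLES = [{"their", "there"}, {"then", "than"}]


def train_bigrams(sentences):
    """Count adjacent word pairs in tokenized training sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def context_score(tokens, i, candidate, bigrams):
    """How often `candidate` was seen next to the left and right neighbours of position i."""
    left = bigrams[(tokens[i - 1], candidate)] if i > 0 else 0
    right = bigrams[(candidate, tokens[i + 1])] if i + 1 < len(tokens) else 0
    return left + right


def suggest(tokens, bigrams):
    """Return (position, word, suggestion) where an alternative fits the context better."""
    suggestions = []
    for i, word in enumerate(tokens):
        for group in CONFUSIBLES:
            if word not in group:
                continue
            best = max(sorted(group), key=lambda c: context_score(tokens, i, c, bigrams))
            if context_score(tokens, i, best, bigrams) > context_score(tokens, i, word, bigrams):
                suggestions.append((i, word, best))
    return suggestions
```

Trained on a few sentences containing "their car" and "over there is", the sketch flags *there* in "they parked there car" and leaves "over there is a car" alone, although both sentences consist only of dictionary words.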
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Fowlt (xml+folia)

Fowlt is a free online context-sensitive spelling checker for English. It follows the setup of the Dutch spelling checker Valkuil.net. Valkuil and Fowlt differ from typical spelling checkers, which mostly compare every word against a built-in dictionary and flag a word as an error if no match is found: Fowlt is context-sensitive and takes the surrounding words of every word into account. It relies on language models, which are created by training machine learning software (TiMBL and WOPR) on large amounts of text.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Frog (folia+xml)

Frog's current version will tokenize, tag, lemmatize, and morphologically segment word tokens in Dutch text files, will assign a dependency graph to each sentence, will identify the base phrase chunks in the sentence, and will attempt to find and label all named entities.
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: Dutch
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
- Tadpole Columned Output Format
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Ko van der Sloot, Maarten van Gompel (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Frog (plain text)

Frog's current version will tokenize, tag, lemmatize, and morphologically segment word tokens in Dutch text files, will assign a dependency graph to each sentence, will identify the base phrase chunks in the sentence, and will attempt to find and label all named entities.
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: Dutch
- text/plain: plain text file
- Tadpole Columned Output Format
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Ko van der Sloot, Maarten van Gompel (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Inkluz

Inkluz detects foreign-language inclusions in Polish texts.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: Word Processing File in the Rich Text Format
- application/octet-stream: arbitrary binary data
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Iobber

Chunker for Polish. It recognises shallow syntactic structure (up to three levels) of phrases (chunks) in Polish texts.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: Word Processing File in the Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
KorAP (REST)
KorAP is a new corpus analysis platform, optimized for large, multiply annotated corpora and complex search mechanisms.
KorAP supports the following query languages: COSMAS II, ANNIS, Poliqarp, Poliqarp+, CQL and FCQL.
KorAP is developed at the Leibniz Institute for German Language in Mannheim. The individual modules are published as open source on GitHub.
short description
documentation
Description of the target group and its size
formats and languages
- application/json: JSON data
- application/json: JSON data
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: korap@ids-mannheim.de, https://www1.ids-mannheim.de/s/corpus-linguistics/projects/korap.html?L=1, https://www1.ids-mannheim.de/kl/projekte/korap.html?L=0
- subject matter contact: korap@ids-mannheim.de, https://www1.ids-mannheim.de/s/corpus-linguistics/projects/korap.html?L=1, https://www1.ids-mannheim.de/kl/projekte/korap.html?L=0
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
KorAP (Web)
KorAP is a new corpus analysis platform, optimized for large, multiply annotated corpora and complex search mechanisms.
KorAP supports the following query languages: COSMAS II, ANNIS, Poliqarp, Poliqarp+, CQL and FCQL.
KorAP is developed at the Leibniz Institute for German Language in Mannheim. The individual modules are published as open source on GitHub.
short description
documentation
Description of the target group and its size
formats and languages
- text/plain; format-variant=cosmas2: COSMAS II query
- text/plain; format-variant=annis: ANNIS query
- text/plain; format-variant=poliqarp: Poliqarp query
- text/plain; format-variant=poliqarpplus: Poliqarp+ query
- text/plain; format-variant=cql: CQL query
- text/plain; format-variant=fcql: FCQL query
- text/html: HTML file
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: korap@ids-mannheim.de, https://www1.ids-mannheim.de/s/corpus-linguistics/projects/korap.html?L=1, https://www1.ids-mannheim.de/kl/projekte/korap.html?L=0
- subject matter contact: korap@ids-mannheim.de, https://www1.ids-mannheim.de/s/corpus-linguistics/projects/korap.html?L=1, https://www1.ids-mannheim.de/kl/projekte/korap.html?L=0
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
LINDAT Translation

The input file size is limited to 100kB.
Translates from->to:
Czech->English, Hindi, French, Russian, German
English->Russian, German, Czech, Hindi, French
Russian->German, French, Czech, Hindi, English
German->Russian, Hindi, Czech, English, French
French->Russian, German, Czech, English, Hindi
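Because of the 100kB input limit, longer texts have to be split before submission. A minimal sketch that splits at line boundaries, assuming that the limit applies to the UTF-8-encoded size and that 100kB means 102,400 bytes (both assumptions); `split_for_upload` is a hypothetical helper, not part of the service.

```python
# Assumption: "100kB" is interpreted as 100 * 1024 bytes of UTF-8-encoded text.
LIMIT_BYTES = 100 * 1024


def split_for_upload(text, limit=LIMIT_BYTES):
    """Split text at line boundaries into chunks whose UTF-8 size stays within `limit`."""
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        n = len(line.encode("utf-8"))
        if n > limit:
            raise ValueError("a single line exceeds the size limit")
        if size + n > limit:  # adding this line would overflow: close the current chunk
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Splitting at line boundaries keeps sentences on one line intact; joining the chunks reproduces the original text exactly.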
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: German, Russian, Czech, English, French
- text/plain: plain text file
- text/plain: plain text file
application type
Datenblatt (Fact sheet)
contact
- technical contact: kosarko@ufal.mff.cuni.cz, Ondřej Košarko
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Institute of Formal and Applied Linguistics
hoster
usage restrictions for individual users
countries supported
Liner2

Named entity and temporal expression recognition
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: Word Processing File in the Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Liner2 (hosted by D4Science)

This is an experimental integration of a D4Science NLP processing service (NER Liner 2). This service identifies names of persons, locations, organizations, as well as money amounts, time and date expressions in Polish texts automatically.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: switchboard@clarin.eu, D4Science Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
D4Science staff
hoster
usage restrictions for individual users
countries supported
MaltParser

A dependency parsing chain for Polish. The tools used are Morfeusz 2 with the SGJP dictionary (morphological analysis), wcrft2 (tagging), and MaltParser with a model for Polish. The CoNLL output can be visualised with DepSVG, a visualiser for dependency trees and predicate-argument structures.
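CoNLL output has one token per line, with tab-separated columns and blank lines between sentences. The sketch below shows how such output might be read into (dependent, relation, head) triples. It assumes the 10-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The exact columns this chain emits are an assumption, and the relation labels in the usage example are hypothetical.

```python
from typing import List, Tuple

# (dependent form, dependency relation, head form)
Triple = Tuple[str, str, str]


def read_conll(text: str) -> List[List[Triple]]:
    """Parse CoNLL-X text (assumed 10 tab-separated columns) into dependency triples."""
    sentences: List[List[Triple]] = []
    rows: List[List[str]] = []
    # The extra empty line guarantees that the last sentence is flushed.
    for line in text.splitlines() + [""]:
        if line.strip():
            rows.append(line.split("\t"))
            continue
        if rows:
            forms = {row[0]: row[1] for row in rows}  # token ID -> word form
            triples = []
            for row in rows:
                head_id, deprel = row[6], row[7]
                # HEAD = 0 marks the root of the sentence.
                head_form = "ROOT" if head_id == "0" else forms[head_id]
                triples.append((row[1], deprel, head_form))
            sentences.append(triples)
            rows = []
    return sentences
```

For example, a three-token sentence "Ala ma kota" whose verb is the root yields `[("Ala", "subj", "ma"), ("ma", "pred", "ROOT"), ("kota", "obj", "ma")]`.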
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- CoNLL format
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Morfeusz 2

Morphological analysis of Polish texts by Morfeusz 2 (based on the SGJP dictionary)
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
MorphoDiTa

Morphological dictionary and tagger for the analysis of natural language texts in Polish.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
NER NLTK

Named Entity Recogniser for English by NLTK.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/rtf: word processing file in Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
NLP-HUB (multiple NER tools)

This is an experimental integration of a D4Science NLP processing service hub. The service runs several NER tools in parallel and merges their results. It automatically identifies names of persons, locations and organizations. It also identifies money amounts, time and date expressions, and other expressions. It supports English, French, Italian, Spanish and German texts.
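The description does not say how the hub merges the outputs of its NER tools. One simple strategy is a majority vote over exact entity spans, sketched below. The `(start, end, label)` span representation and the `min_votes` threshold are assumptions for illustration, not the hub's documented behaviour.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# An entity is (start offset, end offset, label), e.g. (0, 6, "PERSON").
Entity = Tuple[int, int, str]


def merge_by_vote(outputs: Iterable[List[Entity]], min_votes: int = 2) -> List[Entity]:
    """Keep entities proposed by at least `min_votes` tools (exact span and label match)."""
    votes: Counter = Counter()
    for entities in outputs:
        # set() so that a single tool cannot vote twice for the same entity
        votes.update(set(entities))
    return sorted(entity for entity, n in votes.items() if n >= min_votes)
```

Consider three tools that all agree on a PERSON span but disagree 2:1 on whether a second span is a LOCATION or an ORGANIZATION. The vote keeps the PERSON span and the LOCATION reading. A real merger might additionally reconcile partially overlapping spans, which this sketch ignores.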
short description
documentation
Description of the target group and its size
formats and languages
- languages: English, French, Italian, Spanish, German
- text/plain: plain text file
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: switchboard@clarin.eu, D4Science Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
D4Science staff
hoster
usage restrictions for individual users
countries supported
Oersetter (FRY-NLD)

Oersetter is a Frisian-Dutch Machine Translation system.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Western Frisian
- text/plain: plain text file
- text/plain: plain text file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Oersetter (NLD-FRY)

Oersetter is a statistical machine translation (SMT) system for Frisian to Dutch and Dutch to Frisian. A parallel training corpus has been established, which has subsequently been used to automatically learn a phrase-based SMT model. The translation system is built around the open-source SMT software Moses.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Dutch
- text/plain: plain text file
- text/plain: plain text file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Opener Tokenizer

Tokenizer for Dutch, English, German, French, Spanish and Italian. It consumes plain text and produces TCF.
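TCF (Text Corpus Format) is the XML exchange format used by WebLicht. The sketch below shows roughly what "plain text in, TCF out" involves. It rests on two assumptions:

- a naive regular-expression tokenizer stands in for the OpeNER tokenizer itself;
- the output follows a simplified subset of the TCF 0.4 structure (root `D-Spin`, a `TextCorpus` with `text` and `tokens` layers).

Real TCF documents carry more metadata and layers.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces assumed from TCF 0.4; verify against the TCF specification before relying on them.
DATA_NS = "http://www.dspin.de/data"
META_NS = "http://www.dspin.de/data/metadata"
TC_NS = "http://www.dspin.de/data/textcorpus"


def to_tcf(text: str, lang: str) -> str:
    """Wrap `text` in a simplified TCF document with a naive token layer."""
    root = ET.Element(f"{{{DATA_NS}}}D-Spin", {"version": "0.4"})
    ET.SubElement(root, f"{{{META_NS}}}MetaData")
    corpus = ET.SubElement(root, f"{{{TC_NS}}}TextCorpus", {"lang": lang})
    ET.SubElement(corpus, f"{{{TC_NS}}}text").text = text
    tokens = ET.SubElement(corpus, f"{{{TC_NS}}}tokens")
    # Naive tokenization: runs of word characters, or single punctuation marks.
    for i, tok in enumerate(re.findall(r"\w+|[^\w\s]", text)):
        ET.SubElement(tokens, f"{{{TC_NS}}}token", {"ID": f"t_{i}"}).text = tok
    return ET.tostring(root, encoding="unicode")
```

ElementTree picks its own namespace prefixes (`ns0`, `ns1`, ...) in the serialized output. The document is still valid namespaced XML.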
short description
documentation
Description of the target group and its size
formats and languages
- languages: English, Italian, Spanish, French, Dutch, German
- text/plain: plain text file
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: riccardo.delgratta@ilc.cnr.it, Riccardo Del Gratta
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
CLARIN-IT
hoster
usage restrictions for individual users
countries supported
ReSpa

Keyword extraction for Polish by ReSpa, based on representing text documents as word graphs.
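ReSpa's exact algorithm is not described here. The general idea of graph-based keyword extraction is sketched below: build a word co-occurrence graph and rank words by their weighted degree. This is a generic illustration, closer to a simplified TextRank, and not ReSpa's method. It also skips the lemmatization that a Polish pipeline would need.

```python
import re
from collections import defaultdict
from typing import AbstractSet, Dict, List


def keywords(text: str, window: int = 2, top_k: int = 3,
             stopwords: AbstractSet[str] = frozenset()) -> List[str]:
    """Rank words by weighted degree in a word co-occurrence graph."""
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in stopwords]
    # Undirected edge weights: how often two distinct words appear together
    # within `window` consecutive tokens (window=2 means adjacent pairs only).
    edges: Dict[frozenset, int] = defaultdict(int)
    for i, w in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != w:
                edges[frozenset((w, other))] += 1
    degree: Dict[str, int] = defaultdict(int)
    for pair, weight in edges.items():
        for w in pair:
            degree[w] += weight
    # Highest weighted degree first; ties broken alphabetically for determinism.
    return sorted(degree, key=lambda w: (-degree[w], w))[:top_k]
```

On "kot ma kota kot ma psa" the verb "ma" ranks first because it links the most words. Passing a stopword list removes such function words from the graph.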
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Serel

Detection of semantic relations between Named Entities in Polish texts by Serel.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Spacy (hosted by D4Science) - DE

This is an experimental integration of a D4Science NLP processing service (spaCy). This service performs dependency parsing of plain German text. For more information on spaCy, see https://spacy.io.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/tab-separated-values: tabular data, tab-separated values
application type
Datenblatt (Fact sheet)
contact
- technical contact: switchboard@clarin.eu, D4Science Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
D4Science staff
hoster
usage restrictions for individual users
countries supported
Spacy (hosted by D4Science) - EN

This is an experimental integration of a D4Science NLP processing service (spaCy). This service performs dependency parsing of plain English text. For more information on spaCy, see https://spacy.io.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/tab-separated-values: tabular data, tab-separated values
application type
Datenblatt (Fact sheet)
contact
- technical contact: switchboard@clarin.eu, D4Science Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
D4Science staff
hoster
usage restrictions for individual users
countries supported
Spatial

Recognition of spatial expressions in Polish texts.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/json: JSON data
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Spejd

Spejd - a partial, shallow parser for Polish with rule-based morphosyntactic disambiguation.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Summarize

Automated word-graph-based summarisation of Polish texts.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/octet-stream: arbitrary binary data
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
T-scan

T-Scan is a new tool for analyzing Dutch text. It aims to extract text features that are interesting both theoretically, in that they relate to genre and text complexity, and practically, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives its features from tools such as Frog and Alpino, and from resources such as SoNaR, SUBTLEX-NL and Referentie Bestand Nederlands.
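T-Scan computes many features using Frog, Alpino and lexical resources. To give a rough idea of what surface-level complexity features look like, the sketch below computes three simple ones: average sentence length, average word length and type-token ratio. These particular measures and the naive sentence splitting are assumptions for illustration, not T-Scan's feature definitions.

```python
import re
from typing import Dict


def surface_features(text: str) -> Dict[str, float]:
    """Compute a few simple readability-style features of a text."""
    # Naive sentence split on runs of . ! ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "chars_per_word": 0.0, "type_token_ratio": 0.0}
    return {
        "words_per_sentence": len(words) / len(sentences),
        "chars_per_word": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }
```

For "De kat slaapt. De hond blaft niet." this gives 3.5 words per sentence. The repeated "de" lowers the type-token ratio to 6/7.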
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: Dutch
- text/plain: plain text file
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
- text/xsl: XSLT stylesheet
- text/csv: tabular data, comma-separated values
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen), Martijn van der Klis (Utrecht University)
hoster
usage restrictions for individual users
countries supported
Tagger NLTK

Morpho-syntactic tagger for English texts.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/rtf: word processing file in Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
TEILicht-align
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
align: pseudo-alignment using phonetic transcription or orthographic information
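Pseudo-alignment assigns approximate timestamps to tokens when only utterance-level start and end times are known. The sketch below illustrates the orthographic variant. It assumes that each token's share of the utterance duration is proportional to its character length. The service's actual heuristic, including its phonetic variant, is not described here, so this proportional rule is an assumption.

```python
from typing import List, Tuple


def pseudo_align(tokens: List[str], start: float, end: float) -> List[Tuple[str, float, float]]:
    """Split [start, end] into contiguous token intervals proportional to character length."""
    total = sum(len(t) for t in tokens)
    if total == 0:
        return []
    # Boundaries are computed from cumulative character counts, so intervals are
    # contiguous and rounding errors do not accumulate from token to token.
    boundaries, chars = [start], 0
    for tok in tokens:
        chars += len(tok)
        boundaries.append(start + (end - start) * chars / total)
    return [(tok, boundaries[i], boundaries[i + 1]) for i, tok in enumerate(tokens)]
```

For an utterance "ja nein" spanning 0.0 to 3.0 seconds, "ja" gets 0.0 to 1.0 and "nein" gets 1.0 to 3.0.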
short description
documentation
Description of the target group and its size
formats and languages
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-guess
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
guess: language detection
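The language-detection method behind `guess` is not specified. The sketch below is a naive illustration of the idea: it picks the language whose stopwords cover the most tokens. The tiny per-language word lists and the ISO 639-3 codes used as keys are hypothetical and far too small for real use.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical, deliberately tiny stopword lists; real detectors use much richer models.
STOPWORDS: Dict[str, Set[str]] = {
    "deu": {"und", "der", "die", "das", "ist", "nicht", "ich"},
    "eng": {"and", "the", "is", "not", "i", "of", "to"},
    "nld": {"en", "de", "het", "is", "niet", "ik", "een"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the code of the language whose stopwords match most tokens, or None."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Returning `None` when no stopword matches avoids guessing a language for very short or unknown input.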
short description
documentation
Description of the target group and its size
formats and languages
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-identify
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
identify: adding and removing XML IDs
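Adding and removing `xml:id` attributes is a small tree operation. The sketch below shows it with Python's `xml.etree.ElementTree`. It assumes that IDs are only added to elements that lack one, and that new IDs follow a hypothetical `<tag>_<n>` pattern. The service's actual ID scheme is not described here.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # the xml:id attribute


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def add_ids(root: ET.Element) -> None:
    """Give every element without xml:id a new ID of the form <tag>_<n> (hypothetical scheme)."""
    existing = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    counter = 0
    for el in root.iter():
        if el.get(XML_ID) is None:
            # Skip candidates that would collide with IDs already in the document.
            while True:
                candidate = f"{local_name(el.tag)}_{counter}"
                counter += 1
                if candidate not in existing:
                    break
            el.set(XML_ID, candidate)
            existing.add(candidate)


def remove_ids(root: ET.Element) -> None:
    """Delete all xml:id attributes from the tree."""
    for el in root.iter():
        el.attrib.pop(XML_ID, None)
```

Checking against existing IDs keeps the result valid XML: `xml:id` values must be unique within a document.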
short description
documentation
Description of the target group and its size
formats and languages
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-normalize
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
normalize: OrthoNormal-like normalization of orthography
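Orthographic normalization maps non-standard spellings in transcriptions of spoken language to standard orthography. The sketch below shows the simplest possible approach: a lookup in a normalization lexicon that leaves unknown forms unchanged. The lexicon entries are hypothetical examples, and the service's actual procedure is not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping transcribed spoken forms to standard orthography.
LEXICON: Dict[str, str] = {
    "nich": "nicht",
    "is": "ist",
    "haste": "hast du",
}


def normalize(tokens: List[str], lexicon: Dict[str, str] = LEXICON) -> List[Tuple[str, str]]:
    """Pair each token with its normalized form (the token itself if not in the lexicon)."""
    # Lookup is case-insensitive; unknown tokens keep their original spelling.
    return [(tok, lexicon.get(tok.lower(), tok)) for tok in tokens]
```

Returning (original, normalized) pairs keeps both forms available, which suits annotations that record a normalized form alongside the transcribed one.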
short description
documentation
Description of the target group and its size
formats and languages
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-pos
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
pos: POS tagging with the TreeTagger
short description
documentation
Description of the target group and its size
formats and languages
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-segmentize
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
segmentize: segmentation according to transcription conventions
short description
documentation
Description of the target group and its size
formats and languages
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-text2iso
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
text2iso: converting plain text in Simple EXMARaLDA format to ISO-TEI-annotated texts
short description
documentation
Description of the target group and its size
formats and languages
- application/plain; format-variant=exmaralda: Simple EXMARaLDA transcription
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-text2seg
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
text2seg: converting plain text in Simple EXMARaLDA format to ISO-TEI-annotated texts, combined with segmentation according to transcription standards
short description
documentation
Description of the target group and its size
licences
formats and languages
- application/plain; format-variant=exmaralda: Simple EXMARaLDA transcription
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TEILicht-unidentify
RESTful web services for transcriptions of spoken data following the TEI guidelines. In principle, target documents are those conforming to ISO 24624:2016(E), Language resource management – Transcription of spoken language. The services are built on the teispeechtools library; their source code is available on GitHub. Currently, we offer:
unidentify: removing XML IDs
short description
documentation
Description of the target group and its size
formats and languages
- support for multilingual documents
- accepts any language
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml; format-variant=tei-iso-spoken: ISO-24624-compliant transcription of spoken language
- application/tei+xml: TEI-P5-compliant XML
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: fisseni@ids-mannheim.de, Bernhard Fisseni (Developer) [GND]
- subject matter contact: Thomas Schmidt (Transcription Expert) [ORCID, GND]
version
application category
application subcategory
source code available
data communication encryption
privacy policy
authentication
Creators
- Bernhard Fisseni (Developer)
- Thomas Schmidt (Developer)
hoster
usage restrictions for individual users
TermoPL

TermoPL is a tool for automated extraction of terminology from Polish texts.
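Terminology extractors commonly rank candidate multi-word terms with the C-value measure. C-value rewards frequent candidates but discounts those that mostly occur nested inside longer candidates. The description above does not state TermoPL's ranking method. The sketch below is therefore a generic implementation of the classic C-value formula, assuming candidates are given as tuples of base-form words together with their corpus frequencies.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]


def contains(longer: Term, shorter: Term) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freqs: Dict[Term, int]) -> Dict[Term, float]:
    """Classic C-value for each candidate term, given its corpus frequency."""
    scores = {}
    for term, f in freqs.items():
        # Longer candidates in which this term is nested.
        containers = [b for b in freqs if len(b) > len(term) and contains(b, term)]
        weight = math.log2(len(term))  # 0 for single-word terms in the classic formula
        if containers:
            # Subtract the average frequency of the longer terms that contain it.
            nested = sum(freqs[b] for b in containers) / len(containers)
            scores[term] = weight * (f - nested)
        else:
            scores[term] = weight * f
    return scores
```

For example, "sieć neuronowa" might occur 10 times and be nested in two longer candidates occurring 4 and 2 times. Its score is then log2(2) · (10 − 3) = 7.0.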
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: word processing file in Rich Text Format
- application/json: JSON data
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
TextGrid Laboratory
With the TextGridLab, a free software package, you can access tools and services to create, manage and edit research data. The open source software is the entry point to the virtual research environment. It is available for Windows, Mac OS X and Linux and provides differentiated access rights management within the protected research environment. The TextGridLab is optimised for XML/TEI development, e.g. in the context of digital editions.
**TextGridLab** features include, for instance:
- an editor for text and XML with WYSIWYG functionality
- an integrated character table covering the Unicode character set
- a Text-Image-Link Editor
- the Dictionary Search Tool
- the note editor MEISE
The infrastructure includes:
- project and user management
- a Project Browser/Navigator
- a Search Tool
- a Metadata Editor
- an Aggregation Composer
- an Import/Export Tool
- revisions and collection publication (in the repository), supported by automated metadata validation
TextGridLab is used by German researchers in various research networks and edition projects, such as:
- the hybrid edition of Theodor Fontane's notebooks (Fontane Research Centre of the University of Göttingen)
- the text database and dictionary of Classic Maya (University of Bonn)
- the Library of Neology (University of Münster)
(see https://textgrid.de/en/web/guest/kooperationsprojekte)
TextGrid was a project of ten partners, funded by the German Federal Ministry of Education and Research (BMBF) for the period from June 2012 to May 2015 (reference number: 01UG1203A). Since 2016, TextGrid has been part of the DARIAH-DE research infrastructure.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German, English
- text/plain: plain text file
- application/xml: XML file
- image/tiff
- application/xml+tei, Schema
application type
network and security requirements
- processor: 32/64 bit
- operating system: Windows, macOS, Linux
- runtime environment: Java Runtime Environment (JRE), version 6
- installation license: https://textgrid.liferay.de.dariah.eu/en/web/guest/terms-of-use
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: funk@sub.uni-goettingen.de, veentjer@sub.uni-goettingen.de, philipp.wieder@gwdg.de
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
- TextGrid Research Association, Responsibilities
- Ubbo Veentjer
- Stefan Funk
- Thorsten Vitt
- Philipp Wieder
hoster
- SUB, Göttingen, Germany
- GWDG, Göttingen, Germany
part of an application suite
usage restrictions for individual users
countries supported
TextGrid Repository Portal
The TextGrid Repository is a long-term archive for research data in the humanities. It provides a comprehensive, searchable and reusable stock of texts and images. The TextGrid Repository is based on the principles of Open Access and FAIR and was awarded the CoreTrustSeal in 2020. For researchers, the TextGrid Repository offers a sustainable, permanent and secure way to publish their research data in a citable manner and to describe them comprehensibly by means of the required metadata. More on sustainability, FAIR and Open Access can be found in the TextGrid Repository's mission statement.
documentation
Description of the target group and its size
formats and languages
- application/xml+tei, Schema
- text/plain: plain text file
- application/epub+zip
- text/html: HTML file
- application/zip: zip archive
Localization
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: textgrid-support@gwdg.de
- subject matter contact: support@de.dariah.eu, https://de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
- Göttingen State and University Library (SUB), Göttingen, Germany
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), Göttingen, Germany
usage restrictions for individual users
countries supported
TF-IDF

Calculation of term frequency (TF), inverse document frequency (IDF) and TF-IDF.
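The service's exact weighting scheme is not documented here. As a minimal sketch, assuming raw term counts normalised by document length for TF and the unsmoothed IDF = log(N / df), where N is the number of documents and df the number of documents containing a term, the three quantities can be computed for already tokenised texts as follows (the toy documents are illustrative):

```python
import math
from collections import Counter


def tf(doc):
    """Term frequency: count of each term divided by the document length."""
    counts = Counter(doc)
    total = len(doc)
    return {term: n / total for term, n in counts.items()}


def idf(docs):
    """Inverse document frequency: log(N / number of documents containing the term)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tf_idf(docs):
    """TF-IDF weight of every term in every document."""
    idf_scores = idf(docs)
    return [{term: w * idf_scores[term] for term, w in tf(doc).items()} for doc in docs]


docs = [
    ["ala", "ma", "kota"],
    ["kot", "ma", "psa"],
    ["kot", "i", "pies"],
]
weights = tf_idf(docs)
print(weights[0])
```

Under this variant a term that occurs in every document gets an IDF of 0; smoothed IDF variants avoid that, and the service may use either kind.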
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: Word Processing File in the Rich Text Format
- text/csv: tabular data, comma-separated values
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Topic

Topic modelling of texts in Polish. The tools used include: Morfeusz 2 with SGJP dictionary (for morphological analysis), wcrft2 (for tagging), gensim and mallet (for topic modelling), and D3.js plus D3-tip (for result visualisation).
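To illustrate what a topic model estimates, here is a minimal, self-contained sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, the inference technique MALLET uses (gensim's `LdaModel` uses variational inference instead). It is not the service's code: the hyperparameters (`alpha`, `beta`, `iterations`) are illustrative assumptions, and the input is assumed to be already lemmatised, as the Morfeusz/WCRFT2 steps would provide.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the token out of the counts ...
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ... sample a topic from p(z = t | all other assignments) ...
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # ... and put it back under the sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """The n most frequent words of each topic."""
    return [
        [w for w in sorted(counts, key=counts.get, reverse=True) if counts[w] > 0][:n]
        for counts in topic_word
    ]


docs = [
    ["kot", "pies", "kot", "mysz"],
    ["pies", "kot", "pies", "mysz"],
    ["sejm", "ustawa", "sejm", "senat"],
    ["ustawa", "senat", "ustawa", "sejm"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))
```

The fixed seed makes the run reproducible; real corpora need far more text, tuned hyperparameters and a check that the sampler has converged.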
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- application/zip: zip archive
- application/octet-stream: arbitrary binary data
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
TopicsExplorer
TopicsExplorer is beginner-oriented software that allows researchers to experiment with topic modeling on their own computers, using their own text corpora.
short description
documentation
- Tutorial (language: English)
- Manual (language: English)
- Worked Example
Description of the target group and its size
formats and languages
- languages: any
- text/plain: plain text file
- text/xml: XML file
- text/csv: tabular data, comma-separated values
application type
developer documentation
Datenblatt (Fact sheet)
contact
- technical contact: info@de.dariah.eu
version
application category
application subcategory
source code available
privacy policy
authentication
Creators
hoster
part of an application suite
usage restrictions for individual users
countries supported
Ucto

Ucto is a Unicode-compliant tokeniser. It takes one or more untokenised texts as input and tokenises them. Several languages are supported out of the box, and the software can be extended to other languages.
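Ucto's behaviour is driven by language-specific rule configurations, which are not reproduced here. As a rough sketch of what tokenisation means, the following uses a single hypothetical regular expression (not Ucto's rules) that keeps numbers with a decimal part, words with internal hyphens or apostrophes, and splits off every other punctuation mark. For `str` patterns Python's `re` treats `\w` as Unicode-aware, so accented letters stay inside their tokens.

```python
import re

# Hypothetical token pattern, not Ucto's actual rules:
#   numbers with an optional decimal part (22.30, 3,5),
#   words with internal hyphens or apostrophes (Jan-Willem),
#   or any single character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)?|\w+(?:[-'’]\w+)*|[^\w\s]")


def tokenise(text):
    """Split an untokenised text into a list of tokens."""
    return TOKEN_RE.findall(text)


print(tokenise("Het café is dicht, zei Jan-Willem om 22.30 uur."))
# ['Het', 'café', 'is', 'dicht', ',', 'zei', 'Jan-Willem', 'om', '22.30', 'uur', '.']
```

A real tokeniser such as Ucto also has to handle cases like abbreviations, which this sketch ignores.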
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: Swedish, Russian, Spanish, Portuguese, Dutch, English, German, French, Italian
- text/plain: plain text file
- Tadpole Columned Output Format
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
UDPipe

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only annotated data in CoNLL-U format. Trained models are provided for nearly all UD (Universal Dependencies) treebanks.
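CoNLL-U, the format UDPipe reads and writes, stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `#` comment lines and a blank line after each sentence. The sketch below reads such text into token dictionaries; for simplicity it skips multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `1.1`). The Polish sample sentence is a hand-written illustration, not UDPipe output.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):        # sentence-level comment, e.g. "# text = ..."
            continue
        else:
            cols = line.split("\t")
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            if "-" in cols[0] or "." in cols[0]:
                continue                  # skip multiword-token ranges and empty nodes
            token = dict(zip(FIELDS, cols))
            token["id"] = int(token["id"])
            token["head"] = None if token["head"] == "_" else int(token["head"])
            current.append(token)
    if current:                           # file may lack the final blank line
        sentences.append(current)
    return sentences


sample = (
    "# text = Ala ma kota.\n"
    "1\tAla\tAla\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tma\tmieć\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tkota\tkot\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
for tok in read_conllu(sample)[0]:
    print(tok["form"], tok["deprel"], "->", tok["head"])
```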
short description
documentation
Description of the target group and its size
licences
formats and languages
- languages: Afrikaans, Arabic, Armenian, Belarusian, Bulgarian, Catalan, Czech, Chinese, Church Slavic, Coptic, Danish, German, Dutch, Modern Greek (1453-), English, Estonian, Basque, Persian, Finnish, French, Old French (842-ca.1400), Gaelic, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Lithuanian, lzh, Marathi, Maltese, Norwegian Nynorsk, Norwegian Bokmål, orv, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, wof, Wolof
- text/plain: plain text file
- CoNLL-U Format
application type
Datenblatt (Fact sheet)
contact
- technical contact: straka@ufal.mff.cuni.cz, Milan Straka
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Milan Straka, Jana Straková
hoster
usage restrictions for individual users
countries supported
Valkuil (folia+xml)

Valkuil is a Dutch spelling correction system.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Dutch
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Valkuil (plain text)

Valkuil is a Dutch spelling correction system.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Dutch
- text/plain: plain text file
- text/folia+xml: Format for Linguistic Annotation (FoLiA) file
application type
Datenblatt (Fact sheet)
contact
- technical contact: proycon@anaproy.nl, Maarten van Gompel
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Yes. Before tool use, please register at https://webservices-lst.science.ru.nl/register.
Creators
Maarten van Gompel, Ko van der Sloot (CLST, Radboud University Nijmegen)
hoster
usage restrictions for individual users
countries supported
Voyant Tools

Use it to learn how computer-assisted analysis works. Check out our examples that show you how to do real academic tasks with Voyant. Use it to study texts that you find on the web or texts that you have carefully edited and have on your computer. Use it to add functionality to your online collections, journals, blogs or web sites so others can see through your texts with analytical tools. Use it to add interactive evidence to your essays that you publish online: add interactive panels right into your research essays (if they can be published online) so your readers can recapitulate your results. Use it to develop your own tools using our functionality and code.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English, German, Spanish, Dutch, French, generic
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tei+xml: TEI-P5-compliant XML
- application/tei+xml;format-variant=tei-dta: Texts in the DTA Base format
- none
application type
Datenblatt (Fact sheet)
contact
- technical contact: switchboard@clarin.eu, Unknown Person
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Stéfan Sinclair (McGill University) and Geoffrey Rockwell (University of Alberta)
hoster
usage restrictions for individual users
countries supported
WCRFT2

Morpho-syntactic tagger for Polish - WCRFT2
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: Word Processing File in the Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
WebLicht Advanced Mode

This tool links to the WebLicht environment without preselecting an execution chain. WebLicht is an execution environment for automatic annotation of text corpora. Linguistic tools such as tokenizers, part-of-speech taggers and parsers are encapsulated as web services, which users can combine into custom processing chains. The resulting annotations can then be visualized in an appropriate way, for example as a table or a tree.
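Conceptually, a processing chain passes one document through a sequence of services, each of which needs certain annotation layers and adds a new one. The sketch below models that idea locally under stated assumptions: `Service`, `validate_chain`, `run_chain` and the toy services are hypothetical stand-ins, they do not call WebLicht, and they do not reproduce its TCF handling or its actual compatibility checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Service:
    """A hypothetical processing step: layers it needs, layer it adds, and how."""
    name: str
    requires: set
    produces: str
    run: Callable[[dict], object]


def validate_chain(chain, initial_layers=frozenset({"text"})):
    """Check that every service's required layers exist before it runs."""
    available = set(initial_layers)
    for service in chain:
        missing = service.requires - available
        if missing:
            raise ValueError(f"{service.name} is missing layers: {sorted(missing)}")
        available.add(service.produces)
    return available


def run_chain(chain, text):
    """Run the services in order, storing each new layer in the document."""
    validate_chain(chain)
    document = {"text": text}
    for service in chain:
        document[service.produces] = service.run(document)
    return document


# Toy services standing in for real tokenizers and taggers.
toy_tokenizer = Service("toy_tokenizer", {"text"}, "tokens", lambda doc: doc["text"].split())
toy_tagger = Service(
    "toy_tagger", {"tokens"}, "postags",
    lambda doc: ["NUM" if t.isdigit() else "X" for t in doc["tokens"]],
)

result = run_chain([toy_tokenizer, toy_tagger], "Es gibt 3 Parser")
print(result["postags"])  # ['X', 'X', 'NUM', 'X']
```

Validating the whole chain before running it mirrors the idea that incompatible tool combinations should be rejected up front rather than failing halfway through a corpus.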
short description
documentation
Description of the target group and its size
formats and languages
- languages: generic
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Const-Parsing-DE

WebLicht Easy Chain for Constituency Parsing (German). The pipeline makes use of WebLicht's TCF converter, the tokenizer and sentence boundary detector from IMS Stuttgart, and the constituency parser from the Berkeley NLP project. WebLicht's Tundra can be used to visualize the result.
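Inside this chain the parse is stored in TCF and displayed with Tundra. Constituency trees are also commonly written in Penn Treebank-style bracket notation, which the Berkeley parser uses on its own. As an illustration of that notation only, here is a small recursive-descent reader that turns a bracketed tree into nested `(label, children)` tuples; the STTS-labelled example sentence is a hand-written assumption, not output of this chain.

```python
import re


def parse_tree(s):
    """Parse a bracketed tree such as '(S (NP Sie) (VP lacht))' into nested tuples.

    Each constituent becomes (label, [children]); leaves are plain strings.
    An unlabelled outer bracket, as in '( (S ...) )', gets the label ''.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing input after tree")
    return tree


def leaves(tree):
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [w for child in children for w in leaves(child)]


tree = parse_tree("(ROOT (S (NP (PPER Sie)) (VP (VVFIN lacht)) ($. .)))")
print(leaves(tree))  # ['Sie', 'lacht', '.']
```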
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Const-Parsing-EN

WebLicht Easy Chain for Constituency Parsing (English). The pipeline makes use of WebLicht's TCF converter, the Stanford tokenizer, and the statistical BLLIP/Charniak parser. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Dep-Parsing-DE

WebLicht Easy Chain for Dependency Parsing (German). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, the POS tagger from the OpenNLP project, and MaltParser, a system for data-driven dependency parsing. WebLicht's Tundra can be used to visualize the result.
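A dependency parser such as MaltParser assigns every token exactly one head, with 0 standing for an artificial root, so a well-formed analysis is a tree. The sketch below is independent of WebLicht and MaltParser: it checks that property for a list of head indices and groups dependents by their head. The head list for "Sie liest ein Buch" is an illustrative assumption.

```python
def check_tree(heads):
    """Check that heads form a tree: token i (1-based) has head heads[i-1], 0 = root."""
    n = len(heads)
    roots = [i for i, h in enumerate(heads, start=1) if h == 0]
    if len(roots) != 1:
        return False                       # exactly one token must attach to the root
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:                   # follow heads upwards until the root
            if node in seen or not 1 <= node <= n:
                return False               # cycle or out-of-range head index
            seen.add(node)
            node = heads[node - 1]
    return True


def children(heads):
    """Map each head index (0 = artificial root) to its dependents."""
    deps = {i: [] for i in range(len(heads) + 1)}
    for i, h in enumerate(heads, start=1):
        deps[h].append(i)
    return deps


# "Sie liest ein Buch": Sie -> liest, liest -> root, ein -> Buch, Buch -> liest
heads = [2, 0, 4, 2]
print(check_tree(heads))  # True
print(children(heads))    # {0: [2], 1: [], 2: [1, 4], 3: [], 4: [3]}
```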
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Dep-Parsing-EN

WebLicht Easy Chain for Dependency Parsing (English). The pipeline makes use of WebLicht's TCF converter, the Stanford tokenizer, the Jitar POS Tagger, and TurboParser, a multilingual dependency parser based on linear programming relaxations. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Dep-Parsing-HR (RELDI)

WebLicht Easy Chain for Dependency Parsing (Croatian). The easy chain makes use of the RELDI software (see https://github.com/clarinsi), which tokenizes and lemmatizes the text, performs part-of-speech tagging and then performs dependency parsing. For RELDI-specific inquiries, please contact nljubesi@gmail.com.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Croatian
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Dep-Parsing-NL-ALPINO

WebLicht Easy Chain for Dependency Parsing (Dutch). The pipeline makes use of WebLicht's TCF converter, the tokenizer and sentence splitter from Alpino, and the Alpino dependency parser for Dutch. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Dutch
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Dep-Parsing-SL (RELDI)

WebLicht Easy Chain for Dependency Parsing (Slovenian). The easy chain makes use of the RELDI software (see https://github.com/clarinsi), which tokenizes and lemmatizes the text, performs part-of-speech tagging and then performs dependency parsing. For RELDI-specific inquiries, please contact nljubesi@gmail.com.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Slovenian
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Dep-Parsing-SR (RELDI)

WebLicht Easy Chain for Dependency Parsing (Serbian). The easy chain makes use of the RELDI software (see https://github.com/clarinsi), which tokenizes and lemmatizes the text, performs part-of-speech tagging and then performs dependency parsing. For RELDI-specific inquiries, please contact nljubesi@gmail.com.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Serbian
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Lemmas-DE

WebLicht Easy Chain for Lemmatization (German). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, and the IMS TreeTagger. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Lemmas-EN

WebLicht Easy Chain for Lemmatization (English). The pipeline makes use of WebLicht's TCF converter, the Stanford tokenizer, the Jitar POS Tagger, and the lemmatizer service from MorphAdorner. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Morphology-DE

WebLicht Easy Chain for Morphological Analysis (German). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, and the IMS tool for German morphology. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Morphology-EN

WebLicht Easy Chain for Morphological Analysis (English). The pipeline makes use of WebLicht's TCF converter, the Stanford tokenizer, and the morphology analysis service from MorphAdorner. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-NamedEntities-DE

WebLicht Easy Chain for Named Entity Recognition (German). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, the IMS TreeTagger, and a German named entity recognizer trained with a maximum entropy approach using the OpenNLP maxent library. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-NamedEntities-EN

WebLicht Easy Chain for Named Entity Recognition (English). The pipeline makes use of WebLicht's TCF converter, the Stanford tokenizer, and the Illinois Named Entity Recognizer. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-NamedEntities-SL

WebLicht Easy Chain for Named Entity Recognition (Slovenian). The easy chain makes use of the ReLDI tagger and the JSI NER software, which performs named entity recognition without requiring a syntactic parse.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Slovenian
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-POSTags-Lemmas-DE

WebLicht Easy Chain for POS Tagging and Lemmatization (German). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, and the IMS TreeTagger. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: German
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-POSTags-Lemmas-EN

WebLicht Easy Chain for POS Tagging and Lemmatization (English). The pipeline makes use of WebLicht's TCF converter, the Stanford tokenizer, the Jitar POS Tagger, and the lemmatizer service from MorphAdorner. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: English
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-POSTags-Lemmas-FR

WebLicht Easy Chain for POS Tagging and Lemmatization (French). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, and the IMS TreeTagger. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: French
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-POSTags-Lemmas-IT

WebLicht Easy Chain for POS Tagging and Lemmatization (Italian). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, and the POS Tagger from the OpenNLP project. The model for Italian is trained on a relatively small training corpus (MIDT) and should therefore be considered experimental. WebLicht's Tundra can be used to visualize the result.
short description
documentation
Description of the target group and its size
formats and languages
- languages: Italian
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebLicht-Tokenization-TUR

WebLicht Easy Chain for tokenization of Turkish texts. The pipeline makes use of WebLicht's TCF converter and the tokenizer from the OpenNLP project. The 'newlineBounds' parameter treats every newline as a hard break (a sentence boundary). WebLicht's built-in annotation viewer can be used to visualize the processing result.
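The effect of treating newlines as hard breaks can be seen with a deliberately naive sentence splitter. In the sketch below, the `newline_bounds` flag only mirrors the idea of the 'newlineBounds' parameter; the punctuation rule is an assumption, and OpenNLP's own splitter uses a trained model instead.

```python
import re
from typing import List


def split_sentences(text: str, newline_bounds: bool = True) -> List[str]:
    """Naive splitter. With newline_bounds=True every newline ends a sentence;
    otherwise newlines are treated like ordinary spaces."""
    blocks = text.split("\n") if newline_bounds else [text.replace("\n", " ")]
    sentences: List[str] = []
    for block in blocks:
        # Break after ., ! or ? followed by whitespace (a crude rule for illustration).
        for part in re.split(r"(?<=[.!?])\s+", block.strip()):
            if part:
                sentences.append(part)
    return sentences


text = "Başlık\nBugün hava güzel. Yarın yağmur var."
print(split_sentences(text, newline_bounds=True))
# ['Başlık', 'Bugün hava güzel.', 'Yarın yağmur var.']
print(split_sentences(text, newline_bounds=False))
# ['Başlık Bugün hava güzel.', 'Yarın yağmur var.']
```

Hard newline boundaries help with headings, list items and verse, where lines end without sentence-final punctuation.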
short description
documentation
Description of the target group and its size
formats and languages
- languages: Turkish
- text/plain: plain text file
- text/rtf: Word Processing File in the Rich Text Format
- application/pdf: Adobe PDF file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/tcf+xml: TCF file
application type
Datenblatt (Fact sheet)
contact
- technical contact: wlsupport@sfs.uni-tuebingen.de, CLARIN WebLicht Support
version
application category
application subcategory
data communication encryption
privacy policy
authentication
authentication requirements
Requires a CLARIN Service Provider Federation account, provided by many universities and institutions.
Creators
CLARIN-D Centre at the University of Tuebingen, Germany
hoster
usage restrictions for individual users
countries supported
WebSty

Similarity and clustering of texts in Polish. The tools used include Morfeusz 2 with the SGJP dictionary (morphological analysis), wcrft2 (tagging), Liner2 (named entity recognition), Fextor (extraction of features from texts), Cluto (clustering), and D3.js with D3-tip (result visualisation). For zip files with content in English, German, Russian, Hungarian, or Spanish, users are redirected to WebStyML.
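The core idea, representing each text as a feature vector and grouping texts whose vectors are similar, can be sketched in a few lines. This is not WebSty's method: bag-of-words counts stand in for the Fextor features, and a simple single-link threshold grouping replaces Cluto's clustering algorithms. The document names and the threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def features(text: str) -> Counter:
    """Bag-of-words counts over lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts: Dict[str, str], threshold: float) -> List[List[str]]:
    """Single-link grouping via union-find: pairs at or above the threshold are merged."""
    names = sorted(texts)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    vecs = {n: features(texts[n]) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                parent[find(b)] = find(a)
    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


texts = {"a": "kot lubi mleko", "b": "kot pije mleko", "c": "pociąg jedzie szybko"}
print(cluster(texts, 0.5))  # [['a', 'b'], ['c']]
```

Lemmatization and tagging (Morfeusz 2, wcrft2) matter here because Polish inflection would otherwise split one word across many surface forms.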
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish, English, German, Russian, Hungarian, Spanish
- application/zip: zip archive
- application/octet-stream: arbitrary binary data
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
Wortverlaufskurven
The frequency with which a word is used changes over time: it can increase (examples: stress, demography) or decrease, or the word may even go out of use altogether (examples: fried fish, soon). Often a word is also gradually replaced by another; for example, "Sneakers" is now almost as common in German as "Turnschuhe".
With the "Wortverlaufskurven" (word progression curves) tool, such changes can be traced in different corpora. The three most important corpora are:
- the DTA-Total+DWDS-Core Corpus (1600-1999, approx. 350 million tokens),
- the DWDS newspaper corpus (from 1946, default view in the DWDS, approx. 6.3 billion tokens),
- the ZDL regional corpus (from 1993, approx. 6.2 billion tokens, usable only after registration).
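Because the corpora differ greatly in size and in how their material is spread over time, raw hit counts are not comparable between periods. A minimal sketch, assuming the curve is based on frequency per million tokens in each time slice (the tool's actual slicing and smoothing are not described here); the counts below are invented, not DWDS data.

```python
from typing import Dict


def per_million(hits: Dict[int, int], tokens: Dict[int, int]) -> Dict[int, float]:
    """Relative frequency per time slice: hits * 1,000,000 / corpus tokens in that slice."""
    return {period: hits[period] * 1_000_000 / tokens[period] for period in sorted(hits)}


# Hypothetical counts for one word in three decades.
hits = {1950: 120, 1960: 300, 1970: 400}
tokens = {1950: 40_000_000, 1960: 60_000_000, 1970: 200_000_000}
print(per_million(hits, tokens))  # {1950: 3.0, 1960: 5.0, 1970: 2.0}
```

Here the raw count keeps rising from 1960 to 1970, but the normalized frequency falls, because the corpus for that decade is much larger.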
short description
documentation
Description of the target group and its size
API
formats and languages
- languages: German
- image/jpeg
- image/png
- application/pdf: Adobe PDF file
- image/svg+xml, Schema
application type
network and security requirements
- operating system: Linux
Datenblatt (Fact sheet)
contact
- technical contact: wiegand@bbaw.de, Frank Wiegand (Developer)
- subject matter contact: Alexander Geyken (head of the Digitales Wörterbuch der deutschen Sprache research unit) [GND]
version
application category
application subcategory
data communication encryption
privacy policy
authentication
hoster
usage restrictions for individual users
countries supported
WoSeDon

Word sense disambiguation for Polish texts based on plWordNet, the Polish wordnet (weakly supervised, covering all words).
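To show what choosing a sense from a wordnet-style inventory means, the sketch below uses the simplified Lesk algorithm, a swapped-in textbook technique that picks the sense whose gloss overlaps most with the context. It is not WoSeDon's algorithm, and the sense ids and glosses are invented, not plWordNet data.

```python
from typing import Dict


def simplified_lesk(context: str, senses: Dict[str, str]) -> str:
    """Pick the sense whose gloss shares the most words with the context (simplified Lesk)."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = "", -1
    for sense_id, gloss in senses.items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Hypothetical sense inventory for Polish "zamek" (castle vs. lock).
senses = {
    "zamek-1": "budowla obronna króla na wzgórzu",
    "zamek-2": "urządzenie do zamykania drzwi na klucz",
}
context = "otworzył drzwi kluczem ale zamek się zaciął"
print(simplified_lesk(context, senses))  # zamek-2
```

Only "drzwi" overlaps; "kluczem" does not match "klucz", which is why a realistic system for Polish would lemmatize context and glosses first.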
short description
documentation
Description of the target group and its size
formats and languages
- languages: Polish
- text/plain: plain text file
- application/msword: Microsoft Word file
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft OpenXML word processing file (Word)
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft OpenXML presentation file (PowerPoint)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft OpenXML spreadsheet file (Excel)
- application/vnd.oasis.opendocument.text: OpenDocument Text file
- application/pdf: Adobe PDF file
- text/html: HTML file
- text/rtf: Word Processing File in the Rich Text Format
- application/xml: XML file
application type
Datenblatt (Fact sheet)
contact
- technical contact: tomasz.walkowiak@pwr.edu.pl, Tomasz Walkowiak
version
application category
application subcategory
data communication encryption
privacy policy
authentication
Creators
Clarin-PL
hoster
usage restrictions for individual users
countries supported
XTriples
A generic web service to extract RDF statements from XML resources. With the XTriples web service you can crawl XML repositories and extract RDF statements using a simple configuration based on XPath/XQuery expressions. The web service can be used with direct POST, form-style POST, or GET requests.
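The principle of configuration-driven extraction (path expressions select resources and values, which are then emitted as triples) can be sketched with Python's standard library. The configuration layout, the example.org IRIs and the sample XML are hypothetical; this is not the XTriples configuration format, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical configuration: every element matched by "resource" becomes a subject;
# each statement pairs a predicate IRI with a path (relative to that element) whose
# text becomes a literal object.
CONFIG: Dict = {
    "resource": ".//person",
    "subject_attr": "id",
    "base": "http://example.org/person/",
    "statements": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
        ("http://example.org/vocab/place", "place"),
    ],
}

Triple = Tuple[str, str, str]


def extract(xml_text: str, config: Dict) -> List[Triple]:
    """Apply the path expressions to the XML and collect (subject, predicate, object)."""
    root = ET.fromstring(xml_text)
    triples: List[Triple] = []
    for node in root.findall(config["resource"]):
        subject = config["base"] + node.get(config["subject_attr"], "")
        for predicate, path in config["statements"]:
            for obj in node.findall(path):
                if obj.text and obj.text.strip():
                    triples.append((subject, predicate, obj.text.strip()))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise as N-Triples: IRIs in angle brackets, objects as escaped plain literals."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)


SAMPLE = """<persons>
  <person id="p1"><name>Ada Lovelace</name><place>London</place></person>
</persons>"""

print(to_ntriples(extract(SAMPLE, CONFIG)))
# <http://example.org/person/p1> <http://xmlns.com/foaf/0.1/name> "Ada Lovelace" .
# <http://example.org/person/p1> <http://example.org/vocab/place> "London" .
```

Keeping the mapping in data rather than code is what lets one generic service handle many XML repositories.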
short description
documentation
Description of the target group and its size
licences
formats and languages
- application/xml: XML file
- application/rdf+xml
- application/turtle
- application/ntriples
- application/nquads
- application/trix
- application/ld+json
- image/svg+xml
- application/xtriples
Localization
application type
Datenblatt (Fact sheet)
contact
- technical contact: generalsekretariat@adwmainz.de, Torsten Schrade (Developer) [ORCID]
- subject matter contact: Torsten Schrade (Developer) [ORCID]