
Corpus Query Tools

The software applications included in this resource family allow searching, exploring, analysing and visualizing linguistic corpora and texts. Text and corpus analysis lie at the heart of digital scholarship in the humanities and social sciences, and a wide range of software tools are available in this domain. These software tools represent prime examples of the ways in which language technologies can support research across a range of disciplines, and they are therefore central to CLARIN’s mission.

The resource family includes both applications for installation on the user's own computer (desktop) and those accessible via a web browser (online), together with key information about each to help users find them and choose between them for a particular research goal. A 'corpus analysis tool' is defined here in the sense indicated by the late John Sinclair (and others): the basic operations of corpus linguistics involve ‘corpus, concordance, collocation’. We therefore include tools that can, at a minimum, handle a corpus and show concordances, and that preferably also calculate frequent collocates.
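To make the 'corpus, concordance, collocation' triad concrete, the following minimal Python sketch builds a keyword-in-context (KWIC) concordance and a simple window-based collocate count over a tokenised text. The function names `tokenize`, `kwic` and `collocates`, the naive tokeniser and the window sizes are illustrative assumptions and do not correspond to any tool listed on this page; the tools in this family typically use more refined tokenisation and statistical association measures rather than raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokeniser)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, node, width=3):
    """Return (left context, node, right context) triples for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, window=2):
    """Count the words occurring within `window` tokens of each hit of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# A toy "corpus"; real corpora are loaded from files or queried via a server.
corpus = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(corpus)

concordance = kwic(tokens, "sat")
sat_collocates = collocates(tokens, "sat")

# Print the concordance lines with the node word aligned in the middle.
for left, node, right in concordance:
    print(f"{left:>20}  [{node}]  {right}")

# Print the collocates of "sat", most frequent first.
print(sat_collocates.most_common())
```

Raw co-occurrence counts like these favour very frequent words, which is one reason the tools listed here usually rank collocates with an association measure instead.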

Most of the tools listed so far can do a lot more than this, including generating word frequency lists and keywords, calculating n-grams and clusters, working with linguistic annotation and descriptive metadata, and producing visualizations of distributions of words and features.
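As a rough illustration of three of these additional operations, the sketch below computes a word frequency list, n-gram counts and a keyness score in Python. All function names are hypothetical. The keyness function uses the log-likelihood statistic, which is one commonly used choice; it is an assumption that any particular tool on this page computes keywords in exactly this way.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokeniser)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens):
    """Return (word form, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n=2):
    """Count contiguous sequences of n tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness of a word in a study corpus against a reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


tokens = tokenize("to be or not to be that is the question")
word_freq = frequency_list(tokens)
bigrams = ngrams(tokens, 2)

print(word_freq[:3])
print(bigrams.most_common(2))
# A word seen 20 times per 1,000 tokens in the study corpus and 5 times per 1,000 in the reference.
print(log_likelihood(20, 5, 1000, 1000))
```

A higher log-likelihood score indicates a larger difference in relative frequency between the two corpora; a word with identical relative frequency in both scores zero.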

To send comments, suggest changes to the existing content or propose the inclusion of new corpora, email us at resource-families [at] clarin.eu.

 

Corpus Query Tools in the CLARIN Infrastructure

Online Query Tools

Tool Language Description  

Voyant tools (SADILAR)

Functionality: Querying/concordancing, Stylometry 
Licence: GPL3 (code)

Arabic, Bosnian, Croatian, Czech, English, French, German, Hebrew, Italian, Japanese, Portuguese, Serbian, Spanish

This tool constitutes a deployment of Voyant Tools used at SADILAR.

CLARIN Centre: SADiLAR
 

 

Intellitext

Functionality: Querying/concordancing, corpus upload

Arabic, Czech, Chinese, English, French, German, Italian, Japanese, Kannada, Lithuanian, Portuguese, Russian, Spanish, Ukrainian

The Intelligent Tools for Creating and Analysing Electronic Text Corpora for Humanities Research (IntelliText) project aims to facilitate corpus use for academics working in various areas of the humanities. The project produced a user-friendly corpus interface with an array of easy-to-use functions that will benefit teaching and research in several academic disciplines.

It is possible to upload one's own corpus with this tool. An online guide is available.

CLARIN Centre: CLARIN-UK
 

 

WebClark

Functionality: Querying/concordancing

Bulgarian

This is a dedicated concordancer for the Bulgarian National Reference Corpus.

CLARIN Centre: CLARIN-BG
 

 

Concordancer of the Croatian National Corpus

Functionality: Querying/concordancing

Croatian

This is an implementation of NoSketchEngine for the Croatian National Corpus.

CLARIN Centre: CLARIN-HR
 

 

Kontext (LINDAT)

Functionality: Querying/concordancing

Czech

KonText is a basic web application for querying corpora available within the LINDAT/CLARIAH-CZ project. It evaluates simple and complex queries, displays their results as concordance lines, computes frequency distributions, calculates association measures for collocations and supports further work with language data. This LINDAT/CLARIAH-CZ instance is a fork of the KonText application developed by the Institute of the Czech National Corpus, further extended by the Institute of Formal and Applied Linguistics to suit the needs of the LINDAT/CLARIAH-CZ project.

It is possible to upload one's own corpus with this tool. KonText is openly developed. Registration is required and Shibboleth log-in is supported.

CLARIN Centre: CLARIAH-CZ
Publication: Machalek (2020)

 

 

Korp (Copenhagen)

Functionality: Querying/concordancing

Danish

This is a web-based concordancer that can be used for corpus queries based on morphosyntactic analysis and various other features.

Registration is required.

CLARIN Centre: CLARIN-DK
 

 

Concordancer of Corpus Gysseling

Functionality: Querying/concordancing

Dutch

This is a dedicated query tool for the Corpus Gysseling, developed by the Instituut voor de Nederlandse Taal. The backend of the application is the BlackLab Lucene-based search engine developed for corpora with token-based annotation. The web-based frontend is a further development of the corpus-frontend application developed by INT in CLARIN and CLARIAH projects.

CLARIN Centre: CLARIAH-NL
 

 

Concordancer of Corpus Middelnederlands

Functionality: Querying/concordancing

Dutch

This is a dedicated query tool for the Corpus Middelnederlands.

CLARIN Centre: CLARIAH-NL
 

 

GrETEL 4.0

Functionality: Querying/concordancing (treebanks)

Dutch

GrETEL stands for Greedy Extraction of Trees for Empirical Linguistics. It is a user-friendly search engine for the exploitation of syntactically annotated corpora or treebanks.

It is possible to upload one's own corpus with this tool.

CLARIN Centre: CLARIAH-NL
Publication: Odijk et al. (2018)

 

 

nederlab

Functionality: Querying/concordancing

Dutch

This is an online research portal for historical texts in the Dutch language.

Registration is required and Shibboleth log-in is supported.

CLARIN Centre: CLARIAH-NL
 

 

OpenSoNaR

Functionality: Querying/concordancing

Dutch

This is an online corpus retrieval system that allows for analyzing and searching the SoNaR and CGN corpora.

Registration is required and Shibboleth log-in is supported.

CLARIN Centre: CLARIAH-NL
Publication: Does et al. (2017)

 

 

Couranten

Functionality: Querying/concordancing

Dutch (17th Century)

This is a dedicated querying tool for the Couranten Corpus, which comprises the seventeenth-century Dutch newspapers, available on Delpher.

CLARIN Centre: CLARIAH-NL
 

 

BNCweb (Lancaster)

Functionality: Querying/concordancing

English

This tool is a modified version of CQPweb for the British National Corpus. It allows a number of search options: publication date, text medium, author gender, target audience, genre, author domicile.

Registration is required to use the tool.

CLARIN Centre: CLARIN-UK
 

 

CLiC

Functionality: Querying/concordancing 
Licence: Use of CLiC follows the University of Birmingham’s legal policy

English

This tool has been developed as part of the CLiC Dickens project, which demonstrates through corpus stylistics how computer-assisted methods can be used to study literary texts and lead to new insights into how readers perceive fictional characters. Further literary texts have been added to the online service.

Technical support is offered by email at clic [at] contacts.birmingham.ac.uk.

CLARIN Centre: CLARIN-UK
Publication: Mahlberg et al. (2020)

 

 

Wmatrix

Functionality: Querying/concordancing, corpus upload and processing

English, Spanish

This tool provides a web interface to the English USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains.

It is possible to upload one's own corpus with this tool. The tool is free for UK government and academic researchers in countries on the OECD DAC list, and costs £50 per username per year for non-commercial research and teaching. Technical support is available.

CLARIN Centre: CLARIN-UK
Publication: Rayson (2008)

 

 

CQPweb (Lancaster)

Functionality: Querying/concordancing 
Licence: No licence

English, Arabic, French, Italian, Norwegian, Polish, Latvian

This is an online implementation of the CQPweb system with a large number of corpora installed.

It is possible to upload one's own corpus with this tool. Note that CQPweb will be superseded by Ziggurat, which is under development. Registration is required to use this tool.

CLARIN Centre: CLARIN-UK
 

 

Concordancer of the Text Corpus of the Institute of the Estonian Language

Functionality: Querying/concordancing

Estonian

This tool provides a simple interface for a text corpus. The material for the corpus, 10.4 million word forms, has been collected haphazardly. Approximately 80% of the texts come from newspapers, which is why the corpus is not representative. The corpus is also untagged, so it is suited mainly to lexical search.

CLARIN Centre: CELR
 

 

Korp (Kielipankki)

Functionality: Querying/concordancing 
Licence: Individual corpora have different licenses (and access conditions)

Finnish, Swedish, Russian, English, and more

This is a web-based concordance tool that can be used for corpus queries based on morphosyntactic analysis and various other features. A large proportion of the corpora in Kielipankki are offered via Korp.

User support is available through email.

CLARIN Centre: FIN-CLARIN
Publication: Korp publications

 

 

COSMAS II

Functionality: Querying/concordancing 
Licence: DeReKo-EULA

German

This tool is used for querying the German reference corpus DeReKo, as well as several other historical and non-historical corpora.

Technical support is offered by email at cosmas2 [at] ids-mannheim.de.

CLARIN Centre: CLARIN-D
Publication: Bodmer (1996)

 

 

DWDS

Functionality: Querying/concordancing

German

This is a tool for browsing DWDS corpora. The DWDS is part of the Center for Digital Lexicography of the German Language (ZDL), funded by the Federal Ministry of Education and Research. It is based at the Berlin-Brandenburg Academy of Sciences.

CLARIN Centre: CLARIN-D
 

 

KorAP (on DeReKo)

Functionality: Querying/concordancing 
Licence: DeReKo-EULA

German

This is a corpus analysis platform that is suited for large, multiply annotated corpora and complex search queries independent of particular research questions.

Registration is required only for licence-restricted corpora.

CLARIN Centre: CLARIN-D
Publication: Diewald et al. (2016)

 

 

SHEBANQ

Functionality: Querying/concordancing

Hebrew

This is a dedicated online environment for querying the Hebrew Bible.

CLARIN Centre: CLARIAH-NL
 

 

AutoSearch

Functionality: Querying/concordancing, corpus upload and analysis

Language independent

This tool allows users to upload corpora annotated at the token level for (extended) part of speech, lemma and word form in FoLiA or format, after which the corpus can be searched for these properties through an interface similar to that of the Corpus of Contemporary Dutch.

CLARIN Centre: CLARIAH-NL
 

 

Concordancer of the Italian Corpus for the dissemination of culture and the enhancement of the Italian literary heritage

Functionality: Querying/concordancing (non-parallel and parallel)

Language independent

This tool allows text and corpus querying, supporting both basic information retrieval and advanced search. It allows the functionality of the query system to be customized and also provides indexing for morpho-syntactically annotated texts. The system can handle several types of text annotation and can also produce concordances for parallel bilingual corpora.

CLARIN Centre: CLARIN-IT
 

 

Corpuscle

Functionality: Querying/concordancing

Language independent

This is a corpus management and analysis system for annotated corpora, with a sophisticated query language. It is a reimplementation of Corpuscle featuring an improved user experience and many new features.

Publication: Meurer (2012)

Glossa

Functionality: Querying/concordancing and text analysis 
Licence: Public license for software. CLARIN-licenses for corpora in Glossa

Language independent

Glossa offers a modern, simple and functional search interface with advanced post-processing possibilities for written, multilingual and speech corpora.

Glossa is developed at the Text Laboratory, Department of Linguistics and Scandinavian Studies, University of Oslo with support from the Norwegian contribution to the CLARIN infrastructure, CLARINO. Glossa is also freely available for download from GitHub and is easy to install on one's own server. Glossa is search engine agnostic and comes with support for the IMS Corpus Workbench and CLARIN Federated Content Search out of the box.

CLARIN Centre: CLARINO Text Laboratory Centre
 

 

INESS

Functionality: Querying/concordancing (treebanks)

Language independent

INESS is the Norwegian Infrastructure for the Exploration of Syntax and Semantics. INESS offers an open, interactive, language independent platform for building, accessing, searching and visualizing treebanks.

INESS offers a user guide for querying their treebanks.

CLARIN Centre: CLARINO
Publication: INESS publications

 

 

Voyant tools

Functionality: Querying/concordancing, Stylometry

Language independent

This tool constitutes a deployment of Voyant Tools at CLARIN-DK.

CLARIN Centre: CLARIN-DK
 

 

Kontext at the Centre of Latvian language resources and tools

Functionality: Querying/concordancing

Latvian

This tool corresponds to an implementation of LINDAT's KonText for Latvian resources.

Eight Latvian corpora can be searched with this tool.

CLARIN Centre: CLARIN-LV
 

 

Latvian National Corpora Collection (LNCC)

Functionality: Querying/concordancing

Latvian, Latgalian, Lithuanian

Latvian National Corpora Collection (LNCC) is a diverse collection of corpora representing both written and spoken language. LNCC covers various use cases and all the important text types and genres. It is a continuous multi-institutional and multi-project effort, supported by the digital humanities and language technology communities in Latvia.

Currently, 34 corpora developed by 13 institutions are available in the LNCC. Most of the corpora are annotated with a uniform morpho-syntactic annotation scheme and included in the federated search. The federated search combines multiple corpora from two corpus indexer instances (endpoints) maintained by IMCS UL and NLL, and covers 28 corpora (2.4 billion tokens).

CLARIN Centre: CLARIN-LV
Publication: Saulite et al. (2022)

 

 

TEITOK

Functionality: Querying/concordancing, corpus upload and processing

Multiple

This is a web-based system for viewing, creating, and editing corpora with both rich textual mark-up and linguistic annotation. For visitors, the system provides a graphical user interface in which the annotated document can be visualized in a number of different ways. For administrators of the corpus, TEITOK uses the same interface to allow easy editing of the underlying XML document, so administrators can correct their corpus while they are consulting it.

Registration is required and Shibboleth log-in is supported. User documentation is available.

CLARIN Centre: CLARIAH-CZ
Publication: Janssen (2016)

 

 

NB DH-LAB

Functionality: Querying/concordancing/analysis

Norwegian Bokmål, Norwegian Nynorsk, Northern Sami, Lule Sami, Southern Sami

This collection of tools comprises a Python package and web applications that allow a user to build corpora from the vast digital collections of the National Library of Norway (currently ca. 160 billion words). Users get concordances, frequency lists and co-occurrence data.

User support is available through email.

CLARIN Centre: CLARINO
 

 

CINTIL Concordancer

Functionality: Querying/concordancing 
Licence: Proprietary

Portuguese

This is a freely available online concordancing service to support the research usage of the CINTIL Corpus. The CINTIL concordancer allows the use of patterns to specify the occurrences to be retrieved. This makes it possible to uncover linguistic structures of high complexity and to use the service as a powerful research tool.

CLARIN Centre: PORTULAN CLARIN
Publication: Barreto et al. (2006)

 

 

Kontext (CLARIN.SI)

Functionality: Querying/concordancing 
Licence: No licence

Slovenian, Croatian, Bosnian, Serbian, Montenegrin, Macedonian, Serbo-Croatian, Bulgarian, Czech, Slovak, Polish, English, Danish, Dutch, Estonian, Finnish, French, Gaelic, German, Greek, Hungarian, Icelandic, Italian, Japanese, Latvian, Lithuanian, Portu

This is the CLARIN.SI installation of LINDAT's KonText, comprised of the KonText front-end developed by the Czech National Corpus team and the Manatee back-end, developed by Lexical Computing. This installation offers over 50 richly annotated corpora in Slovenian and other languages.

Shibboleth log-in is supported.

CLARIN Centre: CLARIN.SI
 

 

NoSKetchEngine (CLARIN.SI)

Functionality: Querying/concordancing 
Licence: No licence

Slovenian, Croatian, Bosnian, Serbian, Montenegrin, Macedonian, Serbo-Croatian, Bulgarian, Czech, Slovak, Polish, English, Danish, Dutch, Estonian, Finnish, French, Gaelic, German, Greek, Hungarian, Icelandic, Italian, Japanese, Latvian, Lithuanian, Portu

This is an open-source version of the commercial Sketch Engine, produced by Lexical Computing. This installation of noSketch Engine at CLARIN.SI offers over 50 richly annotated corpora in Slovenian and other languages.

CLARIN Centre: CLARIN.SI
 

 

Korp (Språkbanken)

Functionality: Querying/concordancing

Swedish

This is Språkbanken's corpus tool for searching in large amounts of texts, including newspapers, novels and social media.

CLARIN Centre: SWE-CLARIN
Publication: Borin et al. (2012)

 

 

Desktop Tools

Tool Language Description  

#LancsBox

Functionality: Concordancing/querying 
Platform: Platform-independent (java) 
Licence: CC BY-NC-ND 4.0

Language independent

#LancsBox is a new-generation software package for the analysis of language data and corpora developed at Lancaster University. The latest version, #LancsBox X, has increased functionality for XML texts.

A user guide is available in English, French and Japanese, along with instructional videos.

CLARIN Centre: CLARIN-UK
Publication: Brezina et al. (2015)

 

 

CLAN

Functionality: Concordancing/querying 
Platform: Windows, MacOS, Source code provided for Linux users 
Licence: GPL2 (source code)

Language independent

The CLAN Programs are downloaded, installed, and used as a single application. Functionally, however, CLAN has two parts. The first part is the CLAN editor which can be used to edit files in either CHAT or CA (Conversation Analysis) format. The editor also provides a wide range of additional functions, such as audio and video playback, linkage to audio and video, fonts for Roman and non-Roman orthographies, data validation, adding codes to files, and shipping data to other programs. The second part of CLAN is the set of data analysis programs. These programs are run from a separate window called the Commands window. The results of the analytic programs are sent to the CLAN Output window.

The tool is only compatible with TalkBank corpora that have CHAT annotation.

An online manual is available.

CLARIN Centre: TalkBank
 

 

CLaRK

Functionality: Concordancing/querying, corpus building 
Platform: Platform-independent

Language independent

This tool is an XML-based system for corpus linguistics, primarily for corpus construction, but also with functionality for analysing and exploring corpora.

The support team is reachable by email at clark-support [at] bultreebank.org. A user manual is also available.

CLARIN Centre: CLARIN-BG
Publication: Simov et al. (2014)

 

 

GATE

Functionality: Concordancing/querying 
Platform: Platform-independent (Windows and generic installers available) 
Licence: GNU

Language independent

This tool allows for text and corpus analysis.

CLARIN Centre: CLARIN-UK
 

 

Q-CAT Corpus Annotation Tool 1.5

Functionality: Annotating/concordancing/querying/listening to audio recordings 
Platform: .NET 
Licence: Apache License 2.0

Language independent

The tool allows for manual linguistic annotation of corpora and advanced queries on top of these annotations.

The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian, covering named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on the Windows operating system.

This resource is available for download from the CLARIN.SI repository.

CLARIN Centre: CLARIN.SI
Publication: Krek et al. (2020)

 

 

Corpus Query Tools Outside CLARIN

Online Query Tools

Tool Language Description  

Voyant Tools (home)

Functionality: Querying/concordancing, Stylometry 
Licence: GPL3 (code)

Arabic, Bosnian, Croatian, Czech, English, French, German, Hebrew, Italian, Japanese, Portuguese, Russian, Serbian, Spanish

This is a web-based text reading and analysis environment. It is a scholarly project that is designed to facilitate reading and interpretive practices for digital humanities students and scholars as well as for the general public.

It is possible to upload one's own corpus with this tool.

The interface is available in a number of languages. An online user guide is available.

CLARIN Centre: External
 

 

Concordancer of Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch) 

Functionality: Querying/concordancing

Dutch

This is a dedicated query tool, built on BlackLab software, for Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch), a corpus of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings (1814–2013).

The corpus is a combination of the 5, 27 and 38 million word corpora and the PAROLE Corpus, supplemented with newspaper texts from NRC and De Standaard (until 2013).

Registration is required for using this tool. Shibboleth log-in is supported.

CLARIN Centre: External
 

 

PaQu

Functionality: Querying/concordancing (treebanks)

Dutch

This is an application for searching in treebanks (i.e. text corpora in which each sentence has been assigned a syntactic structure) and for analysing the search results.

It is possible to upload one's own corpus with this tool, for which registration is required.

CLARIN Centre: External
Publication: Odijk et al. (2017)

 

 

CoANZSE Audio

Functionality: Querying/concordancing

English

This is a dedicated concordancer for the Corpus of Australian and New Zealand Spoken English.

The corpus contains 195 million words of geolocated automatic speech recognition transcripts of video content from local governments in Australia and New Zealand, created for the study of lexical, grammatical, phonetic, and discourse-pragmatic phenomena of spoken language. Additionally, the corpus contains complete textual content of the corpus, audio files and forced alignments in Praat's TextGrid format for most transcripts.

The corpus can be accessed through the CLARIN Service Provider Federation.

CLARIN Centre: External
Publication: Coats (2022)

 

 

english-corpora.org

Functionality: Querying/concordancing

English

This is a tool for browsing the corpora available on english-corpora.org, which were formerly known as the BYU (Brigham Young University) corpora.

CLARIN Centre: External
Publication: English Corpora

 

 

Compleat Lexical Tutor

Functionality: Querying/concordancing, corpus upload and analysis

English, French

This tool includes a concordancer, vocabulary profiler, exercise maker, interactive exercises, and much more.

It is possible to upload one's own corpus with this tool (10 MB limit, up to 1.5 million words).

CLARIN Centre: External
 

 

SKell (SKetch Engine for language learners)

Functionality: Querying/concordancing

English, Russian, German, Italian, Czech, Estonian

This is a simple tool for students and teachers of English to easily check whether or how a particular phrase or a word is used by real speakers of English.

CLARIN Centre: External
Publication: Baisa and Suchomel (2014)

 

 

TXM

Functionality: Querying/concordancing

French, English

This tool corresponds to a number of different TXM portals running at various sites and with a number of different corpora. TXM offers online analysis tools for querying language corpora. The interface is in French.

CLARIN Centre: External
 

 

CATMA

Functionality: Querying/concordancing, corpus upload and analysis

German

The acronym CATMA stands for Computer Assisted Text Markup and Analysis.

It is possible to upload one's own corpus with this tool.

CLARIN Centre: External
 

 

Webcorp

Functionality: Querying/concordancing

Language independent

This is a dedicated concordancing tool.

CLARIN Centre: External
 

 

Webcorp Learn

Functionality: Querying/concordancing

Language independent

This tool gives researchers access to a large collection (corpus) of newspaper articles spanning three decades. The tool has been created by linguists to encourage curiosity in language learners. WebCorp Learn promotes playful and context-based inductive learning and enables you to discover language through exploratory experimentation.

Registration is required.

CLARIN Centre: External
 

 

Webcorp LSE (Linguist's Search Engine)

Functionality: Querying/concordancing

Language independent

This is a dedicated tool for the study of language on the web. The corpora were built by crawling the web and extracting textual content from web pages. Searches can be performed to find words, lemmas or phrases, including pattern matching, wildcards and part-of-speech. Results are given as concordance lines in KWIC format. Post-search analyses are possible including time series, collocation tables, sorting and summaries of meta-data from the matched web pages.

It is possible to upload one's own corpus with this tool.

Registration is required.

CLARIN Centre: External
 

 

I-Analyzer

Functionality: Querying/concordancing, analysis, visualizations

Multiple

I-Analyzer allows searching and exploring text corpora, visualizing trends, and downloading tables of text and metadata for further analysis. I-Analyzer is open-source software and freely available. The following corpora are listed:

  1. Digital Library for Dutch Literature (DBNL),
  2. Financial reports of Dutch companies,
  3. Dutch Newspapers from the Royal Library: public dataset and full dataset (available upon request),
  4. Eighteenth Century Collections Online (available for Utrecht University users),
  5. Jewish Funerary Inscriptions,
  6. Book reviews from Goodreads,
  7. The Guardian-Observer newspaper archives (available for Utrecht University users),
  8. 19th century UK Periodicals (available for Utrecht University users),
  9. Dutch court rulings,
  10. Times newspaper archives (available for Utrecht University users),
  11. Dutch monarchs’ speeches,
  12. Dutch parliamentary debates.

 

CLARIN Centre: External
 

 

SketchEngine

Functionality: Querying/concordancing, corpus upload and processing 
Licence: Proprietary

Multiple

Sketch Engine is a commercial online corpus analysis application, used by linguists, lexicographers, translators, students and teachers. Sketch Engine contains 600 ready-to-use corpora in 90+ languages.

It is possible to upload one's own corpus with this tool. Registration is required and Shibboleth log-in is supported. Support is offered via email.

CLARIN Centre: External
Publication: Sketch Engine bibliography

 

 

National Corpus of Polish (IPI PAN search engine)

Functionality: Querying/concordancing

Polish

This is a dedicated concordancer for NKJP corpora.

CLARIN Centre: External
 

 

National Corpus of Polish (Pelcra search engine)

Functionality: Querying/concordancing

Polish

This is a dedicated concordancer for NKJP corpora.

CLARIN Centre: External
 

 

Concordancer of O corpus do português

Functionality: Querying/concordancing

Portuguese

This is a dedicated concordancer for the Corpus of Portuguese developed by Mark Davies.

CLARIN Centre: External
Publication: publications

 

 

KorAP (on CoRoLa)

Functionality: Querying/concordancing

Romanian

This tool is used to query the Reference Corpus for Contemporary Romanian Language CoRoLa.

CLARIN Centre: External
Publication: Diewald et al. (2019)

 

 

Concordancer of the Corpus del Español

Functionality: Querying/concordancing

Spanish

This is a querying tool for the corpora from Corpus del Español, which provide billions of words of recent data from 21 Spanish-speaking countries. There are four different corpora in the Corpus del Español.

CLARIN Centre: External
Publication: English Corpora

 

 

Desktop Tools

Tool Language Description  

aConCorde

Functionality: Concordancing/querying 
Platform: Platform-independent (java) 
Licence: No licence

Language independent

This is a multilingual concordance tool. Originally developed for native Arabic concordance, it possesses basic concordance functionality, as well as English and Arabic interfaces.

CLARIN Centre: External
 

 

AntConc

Functionality: Concordancing/querying 
Platform: Linux, MacOS, Windows 
Licence: Proprietary

Language independent

This is a freeware corpus analysis toolkit for concordancing and text analysis.

Online videos and manuals are available from the creator and the community (Google Group).

CLARIN Centre: External
 

 

AntPConc

Functionality: Parallel Concordancing/querying 
Platform: Linux, MacOS, Windows

Language independent

This is a freeware parallel corpus analysis toolkit for concordancing and text analysis using UTF-8 encoded text files.

CLARIN Centre: External
 

 

CasualConc 

Functionality: Concordancing/querying 
Platform: MacOS 
Licence: No licence

Language independent

This is a concordance program that runs natively on macOS 11.3 or later and can generate KWIC concordance lines, word clusters, collocation analysis, and word counts.

CLARIN Centre: External
 

 

Collocate 

Functionality: Concordancing/querying 
Platform: Windows 
Licence: No licence

Language independent

This tool is a Windows software program that can be used to find collocations or terms in a corpus. It is a commercial tool.

CLARIN Centre: External
 

 

ConcGram

Functionality: Concordancing/querying

Language independent

This tool is a corpus linguistics software package which is specifically designed to find all the co-occurrences of words in a text or corpus irrespective of variation. This is a commercial tool, available for purchase on optical disc.

CLARIN Centre: External
Publication: Greaves (2009)

 

 

Coquery

Functionality: Concordancing/querying 
Platform: Linux, MacOS, Windows 
Licence: GPL3

Language independent

This is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus. The tool works with any corpus, with installers for a number of widely used ones.

CLARIN Centre: External
 

 

CorpKit 

Functionality: Concordancing/querying 
Platform: OSX 
Licence: No licence

Language independent

This is a tool for doing corpus linguistics. It enables parsing, concordancing and keywording, including concordance by searching for combinations of lexical and grammatical features, and keywording of lemmas, of subcorpora compared to corpora, or of words in certain positions within clauses. corpkit leverages a number of sophisticated programming libraries, including pandas, matplotlib, scipy, Tkinter, tkintertable and Stanford CoreNLP.

CLARIN Centre: External
 

 

Corpus Explorer

Functionality: Concordancing/querying, text and data mining 
Platform: Windows

Language independent

This tool is intended for corpus linguistics and for text and data mining.

CLARIN Centre: External
 

 

Corpus Presenter 

Functionality: Concordancing/querying, corpus compilation 
Platform: Windows 
Licence: No licence

Language independent

This tool can be used to compile text corpora and to carry out retrieval tasks on any corpus or selection of text files, no matter what their source or how they are organised. The tool is designed to have a maximally open architecture and can be used straight away to examine any texts users may have access to.

CLARIN Centre: External
Publication: Hickey (2003)

 

 

Corpus Workbench

Functionality: Concordancing/querying 
Platform: Linux, MacOS, VM images (CQPwebinABox) 
Licence: GPL3

Language independent

This is a collection of open-source tools for managing and querying large text corpora (up to 2 billion words) with linguistic annotations. Its central component is the flexible and efficient query processor CQP.

CLARIN Centre: External
 

 

EXAKT

Functionality: Concordancing/querying 
Platform: Windows, MacOS, Linux (requires a current Java runtime environment)

Language independent

EXAKT (‘EXMARaLDA Analysis- and Concordance Tool’) is the query and analysis tool for EXMARaLDA corpora. It can also be used for corpora created with other tools (FOLKER, Transcriber, ELAN).

Support is offered via the CLARIN-D Helpdesk. Manuals and how-to guides are available, and training courses for EXAKT have also been held. The program is open source and its code is accessible via GitHub.

CLARIN Centre: External
 

 

ICECup

Functionality: Concordancing/querying 
Platform: Windows 
Licence: Proprietary

Language independent

This is a state-of-the-art corpus exploration program designed for parsed corpora such as ICE-GB and the Diachronic Corpus of Present-Day Spoken English. It is a commercial tool that works for ICE corpora with a proprietary annotation scheme.

A handbook is available.

CLARIN Centre: External
 

 

LIWC-22

Functionality: Concordancing/querying 
Platform: Windows, MacOS 
Licence: Proprietary

Language independent

This is a commercial product for analyzing word use. It can be used to study a single individual, groups of people over time, or all of social media.

CLARIN Centre: External
 

 

Monoconc

Functionality: Concordancing/querying 
Platform: Windows 
Licence: No licence

Language independent

This is a concordance programme. It is made available on a commercial basis.

CLARIN Centre: External
 

 

NooJ

Functionality: Concordancing/querying 
Platform: Windows (cut-down Java version also available for other OS) 
Licence: GPL Academic - Non-commercial (Java version)

Language independent

This tool is part of a linguistic development environment, which includes functionality for text and corpus analysis.

CLARIN Centre: External
 

 

NoSketchEngine

Functionality: Concordancing/querying 
Platform: Linux 
Licence: Components available under separate licences: GPLv2+, GPLv3

Language independent

This is an open source version of Sketch Engine with certain functionality limitations (for instance, WordSketch is not available).

CLARIN Centre: External
 

 

NVivo

Functionality: Concordancing/querying 
Platform: Windows, MacOS 
Licence: Proprietary

Language independent

This is a commercial software application for qualitative text and data analysis.

CLARIN Centre: External
 

 

ParaConc 

Functionality: Parallel Concordancing/querying 
Platform: Windows 
Licence: No licence

Language independent

This is a parallel concordance programme for aligned source and target translation texts. It is a commercial tool.

CLARIN Centre: External
 

 

Praaline 

Functionality: Concordancing/querying, corpus building 
Platform: Linux, MacOS, Windows 
Licence: GPL3

Language independent

This is a system for managing, annotating, visualising and analysing spoken language corpora.

CLARIN Centre: External
 

 

PyXMLConc 

Functionality: Concordancing/querying 
Platform: Platform-independent (requires Python) 
Licence: MIT

Language independent

This is a simple concordancer intended for exploratory analysis of XML-annotated corpora. Its primary feature is the automatic detection of XML tags and attributes. The search/concordancing function supports regular expressions.

CLARIN Centre: External
 

 

Scattertext

Functionality: Concordancing/querying

Language independent

This is a tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding to terms are selectively labelled so that they don't overlap with other labels or points.

CLARIN Centre: External
 

 

Shinyconc 

Functionality: Concordancing/querying, corpus building 
Platform: Windows, MacOS, Linux 
Licence: GPL3

Language independent

This is a framework for generating custom web-based concordancers. It requires R and RStudio/Shiny.

A detailed setup tutorial is available.

CLARIN Centre: External
 

 

Simple Concordance Program 

Functionality: Concordancing/querying 
Platform: Linux, MacOS, Windows 
Licence: Proprietary

Language independent

This is a concordance and word listing program that allows users to create word lists and search natural language text files for words, phrases, and patterns. It is able to read texts written in many languages. There are built-in alphabets for English, French, German, Polish, Greek and Russian, and an alphabet editor that users can use to create alphabets for any other language.

A help document is available.

CLARIN Centre: External
 

 

Simple Corpus Tool (SCT)

Functionality: Concordancing/querying 
Platform: Linux, MacOS, Windows

Language independent

This is a combination of an annotation and analysis tool for use with either simple XML files or basic plain-text files.

CLARIN Centre: External
 

 

Textable

Functionality: Concordancing/querying 
Platform: Linux, MacOS, Windows 
Licence: GPL3

Language independent

This is a free open source software application to analyze and process texts visually.

Support is available.

CLARIN Centre: External
 

 

Textal

Functionality: Concordancing/querying 
Platform: iPhone app

Language independent

This is a free smartphone app that allows users to analyze websites, tweet streams, and documents, and to explore the relationships between words in the text via an intuitive word cloud interface. It can generate graphs and statistics, and share the data and visualizations.

CLARIN Centre: External
 

 

TextSTAT 

Functionality: Concordancing/querying 
Platform: Versions for Windows and platform-independent Python version

Language independent

This is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as the researcher wants from a particular website and puts them in a TextSTAT corpus. The new news-reader likewise puts news messages in a TextSTAT-readable corpus file.

A quickstart guide, a user guide and video tutorial are available online.

CLARIN Centre: External
 

 

The Prime Machine 

Functionality: Concordancing/querying 
Platform: iOS, MacOS, Android, Windows 
Licence: Proprietary

Language independent

This is a user-friendly corpus tool for English language teaching, linguistic analysis and self-tutoring based on the Lexical Priming theory of language.

Online support is available.

CLARIN Centre: External
 

 

The SPAADIA concordancer 

Functionality: Concordancing/querying 
Platform: Windows, MacOS

Language independent

This is a concordancer (32-bit Windows version) intended mainly for use with the SPAADIA corpus.

CLARIN Centre: External
 

 

TXM 

Functionality: Concordancing/querying 
Platform: Linux, MacOS, Windows 
Licence: GPL2

Language independent

This tool employs lexicometry (see Scholz 2019) and text statistical analysis. It offers tools and methods tested in multiple branches of the humanities and is statistically well founded.

CLARIN Centre: External
 

 

WordCruncher 

Functionality: Concordancing/querying 
Platform: Windows, iOS

Language independent

This tool offers a wide variety of functions for searching, studying, and analyzing texts.

CLARIN Centre: External
 

 

Wordless

Functionality: Concordancing/querying 
Platform: Windows, MacOS, Linux 
Licence: GPL3

Language independent

This is an integrated corpus tool with multilingual support for the study of language, literature, and translation.

The latest version (3.2.0) of Wordless supports Windows 7/8/8.1/10/11, macOS 10.11 or later, and Ubuntu 16.04 or later, all 64-bit only. Both Intel-based and M1-based Macs are supported.

The tool is available for download from GitHub.

CLARIN Centre: External
 

 

Wordsmith Tools

Functionality: Concordancing/querying 
Platform: Windows 
Licence: Proprietary

Language independent

This tool is capable of finding word patterns, and has functionalities for concordance, collocation, word lists and keywords. It is a commercial tool.

There is a dedicated Google Group for this tool.

CLARIN Centre: External
 

 

Wordstatix 

Functionality: Concordancing/querying 
Platform: Linux, Windows 
Licence: GPL3

Language independent

This is a simple concordancer.

CLARIN Centre: External