Corpus Query Tools | CLARIN ERIC

The software applications included in this resource family allow searching, exploring, analysing and visualizing linguistic corpora and texts. Text and corpus analysis lie at the heart of digital scholarship in the humanities and social sciences, and a wide range of software tools are available in this domain. These software tools represent prime examples of the ways in which language technologies can support research across a range of disciplines, and they are therefore central to CLARIN’s mission.

The resource family includes both applications for installation on the user's own computer (desktop) and those accessible via a web browser (online), along with key information to help users find them and choose between them for a particular research goal. A 'corpus analysis tool' is defined here in the sense indicated by the late John Sinclair (and others): the basic operations of corpus linguistics involve ‘corpus, concordance, collocation’. We therefore include tools that can at least handle a corpus and show concordances, and that preferably can also calculate frequent collocates.

Most of the tools listed so far can do a lot more than this, including generating word frequency lists and keywords, calculating n-grams and clusters, working with linguistic annotation and descriptive metadata, and producing visualizations of distributions of words and features.
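To make these basic operations concrete, here is a minimal Python sketch of a concordance (keyword-in-context) view, a raw collocate count, an n-gram count and a word frequency list. It is an illustration only and is not taken from any tool listed on this page. The function names, the naive tokenizer and the sample sentence are all assumptions made for the example. Real tools generally rank collocates with statistical association measures rather than the raw co-occurrence counts used here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately naive)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, node, width=3):
    """Return KWIC lines as (left context, node, right context) tuples."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, span=2):
    """Count the words found within `span` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def ngrams(tokens, n=2):
    """Count contiguous n-token sequences."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


if __name__ == "__main__":
    # Hypothetical two-sentence "corpus", used only for demonstration.
    sample = "The cat sat on the mat. The dog sat on the log."
    tokens = tokenize(sample)

    for left, node, right in concordance(tokens, "sat", width=2):
        print(f"{left:>10} [{node}] {right}")

    print(collocates(tokens, "sat", span=2).most_common(3))
    print(ngrams(tokens, 2).most_common(2))
    print(Counter(tokens).most_common(3))  # word frequency list
```

The sketch holds the whole corpus in memory as a flat token list. The server-based tools listed below instead rely on indexed back-ends so that they can scale to very large corpora.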

For comments, changes to the existing content or inclusion of new corpora, send us an email at resource-families [at] clarin.eu.

Corpus Query Tools in the CLARIN Infrastructure

Online Query Tools

Tool	Language	Description
Voyant tools (SADiLAR) Functionality: Querying/concordancing, Stylometry Licence: GPL3 (code)	Arabic, Bosnian, Croatian, Czech, English, French, German, Hebrew, Italian, Japanese, Portuguese, Serbian, Spanish	This tool is a deployment of Voyant Tools used at SADiLAR. CLARIN Centre: SADiLAR
Intellitext Functionality: Querying/concordancing, corpus upload	Arabic, Czech, Chinese, English, French, German, Italian, Japanese, Kannada, Lithuanian, Portuguese, Russian, Spanish, Ukrainian	The Intelligent Tools for Creating and Analysing Electronic Text Corpora for Humanities Research (IntelliText) project aims to facilitate corpus use for academics working in various areas of the humanities. The project produced a user-friendly corpus interface with an array of easy-to-use functions that will benefit teaching and research in several academic disciplines. It is possible to upload one's own corpus with this tool. An online guide is available. CLARIN Centre: CLARIN-UK
WebClark Functionality: Querying/concordancing	Bulgarian	This is a dedicated concordancer for the Bulgarian National Reference Corpus. CLARIN Centre: CLARIN-BG
Concordancer of the Croatian National Corpus Functionality: Querying/concordancing	Croatian	This is an implementation of NoSketchEngine for the Croatian National Corpus. CLARIN Centre: CLARIN-HR
Kontext (LINDAT) Functionality: Querying/concordancing	Czech	KonText is a basic web application for querying corpora available within the LINDAT/CLARIAH-CZ project. It supports evaluating simple and complex queries, displaying their results as concordance lines, computing frequency distributions, calculating association measures for collocations, and further work with language data. This LINDAT/CLARIAH-CZ instance is a fork of the KonText application developed by the Institute of the Czech National Corpus that has been further extended by the Institute of Formal and Applied Linguistics to suit the needs of the LINDAT/CLARIAH-CZ project. It is possible to upload one's own corpus with this tool. KonText is openly developed. Registration is required and Shibboleth log-in is supported. CLARIN Centre: CLARIAH-CZ Publication: Machalek (2020)
Korp (Copenhagen) Functionality: Querying/concordancing	Danish	This is a web-based concordancer that can be used for corpus queries based on morphosyntactic analysis and various other features. Registration is required. CLARIN Centre: CLARIN-DK
Concordancer of Corpus Gysseling Functionality: Querying/concordancing	Dutch	This is a dedicated query tool for the Corpus Gysseling, developed by the Instituut voor de Nederlandse Taal. The backend of the application is the BlackLab Lucene-based search engine developed for corpora with token-based annotation. The web-based frontend is a further development of the corpus-frontend application developed by INT in CLARIN and CLARIAH projects. CLARIN Centre: CLARIAH-NL
Concordancer of Corpus Middelnederlands Functionality: Querying/concordancing	Dutch	This is a dedicated query tool for the Corpus Middelnederlands. CLARIN Centre: CLARIAH-NL
GrETEL 4.0 Functionality: Querying/concordancing (treebanks)	Dutch	GrETEL stands for Greedy Extraction of Trees for Empirical Linguistics. It is a user-friendly search engine for the exploitation of syntactically annotated corpora or treebanks. It is possible to upload one's own corpus with this tool. CLARIN Centre: CLARIAH-NL Publication: Odijk et al. (2018)
nederlab Functionality: Querying/concordancing	Dutch	This is an online research portal for historical texts in the Dutch language. Registration is required and Shibboleth log-in is supported. CLARIN Centre: CLARIAH-NL
OpenSoNaR Functionality: Querying/concordancing	Dutch	This is an online corpus retrieval system that allows for analyzing and searching the SoNaR and CGN corpora. Registration is required and Shibboleth log-in is supported. CLARIN Centre: CLARIAH-NL Publication: Does et al. (2017)
Couranten Functionality: Querying/concordancing	Dutch (17th Century)	This is a dedicated querying tool for the Couranten Corpus, which comprises the seventeenth-century Dutch newspapers, available on Delpher. CLARIN Centre: CLARIAH-NL
BNCweb (Lancaster) Functionality: Querying/concordancing	English	This tool is a modified version of CQPweb for the British National Corpus. It allows a number of search options: publication date, text medium, author gender, target audience, genre, author domicile. Registration is required to use the tool. CLARIN Centre: CLARIN-UK
CLiC Functionality: Querying/concordancing Licence: Use of CLiC follows the University of Birmingham’s legal policy	English	This tool has been developed as part of the CLiC Dickens project, which demonstrates through corpus stylistics how computer-assisted methods can be used to study literary texts and lead to new insights into how readers perceive fictional characters. Further literary texts have been added to the online service. Technical support is offered through clic [at] contacts.birmingham.ac.uk (email). CLARIN Centre: CLARIN-UK Publication: Mahlberg et al. (2020)
Wmatrix Functionality: Querying/concordancing, corpus upload and processing	English, Spanish	This tool provides a web interface to the English USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains. It is possible to upload one's own corpus with this tool. The tool is free for UK government and academic researchers in countries on the OECD DAC list, and costs £50 per username per year for non-commercial research and teaching. Technical support is available. CLARIN Centre: CLARIN-UK Publication: Rayson (2008)
CQPweb (Lancaster) Functionality: Querying/concordancing Licence: No licence	English, Arabic, French, Italian, Norwegian, Polish, Latvian	This is an online implementation of the CQPweb system with a large number of corpora installed. It is possible to upload one's own corpus with this tool. Note that CQPweb will be superseded by Ziggurat, which is under development. Registration is required to use this tool. CLARIN Centre: CLARIN-UK
Concordancer of the Text Corpus of the Institute of the Estonian Language Functionality: Querying/concordancing	Estonian	This tool provides a simple interface for a text corpus. The material for the text corpus, 10.4 million word forms, was collected haphazardly. Approximately 80% of the texts come from newspapers, which is why the corpus is not representative. The corpus is also untagged and is therefore suited mainly to lexical search. CLARIN Centre: CELR
Korp (Kielipankki) Functionality: Querying/concordancing Licence: Individual corpora have different licenses (and access conditions)	Finnish, Swedish, Russian, English, and more	This is a web-based concordance tool that can be used for corpus queries based on morphosyntactic analysis and various other features. A large proportion of the corpora in Kielipankki are offered via Korp. User support is available through email. CLARIN Centre: FIN-CLARIN Publication: Korp publications
COSMAS II Functionality: Querying/concordancing Licence: DeReKo-EULA	German	This tool is used for querying the German reference corpus DeReKo, as well as several other historical and non-historical corpora. Technical support is offered through cosmas2 [at] ids-mannheim.de (email). CLARIN Centre: CLARIN-D Publication: Bodmer (1996)
DWDS Functionality: Querying/concordancing	German	This is a tool for browsing DWDS corpora. The DWDS is part of the Center for Digital Lexicography of the German Language (ZDL), funded by the Federal Ministry of Education and Research. It is based at the Berlin-Brandenburg Academy of Sciences. CLARIN Centre: CLARIN-D
KorAP (on DeReKo) Functionality: Querying/concordancing Licence: DeReKo-EULA	German	This is a corpus analysis platform that is suited for large, multiply annotated corpora and complex search queries independent of particular research questions. Registration is required only for license restricted corpora. CLARIN Centre: CLARIN-D Publication: Diewald et al. (2016)
SHEBANQ Functionality: Querying/concordancing	Hebrew	This is a dedicated online environment for querying the Hebrew Bible. CLARIN Centre: CLARIAH-NL
AutoSearch Functionality: Querying/concordancing, corpus upload and analysis	Language independent	This tool allows users to upload corpora annotated at the token level for (extended) part of speech, lemma and word form in FoLiA or format, after which the corpus can be searched for these properties with a Corpus of Contemporary Dutch-like interface. CLARIN Centre: CLARIAH-NL
Concordancer of the Italian Corpus for the dissemination of culture and the enhancement of the Italian literary heritage Functionality: Querying/concordancing (non-parallel and parallel)	Language independent	This tool allows text and corpora querying, supporting both basic information retrieval and advanced search. It allows the customization of the query system functionalities and provides indexing also for morpho-syntactically annotated texts. The system can handle several types of text annotation and can also produce concordances for parallel bilingual corpora. CLARIN Centre: CLARIN-IT
Corpuscle Functionality: Querying/concordancing	Language independent	This is a corpus management and analysis system for annotated corpora, with a sophisticated query language. It is a reimplementation of Corpuscle featuring an improved user experience and many new features. Publication: Meurer (2012)
Glossa Functionality: Querying/concordancing and text analysis Licence: Public license for software. CLARIN-licenses for corpora in Glossa	Language independent	Glossa offers a modern, simple and functional search interface with advanced post-processing possibilities for both written corpora, multilingual corpora and speech corpora. Glossa is developed at the Text Laboratory, Department of Linguistics and Scandinavian Studies, University of Oslo with support from the Norwegian contribution to the CLARIN infrastructure, CLARINO. Glossa is also freely available for download from GitHub and is easy to install on one's own server. Glossa is search engine agnostic and comes with support for the IMS Corpus Workbench and CLARIN Federated Content Search out of the box. CLARIN Centre: CLARINO Text Laboratory Centre
INESS Functionality: Querying/concordancing (treebanks)	Language independent	INESS is the Norwegian Infrastructure for the Exploration of Syntax and Semantics. INESS offers an open, interactive, language independent platform for building, accessing, searching and visualizing treebanks. INESS offers a user guide for querying their treebanks. CLARIN Centre: CLARINO Publication: INESS publications
Voyant tools Functionality: Querying/concordancing, Stylometry	Language independent	This tool constitutes a deployment of Voyant Tools at CLARIN-DK. CLARIN Centre: CLARIN-DK
Kontext at the Centre of Latvian language resources and tools Functionality: Querying/concordancing	Latvian	This tool corresponds to an implementation of LINDAT's KonText for Latvian resources. Eight Latvian corpora can be searched with this tool. CLARIN Centre: CLARIN-LV
Latvian National Corpora Collection (LNCC) Functionality: Querying/concordancing	Latvian, Latgalian, Lithuanian	Latvian National Corpora Collection (LNCC) is a diverse collection of corpora representing both written and spoken language. LNCC covers various use cases and all the important text types and genres. It is a continuous multi-institutional and multi-project effort, supported by the digital humanities and language technology communities in Latvia. Currently, 34 corpora developed by 13 institutions are available in the LNCC. Most of the corpora are annotated with a uniform morpho-syntactic annotation scheme and included in the federated search. The federated search combines multiple corpora from two corpus indexer instances (endpoints) maintained by IMCS UL and NLL. Federated search includes 28 corpora (2.4 billion tokens). CLARIN Centre: CLARIN-LV Publication: Saulite et al. (2022)
TEITOK Functionality: Querying/concordancing, corpus upload and processing	Multiple	This is a web-based system for viewing, creating, and editing corpora with both rich textual mark-up and linguistic annotation. For visitors, the system provides a graphical user interface in which the annotated document can be visualized in a number of different ways. And for administrators of the corpus, TEITOK uses the same interface to allow easy editing of the underlying XML document, meaning administrators can correct their corpus while they are consulting it. Registration is required and Shibboleth log-in is supported. User documentation is available. CLARIN Centre: CLARIAH-CZ Publication: Janssen (2016)
NB DH-LAB Functionality: Querying/concordancing/analysis	Norwegian Bokmål, Norwegian Nynorsk, Northern Sami, Lule Sami, Southern Sami	This collection of tools includes a Python package and web applications that allow a user to build corpora from the vast digital collections of the National Library of Norway (currently ca. 160 billion words). Users get concordances, frequency lists and co-occurrence data. User support is available through email. CLARIN Centre: CLARINO
CINTIL Concordancer Functionality: Querying/concordancing Licence: Proprietary	Portuguese	This is a freely available online concordancing service to support the research usage of the CINTIL Corpus. The CINTIL concordancer allows the use of patterns to specify the occurrences to be retrieved. This makes it possible to uncover linguistic structures of high complexity and to use this service as a powerful research tool. CLARIN Centre: PORTULAN CLARIN Publication: Barreto et al. (2006)
Kontext (CLARIN.SI) Functionality: Querying/concordancing Licence: No licence	Slovenian, Croatian, Bosnian, Serbian, Montenegrin, Macedonian, Serbo-Croatian, Bulgarian, Czech, Slovak, Polish, English, Danish, Dutch, Estonian, Finnish, French, Gaelic, German, Greek, Hungarian, Icelandic, Italian, Japanese, Latvian, Lithuanian, Portu	This is the CLARIN.SI installation of LINDAT's KonText, comprised of the KonText front-end developed by the Czech National Corpus team and the Manatee back-end, developed by Lexical Computing. This installation offers over 50 richly annotated corpora in Slovenian and other languages. Shibboleth log-in is supported. CLARIN Centre: CLARIN.SI
NoSKetchEngine (CLARIN.SI) Functionality: Querying/concordancing Licence: no	Slovenian, Croatian, Bosnian, Serbian, Montenegrin, Macedonian, Serbo-Croatian, Bulgarian, Czech, Slovak, Polish, English, Danish, Dutch, Estonian, Finnish, French, Gaelic, German, Greek, Hungarian, Icelandic, Italian, Japanese, Latvian, Lithuanian, Portu	This is an open-source version of the commercial Sketch Engine, produced by Lexical Computing. This installation of noSketch Engine at CLARIN.SI offers over 50 richly annotated corpora in Slovenian and other languages. CLARIN Centre: CLARIN.SI
Korp (Språkbanken) Functionality: Querying/concordancing	Swedish	This is Språkbanken's corpus tool for searching in large amounts of texts, including newspapers, novels and social media. CLARIN Centre: SWE-CLARIN Publication: Borin et al. (2012)

Desktop Tools

Tool	Language	Description
#LancsBox Functionality: Concordancing/querying Platform: Platform-independent (java) Licence: CC BY-NC-ND 4.0	Language independent	#LancsBox is a new-generation software package for the analysis of language data and corpora developed at Lancaster University. The latest version, #LancsBox X, has increased functionality for XML texts. A user guide is available in English, French and Japanese, along with instructional videos. CLARIN Centre: CLARIN-UK Publication: Brezina et al. (2015)
CLAN Functionality: Concordancing/querying Platform: Windows, MacOS, Source code provided for Linux users Licence: GPL2 (source code)	Language independent	The CLAN Programs are downloaded, installed, and used as a single application. Functionally, however, CLAN has two parts. The first part is the CLAN editor which can be used to edit files in either CHAT or CA (Conversation Analysis) format. The editor also provides a wide range of additional functions, such as audio and video playback, linkage to audio and video, fonts for Roman and non-Roman orthographies, data validation, adding codes to files, and shipping data to other programs. The second part of CLAN is the set of data analysis programs. These programs are run from a separate window called the Commands window. The results of the analytic programs are sent to the CLAN Output window. The tool is only compatible with TalkBank corpora that have CHAT annotation. An online manual is available. CLARIN Centre: TalkBank
CLaRK Functionality: Concordancing/querying, corpus building Platform: Platform-independent	Language independent	This tool is an XML-based system for corpus linguistics, primarily for corpus construction, but also with functionality for analysing and exploring corpora. The support team is reachable through clark-support [at] bultreebank.org (email). A user manual is also available. CLARIN Centre: CLARIN-BG Publication: Simov et al. (2014)
GATE Functionality: Concordancing/querying Platform: Platform-independent (Windows and generic installers available) Licence: GNU	Language independent	This tool allows for text and corpus analysis. CLARIN Centre: CLARIN-UK
Q-CAT Corpus Annotation Tool 1.5 Functionality: Annotating/concordancing/querying/listening to audio recordings Platform: .NET Licence: Apache License 2.0	Language independent	This tool allows for manual linguistic annotation of corpora and advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian, such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on the Windows operating system. This resource is available for download from the CLARIN.SI repository. CLARIN Centre: CLARIN.SI Publication: Krek et al. (2020)

Corpus Query Tools Outside CLARIN

Online Query Tools

Tool	Language	Description
Voyant Tools (home) Functionality: Querying/concordancing, Stylometry Licence: GPL3 (code)	Arabic, Bosnian, Croatian, Czech, English, French, German, Hebrew, Italian, Japanese, Portuguese, Russian, Serbian, Spanish	This is a web-based text reading and analysis environment. It is a scholarly project that is designed to facilitate reading and interpretive practices for digital humanities students and scholars as well as for the general public. It is possible to upload one's own corpus with this tool. The interface is available in a number of languages. An online user guide is available. CLARIN Centre: External
Concordancer of Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch) Functionality: Querying/concordancing	Dutch	This is a dedicated query tool, built on BlackLab software, for Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch), a corpus of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings (1814–2013). The corpus is a combination of the 5, 27 and 38 million word corpora and the PAROLE Corpus, supplemented with newspaper texts from NRC and De Standaard (until 2013). Registration is required for using this tool. Shibboleth log-in is supported. CLARIN Centre: External
PaQu Functionality: Querying/concordancing (treebanks)	Dutch	This is an application for searching in treebanks (i.e. text corpora in which each sentence has been assigned a syntactic structure) and for analysing the search results. It is possible to upload one's own corpus with this tool, for which registration is required. CLARIN Centre: External Publication: Odijk et al. (2017)
CoANZSE Audio Functionality: Querying/concordancing	English	This is a dedicated concordancer for the Corpus of Australian and New Zealand Spoken English. The corpus contains 195 million words of geolocated automatic speech recognition transcripts of video content from local governments in Australia and New Zealand, created for the study of lexical, grammatical, phonetic, and discourse-pragmatic phenomena of spoken language. Additionally, the corpus contains complete textual content of the corpus, audio files and forced alignments in Praat's TextGrid format for most transcripts. The corpus can be accessed through the CLARIN Service Provider Federation. CLARIN Centre: External Publication: Coats (2022)
english-corpora.org Functionality: Querying/concordancing	English	This is a tool for browsing the corpora available on english-corpora.org, which were formerly known as the BYU (Brigham Young University) corpora. CLARIN Centre: External Publication: English Corpora
Compleat Lexical Tutor Functionality: Querying/concordancing, corpus upload and analysis	English, French	This tool includes a concordancer, vocabulary profiler, exercise maker, interactive exercises, and much more. It is possible to upload one's own corpus with this tool (10 MB limit, up to 1.5 million words). CLARIN Centre: External
SKell (SKetch Engine for language learners) Functionality: Querying/concordancing	English, Russian, German, Italian, Czech, Estonian	This is a simple tool for students and teachers of English to easily check whether or how a particular phrase or a word is used by real speakers of English. CLARIN Centre: External Publication: Baisa and Suchomel (2014)
TXM Functionality: Querying/concordancing	French, English	This tool corresponds to a number of different TXM portals running at various sites and with a number of different corpora. TXM offers online analysis tools for querying language corpora. The interface is in French. CLARIN Centre: External
CATMA Functionality: Querying/concordancing, corpus upload and analysis	German	The acronym CATMA stands for Computer Assisted Text Markup and Analysis. It is possible to upload one's own corpus with this tool. CLARIN Centre: External
Webcorp Functionality: Querying/concordancing	Language independent	This is a dedicated concordancing tool. CLARIN Centre: External
Webcorp Learn Functionality: Querying/concordancing	Language independent	This tool gives researchers access to a large collection (corpus) of newspaper articles spanning three decades. The tool has been created by linguists to encourage curiosity in language learners. WebCorp Learn promotes playful and context-based inductive learning and enables you to discover language through exploratory experimentation. Registration is required. CLARIN Centre: External
Webcorp LSE (Linguist's Search Engine) Functionality: Querying/concordancing	Language independent	This is a dedicated tool for the study of language on the web. The corpora were built by crawling the web and extracting textual content from web pages. Searches can be performed to find words, lemmas or phrases, including pattern matching, wildcards and part-of-speech. Results are given as concordance lines in KWIC format. Post-search analyses are possible including time series, collocation tables, sorting and summaries of meta-data from the matched web pages. It is possible to upload one's own corpus with this tool. Registration is required. CLARIN Centre: External
I-Analyzer Functionality: Querying/concordancing, analysis, visualizations	Multiple	I-Analyzer allows searching and exploring text corpora, visualizing trends, and downloading tables of text and metadata for further analysis. I-Analyzer is open-source software and freely available. Available corpora include: Digital Library for Dutch Literature (DBNL), Financial reports of Dutch companies, Dutch Newspapers from the Royal Library: public dataset and full dataset (available upon request), Eighteenth Century Collections Online (available for Utrecht University users), Jewish Funerary Inscriptions, Book reviews from Goodreads, The Guardian-Observer newspaper archives (available for Utrecht University users), 19th century UK Periodicals (available for Utrecht University users), Dutch court rulings, Times newspaper archives (available for Utrecht University users), Dutch monarchs’ speeches, Dutch parliamentary debates. CLARIN Centre: External
SketchEngine Functionality: Querying/concordancing, corpus upload and processing Licence: Proprietary	Multiple	Sketch Engine is a commercial online corpus analysis application, used by linguists, lexicographers, translators, students and teachers. Sketch Engine contains 600 ready-to-use corpora in 90+ languages. It is possible to upload one's own corpus with this tool. Registration is required and Shibboleth log-in is supported. Support is offered via email. CLARIN Centre: External Publication: Sketch Engine bibliography
National Corpus of Polish (IPI PAN search engine) Functionality: Querying/concordancing	Polish	This is a dedicated concordancer for NKJP corpora. CLARIN Centre: External
National Corpus of Polish (Pelcra search engine) Functionality: Querying/concordancing	Polish	This is a dedicated concordancer for NKJP corpora. CLARIN Centre: External
Concordancer of O corpus do português Functionality: Querying/concordancing	Portuguese	This is a dedicated concordancer for the Corpus of Portuguese developed by Mark Davies. CLARIN Centre: External Publication: publications
KorAP (on CoRoLa) Functionality: Querying/concordancing	Romanian	This tool is used to query the Reference Corpus for Contemporary Romanian Language CoRoLa. CLARIN Centre: External Publication: Diewald et al. (2019)
Concordancer of the Corpus del Español Functionality: Querying/concordancing	Spanish	This is a querying tool for the corpora from Corpus del Español, which provide billions of words of recent data from 21 Spanish-speaking countries. There are four different corpora in the Corpus del Español. CLARIN Centre: External Publication: English Corpora

Desktop Tools

Tool	Language	Description
aConCorde Functionality: Concordancing/querying Platform: Platform-independent (java) Licence: No licence	Language independent	This is a multilingual concordance tool. Originally developed for native Arabic concordancing, it possesses basic concordance functionality, as well as English and Arabic interfaces. CLARIN Centre: External
Antconc Functionality: Concordancing/querying Platform: Linux, MacOS, Windows Licence: Proprietary	Language independent	This is a freeware corpus analysis toolkit for concordancing and text analysis. Online videos and manuals are available from the creator and the community (Google Group). CLARIN Centre: External
AntPConc Functionality: Parallel Concordancing/querying Platform: Linux, MacOS, Windows	Language independent	This is a freeware parallel corpus analysis toolkit for concordancing and text analysis using UTF-8 encoded text files. CLARIN Centre: External
CasualConc Functionality: Concordancing/querying Platform: MacOS Licence: No licence	Language independent	This is a concordance program that runs natively on macOS 11.3 or later and can generate KWIC concordance lines, word clusters, collocation analyses, and word counts. CLARIN Centre: External
Collocate Functionality: Concordancing/querying Platform: Windows Licence: No licence	Language independent	This tool is a Windows software program that can be used to find collocations or terms in a corpus. It is a commercial tool. CLARIN Centre: External
ConcGram Functionality: Concordancing/querying	Language independent	This tool is a corpus linguistics software package which is specifically designed to find all the co-occurrences of words in a text or corpus irrespective of variation. This is a commercial tool, available for purchase on optical disc. CLARIN Centre: External Publication: Greaves (2009)
Coquery Functionality: Concordancing/querying Platform: Linux, MacOS, Windows Licence: GPL3	Language independent	This is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus. The tool works with any corpus, with installers for a number of widely used ones. CLARIN Centre: External
CorpKit Functionality: Concordancing/querying Platform: OSX Licence: No licence	Language independent	This is a tool for doing corpus linguistics. It enables parsing, concordancing and keywording, including concordance by searching for combinations of lexical and grammatical features, and keywording of lemmas, of subcorpora compared to corpora, or of words in certain positions within clauses. corpkit leverages a number of sophisticated programming libraries, including pandas, matplotlib, scipy, Tkinter, tkintertable and Stanford CoreNLP. CLARIN Centre: External
Corpus Explorer Functionality: Concordancing/querying, text and data mining Platform: Windows	Language independent	This tool is intended for corpus linguistics and for text and data mining. CLARIN Centre: External
Corpus Presenter Functionality: Concordancing/querying, corpus compilation Platform: Windows Licence: No licence	Language independent	This tool can be used to compile text corpora and to carry out retrieval tasks on any corpus or selection of text files, no matter what their source or how they are organised. The tool is designed to have a maximally open architecture and can be used straight away to examine any texts users may have access to. CLARIN Centre: External Publication: Hickey (2003)
Corpus Workbench Functionality: Concordancing/querying Platform: Linux, MacOS, VM images (CQPwebinABox) Licence: GPL3	Language independent	This is a collection of open-source tools for managing and querying large text corpora (up to 2 billion words) with linguistic annotations. Its central component is the flexible and efficient query processor CQP. CLARIN Centre: External
EXAKT Functionality: Concordancing/querying Platform: Windows, MacOS, Linux (requires a current Java runtime environment)	Language independent	EXAKT (‘EXMARaLDA Analysis- and Concordance Tool’) is the query and analysis tool for EXMARaLDA corpora. It can also be used for corpora created with other tools (FOLKER, Transcriber, ELAN). Support is offered via the CLARIN-D Helpdesk. Manuals and how-to guides are available; there have also been training courses for EXAKT. The program is open source, and its source code is available on GitHub. CLARIN Centre: External
ICECup Functionality: Concordancing/querying Platform: Windows Licence: Proprietary	Language independent	This is a state-of-the-art corpus exploration program designed for parsed corpora such as ICE-GB and The Diachronic Corpus of Present-Day Spoken English. It is a commercial tool that works with ICE corpora annotated using a proprietary annotation scheme. A handbook is available. CLARIN Centre: External
LIWC-22 Functionality: Concordancing/querying Platform: Windows, MacOS Licence: Proprietary	Language independent	This is a commercial product for analyzing word use. It can be used to study a single individual, groups of people over time, or all of social media. CLARIN Centre: External
Monoconc Functionality: Concordancing/querying Platform: Windows Licence: No licence	Language independent	This is a concordance programme. It is made available on a commercial basis. CLARIN Centre: External
NooJ Functionality: Concordancing/querying Platform: Windows (cut-down Java version also available for other OS) Licence: GPL Academic - Non-commercial (Java version)	Language independent	This tool is part of a linguistic development environment, which includes functionality for text and corpus analysis. CLARIN Centre: External
NoSketchEngine Functionality: Concordancing/querying Platform: Linux Licence: Components available under separate licences: GPLv2+, GPLv3	Language independent	This is an open source version of Sketch Engine with certain functionality limitations (for instance, WordSketch is not available). CLARIN Centre: External
NVivo Functionality: Concordancing/querying Platform: Windows, MacOS Licence: Proprietary	Language independent	This is a commercial software application for qualitative text and data analysis. CLARIN Centre: External
ParaConc Functionality: Parallel Concordancing/querying Platform: Windows Licence: No licence	Language independent	This is a parallel concordance programme for aligned source and target translation texts. It is a commercial tool. CLARIN Centre: External
Praaline Functionality: Concordancing/querying, corpus building Platform: Linux, MacOS, Windows Licence: GPL3	Language independent	This is a system for managing, annotating, visualising and analysing spoken language corpora. CLARIN Centre: External
PyXMLConc Functionality: Concordancing/querying Platform: Platform-independent (requires Python) Licence: MIT	Language independent	This is a simple concordancer. It is supposed to be used in exploratory analysis of XML-annotated corpora. Its primary feature lies in the automatic detection of XML tags and attributes. The search/concordancing function supports regular expressions. CLARIN Centre: External
Scattertext Functionality: Concordancing/querying	Language independent	This is a tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding to terms are selectively labelled so that they don't overlap with other labels or points. CLARIN Centre: External
Shinyconc Functionality: Concordancing/querying, corpus building Platform: Windows, MacOS, Linux Licence: GPL3	Language independent	This is a framework for generating custom web-based concordancers. It requires R and RStudio/Shiny. A detailed setup tutorial is available. CLARIN Centre: External
Simple Concordance Program Functionality: Concordancing/querying Platform: Linux, MacOS, Windows Licence: Proprietary	Language independent	This tool allows users to create word lists and search natural language text files for words, phrases, and patterns. The tool is a concordance and word listing program that is able to read texts written in many languages. There are built-in alphabets for English, French, German, Polish, Greek and Russian. The tool contains an alphabet editor which you can use to create alphabets for any other language. A help document is available. CLARIN Centre: External
Simple Corpus Tool (SCT) Functionality: Concordancing/querying Platform: Linux, MacOS, Windows	Language independent	This is a combination of an annotation and analysis tool for use with either simple XML files or basic plain-text files. CLARIN Centre: External
Textable Functionality: Concordancing/querying Platform: Linux, MacOS, Windows Licence: GPL3	Language independent	This is a free open source software application to analyze and process texts visually. Support is available. CLARIN Centre: External
Textal Functionality: Concordancing/querying Platform: iPhone app	Language independent	This is a free smartphone app that allows users to analyze websites, tweet streams, and documents, and to explore the relationships between words in the text via an intuitive word cloud interface. It can generate graphs and statistics, and share the data and visualizations. CLARIN Centre: External
TextSTAT Functionality: Concordancing/querying Platform: Versions for Windows and platform-independent Python version	Language independent	This is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet), and it produces word frequency lists and concordances from these files. This version includes a web spider, which reads as many pages as the researcher wants from a particular website and puts them in a TextSTAT corpus. A news reader likewise puts news messages in a TextSTAT-readable corpus file. A quickstart guide, a user guide and a video tutorial are available online. CLARIN Centre: External
The Prime Machine Functionality: Concordancing/querying Platform: iOS, MacOS, Android, Windows Licence: Proprietary	Language independent	This is a user-friendly corpus tool for English language teaching, linguistic analysis and self-tutoring based on the Lexical Priming theory of language. Online support is available. CLARIN Centre: External
The SPAADIA concordancer Functionality: Concordancing/querying Platform: Windows, MacOS	Language independent	This is a concordancer intended mainly for use with the SPAADIA corpus (32-bit Windows version). CLARIN Centre: External
TXM Functionality: Concordancing/querying Platform: Linux, MacOS, Windows Licence: GPL2	Language independent	This tool employs lexicometry (see Scholz 2019) and text statistical analysis. It offers tools and methods tested in multiple branches of the humanities and is statistically well founded. CLARIN Centre: External
WordCruncher Functionality: Concordancing/querying Platform: Windows, iOS	Language independent	This application offers a wide variety of functions for searching, studying, and analyzing texts. CLARIN Centre: External
Wordless Functionality: Concordancing/querying Platform: Windows, MacOS, Linux Licence: GPL3	Language independent	This is an integrated corpus tool with multilingual support for the study of language, literature, and translation. The latest version (3.2.0) of Wordless supports Windows 7/8/8.1/10/11, macOS 10.11 or later, and Ubuntu 16.04 or later, all 64-bit only. Both Intel-based and M1-based Macs are supported. The tool is available for download from GitHub. CLARIN Centre: External
Wordsmith Tools Functionality: Concordancing/querying Platform: Windows Licence: Proprietary	Language independent	This tool is capable of finding word patterns, and has functionalities for concordance, collocation, word lists and keywords. It is a commercial tool. There is a dedicated Google Group for this tool. CLARIN Centre: External
Wordstatix Functionality: Concordancing/querying Platform: Linux, Windows Licence: GPL3	Language independent	This is a simple concordancer. CLARIN Centre: External