L2 Learner Corpora | CLARIN ERIC

L2 learner corpora play a crucial role in second language research and pedagogy, allowing for a systematic study of how learners of a second language acquire the new language on a lexical as well as syntactic level, and how it is influenced by their native language. A special characteristic of this type of corpus is the markup of learners' errors and prosodic features.

The CLARIN infrastructure provides access to 75 L2 learner corpora. 14 corpora are multilingual, while the rest provide written, spoken and even videotaped forms of monolingual L2 data in the following languages: Arabic, Czech, English, Finnish, French, German, Hungarian, Icelandic, Italian, Mandarin, Norwegian, Spanish, and Swedish. Many of these corpora are available through public licences.

We first provide overviews of the corpora that are already part of the CLARIN infrastructure and then list those that have not yet been integrated.

For comments, changes to the existing content, or the inclusion of new corpora, send us an email at resource-families [at] clarin.eu.

Monolingual L2 learner corpora in the CLARIN infrastructure

Written corpora

Corpus	Language	Description	Availability
CzeSL – Czech as a Second Language Size: 0.9 million words Annotation: tokenised, PoS-tagged, lemmatised, error labels Licence: CC-BY	Czech	This corpus contains essays written in 2013 by learners from 54 L1 backgrounds. The corpus is available for download from LINDAT. For the relevant publication, see Rosen (2016).	Download
British Academic Written English Corpus Size: 2761 texts Licence: CC-BY	English	This is primarily an L1 corpus, although it also contains L2 texts. The corpus is available for download from the University of Oxford Text Archive.	Download
CORYL (Corpus of Young Learner Language) Size: 191,568 tokens Annotation: tokenised, anonymised, error labels, linked to CEFR levels Licence: CC-BY	English	This corpus contains English texts written by Norwegian primary school pupils (7th, 10th, and 11th grade). The corpus is available through Corpuscle, provided by CLARINO.	Browse
ETS Corpus of Non-Native Written English Size: 12,100 essays (1100 / language) Licence: restricted	English	This corpus contains texts written by learners from 11 L1 backgrounds as part of an international test of academic English proficiency. Prompts as well as proficiency level are part of the metadata. The corpus is available for download from the LDC catalogue.	Download
ICLE International Corpus of Learner English Size: 3 million words	English	This corpus contains texts written by learners of English from 14 L1 backgrounds. The corpus can be
The Hanken Corpus of Academic Writing Size: 500,000 words Licence: CC-BY	English	This corpus contains academic texts written by Finnish and Swedish native speakers. The corpus is still under development.
The Uppsala Student English corpus Size: 1.2 million tokens Annotation: tokenised Licence: CC-BY	English	This corpus contains essays written during the first three semesters of English studies at Uppsala University; most of the essays were written during the first semester. The corpus contains text files, each with a student ID and text ID including the course level, and information about the different prompts is available. The corpus is available for download from the University of Oxford Text Archive.	Download
International Corpus of Learner Finnish (ICLFI) Corpus Size: 1 million words Annotation: MSD-tagged Licence: CLARIN RES	Finnish	This corpus contains fictional (e.g., letters, narratives) and non-fictional (e.g., essays) texts. The corpus provides information on a large number of variables concerning the linguistic background of the learner, the learning task, the learning context, etc. It is available through Korp. For the relevant publication, see Jantunen (2011).	Browse
Testipiste Corpus Size: 840,000 tokens Annotation: tokenised Licence: CLARIN RES	Finnish	This corpus contains essays written by adult migrants from various L1 backgrounds. The corpus will be made available through Korp.
The Advanced Finnish Learners’ Corpus Size: 288,000 tokens Annotation: tokenised, MSD-tagged, lemmatised Licence: CLARIN RES	Finnish	This corpus contains academic texts written by MA students and collected in 2009. The corpus consists of two subcorpora, the Exam Essays Subcorpus and the Course Papers Subcorpus, both of which are also available through Korp.	Browse Download Download
Commented Learner Corpus Academic Writing Size: 853 texts Licence: CC BY-NC-SA 3.0	German	This corpus contains texts written by students at the University of Hamburg from various L1 backgrounds. The corpus is available for download through the repository of the University of Hamburg.	Download
ASK – Norsk andrespråkskorpus Size: 618,000 tokens Annotation: tokenised, PoS-tagged, errors Licence: CLARIN RES	Norwegian	This corpus contains essays and tests written by students from 10 L1 backgrounds. It also contains L1 control essays. The corpus is available through a dedicated Browse provided by CLARINO.	Browse
Slovene learner corpus KOST 1.0 Size: 6311 texts, 1 million words Annotation: annotated with rich author and text metadata Licence: CC BY-SA 4.0	Slovenian	This corpus of Slovene as a foreign language, KOST (Korpus slovenščine kot tujega jezika), contains 6,311 texts (just over 1 million words) written by adult speakers for whom Slovene is not their first language. The corpus offers insights into the Slovene language as produced by those who are still learning it as a second or foreign language, and in particular into the most common errors that occur in this process. KOST is therefore aimed at all those working with Slovene as a second or foreign language. The texts were mainly written at lectorates and in Slovene as L2/FL courses. Most of the authors of these texts speak Serbian, Bosnian or Macedonian as their first language, but texts by speakers of other languages are also included. The authors are at different proficiency levels in Slovene, from beginner to advanced. For each contributor, information is available on gender, year of birth, country, first language and other languages they speak, employment status and education, and prior experience of learning Slovene. For each text, there is also information on the time and circumstances of creation (exam or homework), the programme in which it was produced, input type (digital or hand-written), language level and the grade. Part of the corpus is also available in a corrected version, which can likewise be accessed through the concordancers (noSketchEngine or KonText). The tokens of the original and corrected texts are linked (one group of links per paragraph), and the links are categorised into 23 error types. The corpus is available for download from the Slovenian repository CLARIN.SI and can be queried through the noSketchEngine and KonText concordancers. For the relevant publication, see Stritar Kučuk (2022).	Concordancer (noSketchEngine) Concordancer (KonText) Download
FinSveStud 79-80 Size: 175,000 tokens Annotation: tokenised, lemmatised Licence: CLARIN RES	Swedish	This corpus contains texts written by students with Finnish as their L1 background. The corpus is available through Korp.	Browse
SpIn Size: 46,911 tokens; 4,302 sentences Annotation: tokenised, PoS-tagged, MSD-tagged, lemgrams, compounds word forms Licence: CC-BY	Swedish	This corpus contains essays from a Language Introduction course for newly arrived students (256 essays; 166 students, some of whom are recurrent) – i.e., course preparation for Swedish upper-intermediate school (gymnasium-level). It is a subcorpus of the SweLL-pilot corpus. Aside from the automatic linguistic annotation, the corpus is manually annotated for CEFR labels (A1-B2). See the metadata description for further details on the automatic and manual annotation. The corpus is available through Korp and for download in Språkbanken Text / the SweLL infrastructure through an individual application form. For the relevant publication, see Volodina et al. (2016).	Browse (Korp) Online (application)
SW1203-essays Size: 52,528 tokens; 3,145 sentences Annotation: tokenised, PoS-tagged, MSD-tagged, lemgrams, compounds word forms Licence: CC-BY	Swedish	This corpus contains essays from a preparatory university course, with three essays written by (almost) all students: (1) entrance essay; (2) mid-term essay; (3) final exam essay; plus (4) a final exam retake for some students. The corpus is therefore longitudinal to some extent. It is a subcorpus of the SweLL-pilot corpus. Aside from the automatic linguistic annotation, the corpus is manually annotated for CEFR labels (B1-C2). See the metadata description for further details on the automatic and manual annotation. The corpus is available for download from the Språkbanken Resource List, through Korp, and in Språkbanken Text / the SweLL infrastructure through an individual application form. For the relevant publication, see Volodina et al. (2016).	Browse (Korp) Online (application) Download
SweLL-gold Size: 147,842 tokens (original version), 151,851 (normalized version); 7,807 sentences (original), 8,137 sentences (normalized) Annotation: tokenised, PoS-tagged, MSD-tagged, lemgrams, compounds word forms Licence: CC-BY	Swedish	This corpus contains essays from various education establishments in Sweden for non-Swedish speaking adult learners. Aside from the automatic linguistic annotation, the corpus is manually annotated at the following levels: pseudonymization, normalization, and correction annotation. See the metadata description for further details on the automatic and manual annotation. While the SweLL-pilot corpus was collected in 2006–2016, SweLL-gold was collected in 2017–2020. The corpus is available through Korp and for download in Språkbanken Text / the SweLL infrastructure through an individual application form. For the relevant publication, see Volodina et al. (2019).	Browse (original) Browse (normalized) Online (application)
Tisus corpus Size: 60,632 tokens; 3,422 sentences Annotation: tokenised, PoS-tagged, MSD-tagged, lemgrams, compounds word forms Licence: CC-BY	Swedish	This corpus contains essays from a test situation written by adult learners (105 essays, 105 students; one essay per student). The essays are argumentative on the topic of stress, written at an advanced level. This is a subcorpus of the SweLL-pilot corpus. Aside from the automatic linguistic annotation, the corpus is manually annotated for CEFR labels (B2-C1). See the metadata description for further details on the automatic and manual annotation. The corpus is available for download from Språkbanken, through Korp, and in Språkbanken Text / the SweLL infrastructure through an individual application form. For the relevant publication, see Volodina et al. (2016).	Browse (Korp) Online (application) Download

Spoken corpora

Corpus	Language	Description	Availability
The Dresden Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Czech	The corpus contains speech recordings of ~32 German children learning Czech (type of study: interview). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Kubanek-German (2000)	Browse Download
The Anglish Corpus Annotation: interpausal units Licence: CLARIN RES	English	This corpus contains various speech tasks performed by French native speakers and the associated transcriptions. The corpus is available for download from Ortolang.	Download
The Barcelona English Language Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	This corpus contains speech recordings of Spanish children and teenagers learning English in Barcelona across 4 tasks (written composition, oral narrative, oral interview and role play). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Muñoz (2006).	Browse Download
The Barraja-Rohan Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	This corpus contains speech recordings of adult international students who spoke English as a second language and who had newly arrived at an Australian university. These undergraduate international students from various Asian backgrounds interacted over a period of seven months with Australian graduate students who were native speakers of English. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Barraja-Rohan (2013)	Browse Download
The Connolly Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	The corpus contains speech recordings of 60 Japanese high school students learning English. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank.	Browse Download
The CUHK corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	This corpus contains speech recordings of 6 children learning English in Hong Kong. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see MacWhinney (2016)	Browse Download
The Dresden Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	The corpus contains speech recordings of ~32 German children learning English (type of study: interview). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Kubanek-German (2000)	Browse Download
GLBCC (Giessen - Long Beach Chaplin Corpus) Size: 2472 words/transcript Licence: CC-BY	English	This corpus contains film retellings performed by English and German native speakers. The corpus is available for download from the University of Oxford Text Archive.	Download
A Learners' Corpus of Reading Texts Licence: CLARIN RES	English	This corpus contains unprepared readings by first-year students at an English department who speak French as a native language. The corpus is available for download from Ortolang.	Download
The Markee Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	This corpus contains speech recordings of 3 students learning English as a second language. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Markee (2000)	Browse Download
The PAROLE Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	This corpus contains speech recordings of 95 students learning English in France (type of study: tasks/storytelling). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Hilton (2009)	Browse Download
The QATAR Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	This corpus contains recorded interviews involving 19 Qatari learners of English. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Zhao and MacWhinney (2010)	Browse Download
The Vercellotti Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	English	This corpus contains recordings of adult learners entering an Intensive English Program (IEP) in the United States during the year 2010. Tasks include 2-minute monologues. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Vercellotti (2017).	Browse Download
The Dresden Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	The corpus contains speech recordings of ~32 German children learning French (type of study: interview). The corpus is a part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Kubanek-German (2000)	Browse Download
French Learner Language Oral Corpora (FLLOC) Size: 1375 transcripts Annotation: MSD-tagged Licence: CC-BY	French	This corpus contains various narrative and interactive speech tasks performed by English and Dutch native speakers. The corpus is available for download from the University of Oxford Text Archive. The transcripts and audio files can also be downloaded and browsed through TalkBank.	Download
The LANGSNAP Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus contains speech recordings of 28 British undergraduates learning French before, during and after a year abroad. Tasks include oral interviews and story retellings, aside from argumentative writing tasks. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Tracy-Ventura and Huensch (2018).	Browse Download
The LANGSNAP3 Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus is a 3-year follow up to the LANGSNAP corpus, involving 18 participants. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Tracy-Ventura and Huensch (2018)	Browse Download
The Newcastle Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus contains intermediate level spoken French from 17-18 year old second language learners, in years 12 to 13 of UK secondary education. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank.	Browse Download
The PAROLE Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus contains speech recordings of 40 students learning French in France as a second language (type of study: tasks/storytelling). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Hilton (2009)	Browse Download
The Trinity College (TCD) Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus contains recordings of 5 children (2 Irish, 1 Polish, 2 Cambodian) learning French in a school in France. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank.	Browse Download
The Reading Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus contains oral proficiency interviews with 34 16-year-olds learning French in South Wales. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Chambers and Richards (1995)	Browse Download
The UWI Corpus Size: 15,068 tokens Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus consists of 25 recorded interviews with learners of French (9 adult learners) in Jamaica. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Péters (2017)	Browse Download
The Dimroth SLA Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	German	The corpus contains speech recordings of 47 students learning German (type of study: interview). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Dimroth (2008)	Browse Download
Hamburg Modern Times Corpus Size: 24,000 words Annotation: prosody Licence: CLARIN RES	German	This corpus contains film retellings and the accompanying transcriptions. The corpus is available for download from the HZSK CLARIN-D repository.	Download
The RyanDan Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	German	The corpus contains recordings of 4 Carnegie Mellon University students learning German. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Walter (2020)	Browse Download
The VYSA Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	German	This corpus contains recordings of 3 high school students learning German abroad while living with German-speaking host families and attending German secondary schools in standard German-speaking urban and peri-urban regions of Germany. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Young-Scholten and Langer (2015)	Browse Download
The Theodórsdóttir corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Icelandic	The corpus contains recordings obtained in a longitudinal case study of L2 Icelandic. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Theodórsdóttir (2018)	Browse Download
The PAROLE Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Italian	This corpus contains speech recordings of 95 students learning Italian in France (type of study: tasks/storytelling). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Hilton (2009)	Browse Download
The COPA Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Mandarin	This corpus contains speech recordings of ~120 college students learning Mandarin in Hong Kong (type of study: responses to questions). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Zhang (2009)	Browse Download
The HKPU Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Mandarin	This corpus contains speech recordings of 20 college students learning Mandarin in Hong Kong. The tasks involve oral interviews. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Chang et al. (2013)	Browse Download
LANGMAN Size: 11 subcorpora Annotation: error coding Licence: CC-BY	Hungarian	This is a spoken corpus of Chinese native speakers learning Hungarian as a second language. The subcorpora are available for online browsing and download via TalkBank.	Browse Download
The BCN-L2 Corpus Annotation: error coding Licence: public (acknowledgment required)	Spanish	This corpus contains speech recordings of Berber students learning Spanish. The participants were 88 native speakers of Moroccan Arabic (Darija) and 26 speakers of Berber (Amazigh) living in Catalonia. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Bet et al. (2016)	Browse Download
The Díaz Rodríguez Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains speech recordings of Indo-European and Asian learners, both semi-spontaneous and experimental, obtained in Barcelona, Spain (type of study: naturalistic, longitudinal). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Díaz (2002)	Browse Download
The LANGSNAP Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains speech recordings of 27 British undergraduates learning Spanish before, during and after a year abroad. Tasks include oral interviews and story retellings, as well as argumentative writing tasks. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Tracy-Ventura and Huensch (2018)	Browse Download
The LANGSNAP3 Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus is a 3-year follow-up to the LANGSNAP Corpus, involving 33 participants. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Tracy-Ventura and Huensch (2018)	Browse Download
The Liceras Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains speech recordings of 11 students learning Spanish as a second language. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Liceras et al. (1999)	Browse Download
The Nebrija-CORELE-UA Corpus Size: 1 hour 27 minutes, 10,292 words Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains 10 recorded interviews with students of Spanish as a Foreign Language at the University of Alicante. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Medina Soler (2017)	Browse Download
The Nebrija-INMIGRA Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus consists of oral interviews carried out in the context of the LETRA test of Spanish for immigrant workers. It is made up of semi-guided interviews carried out in Spanish which last approximately 10 minutes each. The participants are immigrants from 11 different countries who live in the Autonomous Community of Madrid (Spain). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Liceras (2017)	Browse Download
The Nebrija-OAP Corpus Size: 9 hours 19 minutes, 49,718 words Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains 67 videotaped presentations involving 95 North American students of Spanish as a Foreign Language at Nebrija University in Madrid. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Vergara Padilla (2017)	Browse Download
The Nebrija-WOCAE Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains recordings of emails written and read by 28 Chinese students. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Vergara Padilla (2017)	Browse Download
The Nicolás Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains recordings of 2 children from Morocco learning Spanish in Spain. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see de Benito (2016)	Browse Download
The SPLLOC1 Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus contains recordings of L2 Spanish in a classroom context. There were 20 learners, all of them native speakers of English, at each of 3 levels: beginners (Year 9 students aged 13-14), intermediate students (A2 students aged 17-18), and fourth-year undergraduates. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Mitchell et al. (2008)	Browse Download
The SPLLOC2 Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	Spanish	This corpus is an extension of the SPLLOC1 Corpus. The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank. For the relevant publication, see Mitchell et al. (2008)	Browse Download

Corpus

Language

Description

Availability

The Dresden Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Czech

The corpus contains speech recordings of ~32 German children learning Czech (type of study: interview).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Kubanek-German (2000)

Browse

Download

The Anglish Corpus

Annotation: interpausal units
Licence: CLARIN RES

English

This corpus contains various speech tasks performed by French native speakers and the associated transcriptions.

The corpus is available for download from Ortolang.

Download

The Barcelona English Language Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of Spanish children and teenagers learning English in Barcelona across 4 tasks (written composition, oral narrative, oral interview and role play).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Muñoz (2006)

Browse

Download

The Barraja-Rohan Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of adult international students who spoke English as a second language and who had newly arrived at an Australian university. These undergraduate international students from various Asian backgrounds interacted over a period of seven months with Australian graduate students who were native speakers of English.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Barraja-Rohan (2013)

Browse

Download

The Connolly Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

The corpus contains speech recordings of 60 Japanese high school students learning English.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

Browse

Download

The CUHK Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of 6 children learning English in Hong Kong.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see MacWhinney (2016)

Browse

Download

The Dresden Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

The corpus contains speech recordings of ~32 German children learning English (type of study: interview).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Kubanek-German (2000)

Browse

Download

GLBCC (Giessen - Long Beach Chaplin Corpus)

Size: 2472 words/transcript
Licence: CC-BY

English

This corpus contains film retellings performed by English and German native speakers.

The corpus is available for download from the University of Oxford Text Archive.

Download

A Learners' Corpus of Reading Texts

Licence: CLARIN RES

English

This corpus contains unprepared readings by first-year students at an English department who speak French as a native language.

The corpus is available for download from Ortolang.

Download

The Markee Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of 3 students learning English as a second language.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Markee (2000)

Browse

Download

The PAROLE Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of 95 students learning English in France (type of study: tasks/storytelling).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Hilton (2009)

Browse

Download

The QATAR Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains recorded interviews involving 19 Qatari learners of English.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Zhao and MacWhinney (2010)

Browse

Download

The Vercellotti Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains recordings of adult learners entering an Intensive English Program (IEP) in the United States during the year 2010. Tasks include 2-minute monologues.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Vercellotti (2017)

Browse

Download

The Dresden Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

The corpus contains speech recordings of ~32 German children learning French (type of study: interview).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Kubanek-German (2000)

Browse

Download

French Learner Language Oral Corpora (FLLOC)

Size: 1375 transcripts
Annotation: MSD-tagged
Licence: CC-BY

French

This corpus contains various narrative and interactive speech tasks performed by English and Dutch native speakers.

The corpus is available for download from the University of Oxford Text Archive. The transcripts and audio files can also be downloaded and browsed through TalkBank.

Download

The LANGSNAP Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains speech recordings of 28 British undergraduates learning French before, during and after a year abroad. Tasks include oral interviews and story retellings, as well as argumentative writing tasks.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The LANGSNAP3 Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus is a 3-year follow-up to the LANGSNAP Corpus, involving 18 participants.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The Newcastle Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains intermediate-level spoken French from 17-18-year-old second language learners in years 12 to 13 of UK secondary education.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

Browse

Download

The PAROLE Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains speech recordings of 40 students learning French in France as a second language (type of study: tasks/storytelling).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Hilton (2009)

Browse

Download

The Trinity College (TCD) Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains recordings of 5 children (2 Irish, 1 Polish, 2 Cambodian) learning French in a school in France.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

Browse

Download

The Reading Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains oral proficiency interviews with 34 16-year-olds learning French in South Wales.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Chambers and Richards (1995)

Browse

Download

The UWI Corpus

Size: 15,068 tokens
Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus consists of 25 recorded interviews with learners of French (9 adult learners) in Jamaica.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Péters (2017)

Browse

Download

The Dimroth SLA Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

German

The corpus contains speech recordings of 47 students learning German (type of study: interview).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Dimroth (2008)

Browse

Download

Hamburg Modern Times Corpus

Size: 24,000 words
Annotation: prosody
Licence: CLARIN RES

German

This corpus contains film retellings and the accompanying transcriptions.

The corpus is available for download from the HZSK CLARIN-D repository.

Download

The RyanDan Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

German

The corpus contains recordings of 4 Carnegie Mellon University students learning German.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Walter (2020)

Browse

Download

The VYSA Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

German

This corpus contains recordings of 3 high school students learning German abroad while living with German-speaking host families and attending German secondary schools in standard German-speaking urban and peri-urban regions of Germany.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Young-Scholten and Langer (2015)

Browse

Download

The Theodórsdóttir Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Icelandic

The corpus contains recordings obtained in a longitudinal case study of L2 Icelandic.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Theodórsdóttir (2018)

Browse

Download

The PAROLE Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Italian

This corpus contains speech recordings of 95 students learning Italian in France (type of study: tasks/storytelling).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Hilton (2009)

Browse

Download

The COPA Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Mandarin

This corpus contains speech recordings of ~120 college students learning Mandarin in Hong Kong (type of study: responses to questions).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Zhang (2009)

Browse

Download

The HKPU Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Mandarin

This corpus contains speech recordings of 20 college students learning Mandarin in Hong Kong. The tasks involve oral interviews.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Chang et al. (2013)

Browse

Download

LANGMAN

Size: 11 subcorpora
Annotation: error coding
Licence: CC-BY

Hungarian

This corpus is a spoken corpus involving Chinese native speakers who learn Hungarian as a second language.

The subcorpora are available for online browsing and download via TalkBank.

Browse

Download

The BCN-L2 Corpus

Annotation: error coding
Licence: public (acknowledgment required)

Spanish

This corpus contains speech recordings of Berber students learning Spanish. The participants were 88 native speakers of Moroccan Arabic (Darija) and 26 speakers of Berber (Amazigh) living in Catalonia.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Bet et al. (2016)

Browse

Download

The Díaz Rodríguez Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains speech recordings, both semi-spontaneous and experimental, of Indo-European and Asian learners, obtained in Barcelona, Spain (type of study: naturalistic, longitudinal).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Díaz (2002)

Browse

Download

The LANGSNAP Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains speech recordings of 27 British undergraduates learning Spanish before, during and after a year abroad. Tasks include oral interviews and story retellings, as well as argumentative writing tasks.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The LANGSNAP3 Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus is a 3-year follow-up to the LANGSNAP Corpus, involving 33 participants.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The Liceras Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains speech recordings of 11 students learning Spanish as a second language.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Liceras et al. (1999)

Browse

Download

The Nebrija-CORELE-UA Corpus

Size: 1 hour 27 minutes, 10,292 words
Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains 10 recorded interviews involving students of Spanish as a Foreign Language at the University of Alicante.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Medina Soler (2017)

Browse

Download

The Dresden Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Czech

The corpus contains speech recordings of ~32 German children learning Czech (type of study: interview).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Kubanek-German (2000)

Browse

Download

The Anglish Corpus

Annotation: interpausal units
Licence: CLARIN RES

English

This corpus contains various speech tasks performed by French native speakers and the associated transcriptions.

The corpus is available for download from Ortolang.

Download

The Barcelona English Language Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of Spanish children and teenagers learning English in Barcelona. across 4 tasks (written composition, oral narrative, oral interview and role play).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Muñoz (2006)

Browse

Download

The Barraja-Rohan Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Barraja-Rohan (2013)

Browse

Download

The Connolly Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

The corpus contains speech recordings of 60 Japanese high school students learning English.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

Browse

Download

The CUHK corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of 6 children learning English in Hong Kong.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see MacWhinney (2016)

Browse

Download

The Dresden Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

The corpus contains speech recordings of ~32 German children learning English (type of study: interview).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Kubanek-German (2000)

Browse

Download

GLBCC (Giessen - Long Beach Chaplin Corpus)

Size: 2472 words/transcript
Licence: CC-BY

English

This corpus contains film retellings performed by English and German native speakers.

The corpus is available for download from the University of Oxford Text archive.

Download

A Learners' Corpus of Reading Texts

Licence: CLARIN RES

English

This corpus contains unprepared readings by first-year students at an English department who speak French as a native language.

The corpus is available for download from Ortolang.

Download

The Markee Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of 3 students learning English as a second language.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Markee (2000)

Browse

Download

The PAROLE Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains speech recordings of 95 students learning English in France (type of study: tasks/storytelling).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Hilton (2009)

Browse

Download

The QATAR Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains recorded interviews involving 19 Qatari learners of English.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Zhao and MacWhinney (2010)

Browse

Download

The Vercellotti Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

English

This corpus contains recordings of adult learners entering an Intensive English Program (IEP) in the United States during the year 2010. Tasks include 2 minute monologues.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Vercellotti (2017)

Browse

Download

The Dresden Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

The corpus contains speech recordings of ~32 German children learning French (type of study: interview).

The corpus is a part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Kubanek-German (2000)

Browse

Download

French Learner Language Oral Corpora (FLLOC)

Size: 1375 transcripts
Annotation: MSD-tagged
Licence: CC-BY

French

This corpus contains various narrative and interactive speech tasks performed by English and Dutch native speakers.

The corpus is available for download from the University of Oxford Text Archive. The transcripts and audio files can also be downloaded and browsed through through TalkBank.

Download

The LANGSNAP Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The LANGSNAP3 Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus is a 3-year follow up to the LANGSNAP corpus, involving 18 participants.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The Newcastle Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains intermediate-level spoken French from 17- to 18-year-old second language learners in years 12 to 13 of UK secondary education.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

Browse

Download

The PAROLE Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains speech recordings of 40 students learning French in France as a second language (type of study: tasks/storytelling).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Hilton (2009)

Browse

Download

The Trinity College (TCD) Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains recordings of 5 children (2 Irish, 1 Polish, 2 Cambodian) learning French in a school in France.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

Browse

Download

The Reading Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus contains oral proficiency interviews with 34 16-year-olds learning French in South Wales.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Chambers and Richards (1995)

Browse

Download

The UWI Corpus

Size: 15,068 tokens
Annotation: audio/transcription linking
Licence: public (acknowledgment required)

French

This corpus consists of 25 recorded interviews with learners of French (9 adult learners) in Jamaica.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Péters (2017)

Browse

Download

The Dimroth SLA Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

German

The corpus contains speech recordings of 47 students learning German (type of study: interview).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Dimroth (2008)

Browse

Download

Hamburg Modern Times Corpus

Size: 24,000 words
Annotation: prosody
Licence: CLARIN RES

German

This corpus contains film retellings and the accompanying transcriptions.

The corpus is available for download from the HZSK CLARIN-D repository.

Download

The RyanDan Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

German

The corpus contains recordings of 4 Carnegie Mellon University students learning German.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Walter (2020)

Browse

Download

The VYSA Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

German

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Young-Scholten and Langer (2015)

Browse

Download

The Theodórsdóttir Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Icelandic

The corpus contains recordings obtained in a longitudinal case study of L2 Icelandic.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Theodórsdóttir (2018)

Browse

Download

The PAROLE Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Italian

This corpus contains speech recordings of 95 students learning Italian in France (type of study: tasks/storytelling).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Hilton (2009)

Browse

Download

The COPA Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Mandarin

This corpus contains speech recordings of ~120 college students learning Mandarin in Hong Kong (type of study: responses to questions).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Zhang (2009)

Browse

Download

The HKPU Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Mandarin

This corpus contains speech recordings of 20 college students learning Mandarin in Hong Kong. The tasks involve oral interviews.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Chang et al. (2013)

Browse

Download

LANGMAN

Size: 11 subcorpora
Annotation: error coding
Licence: CC-BY

Hungarian

This corpus is a spoken corpus involving Chinese native speakers who learn Hungarian as a second language.

The subcorpora are available for online browsing and download via TalkBank.

Browse

Download

The BCN-L2 Corpus

Annotation: error coding
Licence: public (acknowledgment required)

Spanish

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Bel et al. (2016)

Browse

Download

The Díaz Rodríguez Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains speech recordings of Indo-European and Asian learners, both semi-spontaneous and experimental, obtained in Barcelona, Spain (type of study: naturalistic, longitudinal).

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Díaz (2002)

Browse

Download

The LANGSNAP Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The LANGSNAP3 Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus is a 3-year follow-up to the LANGSNAP Corpus, involving 33 participants.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Tracy-Ventura and Huensch (2018)

Browse

Download

The Liceras Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains speech recordings of 11 students learning Spanish as a second language.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Liceras et al. (1999)

Browse

Download

The Nebrija-CORELE-UA Corpus

Size: 1 hour 27 minutes, 10,292 words
Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains 10 recorded interviews involving students of Spanish as a Foreign Language at the University of Alicante, in Alicante.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Medina Soler (2017)

Browse

Download

The Nebrija-INMIGRA Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Liceras (2017)

Browse

Download

The Nebrija-OAP Corpus

Size: 9 hours 19 minutes, 49,718 words
Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains 67 videotaped presentations involving 95 North American students of Spanish as a Foreign Language at Nebrija University in Madrid.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Vergara Padilla (2017)

Browse

Download

The Nebrija-WOCAE Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains recordings of emails written and read by 28 Chinese students.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Vergara Padilla (2017)

Browse

Download

The Nicolás Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus contains recordings of two children from Morocco learning Spanish in Spain.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see de Benito (2016)

Browse

Download

The SPLLOC1 Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Mitchell et al. (2008)

Browse

Download

The SPLLOC2 Corpus

Annotation: audio/transcription linking
Licence: public (acknowledgment required)

Spanish

This corpus is an extension of the SPLLOC1 Corpus.

The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning.

The corpus is available for online browsing and download via TalkBank.

For the relevant publication, see Mitchell et al. (2008)

Browse

Download

Video and multimodal corpora

Corpus	Language	Description	Availability
Arabic Learner Corpus Size: 0.3 million tokens Annotation: tokenised Licence: CLARIN RES	Arabic	This corpus contains essays written by students from 67 L1 backgrounds. It also contains recordings of speech tasks and associated transcriptions. The corpus is available for download from the LDC catalogue.	Download
English as a Foreign Language Corpus Size: 24 hours Licence: Under Negotiation	English	The corpus contains videotaped lessons involving students at Finnish secondary schools.
The Long Second Corpus Licence: Under Negotiation	Finnish	This corpus contains written texts, audio recordings and videotaped lessons involving immigrants from the following L1 backgrounds: Estonian, Macedonian, Kurdish, Portuguese, Russian, and English.	The corpus is still in preparation. It is set to be made available on the LAT platform.
The van Compernolle Corpus Annotation: audio/transcription linking Licence: public (acknowledgment required)	French	This corpus contains a recorded examination of classroom interactional practices and actions in a beginning-level ESL reading class. Analytic foci include aspects of speech delivery and timing as well as nonverbal behaviors (e.g., eye gaze, gesture). The corpus is part of the SLABank collection, which is a component of TalkBank dedicated to providing corpora for the study of second language acquisition and learning. The corpus is available for online browsing and download via TalkBank.	Browse Download

Multilingual L2 learner corpora in the CLARIN infrastructure

Written corpora

Corpus	Language	Description	Availability
MERLIN Written Learner Corpus for Czech, German, Italian 1.1 Size: 2287 texts Annotation: a wide range of language characteristics that provide researchers with concrete examples of learner performance and progress across multiple proficiency levels. Licence: CC BY-SA 4.0	Czech, German, Italian	This corpus contains learner texts produced in standardized language certifications covering CEFR levels A1-C1. The corpus is available for download from the Eurac Research CLARIN Centre Repository.	Download
CEFLING Project Corpus	Finnish and English	This corpus contains texts written by primary and secondary school students (years 7-9).
DIALUKI: Diagnosing reading and writing in a second or foreign language Size: 8,600 texts Licence: CLARIN RES	Finnish and English	This corpus contains texts both in Finnish (written by Russian native speakers) and English (written by Finnish native speakers). The corpus will be made available through Korp.
Topling - Paths in Second Language Acquisition Size: 165,000 tokens Annotation: tokenised Licence: CLARIN End User Licence Agreement	Finnish, English, Swedish	This corpus contains written texts in English, Swedish and Finnish produced by students in the Finnish educational system and is an extension of the CEFLING corpus, which it also includes. The corpus is available through the concordancer Korp.	Browse
Kolipsi Corpus Family Size: 5500 texts; 1.15 million tokens Annotation: sentence splitting, tokenised, lemmatised, PoS-tagged, manual annotation (see description) Licence: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)	German, Italian	The Kolipsi Corpus Family is a collection of Italian and German learner texts that were collected in the course of the KOLIPSI project in 2007/2008 (Kolipsi-1) and a follow-up study in 2014/2015 (Kolipsi-2). The aim of the original project and the follow-up study was to analyse the second language competences of South-Tyrolean pupils from upper secondary schools (between 16 and 18 years old), and to contextualize the results of such investigation by commenting on crucial sociolinguistic and psychosocial aspects that influence it. The results of the follow-up study should be compared to the results of the original KOLIPSI project. All sub-corpora of the Kolipsi Corpus Family contain manually performed transcription annotations. Transcription annotations reflect surface features of the text, such as the graphical arrangement, and include error annotation on the orthographic level. Both subcorpora of KOLIPSI are available for download from the Eurac Research CLARIN Centre Repository. In addition, the family is also available for online browsing through the ANNIS concordancer.	Download (Kolipsi-1) Download (Kolipsi-2) Browse
LEONIDE - Longitudinal Learner Corpus in Italiano, Deutsch and English Size: 2510 texts; 240,000 tokens Annotation: sentence splitting, tokenised, lemmatised, PoS-tagged, manual annotation (see description) Licence: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)	Italian, German, English	This is a longitudinal corpus of student essays documenting the language competences and writing development of lower secondary school students in three different languages. The texts were collected over the span of 3 consecutive years (2015-2018) in public middle schools. The pupils were 11 years old at the beginning of the data collection and 13 years old at the end. In each grade, two written texts were collected that differ with respect to genre: the first text was elicited using a picture story re-telling task; the second text is an opinion text on different aspects related to the pupils’ life and public discourse. Manual annotation concerns the fact that the corpus is fully anonymised and annotated with target hypotheses correcting orthography errors in the text as well as annotations on structural elements (paragraphs, line breaks, bullet points, symbols or emoticons etc.), foreign word insertions and transcript surface features (e.g. deletions, corrections or insertions of the student, unreadable or ambiguous items). The corpus is available for download from the Eurac Research CLARIN Centre Repository. For the relevant publication, see Glaznieks et al. (2021).	Download

Spoken corpora

Corpus	Language	Description	Availability
AixOx Size: 40 minutes/task Licence: restricted	English and French	This corpus contains readings of written texts performed by French and English native speakers.
LeaP: The Learning the Prosody of a Foreign Language Size: 31 hours Annotation: PoS-tagged, lemmatised, prosody	English and German	This corpus contains recordings of English and German spoken by non-native speakers from 31 different native language backgrounds. The corpus is available for download from the Language Archive.	Download
Repiso/Contrefactualité Licence: CLARIN RES	French, Italian, Spanish	This corpus contains recordings of counterfactual sentences.
Openprodat Licence: GNU General Public Licence	Dutch, English, French, German, Italian, Arabic, Spanish, Hungarian, Japanese, Thai, Norwegian, Chinese	This corpus contains paragraph readings by participants both in their L1 and in as many L2s as they felt they could manage. The corpus is available for download from Ortolang. For the relevant publication, see Hirst et al. (2013).	Download
GeWiss Size: 1.4 million tokens Annotation: code switching	German (L2 and L1), English, Polish, Italian (L1)	This corpus contains L1 and L2 transcripts and audio recordings of spoken German academic discourse, as well as L1 data of spoken English, Polish, and Italian academic discourse. For the relevant publication, see Fandrych et al. (2014)	Browse

Video and multimodal corpora

Corpus	Language	Description	Availability
TAITO: Written and Oral Data of the TAITO-project Licence: Under Negotiation	English, French, German, Italian, Swedish	This corpus contains texts written by undergraduate students at the beginning of their studies and videotaped discussions.
YKI National Certificates corpus Licence: CLARIN RES	Italian, Swedish, Spanish, English, Finnish, German, French, Russian	This corpus contains written and speech tasks.

Other L2 Learner Corpora

An additional 128 L2 learner corpora that are not part of the CLARIN infrastructure are listed on the website of the Catholic University of Louvain.

See also LADDER (Learners' digital communication: a corpus for pragmatic competences in Italian L1/L2). This downloadable corpus consists of emails and instant messages written by (i) German learners of Italian between CEFR levels A2 and C1, most of whom are students living in Tyrol (Austria), and (ii) native speakers of Italian, most of whom are students from Rome (Italy). See also Brocca (2021) for a related publication.

Additional Materials

CLARIN workshop on Interoperability of Second Language Resources and Tools, 6-8 December 2017, Gothenburg, Sweden. [html]

Publications on L2 Learner Corpora

[Barraja-Rohan 2013] Anne-Marie Barraja-Rohan. 2013. Second Language Interactional Competence and its Development: A Study of International Students in Australia.

[Bel et al. 2016] Aurora Bel, Estela García-Alcaraz, and Elisa Rosado. 2016. Reference comprehension and production in bilingual Spanish. In Language Acquisition Beyond Parameters: Studies in honour of Juana M. Liceras, edited by Anahí Alba de la Fuente, Elena Valenzuela, and Cristina Martínez Sanz, 37–70.

[de Benito 2016] Estrella Nicolás de Benito. 2016. La adquisición del sintagma determinante en español por niños de lengua materna árabe marroquí. Doctoral dissertation.

[Chang et al. 2013] A. Chang, Z.H. Feng, and W.C. Yang. 2013. A new multimedia shared L2 spoken Mandarin Chinese corpus: construction and linguistic analyses. In Proceedings of the 21st Annual Meeting of the International Association of Chinese Linguistics.

[Chambers and Richards 1995] Francine Chambers and Brian Richards. 1995. The "free conversation" and the assessment of oral proficiency. Language Learning Journal, 11: 6–10.

[Dimroth 2008] Christine Dimroth. 2008. Age Effects on the Process of L2 Acquisition? Evidence From the Acquisition of Negation and Finiteness in L2 German. Language Learning, 58 (1): 117–150.

[Díaz 2002] Lourdes Díaz. 2002. Interferencias discursivas de hablantes bilingües castellano/catalán: uso oral y escrito. In Seminari sobre les llengües i educació de l’Estat, edited by J. Perera.

[Hirst et al. 2013] Daniel Hirst, Brigitte Bigi, Hyongsil Cho, Hongwei Ding, Sophie Herment, Ting Wang. 2013. Building OMProDat: an open multilingual prosodic database.

[Hilton 2009] Heather Hilton. 2009. Annotation and Analyses of Temporal Aspects of Spoken Fluency. CALICO Journal, 26 (3): 644–661.

[Kubanek-German 2000] Angelika Kubanek-German. 2000. Early Language Programmes in Germany. In An Early Start: Young Learners and Modern Languages in Europe and Beyond.

[Jantunen 2011] Jarmo Harri Jantunen. 2011. Kansainvälinen oppijansuomen korpus (ICLFI): typologia, taustamuuttujat ja annotointi.

[Liceras 2017] Juana M. Liceras. 2017. Herramientas para abordar el análisis de la gramática no nativa de los inmigrantes. In La formación de los docentes de español para inmigrantes en distintos contextos educativos, edited by Dimitrinka Georgieva Níkleva.

[Liceras et al. 1999] J.M. Liceras, E. Valenzuela, and L. Díaz. 1999. L1/L2 Spanish grammars and the pragmatic deficit hypothesis. Second Language Research, 15 (2): 161–190.

[MacWhinney 2016] Brian MacWhinney. 2016. A Shared Platform for Studying Second Language Acquisition. Language Learning, 67 (1).

[Markee 2000] Numa P. Markee. 2000. Conversation Analysis. Mahwah, New Jersey: Erlbaum.

[Medina Soler 2017] Isabela Medina Soler. 2017. La atenuación en el discurso oral de estudiantes de e/le universitarios con nivel b1 en contexto de inmersión para los actos de habla disentivo.

[Mitchell et al. 2008] Rosamond Mitchell, Laura Domínguez, María Arche, Florence Myles, and Emma Marsden. 2008. SPLLOC: A new corpus for Spanish second language acquisition research. Eurosla Yearbook, 8 (1): 287–304.

[Muñoz 2006] Carmen Muñoz (editor). 2006. Age and the Rate of Foreign Language Learning. Great Britain: Cromwell Press Ltd.

[Orr and Quené 2017] Rosemary Orr and Hugo Quené. 2017. D-LUCEA: Curation of the UCU Accent Project Data.

[Péters 2017] Hugues Péters. 2017. Comportements d'autocorrection et d'hésitation manifestés par les apprenants de FLE au cours de conversations orales spontanées. Bulletin VALS-ASLA, special issue, 2: 133–145.

[Rosen 2016] Alexandr Rosen. 2016. Building and using corpora of non-native Czech.

[Theodórsdóttir 2018] Guðrún Theodórsdóttir. 2018. L2 Teaching in the Wild: A Closer Look at Correction and Explanation Practices in Everyday L2 Interaction. The Modern Language Journal, 102 (1).

[Tracy-Ventura and Huensch 2018] Nicole Tracy-Ventura and Amanda Huensch. 2018. The potential of publicly shared longitudinal learner corpora in SLA research. In Critical Reflections on Data in Second Language Acquisition, edited by Aarnes Gudmestad and Amanda Edmonds, 149–170.

[Vercellotti 2017] Mary Lou Vercellotti. 2017. The Development of Complexity, Accuracy, and Fluency in Second Language Performance: A Longitudinal Study. Applied Linguistics, 38 (1): 90–111.

[Volodina et al. 2016] Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, and Monica Sandell. 2016. SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies. In Proceedings of LREC 2016, Slovenia.

[Volodina et al. 2019] Elena Volodina, Lena Granstedt, Arild Matsson, Beáta Megyesi, Ildikó Pilán, Julia Prentice, Dan Rosén, Lisa Rudebeck, Carl-Johan Schenström, Gunlög Sundberg, and Mats Wirén. 2019. The SweLL Language Learner Corpus: From Design to Annotation. Northern European Journal of Language Technology, Special Issue. (Non-final version)

[Vergara Padilla 2017] María Ángeles Vergara Padilla. 2017. La influencia de las tipologías textuales en la fluidez. Las presentaciones académicas orales de aprendientes estadounidenses de ele.

[Walter 2020] Daniel Walter. 2020. Student Uses of the First Language for L2 Classroom Interactions.

[Young-Scholten and Langer 2015] Martha Young-Scholten and Monika Langer. 2015. The role of orthographic input in second language German: Evidence from naturalistic adult learners’ production. Applied Psycholinguistics, 36 (1): 93–114.

[Zhang 2009] Yanhui Zhang. 2009. A Tutor for Learning Chinese Sounds through Pinyin (Unpublished Doctoral Dissertation). Carnegie Mellon University.

[Zhao and MacWhinney 2009] Yun Zhao and Brian MacWhinney. 2009. Competing Cues: A Corpus-based Study of the English Tense-Aspect in Second Language Acquisition. In Proceedings of the 34th annual Boston University Conference on Language Development, edited by Katie Franich, Kate M. Iserman, and Lauren L. Keil, 503–514.