https://aginglexicon.github.io/resources/
Please provide the commonly used name (and acronym) for the resource, together with a link, reference and short description (language, nature of participants, size of the data). If you are aware of any new initiatives or work in preparation that might be of interest, you can include it here.
- TalkBank: The goal of TalkBank is to foster fundamental research in the study of human communication. It contains a number of diverse speech and text corpora. Some are public and some require contacting TalkBank for permission.
- BYU corpora: Collection of free and commercial corpora in English, Spanish and Portuguese, compiled by Mark Davies
- SUBTLEX-NL: frequencies based on Dutch subtitles
- SUBTLEX-US: frequencies based on American English subtitles
- SUBTLEX-CH: frequencies based on Chinese subtitles
- SUBTLEX-ESP: frequencies based on Spanish subtitles
- SUBTLEX-DE: frequencies based on German subtitles
- SUBTLEX-GR: frequencies based on Greek subtitles (Dimitropoulou et al., 2010)
- SUBTLEX-UK: frequencies based on British English subtitles
- SUBTLEX-PL: frequencies based on Polish subtitles
- Age-of-acquisition (AoA) norms for over 50 thousand English words
- Affective ratings for nearly 14 thousand English words
- MacArthur-Bates Communicative Development Inventories
- University of South Florida (USF) Free Association Norms: English word association norms for approx. 5,000 cues.
- Small World of Words: Word association norms for over 12,000 cues in English and Dutch.
- Leuven concept data: norms for over 400 concrete nouns including typicality, similarity within particular domains, category naming data, exemplar generation data, frequency, AoA, etc.
- SNAUT: Interface and access to semantic vectors for Dutch and English based on word2vec
- Latent Semantic Analysis: Interface to obtain semantic similarity for words and documents
- ESPAL: phonology, part-of-speech, subtitle frequencies, etc. in Castilian and Latin American Spanish
- Erin Buchanan's word norms: Concept features, LSA and BEAGLE similarity estimates
- English Lexicon Project (ELP)
- Dutch Lexicon Project (DLP)
- Dutch Lexicon Project 2 (DLP2)
- BALDEY (Auditory Lexical Decision in Dutch)
- French Lexicon Project (FLP)
- British Lexicon Project (BLP)
- Provo Corpus: A Large Eye-Tracking Corpus with Predictability Norms
- Ghent Eye-Tracking Corpus (GECO): Includes bilingual data
- Eye tracking in young and older adults
- CMU fMRI dataset: 60 concrete concepts in 12 categories, collected while nine English speakers were presented with 60 line drawings of objects with text labels and were instructed to think of the same properties of the stimulus object consistently during each presentation. For each concept there are 6 instances of ~20k neural activity features (brain blood oxygenation levels).
- Trento EEG dataset for 60 concepts: concepts in 2 categories (work tools and land mammals), collected while seven Italian speakers silently named photographic images representing these concepts. For each concept there are 6 instances of ~15k neural activity features (spectral power in voltage signals).
- tm package: Text Mining in R
- NetworkToolbox: an R package to analyze brain, cognitive, and psychometric networks
- SemNetCleaner: an R package for automated cleaning of semantic fluency data
- spaCy: Industrial-Strength Natural Language Processing in Python
- Prolific Academic: research-focussed crowd-sourcing platform
- Executive control/Inhibition measures from an individual differences study with young and old adults: https://osf.io/rygex/
- Meta-analysis of aging effects on inhibition tasks: [data and analysis script](https://osf.io/fthku/)
- Nun Study
Overview and commentary papers, and references to the resources