Disclaimer: This project — including all code, analysis, and the research report — was implemented entirely by AI (Claude Code). Matthew Khoriaty (@AMindToThink) directed the research questions and reviewed the outputs but cannot fully vouch for the correctness of the implementation or statistical analysis. The full Claude Code conversation logs are available in
`logs/conversation/` for transparency.
Systematic investigation of whether modern LLMs exhibit priming effects analogous to those documented in human psycholinguistics. Tests syntactic, semantic, lexical, temporal decay, and pragmatic priming across GPT-4o-mini and Claude 3.5 Haiku.
Key finding: GPT-4o-mini exhibits a strong, dose-dependent dative syntactic priming effect (0% → 30%, p < 0.0001), mirroring cumulative priming in humans.
See REPORT.md for the full write-up.
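As a sanity check on the headline number: a 0% → 30% shift corresponds to a Cohen's h of roughly 1.16, a large effect by conventional thresholds (h > 0.8). The snippet below is a standalone sketch of the standard arcsine-transform formula, not the code in `analysis/stats.py`:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Effect size for a difference between two proportions (arcsine transform)."""
    phi = lambda p: 2 * math.asin(math.sqrt(p))
    return abs(phi(p2) - phi(p1))

# The headline dative result: 0% dative responses unprimed vs. 30% after 5 primes
h = cohens_h(0.0, 0.30)
print(round(h, 2))  # → 1.16
```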
```bash
# Install dependencies
uv sync --all-extras

# Set your OpenRouter API key
export OPENROUTER_API_KEY_mk_era_1=sk-or-...

# Dry run (2 trials, one model)
uv run python -m experiments.runner --trials 2 --models gpt4o-mini

# Full run (~2,500 API calls, both models)
uv run python -m experiments.runner

# Run a single experiment
uv run python -m experiments.runner exp1 --trials 30

# Generate figures from saved results
uv run python -c "
import pandas as pd
from pathlib import Path
from analysis.plots import generate_all_figures
all_results = {p.stem: pd.read_parquet(p) for p in Path('data/processed').glob('*.parquet')}
generate_all_figures(all_results)
"
```
```bash
# Run tests
uv run pytest tests/ -v
```

```
priming_effects/
├── src/
│   ├── config.py             # Model specs, API key, experiment defaults
│   ├── client.py             # Async OpenRouter client with caching + retries
│   ├── cache.py              # SHA-256 disk cache for API responses
│   ├── classifiers.py        # Voice, dative, semantic, register classifiers
│   └── stimuli.py            # All experimental stimuli
├── experiments/
│   ├── base.py               # BaseExperiment ABC
│   ├── exp1_syntactic.py     # Syntactic priming (voice + dative, dose 0/1/3/5)
│   ├── exp2_semantic.py      # Semantic field leakage
│   ├── exp3_lexical_boost.py # Shared vs. different verb priming
│   ├── exp4_decay.py         # Priming persistence over filler turns
│   ├── exp5_pragmatic.py     # Hedging/formal/assertive register mirroring
│   └── runner.py             # CLI entry point
├── analysis/
│   ├── stats.py              # Chi-squared, Cohen's h, bootstrap CIs, decay fit
│   └── plots.py              # 7 figures
├── tests/
│   ├── test_classifiers.py
│   └── test_stats.py
├── data/processed/           # Result DataFrames (parquet)
├── figures/                  # Output plots (PNG)
└── REPORT.md                 # Full research report
```
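The layout above mentions a SHA-256 disk cache in `src/cache.py`. A minimal sketch of that pattern (class name, key fields, and directory are assumptions, not the actual implementation): hash the canonical JSON of the request and store the response under that digest:

```python
import hashlib
import json
from pathlib import Path

class DiskCache:
    """Content-addressed disk cache: the SHA-256 of the request payload is the
    filename. (Illustrative sketch; the real cache.py may differ in layout.)"""

    def __init__(self, root: str = "data/cache"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _key(self, payload: dict) -> str:
        # Canonical JSON (sorted keys) so logically equal requests hash identically
        blob = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get(self, payload: dict):
        path = self.root / f"{self._key(payload)}.json"
        return json.loads(path.read_text()) if path.exists() else None

    def put(self, payload: dict, response: dict) -> None:
        path = self.root / f"{self._key(payload)}.json"
        path.write_text(json.dumps(response))
```

Content-addressing means identical prompts hit the cache across runs, which is what makes repeated full runs much cheaper than the first.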
| # | Experiment | What it tests | Key result |
|---|---|---|---|
| 1 | Syntactic Priming | Dose-response (0,1,3,5 primes) for voice + dative | Dative: 0%→30% (GPT-4o-mini, p<0.0001) |
| 2 | Semantic Priming | Domain vocabulary leakage | Null — models compartmentalize well |
| 3 | Lexical Boost | Shared verb amplification | Floor effect (0% passive) |
| 4 | Decay Curve | Priming over 0–10 filler turns | Floor effect |
| 5 | Pragmatic Style | Register mirroring | Near-zero with keyword classifier |
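The decay fit listed under `analysis/stats.py` is not reproduced here, but the usual approach for Experiment 4-style data is to fit rate = a·exp(−k·turn). A minimal log-linearized least-squares sketch (function name and method are assumptions, and any data you feed it would be illustrative, not the null result reported above):

```python
import math

def fit_exponential_decay(turns, rates):
    """Least-squares fit of rate = a * exp(-k * turn) via log-linearization.
    (Illustrative sketch; analysis/stats.py may use a different routine.)
    Requires strictly positive rates, since it regresses on log(rate)."""
    xs = list(turns)
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - slope * mx)  # intercept back-transformed out of log space
    k = -slope                     # decay constant (larger = faster forgetting)
    return a, k
```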
Both models are accessed via OpenRouter (https://openrouter.ai/api/v1):

- `openai/gpt-4o-mini`
- `anthropic/claude-3.5-haiku`
32 unit tests covering classifiers (voice, dative, semantic overlap, register) and statistical functions (Cohen's h, bootstrap CIs, chi-squared, decay fitting):
```bash
uv run pytest tests/ -v
```
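For reference, a keyword-based register classifier of the kind Experiment 5 relies on can be sketched as follows; the word lists, labels, and function name are illustrative assumptions, not taken from `src/classifiers.py`:

```python
# Hypothetical keyword-based register classifier (illustrative; the real
# classifiers.py may use different word lists and categories).
HEDGES = {"might", "perhaps", "possibly", "seems", "could", "arguably"}
ASSERTIVE = {"definitely", "certainly", "clearly", "must", "undoubtedly"}

def classify_register(text: str) -> str:
    """Label text by whichever marker set dominates: hedging, assertive, or neutral."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    hedged = len(words & HEDGES)
    assertive = len(words & ASSERTIVE)
    if hedged > assertive:
        return "hedging"
    if assertive > hedged:
        return "assertive"
    return "neutral"
```

A classifier this coarse only fires on explicit marker words, which is one plausible reason Experiment 5's mirroring effect came out near zero.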