AnimalTaskSim
Benchmark AI agents on classic animal decision-making tasks
Train and evaluate agents using task-faithful environments, public reference data, and behavioral fingerprints from decision neuroscience.
Beyond reward maximization
Standard RL benchmarks optimize for reward alone. Animal behavior is richer: it shows characteristic patterns in reaction times, error rates, and the influence of past trials on current choices.
AnimalTaskSim evaluates whether AI agents reproduce these behavioral fingerprints, not just whether they win.
Guiding Principles
- Task-faithful environments over abstract games
- Behavioral metrics over raw reward
- Reproducibility through seeded baselines
- Schema-locked logging for fair comparison
What's included
Behavioral Fingerprints
Match psychometric curves, chronometric slopes, history effects, and lapse patterns — not just reward.
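As a rough illustration of the psychometric fingerprint, choice probability is commonly modeled as a lapse-attenuated logistic of signed stimulus strength. This is a generic sketch of that curve, not the package's fitting code, and the parameter values are made up:

```python
import numpy as np

def psychometric(contrast, bias=0.0, slope=10.0, lapse=0.05):
    """P(choose right) as a lapse-attenuated logistic of signed contrast:
    lapse/2 + (1 - lapse) * sigmoid(slope * (contrast - bias))."""
    p = 1.0 / (1.0 + np.exp(-slope * (np.asarray(contrast) - bias)))
    return lapse / 2 + (1 - lapse) * p

contrasts = np.array([-0.5, -0.25, -0.125, 0.0, 0.125, 0.25, 0.5])
p_right = psychometric(contrasts, lapse=0.1)
```

The lapse term floors and ceils the curve away from 0 and 1, capturing stimulus-independent errors; matching an agent's fitted bias, slope, and lapse to animal data is a stricter test than matching accuracy alone.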
Mouse 2AFC Tasks
IBL-style two-alternative forced choice with realistic timing and contrast levels.
Macaque RDM
Random-dot motion discrimination with coherence-dependent behavior.
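Coherence-dependent accuracy and reaction times in random-dot motion are classically captured by a drift-diffusion process: evidence accumulates toward a bound at a rate proportional to coherence. The simulation below is a minimal sketch with invented parameters, not AnimalTaskSim's model:

```python
import numpy as np

def simulate_ddm(coherence, n_trials=1000, drift_gain=2.0, bound=1.0,
                 dt=0.005, noise=1.0, seed=0):
    """Drift-diffusion sketch: evidence drifts at drift_gain * coherence
    until it hits +bound (correct) or -bound (error). Returns mean
    accuracy and mean decision time."""
    rng = np.random.default_rng(seed)
    correct, rts = [], []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < bound:
            x += drift_gain * coherence * dt \
                 + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        correct.append(x > 0)
        rts.append(t)
    return np.mean(correct), np.mean(rts)

acc_low, rt_low = simulate_ddm(coherence=0.05)
acc_high, rt_high = simulate_ddm(coherence=0.50)
```

Higher coherence yields both higher accuracy and faster decisions, which is the joint accuracy/chronometric pattern a fingerprint-based benchmark can check.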
Reference Data
Benchmark against canonical datasets from decision neuroscience labs.
Reproducible Envs
Gymnasium environments that mirror lab protocols and timing exactly.
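A trial-level environment following the Gymnasium `reset()`/`step()` convention might look like the sketch below. The class name, observation layout, and reward rule are illustrative assumptions, not the package's actual API:

```python
import numpy as np

class TwoAFCTrialEnv:
    """Minimal 2AFC environment sketch using the Gymnasium-style
    reset()/step() convention (obs, reward, terminated, truncated, info).
    The real package's classes and observation format may differ."""

    def __init__(self, contrasts=(-0.25, 0.25), seed=0):
        self.contrasts = contrasts
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Draw a signed contrast for this trial
        self.contrast = self.rng.choice(self.contrasts)
        return np.array([self.contrast]), {}

    def step(self, action):
        # action 0 = left, 1 = right; rewarded if it matches the contrast sign
        correct = (action == 1) == (self.contrast > 0)
        reward = 1.0 if correct else 0.0
        return (np.array([self.contrast]), reward,
                True, False, {"correct": correct})

env = TwoAFCTrialEnv()
obs, info = env.reset()
# A trivial "ideal observer" agent: choose right when contrast is positive
obs, reward, terminated, truncated, info = env.step(int(obs[0] > 0))
```

Each episode is a single trial, which mirrors how lab sessions are logged and keeps per-trial metrics easy to compute.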
HTML Reports
Automatic evaluation scripts that score and render visual reports.
See it in action
Demo media: psychometric curves, chronometric plots, and agent training video.
Quick start
```shell
# Clone the repository
git clone https://github.com/ermanakar/animaltasksim.git
cd animaltasksim

# Install dependencies
pip install -e .

# Run interactive workflow
python -m animaltasksim.cli interactive
```