Add Schrag2026Pediatric SSVEP dataset (n=47, ages 5-18) #1069
bruAristimunha wants to merge 2 commits into develop
Pediatric SSVEP-BCI dataset from Schrag et al. (2026): 47 children
(ages 5-18, 40.4% female) recorded with g.tec g.GAMMAsys + g.USBamp
at 256 Hz on 16 scalp channels. Two-stage protocol:
- Stimulus personalization (12 stimuli, 4 contrasts x 3 sizes at 10 Hz)
- Online 4-target SSVEP game (6.25 / 10 / 11.11 / 14.28 Hz),
played twice per subject (personal vs standard stimulus, two maps).
By default the loader exposes the SSVEP game runs as two sessions
("0standard", "1personal") with 5 s trials at four target frequencies;
include_personalization=True opens a third "2personalization" session
(all trials labelled "10" -- the shared 10 Hz flicker).
Trial labels for the game come from the live fbCCA classifier output
(Selected SPO column in the per-game movement CSV); this is documented
in the class docstring as not-quite-ground-truth. Trial / CSV count
drift is min-truncated when small (<= 10 percent), otherwise the run's
labels are dropped to avoid silent shifts.
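
The drift rule above can be sketched as follows. This is a minimal illustration, not the loader's actual code: `align_labels` and `max_drift` are hypothetical names, and measuring drift relative to the larger of the two counts is an assumption.

```python
import logging

log = logging.getLogger(__name__)


def align_labels(n_trials, labels, max_drift=0.10):
    """Pair EEG trials with CSV labels, tolerating a small count mismatch.

    Returns (n_keep, labels) on success, or (0, None) when the mismatch is
    too large to trust the pairing.
    """
    n_labels = len(labels)
    if n_trials == 0 or n_labels == 0:
        return 0, None
    # Assumption: drift is measured relative to the larger count.
    drift = abs(n_trials - n_labels) / max(n_trials, n_labels)
    if drift > max_drift:
        log.error("trial/CSV drift of %.0f%% is too large; dropping labels", 100 * drift)
        return 0, None
    n_keep = min(n_trials, n_labels)  # min-truncate both sides
    return n_keep, list(labels[:n_keep])
```

Truncating to the shorter side only makes sense if the extra entries sit at the end of the run; a large mismatch suggests trials are missing somewhere in the middle, which is why the sketch refuses to pair them.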
Data hosted as a single 1.2 GB Zenodo zip (10.5281/zenodo.19440997,
CC-BY-ND-4.0); per-subject extraction is staged via tempfile +
os.replace for race-safe concurrent runs.
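
A rough sketch of the staging pattern, assuming one archive prefix per subject. `extract_subject` here is an illustrative stand-in using only the standard library `zipfile`; the PR itself goes through `safe_extract_zip`, whose exact signature is not reproduced here.

```python
import os
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_subject(zip_path, prefix, dest):
    """Extract archive members under `prefix` into `dest` atomically."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already extracted: idempotent no-op
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stage in a sibling directory so os.replace stays on one filesystem.
    staging = Path(tempfile.mkdtemp(dir=dest.parent, prefix=dest.name + ".tmp"))
    try:
        with zipfile.ZipFile(zip_path) as zf:
            members = [m for m in zf.namelist() if m.startswith(prefix)]
            zf.extractall(staging, members=members)
        try:
            os.replace(staging, dest)  # single rename into the final location
        except OSError:
            # Another worker may have won the race; its copy is equivalent.
            if not dest.exists():
                raise
    finally:
        # No-op after a successful rename; cleans up on failure or lost race.
        shutil.rmtree(staging, ignore_errors=True)
    return dest
```

Readers never observe a half-extracted subject directory, because `dest` only appears once the rename succeeds.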
Preprint DOI: 10.21203/rs.3.rs-9347306/v1
- Add moabb/datasets/schrag2026.py
- Register in moabb/datasets/__init__.py and summary_ssvep.csv
- Add to docs/source/api.rst SSVEP autosummary and whats_new.rst
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1446b3ff31

    if zip_path.suffix != ".zip":
        target = zip_path.with_suffix(".zip")
        if not target.exists():
            zip_path.rename(target)
        zip_path = target

Preserve the downloader cache filename
Because the Zenodo API URL ends in /content, data_dl caches this archive at a path named content; renaming it to content.zip removes the exact file that data_dl checks on the next call. Since data_path() calls data_dl() before checking whether the subject was already extracted, any later access to an already-extracted subject will still re-download the full ~1.2 GB archive every time. Keep the cached path intact or download/cache under the final zip filename instead.
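
One possible shape for the fix, sketched under assumptions: check for the extracted subject before touching the downloader, and open the archive under whatever name the downloader cached it. `subject_path`, `download`, and `extract` are hypothetical names standing in for the real `data_path` / `data_dl` / extraction steps.

```python
from pathlib import Path


def subject_path(subject_dir, download, extract):
    """Return the extracted subject directory, downloading only if needed.

    `download` and `extract` are hypothetical callables standing in for the
    real downloader and extraction step.
    """
    subject_dir = Path(subject_dir)
    if subject_dir.is_dir():
        return subject_dir  # cache hit: skip the downloader entirely
    # Keep the filename the downloader chose (e.g. "content"); no rename.
    archive = Path(download())
    # zipfile.ZipFile reads by content, so the missing ".zip" suffix is fine.
    extract(archive, subject_dir)
    return subject_dir
```

Because the cached file keeps its original name, the downloader's own existence check still succeeds on later calls for subjects that have not been extracted yet.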
Audit of the dataset class showed six private helpers each used by
exactly one caller, which obscured the linear flow when reading
top-down. Inline the small ones; keep helpers used by both loaders.
- Inline _normalize_spo (now: ``_match_freq`` next to its caller)
- Inline _personalization_label as a 2-line ``rsplit`` in the loader
- Inline _movement_csv_for_eeg as a path expression at the call site
- Inline _wanted_session_keys, _find_game_files, _safe_pair_count
- Move _load_xdf_streams, _read_unity_markers, _build_raw, _load_*_run,
_extract_subject to module level so the file reads top-to-bottom
- Rename ``marker_text`` -> ``markers`` and ``start_idx`` -> ``trial_starts``
to match the variable names used in the upstream Schrag/Comaduran
reference notebooks (``epoching-example.ipynb`` in the Zenodo
deposit)
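
For readers without the diff open, a frequency-matching helper of this kind might look like the sketch below. It is not the PR's `_match_freq`: the assumption that a `Selected SPO` cell parses as a frequency in Hz, the `tol` parameter, and the string event names other than `"10"` are all illustrative.

```python
# Assumed event names; only "10" is confirmed by the PR text.
TARGET_FREQS = ("6.25", "10", "11.11", "14.28")


def match_freq(value, tol=0.05):
    """Map a raw CSV cell to the nearest target frequency label, or None."""
    try:
        freq = float(value)
    except (TypeError, ValueError):
        return None  # blank or non-numeric cell
    best = min(TARGET_FREQS, key=lambda name: abs(float(name) - freq))
    # Reject values that are not close to any target frequency.
    return best if abs(float(best) - freq) <= tol else None
```

Returning `None` instead of guessing lets the caller decide whether to skip the trial or drop the run's labels.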
Behavior unchanged. All previously-verified properties hold:
- demographics still match the Zenodo CSV byte-for-byte
- METADATA fields preserved (DOI, license, freqs, n_classes, n_subjects)
- 10 percent drift safeguard still drops shifted-label runs
- include_personalization=True still yields 40 T1 trials
- session filtering ("personal" and "0standard" forms) still works
- _extract_subject is still atomic (tempfile + os.replace)
- SSVEP paradigm round-trip identical (428 trials, balanced 4 classes)
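
The session filtering property can be illustrated with a small sketch. This is an assumption about how both name forms could be resolved, not the inlined code from the PR; `wanted_session_keys` is reused here only as an illustrative name.

```python
SESSION_KEYS = ("0standard", "1personal", "2personalization")


def wanted_session_keys(sessions=None, include_personalization=False):
    """Resolve user-supplied session names to canonical session keys."""
    available = SESSION_KEYS if include_personalization else SESSION_KEYS[:2]
    if sessions is None:
        return list(available)
    # Accept both "0standard" and the bare "standard" form.
    by_bare_name = {key.lstrip("0123456789"): key for key in available}
    wanted = []
    for name in sessions:
        key = by_bare_name.get(name.lstrip("0123456789"))
        if key is None:
            raise ValueError(f"unknown session {name!r}; choose from {available}")
        if key not in wanted:
            wanted.append(key)
    return wanted
```

Matching on the full bare name keeps `"personal"` and `"personalization"` distinct, and the personalization session stays unreachable unless it was explicitly enabled.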
Summary

Adds `Schrag2026Pediatric`, a pediatric SSVEP-BCI dataset from Schrag et al. (2026) hosted on Zenodo (CC-BY-ND-4.0). 47 children (ages 5–18, 40.4% female) recorded at 256 Hz on 16 channels with the g.tec g.GAMMAsys + g.USBamp + g.GAMMAcap system, ground Fpz, earlobe reference.

Each subject contributes:
- Stimulus personalization (T1): 12 stimuli, 4 contrasts x 3 sizes, all at 10 Hz
- Online 4-target SSVEP game (6.25 / 10 / 11.11 / 14.28 Hz), played twice (personal vs standard stimulus, two maps)

By default the loader exposes the SSVEP game runs only: two sessions per subject (`"0standard"`, `"1personal"`), 5 s trials at four target frequencies. `include_personalization=True` also loads T1 as a third session (`"2personalization"`); all its trials carry the `"10"` event since every personalization stimulus flickers at 10 Hz.

Important caveat (also in the class docstring): trial labels for the game sessions come from the recorded fbCCA classifier output (the `Selected SPO` column of the per-game movement CSV), i.e. the frequency the system identified during the live game, which then drove avatar movement. They are not ground-truth target frequencies. Treating `y` as such biases benchmarks toward fbCCA's behaviour. Trial-vs-CSV count drift is min-truncated when small (≤ 10 percent); otherwise the run's labels are dropped to avoid silent shifts.

Files
- `moabb/datasets/schrag2026.py`: new dataset class + helpers
- `moabb/datasets/__init__.py`: registration
- `moabb/datasets/summary_ssvep.csv`: summary row
- `docs/source/api.rst`: autosummary entry under SSVEP datasets
- `docs/source/whats_new.rst`: changelog entry

Implementation notes
- `pyxdf` (soft-imported). Modelled after `aguilera_rodriguez2025.py` (XDF) and `kumar2024.py` (single-zip Zenodo). Marker stream is selected by name (`UnityMarkerStream`): each XDF also carries an empty `gUSBamp-1Markers` stream that wins a type-based match in some files.
- `DatasetData.zip` is downloaded once and extracted per-subject on demand via `safe_extract_zip(... members=...)` so first-use latency stays in seconds for one-subject runs. Extraction is staged into a sibling temp dir then `os.replace`-d into place, which is race-safe under concurrent pytest workers.
- Demographics (`_AGES`, `_SEXES`) hardcoded from `Participant_Demographic_Info.csv` in the deposit; verified to match byte-for-byte.
- `CC-BY-ND-4.0` per the live Zenodo deposit (the preprint PDF says CC-BY-4.0; Zenodo metadata is authoritative for the data, and a comment in the source explains the discrepancy).

Cross-checks performed
- `Participant_Demographic_Info.csv`, (c) live Zenodo API.
- `X.shape=(428, 16, 1281)`, balanced classes `{6.25: 119, 10: 116, 11.11: 95, 14.28: 98}`.
- `_extract_subject` verified idempotent; no temp leftovers after concurrent extraction races.
- `log.error` rather than silently shift them.

Style note
The class follows MOABB's existing `BaseDataset` shape, but its module-level helpers (`_load_xdf_streams`, `_read_unity_markers`, `_build_raw`, `_load_game_run`, `_load_personalization_run`, `_match_freq`, `_extract_subject`) are deliberately flat / procedural and use the variable names (`marker_ts`, `markers`, `eeg_stream`, …) from the upstream Schrag / Comaduran reference notebooks (`epoching-example.ipynb` in the Zenodo deposit), so the original authors can read it top-to-bottom.

Test plan
- `Schrag2026Pediatric()` instantiates with 47 subjects, 4 events
- `dataset_search(paradigm="ssvep", events=["10"])` finds it
- `include_personalization=True` loads T1 (40 trials per subject)
- `sessions=["personal"]` and `sessions=["0standard"]` filter correctly

References