Python utilities for the Stanford RNA 3D Folding Part 1 Kaggle competition. These scripts handle preparing submissions, scoring predictions, and converting between formats.
prepare_3d_sub.py is the main submission preparation tool. Given a CSV of target sequences and a set of PDB/mmCIF structure files, it extracts the C1′ coordinates of each residue for each model and packages them into the competition submission CSV format (columns ID, resname, resid, x_1/y_1/z_1, …, x_5/y_5/z_5). It fills missing residues with the sentinel value -1e18, supports optional target-ID remapping via --id_map, and can also produce solution files with Usage labels.
prepare_3d_sub.py -t test_sequences.csv -s model1.pdb model2.pdb ... -o submission.csv
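The row layout described above can be sketched as follows. This is a minimal illustration, not the code in prepare_3d_sub.py: the function name `build_rows`, the `<target_id>_<resid>` ID format, the input shape (one dict per model, mapping residue number to coordinates), and the choice to fill unsupplied models with the sentinel are all assumptions made for the example.

```python
MISSING = -1e18  # sentinel for missing residues, as stated in the description
N_MODELS = 5     # the submission format carries x_1..x_5, y_1..y_5, z_1..z_5


def build_rows(target_id, sequence, models):
    """Build submission rows for one target (hypothetical helper).

    models: list of up to 5 dicts mapping 1-based residue number -> (x, y, z).
    The "<target_id>_<resid>" ID format is an assumption of this sketch.
    """
    rows = []
    for resid, resname in enumerate(sequence, start=1):
        row = {"ID": f"{target_id}_{resid}", "resname": resname, "resid": resid}
        for m in range(N_MODELS):
            # A model that was not supplied, or lacks this residue, gets the sentinel.
            coords = models[m].get(resid) if m < len(models) else None
            x, y, z = coords if coords is not None else (MISSING,) * 3
            row[f"x_{m + 1}"] = x
            row[f"y_{m + 1}"] = y
            row[f"z_{m + 1}"] = z
        rows.append(row)
    return rows


# One model that is missing residue 2 of a three-residue target.
rows = build_rows("R1107", "GGA", [{1: (1.0, 2.0, 3.0), 3: (7.0, 8.0, 9.0)}])
```

Each row is a plain dict, so the list can be passed to `csv.DictWriter` or `pandas.DataFrame` to write the CSV.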
ribonanza_tmscore.py scores a submission CSV against a solution CSV using TM-score (via USalign), lDDT (via OpenStructure), or Kabsch RMSD. It mirrors the Kaggle evaluation metric: for each target, the best score across up to 40 native structures and 5 predicted models is used. It reports public, private, and ignored split scores separately and writes per-target scores to a .TMscores.csv file.
ribonanza_tmscore.py --solution solution.csv --submission submission.csv
ribonanza_tmscore.py --solution solution.csv --submission submission.csv --rmsd
ribonanza_tmscore.py --solution solution.csv --submission submission.csv --lddt
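The best-of selection described above can be illustrated with a short sketch. It does not call USalign or OpenStructure: `toy_score` is a hypothetical stand-in for a real metric, and `best_score` with its `higher_is_better` switch is an assumed helper, not a function of ribonanza_tmscore.py.

```python
def best_score(score_fn, natives, models, higher_is_better=True):
    """Return the best score over every (native, model) pairing for one target."""
    scores = [score_fn(native, model) for native in natives for model in models]
    return max(scores) if higher_is_better else min(scores)


def toy_score(native, model):
    """Hypothetical stand-in for a real metric; structures are plain numbers here."""
    return 1.0 / (1.0 + abs(native - model))


# Two "native" structures and three "predicted" models for one target.
best = best_score(toy_score, natives=[0.0, 10.0], models=[1.0, 4.0, 9.5])
```

For a distance-like metric such as RMSD, where lower values are better, the same helper would presumably be called with `higher_is_better=False`.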
save_pdb_arena.py converts a submission CSV back to PDB format (C1′ atoms only, one file per target per model) and optionally runs each file through Arena for all-atom reconstruction. Output files are organized into a subdirectory named after the submission file.
save_pdb_arena.py submission.csv --output_dir ./pdbs/
save_pdb_arena.py submission.csv --skip_arena # just write C1'-only PDBs
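A C1′-only PDB file is a list of fixed-width ATOM records, one per residue. The sketch below shows one way to format them from submission rows. The helper names, the default chain ID "A", the occupancy and B-factor values, and the decision to skip residues holding the -1e18 sentinel are assumptions of this example, not confirmed behavior of save_pdb_arena.py.

```python
MISSING = -1e18  # sentinel used for missing residues in the submission CSV


def c1_atom_line(serial, resname, resid, x, y, z, chain="A"):
    """Format one fixed-width PDB ATOM record for a C1' atom."""
    return (
        f"ATOM  {serial:5d}  C1' {resname:>3s} {chain}{resid:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {'C':>2s}"
    )


def model_to_pdb(rows, model):
    """Return PDB text for one model (1-5) from a list of submission row dicts."""
    lines = []
    for row in rows:
        x, y, z = (row[f"{axis}_{model}"] for axis in "xyz")
        if x <= MISSING:
            continue  # assumption: residues marked missing are left out of the file
        lines.append(c1_atom_line(len(lines) + 1, row["resname"], row["resid"], x, y, z))
    lines.append("END")
    return "\n".join(lines) + "\n"


rows = [
    {"resname": "G", "resid": 1, "x_1": 1.0, "y_1": 2.0, "z_1": 3.0},
    {"resname": "C", "resid": 2, "x_1": MISSING, "y_1": MISSING, "z_1": MISSING},
]
pdb_text = model_to_pdb(rows, 1)
```

Skipping sentinel rows also avoids overflowing the 8-character coordinate columns, which cannot hold a value as large as -1e18.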
merge_from_original_csv.py patches a large "full" submission CSV with rows from a smaller "original" CSV, matching on the ID column. This is useful for replacing placeholder predictions with finalized ones while keeping the rest of the full submission intact.
merge_from_original_csv.py --full_csv full.csv --original_csv patch.csv --output_csv merged.csv
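The patching step can be sketched with pandas. This is a minimal illustration under stated assumptions, not the script's actual logic: `patch_by_id` is a hypothetical helper, IDs are assumed unique in both tables, the patch is assumed to use a subset of the full table's columns, and patch IDs absent from the full table are ignored.

```python
import pandas as pd


def patch_by_id(full, patch):
    """Overwrite rows of `full` with rows of `patch` that share an ID."""
    full_idx = full.set_index("ID")    # new frame; the caller's `full` is untouched
    patch_idx = patch.set_index("ID")
    # Assumption: patch IDs that do not occur in the full table are ignored.
    common = patch_idx.index.intersection(full_idx.index)
    full_idx.loc[common, patch_idx.columns] = patch_idx.loc[common]
    return full_idx.reset_index()


full = pd.DataFrame({"ID": ["A_1", "A_2", "B_1"], "x_1": [0.0, 0.0, 0.0]})
patch = pd.DataFrame({"ID": ["A_2", "C_9"], "x_1": [5.0, 9.0]})
merged = patch_by_id(full, patch)
```

Row order follows the full table, so the merged file keeps the layout of the original full submission.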
nan_to_zeros.py replaces all NaN values in one or more CSV files with 0.0 and writes the output to <filename>.nan_to_zeros.csv. It is a quick fix for submission files that contain missing values where zeros are expected.
nan_to_zeros.py submission.csv
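The core of this replacement can be expressed in a few lines of pandas. The sketch below works on an in-memory CSV so that it runs without any files. The sample data is made up for illustration, and the snippet is not taken from nan_to_zeros.py.

```python
import io

import pandas as pd

# Made-up sample with two empty cells, which pandas reads as NaN.
csv_text = "ID,x_1,y_1\nA_1,1.5,\nA_2,,2.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Replace every NaN with 0.0; all other values are left unchanged.
cleaned = df.fillna(0.0)
```

Calling `cleaned.to_csv(path, index=False)` would then write the result without adding an index column.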
- Python 3
- biopython (for PDB/mmCIF parsing in prepare_3d_sub.py)
- pandas, numpy
- USalign (must be on PATH for TM-score calculation)
- Arena (must be on PATH for all-atom reconstruction)
- OpenStructure (optional, for --lddt scoring; called directly via a local OpenStructure Python, so no Docker/Singularity is required)
- daslab_tools (optional, for --lddt/--gdt: provides lddt.py and gdt.py; set DASLAB_TOOLS_STRUCTURE or put them on PATH)