DasLab/k1_tools

k1_tools

Python utilities for the Stanford RNA 3D Folding Part 1 Kaggle competition. These scripts handle preparing submissions, scoring predictions, and converting between formats.

Scripts

prepare_3d_sub.py

The main submission preparation tool. Given a CSV of target sequences and a set of PDB/mmCIF structure files, it extracts the C1′ coordinates of each residue for each model and packages them into the competition submission CSV format (columns ID, resname, resid, x_1/y_1/z_1, …, x_5/y_5/z_5). It fills missing residues with the sentinel value -1e18, supports optional target-ID remapping via --id_map, and can also produce solution files with Usage labels.

prepare_3d_sub.py -t test_sequences.csv -s model1.pdb model2.pdb ... -o submission.csv
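As a rough illustration of the row layout described above, the sketch below builds submission rows from already-extracted C1′ coordinates. It is not the script's code: the helper names, the `<target>_<resid>` form of the ID column, and the choice to pad absent models with the sentinel are all assumptions made for the example.

```python
import csv
import io

SENTINEL = -1e18  # fill value for missing residues, as stated in the description
N_MODELS = 5      # the format carries x_1/y_1/z_1 through x_5/y_5/z_5


def submission_header(n_models=N_MODELS):
    """Column names: ID, resname, resid, then x/y/z for each model."""
    cols = ["ID", "resname", "resid"]
    for m in range(1, n_models + 1):
        cols += [f"x_{m}", f"y_{m}", f"z_{m}"]
    return cols


def build_rows(target_id, sequence, models, n_models=N_MODELS):
    """Build one row per residue (hypothetical helper).

    models: list of dicts mapping 1-based resid -> (x, y, z) of the C1' atom.
    Residues absent from a model, and models that were not supplied at all,
    are filled with SENTINEL (the latter is an assumption of this sketch).
    """
    rows = []
    for resid, resname in enumerate(sequence, start=1):
        # Assumed ID convention: "<target_id>_<resid>"
        row = [f"{target_id}_{resid}", resname, resid]
        for m in range(n_models):
            coords = models[m].get(resid) if m < len(models) else None
            row.extend(coords if coords is not None else (SENTINEL,) * 3)
        rows.append(row)
    return rows


# Two toy models for a two-residue target; residue 2 is missing in model 2.
toy_models = [{1: (1.0, 2.0, 3.0), 2: (4.0, 5.0, 6.0)}, {1: (0.5, 0.5, 0.5)}]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(submission_header())
writer.writerows(build_rows("T1", "GC", toy_models))
print(buf.getvalue())
```

The real tool reads coordinates from PDB/mmCIF files with biopython; that parsing step is omitted here so the example stays self-contained.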

ribonanza_tmscore.py

Scores a submission CSV against a solution CSV using TM-score (via USalign), lDDT (via OpenStructure), or Kabsch RMSD. Mirrors the Kaggle evaluation metric: for each target, the best score across up to 40 native structures and 5 predicted models is used. Reports public, private, and ignored split scores separately, and writes per-target scores to a .TMscores.csv file.

ribonanza_tmscore.py --solution solution.csv --submission submission.csv
ribonanza_tmscore.py --solution solution.csv --submission submission.csv --rmsd
ribonanza_tmscore.py --solution solution.csv --submission submission.csv --lddt
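The "best score across natives and models" rule can be sketched as below. This is only an illustration of the aggregation: `toy_score` is a placeholder, not TM-score, lDDT, or RMSD, and averaging the per-target results within each Usage split is an assumption rather than something confirmed by the script.

```python
from collections import defaultdict


def best_of(predictions, natives, score_fn, higher_is_better=True):
    """Best score over every (predicted model, native structure) pair.

    For distance-like metrics such as RMSD, pass higher_is_better=False.
    """
    pick = max if higher_is_better else min
    return pick(score_fn(p, n) for p in predictions for n in natives)


def split_means(best_scores, usage):
    """Average per-target best scores within each split.

    best_scores: target -> score; usage: target -> split label.
    Taking the mean is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for target, score in best_scores.items():
        groups[usage[target]].append(score)
    return {split: sum(vals) / len(vals) for split, vals in groups.items()}


def toy_score(pred, native):
    """Placeholder similarity in (0, 1]; stands in for a real structural metric."""
    diff = sum(abs(a - b) for a, b in zip(pred, native)) / len(native)
    return 1.0 / (1.0 + diff)


# Toy 1-D "structures": two predicted models, two native conformations.
preds = [[0.0, 0.0], [1.0, 1.0]]
natives = [[1.0, 1.0], [5.0, 5.0]]
print(best_of(preds, natives, toy_score))
```

In the actual tool the scoring function would be an external call (USalign or OpenStructure) or a Kabsch superposition, with up to 5 models and up to 40 natives per target.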

save_pdb_arena.py

Converts a submission CSV back to PDB format (C1′ atoms only, one file per target per model) and optionally runs each through Arena for all-atom reconstruction. Output files are organized into a subdirectory named after the submission file.

save_pdb_arena.py submission.csv --output_dir ./pdbs/
save_pdb_arena.py submission.csv --skip_arena   # just write C1'-only PDBs
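For orientation, a C1′-only PDB file is just a series of fixed-column ATOM records. The sketch below formats such records following the standard PDB column layout. The function names, the chain ID "A", the occupancy and B-factor values, and the choice to skip sentinel-filled residues are assumptions for the example, not details taken from the script.

```python
def c1prime_atom_line(serial, resname, resid, x, y, z, chain="A"):
    """Format one C1' ATOM record using the fixed-width PDB columns."""
    return (
        f"ATOM  {serial:5d}  C1' {resname:>3s} {chain:1s}{resid:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {'C':>2s}"
    )


def model_to_pdb(residues, sentinel_cutoff=-1e17):
    """Build PDB text for one model (hypothetical helper).

    residues: iterable of (resname, resid, (x, y, z)).
    Residues whose coordinates look like the -1e18 sentinel are omitted;
    that handling is an assumption of this sketch.
    """
    lines = []
    for resname, resid, (x, y, z) in residues:
        if min(x, y, z) < sentinel_cutoff:
            continue
        lines.append(c1prime_atom_line(len(lines) + 1, resname, resid, x, y, z))
    lines.append("END")
    return "\n".join(lines) + "\n"


print(model_to_pdb([("G", 1, (1.0, -2.5, 3.25)), ("C", 2, (-1e18, -1e18, -1e18))]))
```

A file like this is what an all-atom reconstruction tool such as Arena would take as input; the Arena invocation itself is not shown because its command-line options are not given here.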

merge_from_original_csv.py

Patches a large "full" submission CSV with rows from a smaller "original" CSV, matching on the ID column. Useful for replacing placeholder predictions with finalized ones while keeping the rest of the full submission intact.

merge_from_original_csv.py --full_csv full.csv --original_csv patch.csv --output_csv merged.csv
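The patching step amounts to an ID-keyed lookup. A minimal sketch, assuming rows are dictionaries as produced by `csv.DictReader`, that the full file's row order is preserved, and that patch rows with IDs absent from the full file are ignored (the script's actual behaviour in that case is not documented here):

```python
import csv
import io


def patch_by_id(full_rows, patch_rows, key="ID"):
    """Replace rows of full_rows with same-ID rows from patch_rows.

    Hypothetical helper: order follows full_rows, and patch rows whose ID
    does not occur in full_rows are dropped (an assumption of this sketch).
    """
    patch = {row[key]: row for row in patch_rows}
    return [patch.get(row[key], row) for row in full_rows]


full_text = "ID,x_1\nT1_1,0.0\nT1_2,0.0\nT2_1,7.0\n"
patch_text = "ID,x_1\nT1_1,1.5\nT1_2,2.5\n"

full = list(csv.DictReader(io.StringIO(full_text)))
patch = list(csv.DictReader(io.StringIO(patch_text)))

for row in patch_by_id(full, patch):
    print(row["ID"], row["x_1"])
```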

nan_to_zeros.py

Replaces all NaN values in one or more CSV files with 0.0, writing output to <filename>.nan_to_zeros.csv. A quick fix for submission files that contain missing values where zeros are expected.

nan_to_zeros.py submission.csv
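The replacement itself is simple. The following standard-library sketch shows the idea on CSV text; the script presumably relies on pandas instead, and treating empty cells as missing (as pandas does by default when reading a CSV) is an assumption here, as are the function names.

```python
import csv
import io
import math


def nan_to_zero(cell):
    """Return '0.0' for empty or NaN cells, otherwise the cell unchanged."""
    if cell.strip() == "":
        return "0.0"
    try:
        return "0.0" if math.isnan(float(cell)) else cell
    except ValueError:
        return cell  # non-numeric text such as IDs or residue names


def clean_csv_text(text):
    """Apply nan_to_zero to every data cell, leaving the header row as-is."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(reader):
        writer.writerow(row if i == 0 else [nan_to_zero(c) for c in row])
    return out.getvalue()


print(clean_csv_text("ID,x_1,y_1\nR1_1,1.5,\nR1_2,nan,2.0\n"))
```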

Dependencies

  • Python 3
  • biopython (for PDB/mmCIF parsing in prepare_3d_sub.py)
  • pandas, numpy
  • USalign (must be on PATH for TM-score calculation)
  • Arena (must be on PATH for all-atom reconstruction)
  • OpenStructure (optional, for --lddt scoring; called directly through a local OpenStructure Python installation, so no Docker/Singularity is required)
  • daslab_tools (optional, for --lddt/--gdt: provides lddt.py and gdt.py; set DASLAB_TOOLS_STRUCTURE or put them on PATH)

About

Tools for the Kaggle 2025 RNA 3D structure prediction competition (k1)
