Add mid train dataset generation scripts #80

Open
JewelRoam wants to merge 5 commits into PaddlePaddle:develop from JewelRoam:midtrain
Conversation

Contributor

@JewelRoam JewelRoam commented Apr 21, 2026

Generation workflow

  • Preliminary step: use sample_lists to locate graph_list, and from that resolve the subgraph paths

Usage:

#   bash expand_graph_paths.sh
  • Generation steps:
    1. Set the TORCH_COMPILE_DEBUG environment variable, use graph_net_bench.torch.test_compiler to capture Inductor's compilation artifacts, and record the log;
    2. Based on the log, keep only the samples with a positive speedup;
    3. Strip unused code from Inductor's compilation artifacts and extract the Triton kernels;
    4. Extract the computation-graph code (currently model.py) from the original GraphNet sample directory;
    5. Pair the outputs of steps 3 and 4 and store each pair in one directory.
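Step 2 (keeping only positive-speedup samples) could be sketched as follows. The log line format shown here (`<sample> speedup: <value>`) is an assumption for illustration; the actual graph_net_bench log layout may differ:

```python
import re

# Hypothetical log format, one line per sample, e.g. "resnet18 speedup: 1.23".
# The real graph_net_bench log layout may differ.
SPEEDUP_RE = re.compile(r"^(?P<sample>\S+)\s+speedup:\s+(?P<speedup>[\d.]+)")

def filter_positive_samples(log_lines):
    """Return the names of samples whose Inductor speedup is > 1.0."""
    kept = []
    for line in log_lines:
        m = SPEEDUP_RE.match(line)
        if m and float(m.group("speedup")) > 1.0:
            kept.append(m.group("sample"))
    return kept
```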

Usage:

#   bash extract_triton_kernels.sh <source> [gpu_ids]
#
# Args:
#   source   (required): "list" or "hf"
#   gpu_ids  (optional): comma-separated GPU IDs, e.g. "0,2,5,7"
#
# Examples:
#   bash extract_triton_kernels.sh list            # list source, auto-detect GPUs
#   bash extract_triton_kernels.sh hf 0,2,5,7      # hf source, specified GPUs
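The optional `gpu_ids` argument could be handled roughly as below. This is a sketch only: the fallback via CUDA_VISIBLE_DEVICES is an assumption, and the real script's auto-detection may instead query the driver directly:

```python
import os

def parse_gpu_ids(arg=None):
    """Parse a comma-separated GPU ID string such as "0,2,5,7".

    With no argument, fall back to auto-detection; CUDA_VISIBLE_DEVICES
    is used here as a stand-in (hypothetical; the real launcher may
    query the CUDA driver instead).
    """
    if arg:
        return [int(x) for x in arg.split(",") if x.strip()]
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if visible:
        return [int(x) for x in visible.split(",")]
    return [0]  # assume a single GPU when nothing else is known
```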

JewelRoam and others added 5 commits April 21, 2026 17:54
Move the 498-line monolithic bash script logic into
tools/triton_kernel_extractor/, a structured Python module with clear
separation of concerns (config, sample enumeration, multi-GPU compilation,
speedup filtering, kernel extraction, cleanup). The bash entry script is
reduced to a thin launcher that sets machine-specific paths and delegates
to `python3 -m tools.triton_kernel_extractor`. CLI interface unchanged:
`bash extract_triton_kernels.sh <source> [gpu_ids]`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend Step 4 of the extraction pipeline to locate and pair each triton
kernel with its PTX assembly from the inductor cache.  When multiple
autotuning candidates exist, the winning configuration is identified via
the .best_config triton_cache_hash field.  Add package README documenting
the full pipeline, PTX resolution algorithm, and output structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
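The PTX resolution described in the commit above (identifying the autotuning winner via the `.best_config` `triton_cache_hash` field) could look roughly like this. The cache layout shown is hypothetical; consult the package README for the actual directory structure:

```python
import json
from pathlib import Path

def find_winning_ptx(kernel_dir):
    """Locate the PTX of the autotuning winner for one kernel.

    Assumed cache layout (hypothetical, for illustration):
      kernel_dir/<name>.best_config   JSON with a "triton_cache_hash" key
      kernel_dir/<hash>/*.ptx         compiled PTX for that candidate
    """
    best = next(Path(kernel_dir).glob("*.best_config"), None)
    if best is None:
        return None
    cache_hash = json.loads(best.read_text()).get("triton_cache_hash")
    if not cache_hash:
        return None
    return next((Path(kernel_dir) / cache_hash).glob("*.ptx"), None)
```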
- Add cache_analyzer.py: replaces analyze_inductor_cache.sh with a Python
  module that concatenates logs, computes speedup statistics, and generates
  distribution plots.
- Add 'analyze' subcommand to CLI with backward-compatible implicit 'extract'
  for the old --source-first invocation style.
- Add --enable-cache-analysis flag to run analysis after extraction pipeline.
- Harden kernel_extractor.py: guard file reads with try/except OSError,
  deduplicate kernel names across multiple output_code.py files per sample,
  remove dead KeyError from exception handler.
- Extract shared is_sample_dir() into config.py, remove duplicates from
  speedup_filter.py and cache_analyzer.py.
- Replace assert with explicit raise ValueError in pipeline.py for -O safety.
- Update README with simplified cache analysis section and CLI arguments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
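The speedup statistics mentioned in the commit above could be computed along these lines. The specific metrics (geometric mean, min, max) are assumptions; cache_analyzer.py may report different ones:

```python
import math

def speedup_stats(speedups):
    """Summary statistics over per-sample speedups.

    Uses the geometric mean, the conventional aggregate for speedup
    ratios (the exact metrics in cache_analyzer.py are an assumption).
    """
    n = len(speedups)
    geomean = math.exp(sum(math.log(s) for s in speedups) / n)
    return {"count": n, "geomean": geomean,
            "min": min(speedups), "max": max(speedups)}
```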
Thread the GraphNet inductor config template through the compilation
pipeline: CLI flag → PipelineConfig → base64-encoded --config arg on
test_compiler subprocess.  The flag is off by default; the bash launcher
enables it alongside --enable-cache-analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
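Passing the config template as a base64-encoded `--config` argument, as the commit above describes, could be sketched like this. The serialization format (JSON here) is an assumption about graph_net_bench's actual convention:

```python
import base64
import json

def encode_config(config):
    """Base64-encode a config dict for use as a --config CLI value,
    e.g. ["python3", "-m", "...test_compiler", "--config", encoded].
    JSON serialization is an assumption for illustration."""
    raw = json.dumps(config).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def decode_config(arg):
    """Inverse operation, as the subprocess side would perform it."""
    return json.loads(base64.b64decode(arg))
```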