Skip to content

[improve](streaming-job) async chunk splitting for cdc source job#63079

Merged
JNSimba merged 24 commits into
apache:masterfrom
JNSimba:improve-streamingjob-split
May 19, 2026
Merged

[improve](streaming-job) async chunk splitting for cdc source job#63079
JNSimba merged 24 commits into
apache:masterfrom
JNSimba:improve-streamingjob-split

Conversation

@JNSimba
Copy link
Copy Markdown
Member

@JNSimba JNSimba commented May 8, 2026

Summary

StreamingInsertJob (CDC FROM-TO and cdc_stream TVF paths) used to call splitChunks() synchronously inside CREATE STREAMING JOB, asking cdc_client to cut every chunk of every table before returning. On large/non-uniform PK tables this can take 30+ minutes — far beyond the BE→cdc_client BRPC 60s timeout, and the SQL client blocks the whole time.

This PR makes splitting tick-driven by the FE scheduler:

  • CREATE returns immediately; no more synchronous splitChunks().
  • Each scheduler tick advanceSplits() issues one short fetchSplits RPC (default batchSize=100) and pushes that batch into remainingSplits. Tasks dispatch as soon as the first batch lands, so end-to-end first-byte latency stays close to flink-cdc's.
  • cdc_client is stateless — every RPC reconstructs ChunkSplitter from the (currentSplittingTable, nextSplitStart, nextSplitId) triple supplied by FE; flink-cdc internals are untouched (uses the public ChunkSplitter API only).
  • Crash recovery uses three sources of truth:
    • editlog persists committedSplitProgress (3-field SplitProgress) + existing chunkHighWatermarkMap / binlogOffsetPersist
    • streaming_job_meta system table holds full chunk_list JSON per table (UPSERT each advanceSplits)
    • cdc_client memory holds nothing
  • Both FROM-TO (multi-table) and TVF (single-table) paths share the same SourceOffsetProvider#initSplitProgress / noMoreSplits / advanceSplits interface; StreamingJobSchedulerTask.handlePendingState pre-advances one batch so the first task doesn't wait a full max_interval.

Detailed design lives in the linked plan.

Changes

  • fe-common: FetchTableSplitsRequest adds nextSplitStart (Object[]) / nextSplitId / batchSize.
  • fe-core:
    • SourceOffsetProvider adds 3 default methods: initSplitProgress / advanceSplits / noMoreSplits.
    • JdbcSourceOffsetProvider implements the async state machine (committed/cdc SplitProgress, advanceSplits, dedup, system-table UPSERT, replay path).
    • JdbcTvfSourceOffsetProvider.initOnCreate no longer pre-splits; relies on the same scheduler tick path.
    • StreamingInsertJob carries syncTables (@SerializedName("st")); initSourceJob / initInsertJob initialize SplitProgress; advanceSplitsIfNeed() mirrors fetchMeta error handling (PAUSE on failure).
    • StreamingJobSchedulerTask.handlePendingState / handleRunningState call advanceSplitsIfNeed() each tick; PENDING handler pre-advances and short-circuits if PAUSED.
    • StreamingJobUtils.upsertChunkList covers id-allocation via MAX(id)+1 lookup.
  • cdc_client/JdbcIncrementalSourceReader: getSourceSplits() rebuilt around the public ChunkSplitter API (no more in-memory loop / reflection hack).

Tests

  • SplitProgressTest — copy/null-field semantics.
  • JdbcSourceOffsetProviderAsyncSplitTest — covers advanceSplits (first call / continue same table / cross-table switch / dedup / empty batch), noMoreSplits, updateOffset committed-progress advancement (mid-chunk vs last chunk vs replay missing-split path), and computeCdcRemainingTables.
  • Regression case (separate commit, not in this PR yet): test_streaming_postgres_job_async_split.groovy — 100 rows × snapshot_split_size=5 → 20 splits across multiple ticks; asserts CREATE returns < 30s, full snapshot count + DISTINCT id, then INSERT/UPDATE/DELETE in binlog phase.

Test plan

  • mvn test -pl fe/fe-core -Dtest=JdbcSourceOffsetProviderAsyncSplitTest,SplitProgressTest
  • Run test_streaming_postgres_job_async_split regression locally
  • PG/MySQL non-uniform PK large-table manual test: confirm CREATE returns in seconds, SHOW STREAMING JOB immediately reflects the new job, snapshot completes, binlog phase healthy
  • FE restart mid-snapshot: confirm cdc-side resumes from system-table position, no duplicate / lost rows
  • cdc_client kill mid-snapshot: confirm FE retries on next tick, no duplicate / lost rows
  • cdc_stream TVF + StreamingInsertJob path: confirm CREATE no longer blocks

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba JNSimba changed the title [feature](streaming-job) async chunk splitting for StreamingInsertJob [improve](streaming-job) async chunk splitting for StreamingInsertJob May 8, 2026
@JNSimba JNSimba requested a review from Copilot May 11, 2026 03:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves StreamingInsertJob (CDC FROM-TO and cdc_stream TVF) snapshot chunk splitting from a synchronous CREATE STREAMING JOB path to an incremental, scheduler-tick-driven flow. The goal is to avoid long blocking CREATE times and BRPC timeouts on large / skewed PK tables by fetching snapshot splits in small batches and persisting progress for recovery.

Changes:

  • Adds split-progress APIs to SourceOffsetProvider and implements an async split state machine in JdbcSourceOffsetProvider (plus new FE tests).
  • Introduces FetchTableSplitsRequest fields to drive stateless, resumable split generation (nextSplitStart/nextSplitId/batchSize) and rebuilds cdc_client split fetching around flink-cdc ChunkSplitter.
  • Persists per-table chunk lists incrementally via StreamingJobUtils.upsertChunkList, and advances splits each scheduler tick (including a pre-advance in PENDING).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/JdbcIncrementalSourceReader.java Reworks /api/fetchSplits handling to drive flink-cdc ChunkSplitter directly (stateless batch split generation).
fe/fe-core/src/test/java/org/apache/doris/job/offset/jdbc/SplitProgressTest.java Unit tests for SplitProgress default state and deep-copy semantics.
fe/fe-core/src/test/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProviderAsyncSplitTest.java Unit tests covering async split advancement, dedup, noMoreSplits, and committed-progress advancement.
fe/fe-core/src/main/java/org/apache/doris/job/util/StreamingJobUtils.java Adds per-table chunk_list UPSERT support with id reuse / allocation.
fe/fe-core/src/main/java/org/apache/doris/job/offset/SourceOffsetProvider.java Adds default split-progress hooks (initSplitProgress, advanceSplits, noMoreSplits).
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcTvfSourceOffsetProvider.java Removes create-time pre-splitting; re-init split progress on replay; relies on scheduler-driven split fetching.
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProvider.java Implements async split progress, scheduler-driven split fetching, persistence to system table, and restart replay logic.
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingJobSchedulerTask.java Calls advanceSplitsIfNeed() each tick and pre-advances once in PENDING before dispatch.
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java Persists syncTables, initializes split progress on CREATE, and adds advanceSplitsIfNeed() that pauses job on failure.
fe/fe-common/src/main/java/org/apache/doris/job/cdc/request/FetchTableSplitsRequest.java Adds nextSplitStart, nextSplitId, and batchSize fields to support resumable batched split fetching.
Comments suppressed due to low confidence (1)

fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcTvfSourceOffsetProvider.java:320

  • replayIfNeed() comment still says snapshot splits in the meta table are "written by initOnCreate", but initOnCreate() is now an intentional no-op and meta writes come from scheduler-driven advanceSplits()/upsertChunkList. Updating this comment will avoid confusion when debugging TVF recovery behavior.
        // Re-init transient split progress fields lost across FE restart.
        // syncTables itself is persisted on StreamingInsertJob; cdcSplitProgress is rebuilt empty
        // here and advanceSplits will resume from the system table on next tick.
        if (cdcSplitProgress == null) {
            initSplitProgress(job.getSyncTables());
        }
        if (currentOffset == null) {
            // No committed txn yet. If snapshot splits exist in the meta table (written by
            // initOnCreate), restore remainingSplits so getNextOffset() returns snapshot splits
            // instead of a BinlogSplit (which would incorrectly skip the snapshot phase).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@JNSimba JNSimba requested a review from Copilot May 11, 2026 06:06
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 11, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one additional blocking issue beyond the existing review threads.

Critical checkpoint conclusions:

  • Goal/test coverage: the PR moves CDC chunk splitting from CREATE-time blocking to scheduler-driven async fetching and adds focused unit tests, but the TVF FE-restart path is not covered and can stop fetching the remaining chunks.
  • Scope: the change is focused on async split progress, though it touches both FROM-TO and TVF paths.
  • Concurrency: existing threads already cover the broad splitsLock/RPC concern; I did not add a duplicate. The new issue is lifecycle/replay state, not a new lock-order finding.
  • Lifecycle/replay: blocking issue found for cdc_stream TVF restart because transient cachedSyncTables is not restored.
  • Configuration/compatibility: no new config or wire-incompatible persistent format issue found beyond the added optional request fields.
  • Parallel paths: FROM-TO replay initializes split progress, but the TVF override does not do the equivalent unconditionally.
  • Testing: unit tests cover the base provider state machine, but miss TVF replay after FE restart. No end-to-end regression result is included in this PR.
  • Observability/performance/transactions: no additional distinct issue found beyond existing review threads.
  • User focus: no additional user-provided review focus was supplied.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

@JNSimba JNSimba marked this pull request as ready for review May 12, 2026 11:07
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 12, 2026

run buildall

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 12, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full PR diff and existing review threads. I found two additional correctness issues that are not duplicates of the existing comments.

Critical checkpoint conclusions:

  • Goal/test: The PR moves CDC snapshot split generation to scheduler-driven batches and adds tests for the base async state machine, but the cdc_stream TVF path still has an untested concurrent update path and the split-key type cache lacks collision coverage.
  • Scope/focus: The implementation is mostly focused, but it now routes both non-TVF and TVF CDC through shared async splitting state, which exposes subclass state-management gaps.
  • Concurrency: Blocking issue found. JdbcTvfSourceOffsetProvider.updateOffset() mutates inherited split lists/maps without splitsLock while scheduler-driven advanceSplits() and inherited readers use that lock.
  • Lifecycle/restart: Existing review threads already cover TVF restart/cache restoration concerns; I did not duplicate them.
  • Config: New CDC RPC/batch configs are present with descriptions and are read dynamically.
  • Compatibility/protocol: FE now sends batchSize and resume fields; existing threads already cover composite-boundary contract concerns.
  • Parallel paths: MySQL and generic JDBC split fetching were both updated; the cache-key collision issue affects the generic JDBC path and should be considered for MySQL if chunk key/database identity can vary.
  • Tests: Added tests cover base provider state, but no TVF concurrency/restart continuation test or cache-collision test covers the issues below.
  • Observability: Logs were added around split fetch and progress; no additional blocker found.
  • Persistence/transactions: Snapshot progress is persisted through the meta table/EditLog or txn attachment depending on path; existing unresolved threads cover restart truncation risks.
  • Performance: Existing review threads already cover long lock holds and repeated sorting; I did not duplicate them.

User focus: no additional user-provided review focus was supplied.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 12, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 42.86% (111/259) 🎉
Increment coverage report
Complete coverage report

@JNSimba JNSimba requested a review from Copilot May 12, 2026 12:56
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 12, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review result: request changes. I did not add new inline comments because the blocking findings I confirmed are already covered by existing review threads and should not be duplicated.

Critical checkpoint conclusions:

  • Goal/test proof: The PR aims to move CDC snapshot split generation to scheduler-driven batches and add resume/restart coverage. Tests were added, but current code still has unresolved correctness and concurrency concerns already raised in inline threads, including snapshot truncation/races and split-fetch batching behavior.
  • Scope/focus: The feature is focused, but the change is broad across FE scheduling, persisted split metadata, CDC client split generation, and TVF recovery; several edge paths still need tightening before merge.
  • Concurrency: Not OK. Existing threads cover split-state access/locking issues and long blocking RPC/internal SQL work while holding splitsLock. These can delay task commits/scheduling or race with TVF updateOffset paths.
  • Lifecycle/restart: Not OK. Existing TVF restart comments cover mid-snapshot replay/resume hazards where transient split progress/table cache reconstruction can truncate snapshot progress after restored chunks are consumed.
  • Configuration: A new dynamic FE batch-size config is read at split-fetch time, which is appropriate, but existing comments note the batch contract/enforcement is still incomplete in CDC client paths.
  • Compatibility/storage format: No new external storage format incompatibility found beyond the new persisted split-progress fields, but replay behavior must be correct across FE restart before this is safe.
  • Parallel paths: Not fully OK. MySQL, generic JDBC/Postgres, non-TVF, and cdc_stream TVF paths all need consistent batch limits, startup-mode handling, and resume semantics; existing threads identify gaps.
  • Conditional checks/error handling: Some existing concerns remain around splitId parsing, composite split boundaries, SQL literal construction, and error clarity.
  • Test coverage/results: Regression and unit tests were added, but coverage does not yet prove the unresolved edge cases from existing threads are fixed; no tests were run by this review.
  • Observability: Logging was added for split fetch/progress, sufficient for basic diagnosis, but correctness blockers remain.
  • Transaction/persistence/data correctness: Not OK until the existing replay, upsert, and split-progress concerns are resolved; these can affect snapshot completeness/duplicates after restart or retries.
  • FE/BE variable passing: The new batchSize field is passed from FE now, but existing comments show downstream enforcement still needs work.
  • Performance: Existing comments cover heavy work under splitsLock and repeated full-list sorting/materialization; these are relevant for large split counts.

User focus: No additional user-provided review focus was supplied.

Existing inline review threads should be addressed rather than duplicated here, especially the split-state concurrency/locking, TVF restart recovery, SQL upsert safety, batch-size enforcement, split-key/composite-boundary handling, and split metadata determinism issues.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

…plits and sink resolveSplitKeyClass to PG reader
@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31292 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2918b239599ffdc4159d6fe0a8e712a7be534731, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17653	4021	4020	4020
q2	q3	10772	1386	795	795
q4	4681	473	353	353
q5	7718	2256	2076	2076
q6	238	172	139	139
q7	951	766	626	626
q8	9368	1785	1582	1582
q9	5146	4966	4895	4895
q10	6370	2078	1767	1767
q11	427	280	241	241
q12	627	430	294	294
q13	18144	3343	2752	2752
q14	261	259	236	236
q15	q16	816	780	705	705
q17	932	977	866	866
q18	6968	5908	5590	5590
q19	1833	1352	1113	1113
q20	668	445	295	295
q21	6087	2850	2628	2628
q22	585	376	319	319
Total cold run time: 100245 ms
Total hot run time: 31292 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4870	4802	4709	4709
q2	q3	4830	5217	4585	4585
q4	2105	2207	1402	1402
q5	4903	4684	4655	4655
q6	240	177	127	127
q7	1863	1747	1515	1515
q8	2356	2109	2057	2057
q9	7702	7213	7254	7213
q10	4471	4391	3970	3970
q11	520	372	347	347
q12	705	727	509	509
q13	2994	3407	2769	2769
q14	264	277	255	255
q15	q16	669	693	607	607
q17	1262	1232	1236	1232
q18	7526	6788	6843	6788
q19	1120	1137	1110	1110
q20	2216	2215	1925	1925
q21	5282	4633	4517	4517
q22	528	445	402	402
Total cold run time: 56426 ms
Total hot run time: 50694 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 168963 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2918b239599ffdc4159d6fe0a8e712a7be534731, data reload: false

query5	4331	666	513	513
query6	317	215	206	206
query7	4291	553	293	293
query8	336	229	222	222
query9	8838	4039	4065	4039
query10	440	338	313	313
query11	5764	2381	2212	2212
query12	186	133	128	128
query13	1296	599	444	444
query14	5973	5306	5013	5013
query14_1	4329	4322	4312	4312
query15	204	206	181	181
query16	1005	458	430	430
query17	1166	715	578	578
query18	2468	490	353	353
query19	222	200	169	169
query20	137	131	129	129
query21	218	138	114	114
query22	13635	13557	13368	13368
query23	17272	16471	16034	16034
query23_1	16171	16082	16174	16082
query24	7635	1767	1291	1291
query24_1	1320	1304	1305	1304
query25	580	511	442	442
query26	1315	331	177	177
query27	2712	550	342	342
query28	4463	1963	1952	1952
query29	1024	640	511	511
query30	325	241	205	205
query31	1117	1080	944	944
query32	91	77	76	76
query33	555	362	306	306
query34	1173	1152	647	647
query35	768	803	675	675
query36	1350	1318	1192	1192
query37	157	108	96	96
query38	3220	3128	3021	3021
query39	922	940	915	915
query39_1	889	894	884	884
query40	243	158	126	126
query41	73	73	68	68
query42	114	111	111	111
query43	325	324	287	287
query44	
query45	218	205	202	202
query46	1047	1185	737	737
query47	2357	2369	2200	2200
query48	412	422	300	300
query49	658	510	408	408
query50	970	358	252	252
query51	4294	4262	4218	4218
query52	107	106	96	96
query53	274	296	202	202
query54	330	284	286	284
query55	100	94	89	89
query56	303	318	331	318
query57	1431	1401	1291	1291
query58	303	284	285	284
query59	1638	1614	1422	1422
query60	340	375	303	303
query61	154	152	154	152
query62	669	622	558	558
query63	235	198	208	198
query64	2388	813	619	619
query65	
query66	1721	476	342	342
query67	30098	30018	29294	29294
query68	
query69	462	339	296	296
query70	1039	1032	987	987
query71	305	278	277	277
query72	2962	2687	2377	2377
query73	842	736	436	436
query74	5079	4945	4727	4727
query75	2665	2618	2236	2236
query76	2300	1114	766	766
query77	396	413	335	335
query78	12181	12142	11551	11551
query79	1473	1003	782	782
query80	661	537	450	450
query81	458	279	247	247
query82	1390	164	129	129
query83	362	275	254	254
query84	263	138	111	111
query85	868	580	446	446
query86	408	334	300	300
query87	3436	3376	3195	3195
query88	3503	2621	2614	2614
query89	442	389	335	335
query90	1995	183	181	181
query91	177	171	153	153
query92	79	76	75	75
query93	1468	1403	876	876
query94	556	348	315	315
query95	678	384	426	384
query96	1004	803	343	343
query97	2697	2663	2575	2575
query98	238	234	241	234
query99	1118	1109	962	962
Total cold run time: 253570 ms
Total hot run time: 168963 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 69.77% (247/354) 🎉
Increment coverage report
Complete coverage report

1 similar comment
@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 69.77% (247/354) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 15, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one additional blocking issue beyond the existing review threads: async split resume casts the persisted split boundary with Jackson convertValue directly, which does not preserve the temporal conversion behavior already used when reconstructing snapshot splits for execution.

Critical checkpoint conclusions: the async split goal is mostly covered by the new scheduler-driven design and tests, but temporal split-key resume is not proven and can fail after the first batch; the change is focused but still touches high-risk split lifecycle paths; concurrency issues around split state already have existing review threads and I did not duplicate them; lifecycle/replay paths were reviewed, with this new issue affecting mid-table resume after FE/CDC JSON round-trip; the new mutable batch-size config is covered by an existing review thread; no storage-format compatibility issue was found in this pass; parallel MySQL and generic JDBC paths both need the same temporal-bound fix; regression coverage is broad but does not cover temporal chunk-key async resume; observability is adequate for this issue because the failure would surface as fetchSplits failure but the root cause would be hard to diagnose from tests; no BE transaction/version/delete-bitmap concerns apply; no additional user focus was provided.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 15, 2026

/review

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 15, 2026

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: Requesting changes for one additional correctness issue found in the current head. The existing review threads already cover the other major async split concerns, including temporal boundary conversion, TVF replay cursor restoration, invalid batch size, long lock holds, meta-table SQL safety, and duplicate snapshot reuse after a committed snapshot offset.

Critical checkpoint conclusions:

  • Goal/test coverage: The PR aims to make CDC snapshot splitting asynchronous and adds regression/unit coverage, but the empty-first-batch path can still start binlog consumption before snapshot splits exist, so the goal is not fully met.
  • Scope/focus: The change is focused on async CDC split state, but the new scheduler/provider interaction needs one more guard.
  • Concurrency: The PR improves several split-list locks; no new distinct lock-order issue was found beyond existing review threads.
  • Lifecycle/restart: Restart handling is heavily touched; no additional restart issue beyond the existing threads and the empty-first-batch dispatch issue was identified.
  • Configuration: The dynamic batch-size validation issue is already covered by an existing thread; no additional config issue found.
  • Compatibility/protocol: No additional FE-BE protocol compatibility issue found beyond existing split boundary contract threads.
  • Parallel paths: MySQL/JDBC/TVF paths were checked; no additional uncopied path issue found beyond existing threads.
  • Tests/results: Tests were reviewed at code level; no additional test-only blocker beyond existing sampler-race threads.
  • Observability/performance: Existing threads already cover the main performance/observability risks; no additional blocking issue found.
  • Data correctness: The new issue can skip the initial snapshot and lose/duplicate CDC data when the first async split fetch returns no splits.

User focus: No additional user-provided review focus was supplied.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31183 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 58ebbd61708f63480c6839fd9662d1c747f7fa06, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17799	3923	3833	3833
q2	q3	10748	1386	787	787
q4	4682	469	342	342
q5	7600	2226	2096	2096
q6	331	177	141	141
q7	945	764	625	625
q8	9345	1743	1565	1565
q9	6968	4868	5037	4868
q10	6452	2079	1784	1784
q11	430	271	240	240
q12	692	424	302	302
q13	18194	3403	2783	2783
q14	261	255	232	232
q15	q16	814	785	699	699
q17	848	901	852	852
q18	6775	5829	5552	5552
q19	1216	1218	1189	1189
q20	538	425	279	279
q21	6029	2774	2696	2696
q22	453	366	318	318
Total cold run time: 101120 ms
Total hot run time: 31183 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4608	4485	4669	4485
q2	q3	4784	5184	4585	4585
q4	2138	2342	1391	1391
q5	4844	4718	4579	4579
q6	232	189	133	133
q7	1857	1746	1470	1470
q8	2378	1878	1897	1878
q9	7167	7194	7040	7040
q10	4447	4367	3966	3966
q11	518	375	346	346
q12	700	713	508	508
q13	2963	3420	2801	2801
q14	272	278	249	249
q15	q16	676	737	604	604
q17	1250	1217	1211	1211
q18	7306	6906	6637	6637
q19	1109	1071	1110	1071
q20	2201	2196	1931	1931
q21	5255	4564	4431	4431
q22	519	468	451	451
Total cold run time: 55224 ms
Total hot run time: 49767 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169024 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 58ebbd61708f63480c6839fd9662d1c747f7fa06, data reload: false

query5	4330	640	517	517
query6	325	216	200	200
query7	4264	526	299	299
query8	321	228	230	228
query9	8834	3959	3959	3959
query10	435	363	296	296
query11	5782	2424	2164	2164
query12	194	126	123	123
query13	1291	629	444	444
query14	5980	5337	5001	5001
query14_1	4326	4316	4286	4286
query15	209	198	181	181
query16	1007	448	411	411
query17	1135	720	589	589
query18	2453	484	346	346
query19	223	203	158	158
query20	138	134	130	130
query21	213	137	120	120
query22	13655	13621	13355	13355
query23	17218	16274	16060	16060
query23_1	16121	16133	16187	16133
query24	7521	1730	1317	1317
query24_1	1322	1305	1321	1305
query25	592	493	444	444
query26	1313	329	176	176
query27	2695	562	343	343
query28	4496	1967	1942	1942
query29	979	641	508	508
query30	312	246	202	202
query31	1114	1081	931	931
query32	87	75	71	71
query33	547	360	301	301
query34	1181	1134	657	657
query35	758	785	687	687
query36	1308	1312	1196	1196
query37	157	106	90	90
query38	3206	3186	3063	3063
query39	922	912	885	885
query39_1	879	881	891	881
query40	242	151	129	129
query41	73	68	67	67
query42	112	114	111	111
query43	317	324	286	286
query44	
query45	212	211	214	211
query46	1140	1177	725	725
query47	2262	2329	2176	2176
query48	410	415	292	292
query49	648	498	405	405
query50	963	344	255	255
query51	4319	4275	4229	4229
query52	105	105	99	99
query53	258	295	213	213
query54	342	284	274	274
query55	94	90	88	88
query56	310	324	316	316
query57	1407	1362	1332	1332
query58	306	289	279	279
query59	1587	1610	1414	1414
query60	327	326	317	317
query61	179	202	159	159
query62	662	616	560	560
query63	249	201	204	201
query64	2405	795	637	637
query65	
query66	1757	502	350	350
query67	30054	29980	29852	29852
query68	
query69	471	335	321	321
query70	976	1005	896	896
query71	313	276	270	270
query72	2991	2874	2361	2361
query73	846	794	430	430
query74	5076	4926	4727	4727
query75	2655	2586	2262	2262
query76	2345	1115	766	766
query77	403	403	326	326
query78	12029	12123	11684	11684
query79	1426	1014	738	738
query80	660	530	458	458
query81	452	275	251	251
query82	1382	164	126	126
query83	373	279	254	254
query84	250	142	112	112
query85	880	531	483	483
query86	384	322	330	322
query87	3396	3328	3218	3218
query88	3511	2654	2670	2654
query89	441	396	335	335
query90	1958	182	177	177
query91	178	182	142	142
query92	75	79	72	72
query93	1502	1442	855	855
query94	547	349	291	291
query95	683	376	444	376
query96	1112	757	345	345
query97	2713	2682	2555	2555
query98	252	226	227	226
query99	1120	1085	965	965
Total cold run time: 252545 ms
Total hot run time: 169024 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 39.56% (127/321) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 2.49% (8/321) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 15, 2026

run external

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 2.49% (8/321) 🎉
Increment coverage report
Complete coverage report

…t to avoid concurrent FE config races in async-split uneven cases
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 18, 2026

run buildall

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 18, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completed a full pass over the PR diff and the existing review context. I am not adding duplicate inline comments because the remaining blocking concerns I found are already covered by existing threads.

Critical checkpoint conclusions:

  • Goal/test coverage: The PR aims to make CDC snapshot split generation asynchronous and adds unit/regression coverage, including restart cases. The tests help, but existing review threads still identify uncovered failure modes around scheduler gaps, replay, invalid config, and meta persistence.
  • Scope/focus: The change is focused on CDC async splitting, but it is large and touches FE scheduling, persistence, cdc_client split generation, and regression infrastructure.
  • Concurrency: Split state locking has been improved in current code, but existing threads still cover scheduler/task timing windows and TVF/base split-state races that need resolution/confirmation.
  • Lifecycle/replay: FE restart and checkpoint replay are central to this PR. Existing replay comments around binlog restore, TVF restore, and meta-table cursor restoration remain blocking until addressed or convincingly resolved.
  • Configuration: streaming_cdc_fetch_splits_batch_size is mutable and still needs positive-value validation/clamping per the existing thread.
  • Compatibility/protocol: The new fetch-splits cursor protocol depends on split-id and boundary formats. Existing threads cover composite-boundary and cache-key/type-resume risks.
  • Persistence/transactions: The async split meta table is now part of durable scheduling state; existing comments around SQL construction, row identity, and durable upsert semantics remain important blockers.
  • Performance/observability: The current implementation reduces long lock holds compared with earlier code, but batch sizing and empty-batch behavior still need attention as already noted.
  • Test result review: Added regression tests are relevant, but existing comments identify races in sampler-based assertions and restart timing cases.

User focus: No additional user-provided review focus was specified.

Overall opinion: request changes until the already-open correctness threads are resolved; I found no additional distinct issues that should be raised as new inline comments.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 70.23% (243/346) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31418 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 02b1c7f06617066f97906552b942a3376c83304b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17684	3901	3955	3901
q2	q3	10760	1348	813	813
q4	4684	478	345	345
q5	7581	2231	2088	2088
q6	241	176	141	141
q7	953	782	637	637
q8	9376	1655	1624	1624
q9	5171	4936	4907	4907
q10	6405	2045	1778	1778
q11	435	277	248	248
q12	629	427	300	300
q13	18143	3403	2736	2736
q14	268	264	238	238
q15	q16	834	767	712	712
q17	961	828	872	828
q18	7066	5872	5582	5582
q19	1322	1374	1067	1067
q20	513	400	410	400
q21	6283	2853	2745	2745
q22	469	375	328	328
Total cold run time: 99778 ms
Total hot run time: 31418 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4784	4618	4523	4523
q2	q3	4825	5305	4576	4576
q4	2179	2182	1399	1399
q5	4879	4590	4605	4590
q6	235	184	146	146
q7	1972	1726	1508	1508
q8	2429	2035	2116	2035
q9	7710	7294	7153	7153
q10	4460	4377	3979	3979
q11	522	376	357	357
q12	708	729	505	505
q13	3002	3364	2844	2844
q14	275	278	255	255
q15	q16	682	695	606	606
q17	1246	1225	1232	1225
q18	7238	6932	6883	6883
q19	1119	1094	1108	1094
q20	2213	2200	1923	1923
q21	5293	4602	4464	4464
q22	517	456	427	427
Total cold run time: 56288 ms
Total hot run time: 50492 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 170138 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 02b1c7f06617066f97906552b942a3376c83304b, data reload: false

query5	4306	652	523	523
query6	351	228	205	205
query7	4240	551	295	295
query8	332	259	267	259
query9	8859	4016	3972	3972
query10	457	345	305	305
query11	5764	2401	2199	2199
query12	182	129	125	125
query13	1277	628	434	434
query14	6041	5363	5058	5058
query14_1	4371	4375	4345	4345
query15	219	207	181	181
query16	1062	490	440	440
query17	1164	742	605	605
query18	2728	510	361	361
query19	223	207	173	173
query20	140	135	133	133
query21	218	142	117	117
query22	13596	13567	13414	13414
query23	17291	16438	16104	16104
query23_1	16187	16192	16171	16171
query24	7338	1767	1279	1279
query24_1	1337	1305	1314	1305
query25	561	501	451	451
query26	1302	332	172	172
query27	2667	569	341	341
query28	4485	1943	1954	1943
query29	1051	655	527	527
query30	305	236	205	205
query31	1133	1059	941	941
query32	97	76	76	76
query33	554	363	308	308
query34	1193	1176	637	637
query35	754	784	672	672
query36	1319	1347	1189	1189
query37	147	104	107	104
query38	3244	3112	3017	3017
query39	930	940	894	894
query39_1	889	896	871	871
query40	223	146	126	126
query41	65	64	63	63
query42	111	109	109	109
query43	320	321	279	279
query44	
query45	208	205	196	196
query46	1057	1220	742	742
query47	2310	2334	2242	2242
query48	415	407	288	288
query49	631	494	386	386
query50	993	352	258	258
query51	4304	4304	4263	4263
query52	104	105	94	94
query53	259	277	223	223
query54	304	268	247	247
query55	90	90	90	90
query56	298	320	302	302
query57	1394	1408	1324	1324
query58	305	273	264	264
query59	1527	1639	1363	1363
query60	323	326	333	326
query61	157	165	154	154
query62	672	626	552	552
query63	243	205	207	205
query64	2407	796	638	638
query65	
query66	1687	510	362	362
query67	29323	29949	29764	29764
query68	
query69	456	352	308	308
query70	1001	973	987	973
query71	311	274	265	265
query72	3073	2731	2411	2411
query73	819	763	428	428
query74	5111	4883	4726	4726
query75	2675	2633	2256	2256
query76	2296	1118	736	736
query77	398	402	324	324
query78	12274	12154	11648	11648
query79	1449	1049	766	766
query80	657	536	463	463
query81	465	288	240	240
query82	1362	159	122	122
query83	359	277	249	249
query84	258	143	109	109
query85	873	548	454	454
query86	401	327	298	298
query87	3463	3394	3237	3237
query88	3610	2669	2657	2657
query89	457	393	341	341
query90	1994	187	175	175
query91	186	163	138	138
query92	76	80	74	74
query93	1484	1450	832	832
query94	528	372	323	323
query95	675	397	367	367
query96	1051	833	331	331
query97	2695	2690	2542	2542
query98	240	228	229	228
query99	1102	1135	983	983
Total cold run time: 252853 ms
Total hot run time: 170138 ms

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 18, 2026

run external

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 55.35% (243/439) 🎉
Increment coverage report
Complete coverage report

@JNSimba JNSimba removed the dev/4.1.x label May 19, 2026
Copy link
Copy Markdown
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JNSimba JNSimba merged commit be49e67 into apache:master May 19, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants