
[fix](variant) allow inverted index pushdown for cast predicates on variant subcolumns#63118

Open
wuguowei1994 wants to merge 2 commits into apache:master from wuguowei1994:fix-variant-inverted-index-cast

Conversation

@wuguowei1994

Summary

On the current master branch, inverted index predicate pushdown does not work correctly when querying VARIANT fields with explicit CAST.

This is a serious issue because it is not limited to a single target type. In our testing, predicates in the form of CAST(variant_field["key"] AS <type>) = ... fail to leverage the inverted index properly across casted VARIANT access patterns.

This is especially problematic because the recommended usage for VARIANT fields is to explicitly use CAST when extracting typed values. In our internal production workloads, VARIANT is heavily used, and all business teams are required to query VARIANT subfields through explicit CAST. As a result, these queries cannot benefit from inverted index filtering and end up scanning significantly more rows than expected, causing severe performance degradation.

For production workloads with large VARIANT columns, this effectively makes the inverted index unusable for the officially recommended query pattern, which has a major impact on query latency and resource consumption.


Reproduction

DROP TABLE IF EXISTS variant_inverted_intkey_test;

CREATE TABLE variant_inverted_intkey_test (
    row_id BIGINT,
    v VARIANT,
    INDEX idx_v(v) USING INVERTED
)
ENGINE=OLAP
DUPLICATE KEY(row_id)
DISTRIBUTED BY HASH(row_id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "disable_auto_compaction" = "true",
    "inverted_index_storage_format" = "v2"
);

INSERT INTO variant_inverted_intkey_test VALUES
(1,  '{"int_key": 1}'),
(2,  '{"int_key": 2}'),
(3,  '{"int_key": 3}'),
(4,  '{"int_key": 4}'),
(5,  '{"int_key": 5}'),
(6,  '{"int_key": 6}'),
(7,  '{"int_key": 7}'),
(8,  '{"int_key": 8}'),
(9,  '{"int_key": 9}'),
(10, '{"int_key": 10}'),
(11, '{"int_key": 11}'),
(12, '{"int_key": 12}'),
(13, '{"int_key": 13}'),
(14, '{"int_key": 14}'),
(15, '{"int_key": 15}'),
(16, '{"int_key": 16}'),
(17, '{"int_key": 17}'),
(18, '{"int_key": 18}'),
(19, '{"int_key": 19}'),
(20, '{"int_key": 20}');

SELECT row_id, CAST(v["int_key"] AS INT) AS int_key
FROM variant_inverted_intkey_test
WHERE CAST(v["int_key"] AS INT) = 13;

Expected Behavior

The predicate:

CAST(v["int_key"] AS INT) = 13

should be pushed down to the inverted index on the VARIANT column, and the query should use the inverted index to filter rows before data scanning.

Only the matching row should need to be read after index filtering.


Actual Behavior

The query result is correct, but the query profile shows that the inverted index does not effectively filter the data.

Instead of being pruned by the inverted index, all 20 rows are still read/scanned. This indicates that the predicate involving CAST on the VARIANT subfield is not correctly handled by inverted index predicate pushdown.

Please check the query profile after running the reproduction SQL. The key point is that the inverted index does not successfully reduce the scanned rows for the casted VARIANT predicate.

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wuguowei1994 changed the title from "[fix](variant) VARIANT Inverted Index Predicate Pushdown Bug" to "[fix](variant) allow inverted index pushdown for cast predicates on variant subcolumns" on May 10, 2026
@wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from e75111a to 904d4c0 on May 10, 2026 13:04
@eldenmoon
Member

run buildall

@eldenmoon
Member

/review

Member

@eldenmoon eldenmoon left a comment


I found a correctness blocker in the relaxed variant predicate compatibility check. The current regression covers only the same-width CAST(... AS INT) case, but this change also enables cross-width integer casts and same-family string casts without normalizing the predicate value to the segment storage encoding.
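
To illustrate why the predicate value has to match the storage encoding, here is a minimal C++ sketch. It is not Doris code: it assumes a toy index whose keys are the raw fixed-width bytes of each stored value, and `encode_fixed_width`, `build_toy_index`, and `toy_index_contains` are hypothetical names. The real inverted index encoding may differ, but the width sensitivity is the same kind of problem.

```cpp
#include <cstdint>
#include <cstring>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an index key encoder: the key is simply the
// raw bytes of the value, so its length depends on the integer width.
template <typename T>
std::string encode_fixed_width(T value) {
    std::string key(sizeof(T), '\0');
    std::memcpy(&key[0], &value, sizeof(T));
    return key;
}

// Toy "index": the set of encoded keys for values stored as StorageT.
template <typename StorageT>
std::set<std::string> build_toy_index(const std::vector<StorageT>& values) {
    std::set<std::string> index;
    for (StorageT v : values) {
        index.insert(encode_fixed_width(v));
    }
    return index;
}

// Probe the toy index with a literal encoded as QueryT.
template <typename QueryT>
bool toy_index_contains(const std::set<std::string>& index, QueryT literal) {
    return index.count(encode_fixed_width(literal)) > 0;
}
```

With values stored as `std::int8_t`, probing with `toy_index_contains<std::int8_t>(index, 13)` finds the key, while `toy_index_contains<std::int32_t>(index, 13)` does not, because the 4-byte key never matches a 1-byte key. In this toy model an unnormalized cross-width probe silently returns no rows, which is the wrong-result risk described above.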

Comment thread be/src/storage/segment/segment.h Outdated
@hello-stephen
Contributor

TPC-H: Total hot run time: 29643 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 904d4c0549574f24d129b8dfb7f4d588b645f43e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17613	3976	3964	3964
q2	q3	10719	948	611	611
q4	4659	460	356	356
q5	7447	1381	1137	1137
q6	195	179	140	140
q7	930	946	749	749
q8	9315	1364	1281	1281
q9	5637	5430	5335	5335
q10	6315	2098	1831	1831
q11	472	266	253	253
q12	651	415	288	288
q13	18162	3433	2740	2740
q14	290	284	262	262
q15	q16	913	878	785	785
q17	986	1108	770	770
q18	6513	5674	5598	5598
q19	1166	1286	1106	1106
q20	533	401	281	281
q21	4546	2297	1850	1850
q22	422	357	306	306
Total cold run time: 97484 ms
Total hot run time: 29643 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4184	4184	4127	4127
q2	q3	4620	4749	4177	4177
q4	2087	2175	1386	1386
q5	4975	4918	5209	4918
q6	188	165	133	133
q7	2022	1778	2016	1778
q8	3577	3321	3274	3274
q9	8454	8571	8576	8571
q10	4657	4591	4280	4280
q11	608	440	406	406
q12	698	771	519	519
q13	3522	3640	2891	2891
q14	298	304	291	291
q15	q16	805	805	676	676
q17	1369	1336	1284	1284
q18	7936	7102	7097	7097
q19	1202	1188	1163	1163
q20	2252	2211	1960	1960
q21	6142	5446	5411	5411
q22	717	548	415	415
Total cold run time: 60313 ms
Total hot run time: 54757 ms

@wuguowei1994
Author

wuguowei1994 commented May 11, 2026

@eldenmoon

Thank you very much for the patient and detailed feedback.

I have reconsidered the scope and decided to address the type compatibility issue more comprehensively in this PR.

The implementation has been updated to:

  • Safely support exact type matches (after removing nullable wrappers) — this fixes the immediate production issue.
  • Properly handle type widening / conversion scenarios (e.g. TINYINT/SMALLINT/INT stored value vs BIGINT cast, same string family conversions, etc.) by normalizing the predicate value to match the storage encoding before pushing down to the inverted index.

This ensures both correctness and broader usability without introducing unsafe behavior.
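
The two cases listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: `ToyType`, `PushdownPlan`, `integer_width`, and `plan_pushdown` are hypothetical names that do not exist in Doris. The sketch models only the integer family and conservatively skips the index for every other mismatch, including the same-family string conversions mentioned above.

```cpp
// Hypothetical type tags; the real code works with Doris data type objects.
enum class ToyType { TinyInt, SmallInt, Int, BigInt, String };

// What to do with a CAST predicate on a variant subcolumn.
enum class PushdownPlan { UseLiteralAsIs, NormalizeLiteral, SkipIndex };

// Byte width of an integer type; 0 means "not an integer type".
inline int integer_width(ToyType t) {
    switch (t) {
        case ToyType::TinyInt:  return 1;
        case ToyType::SmallInt: return 2;
        case ToyType::Int:      return 4;
        case ToyType::BigInt:   return 8;
        default:                return 0;
    }
}

inline PushdownPlan plan_pushdown(ToyType cast_target, ToyType storage) {
    // Exact match: the literal already uses the storage encoding.
    if (cast_target == storage) {
        return PushdownPlan::UseLiteralAsIs;
    }
    const int cast_width = integer_width(cast_target);
    const int storage_width = integer_width(storage);
    // Cast target is a wider integer than the stored value: the literal
    // must be converted to the storage type before the index lookup.
    if (cast_width > 0 && storage_width > 0 && cast_width > storage_width) {
        return PushdownPlan::NormalizeLiteral;
    }
    // Everything else is treated as incompatible in this sketch.
    return PushdownPlan::SkipIndex;
}
```

For example, `plan_pushdown(ToyType::Int, ToyType::TinyInt)` yields `NormalizeLiteral`, while `plan_pushdown(ToyType::String, ToyType::Int)` yields `SkipIndex`, so the predicate would stay on the regular evaluation path.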

I have also strengthened the regression tests:

  • Positive cases for both exact match and safe widening conversions.
  • Negative test cases to guard against truly incompatible type conversions.

This change makes the CAST predicate pushdown on VARIANT subcolumns work reliably in the common usage patterns recommended for VARIANT.

PS:
I checked the two failing CI jobs — they do not appear to be related to this change. Could you please help rerun them, or skip them if appropriate?

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170611 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 904d4c0549574f24d129b8dfb7f4d588b645f43e, data reload: false

query5	4333	652	535	535
query6	344	234	212	212
query7	4248	572	311	311
query8	340	250	222	222
query9	8867	4101	4098	4098
query10	463	353	302	302
query11	5831	2402	2183	2183
query12	185	139	129	129
query13	1280	608	416	416
query14	6069	5383	5072	5072
query14_1	4390	4370	4383	4370
query15	231	211	182	182
query16	1036	441	480	441
query17	1165	792	665	665
query18	2752	498	367	367
query19	235	209	181	181
query20	147	134	133	133
query21	217	139	119	119
query22	13635	14021	14420	14021
query23	17436	16492	16296	16296
query23_1	16347	16315	16302	16302
query24	7437	1819	1322	1322
query24_1	1357	1325	1357	1325
query25	565	476	429	429
query26	1310	318	174	174
query27	2689	599	344	344
query28	4308	1937	1948	1937
query29	999	616	519	519
query30	291	228	192	192
query31	1099	1039	934	934
query32	81	74	72	72
query33	543	334	288	288
query34	1190	1136	649	649
query35	753	792	663	663
query36	1306	1369	1204	1204
query37	148	105	87	87
query38	3209	3113	3056	3056
query39	970	911	902	902
query39_1	882	875	874	874
query40	232	149	136	136
query41	66	60	61	60
query42	107	105	106	105
query43	320	323	285	285
query44	
query45	210	202	185	185
query46	1059	1156	732	732
query47	2317	2284	2227	2227
query48	392	393	290	290
query49	631	535	432	432
query50	694	283	213	213
query51	4253	4208	4195	4195
query52	104	102	99	99
query53	243	278	204	204
query54	310	274	254	254
query55	91	89	85	85
query56	307	297	313	297
query57	1422	1376	1306	1306
query58	296	270	262	262
query59	1538	1555	1402	1402
query60	340	323	328	323
query61	167	159	168	159
query62	666	612	560	560
query63	239	195	206	195
query64	2404	808	684	684
query65	
query66	1689	502	398	398
query67	30071	29309	29172	29172
query68	
query69	468	334	302	302
query70	1028	1015	973	973
query71	323	280	270	270
query72	2922	2700	2478	2478
query73	851	758	446	446
query74	5033	4853	4755	4755
query75	2770	2655	2326	2326
query76	2296	1148	749	749
query77	404	425	349	349
query78	13015	13005	12308	12308
query79	1407	1029	761	761
query80	792	586	484	484
query81	467	276	235	235
query82	1300	160	127	127
query83	337	281	249	249
query84	256	143	110	110
query85	888	534	444	444
query86	428	334	310	310
query87	3441	3346	3234	3234
query88	3504	2652	2629	2629
query89	439	380	340	340
query90	1836	180	172	172
query91	189	166	142	142
query92	81	75	69	69
query93	973	958	549	549
query94	602	338	325	325
query95	690	384	343	343
query96	1076	802	338	338
query97	2698	2678	2549	2549
query98	245	229	228	228
query99	1127	1118	958	958
Total cold run time: 253780 ms
Total hot run time: 170611 ms

@wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from 43fb305 to d4d86b8 on May 11, 2026 15:16
@wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from d4d86b8 to 6c2b533 on May 11, 2026 15:26
Contributor

@github-actions github-actions Bot left a comment


Thanks for updating the patch to avoid unsafe cross-width index encoding. I still think this needs changes because the latest exact-type rule no longer fixes the original CAST(v["int_key"] AS INT) scenario described in the PR body. Critical checkpoints: goal/test coverage is not satisfied because only same-width TINYINT pushdown is now proven while the documented INT reproduction remains non-pushdown; the code change is small and focused; no new concurrency, lifecycle, config, persistence, FE-BE protocol, or storage-format compatibility concerns were introduced; the main correctness risk is now an incomplete fix rather than wrong-result pushdown; observability is unchanged and adequate for this path through the existing debug/profile checks. User focus: no additional user-provided focus was specified.

    } else {
        return false;
    }
    auto normalized_storage_type = remove_nullable(storage_column_type);
Contributor


This exact-type check avoids the unsafe cross-width index encoding that was raised earlier, but it also means the PR no longer fixes the reproduction in the PR body. For the inserted values 1..20, the new regression test itself documents the inferred storage type as TINYINT; when the query is the original CAST(v["int_key"] AS INT) = 13, target_cast_type_for_variants is INT, storage_column_type is TINYINT, and this returns false, so the common expression remains outside the inverted-index path and the scan still reads all rows. The added positive test was changed to CAST AS TINYINT, so it does not prove the stated CAST AS INT behavior. Please either implement a safe conversion of predicate values to the segment storage type with range/overflow checks, or narrow the PR/test expectations so they no longer claim to fix the INT cast case.
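
One possible shape for the requested range/overflow check is sketched below. It is a minimal illustration, not the PR's implementation: `normalize_literal` is a hypothetical helper, and it assumes the predicate literal arrives as a 64-bit signed integer while the segment stores a narrower signed integer type.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <type_traits>

// Hypothetical helper, not part of Doris: convert a predicate literal to the
// segment storage type. Returns std::nullopt when the literal does not fit,
// so the caller can decide what to do instead of probing with a wrapped value.
template <typename StorageT>
std::optional<StorageT> normalize_literal(std::int64_t literal) {
    static_assert(std::is_integral<StorageT>::value && std::is_signed<StorageT>::value,
                  "this sketch only covers signed integer storage types");
    const auto lo = static_cast<std::int64_t>(std::numeric_limits<StorageT>::min());
    const auto hi = static_cast<std::int64_t>(std::numeric_limits<StorageT>::max());
    if (literal < lo || literal > hi) {
        return std::nullopt;  // out of range for the storage type
    }
    return static_cast<StorageT>(literal);
}
```

For the reproduction, `normalize_literal<std::int8_t>(13)` would produce a TINYINT-width 13 that can be encoded like the stored values, whereas `normalize_literal<std::int8_t>(300)` returns `std::nullopt`. How the caller reacts to `std::nullopt` (for example, leaving the predicate on the non-index path) is a design choice this sketch does not settle.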

@wuguowei1994
Author

@eldenmoon After reconsidering it, I believe we should strive for higher standards ourselves.

I’ve revised the approach described in the comment above. Please give me one week to come back with a better implementation.

@eldenmoon
Member

Currently, among the integer types, only BIGINT will be inferred.
