[fix](decimal) Fix incorrect decimal cast results for scientific-notation strings #63119

Merged: zclllyybb merged 1 commit into apache:master from jacktengg:260510-fix-decimal on May 12, 2026

@jacktengg jacktengg commented May 10, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: Bug introduced by #60004

Fix incorrect string-to-decimal parsing for scientific notation.

Previously, decimal parsing could count exponent characters as part of the significand, causing
values like "1.4E+2" to be cast incorrectly as 14 instead of 140. It could also round very
small scientific-notation values incorrectly when the significant digit appeared after implicit
fractional zeros, such as "5e-17" for scale 15.

This change makes the parser track the end of the significand separately from the exponent,
applies exponent-based decimal-point shifting correctly, and only rounds when the next real
significand digit is the first discarded fractional digit.
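
To make the significand/exponent separation concrete, here is a minimal sketch of the idea. It is not the Doris implementation: `SciParts`, `split_scientific`, and `to_unscaled_truncated` are hypothetical names, the input is assumed to be unsigned, and extra fractional digits are truncated rather than rounded to keep the example short.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper type (not from the PR): the significand digits with the
// decimal point removed, plus the decimal point position after the exponent
// has been applied. Digit k has weight 10^(point_pos - 1 - k).
struct SciParts {
    std::string digits;
    int point_pos = 0;
};

// Splits "digits[.digits][e[+-]digits]". The significand scan stops at 'e'/'E',
// so exponent characters are never counted as significand digits.
std::optional<SciParts> split_scientific(const std::string& s) {
    SciParts out;
    std::size_t i = 0;
    int int_digits = 0;
    bool seen_dot = false;
    for (; i < s.size() && s[i] != 'e' && s[i] != 'E'; ++i) {
        const char c = s[i];
        if (c == '.') {
            if (seen_dot) return std::nullopt;  // second decimal point
            seen_dot = true;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            out.digits.push_back(c);
            if (!seen_dot) ++int_digits;
        } else {
            return std::nullopt;  // unexpected character in the significand
        }
    }
    if (out.digits.empty()) return std::nullopt;

    int exponent = 0;
    if (i < s.size()) {  // s[i] is 'e' or 'E'
        ++i;
        bool negative = false;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
            negative = (s[i] == '-');
            ++i;
        }
        if (i == s.size()) return std::nullopt;  // "1e" or "1e+": no exponent digits
        for (; i < s.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(s[i]))) return std::nullopt;
            if (exponent > 1000) return std::nullopt;  // sketch-only cap on the exponent
            exponent = exponent * 10 + (s[i] - '0');
        }
        if (negative) exponent = -exponent;
    }
    out.point_pos = int_digits + exponent;  // shift the decimal point by the exponent
    return out;
}

// Returns the unscaled integer at the given scale, truncating extra digits.
// Positions past the end of `digits` are implicit zeros (fractional padding).
std::optional<std::int64_t> to_unscaled_truncated(const std::string& s, int scale) {
    const auto parts = split_scientific(s);
    if (!parts) return std::nullopt;
    const int end = parts->point_pos + scale;  // number of digits that are kept
    if (end > 18) return std::nullopt;         // would not fit in int64_t in this sketch
    std::int64_t value = 0;
    for (int k = 0; k < end; ++k) {
        const int d = k < static_cast<int>(parts->digits.size()) ? parts->digits[k] - '0' : 0;
        value = value * 10 + d;
    }
    return value;
}
```

With this split, `to_unscaled_truncated("1.4E+2", 0)` yields 140: the digits are `14`, the decimal point moves to position 3, and the missing third digit is padded with a zero.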

Also add comments to help understand the code.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg
Contributor Author

/review

@github-actions github-actions Bot left a comment

Automated review completed. I did not find blocking issues in the actual PR diff, which is limited to decimal scientific-notation parsing and associated tests. The change keeps the parser scope focused, adds coverage for positive/negative exponents and rounding around the target scale, and preserves failure handling for malformed exponents.

Critical checkpoint conclusions:

  • Goal/test coverage: Implements decimal scientific-notation parsing fixes and adds BE unit tests for representative parsing and rounding cases.
  • Scope: Small and focused on StringParser::string_to_decimal plus tests.
  • Concurrency/lifecycle/config/compatibility: Not applicable; no shared state, lifecycle, config, persistence, or wire/storage format changes in the actual PR patch.
  • Data correctness: Reviewed exponent decimal-point shifting, fractional padding, rounding carry, and overflow/underflow checks; no confirmed defect found.
  • Error handling: Malformed exponent suffixes still return parse failure; exponent overflow returns parse overflow.
  • Performance/memory: No new allocations or hot-path complexity beyond existing linear parsing; no MemTracker concern for this utility path.
  • User focus: No additional user-provided review focus was present.
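
The error-handling checkpoint above (malformed exponent suffixes fail, exponent overflow is reported separately) can be illustrated with a small sketch. The `ParseStatus` enum and `parse_exponent` function are hypothetical stand-ins for illustration only, not the parser's actual types or signatures.

```cpp
#include <cctype>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical status codes for this sketch.
enum class ParseStatus { kSuccess, kFailure, kOverflow };

// Parses the text that follows 'e'/'E', for example "+2", "-17", or "5".
ParseStatus parse_exponent(const std::string& text, int* exponent) {
    std::size_t i = 0;
    bool negative = false;
    if (i < text.size() && (text[i] == '+' || text[i] == '-')) {
        negative = (text[i] == '-');
        ++i;
    }
    if (i == text.size()) return ParseStatus::kFailure;  // sign only, or empty
    long long value = 0;
    for (; i < text.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) {
            return ParseStatus::kFailure;  // malformed suffix such as "2x"
        }
        value = value * 10 + (text[i] - '0');
        if (value > INT_MAX) return ParseStatus::kOverflow;  // exponent too large for int
    }
    *exponent = static_cast<int>(negative ? -value : value);
    return ParseStatus::kSuccess;
}
```

In this sketch the output parameter is written only on success, so a caller can keep its previous exponent value when parsing fails.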

@jacktengg
Contributor Author

run buildall

@jacktengg jacktengg force-pushed the 260510-fix-decimal branch from ad2828f to 3e39b7e Compare May 10, 2026 13:58
@jacktengg
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29251 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3e39b7e0840914b3f6f9a5d99651b2afbe59d215, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17759	4111	3932	3932
q2	q3	10725	865	608	608
q4	4666	452	349	349
q5	7469	1333	1133	1133
q6	184	175	145	145
q7	904	940	763	763
q8	9309	1414	1209	1209
q9	5612	5327	5302	5302
q10	6294	2071	1807	1807
q11	477	265	255	255
q12	689	412	301	301
q13	18213	3243	2715	2715
q14	304	283	263	263
q15	q16	893	865	793	793
q17	965	1002	673	673
q18	6456	5613	5509	5509
q19	1182	1197	1075	1075
q20	520	393	259	259
q21	5061	2267	1861	1861
q22	417	357	299	299
Total cold run time: 98099 ms
Total hot run time: 29251 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4173	4141	4141	4141
q2	q3	4610	4744	4200	4200
q4	2068	2192	1385	1385
q5	4954	4979	5287	4979
q6	185	167	134	134
q7	2039	1761	1875	1761
q8	3503	3174	3203	3174
q9	8419	8478	8460	8460
q10	4481	4490	4268	4268
q11	591	471	412	412
q12	722	765	517	517
q13	3248	3549	2993	2993
q14	302	308	277	277
q15	q16	763	810	684	684
q17	1328	1311	1415	1311
q18	7960	7183	7174	7174
q19	1151	1162	1160	1160
q20	2231	2289	1964	1964
q21	6137	5417	4848	4848
q22	555	514	435	435
Total cold run time: 59420 ms
Total hot run time: 54277 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170689 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3e39b7e0840914b3f6f9a5d99651b2afbe59d215, data reload: false

query5	4351	657	532	532
query6	321	230	207	207
query7	4299	545	321	321
query8	359	237	215	215
query9	8844	4104	4074	4074
query10	490	356	306	306
query11	5779	2394	2244	2244
query12	185	137	130	130
query13	1321	623	452	452
query14	6101	5421	5207	5207
query14_1	4502	4551	4508	4508
query15	223	214	192	192
query16	1000	479	441	441
query17	1157	790	663	663
query18	2779	508	364	364
query19	229	213	184	184
query20	147	139	133	133
query21	221	142	118	118
query22	13617	13946	14552	13946
query23	17278	16430	16153	16153
query23_1	16391	16323	16262	16262
query24	7959	1818	1417	1417
query24_1	1388	1371	1372	1371
query25	561	479	433	433
query26	1294	330	174	174
query27	2661	576	348	348
query28	4355	1979	1935	1935
query29	989	642	524	524
query30	304	238	194	194
query31	1124	1066	939	939
query32	104	74	69	69
query33	538	343	286	286
query34	1150	1152	641	641
query35	767	781	688	688
query36	1329	1364	1116	1116
query37	190	104	87	87
query38	3162	3173	3052	3052
query39	910	908	920	908
query39_1	887	871	866	866
query40	239	155	133	133
query41	63	59	59	59
query42	108	109	107	107
query43	338	340	290	290
query44	
query45	211	204	196	196
query46	1105	1200	780	780
query47	2252	2258	2129	2129
query48	381	411	304	304
query49	634	525	413	413
query50	703	288	220	220
query51	4452	4267	4233	4233
query52	111	111	95	95
query53	258	288	214	214
query54	305	277	250	250
query55	91	89	83	83
query56	290	305	298	298
query57	1425	1357	1296	1296
query58	296	269	266	266
query59	1561	1680	1438	1438
query60	348	339	324	324
query61	158	152	152	152
query62	662	615	552	552
query63	250	204	209	204
query64	2397	811	711	711
query65	
query66	1732	522	391	391
query67	30066	29366	29211	29211
query68	
query69	478	346	313	313
query70	1055	1005	933	933
query71	311	285	272	272
query72	2949	2668	2397	2397
query73	814	786	393	393
query74	5093	4883	4717	4717
query75	2791	2667	2332	2332
query76	2288	1150	793	793
query77	430	459	349	349
query78	12706	12873	12279	12279
query79	1537	1071	763	763
query80	701	572	495	495
query81	449	286	245	245
query82	1379	162	121	121
query83	362	276	258	258
query84	255	144	111	111
query85	842	523	448	448
query86	404	349	329	329
query87	3415	3367	3228	3228
query88	3647	2716	2696	2696
query89	444	374	342	342
query90	1892	185	193	185
query91	178	166	141	141
query92	82	76	73	73
query93	979	944	576	576
query94	540	344	297	297
query95	646	467	340	340
query96	1010	773	346	346
query97	2706	2667	2536	2536
query98	246	232	240	232
query99	1102	1118	977	977
Total cold run time: 254235 ms
Total hot run time: 170689 ms

@jacktengg jacktengg force-pushed the 260510-fix-decimal branch from 3e39b7e to 8d5d793 Compare May 11, 2026 03:50
@jacktengg
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29671 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8d5d793c4a2f6f484bdd1145a870f1e22bf0ab3b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17755	3856	3798	3798
q2	q3	10714	911	618	618
q4	4657	498	346	346
q5	7464	1357	1172	1172
q6	187	174	143	143
q7	910	958	740	740
q8	9311	1431	1331	1331
q9	5672	5413	5330	5330
q10	6302	2092	1836	1836
q11	466	275	256	256
q12	669	419	300	300
q13	18166	3399	2767	2767
q14	292	286	262	262
q15	q16	913	871	797	797
q17	999	992	708	708
q18	6571	5625	5625	5625
q19	1174	1345	1087	1087
q20	502	401	276	276
q21	4950	2326	1922	1922
q22	482	373	357	357
Total cold run time: 98156 ms
Total hot run time: 29671 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4730	4463	4584	4463
q2	q3	4775	4818	4218	4218
q4	2244	2229	1444	1444
q5	5017	5066	5240	5066
q6	197	170	135	135
q7	2065	1809	1639	1639
q8	3360	3102	3103	3102
q9	8527	8657	8444	8444
q10	4575	4597	4311	4311
q11	676	462	417	417
q12	695	760	547	547
q13	3271	3636	2948	2948
q14	315	327	284	284
q15	q16	756	835	756	756
q17	1448	1372	1269	1269
q18	8207	7298	7313	7298
q19	1151	1152	1151	1151
q20	2301	2257	2020	2020
q21	6329	5719	5222	5222
q22	543	496	411	411
Total cold run time: 61182 ms
Total hot run time: 55145 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170073 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8d5d793c4a2f6f484bdd1145a870f1e22bf0ab3b, data reload: false

query5	4363	635	517	517
query6	329	218	197	197
query7	4226	540	302	302
query8	326	228	224	224
query9	8805	3985	3989	3985
query10	481	362	301	301
query11	5769	2424	2181	2181
query12	185	131	127	127
query13	1260	636	442	442
query14	6685	5369	5060	5060
query14_1	4356	4347	4397	4347
query15	206	204	178	178
query16	1006	445	445	445
query17	1114	742	611	611
query18	2711	473	353	353
query19	216	196	162	162
query20	141	130	128	128
query21	215	136	118	118
query22	13660	13516	13378	13378
query23	17236	16367	16015	16015
query23_1	16264	16165	16238	16165
query24	7385	1748	1336	1336
query24_1	1332	1347	1336	1336
query25	565	513	464	464
query26	1284	311	171	171
query27	2696	623	347	347
query28	4418	1953	1949	1949
query29	966	672	511	511
query30	309	241	197	197
query31	1147	1056	952	952
query32	90	74	71	71
query33	542	342	288	288
query34	1168	1106	623	623
query35	754	785	663	663
query36	1327	1338	1203	1203
query37	147	99	87	87
query38	3213	3137	3053	3053
query39	970	919	892	892
query39_1	883	887	862	862
query40	242	154	138	138
query41	64	61	61	61
query42	108	112	110	110
query43	324	325	281	281
query44	
query45	211	206	195	195
query46	1056	1212	744	744
query47	2372	2433	2237	2237
query48	409	410	301	301
query49	635	527	419	419
query50	679	288	210	210
query51	4325	4231	4152	4152
query52	104	110	95	95
query53	252	276	201	201
query54	309	274	260	260
query55	96	94	82	82
query56	292	320	300	300
query57	1434	1430	1328	1328
query58	306	269	270	269
query59	1523	1640	1409	1409
query60	342	330	316	316
query61	162	185	148	148
query62	674	628	568	568
query63	247	210	204	204
query64	2333	822	658	658
query65	
query66	1699	514	401	401
query67	29336	30280	29907	29907
query68	
query69	461	340	306	306
query70	1019	980	997	980
query71	301	276	270	270
query72	2974	2735	2685	2685
query73	840	746	407	407
query74	5055	4880	4751	4751
query75	2789	2690	2350	2350
query76	2299	1140	771	771
query77	418	434	353	353
query78	12969	12900	12439	12439
query79	1450	1072	747	747
query80	709	625	517	517
query81	459	284	245	245
query82	1356	155	126	126
query83	364	284	254	254
query84	264	150	122	122
query85	927	597	511	511
query86	375	346	326	326
query87	3464	3352	3226	3226
query88	3583	2656	2630	2630
query89	441	385	348	348
query90	1874	171	177	171
query91	183	165	138	138
query92	81	79	69	69
query93	958	952	569	569
query94	521	325	312	312
query95	654	466	339	339
query96	1030	787	333	333
query97	2730	2719	2577	2577
query98	246	248	224	224
query99	1150	1128	1020	1020
Total cold run time: 253142 ms
Total hot run time: 170073 ms

… very small scientific-notation values

Problems:
1. String-to-decimal casting counted exponent characters as significand digits, so values such as "1.4E+2" could miss the exponent scale and return 14 instead of 140.
2. String-to-decimal parsing rounded scientific-notation values up even when implicit zeros placed the significant digit beyond the first discarded decimal scale position.
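
The second problem comes down to which digit decides rounding. The sketch below assumes round-half-up and uses a hypothetical helper, `should_round_up`, that is not taken from the PR. It takes the significand digits with the decimal point removed, plus `point_pos`, defined here as the number of integer digits plus the exponent.

```cpp
#include <string>

// Digit k of `digits` has weight 10^(point_pos - 1 - k), so the first digit
// beyond the target scale sits at index point_pos + scale.
bool should_round_up(const std::string& digits, int point_pos, int scale) {
    const int first_discarded = point_pos + scale;
    // Negative index: an implicit zero occupies the first discarded position,
    // so the real significand digit lies further right and must not trigger rounding.
    if (first_discarded < 0) return false;
    // Past the end of the digits: nothing was discarded.
    if (first_discarded >= static_cast<int>(digits.size())) return false;
    return digits[first_discarded] >= '5';
}
```

For "5e-17" at scale 15, `digits` is "5" and `point_pos` is 1 + (-17) = -16, so the first discarded index is -1 and the value stays 0. For "5e-16" the index is 0, the digit there is 5, and the value rounds up to 1e-15.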

Also add comments to help understand the code.
@jacktengg jacktengg force-pushed the 260510-fix-decimal branch from 8d5d793 to 706c35f Compare May 11, 2026 06:17
@jacktengg
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29361 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 706c35f73a36212792491586dedf5036d226dee5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17608	3868	3831	3831
q2	q3	10743	861	592	592
q4	4666	459	342	342
q5	7451	1349	1135	1135
q6	191	163	136	136
q7	903	935	754	754
q8	9434	1423	1215	1215
q9	5827	5386	5340	5340
q10	6323	2089	1807	1807
q11	466	267	255	255
q12	705	417	294	294
q13	18172	3287	2732	2732
q14	295	281	265	265
q15	q16	901	879	788	788
q17	986	1049	728	728
q18	6496	5737	5579	5579
q19	1258	1163	1033	1033
q20	504	407	270	270
q21	5029	2328	1920	1920
q22	476	379	345	345
Total cold run time: 98434 ms
Total hot run time: 29361 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4730	4511	4541	4511
q2	q3	4689	4770	4219	4219
q4	2126	2188	1403	1403
q5	4997	5046	5289	5046
q6	200	168	136	136
q7	2068	1786	1619	1619
q8	3352	3091	3105	3091
q9	8496	8594	8461	8461
q10	4462	4471	4231	4231
q11	603	428	410	410
q12	688	757	518	518
q13	3265	3596	2879	2879
q14	292	303	267	267
q15	q16	782	789	695	695
q17	1296	1353	1307	1307
q18	7950	7182	7081	7081
q19	1129	1178	1199	1178
q20	2211	2259	1954	1954
q21	6139	5474	4872	4872
q22	531	472	395	395
Total cold run time: 60006 ms
Total hot run time: 54273 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170875 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 706c35f73a36212792491586dedf5036d226dee5, data reload: false

query5	4304	649	524	524
query6	333	216	205	205
query7	4258	574	314	314
query8	335	245	237	237
query9	8876	4036	4036	4036
query10	454	348	313	313
query11	5790	2422	2218	2218
query12	176	133	131	131
query13	1280	624	450	450
query14	6495	5374	5081	5081
query14_1	4378	4353	4339	4339
query15	214	204	182	182
query16	1018	455	420	420
query17	1150	769	639	639
query18	2773	494	370	370
query19	223	210	171	171
query20	147	136	132	132
query21	215	144	129	129
query22	13544	13529	13371	13371
query23	17172	16401	16126	16126
query23_1	16242	16250	16192	16192
query24	7407	1745	1352	1352
query24_1	1349	1349	1350	1349
query25	558	488	442	442
query26	1295	312	173	173
query27	2740	615	340	340
query28	4417	1981	1930	1930
query29	965	653	515	515
query30	303	248	199	199
query31	1111	1073	939	939
query32	88	76	73	73
query33	539	351	288	288
query34	1140	1143	642	642
query35	772	783	670	670
query36	1321	1342	1177	1177
query37	152	105	90	90
query38	3205	3133	3055	3055
query39	935	943	894	894
query39_1	859	866	888	866
query40	233	160	136	136
query41	63	63	64	63
query42	110	110	109	109
query43	321	328	285	285
query44	
query45	213	217	195	195
query46	1015	1194	710	710
query47	2354	2448	2217	2217
query48	410	397	290	290
query49	621	530	412	412
query50	710	311	231	231
query51	4322	4252	4347	4252
query52	104	104	94	94
query53	244	267	199	199
query54	307	279	259	259
query55	93	91	86	86
query56	294	295	310	295
query57	1434	1391	1317	1317
query58	297	269	269	269
query59	1574	1707	1427	1427
query60	346	340	329	329
query61	161	157	170	157
query62	675	630	565	565
query63	244	210	212	210
query64	2440	880	739	739
query65	
query66	1764	502	401	401
query67	30190	29975	29796	29796
query68	
query69	440	336	303	303
query70	1002	932	961	932
query71	302	275	268	268
query72	3030	2709	2435	2435
query73	861	757	437	437
query74	5067	4912	4726	4726
query75	2751	2665	2324	2324
query76	2301	1121	778	778
query77	412	414	340	340
query78	12959	12965	12349	12349
query79	1547	987	750	750
query80	1310	592	493	493
query81	497	277	239	239
query82	1331	156	127	127
query83	349	276	249	249
query84	262	143	115	115
query85	929	512	468	468
query86	442	348	324	324
query87	3396	3343	3210	3210
query88	3508	2635	2670	2635
query89	453	381	339	339
query90	1812	184	178	178
query91	176	165	138	138
query92	77	82	76	76
query93	947	955	559	559
query94	668	347	298	298
query95	650	395	447	395
query96	1012	781	335	335
query97	2726	2703	2585	2585
query98	242	232	237	232
query99	1143	1123	1014	1014
Total cold run time: 254710 ms
Total hot run time: 170875 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.98% (27180/37763)
Line Coverage 55.21% (288711/522912)
Region Coverage 52.42% (240083/458021)
Branch Coverage 53.59% (103257/192670)

@jacktengg jacktengg changed the title from "[fix](decimal) Fix scientific string cast to decimal and rounding for very small scientific-notation values" to "[fix](decimal) Fix incorrect decimal cast results for scientific-notation strings" May 12, 2026
@github-actions github-actions Bot added the approved label May 12, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@zclllyybb
Contributor

/review

@github-actions github-actions Bot left a comment

Review opinion: approve. I did not find blocking correctness issues in this PR.

Checklist conclusions:

  • Goal and proof: The change targets string-to-decimal scientific notation parsing and rounding for very small values. The parser now keeps the significand end separate from exponent syntax and gates rounding on the first actually discarded fractional digit. Added BE tests cover positive exponent shifts, leading-dot input, zero exponent, negative exponent, and the small-value rounding boundary.
  • Scope: The code change is focused on StringParser::string_to_decimal; the CRC expected-result updates are consistent with decimal parsing affecting the array decimal256 test data loaded from scientific notation CSV values.
  • Concurrency/lifecycle: No new concurrency, locks, threads, globals, or lifecycle ownership changes.
  • Configuration/compatibility: No new configs, persisted formats, thrift protocol, or rolling-upgrade compatibility concerns.
  • Parallel paths: string_to_decimal is the shared BE parser used by decimal cast/serde paths, so the fix applies to the relevant decimal widths through the existing template instantiations.
  • Error handling/data correctness: Existing parse failure/overflow/underflow status paths are preserved; the reviewed edge cases keep exponent characters out of digit counts and avoid rounding when implicit zeros precede the first significand digit beyond scale.
  • Tests: Targeted tests were added. I attempted to run ./run-be-ut.sh --run --filter=FunctionCastToDecimalTest.test_from_string_scientific_notation:FunctionCastToDecimalTest.string_parser_scientific_rounding, but local setup failed before compilation because thirdparty/installed/bin/protoc is missing in this runner.
  • Observability/performance: No new observability appears necessary for this parser-only fix. The implementation remains single-pass over the parsed significand/exponent and does not add material allocations.
  • User focus: No additional user-provided review focus was present.

@zclllyybb zclllyybb merged commit 58765f9 into apache:master May 12, 2026
32 of 33 checks passed
github-actions Bot pushed a commit that referenced this pull request May 12, 2026
…tion strings (#63119)

Related PR: Bug introduced by #60004

Fix incorrect string-to-decimal parsing for scientific notation.
 
Previously, decimal parsing could count exponent characters as part of the significand, causing
values like `"1.4E+2"` to be cast incorrectly as `14` instead of `140`. It could also round very
small scientific-notation values incorrectly when the significant digit appeared after implicit
fractional zeros, such as `"5e-17"` for scale 15.

This change makes the parser track the end of the significand separately from the exponent,
applies exponent-based decimal-point shifting correctly, and only rounds when the next real
significand digit is the first discarded fractional digit.

Also add comments to help understand the code.
Labels

approved, dev/4.0.x, dev/4.0.x-conflict, dev/4.1.x, reviewed

4 participants