Search before asking
Version
Apache Doris 4.0
What's Wrong?
BE crashes (SIGSEGV) when a Stream Load combines strict_mode=true with a columns header in which every target column is assigned via an expression (no direct-mapped column), and at least one row produces NULL for a derived column.
Crash location
FileScanner::_convert_to_output_block() in be/src/vec/exec/scan/file_scanner.cpp.
The strict-mode branch indexes _src_slot_descs_order_by_dest[dest_index] and _dest_slot_to_src_slot_index[dest_index] without checking their size.
Root cause
These two members are populated only when FE sends dest_sid_to_src_sid_without_trans. FE sends it only if at least one target column is direct-mapped, that is, listed in the columns header without =. When every target column uses an expression, both containers stay empty, and the strict-mode branch reads them out of bounds.
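To make the trigger condition concrete, here is a small C++ sketch of the rule described above. It reports whether a columns header names any target column directly, without =. The function names has_direct_mapped_column and trim are hypothetical illustrations and are not Doris FE code. The sketch also splits naively on commas, so it does not handle expressions that themselves contain commas.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not Doris code): strips spaces/tabs from both ends.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Hypothetical model of the rule in the issue: a target column is
// "direct-mapped" when it appears in the columns header as a bare name.
// Only then would FE send dest_sid_to_src_sid_without_trans.
// Naive comma split: expressions containing commas are not handled.
bool has_direct_mapped_column(const std::string& columns_header,
                              const std::set<std::string>& table_columns) {
    std::stringstream ss(columns_header);
    std::string entry;
    while (std::getline(ss, entry, ',')) {
        entry = trim(entry);
        // Entries like "k=c1" assign a target column via an expression.
        if (entry.find('=') != std::string::npos) continue;
        // A bare name that is also a table column maps directly.
        if (table_columns.count(entry) > 0) return true;
    }
    return false;
}
```

For the reproduction header, has_direct_mapped_column("c1,c2,k=c1,v=c2", {"k", "v"}) returns false. In this case, per the root cause above, both BE containers stay empty. A header such as "k,c2,v=c2" returns true.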
What You Expected?
Stream Load should not crash BE. The strict-mode branch should be guarded by a size / existence check on _src_slot_descs_order_by_dest and _dest_slot_to_src_slot_index, and fall through to the regular nullable check when no source-column mapping exists for a derived column.
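A minimal sketch of the guard suggested above, built on a simplified model. SlotDesc, ScanMapping, Verdict and check_value are hypothetical names. The container types are also assumptions rather than the actual BE declarations: a vector holds the ordered source slots and an unordered_map holds the index mapping. Following Doris's documented strict-mode semantics, the sketch rejects a value only when the source held a non-NULL value that became NULL. When no source mapping exists for the column, it falls through to the regular nullable check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model; not the actual FileScanner code.
struct SlotDesc {
    std::string name;
    bool nullable;
};

enum class Verdict { kKeep, kFilterStrict, kFilterNotNull };

struct ScanMapping {
    // Assumed: indexed by destination slot position; left empty when FE
    // sends no dest_sid_to_src_sid_without_trans.
    std::vector<const SlotDesc*> src_slot_descs_order_by_dest;
    // Assumed: destination index -> source column position in the src block.
    std::unordered_map<std::size_t, std::size_t> dest_slot_to_src_slot_index;
};

Verdict check_value(const ScanMapping& m, bool strict_mode, std::size_t dest_index,
                    const SlotDesc& dest_slot, bool dest_is_null,
                    const std::vector<bool>& src_is_null) {
    if (!dest_is_null) return Verdict::kKeep;
    if (strict_mode) {
        // Guard: consult the source mapping only if it exists for this column.
        const auto it = m.dest_slot_to_src_slot_index.find(dest_index);
        const bool has_mapping =
            dest_index < m.src_slot_descs_order_by_dest.size() &&
            m.src_slot_descs_order_by_dest[dest_index] != nullptr &&
            it != m.dest_slot_to_src_slot_index.end() &&
            it->second < src_is_null.size();
        // Source held a value but conversion yielded NULL: strict mode rejects it.
        if (has_mapping && !src_is_null[it->second]) return Verdict::kFilterStrict;
        // No mapping (e.g. every target column is derived): fall through.
    }
    // Regular nullable check.
    return dest_slot.nullable ? Verdict::kKeep : Verdict::kFilterNotNull;
}
```

Suppose ScanMapping is empty because every target column is derived. In that case a NULL for the non-nullable k is rejected by the nullable check instead of triggering an out-of-bounds read, and a NULL for the nullable v is kept.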
How to Reproduce?
Table
CREATE TABLE sl_min (
k BIGINT NOT NULL,
v VARCHAR(64) NULL
) ENGINE=OLAP
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_num" = "1");
Source file sl_min.csv (one row, two fields):
Stream Load
curl --location-trusted -u root: \
-H "label:sl_min_$(date +%s)" \
-H "format:csv" \
-H "column_separator:," \
-H "columns:c1,c2,k=c1,v=c2" \
-H "strict_mode:true" \
-H "max_filter_ratio:0" \
-T ./sl_min.csv \
"http://<fe_host>:<fe_http_port>/api/<db>/sl_min/_stream_load"
The BE process crashes with SIGSEGV. Because the connection is dropped mid-response, curl reports "Warning: Binary output can mess up your terminal".
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct