[BUG](exec) fix coalesce function output null#63092
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
The implementation change is small and directly addresses the reported COALESCE value-pollution path, but this PR currently has no regression or unit test for the bug it fixes. Because this is BE expression correctness code, Doris review standards require a targeted test before merge.
Critical checkpoint conclusions:
- Goal/test: The code appears to accomplish the stated goal for scalar non-string COALESCE results by avoiding arithmetic with null-position payloads, but no test proves the reported nullable DOUBLE/NaN case or prevents regression.
- Scope: The code change is focused and minimal.
- Concurrency/lifecycle: Not applicable; this is per-block expression evaluation with no new shared state or lifecycle management.
- Configuration/compatibility/protocol: Not applicable; no config, persisted format, or FE-BE protocol changes.
- Parallel paths: String/array/map/struct/JSON and bitmap paths already use guarded insertion/assignment; the changed scalar path is the relevant divergent path.
- Conditions/error handling: No new error path; the new guard is the necessary data-correctness condition.
- Tests/results: Missing targeted regression/unit coverage and no expected result update was included.
- Observability/transactions/data writes: Not applicable.
- Performance: The new branch is in a hot scalar loop but is justified by correctness; no obvious blocking performance issue found.
Focus points: No additional user-provided review focus was supplied.
| column_raw_data[row] * | ||
| typename ColumnType::value_type(!(null_map_data[row] | filled_flag[row])); | ||
| if (!(null_map_data[row] | filled_flag[row])) { | ||
| result_raw_data[row] = column_raw_data[row]; |
There was a problem hiding this comment.
This bug fix needs a targeted regression test. The PR description describes a concrete data-correctness failure where COALESCE(nullable_double, 0) can return NULL after a null-position payload contains NaN, but the PR only changes BE code and does not add a case under the existing conditional function/coalesce tests (or a BE unit test) that fails before this change and passes after it. Please add coverage for the nullable floating-point path so this does not regress.
| result_raw_data[row] += | ||
| column_raw_data[row] * | ||
| typename ColumnType::value_type(!(null_map_data[row] | filled_flag[row])); | ||
| if (!(null_map_data[row] | filled_flag[row])) { |
There was a problem hiding this comment.
add test please and also check the document
|
run buildall |
TPC-H: Total hot run time: 29271 ms |
TPC-DS: Total hot run time: 170437 ms |
|
run buildall |
TPC-H: Total hot run time: 29699 ms |
TPC-DS: Total hot run time: 170254 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
|
skip buildall |
|
PR approved by anyone and no changes requested. |
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 31376 ms |
TPC-DS: Total hot run time: 168320 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
I found one blocking issue in the new BE unit test: it accesses protected VExpr internals directly, so the test target will not compile.
Critical checkpoint conclusions:
- Goal/test: the production change targets COALESCE floating-point NaN/Inf pollution and the added tests cover that behavior, but the tests are currently uncompilable.
- Scope: the production code change is small and focused.
- Concurrency/lifecycle/config/compatibility/persistence/write paths: not applicable for this expression-local fix.
- Parallel paths: Float32 and Float64 are covered; integer path is left unchanged and has a sanity test.
- Error handling/memory/observability: no new Status propagation, large allocations, or observability concerns found in the changed production path.
- Test coverage/results: coverage was added in response to the earlier review thread, but must be fixed to compile before it can prove the behavior.
User focus: no additional user-provided review focus was specified.
| auto coalesce_expr = VectorizedCoalesceExpr::create_shared(coalesce_node); | ||
|
|
||
| // _data_type is already set to Nullable(Float64) by the base ctor; reassert it explicitly. | ||
| coalesce_expr->_data_type = |
There was a problem hiding this comment.
This test file will not compile because it writes coalesce_expr->_data_type here and coalesce_expr->_open_finished below, but both members are protected in VExpr (be/src/exprs/vexpr.h). The test is neither a derived class nor a friend of VExpr/VectorizedCoalesceExpr, so the BE UT target fails at compile time before exercising the regression. Please avoid direct protected-member access, for example use the public data_type() accessor if the type really needs overriding and remove the _open_finished assignments since these tests call execute_column_impl() directly. The same pattern appears in the other test cases in this file as well.
… into fix-coalesce
|
run buildall |
TPC-H: Total hot run time: 31055 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-DS: Total hot run time: 168962 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Automated review summary:
I reviewed the full PR diff and existing inline review context. I did not find any additional blocking issues distinct from the already-known review threads.
Critical checkpoint conclusions:
- Goal and tests: The PR targets COALESCE result pollution from NaN/Inf payloads in skipped nullable floating-point rows. The added BE unit tests cover Float64 NaN, Float64 Inf, selected NaN preservation, Float32 NaN, and an integer control path.
- Scope and clarity: The production change is small and localized to the coalesce scalar fill helper; the test is focused on the regression scenario.
- Concurrency and lifecycle: No new shared state, threads, locks, static initialization, or lifecycle-sensitive ownership were introduced.
- Compatibility and persistence: No storage format, wire protocol, EditLog, or rolling-upgrade compatibility impact found.
- Parallel code paths: The regular coalesce numeric fill path is the affected path; the short-circuit coalesce path fills selected rows directly and does not appear to share this arithmetic pollution issue.
- Error handling and observability: No new Status-discarding path or user-visible error path was introduced; no extra observability appears necessary for this local arithmetic fix.
- Memory and performance: No significant new allocation or memory-tracking concern in production code. The per-row finite check is only on Float32/Float64 coalesce fill and is bounded by the existing scan.
- Test result files: No regression .out files are involved because coverage is added as BE UTs.
User focus: No additional user-provided review focus was specified.
### What problem does this PR solve?
Issue Number: close #xxx
Example: COALESCE(same_department_income_amount, 0) ==> outputs NULL
(where same_department_income_amount is of type double).
When assigning the value to the result column in the computation, the
assignment is done unconditionally (forced), as in:
```cpp
result_raw_data[row] +=
column_raw_data[row] *
typename ColumnType::value_type(!(null_map_data[row] | filled_flag[row]));
```
If the argument column column_raw_data's null_map[row] is 1, then the
value stored in column_raw_data[row] is garbage data. This garbage may
contain values such as NaN. If a preceding argument of COALESCE happens
to be assigned NaN, then during subsequent assignments we run into cases
like:
0 * NaN = NaN
num + NaN = NaN
so the assigned result also becomes NaN, which causes value pollution.
By rights the final output should also be NaN, but what is actually
returned is NULL. The reason is that during result serialization/output,
NaN values are emitted as NULL.
```cpp
tatus DataTypeNumberSerDe<T>::_write_column_to_mysql(const IColumn& column,
MysqlRowBuffer<is_binary_format>& result,
int row_idx, bool col_const,
const FormatOptions& options) const {
//...
else if constexpr (std::is_same_v<T, float>) {
if (std::isnan(data[col_index])) {
// Handle NaN for float, we should push null value
buf_ret = result.push_null();
} else {
buf_ret = result.push_float(data[col_index]);
}
}
//...
}
```
Co-authored-by: garenshi <garenshi@tencent.com>
### What problem does this PR solve?
Issue Number: close #xxx
Example: COALESCE(same_department_income_amount, 0) ==> outputs NULL
(where same_department_income_amount is of type double).
When assigning the value to the result column in the computation, the
assignment is done unconditionally (forced), as in:
```cpp
result_raw_data[row] +=
column_raw_data[row] *
typename ColumnType::value_type(!(null_map_data[row] | filled_flag[row]));
```
If the argument column column_raw_data's null_map[row] is 1, then the
value stored in column_raw_data[row] is garbage data. This garbage may
contain values such as NaN. If a preceding argument of COALESCE happens
to be assigned NaN, then during subsequent assignments we run into cases
like:
0 * NaN = NaN
num + NaN = NaN
so the assigned result also becomes NaN, which causes value pollution.
By rights the final output should also be NaN, but what is actually
returned is NULL. The reason is that during result serialization/output,
NaN values are emitted as NULL.
```cpp
tatus DataTypeNumberSerDe<T>::_write_column_to_mysql(const IColumn& column,
MysqlRowBuffer<is_binary_format>& result,
int row_idx, bool col_const,
const FormatOptions& options) const {
//...
else if constexpr (std::is_same_v<T, float>) {
if (std::isnan(data[col_index])) {
// Handle NaN for float, we should push null value
buf_ret = result.push_null();
} else {
buf_ret = result.push_float(data[col_index]);
}
}
//...
}
```
Co-authored-by: garenshi <garenshi@tencent.com>
What problem does this PR solve?
Issue Number: close #xxx
Example: COALESCE(same_department_income_amount, 0) ==> outputs NULL (where same_department_income_amount is of type double).
When assigning the value to the result column in the computation, the assignment is done unconditionally (forced), as in:
result_raw_data[row] += column_raw_data[row] * typename ColumnType::value_type(!(null_map_data[row] | filled_flag[row]));If the argument column column_raw_data's null_map[row] is 1, then the value stored in column_raw_data[row] is garbage data. This garbage may contain values such as NaN. If a preceding argument of COALESCE happens to be assigned NaN, then during subsequent assignments we run into cases like:
0 * NaN = NaN
num + NaN = NaN
so the assigned result also becomes NaN, which causes value pollution.
By rights the final output should also be NaN, but what is actually returned is NULL. The reason is that during result serialization/output, NaN values are emitted as NULL.
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)