Skip to content

fix(aggregate): show aliased expr in explain#21739

Open
kumarUjjawal wants to merge 9 commits into
apache:mainfrom
kumarUjjawal:fix/aliased_expr_explain
Open

fix(aggregate): show aliased expr in explain#21739
kumarUjjawal wants to merge 9 commits into
apache:mainfrom
kumarUjjawal:fix/aliased_expr_explain

Conversation

@kumarUjjawal
Copy link
Copy Markdown
Contributor

@kumarUjjawal kumarUjjawal commented Apr 20, 2026

Which issue does this PR close?

Rationale for this change

Physical explain output only showed the alias for aliased aggregates. That made it hard to understand the plan, especially when the aggregate had a filter, explicit RESPECT NULLS, or a custom UDAF display.

What changes are included in this PR?

  • Show the full aggregate expression in physical explain for user-written aggregate aliases.
  • Keep internal aliases like count(*) compact in physical explain.
  • Replace the old hidden metadata approach with an explicit is_internal flag on Alias.
  • Preserve that flag through planner rewrites, tree rewrites, and proto round-trip.
  • Add tests for aliased aggregate explain output, including:
    • normal aliased aggregates
    • quoted aliases
    • explicit RESPECT NULLS
    • custom human display
    • count(*)
    • nested internal alias display
  • Add an upgrade note for the public Alias API change.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes.

  • Physical explain output is clearer for aliased aggregate expressions.
  • Alias now has a new is_internal field.

This is a public API change for users who build or pattern match Alias directly. The upgrade guide has been updated with the needed changes.

@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Apr 20, 2026
@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

cc @pepijnve

@kumarUjjawal kumarUjjawal marked this pull request as draft April 20, 2026 07:56
@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

It has much more cases than I realized initially 😢

@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

kumarUjjawal commented Apr 20, 2026

I am working on another approach which is looking better than this

I updated the code with new changes along with pr body

@github-actions github-actions Bot added documentation Improvements or additions to documentation sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules proto Related to proto crate functions Changes to functions implementation labels Apr 20, 2026
@kumarUjjawal kumarUjjawal marked this pull request as ready for review April 20, 2026 12:34
@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

I tried two approaches for this.

The first approach was by hiding the bit in alias metadata. It fixed the display problem, but it also mixed planner-only state with user metadata. That made the behavior harder to reason about and forced extra handling in places like equality, hashing, serialization, and rewrite logic. In practice, a simple display fix startedaffecting alias identity and metadata flow in too many places.

The current approach adds an explicit internal flag to alias. This is a small API break, but it makes the model much clearer: whether an alias is user written or planner generated is now part of the type itself, not hidden in metadata. That keeps the display logic direct, avoids hidden state leaking into unrelated code paths, and makes future maintenance safer because the intent is visible and compiler checked.

Would love to hear your thoughts @pepijnve

@pepijnve
Copy link
Copy Markdown
Contributor

@kumarUjjawal could you give an example of where/when the internal flag is necessary?

@pepijnve
Copy link
Copy Markdown
Contributor

I think I might have found it in your second commit. Where this change was reverted.

image

Wouldn't we want the left version though? The physical plan is actually executing count(1), but you don't see that at all in the physical plan even though the logical plan does show it.

.map(|expr| expr.human_display())
.map(|expr| {
let human_display = expr.human_display();
if human_display.is_empty() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should human_display be Option<String>?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I could do that.

Comment thread datafusion/core/src/physical_planner.rs Outdated
}

#[tokio::test]
async fn test_aggregate_explain_shows_aliased_expression() -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These might be more concise as SLTs

@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

Wouldn't we want the left version though? The physical plan is actually executing count(1), but you don't see that at all in the physical plan even though the logical plan does show it.

I think the current behavior is intentional.

My thinking:

  • user-written aliases should inline the aggregate expression
  • internal planner aliases should stay compact

count() is in the second group, so keeping aggr=[count()] is expected. The bug here is about user aliases like sum(...) as agg disappearing in physical explain, not about exposing every internal rewrite.

So while the physical plan does execute the lowered count(1) form, showing count(1) as count() in explain would expose planner internals and would regress the compact count() output we already preserve elsewhere. If we want physical explain to show lowered forms more generally, I think that should be doable but will require some changes.

@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

@kumarUjjawal could you give an example of where/when the internal flag is necessary?

A good example could be:

Internally, COUNT() is lowered to count(1) so it can use the normal aggregate path. But the user did not write count(1), they wrote count(). So the planner needs a way to preserve that user-facing name without treating it like a real user alias.

That is what the internal flag is for: it marks aliases that exist only because the planner rewrote the expression.

Without that bit, physical explain cannot tell these two cases apart:

  • user alias: sum(a) AS total
  • planner-generated alias: lowered count(1) wrapped as count(*)

Those should display differently. For example, with:

SELECT COUNT(*) AS total_rows FROM t

there are really two alias layers:

  • internal: count(1) -> count(*)
  • user: count(*) -> total_rows

The internal flag lets explain show count() as total_rows, instead of either exposing the lowered form count(1) as count() as total_rows or collapsing everything to just total_rows.

@pepijnve
Copy link
Copy Markdown
Contributor

This might just be personal preference speaking, but in a physical explain plan I'm looking for what the engine is actually doing (how is it being executed), not what I wrote as query. It doesn't make sense for me that the logical plan (which is more declarative in nature) shows the lowered version, while the physical plan (which is more imperative) does not. If anything it should be the other way around that logical hides this detail, and physical shows it.

So if I got to choose I would prefer the left side plan of the diff image above rather than the right one.

@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

This might just be personal preference speaking, but in a physical explain plan I'm looking for what the engine is actually doing (how is it being executed), not what I wrote as query. It doesn't make sense for me that the logical plan (which is more declarative in nature) shows the lowered version, while the physical plan (which is more imperative) does not. If anything it should be the other way around that logical hides this detail, and physical shows it.

So if I got to choose I would prefer the left side plan of the diff image above rather than the right one.

I think I like this approach too, showing lowered expressions in physical explain can be more useful. Let me see how the code looks for this.

@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label Apr 21, 2026
@kumarUjjawal kumarUjjawal force-pushed the fix/aliased_expr_explain branch from 21c1690 to 2580bc5 Compare April 21, 2026 14:31
@kumarUjjawal kumarUjjawal requested a review from pepijnve April 30, 2026 09:52
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 30, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v53.1.0 (current)
       Built [  82.566s] (current)
     Parsing datafusion v53.1.0 (current)
      Parsed [   0.034s] (current)
    Building datafusion v53.1.0 (baseline)
       Built [  82.205s] (baseline)
     Parsing datafusion v53.1.0 (baseline)
      Parsed [   0.036s] (baseline)
    Checking datafusion v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.631s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure function_marked_deprecated: function #[deprecated] added ---

Description:
A function is now #[deprecated]. Downstream crates will get a compiler warning when using this function.
        ref: https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-deprecated-attribute
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/function_marked_deprecated.ron

Failed in:
  function datafusion::physical_planner::create_aggregate_expr_and_maybe_filter in /home/runner/work/datafusion/datafusion/datafusion/core/src/physical_planner.rs:2523
  function datafusion::physical_planner::create_aggregate_expr_with_name_and_maybe_filter in /home/runner/work/datafusion/datafusion/datafusion/core/src/physical_planner.rs:2498

     Summary semver requires new minor version: 0 major and 1 minor checks failed
    Finished [ 167.367s] datafusion
    Building datafusion-expr v53.1.0 (current)
       Built [  25.699s] (current)
     Parsing datafusion-expr v53.1.0 (current)
      Parsed [   0.070s] (current)
    Building datafusion-expr v53.1.0 (baseline)
       Built [  26.030s] (baseline)
     Parsing datafusion-expr v53.1.0 (baseline)
      Parsed [   0.077s] (baseline)
    Checking datafusion-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.066s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  54.211s] datafusion-expr
    Building datafusion-optimizer v53.1.0 (current)
       Built [  26.446s] (current)
     Parsing datafusion-optimizer v53.1.0 (current)
      Parsed [   0.026s] (current)
    Building datafusion-optimizer v53.1.0 (baseline)
       Built [  26.495s] (baseline)
     Parsing datafusion-optimizer v53.1.0 (baseline)
      Parsed [   0.028s] (baseline)
    Checking datafusion-optimizer v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.157s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  54.226s] datafusion-optimizer
    Building datafusion-physical-expr v53.1.0 (current)
       Built [  25.359s] (current)
     Parsing datafusion-physical-expr v53.1.0 (current)
      Parsed [   0.043s] (current)
    Building datafusion-physical-expr v53.1.0 (baseline)
       Built [  25.380s] (baseline)
     Parsing datafusion-physical-expr v53.1.0 (baseline)
      Parsed [   0.043s] (baseline)
    Checking datafusion-physical-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.310s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  52.350s] datafusion-physical-expr
    Building datafusion-physical-plan v53.1.0 (current)
       Built [  32.224s] (current)
     Parsing datafusion-physical-plan v53.1.0 (current)
      Parsed [   0.123s] (current)
    Building datafusion-physical-plan v53.1.0 (baseline)
       Built [  32.636s] (baseline)
     Parsing datafusion-physical-plan v53.1.0 (baseline)
      Parsed [   0.129s] (baseline)
    Checking datafusion-physical-plan v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.597s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  67.131s] datafusion-physical-plan
    Building datafusion-proto v53.1.0 (current)
       Built [  54.635s] (current)
     Parsing datafusion-proto v53.1.0 (current)
      Parsed [   0.132s] (current)
    Building datafusion-proto v53.1.0 (baseline)
       Built [  55.841s] (baseline)
     Parsing datafusion-proto v53.1.0 (baseline)
      Parsed [   0.136s] (baseline)
    Checking datafusion-proto v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.548s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 114.394s] datafusion-proto
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 134.932s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.021s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 138.174s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.022s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.089s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 276.384s] datafusion-sqllogictest

Comment thread datafusion/core/src/dataframe/mod.rs Outdated
relation: None,
name: field.name().to_string(),
metadata: None,
is_internal: false,
Copy link
Copy Markdown
Contributor

@pepijnve pepijnve Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the is_internal flag? I may have overlooked it, but it doesn't seem to actually be used anymore. It is passed on everywhere, but in the end setting it to true vs false doesn't seem to have an impact on the explain output.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we don't need it now. I will remove it. Thanks you!

03)----CoalescePartitionsExec
04)------AggregateExec: mode=Partial, gby=[], aggr=[sum(alias1), sum(alias2)]
05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[alias2]
05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[sum(__common_expr_1) as alias2]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! This is an example of exactly the kind of issue I was struggling with in the explain plan output. The left hand side here which only contains aggr=[alias2] makes the plan unusable.

01)AggregateExec: mode=Final, gby=[], aggr=[max(aggregate_test_100.c2) as percentile_cont(aggregate_test_100.c2,Float64(1))]
02)--CoalescePartitionsExec
03)----AggregateExec: mode=Partial, gby=[], aggr=[percentile_cont(aggregate_test_100.c2,Float64(1))]
03)----AggregateExec: mode=Partial, gby=[], aggr=[max(aggregate_test_100.c2) as percentile_cont(aggregate_test_100.c2,Float64(1))]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement since it makes the optimisation that happened visible in the plan.

Copy link
Copy Markdown
Contributor

@pepijnve pepijnve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I think it would help to revert the 'internal' portions of this PR. That would reduce the number of changes and improve the signal/noise ratio.

@github-actions github-actions Bot removed the functions Changes to functions implementation label Apr 30, 2026
- internal aggregate aliases may now show the underlying expression instead of
only the alias name

If you have tooling that parses the `aggr=[...]` text from physical `EXPLAIN`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are people actually doing this? I guess it might be necessary sometimes, but that seems extremely brittle. The more general is if the explain output is considered part of the stable API of DataFusion or not.


#[doc(hidden)]
pub fn has_aliased_human_display(&self) -> bool {
self.human_display_is_aliased
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a derived property of human_display or is it necessary for reliability to make it explicit? Just wondering if it's worth memoizing this eagerly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it explicit because I was thinking of it as display metadata.

We could derive it by checking whether human_display ends with as <name>, but that puts us back into parsing display strings. The explicit flag lets the builder validate the invariant once, and the formatting/reverse code can use it without guessing.

What do you think?

Copy link
Copy Markdown
Contributor

@pepijnve pepijnve Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside is that I can now create a conflicting situation where the display string is aliased but the boolean indicates it isn't and vice versa. strip_alias_suffix is already kind of parsing the display string as well, so maybe the fact that we need to inspect the string value isn't that big of a problem?

The main reason I asked is because the addition of this field triggers a protobuf change as well and I was wondering if that was desirable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution to this could be to make human_display (String, Option<String>) (or an equivalent explicit type). In other words make the alias an explicit and optional part. That way you wouldn't have to strip it either.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try this and see how it turn out.

@kumarUjjawal kumarUjjawal force-pushed the fix/aliased_expr_explain branch from f25fb1a to 1275f19 Compare May 3, 2026 17:49
@github-actions github-actions Bot removed the sqllogictest SQL Logic Tests (.slt) label May 3, 2026
@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

kumarUjjawal commented May 3, 2026

There are few more cleanups left.

@kumarUjjawal kumarUjjawal requested a review from pepijnve May 3, 2026 18:19
@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label May 4, 2026
@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 4, 2026
Copy link
Copy Markdown
Contributor

@pepijnve pepijnve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me these plan format changes are exactly what I was hoping for and I think we were able to eliminate some rough edges in the code along the way. At this point we probably need input from a maintainer.

@kumarUjjawal kumarUjjawal requested a review from adriangb May 5, 2026 10:58
02)--SortExec: expr=[l_returnflag@0 ASC NULLS LAST, l_linestatus@1 ASC NULLS LAST], preserve_partitioning=[true]
03)----ProjectionExec: expr=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus, sum(lineitem.l_quantity)@2 as sum_qty, sum(lineitem.l_extendedprice)@3 as sum_base_price, sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)@4 as sum_disc_price, sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax)@5 as sum_charge, avg(lineitem.l_quantity)@6 as avg_qty, avg(lineitem.l_extendedprice)@7 as avg_price, avg(lineitem.l_discount)@8 as avg_disc, count(Int64(1))@9 as count_order]
04)------AggregateExec: mode=FinalPartitioned, gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], aggr=[sum(lineitem.l_quantity), sum(lineitem.l_extendedprice), sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax), avg(lineitem.l_quantity), avg(lineitem.l_extendedprice), avg(lineitem.l_discount), count(Int64(1))]
04)------AggregateExec: mode=FinalPartitioned, gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], aggr=[sum(lineitem.l_quantity), sum(lineitem.l_extendedprice), sum(__common_expr_1) as sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), sum(__common_expr_1 * Some(1),20,0 + lineitem.l_tax) as sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax), avg(lineitem.l_quantity), avg(lineitem.l_extendedprice), avg(lineitem.l_discount), count(Int64(1))]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional? The __comon_expr_ part seems like unfortunate noise. Same with Some(1). I think what we'd want is sum(lineitem.l_extendedprice * 1 - lineitem.l_discount)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state of this MR the __common_expr_1 part is certainly intentional. My argumentation above was that this exposes the fact that line 4 is not actually executing sum(lineitem.l_extendedprice * 1 - lineitem.l_discount), that's the logical view of things. Instead sum(__common_expr_1) is being computed and __common_expr_1 is calculated in line 7. For the physical plan I don't think you would want to obscure this.

The Some(1) bit is unfortunate indeed, but I think that's a separate issue. The physical explain output doesn't format literals as nicely as the logical plan output does.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Some(1) bit is unfortunate indeed, but I think that's a separate issue. The physical explain output doesn't format literals as nicely as the logical plan output does.

Yeah I noticed this but left as is for now. @adriangb what's your overall thoughts on the current approach.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @adriangb's comment I think this change may be controversial. Do we want more input on this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw: I don't think it's a big deal either way. There's always going to be some tension between verbosity and readability. Maybe we need a flag for this (#21768 ?) or maybe if no one complains it's fine. Either way, I don't think there's a strong contract we promise with this, we can always revert the change.

Comment thread datafusion/core/src/physical_planner.rs Outdated
name: Option<String>,
human_displan: String,
human_display: Option<String>,
human_display_alias: Option<String>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explored this change a bit with Claude and came up with what I think is a good proposal. Here's the transcript / summary.

One concern about the public API surface here: this changes the signature of create_aggregate_expr_with_name_and_maybe_filter (a pub fn) by adding a new human_display_alias: Option<String> parameter and changing human_display from String to Option<String>. That's a silent break for any downstream code that builds aggregate physical expressions through this entry point — and it isn't called out in docs/source/library-user-guide/upgrading/54.0.0.md.

Stepping back: I think the existence of create_aggregate_expr_with_name_and_maybe_filter and create_aggregate_expr_and_maybe_filter as parallel public free functions is already a smell. They're really one thing — "lower a logical aggregate Expr into a physical AggregateFunctionExpr plus its filter and order-by" — split awkwardly across two signatures. We already have AggregateExprBuilder for the pure physical-construction half. What's missing is a builder for the logical→physical lowering half.

Could we kill two birds with one stone in this PR by introducing a LoweredAggregateBuilder?

// datafusion-physical-expr/src/aggregate.rs
pub struct LoweredAggregateBuilder<'a> { /* expr + schemas + execution_props + overrides */ }

pub struct LoweredAggregate {
    pub aggregate: Arc<AggregateFunctionExpr>,
    pub filter: Option<Arc<dyn PhysicalExpr>>,
    pub order_bys: Vec<PhysicalSortExpr>,
}

impl<'a> LoweredAggregateBuilder<'a> {
    pub fn new(expr: &'a Expr, logical_schema: &'a DFSchema, physical_schema: &'a Schema, execution_props: &'a ExecutionProps) -> Self;
    pub fn with_name(mut self, name: impl Into<String>) -> Self;
    pub fn build(self) -> Result<LoweredAggregate>;
}

Internally, build() does what these two functions do today: unwrap aliases, derive name / human_display / human_display_alias, lower args / filter / order-by via the existing create_physical_* helpers, then hand off to AggregateExprBuilder. All the new aliased-display logic from this PR lives here.

The proposed migration shape:

  1. Add LoweredAggregateBuilder next to AggregateExprBuilder in datafusion-physical-expr/src/aggregate.rs (the create_physical_* helpers it needs already live in that crate, so no new dependencies).
  2. Revert the signature change to create_aggregate_expr_with_name_and_maybe_filter. Keep both free functions on their pre-PR signatures.
  3. Mark them #[deprecated(note = "use LoweredAggregateBuilder")]. They become thin delegations — the deprecated _with_name_ variant passes its single human_display straight through with no alias, matching pre-PR behavior. The new aliased-display path is only reachable via LoweredAggregateBuilder.
  4. Migrate the planner's own call site (the only in-tree caller) over to LoweredAggregateBuilder in this same PR, so deprecation warnings don't fire on internal code.

Why I'm suggesting this in this PR rather than a follow-up: this PR is already breaking the public surface. If we land it as-is and clean up later, downstream users pay the migration cost twice — once now for the new parameter, once again when we deprecate the function. Doing it in one shot means:

  • Zero public signature breaks (AggregateFunctionExpr::human_display() -> Option<&str> is still a break and still needs the upgrade-guide entry, but the free-function break disappears).
  • The upgrade guide gets a "prefer LoweredAggregateBuilder" pointer instead of a parameter-list diff.
  • Future additions to aggregate construction stop touching public function signatures.

I realize this enlarges the PR and is partly scope creep on what was supposed to be an EXPLAIN fix — happy to help with the refactor (or pair with a committer who can) if it would be useful. What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this makes sense it might be nice to break out as it's own PR (at least introducing LoweredAggregateBuilder

Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions on the new display vs. old display. There's always going to be a tradeoff between pretty vs. has all of the info / reflects the ugly reality. More importantly we don't have a strict API contract with that, we can change it and later revert or tweak if there are complaints.

I flagged the one public API change I saw, I think we should discuss that before merging.

@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

I'm going to try to do this and see how it looks:

  • Add a small LoweredAggregateBuilder.
  • Move only the new logical-to-physical aggregate display logic into it.
  • Restore the old public free-function signatures.
  • Mark the old free functions deprecated.
  • Migrate the planner’s internal call to the builder.
  • Keep the upgrade guide focused on:
    • human_display() now returns Option<&str>
    • prefer LoweredAggregateBuilder for aggregate lowering

@kumarUjjawal kumarUjjawal requested review from adriangb and pepijnve May 8, 2026 14:14
Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits but overall LGTM!

self
}

#[doc(hidden)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hidden?

}
}

pub fn with_name(mut self, name: impl Into<String>) -> Self {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add docstrings to all of these. Ideally I can read the docstrings on these functions / structs with general DataFusion knowledge but without fully understanding how planning works and get some idea of what these do.

}

#[cfg(test)]
mod tests {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we can probably port other unit tests to test the new builder directly? It seems like a good unit of code to unit test. Can be a followup / when we remove the old methods, but especially any tests testing the now deprecated methods are good candidates.

@kumarUjjawal
Copy link
Copy Markdown
Contributor Author

@pepijnve would you like to take another look

self.expr.hash(state);
self.relation.hash(state);
self.name.hash(state);
self.metadata.hash(state);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does including metadata in equality and hash calculations risk any unintended side effects? If it's not required for the plan change itself, it might be safer to not include this in this PR.


fn encode_human_display_alias(human_display: &str, alias: &str) -> String {
format!(
"{HUMAN_DISPLAY_ALIAS_PREFIX}{}:{alias}{human_display}",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative here that may be kinder to other consumers would be to encode using {human_display} as {alias} kind of like the code was doing before. The split logic is less robust, but maybe good enough?

02)--SortExec: expr=[l_returnflag@0 ASC NULLS LAST, l_linestatus@1 ASC NULLS LAST], preserve_partitioning=[true]
03)----ProjectionExec: expr=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus, sum(lineitem.l_quantity)@2 as sum_qty, sum(lineitem.l_extendedprice)@3 as sum_base_price, sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)@4 as sum_disc_price, sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax)@5 as sum_charge, avg(lineitem.l_quantity)@6 as avg_qty, avg(lineitem.l_extendedprice)@7 as avg_price, avg(lineitem.l_discount)@8 as avg_disc, count(Int64(1))@9 as count_order]
04)------AggregateExec: mode=FinalPartitioned, gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], aggr=[sum(lineitem.l_quantity), sum(lineitem.l_extendedprice), sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax), avg(lineitem.l_quantity), avg(lineitem.l_extendedprice), avg(lineitem.l_discount), count(Int64(1))]
04)------AggregateExec: mode=FinalPartitioned, gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], aggr=[sum(lineitem.l_quantity), sum(lineitem.l_extendedprice), sum(__common_expr_1) as sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), sum(__common_expr_1 * Some(1),20,0 + lineitem.l_tax) as sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax), avg(lineitem.l_quantity), avg(lineitem.l_extendedprice), avg(lineitem.l_discount), count(Int64(1))]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @adriangb's comment I think this change may be controversial. Do we want more input on this?

@adriangb
Copy link
Copy Markdown
Contributor

@kumarUjjawal if you wanted to reduce the diff it might be nice to split #21739 (comment) into a refactor only PR since it's mostly addressing an existing code smell.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change core Core DataFusion crate documentation Improvements or additions to documentation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Aliased aggregation expressions not visible in physical explain output

3 participants