Split proto serialization to encapsulate private state (#21835)#21929
Split proto serialization to encapsulate private state (#21835)#21929adriangb wants to merge 5 commits into
Conversation
9302fa9 to
5360d7d
Compare
e05de25 to
2a29583
Compare
|
@timsaucer wonder if you'd be willing to do an initial review on this change since we discussed it in the call yesterday? |
2a29583 to
0e6e8b0
Compare
0e6e8b0 to
32deeb5
Compare
|
I've started reviewing but this is a big one and it will take a bit |
222d25d to
8094825
Compare
I've tried to restack it to clean it up. But yes I know it's big, if you have any ideas for how to split it up or make it easier to review I'm all ears. |
Mirror the existing datafusion-proto-common split for the physical/logical
plan schemas. The new crate contains only the .proto file and the prost-
generated Rust types, with no datafusion deps beyond datafusion-proto-common.
datafusion-proto re-exports the proto types via its existing protobuf module
(and keeps the legacy datafusion_proto::generated::* paths working behind a
deprecation notice), so downstream consumers (datafusion-ffi,
datafusion-examples, benchmarks) continue to work without Cargo.toml changes.
Because the prost-generated types are now foreign to datafusion-proto, the
orphan rule forbids:
- inherent impl blocks on protobuf::PhysicalPlanNode
- From/TryFrom impls between proto types and types from datafusion-common,
datafusion-expr, datafusion-datasource (foreign + foreign)
To work around this:
- Inherent impls on protobuf::PhysicalPlanNode are converted to a new
PhysicalPlanNodeExt trait (callers must `use` it to call the methods).
- Foreign-foreign From/TryFrom impls are converted to FromProto/TryFromProto
traits defined in a new datafusion_proto::convert module.
CI configuration (rust.yml, licenserc.toml, rat_exclude_files.txt) is updated
for the new crate in the same commit so it builds green standalone.
This is groundwork for a follow-up that adds PhysicalExpr::try_to_proto so
expressions can serialize private state without exposing pub-for-proto
scaffolding (issue apache#21835).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8094825 to
c7d42ab
Compare
timsaucer
left a comment
There was a problem hiding this comment.
I found the description to have the answers I was looking for eventually, but the llm generated text had so much content I didn't want to read it all just to understand what is actually not a very complex refactor. This held me up more than anything. It would have been more helpful to have a more pithy description that got me right to the useful pieces.
I'm finding for my own PRs that I need to make a lot of edits to these generated descriptions because they're not always great at isolating what a reviewer really needs to understand vs cataloging all the work that was done.
| /// Takes the whole [`PhysicalExprNode`] — the exact inverse of what | ||
| /// [`PhysicalExpr::try_to_proto`] produces — so every expression's | ||
| /// `try_from_proto` shares one signature. The operator string is parsed | ||
| /// via the canonical [`Operator::from_proto_name`] mapping, so no `op` | ||
| /// argument needs to be threaded in by the caller. | ||
| /// | ||
| /// [`PhysicalExprNode`]: datafusion_proto_models::protobuf::PhysicalExprNode | ||
| /// [`PhysicalExpr::try_to_proto`]: datafusion_physical_expr_common::physical_expr::PhysicalExpr::try_to_proto | ||
| /// [`PhysicalExprDecodeCtx::decode`]: datafusion_physical_expr_common::physical_expr::proto_decode::PhysicalExprDecodeCtx::decode |
There was a problem hiding this comment.
Seems like some generic text that doesn't necessarily need to be in each implementation
| impl From<&CsvOptionsProto> for CsvOptions { | ||
| fn from(proto: &CsvOptionsProto) -> Self { |
There was a problem hiding this comment.
For this and also from_factory probably needs a line in the upgrade doc
There was a problem hiding this comment.
Same argument holds for json options, parquet, etc
There was a problem hiding this comment.
On second thought, from_factory wasn't pub so maybe this argument is moot.
| } | ||
| } | ||
|
|
||
| impl TryFrom<protobuf::WindowFrame> for WindowFrame { |
There was a problem hiding this comment.
TBH Since the methods are so similar it feels like keeping impl TryFrom is better.
There was a problem hiding this comment.
Ok, so I spent way more time than I wanted to understand the problem with the dependencies and why you had to introduce this.
|
LMK if you want me to take another look since you did another push since I reviewed |
Adds an opt-in `proto` feature on `datafusion-physical-expr-common` that
exposes `PhysicalExpr::try_to_proto`:
```rust
fn try_to_proto(
&self,
ctx: &PhysicalExprEncodeCtx<'_>,
) -> Result<Option<PhysicalExprNode>> { Ok(None) }
```
`Ok(None)` (the default) means "fall through" — `datafusion-proto`'s
existing downcast chain handles the expression. Returning `Ok(Some(node))`
lets the expression serialize itself, so types with private state
(`DynamicFilterPhysicalExpr`'s `RwLock`-wrapped inner, etc.) no longer
need pub-for-proto accessors.
The `try_` prefix is deliberate: the method is fallible, and it pairs with
the fallible `try_from_proto` decode constructors added in later commits
(and the `TryFromProto` trait in `datafusion-proto`).
Expression authors receive a concrete `PhysicalExprEncodeCtx` struct, not a
`&dyn` trait object. It wraps a sealed `PhysicalExprEncode` dispatch trait
(implemented by `ConverterEncoder` in `datafusion-proto`, which routes
through the existing `PhysicalExtensionCodec` + converter and preserves
dedup). Keeping the context a struct keeps `&dyn` out of every expression's
signature and gives a stable place to add helpers (UDF/UDAF/UDWF encoding,
registry hooks) without expanding a public trait surface — mirroring how
`PhysicalExprDecodeCtx` is shaped on the decode side.
`serialize_physical_expr_with_converter` calls `expr.try_to_proto(...)`
first, falling back to the existing chain on `Ok(None)`. No expression has
migrated yet, so behavior is unchanged. The feature is off by default on
`physical-expr-common`; `datafusion-proto` flips it on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First expression to use the new `try_to_proto` hook, demonstrating the migration pattern. The downcast arm in `serialize_physical_expr_with_converter` is removed; the trait dispatch path now produces the same `PhysicalExprNode`. `datafusion-physical-expr` gains a passthrough `proto` feature that turns on `datafusion-physical-expr-common/proto` and pulls in `datafusion-proto-models` for the proto types. `datafusion-proto` enables it. Other built-in expressions (`UnKnownColumn`, `Literal`, `BinaryExpr`, `DynamicFilterPhysicalExpr`, etc.) still go through the downcast chain and will migrate one at a time in follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks so much @timsaucer !
It's the same content, I was just re-organizing the commits to make it easier to review 😄. I think it would be good for @alamb to take a look at this, especially since we can wait to ship it until after 54 has been released (so no big rush).
Sorry about this. I agree with you. I usually leave the LLM generated descriptions up for draft PRs or where they are "fine" and better than "I'm a lazy human so really didn't write much at all" but I do try to rewrite them where it would be helpful (e.g. incidentally I was just rewriting the one in #22300 by hand while you were reviewing this). Just bad timing / judgment on this one. I'll update it for the next reviewer and note the feedback. |
c7d42ab to
1d5a354
Compare
Adds the decode-side counterpart to `PhysicalExprEncodeCtx`. The public
surface is a single `decode(node)` method on the context — the central
match (and, in future, third-party registry lookups) live behind it.
Per-expression contract:
```rust
impl Column {
pub fn try_from_proto(
node: &PhysicalExprNode,
ctx: &PhysicalExprDecodeCtx,
) -> Result<Arc<dyn PhysicalExpr>>;
}
```
`try_from_proto` takes the whole `PhysicalExprNode` — the exact inverse of
what `try_to_proto` returns — and unwraps its own `ExprType` variant. That
keeps one signature for every expression (trait-ready) and gives decoders
access to outer-node fields such as `expr_id`. The central match in
`parse_physical_expr_with_converter` only routes to the right constructor.
`Column` is the first expression to migrate. Other variants stay inline;
they migrate in follow-ups, same shape, same trick (one expression per
commit, central match keeps shrinking).
Removes the temporary `FromProto<&PhysicalColumn> for Column` impl
introduced in the proto-models extraction — `Column` now owns both
directions of its own serialization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second expression to use the new hooks, following the same pattern as Column. The BinaryExpr downcast arm in `serialize_physical_expr_with_converter` and the inline match arm in `parse_physical_expr_with_converter` are removed; both directions now live on `BinaryExpr` itself. Linearization of nested same-op chains stays bit-for-bit identical with the previous behavior, just expressed in terms of `PhysicalExprEncodeCtx::encode_child` / `PhysicalExprDecodeCtx::decode` instead of the proto converter. `BinaryExpr::try_from_proto` takes the whole `PhysicalExprNode` and parses the operator string itself. The proto-string-to-enum mapping moves from `from_proto_binary_op` in `datafusion-proto` down to `Operator::from_proto_name` in `datafusion-expr-common`, so the logical plan path and the physical `BinaryExpr` decoder share one source of truth without `physical-expr` depending on `datafusion-proto`; `from_proto_binary_op` now delegates to it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1d5a354 to
89d6cf7
Compare
| fn try_to_proto( | ||
| &self, | ||
| _ctx: &proto_encode::PhysicalExprEncodeCtx<'_>, | ||
| ) -> Result<Option<datafusion_proto_models::protobuf::PhysicalExprNode>> { |
There was a problem hiding this comment.
@timsaucer do you think we should make this Result<...> instead of Result<Option<...>>? If we ever plan to remove the default impl, remove the codecs and force ourselves to implement this method on all expressions we produce then the default returning an error would make sense IMO. The price we pay is that in the meantime it would be hard to differentiate between "I called an expression with an implementation and it error, I should bubble that up to users so they can fix it instead of getting an unknown expression Column type error (by falling back to the match -> codec, etc.)" vs. "this expression hasn't been updated yet".
Which issue does this PR close?
Rationale for this change
datafusion-protoserializes every built-inPhysicalExprthrough a single~300-line
downcast_refchain, with a mirrormatchon the decode side. Thatchain lives outside the crate where each expression is defined, so every field
an expression wants to round-trip has to be made
pub. #21807 is thecautionary tale: it had to add five
pub"proto-only, not stable" items toDynamicFilterPhysicalExprjust to serialize anRwLock-wrapped inner.This PR adds the infrastructure so a
PhysicalExprcan serialize itself andkeep its state private.
What changes are included in this PR?
A
PhysicalExprcan now opt into serializing itself, in both directions:try_to_protoreturningOk(None)(the default) means "fall through to theold downcast chain", so the change is purely additive — nothing is forced to
migrate.
ColumnandBinaryExprare migrated as working demos; everythingelse stays on the old path and migrates later, one expression at a time, with
no wire-format change.
Five stacked commits, each builds green on its own and is independently
reviewable (or splittable into its own PR):
datafusion-proto-modelscrate — move the.protofile andprost-generated types into a lightweight crate (mirrors the existing
datafusion-proto-commonsplit).try_to_protohook — feature-gated, off by default.Columnencode.Columndecode.BinaryExpr(both directions).A few design decisions worth flagging
FromProto/TryFromPrototraits instead of plainFrom/TryFrom.Once the prost types move into their own crate they are foreign to
datafusion-proto, and the orphan rule forbidsimpl From<&protobuf::X> for Ywhen both
XandYare foreign. So those conversions becomeFromProto/TryFromPrototraits indatafusion_proto::convert, and callers go from(&x).into()toY::from_proto(&x). This is a known workaround, not the endstate — see Future work.
&dyn.PhysicalExprEncodeCtx/PhysicalExprDecodeCtxwrap a sealed dispatch trait. Keeping them concretekeeps
&dynout of every expression's signature and gives a stable place toadd helpers (UDF encoding, registry hooks) later without churning a public
trait.
try_from_prototakes the wholePhysicalExprNode, not thepre-unwrapped variant payload, so every expression's decoder has the same
signature and can still see outer-node fields like
expr_id.Are these changes tested?
No new behavior, so no new tests.
ColumnandBinaryExprproduce and consumethe same wire format as before; the existing
roundtrip_physical_plan/roundtrip_physical_exprtests already cover both directions and now exercisethe new path.
Are there any user-facing changes?
Small API breaks in
datafusion-proto:try_from_physical_plan_with_converter/try_into_physical_plan_with_convertermove to a
PhysicalPlanNodeExttrait — callers adduse datafusion_proto::physical_plan::PhysicalPlanNodeExt;.From/TryFromconversions becomeFromProto/TryFromProto(see Design decisions above).datafusion_proto::generated::*is deprecated in favor ofdatafusion_proto::protobuf; it still works.The new
protofeature ondatafusion-physical-expr(-common)is off bydefault, so crates that don't serialize plans pay nothing.
Future work
DynamicFilterPhysicalExpr, the original motivation — one per follow-up PR.ExecutionPlanserialization.FromProto/TryFromProtoworkaround: collapsedatafusion-proto-commonintodatafusion-proto-modelsand push theconversion impls down to the target-type crates so callers use plain
From/TryFromagain. Full dep-graph analysis and a step-by-step plan are in#21835 (comment).
🤖 Generated with Claude Code