gh-150204: Optimize literal null unpack idiom #150205

Closed
gesslerpd wants to merge 8 commits into python:main from gesslerpd:literal-null-unpack-idiom

Conversation

Contributor

@gesslerpd gesslerpd commented May 21, 2026

Proposal

When parsing a set container literal whose only element is a direct *(), simplify that element out of the AST during preprocessing.

Proposed examples:

import ast
ast.parse('{*()}', mode='eval').body     # proposed result: Set(elts=[])

The motivation is full round-trip symmetry between ast.parse() and ast.unparse() for the empty-set idiom.

Today, ast.unparse(ast.Set(elts=[])) renders as {*()}, because Python has no dedicated empty-set literal. Without AST canonicalization on parse, reparsing that output produces a Set containing Starred(Tuple([])) instead of an empty Set, so the AST does not round-trip cleanly.
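The current round-trip gap can be reproduced with a few lines. This is a minimal sketch, assuming a CPython version without this change (3.9 or later, where ast.unparse() exists); the variable names are only illustrative.

```python
import ast

# Unparsing an empty Set node yields the {*()} idiom.
source = ast.unparse(ast.Set(elts=[]))

# Reparsing that text gives a Set holding one Starred element,
# not an empty Set, so the two trees differ.
reparsed = ast.parse(source, mode="eval").body

print(source)
print(ast.dump(reparsed))
```

The dump shows a single Starred element wrapping an empty Tuple, which is the asymmetry described above.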

With this change:

empty_set = ast.Set(elts=[])
ast.parse(ast.unparse(empty_set), mode='eval').body

would produce the same ast.Set(elts=[]) structure again.

Scope

This proposal is intentionally narrow for performance reasons.

Only simplify the direct lone {*()} case:

  • {*()}
  • assignment forms such as x = {*(),}

Do not simplify larger literals such as:

  • {*(), 2}
  • {1, *(), 2}

That keeps the rule cheap and focused on the empty-set idiom rather than introducing a broader AST rewrite.
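The scope rule above can be approximated in pure Python for illustration. This is only a sketch of the rule, not the code in this PR (which changes the parser's preprocessing); CanonicalizeEmptySet and parse_canonical are hypothetical names.

```python
import ast


class CanonicalizeEmptySet(ast.NodeTransformer):
    """Hypothetical helper: rewrite a lone {*()} into Set(elts=[])."""

    def visit_Set(self, node):
        self.generic_visit(node)  # visit nested set literals first
        if len(node.elts) == 1:
            only = node.elts[0]
            # Match only a Starred element wrapping an empty tuple literal.
            if (
                isinstance(only, ast.Starred)
                and isinstance(only.value, ast.Tuple)
                and not only.value.elts
            ):
                node.elts = []
        return node


def parse_canonical(source):
    """Parse an expression and apply the lone-{*()} rewrite."""
    tree = ast.parse(source, mode="eval")
    return CanonicalizeEmptySet().visit(tree).body


print(ast.dump(parse_canonical("{*()}")))
print(ast.dump(parse_canonical("{*(), 2}")))
```

Because the check starts with the element count, multi-element literals are rejected after a single length comparison, which is the cheapness argument made above.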

Compatibility and behavior

Runtime semantics do not change.

The visible AST change is that tools would no longer see Starred(Tuple([])) for the lone forms above; they would instead see an empty Set node.

That is an externally visible AST change, but it is narrow and maps a degenerate form to the canonical empty set node.

Notable consequences:

  • ast.literal_eval('{*()}') can evaluate to set() once the parsed AST is canonicalized
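The literal_eval consequence can be checked against an interpreter without this change. A minimal sketch: the first call shows the behavior on current CPython, and the second passes an already-canonical empty Set node directly, as a stand-in for what the parser would produce under this proposal.

```python
import ast

# Without canonicalization, the Starred element is rejected as malformed.
try:
    ast.literal_eval("{*()}")
    current_result = "accepted"
except ValueError:
    current_result = "ValueError"

# literal_eval also accepts AST nodes; an empty Set node evaluates to set().
canonical_result = ast.literal_eval(ast.Set(elts=[]))

print(current_result)
print(canonical_result)
```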

@picnixz picnixz closed this May 22, 2026
Member

picnixz commented May 22, 2026

This comes at the cost of an extra check for every container literal construction. In addition, there was no concrete support from core devs for this feature, so please wait for that first.

Contributor Author

gesslerpd commented May 22, 2026

Thanks for the feedback. Yes, I definitely thought about the potential cost, so I reduced the scope to checking only 1-element containers. I have also since pushed a commit that narrows the change to the set-only {*()} case.

AFAIK, the repeated runtime performance gains from the cached bytecode may outweigh the small one-time performance hit the AST preprocessor takes for this optimization.

I will wait and see the outcome of PEP 802; in the meantime, this can serve as a reference implementation of what can be done if the language never gets a true empty-set literal.
