Fix infinite entity expansion by detecting circular references#312
Fix infinite entity expansion by detecting circular references#312kou merged 3 commits intoruby:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds circular entity-reference detection to prevent unbounded recursive expansion (previously resulting in SystemStackError) during streaming-style parsing.
Changes:
- Track the current entity-expansion chain in
REXML::Parsers::BaseParserand raise a clear error when a loop is detected. - Extend
BaseParser#unnormalize/#entityto thread the expansion state through nested expansions. - Add regression tests covering direct/indirect circular references (including a long entity value) via
StreamParser.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
lib/rexml/parsers/baseparser.rb |
Adds loop detection during entity expansion by propagating an “currently expanding” set through recursive unnormalization. |
test/test_entity.rb |
Adds regression tests ensuring circular entity references raise an appropriate error instead of stack overflow. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Is this check heavy? |
19dd4a9 to
26545fc
Compare
I fixed an issue where unnecessary memory copies were occurring. I believe this falls within an acceptable range in terms of performance. |
ad36b1d to
2c057b0
Compare
| private :pull_event | ||
|
|
||
| def entity( reference, entities ) | ||
| def entity( reference, entities, expanding = nil ) |
There was a problem hiding this comment.
Could you use keyword argument here too for consistency?
|
|
||
| # Unescapes all possible entities | ||
| def unnormalize( string, entities=nil, filter=nil ) | ||
| def unnormalize( string, entities=nil, filter=nil, expanding=nil ) |
There was a problem hiding this comment.
Could you use keyword argument here too for consistency?
| error = assert_raise(REXML::ParseException) do | ||
| parser.parse | ||
| end | ||
| assert_match(/Detected an entity reference loop: x/, error.message) |
There was a problem hiding this comment.
Could you use the
rexml/test/parse/test_cdata.rb
Lines 23 to 32 in 93ed88c
…ML::Parsers::BaseParser
## Why?
Fix an issue where a `stack level too deep (SystemStackError)` error caused by a circular reference was not detected as an appropriate error.
```
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'
source = <<~XML
<!DOCTYPE root [
<!ENTITY x "&x;">
]>
<root>&x;</root>
XML
listener = Class.new { include REXML::StreamListener }.new
REXML::Parsers::StreamParser.new(source, listener).parse
```
- before
```
lib/rexml/parsers/baseparser.rb:544:in 'REXML::Parsers::BaseParser#entity': stack level too deep (SystemStackError)
```
- after
```
lib/rexml/parsers/baseparser.rb:545:in 'REXML::Parsers::BaseParser#entity': Detected an entity reference loop: x (REXML::ParseException)
Line: 4
Position: 57
Last 80 unconsumed characters:
</root>
```
…ML::Document ## Why? Fix an issue where a `stack level too deep (SystemStackError)` error caused by a circular reference was not detected as an appropriate error. ``` require 'rexml' source = <<~XML <!DOCTYPE root [ <!ENTITY x "&x;"> ]> <root>&x;</root> XML REXML::Document.new(source).root.text ``` - before ``` lib/rexml/text.rb:390:in 'String#gsub': stack level too deep (SystemStackError) ``` - after ``` lib/rexml/entity.rb:81:in 'REXML::Entity#unnormalized': Detected an entity reference loop: x (REXML::ParseException) ```
## Benchmark
```
$ benchmark-driver benchmark/parse_doctype.yaml
Calculating -------------------------------------
before after before(YJIT) after(YJIT)
single_entity(dom) 22.381k 24.492k 5.098k 5.211k i/s - 100.000 times in 0.004468s 0.004083s 0.019615s 0.019189s
single_entity(stream) 46.882k 48.054k 8.796k 8.691k i/s - 100.000 times in 0.002133s 0.002081s 0.011369s 0.011506s
chained_entities(dom) 2.665k 2.531k 2.349k 2.340k i/s - 100.000 times in 0.037520s 0.039509s 0.042574s 0.042726s
chained_entities(stream) 2.455k 2.387k 2.351k 2.290k i/s - 100.000 times in 0.040732s 0.041888s 0.042538s 0.043660s
many_entities(dom) 675.927 679.094 684.308 674.686 i/s - 100.000 times in 0.147945s 0.147255s 0.146133s 0.148217s
many_entities(stream) 2.602k 2.501k 2.403k 2.276k i/s - 100.000 times in 0.038427s 0.039986s 0.041607s 0.043944s
repeated_entity(dom) 419.090 414.295 445.286 444.575 i/s - 100.000 times in 0.238612s 0.241374s 0.224575s 0.224934s
repeated_entity(stream) 1.449k 1.401k 1.590k 1.582k i/s - 100.000 times in 0.069014s 0.071358s 0.062898s 0.063201s
Comparison:
single_entity(dom)
after: 24491.8 i/s
before: 22381.4 i/s - 1.09x slower
after(YJIT): 5211.3 i/s - 4.70x slower
before(YJIT): 5098.1 i/s - 4.80x slower
single_entity(stream)
after: 48053.8 i/s
before: 46882.3 i/s - 1.02x slower
before(YJIT): 8795.8 i/s - 5.46x slower
after(YJIT): 8691.1 i/s - 5.53x slower
chained_entities(dom)
before: 2665.2 i/s
after: 2531.1 i/s - 1.05x slower
before(YJIT): 2348.9 i/s - 1.13x slower
after(YJIT): 2340.5 i/s - 1.14x slower
chained_entities(stream)
before: 2455.1 i/s
after: 2387.3 i/s - 1.03x slower
before(YJIT): 2350.8 i/s - 1.04x slower
after(YJIT): 2290.4 i/s - 1.07x slower
many_entities(dom)
before(YJIT): 684.3 i/s
after: 679.1 i/s - 1.01x slower
before: 675.9 i/s - 1.01x slower
after(YJIT): 674.7 i/s - 1.01x slower
many_entities(stream)
before: 2602.3 i/s
after: 2500.9 i/s - 1.04x slower
before(YJIT): 2403.4 i/s - 1.08x slower
after(YJIT): 2275.6 i/s - 1.14x slower
repeated_entity(dom)
before(YJIT): 445.3 i/s
after(YJIT): 444.6 i/s - 1.00x slower
before: 419.1 i/s - 1.06x slower
after: 414.3 i/s - 1.07x slower
repeated_entity(stream)
before(YJIT): 1589.9 i/s
after(YJIT): 1582.3 i/s - 1.00x slower
before: 1449.0 i/s - 1.10x slower
after: 1401.4 i/s - 1.13x slower
```
- YJIT=ON : 0.95 - 1.09x faster
- YJIT=OFF : 0.94 - 1.02x faster
2c057b0 to
d8eee84
Compare
|
Thanks. |
Why?
Fix an issue where a
stack level too deep (SystemStackError)error caused by a circular reference was not detected as an appropriate error.REXML::Parsers::StreamParser
REXML::Document
Benchmark