
buffer: add fast api for isUtf8 and isAscii #64169

Open: gurgunday wants to merge 1 commit into nodejs:main from gurgunday:feat/fast-api-isUtf8

@gurgunday (Member) commented Jun 27, 2026

I like V8's Fast API constraints, so I'm checking whether there are any easy wins we haven't spotted yet.

Benchmarks (ops/s, higher is better; average of 3 runs on an M5 Pro; Baseline = main, New = this branch):

   Benchmark               Baseline avg       New avg      Delta
  ━━━━━━━━━━━━━━━━━━━━━━  ━━━━━━━━━━━━━━  ━━━━━━━━━━━━  ━━━━━━━━━
   isAscii short             35,991,851    41,781,518    +16.09%
  ──────────────────────  ──────────────  ────────────  ─────────
   isAscii long              36,288,837    41,568,699    +14.55%
  ──────────────────────  ──────────────  ────────────  ─────────
   isUtf8 regular short      62,235,173    77,679,543    +24.82%
  ──────────────────────  ──────────────  ────────────  ─────────
   isUtf8 unicode short      54,727,530    67,448,276    +23.24%
  ──────────────────────  ──────────────  ────────────  ─────────
   isUtf8 regular long       21,936,234    23,323,246     +6.32%
  ──────────────────────  ──────────────  ────────────  ─────────
   isUtf8 unicode long        1,415,756     1,424,730     +0.63%
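The Delta column is the relative change of the New average over the Baseline average, (new - baseline) / baseline × 100. A minimal C++ sketch of that arithmetic is below; the helper names `Mean` and `DeltaPercent` are illustrative only and not part of Node's benchmark tooling. The sample values are the three branch runs of isAscii short from the branch log and the Baseline average from the table.

```cpp
#include <array>
#include <cstddef>

// Arithmetic mean of N benchmark runs, each a rate in ops/s.
template <std::size_t N>
double Mean(const std::array<double, N>& runs) {
  double sum = 0.0;
  for (double r : runs) sum += r;
  return sum / static_cast<double>(N);
}

// Relative change of `next` over `base`, in percent (positive = faster).
double DeltaPercent(double base, double next) {
  return (next - base) / base * 100.0;
}

// Branch runs of buffer-isascii.js, length="short", copied from the log.
constexpr std::array<double, 3> kBranchAsciiShort = {
    42042571.75319534, 41651213.719283365, 41650769.262218766};

// Baseline average for the same row, copied from the summary table.
constexpr double kBaselineAsciiShort = 35991851.0;
```

With these inputs, `Mean(kBranchAsciiShort)` comes to about 41,781,518 and `DeltaPercent(kBaselineAsciiShort, Mean(kBranchAsciiShort))` to about 16.09, matching the isAscii short row.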

main:

./node benchmark/run.js --filter buffer-isutf8 --filter buffer-isascii buffers
buffers/buffer-isascii.js
buffers/buffer-isascii.js input="hello world" length="short" n=20000000: 36,815,465.272773266
buffers/buffer-isascii.js input="hello world" length="long" n=20000000: 36,183,770.13174511
buffers/buffer-isascii.js input="hello world" length="short" n=20000000: 36,618,408.943863295
buffers/buffer-isascii.js input="hello world" length="long" n=20000000: 35,746,630.542265736
buffers/buffer-isascii.js input="hello world" length="short" n=20000000: 36,457,306.78481872
buffers/buffer-isascii.js input="hello world" length="long" n=20000000: 36,401,698.177805796

buffers/buffer-isutf8.js
buffers/buffer-isutf8.js input="regular string" length="short" n=20000000: 62,364,681.698319875
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="short" n=20000000: 55,971,859.319343686
buffers/buffer-isutf8.js input="regular string" length="long" n=20000000: 22,072,747.830241162
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="long" n=20000000: 1,428,668.0720477349
buffers/buffer-isutf8.js input="regular string" length="short" n=20000000: 61,692,629.901998825
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="short" n=20000000: 55,769,467.558882594
buffers/buffer-isutf8.js input="regular string" length="long" n=20000000: 21,656,043.985478327
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="long" n=20000000: 1,411,666.258768642
buffers/buffer-isutf8.js input="regular string" length="short" n=20000000: 60,205,383.199316956
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="short" n=20000000: 54,345,813.96273032
buffers/buffer-isutf8.js input="regular string" length="long" n=20000000: 21,706,677.55977208
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="long" n=20000000: 1,417,130.332308522

branch:

./node benchmark/run.js --filter buffer-isutf8 --filter buffer-isascii buffers
buffers/buffer-isascii.js
buffers/buffer-isascii.js input="hello world" length="short" n=20000000: 42,042,571.75319534
buffers/buffer-isascii.js input="hello world" length="long" n=20000000: 41,993,784.33620464
buffers/buffer-isascii.js input="hello world" length="short" n=20000000: 41,651,213.719283365
buffers/buffer-isascii.js input="hello world" length="long" n=20000000: 41,547,685.95250074
buffers/buffer-isascii.js input="hello world" length="short" n=20000000: 41,650,769.262218766
buffers/buffer-isascii.js input="hello world" length="long" n=20000000: 41,164,626.14506766

buffers/buffer-isutf8.js
buffers/buffer-isutf8.js input="regular string" length="short" n=20000000: 78,129,501.6021431
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="short" n=20000000: 67,884,812.56127538
buffers/buffer-isutf8.js input="regular string" length="long" n=20000000: 23,347,893.348125998
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="long" n=20000000: 1,438,952.7056895334
buffers/buffer-isutf8.js input="regular string" length="short" n=20000000: 78,384,482.64411178
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="short" n=20000000: 68,071,405.20227206
buffers/buffer-isutf8.js input="regular string" length="long" n=20000000: 23,444,653.604879595
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="long" n=20000000: 1,421,397.567430693
buffers/buffer-isutf8.js input="regular string" length="short" n=20000000: 76,524,646.0555287
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="short" n=20000000: 66,388,609.0424605
buffers/buffer-isutf8.js input="regular string" length="long" n=20000000: 23,177,192.30868906
buffers/buffer-isutf8.js input="∀x∈ℝ: ⌈x⌉ = −⌊−x⌋" length="long" n=20000000: 1,413,840.9062698097

@nodejs-github-bot added the buffer, c++, and needs-ci labels Jun 27, 2026
Signed-off-by: Gürgün Dayıoğlu <hey@gurgun.day>
@gurgunday force-pushed the feat/fast-api-isUtf8 branch from 9929ec2 to 0217108 June 27, 2026 11:40
@gurgunday added the performance label Jun 27, 2026
@gurgunday (Member, PR author) commented:

Cc @nodejs/performance

@gurgunday added the request-ci label Jun 29, 2026
@github-actions (bot) removed the request-ci label Jun 29, 2026
@nodejs-github-bot (Collaborator)

Comment thread on src/node_buffer.cc:

    static bool ValidateUtf8(Local<Value> value, bool* was_detached) {
      ArrayBufferViewContents<char> abv(value);
      *was_detached = abv.WasDetached();
      return !*was_detached && simdutf::validate_utf8(abv.data(), abv.length());
    }

Reviewer (Member):

I'd definitely recommend returning a std::pair<bool, bool>, or just a struct, here instead of a return value plus a separate out-parameter boolean.
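As a sketch of the reviewer's suggestion, the UTF-8 helper could return a small struct instead of writing through `bool* was_detached`. To stay self-contained it assumes a hypothetical `ViewContents` type in place of `ArrayBufferViewContents<char>`, a hypothetical `ValidationResult` struct, and a scalar validator (following the well-formed byte ranges in Table 3-7 of the Unicode Standard) in place of `simdutf::validate_utf8`. It is not the code in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for ArrayBufferViewContents<char>: a byte range
// plus a flag saying whether the backing ArrayBuffer was detached.
struct ViewContents {
  bool detached;
  std::string_view bytes;
};

// Hypothetical result type: named fields instead of bool + bool* out-param.
struct ValidationResult {
  bool was_detached;
  bool valid;
};

// Scalar stand-in for simdutf::validate_utf8. Rejects overlong forms,
// surrogates (U+D800..U+DFFF), code points above U+10FFFF, and truncation.
bool ScalarValidateUtf8(std::string_view s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const auto b0 = static_cast<std::uint8_t>(s[i]);
    if (b0 < 0x80) { ++i; continue; }  // ASCII byte
    std::size_t len = 0;
    std::uint8_t lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
      len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
      len = 3;
      if (b0 == 0xE0) lo = 0xA0;  // no overlong 3-byte forms
      if (b0 == 0xED) hi = 0x9F;  // no surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
      len = 4;
      if (b0 == 0xF0) lo = 0x90;  // no overlong 4-byte forms
      if (b0 == 0xF4) hi = 0x8F;  // nothing above U+10FFFF
    } else {
      return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
    }
    if (n - i < len) return false;  // truncated sequence
    const auto b1 = static_cast<std::uint8_t>(s[i + 1]);
    if (b1 < lo || b1 > hi) return false;
    for (std::size_t k = 2; k < len; ++k) {
      const auto bk = static_cast<std::uint8_t>(s[i + k]);
      if (bk < 0x80 || bk > 0xBF) return false;  // continuation bytes only
    }
    i += len;
  }
  return true;
}

// Detached views are reported as such and never considered valid.
ValidationResult ValidateUtf8(const ViewContents& view) {
  if (view.detached) return {true, false};
  return {false, ScalarValidateUtf8(view.bytes)};
}
```

A caller can then write `auto [was_detached, valid] = ValidateUtf8(view);` and handle the detached case first, with no out-parameter to declare and pass in.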

Comment thread on src/node_buffer.cc:

      ArrayBufferViewContents<char> abv(value);
      *was_detached = abv.WasDetached();
      return !*was_detached &&
             !simdutf::validate_ascii_with_errors(abv.data(), abv.length()).error;

Reviewer (Member):

ditto
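Applied to the ASCII helper, the std::pair<bool, bool> form the reviewer also mentioned might look like the sketch below. As before, `ViewContents` is a hypothetical stand-in for `ArrayBufferViewContents<char>`, and a scalar byte loop stands in for `simdutf::validate_ascii_with_errors`; the names are illustrative, not the PR's code.

```cpp
#include <string_view>
#include <utility>

// Hypothetical stand-in for ArrayBufferViewContents<char>.
struct ViewContents {
  bool detached;
  std::string_view bytes;
};

// Scalar stand-in for the simdutf ASCII check: every byte must be < 0x80.
bool ScalarIsAscii(std::string_view s) {
  for (char c : s) {
    if (static_cast<unsigned char>(c) >= 0x80) return false;
  }
  return true;
}

// std::pair variant: first = was_detached, second = is_ascii.
// The meaning of each bool lives only in this comment, not in the type.
std::pair<bool, bool> ValidateAscii(const ViewContents& view) {
  if (view.detached) return {true, false};
  return {false, ScalarIsAscii(view.bytes)};
}
```

The trade-off shows at the call site: `.first` and `.second` carry no names, so `auto [was_detached, is_ascii] = ValidateAscii(view);` is where the meaning gets spelled out, whereas a named struct documents it in the function's signature.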


Labels: buffer, c++, needs-ci, performance

4 participants