Skip to content

Commit 2f93205

Browse files
fix(execution): run pptx/docx/pdf generation inside isolated-vm sandbox (#4217)
* fix(execution): run pptx/docx/pdf generation inside isolated-vm sandbox Retires the legacy doc-worker.cjs / pptx-worker.cjs pipeline that ran user DSL via node:vm + full require() in the same UID/PID namespace as the main Next.js process. User code now runs inside the existing isolated-vm pool (V8 isolate, no process / require / fs, no /proc/1/environ reachability). Introduces a first-class SandboxTask abstraction under apps/sim/sandbox-tasks/ that mirrors apps/sim/background/ — one file per task, central typed registry, kebab-case ids. Adding a new thing that runs in the isolate is one file plus one registry entry. Runtime additions in lib/execution/: - task-mode execution in isolated-vm-worker.cjs: load pre-built library bundles, run task bootstrap, run user code, run finalize, transfer Uint8Array result as base64 via IPC - named broker IPC bridge (generalizes the existing fetch bridge) with args size, result size, and per-execution call caps - cooperative AbortSignal support: cancel IPC disposes the isolate, pool slot is freed, pending broker-call timers are swept - compiled scripts + references explicitly released per execution - isolate.isDisposed used for cancellation detection (no error-string substring matching) Library bundles (pptxgenjs, docx, pdf-lib) are built into isolate-safe IIFE bundles by apps/sim/lib/execution/sandbox/bundles/build.ts and committed; next.config.ts / trigger.config.ts / Dockerfile updated to ship them instead of the deleted dist/*-worker.cjs artifacts. Call sites migrated: - app/api/workspaces/[id]/pptx/preview/route.ts - app/api/files/serve/[...path]/route.ts (+ test mock) - lib/copilot/tools/server/files/{workspace-file,edit-content}.ts All pass owner key user:<userId> for per-user pool fairness + distributed lease accounting. Made-with: Cursor * improvement(sandbox): delegate timers to Node, add phase timings + saturation logs Follow-ups on top of the isolated-vm migration (da14027): Timer delegation (laverdet/isolated-vm#136 recommended pattern): - setTimeout / setInterval / clearTimeout / clearImmediate delegate to Node's real timer heap via ivm.Reference. Real delays are honored; clearTimeout actually cancels; ms is clamped to the script timeout so callbacks can't fire after the isolate is disposed. - Per-execution timer tracking + dispose-sweep in finally. Zero stale callbacks post-dispose. - unwrapPrimitive helper normalizes ivm.Reference-wrapped primitives (arguments: { reference: true } applies uniformly to all args). - _polyfills.ts shrinks from ~130 lines to the global->globalThis alias. Timers / TextEncoder / TextDecoder / console all install per-execution from the worker via ivm bridges. AbortSignal race fix (pre-existing bug surfaced by the timer smoke): - Listener is registered after await tryAcquireDistributedLease. If the signal aborted during that ~200ms window (Redis down), AbortSignal doesn't fire listeners registered after the fact — the abort was silently missed. Now re-checks signal.aborted synchronously after addEventListener. Observability: - executeTask returns IsolatedVMTaskTimings (setup, runtimeBootstrap, bundles, brokerInstall, taskBootstrap, harden, userCode, finalize, total) in every success + error path. run-task.ts logs these with workspaceId + queueMs so 'which tenant is slow' is queryable. - Pool saturation events now emit structured logger.warn with reason codes: queue_full_global, queue_full_owner, queue_wait_timeout, distributed_lease_limit. Matches the existing broker reject pattern. Security policy: - New .cursor/rules/sim-sandbox.mdc codifies the hard rules for the worker process: no app credentials, all credentialed work goes through host-side brokers, every broker scopes by workspaceId. Pre-merge checklist for future changes to isolated-vm-worker.cjs. Measured phase breakdown (local smoke, Redis down): pptx wall=~310ms with bundles=~16ms, finalize=~83ms; docx ~290ms / 17ms / 70ms; pdf ~235ms / 17ms / 5ms. Bundle compilation is not the bottleneck — library finalize is. Made-with: Cursor * fix(sandbox): thread AbortSignal into runSandboxTask at every call site Three remaining callers of runSandboxTask were not threading a cancellation signal, so a client disconnect mid-compile left the pool slot occupied for the full 60s task timeout. Matching the pattern the pptx-preview route already uses. - apps/sim/app/api/files/serve/[...path]/route.ts — GET forwards `request.signal` into handleLocalFile / handleCloudProxy, which forward into compileDocumentIfNeeded, which forwards into runSandboxTask. - apps/sim/lib/copilot/tools/server/files/workspace-file.ts — passes `context.abortSignal` (transport/user stop) into runSandboxTask. - apps/sim/lib/copilot/tools/server/files/edit-content.ts — same. Smoke: simulated client disconnect at t=1000ms during a task that would otherwise have waited 10s. The pool slot unwinds at t=1002ms with AbortError; previously would have sat 60s until the task-level timeout. Made-with: Cursor * chore(build): raise node heap to 8GB for next build type-check Next.js's type-check worker OOMs at the default 4GB heap on Node 23 for this project's type graph size. Bumps the heap to 8GB only for the `next build` invocation inside `bun run build`. Docker builds are unaffected — `next.config.ts` sets `typescript.ignoreBuildErrors: true` when DOCKER_BUILD=1, which skips the type-check pass entirely. This only fixes local `bun run build`. No functional code changes. Made-with: Cursor * fix lint * refactor(copilot): dedup getDocumentFormatInfo across copilot file tools The same extension -> { formatName, sourceMime, taskId } mapping was duplicated in workspace-file.ts and edit-content.ts. Any future format or task-id change had to happen in two places. Exports getDocumentFormatInfo + DocumentFormatInfo from workspace-file.ts (which already owned the PPTX/DOCX/PDF source MIME constants) and imports it in edit-content.ts. Same source-of-truth pattern the file already uses for inferContentType. Made-with: Cursor * fix(sandbox): propagate empty-message broker/fetch errors Both bridges in the isolate used truthiness to detect host-side errors: if (response.error) throw new Error(response.error); // broker if (result.error) throw new Error(result.error); // fetch If a host handler ever threw `new Error('')`, err.message would be '' (falsy), so { error: '' } was silently swallowed and the isolate saw a successful null result. Existing call sites don't throw empty-message errors, but the pattern was structurally unsafe. Switch both to typeof check === 'string' and fall back to a default message if the string is empty, so all host-reported errors propagate into the isolate regardless of message content. Made-with: Cursor
1 parent 319e0db commit 2f93205

File tree

32 files changed

+1888
-784
lines changed

32 files changed

+1888
-784
lines changed

.cursor/rules/sim-sandbox.mdc

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
---
2+
description: Isolated-vm sandbox worker security policy. Hard rules for anything that lives in the worker child process that runs user code.
3+
globs: ["apps/sim/lib/execution/isolated-vm-worker.cjs", "apps/sim/lib/execution/isolated-vm.ts", "apps/sim/lib/execution/sandbox/**", "apps/sim/sandbox-tasks/**"]
4+
---
5+
6+
# Sim Sandbox — Worker Security Policy
7+
8+
The isolated-vm worker child process at
9+
`apps/sim/lib/execution/isolated-vm-worker.cjs` runs untrusted user code inside
10+
V8 isolates. The process itself is a trust boundary. Everything in this rule is
11+
about what must **never** live in that process.
12+
13+
## Hard rules
14+
15+
1. **No app credentials in the worker process**. The worker must not hold, load,
16+
or receive via IPC: database URLs, Redis URLs, AWS keys, Stripe keys,
17+
session-signing keys, encryption keys, OAuth client secrets, internal API
18+
secrets, or any LLM / email / search provider API keys. If you catch yourself
19+
`require`'ing `@/lib/auth`, `@sim/db`, `@/lib/uploads/core/storage-service`,
20+
or anything that imports `env` directly inside the worker, stop and use a
21+
host-side broker instead.
22+
23+
2. **Host-side brokers own all credentialed work**. The worker can only access
24+
resources through `ivm.Reference` / `ivm.Callback` bridges back to the host
25+
process. Today the only broker is `workspaceFileBroker`
26+
(`apps/sim/lib/execution/sandbox/brokers/workspace-file.ts`); adding a new
27+
one requires co-reviewing this file.
28+
29+
3. **Host-side brokers must scope every resource access to a single tenant**.
30+
The `SandboxBrokerContext` always carries `workspaceId`. Any new broker that
31+
accesses storage, DB, or an external API must use `ctx.workspaceId` to scope
32+
the lookup — never accept a raw path, key, or URL from isolate code without
33+
validation.
34+
35+
4. **Nothing that runs in the isolate is trusted, even if we wrote it**. The
36+
task `bootstrap` and `finalize` strings in `apps/sim/sandbox-tasks/` execute
37+
inside the isolate. They must treat `globalThis` as adversarial — no pulling
38+
values from it that might have been mutated by user code. The hardening
39+
script in `executeTask` undefines dangerous globals before user code runs.
40+
41+
## Why
42+
43+
A V8 JIT bug (Chrome ships these roughly monthly) gives an attacker a native
44+
code primitive inside the process that owns whatever that process can reach.
45+
If the worker only holds `isolated-vm` + a single narrow workspace-file broker,
46+
a V8 escape leaks one tenant's files. If the worker holds a Stripe key or a DB
47+
connection, a V8 escape leaks the service.
48+
49+
The original `doc-worker.cjs` vulnerability (CVE-class, 225 production secrets
50+
leaked via `/proc/1/environ`) was the forcing function for this architecture.
51+
Keep the blast radius small.
52+
53+
## Checklist for changes to `isolated-vm-worker.cjs`
54+
55+
Before landing any change that adds a new `require(...)` or `process.send(...)`
56+
payload or `ivm.Reference` wrapper in the worker:
57+
58+
- [ ] Does it load a credential, key, connection string, or secret? If yes,
59+
move it host-side and expose as a broker.
60+
- [ ] Does it import from `@/lib/auth`, `@sim/db`, `@/lib/uploads/core/*`,
61+
`@/lib/core/config/env`, or any module that reads `process.env` of the
62+
main app? If yes, same — move host-side.
63+
- [ ] Does it expose a resource that's workspace-scoped without taking a
64+
`workspaceId`? If yes, re-scope.
65+
- [ ] Did you update the broker limits (`IVM_MAX_BROKER_ARGS_JSON_CHARS`,
66+
`IVM_MAX_BROKER_RESULT_JSON_CHARS`, `IVM_MAX_BROKERS_PER_EXECUTION`) if
67+
the new broker can emit large payloads or fire frequently?
68+
69+
## What the worker *may* hold
70+
71+
- `isolated-vm` module
72+
- Node built-ins: `node:fs` (only for reading the checked-in bundle `.cjs`
73+
files) and `node:path`
74+
- The three prebuilt library bundles under
75+
`apps/sim/lib/execution/sandbox/bundles/*.cjs`
76+
- IPC message handlers for `execute`, `cancel`, `fetchResponse`,
77+
`brokerResponse`
78+
79+
The worker deliberately has **no host-side logger**. All errors and
80+
diagnostics flow through IPC back to the host, which has `@sim/logger`. Do
81+
not add `createLogger` or console-based logging to the worker — it would
82+
require pulling the main app's config / env, which is exactly what this
83+
rule is preventing.
84+
85+
Anything else is suspect.

apps/sim/app/api/files/serve/[...path]/route.test.ts

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -75,10 +75,12 @@ vi.mock('@/lib/uploads/utils/file-utils', () => ({
7575

7676
vi.mock('@/lib/uploads/setup.server', () => ({}))
7777

78-
vi.mock('@/lib/execution/doc-vm', () => ({
79-
generatePdfFromCode: vi.fn().mockResolvedValue(Buffer.from('%PDF-compiled')),
80-
generateDocxFromCode: vi.fn().mockResolvedValue(Buffer.from('PK\x03\x04compiled')),
81-
generatePptxFromCode: vi.fn().mockResolvedValue(Buffer.from('PK\x03\x04compiled')),
78+
vi.mock('@/lib/execution/sandbox/run-task', () => ({
79+
runSandboxTask: vi
80+
.fn()
81+
.mockImplementation(async (taskId: string) =>
82+
taskId === 'pdf-generate' ? Buffer.from('%PDF-compiled') : Buffer.from('PK\x03\x04compiled')
83+
),
8284
}))
8385

8486
vi.mock('@/lib/uploads/contexts/workspace/workspace-file-manager', () => ({

apps/sim/app/api/files/serve/[...path]/route.ts

Lines changed: 29 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -4,11 +4,7 @@ import { createLogger } from '@sim/logger'
44
import type { NextRequest } from 'next/server'
55
import { NextResponse } from 'next/server'
66
import { checkSessionOrInternalAuth } from '@/lib/auth/hybrid'
7-
import {
8-
generateDocxFromCode,
9-
generatePdfFromCode,
10-
generatePptxFromCode,
11-
} from '@/lib/execution/doc-vm'
7+
import { runSandboxTask } from '@/lib/execution/sandbox/run-task'
128
import { CopilotFiles, isUsingCloudStorage } from '@/lib/uploads'
139
import type { StorageContext } from '@/lib/uploads/config'
1410
import { parseWorkspaceFileKey } from '@/lib/uploads/contexts/workspace/workspace-file-manager'
@@ -22,6 +18,7 @@ import {
2218
findLocalFile,
2319
getContentType,
2420
} from '@/app/api/files/utils'
21+
import type { SandboxTaskId } from '@/sandbox-tasks/registry'
2522

2623
const logger = createLogger('FilesServeAPI')
2724

@@ -30,24 +27,24 @@ const PDF_MAGIC = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x2d]) // %PDF-
3027

3128
interface CompilableFormat {
3229
magic: Buffer
33-
compile: (code: string, workspaceId: string) => Promise<Buffer>
30+
taskId: SandboxTaskId
3431
contentType: string
3532
}
3633

3734
const COMPILABLE_FORMATS: Record<string, CompilableFormat> = {
3835
'.pptx': {
3936
magic: ZIP_MAGIC,
40-
compile: generatePptxFromCode,
37+
taskId: 'pptx-generate',
4138
contentType: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
4239
},
4340
'.docx': {
4441
magic: ZIP_MAGIC,
45-
compile: generateDocxFromCode,
42+
taskId: 'docx-generate',
4643
contentType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
4744
},
4845
'.pdf': {
4946
magic: PDF_MAGIC,
50-
compile: generatePdfFromCode,
47+
taskId: 'pdf-generate',
5148
contentType: 'application/pdf',
5249
},
5350
}
@@ -65,8 +62,10 @@ function compiledCacheSet(key: string, buffer: Buffer): void {
6562
async function compileDocumentIfNeeded(
6663
buffer: Buffer,
6764
filename: string,
68-
workspaceId?: string,
69-
raw?: boolean
65+
workspaceId: string | undefined,
66+
raw: boolean,
67+
ownerKey: string | undefined,
68+
signal: AbortSignal | undefined
7069
): Promise<{ buffer: Buffer; contentType: string }> {
7170
if (raw) return { buffer, contentType: getContentType(filename) }
7271

@@ -90,7 +89,11 @@ async function compileDocumentIfNeeded(
9089
return { buffer: cached, contentType: format.contentType }
9190
}
9291

93-
const compiled = await format.compile(code, workspaceId || '')
92+
const compiled = await runSandboxTask(
93+
format.taskId,
94+
{ code, workspaceId: workspaceId || '' },
95+
{ ownerKey, signal }
96+
)
9497
compiledCacheSet(cacheKey, compiled)
9598
return { buffer: compiled, contentType: format.contentType }
9699
}
@@ -153,10 +156,10 @@ export async function GET(
153156
const userId = authResult.userId
154157

155158
if (isUsingCloudStorage()) {
156-
return await handleCloudProxy(cloudKey, userId, raw)
159+
return await handleCloudProxy(cloudKey, userId, raw, request.signal)
157160
}
158161

159-
return await handleLocalFile(cloudKey, userId, raw)
162+
return await handleLocalFile(cloudKey, userId, raw, request.signal)
160163
} catch (error) {
161164
logger.error('Error serving file:', error)
162165

@@ -171,8 +174,10 @@ export async function GET(
171174
async function handleLocalFile(
172175
filename: string,
173176
userId: string,
174-
raw: boolean
177+
raw: boolean,
178+
signal: AbortSignal | undefined
175179
): Promise<NextResponse> {
180+
const ownerKey = `user:${userId}`
176181
try {
177182
const contextParam: StorageContext | undefined = inferContextFromKey(filename) as
178183
| StorageContext
@@ -205,7 +210,9 @@ async function handleLocalFile(
205210
rawBuffer,
206211
displayName,
207212
workspaceId,
208-
raw
213+
raw,
214+
ownerKey,
215+
signal
209216
)
210217

211218
logger.info('Local file served', { userId, filename, size: fileBuffer.length })
@@ -225,8 +232,10 @@ async function handleLocalFile(
225232
async function handleCloudProxy(
226233
cloudKey: string,
227234
userId: string,
228-
raw = false
235+
raw = false,
236+
signal: AbortSignal | undefined = undefined
229237
): Promise<NextResponse> {
238+
const ownerKey = `user:${userId}`
230239
try {
231240
const context = inferContextFromKey(cloudKey)
232241
logger.info(`Inferred context: ${context} from key pattern: ${cloudKey}`)
@@ -262,7 +271,9 @@ async function handleCloudProxy(
262271
rawBuffer,
263272
displayName,
264273
workspaceId,
265-
raw
274+
raw,
275+
ownerKey,
276+
signal
266277
)
267278

268279
logger.info('Cloud file served', {

apps/sim/app/api/workspaces/[id]/pptx/preview/route.ts

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
import { createLogger } from '@sim/logger'
22
import { type NextRequest, NextResponse } from 'next/server'
33
import { getSession } from '@/lib/auth'
4-
import { generatePptxFromCode } from '@/lib/execution/doc-vm'
4+
import { runSandboxTask } from '@/lib/execution/sandbox/run-task'
55
import { verifyWorkspaceMembership } from '@/app/api/workflows/utils'
66

77
export const dynamic = 'force-dynamic'
@@ -44,7 +44,11 @@ export async function POST(req: NextRequest, { params }: { params: Promise<{ id:
4444
return NextResponse.json({ error: 'code exceeds maximum size' }, { status: 413 })
4545
}
4646

47-
const buffer = await generatePptxFromCode(code, workspaceId, req.signal)
47+
const buffer = await runSandboxTask(
48+
'pptx-generate',
49+
{ code, workspaceId },
50+
{ ownerKey: `user:${session.user.id}`, signal: req.signal }
51+
)
4852

4953
return new NextResponse(new Uint8Array(buffer), {
5054
status: 200,

apps/sim/lib/copilot/tools/server/files/edit-content.ts

Lines changed: 7 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -5,14 +5,10 @@ import {
55
type ServerToolContext,
66
} from '@/lib/copilot/tools/server/base-tool'
77
import { toError } from '@/lib/core/utils/helpers'
8-
import {
9-
generateDocxFromCode,
10-
generatePdfFromCode,
11-
generatePptxFromCode,
12-
} from '@/lib/execution/doc-vm'
8+
import { runSandboxTask } from '@/lib/execution/sandbox/run-task'
139
import { updateWorkspaceFileContent } from '@/lib/uploads/contexts/workspace/workspace-file-manager'
1410
import { consumeLatestFileIntent } from './file-intent-store'
15-
import { inferContentType } from './workspace-file'
11+
import { getDocumentFormatInfo, inferContentType } from './workspace-file'
1612

1713
const logger = createLogger('EditContentServerTool')
1814

@@ -26,40 +22,6 @@ type EditContentResult = {
2622
data?: Record<string, unknown>
2723
}
2824

29-
function getDocumentFormatInfo(fileName: string): {
30-
isDoc: boolean
31-
formatName?: string
32-
sourceMime?: string
33-
generator?: (code: string, workspaceId: string, signal?: AbortSignal) => Promise<Buffer>
34-
} {
35-
const lowerName = fileName.toLowerCase()
36-
if (lowerName.endsWith('.pptx')) {
37-
return {
38-
isDoc: true,
39-
formatName: 'PPTX',
40-
sourceMime: 'text/x-pptxgenjs',
41-
generator: generatePptxFromCode,
42-
}
43-
}
44-
if (lowerName.endsWith('.docx')) {
45-
return {
46-
isDoc: true,
47-
formatName: 'DOCX',
48-
sourceMime: 'text/x-docxjs',
49-
generator: generateDocxFromCode,
50-
}
51-
}
52-
if (lowerName.endsWith('.pdf')) {
53-
return {
54-
isDoc: true,
55-
formatName: 'PDF',
56-
sourceMime: 'text/x-pdflibjs',
57-
generator: generatePdfFromCode,
58-
}
59-
}
60-
return { isDoc: false }
61-
}
62-
6325
export const editContentServerTool: BaseServerTool<EditContentArgs, EditContentResult> = {
6426
name: 'edit_content',
6527
async execute(params: EditContentArgs, context?: ServerToolContext): Promise<EditContentResult> {
@@ -241,7 +203,11 @@ export const editContentServerTool: BaseServerTool<EditContentArgs, EditContentR
241203

242204
if (docInfo.isDoc) {
243205
try {
244-
await docInfo.generator!(finalContent, workspaceId)
206+
await runSandboxTask(
207+
docInfo.taskId!,
208+
{ code: finalContent, workspaceId },
209+
{ ownerKey: `user:${context.userId}`, signal: context.abortSignal }
210+
)
245211
} catch (err) {
246212
const msg = toError(err).message
247213
return {

0 commit comments

Comments
 (0)