feat(webapp,run-engine): queue metrics and health dashboard #4131
ericallam wants to merge 22 commits into
…signals Gauges are read inside the enqueue/dequeue Lua and returned on the script reply as a 2-tuple; counters are cumulative odometers. The run-queue Redis carries no metrics stream of its own.
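Because the counters are cumulative odometers, a per-bucket count presumably comes from differencing consecutive readings. The sketch below mirrors the documented behavior of ClickHouse's `deltaSum` aggregate, which ignores negative differences. `sumPositiveDeltas` is a hypothetical name used for illustration and is not part of this PR.

```typescript
// Hypothetical helper, not from the PR: sum the increases between consecutive
// cumulative ("odometer") readings, ignoring any negative step, which is how
// ClickHouse's deltaSum aggregate treats a decreasing value.
function sumPositiveDeltas(readings: number[]): number {
  let total = 0;
  for (let i = 1; i < readings.length; i++) {
    const delta = readings[i] - readings[i - 1];
    if (delta > 0) total += delta;
  }
  return total;
}
```

With this shape a counter reset shows up as a skipped step rather than a large negative count.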
Walkthrough
This PR adds queue-metrics ingestion, storage, query, and UI support. It introduces a Redis/ClickHouse metrics pipeline package, ClickHouse queue-metrics tables and query helpers, run-queue emission hooks, gap-filling support in TSQL, and new webapp admin, dashboard, list, and detail routes. It also adds environment and feature-flag gating, seed tooling, and tests across the pipeline and query layers.
Related PRs: None found.
Suggested labels: enhancement, area: webapp, area: run-engine, area: internal-packages
Suggested reviewers: ericallam, matt-aitken
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
…counters entryOrderKey returns a string built with BigInt math so ordering stays correct at real epoch magnitudes. Odometer keys are namespaced by definition name. The consumer reports null lag for a missing consumer group instead of 0, and empty gauge values parse as NaN rather than 0.
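The gauge change matters because JavaScript's `Number("")` evaluates to `0`, so a naive conversion would report a missing gauge as a real zero. A minimal sketch of the stricter parse follows. `parseGauge` is a hypothetical name and signature; the actual parser in the PR is not shown in this conversation.

```typescript
// Hypothetical helper, not from the PR: parse a gauge value read from Redis.
// Number("") is 0 in JavaScript, so empty or missing input is special-cased
// to NaN to keep "no reading" distinct from "reading of zero".
function parseGauge(raw: string | null | undefined): number {
  if (raw === null || raw === undefined || raw.trim() === "") {
    return Number.NaN;
  }
  return Number(raw);
}
```

Callers can then test the result with `Number.isNaN` and drop the sample instead of plotting a false zero.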
…ng order keys The wait-time quantile materialized view now excludes wait_ms = 0 rows so it matches the count aggregation. order_key accepts a string or a number. Migration comments no longer contain semicolons that split the migration into invalid statements.
…rride The queues list tolerates a metrics query failure by rendering without metrics and logging a warning. UsageSparkline renders its total override even when every bucket is zero. The queue detail page returns 404 and its loader skips the metrics query when the feature flag is off. The seed script validates bucket size and only writes ClickHouse against a local host.
A bucket-led ORDER BY DESC combined with fillGaps emitted an ascending WITH FILL (positive step, ascending bounds), which produces invalid or empty fills. Skip the gap-fill rewrite for descending orders and let the plain descending query stand. Adds a DESC fillGaps test.
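The guard described above can be sketched roughly as follows. The types, the function name `buildFillClause`, and the numeric bounds are assumptions made for illustration; the real TSQL rewrite is not shown in this conversation.

```typescript
// Hypothetical sketch, not the PR's implementation: only emit a WITH FILL
// modifier for ascending bucket orders. An ascending fill (positive step,
// ascending bounds) does not suit a descending ORDER BY, so DESC queries are
// left as plain descending queries.
type OrderDirection = "ASC" | "DESC";
type FillSpec = { from: number; to: number; step: number };

function buildFillClause(direction: OrderDirection, fill: FillSpec): string | undefined {
  if (direction === "DESC") {
    return undefined; // skip the gap-fill rewrite entirely
  }
  return `WITH FILL FROM ${fill.from} TO ${fill.to} STEP ${fill.step}`;
}
```

Returning `undefined` keeps the caller's query text unchanged for descending orders, at the cost of not filling empty buckets in that case.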
Packs the stream sequence with a 1e6 factor (was 1e5) so up to 1M entries per millisecond per shard fit before a seq could spill into the next millisecond's range, far above what a single Redis stream can produce. ms*1e6 stays within UInt64. Also fixes the webapp mapping test that still expected a numeric order_key after the switch to a BigInt-derived string.
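A rough sketch of the packing described above, assuming the input is a Redis stream entry ID of the form `<ms>-<seq>`. The signature is a guess for illustration; only the function name `entryOrderKey`, the 1e6 factor, and the string return type come from the commit messages.

```typescript
// Hypothetical signature: pack a Redis stream entry ID ("<ms>-<seq>") into a
// single order key as ms * 1e6 + seq. BigInt is used because the result
// exceeds Number.MAX_SAFE_INTEGER (2^53 - 1) at real epoch magnitudes.
const SEQ_FACTOR = BigInt(1000000);

function entryOrderKey(streamId: string): string {
  const [msPart, seqPart = "0"] = streamId.split("-");
  const ms = BigInt(msPart);
  const seq = BigInt(seqPart);
  return (ms * SEQ_FACTOR + seq).toString();
}
```

Doing the same arithmetic with `number` would silently round away small sequence values, because an epoch in milliseconds multiplied by 1e6 is far beyond the range where doubles represent every integer.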
🚩 ClickHouse migration 035 numbering should be verified against latest main
Per REVIEW.md and internal-packages/clickhouse/CLAUDE.md, ClickHouse migrations use Goose strict mode — a migration numbered below the current applied version blocks the deploy. This file is numbered 035. If main has added migrations since this branch diverged, the number may need to be bumped. The DDL itself is correctly idempotent (CREATE TABLE IF NOT EXISTS, CREATE MATERIALIZED VIEW IF NOT EXISTS, DROP VIEW IF EXISTS, DROP TABLE IF EXISTS).
The queues list and queue detail pages now use the shared TimeFilter (any preset period or a custom date range) and everything on the page follows it: header tiles, per queue metric columns, charts, and stats. The custom period buttons, hand rolled chart cards, and duplicated metric fetch loops are replaced by the ChartCard and Chart primitives, UsageSparkline, and a shared useMetricResourceQuery hook. The ClickHouse list queries take an explicit end bound so fixed ranges query only their window.
Queries using deltaSumTimestampMerge failed with an unknown function error, which broke the queue detail stats and the started counts on the built in Queues dashboard.
Actionable comments posted: 2
🧹 Nitpick comments (1)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx (1)
196-199: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Duplicate ClickHouse-time-to-ms helper.
`clickhouseTimeToMs` is functionally identical to `tileTimeToMs` already defined in the sibling queues list route. Since this PR's stated goal is consolidating duplicated chart/query logic into shared primitives, consider moving this into a shared utility (e.g. alongside `buildActivityTimeAxis`) instead of duplicating it per-route.
♻️ Proposed consolidation
    // e.g. ~/components/primitives/charts/clickhouseTime.ts
    export function clickhouseTimeToMs(value: unknown): number {
      const s = String(value).replace(" ", "T");
      return Date.parse(s.endsWith("Z") ? s : `${s}Z`);
    }
Then import it from both routes instead of redefining it locally.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4d692688-1a7b-4baa-8a6c-bc8da9264541
📒 Files selected for processing (7)
- apps/webapp/app/hooks/useMetricResourceQuery.ts
- apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
- apps/webapp/app/routes/admin.queue-metrics.tsx
- internal-packages/clickhouse/src/queueMetrics.ts
- internal-packages/tsql/src/query/functions.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- apps/webapp/app/routes/admin.queue-metrics.tsx
- internal-packages/clickhouse/src/queueMetrics.ts
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
📜 Review details
⏰ Context from checks skipped due to timeout. (1)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: Access environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env`
Use subpath exports from the `@trigger.dev/core` package instead of importing from the root `@trigger.dev/core` path
Always use `findFirst` instead of `findUnique` for Prisma queries.
Files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}: Use `pnpm run typecheck` for changes in apps (apps/*) and internal packages (internal-packages/*), and never use `build` to verify those changes.
Use Vitest for tests, and never mock anything; use testcontainers instead.
Prefer static imports over dynamic `import()`, and only use dynamic imports for unresolved circular dependencies, genuine code-splitting needs, or conditional runtime loading.
Files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs,md,mdx}
📄 CodeRabbit inference engine (CLAUDE.md)
Always import from `@trigger.dev/sdk` when writing Trigger.dev tasks; never use `@trigger.dev/sdk/v3` or deprecated `client.defineJob`.
Files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
apps/webapp/app/routes/**/*.tsx
📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)
Use Remix flat-file route naming with dot-separated segments in `app/routes/` (for example, `api.v1.tasks.$taskId.trigger.ts` maps to `/api/v1/tasks/:taskId/trigger`).
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
apps/webapp/**/*.tsx
📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)
apps/webapp/**/*.tsx: Only use `useCallback` and `useMemo` for context provider values, expensive derived data used as dependencies, or stable refs required by a dependency array; do not wrap ordinary event handlers or trivial computations.
Use named constants for sentinel or placeholder values instead of scattering raw string literals through comparisons.
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
🧠 Learnings (22)
📚 Learning: 2026-02-06T19:53:38.843Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2994
File: apps/webapp/app/presenters/v3/DeploymentListPresenter.server.ts:233-237
Timestamp: 2026-02-06T19:53:38.843Z
Learning: When constructing Vercel dashboard URLs from deployment IDs, always strip the dpl_ prefix from the ID. Implement this by transforming the ID with .replace(/^dpl_/, "") before concatenating into the URL: https://vercel.com/${teamSlug}/${projectName}/${cleanedDeploymentId}. Consider centralizing this logic in a small helper (e.g., getVercelDeploymentId(id) or a URL builder) and add tests to verify both prefixed and non-prefixed inputs.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.
Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.
Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-06-25T18:21:51.905Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4039
File: apps/webapp/app/routes/invite-revoke.tsx:0-0
Timestamp: 2026-06-25T18:21:51.905Z
Learning: During the Zod v4 migration in the triggerdotdev/trigger.dev webapp, ensure any imports from `conform-to/zod` use the Zod-4 subpath: `conform-to/zod/v4` (e.g., `import { parseWithZod } from "conform-to/zod/v4"`). Do not import from the package root `conform-to/zod`, because it is the Zod 3 implementation and may load Zod-3-only symbols (e.g., `ZodBranded`, `ZodEffects`), which can throw at module load (notably with `zod4.4.3`). This should be enforced across `apps/webapp/**/*` where helpers like `parseWithZod` and `conformZodMessage` are used.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-07-03T17:10:21.498Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 4148
File: apps/webapp/app/models/orgMember.server.ts:149-168
Timestamp: 2026-07-03T17:10:21.498Z
Learning: In triggerdotdev/trigger.dev, `User.email` (Prisma schema: `internal-packages/database/prisma/schema.prisma`) currently does NOT use `citext` and does NOT have a `lower(email)` functional unique index. Therefore, do not introduce Prisma queries like `where: { email: { equals: <value>, mode: "insensitive" } }` (or any case-insensitive lookup) against `User.email`, because it can force sequential scans of the `users` table under load. During review, ensure email is normalized (e.g., lowercased/trimmed) before both writes and subsequent lookups, and if true case-insensitive behavior/uniqueness is required, implement it via a separate app-wide migration (e.g., switch to `citext` and/or add a functional unique index with backfill) rather than bolting it onto individual feature PRs.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
internal-packages/tsql/src/query/functions.ts
apps/webapp/app/hooks/useMetricResourceQuery.ts
📚 Learning: 2026-06-21T05:35:23.468Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 4005
File: apps/webapp/app/presenters/v3/ApiErrorListPresenter.server.ts:29-30
Timestamp: 2026-06-21T05:35:23.468Z
Learning: For triggerdotdev/trigger.dev list endpoints (and their presenters/handlers that implement list pagination), it is an established shared convention to allow both cursor query params `page[after]` and `page[before]` to be provided at the same time. When both are present, `page[before]` must take precedence (i.e., it should be used/wins). During code review, do NOT flag missing per-endpoint mutual-exclusion validation between `page[after]` and `page[before]` as a problem; if stricter enforcement is ever desired, it should be implemented as a codebase-wide shared convention (not individually per endpoint).
Applied to files:
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts
📚 Learning: 2026-02-03T18:27:40.429Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2994
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.environment-variables/route.tsx:553-555
Timestamp: 2026-02-03T18:27:40.429Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.environment-variables/route.tsx, the menu buttons (e.g., Edit with PencilSquareIcon) in the TableCellMenu are intentionally icon-only with no text labels as a compact UI pattern. This is a deliberate design choice for this route; preserve the icon-only behavior for consistency in this file.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-02-11T16:37:32.429Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3019
File: apps/webapp/app/components/primitives/charts/Card.tsx:26-30
Timestamp: 2026-02-11T16:37:32.429Z
Learning: In projects using react-grid-layout, avoid relying on drag-handle class to imply draggability. Ensure drag-handle elements only affect dragging when the parent grid item is configured draggable in the layout; conditionally apply cursor styles based on the draggable prop. This improves correctness and accessibility.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-04-02T19:18:26.255Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3319
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions/route.tsx:179-189
Timestamp: 2026-04-02T19:18:26.255Z
Learning: In this repo’s route components that render the Inspector `ResizablePanelGroup` panels, it’s acceptable to pass `collapsed={!isShowingInspector}` together with a no-op `onCollapseChange={() => {}}` when panel visibility is intentionally controlled only by route parameters (e.g., `*Param` search/route params) rather than user drag/collapse interactions. Do not flag an empty/no-op `onCollapseChange` as “missing wiring” in these cases; only flag it when collapse state is expected to change based on user interaction.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-05-12T21:04:00.184Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.sessions._index/route.tsx:40-42
Timestamp: 2026-05-12T21:04:00.184Z
Learning: In triggerdotdev/trigger.dev route loader implementations (Remix `route.tsx` files under `apps/webapp/app/routes/**`), follow the existing convention for missing/unauthorized environment lookups: when `findEnvironmentBySlug` (or the equivalent env resolver) returns a falsy value, handle it by throwing `new Error("Environment not found")` rather than returning a `404` `Response` (i.e., do not flag this as “missing 404 response”). Changing the error-to-404 convention is a cross-cutting refactor and should be left out of individual PRs unless the PR explicitly addresses that broader migration.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-05-08T21:00:20.973Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3538
File: apps/webapp/app/components/primitives/Resizable.tsx:60-78
Timestamp: 2026-05-08T21:00:20.973Z
Learning: In the triggerdotdev/trigger.dev codebase, treat Zod as a boundary validation tool (API handlers, request/response validation, and storage/DB read/write validation), not as inline render-time validation inside React components/primitive UI code. For render-time guards, prefer small manual type-narrowing checks (e.g., a short predicate like ~10–20 lines) over importing Zod into UI primitives, to avoid per-render schema-parse overhead and unnecessary abstraction. Use the manual guard approach unless you truly need schema validation at a boundary; only then introduce Zod.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-06-25T18:21:55.847Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4039
File: apps/webapp/app/routes/invite-resend.tsx:0-0
Timestamp: 2026-06-25T18:21:55.847Z
Learning: In the triggerdotdev/trigger.dev Zod 4 migration, avoid importing from the root package `conform-to/zod` in webapp code. It can resolve to the Zod 3 build and may crash at module load under Zod 4. When reviewing TypeScript/TSX files in `apps/webapp`, prefer importing from the Zod 4 subpath `conform-to/zod/v4` for Zod 4-compatible schemas/types.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
📚 Learning: 2026-06-25T18:21:54.729Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4039
File: apps/webapp/app/routes/confirm-basic-details.tsx:0-0
Timestamp: 2026-06-25T18:21:54.729Z
Learning: For Remix + TypeScript files that use Conform v1 (conform-to/react) and its getInputProps helper, when you intend to suppress the helper-provided default value for non-checkbox/non-radio inputs (e.g., hidden inputs managed via an explicit value prop), use the Conform v1 option key `value: false`. Do not recommend `defaultValue: false` here, because `defaultValue` is not a valid option key for these input types in Conform v1 typings.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx
🔇 Additional comments (12)
apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts (3)
1-27: LGTM!
129-135: LGTM!
107-107: 🗄️ Data Integrity & Integration: No coercion needed for `depth`/`peak_queued`.
`internal-packages/clickhouse/src/queueMetrics.ts` already parses both `depth` and `peak_queued` with `z.coerce.number()`, so the presenter receives numbers here.
> Likely an incorrect or invalid review comment.
internal-packages/tsql/src/query/functions.ts (1)
653-658: LGTM! Confirmed the `-Merge` combinator takes a single state-column argument, matching `minArgs: 1, maxArgs: 1`, and this entry is actually consumed by the new `QueueStats` query (`deltaSumTimestampMerge(started_delta)`).
apps/webapp/app/hooks/useMetricResourceQuery.ts (2)
1-45: LGTM!
48-109: LGTM!
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues_.$queueParam/route.tsx (6)
1-74: LGTM!
76-90: LGTM!
92-176: LGTM! Also applies to: 178-189
201-276: LGTM!
278-324: LGTM! Query correctly consumes the newly registered `deltaSumTimestampMerge` aggregate, and the label matches the actual period-wide p95 semantics per the prior review discussion.
326-354: LGTM!
refreshIntervalMs = 60_000,
} = opts;
const { period, from, to } = opts.timeRange;
const queuesKey = opts.queues && opts.queues.length > 0 ? opts.queues.join(",") : undefined;
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Comma-joined queue names round-tripped through split(",") can misparse names containing a literal comma.
queuesKey is joined with "," purely to give useCallback a stable primitive dependency, but it's then split back into the array sent in the request body. A queue name containing a comma would be incorrectly split into extra entries, corrupting the queues scope sent to /resources/metric.
🐛 Proposed fix
- ...(queuesKey !== undefined ? { queues: queuesKey.split(",") } : {}),
+ ...(opts.queues && opts.queues.length > 0 ? { queues: opts.queues } : {}),
Also applies to: 63-67
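The round-trip problem described above can be reproduced in isolation. The following is a minimal sketch, not the hook's actual code: the queue names are made up for illustration, and the `JSON.stringify` key is only one possible alternative if a stable primitive dependency is still wanted for `useCallback`.

```typescript
// Hypothetical queue names; the second one contains a literal comma.
const queues: string[] = ["emails", "reports,daily"];

// Joining and splitting on "," does not round-trip:
// the comma inside the name becomes an extra separator.
const viaComma: string[] = queues.join(",").split(",");

// A JSON-encoded key is still a stable primitive, and it decodes
// back to the original array without ambiguity.
const jsonKey: string = JSON.stringify(queues);
const viaJson: string[] = JSON.parse(jsonKey) as string[];

console.log(viaComma.length, viaJson.length);
```

The proposed fix avoids the decode step entirely by sending `opts.queues` as-is; the JSON key would only matter for the dependency array.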
const rangeSeconds = Math.max(60, Math.round((to.getTime() - from.getTime()) / 1000));
const bucketSeconds = Math.max(60, Math.round(rangeSeconds / SPARKLINE_POINTS));
const numBuckets = Math.max(1, Math.ceil(rangeSeconds / bucketSeconds));
const gridStartSeconds =
  Math.floor(Math.floor(from.getTime() / 1000) / bucketSeconds) * bucketSeconds;
const bucketStartMs = gridStartSeconds * 1000;
const bucketIntervalMs = bucketSeconds * 1000;
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE='apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts'
echo '--- outline ---'
ast-grep outline "$FILE" --view expanded || true
echo '--- relevant lines ---'
nl -ba "$FILE" | sed -n '1,220p'
Repository: triggerdotdev/trigger.dev
Length of output: 604
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
import math
SPARKLINE_POINTS = 30
def current_num_buckets(from_s, to_s):
    range_seconds = max(60, round((to_s - from_s)))
    bucket_seconds = max(60, round(range_seconds / SPARKLINE_POINTS))
    num_buckets = max(1, math.ceil(range_seconds / bucket_seconds))
    grid_start = math.floor(math.floor(from_s) / bucket_seconds) * bucket_seconds
    return range_seconds, bucket_seconds, num_buckets, grid_start
def needed_num_buckets(from_s, to_s, bucket_seconds):
    grid_start = math.floor(math.floor(from_s) / bucket_seconds) * bucket_seconds
    return max(1, math.ceil((math.floor(to_s) - grid_start) / bucket_seconds))
cases = [
    (61, 121),  # aligned-ish
    (62, 122),
    (100, 160),
    (100, 200),
    (59, 119),
    (0, 60),
    (1, 61),
    (3599, 3659),
    (3599, 7199),
    (1000, 4000),
    (1000.2, 4000.7),
]
for f, t in cases:
    range_seconds, bucket_seconds, num_buckets, grid_start = current_num_buckets(f, t)
    need = needed_num_buckets(f, t, bucket_seconds)
    print(f"from={f}, to={t}, range={range_seconds}, bucket={bucket_seconds}, grid_start={grid_start}, current={num_buckets}, needed={need}, diff={need-num_buckets}")
PY
Repository: triggerdotdev/trigger.dev
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE='apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts'
echo '--- file size ---'
wc -l "$FILE"
echo '--- lines 40-130 ---'
nl -ba "$FILE" | sed -n '40,130p'
echo '--- search for numBuckets / index checks ---'
rg -n "numBuckets|gridStartSeconds|bucketStartMs|bucketIntervalMs|index >= numBuckets|index < 0" "$FILE"
Repository: triggerdotdev/trigger.dev
Length of output: 309
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE='apps/webapp/app/presenters/v3/QueueMetricsPresenter.server.ts'
echo '--- lines 30-130 ---'
sed -n '30,130p' "$FILE"
echo '--- lines 130-136 ---'
sed -n '130,136p' "$FILE"
echo '--- exact bucket math with SPARKLINE_POINTS=48 ---'
python3 - <<'PY'
import math
SPARKLINE_POINTS = 48
def current(from_s, to_s):
    range_seconds = max(60, round((to_s - from_s)))
    bucket_seconds = max(60, round(range_seconds / SPARKLINE_POINTS))
    num_buckets = max(1, math.ceil(range_seconds / bucket_seconds))
    grid_start = math.floor(math.floor(from_s) / bucket_seconds) * bucket_seconds
    needed = max(1, math.ceil((math.floor(to_s) - grid_start) / bucket_seconds))
    return range_seconds, bucket_seconds, grid_start, num_buckets, needed
for f, t in [(61,121), (62,122), (100,160), (100,200), (59,119), (0,60), (1,61), (3599,3659), (3599,7199), (1000,4000), (1000.2,4000.7)]:
    rs, bs, gs, cur, need = current(f, t)
    print(f"from={f}, to={t}, range={rs}, bucket={bs}, grid_start={gs}, current={cur}, needed={need}, diff={need-cur}")
PY
Repository: triggerdotdev/trigger.dev
Length of output: 5055
Size `numBuckets` from `gridStartSeconds` to `to`. `gridStartSeconds` can back up into the previous bucket, so `ceil(rangeSeconds / bucketSeconds)` can leave the grid one slot short and drop the last sparkline point at `index >= numBuckets`. Derive the bucket count from the aligned start instead.
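One way to derive the count from the aligned start is sketched below. This is a hedged illustration rather than the presenter's actual code: `buildBucketGrid` and `BucketGrid` are hypothetical names, and `SPARKLINE_POINTS = 48` is taken from the analysis script above, not confirmed against the file.

```typescript
// Assumed value, copied from the analysis script; the real constant
// lives in QueueMetricsPresenter.server.ts.
const SPARKLINE_POINTS = 48;

// Hypothetical return shape for illustration.
type BucketGrid = {
  bucketSeconds: number;
  gridStartSeconds: number;
  numBuckets: number;
};

function buildBucketGrid(from: Date, to: Date, points: number = SPARKLINE_POINTS): BucketGrid {
  const rangeSeconds = Math.max(60, Math.round((to.getTime() - from.getTime()) / 1000));
  const bucketSeconds = Math.max(60, Math.round(rangeSeconds / points));
  // Align the grid start down to a bucket boundary (may precede `from`).
  const gridStartSeconds =
    Math.floor(Math.floor(from.getTime() / 1000) / bucketSeconds) * bucketSeconds;
  // Count buckets from the aligned start, not from `from`, so the
  // bucket that contains `to` is always part of the grid.
  const toSeconds = Math.floor(to.getTime() / 1000);
  const numBuckets = Math.max(1, Math.ceil((toSeconds - gridStartSeconds) / bucketSeconds));
  return { bucketSeconds, gridStartSeconds, numBuckets };
}

// from = 100s, to = 160s: the grid starts at 60s, so two 60s buckets are needed.
const grid = buildBucketGrid(new Date(100_000), new Date(160_000));
console.log(grid);
```

For this input the original `ceil(rangeSeconds / bucketSeconds)` yields 1, so a point that falls at index 1 would be dropped; counting from the aligned start yields 2.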
Summary
Adds per-queue observability to the Queues page: depth (backlog), throughput (enqueued, started, completed), concurrency, whether a queue is throttled, and the scheduling delay (how long runs wait between becoming eligible and actually starting). Each queue shows health at a glance in the list, plus a per-queue detail page with charts, so you can answer "does this queue have enough concurrency to keep up?".
Both the data collection and the dashboard are off by default and gated independently: metric emission is a global switch, and the dashboard is turned on per organization. With both off, the Queues page is unchanged.
Design
Queue operations emit two kinds of signal. Gauges (depth, running, limit, throttled) are read inside the same Redis script that performs the enqueue or dequeue, so the reading is atomic, and returned on the script's reply for the app to forward. Counters (enqueued, started, completed) are cumulative odometers, so a dropped reading self-heals: the next one restates the running total. Both land on one Redis stream on a dedicated metrics instance (falling back to the run queue's Redis when self-hosting), drain through a consumer into ClickHouse (raw, a 10-second-bucket materialized view, and a 30-day aggregate), and the dashboards read the aggregate. The run queue's own Redis carries no metrics stream.
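The self-healing property of odometer counters can be shown with a small sketch. It is illustrative only: `OdometerReading` and `odometerDeltas` are hypothetical names, the real aggregation happens in ClickHouse rather than in app code, and treating a decrease as a counter reset is an assumption made here.

```typescript
// Hypothetical shape: one cumulative reading of a counter at a point in time.
type OdometerReading = { atMs: number; total: number };

// Convert cumulative totals into per-interval deltas.
function odometerDeltas(readings: OdometerReading[]): number[] {
  const deltas: number[] = [];
  let prev: OdometerReading | undefined;
  for (const reading of readings) {
    if (prev !== undefined) {
      const diff = reading.total - prev.total;
      // Assumption: a decrease means the counter was reset, so count from zero.
      deltas.push(diff >= 0 ? diff : reading.total);
    }
    prev = reading;
  }
  return deltas;
}

const sum = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);

const complete: OdometerReading[] = [
  { atMs: 0, total: 10 },
  { atMs: 10_000, total: 15 },
  { atMs: 20_000, total: 22 },
  { atMs: 30_000, total: 30 },
];
// Same series with the second reading dropped in transit.
const withDrop: OdometerReading[] = complete.filter((r) => r.atMs !== 10_000);

const completeTotal = sum(odometerDeltas(complete));
const dropTotal = sum(odometerDeltas(withDrop));
console.log(completeTotal, dropTotal);
```

Because each reading restates the running total, the dropped reading only coarsens the time resolution; the total increase over the window is unchanged.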
The one change that is live the moment this deploys, independent of both flags, is the enqueue/dequeue script reply shape: those scripts now return a 2-tuple so the gauge reading can ride back to the app. That path is exercised on every queue op, so it is the part of `run-engine` worth the closest review.
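A defensive reader for a 2-tuple reply might look like the sketch below. Everything about the wire shape is an assumption: the PR does not show the reply layout here, so the `[result, [depth, running, limit, throttled]]` ordering, the string encoding, and the names `parseScriptReply` and `parseGauge` are hypothetical. The NaN handling follows the commit note that empty gauge values parse as NaN rather than 0.

```typescript
// Hypothetical gauge shape; field names follow the gauges listed in the Design section.
type GaugeReading = { depth: number; running: number; limit: number; throttled: number };

// Number("") is 0 in JavaScript, so empty or missing values are mapped to NaN explicitly.
function parseGauge(raw: string | null | undefined): number {
  if (raw === undefined || raw === null || raw === "") return Number.NaN;
  return Number(raw);
}

// Assumed reply layout: [result, [depth, running, limit, throttled]] with string values.
function parseScriptReply(reply: unknown): { result: unknown; gauges: GaugeReading | null } {
  if (!Array.isArray(reply) || reply.length !== 2) {
    throw new Error("expected a 2-tuple script reply");
  }
  const result: unknown = reply[0];
  const rawGauges: unknown = reply[1];
  // No gauge payload: keep the operation result and report no reading.
  if (!Array.isArray(rawGauges)) return { result, gauges: null };
  const values = (rawGauges as unknown[]).map((v) => parseGauge(typeof v === "string" ? v : null));
  const at = (i: number): number => values[i] ?? Number.NaN;
  return {
    result,
    gauges: { depth: at(0), running: at(1), limit: at(2), throttled: at(3) },
  };
}

// The empty string for `throttled` becomes NaN instead of a misleading 0.
const parsed = parseScriptReply(["msg-1", ["3", "1", "10", ""]]);
console.log(parsed);
```

Rejecting anything that is not a 2-tuple makes a mismatch between an old caller and a new script (or the reverse) fail loudly during rollout instead of silently misreading the result.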