Skip to content

feat(gateway): Google Chat attachment support (image / file / audio + STT)#762

Merged
thepagent merged 8 commits into
openabdev:mainfrom
canyugs:feat/gateway-googlechat-attachments
May 15, 2026
Merged

feat(gateway): Google Chat attachment support (image / file / audio + STT)#762
thepagent merged 8 commits into
openabdev:mainfrom
canyugs:feat/gateway-googlechat-attachments

Conversation

@canyugs
Copy link
Copy Markdown
Contributor

@canyugs canyugs commented May 6, 2026

Summary

Adds inbound attachment support to the Google Chat gateway adapter (image, text file, audio) using the PR #731 base64 pattern, plus completes the audio → STT path in OAB core for all Custom Gateway adapters (Feishu, Google Chat, …).

User attaches file in Google Chat
  │
  ▼
Webhook → Gateway parses message.attachment[]
  │  returns 200 immediately (background task)
  ▼
Gateway tokio::spawn:
  - SA-token GET /v1/media/{resourceName}?alt=media (30s timeout)
  - image: resize 1200px JPEG75 + base64
  - text: 5×512KB whitelist + base64
  - audio: forward as-is + base64
  ▼
GatewayEvent.content.attachments → core
  │
  ▼  src/gateway.rs decodes per attachment_type:
  - "image"     → ContentBlock::Image  (already in main)
  - "text_file" → ContentBlock::Text   (already in main)
  - "audio"     → stt::transcribe → ContentBlock::Text  (NEW)
  │
  ▼
Agent sees image / text / voice-transcript and replies

Changes

File Change Description
gateway/src/adapters/googlechat.rs MOD GoogleChatMessage.attachment[] parsing + GoogleChatMediaRef enum (Image/File/Audio). download_googlechat_image / _file / _audio via Media API + SA token. parse_attachments branches on source (skips DRIVE_FILE) + contentType. Webhook handler spawns background task for downloads so it returns 200 within Google Chat's 30 s deadline. Per-request 30 s timeout. Text file caps: 5 count / 1 MB total (matches Discord/Slack).
src/gateway.rs MOD Add "audio" branch in attachment → ContentBlock conversion. Decodes base64 → stt::transcribe (gated on stt_config.enabled) → wraps transcript as [Voice message transcript]: …. Falls back to [Voice message — transcription failed for X] so STT failures aren't silent. Threads stt_config through GatewayParams.
src/main.rs MOD Pass cfg.stt.clone() to GatewayParams.stt_config.
docs/google-chat.md MOD Document inbound attachment behavior, size limits, and Drive limitations.

Key Design Decisions

  1. PR feat(gateway): feishu image and text file attachment support #731 base64 pattern — Google Chat Media API requires SA-token auth that the agent doesn't have; gateway downloads, compresses, and base64-encodes before sending over the existing WebSocket attachment channel. No new HTTP server needed (vs. the closed PR feat(gateway): support images and audio for LINE/Telegram #726 MediaStore proxy).

  2. Webhook returns 200 immediately — Multi-attachment downloads can exceed Google Chat's 30 s webhook deadline. Webhook handler does sync parsing then tokio::spawn for downloads + event emit, so Google Chat won't retry.

  3. Per-request 30 s timeout — A hung Media API connection can no longer block the spawned task indefinitely.

  4. Text file caps match Discord/SlackTEXT_FILE_COUNT_CAP = 5 + TEXT_TOTAL_CAP = 1 MB. Text files concatenate into the agent prompt and need an aggregate cap; image and audio are independent and have only per-file size caps. (Feishu currently caps nothing — that's an existing gap in the Feishu adapter, not a model to copy.)

  5. STT in core, not gatewaystt::transcribe already exists for Discord/Slack via download_and_transcribe. Custom Gateway audio just needed a "audio" branch in src/gateway.rs to bridge the same pipeline. No new STT infrastructure; only reuse.

  6. STT failure fallback — Returns [Voice message — transcription failed for X] as a ContentBlock::Text instead of silently dropping audio, so the agent can prompt the user to re-send.

  7. Drive-linked files skippedDRIVE_FILE source needs separate Drive API integration; left for a follow-up. UPLOADED_CONTENT (the common path) works.

Testing

E2E tested on k3s VPS with Google Chat workspace:

Scenario Result
Single image (red/blue/green PNG) PASS
Image + text → agent identifies color PASS — "Blue."
Markdown text file (.md) → agent summarizes PASS — correct one-sentence summary
Empty text + image only PASS — event still forwarded
Multiple images (2 PNG) → "Blue, green" PASS — attachment_count=2
.bin (non-whitelist extension) PASS — skipped (attachment_count=0)
48 MB PNG > 10 MB limit PASS — Content-Length early reject
Audio (.m4a) + STT via Whisper PASS — full transcript routed to agent
Smoke test after webhook bg refactor PASS — single image still replies "Blue."
cargo test — gateway 65 / core 238 PASS

Not Yet Supported

  • Drive-linked attachmentsDRIVE_FILE source skipped; needs Drive API + token.
  • Outbound media (bot sends image / file to user) — needs GatewayReply schema extension.
  • Streaming download — 25 MB audio is buffered in full; high-concurrency OOM is a known follow-up.

Breaking Changes

None. GatewayParams adds an stt_config field but cfg.stt already has Default, and the "audio" arm is gated on stt_config.enabled (default false).

Discord Discussion URL

https://discord.com/channels/1491295327620169908/1501153334042824764

🤖 Generated with Claude Code

@canyugs canyugs requested a review from thepagent as a code owner May 6, 2026 17:58
Copilot AI review requested due to automatic review settings May 6, 2026 17:58
@github-actions github-actions Bot added the pending-screening PR awaiting automated screening label May 6, 2026
@shaun-agent
Copy link
Copy Markdown
Contributor

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report ## Intent

PR #762 adds inbound attachment handling to the Google Chat gateway so users can send images, text files, and audio messages to OpenAB agents from Google Chat. It also fills a core gateway gap: custom gateway adapters can now convert inbound audio attachments into STT transcripts before passing them to the agent.

The operator-visible problem is that Google Chat users currently lose attachment context, and audio attachments from custom gateways do not reach the existing transcription path. This makes Google Chat less capable than Discord/Slack for multimodal or voice workflows.

Feat

This is primarily a feature PR with a small core integration fix.

Behavioral change:

  • Google Chat webhook events parse message.attachment[].
  • Uploaded Google Chat media is downloaded by the gateway with service-account auth.
  • Images are resized/compressed and forwarded as base64.
  • Whitelisted text files are capped, base64 encoded, and forwarded.
  • Audio files are forwarded as base64 and transcribed in core when STT is enabled.
  • Google Chat webhook handling returns 200 quickly, with media download and event emission moved into a background task.
  • Drive-linked files are explicitly skipped.
  • Docs describe supported attachment behavior and limits.

Who It Serves

Primary beneficiaries:

  • Google Chat end users who want to send images, text files, and voice messages to agents.
  • Deployers running OpenAB in Google Chat workspaces.
  • Agent runtime operators who need consistent custom gateway attachment behavior.
  • Maintainers trying to reduce adapter-specific gaps between Google Chat, Discord, Slack, and Feishu.

Rewritten Prompt

Implement inbound attachment support for the Google Chat gateway adapter.

Parse Google Chat message.attachment[] for uploaded media, skip unsupported Drive-linked attachments, and download supported media using service-account authenticated Google Chat Media API requests. Return the webhook response promptly and perform media download plus gateway event emission asynchronously so Google Chat does not retry due to webhook timeout.

Support:

  • Images: download, enforce size limits, resize to max 1200px, JPEG encode, base64 attach as image.
  • Text files: allow only safe text extensions/content types, enforce count and aggregate size caps, base64 attach as text_file.
  • Audio: download within limits, base64 attach as audio.

In core gateway attachment conversion, add support for audio attachments by decoding base64 and invoking the existing STT transcription path when STT is enabled. Convert successful transcripts to text content blocks and emit an explicit fallback text block when transcription fails.

Add focused tests for attachment parsing, caps, unsupported sources/types, core audio conversion, STT disabled behavior, and webhook fast-return behavior where practical. Update Google Chat docs with supported types, limits, and unsupported Drive files.

Merge Pitch

This is worth advancing because it closes a major Google Chat parity gap and reuses the existing gateway attachment channel instead of introducing a new media-serving subsystem. The PR also improves the shared custom gateway path by making audio attachments useful beyond Google Chat.

Risk profile is moderate. The high-risk areas are the large googlechat.rs change, background task lifecycle/error visibility, memory use from buffering media, and correctness of Google Chat media auth/download behavior. Reviewers will likely focus on whether the gateway-owned background work is observable, bounded, and testable enough before merge.

Best-Practice Comparison

Relevant OpenClaw principles:

  • Gateway-owned scheduling: Relevant. The gateway owns the delayed media download and event emission after webhook acknowledgement.
  • Durable job persistence: Partially relevant. This PR uses tokio::spawn, so work can be lost if the process exits after returning 200.
  • Isolated executions: Somewhat relevant. Media handling is separated into background work, but not isolated as durable jobs.
  • Explicit delivery routing: Relevant. Attachments are converted into typed gateway event content before core handling.
  • Retry/backoff and run logs: Relevant gap. The PR mentions timeouts and fallback behavior, but not durable retries or structured run logs for failed background media processing.

Relevant Hermes Agent principles:

  • Gateway daemon tick model: Mostly not relevant unless OpenAB wants scheduled or recoverable background gateway work.
  • File locking to prevent overlap: Not directly relevant for per-webhook media downloads.
  • Atomic writes for persisted state: Not relevant unless the background task is made durable.
  • Fresh session per scheduled run: Not relevant.
  • Self-contained prompts for scheduled tasks: Not relevant.

The main best-practice tension is that returning 200 before media processing is correct for Google Chat webhook behavior, but without durable persistence it trades retry avoidance for possible silent event loss during crashes or deploys.

Implementation Options

Option 1: Conservative merge with review hardening
Keep the current base64 gateway approach and background task model. Require targeted tests, stronger logging, clearer error paths, and validation of limits/auth behavior before merge. Defer durable media jobs and streaming downloads.

Option 2: Balanced gateway media job abstraction
Keep base64 attachments, but introduce a small internal media-processing job abstraction inside the gateway. Add structured run logs, bounded concurrency, explicit timeout/error reporting, and a clean test surface. Still avoid persistent storage for now.

Option 3: Durable media processing queue
After webhook parse, persist a media-processing job before returning 200. A gateway worker processes jobs with retry/backoff, logs outcomes, and emits events only after successful attachment processing. This aligns more closely with OpenClaw/Hermes durability patterns.

Option 4: MediaStore/proxy architecture
Revisit a media store or proxy design where the gateway stores fetched media and sends references instead of base64 payloads. Core or agents retrieve media through controlled URLs or internal APIs. This is broader and likely requires schema/API changes.

Comparison Table

Option Speed to ship Complexity Reliability Maintainability User impact Fit for OpenAB right now
1. Conservative merge with hardening High Low-Medium Medium Medium High Strong
2. Gateway media job abstraction Medium Medium Medium-High High High Strong
3. Durable media processing queue Low High High Medium-High High Medium
4. MediaStore/proxy architecture Low High High Medium High Weak-Medium

Recommendation

Advance this PR toward merge using Option 1, with a reviewer checklist focused on bounded background execution, observability, test coverage, and memory limits.

This is the right next step because the feature solves a real Google Chat usability gap without requiring a new media architecture. The PR already follows the existing base64 attachment pattern and keeps STT in core, which is the cleaner boundary.

Recommended follow-up split:

  • Merge Google Chat uploaded attachment support and core audio-to-STT bridging first.
  • Open separate issues for durable gateway media jobs, streaming downloads, Drive-linked files, and outbound media replies.
  • Consider Option 2 after merge if more adapters start needing the same background media-processing behavior.

@JARVIS-coding-Agent
Copy link
Copy Markdown
Contributor

Code Review Summary

Overall: Architecture is sound, test coverage is solid. A few items need addressing before merge.


Merge Requirements

1. Clarify media_url() path encoding (potential blocker)

media_url() percent-encodes the entire resourceName including /%2F. Google Chat Media API uses GET /v1/media/{resourceName=**} (reserved expansion), which expects slashes to be preserved.

If attachmentDataRef.resourceName is a path-style value like spaces/X/messages/Y/attachments/Z, this will 404. If it's an opaque token (e.g. AATT_xxx), current implementation is fine.

→ Please confirm the actual format of resourceName from webhook payloads.

2. Remove "env" from TEXT_EXTS whitelist

Users accidentally uploading .env files (containing API keys, DB passwords) would have their secrets exposed in the agent prompt context. Risk outweighs benefit.

3. Add fallback for base64 decode failure (src/gateway.rs:676)

STT failure has a proper fallback message, but base64 decode failure silently drops the audio with no log and no user-facing indication. Should add warn! + a fallback text block (e.g. [Voice message — decode failed for {filename}]).

4. Add fallback when STT is disabled (src/gateway.rs:676)

When stt_config.enabled = false, audio attachments fall through to _ => {} — silently discarded. This contradicts the PR's stated design of "STT failures aren't silent." Suggest adding [Voice message received — STT not enabled] or similar.

5. Concurrency cap for background download tasks (follow-up issue required)

Each attachment webhook spawns an unbounded tokio::spawn with no concurrency semaphore. A high-traffic room or attacker can stack large downloads (10MB images + 25MB audio) and exhaust memory. The 30s timeout only prevents hung connections, not volume. Please open a follow-up issue to add an adapter-level semaphore limiting concurrent media downloads.


Minimum merge criteria

  • Author confirms resourceName format → determines if perf: cache dependency build layer in Dockerfile #1 is a bug
  • Remove "env" from TEXT_EXTS
  • base64 decode failure: add warn! + fallback block
  • STT disabled: add explicit fallback text for audio
  • Open follow-up issue for concurrency cap

Reviewed by: JARVIS, FRIDAY, ULTRON (multi-agent group review)

@masami-agent
Copy link
Copy Markdown
Contributor

PR Review: #762

Reviewed at commit: 3b95c0a

Summary

  • Problem: Google Chat adapter lacks inbound attachment support (image, text file, audio)
  • Approach: Gateway downloads via Media API → compress/base64 → existing WebSocket attachment channel; STT reuses core stt::transcribe pipeline
  • Risk level: Medium

Core Assessment

  1. Problem clearly stated: ✅ — detailed architecture diagram, design decisions, and test matrix in PR description
  2. Approach appropriate: ✅ — follows PR feat(gateway): feishu image and text file attachment support #731 base64 pattern, no new HTTP server needed
  3. Alternatives considered: ✅ — explicitly compared against closed PR feat(gateway): support images and audio for LINE/Telegram #726 MediaStore proxy approach
  4. Best approach for now: ✅ — minimal change, reuses existing STT infrastructure

Findings

🟡 NIT

N1: media_url percent-encodes / — may not match Google API {+name} URI template semantics

  • Where: gateway/src/adapters/googlechat.rs, media_url() function
  • What: Google Media API uses /v1/media/{+name}?alt=media where {+name} is a reserved expansion (RFC 6570) that preserves / unencoded. The current implementation encodes all non-unreserved characters including /.
  • Why it matters: If attachmentDataRef.resourceName ever contains / (as the {+name} template suggests it might), the encoded %2F could cause a 404. E2E tests pass, indicating the current resourceName values are likely opaque tokens without /, or Google tolerates %2F.
  • Suggestion: Either preserve / in the encoding match arm, or add a comment explaining why full encoding is intentional and safe given the observed resourceName format.

N2: Audio attachment silently dropped when stt_config.enabled == false

  • Where: src/gateway.rs, "audio" if stt_config.enabled => match arm
  • What: When the guard fails, the audio attachment falls through to _ => {} with no log.
  • Why it matters: Operators may not realize audio attachments are being discarded.
  • Suggestion: Add a tracing::debug! in the _ => {} arm (or a separate "audio" arm) when an audio attachment is skipped due to STT being disabled.

N3: gateway/Cargo.lock version bump 0.1.00.4.0 without corresponding Cargo.toml change in diff

  • Where: gateway/Cargo.lock
  • What: The lock file shows openab-gateway version changed to 0.4.0, but no gateway/Cargo.toml change appears in this diff.
  • Suggestion: Confirm gateway/Cargo.toml already has version = "0.4.0" on the PR branch (likely from a prior commit or base branch).

N4: Minor allocation in no-attachment path

  • Where: gateway/src/adapters/googlechat.rs, webhook handler
  • What: text is converted to String via .to_string() before checking whether attachments exist. In the no-attachment fast path, this allocation is unnecessary.
  • Suggestion: Move the .to_string() to just before the tokio::spawn block. Minor perf nit, not blocking.

🟢 INFO

  • Excellent test coverage — unit tests for parse_attachments, wiremock integration tests for all three download functions, size rejection test, and extension whitelist test.
  • STT failure fallback is well-designed — returns [Voice message — transcription failed for X] instead of silently dropping, allowing the agent to prompt the user.
  • File aggregate cap logic (TEXT_FILE_COUNT_CAP + TEXT_TOTAL_CAP) is correctly implemented as match arm expressions.
  • stt::transcribe call parameters exactly match the existing function signature.
  • #[allow(clippy::too_many_arguments)] on send_googlechat_event is acceptable; struct refactor can be a follow-up.

Verdict: APPROVE

High-quality PR with thorough documentation, comprehensive E2E testing, and clean reuse of existing infrastructure. All findings are minor nits — no blockers.

@obrutjack
Copy link
Copy Markdown
Collaborator

PR Review (Stage 2 — Second Review): #762

Summary

  • Problem: Google Chat adapter lacks inbound attachment support (image, text file, audio)
  • Approach: Gateway downloads via Media API → compress/base64 → existing WebSocket attachment channel; STT reuses core stt::transcribe pipeline
  • Risk level: Medium

Prior Review Verification (JARVIS + Masami)

Finding Correct? My Assessment
JARVIS #1 / Masami N1: media_url() encodes / as %2F See 🟡 below — needs clarification
JARVIS #2: "env" in TEXT_EXTS Upgraded to 🔴 — security risk
JARVIS #3: base64 decode failure silent drop 🔴 — contradicts stated design goal
JARVIS #4: STT disabled silent discard 🟡 — should log
JARVIS #5: concurrency cap Follow-up issue is fine
Masami N3: Cargo.lock 0.1.0→0.4.0 Needs confirmation
Masami N4: allocation nit Non-blocking

Findings

🔴 Critical: "env" in TEXT_EXTS whitelist — credential exposure risk

  • Where: gateway/src/adapters/googlechat.rs, TEXT_EXTS constant
  • What: .env files typically contain API keys, database passwords, and tokens. Allowing them through the whitelist means a user who accidentally uploads a .env file will have its entire content base64-encoded and injected into the agent prompt context.
  • Why it matters: Agent prompt context may be logged, stored, or visible to other users in shared workspaces. This is a data leak vector.
  • Fix: Remove "env" from TEXT_EXTS.

🔴 Critical: base64 decode failure in audio path — silent data loss

  • Where: src/gateway.rs:676, inside "audio" if stt_config.enabled => arm
  • What: When base64::decode fails, the if let Ok(bytes) simply does not match, and the entire audio attachment is silently dropped — no log, no fallback text block.
  • Why it matters: The PR description explicitly states "STT failures aren't silent" as a design goal. A base64 decode failure is functionally equivalent to an STT failure from the user's perspective — their voice message vanishes without trace.
  • Fix:
"audio" if stt_config.enabled => {
    use base64::Engine;
    match base64::engine::general_purpose::STANDARD.decode(&att.data) {
        Ok(bytes) => {
            // existing STT logic...
        }
        Err(e) => {
            tracing::warn!(filename = %att.filename, error = %e, "gateway audio base64 decode failed");
            extra_blocks.push(ContentBlock::Text {
                text: format!("[Voice message — decode failed for {}]", att.filename),
            });
        }
    }
}

🟡 Minor: media_url() encoding — needs clarification

  • Where: gateway/src/adapters/googlechat.rs, media_url() function
  • What: The function percent-encodes ALL non-unreserved characters including /. Google Chat Media API uses GET /v1/media/{+name} (RFC 6570 reserved expansion = slashes should be preserved).
  • Analysis: E2E tests pass, suggesting resourceName values from Google Chat webhooks are likely opaque tokens without /, or Google tolerates %2F. The unit test uses a synthetic value with /.
  • Ask: Please confirm the actual resourceName format from a real webhook payload. If it's always an opaque token (no slashes), add a comment explaining this assumption.

🟡 Minor: STT disabled — audio silently discarded

  • Where: src/gateway.rs, _ => {} catch-all after "audio" if stt_config.enabled =>
  • What: When stt_config.enabled == false, audio attachments fall through to _ => {} with zero indication.
  • Fix: Add tracing::debug! or a fallback text block so operators know audio was received but STT is disabled.

🟡 Minor: Text file aggregate cap — readability

  • Where: gateway/src/adapters/googlechat.rs, File match arm in background task
  • What: The nested if let Some(ref att) = result { if size > cap { None } else { result } } pattern works correctly but is easy to misread.
  • Suggestion: Consider restructuring for clarity (not blocking).

🟢 Info (Positive Observations)

  • Excellent test coverage — wiremock integration tests for all three download paths, size rejection, extension whitelist, and attachment parsing.
  • send_googlechat_event extraction is a clean refactor.
  • Background task pattern (return 200 immediately, spawn downloads) is correct for Google Chat's 30s webhook deadline.
  • STT failure fallback design is solid.
  • Cargo.lock version bump (0.1.0 → 0.4.0) — please confirm gateway/Cargo.toml already has this version on the PR branch.

Verdict: REQUEST CHANGES

Two fixes required before approve:

  1. Remove "env" from TEXT_EXTS (security)
  2. Add base64 decode failure fallback in audio path (matches your own stated design goal)

Everything else is minor / non-blocking. Overall this is a high-quality PR with solid architecture and thorough testing. 👏

@canyugs Please address the two 🔴 items above. The rest are suggestions you can take or leave. Thanks!

@github-actions github-actions Bot added closing-soon PR missing Discord Discussion URL — will auto-close in 3 days needs-rebase labels May 13, 2026
canyugs and others added 3 commits May 14, 2026 11:00
Implements image / text file / audio download from Google Chat via
Media API + service account token, following the PR openabdev#731 base64 pattern.

Changes:
- GoogleChatMessage: parse attachment[] array (Attachment / AttachmentDataRef / DriveDataRef structs)
- GoogleChatMediaRef enum: Image / File / Audio variants for typed dispatch
- parse_attachments(): branches on contentType prefix, skips DRIVE_FILE source
- download_googlechat_image(): resize → 1200px JPEG q75, max 10MB, GIF preserved
- download_googlechat_file(): text extension whitelist (.txt/.md/.py/...), max 512KB
- download_googlechat_audio(): forwarded as-is for core STT pipeline, max 25MB
- media_url(): percent-encode resource_name as path segment
- webhook handler: parses attachments, async-downloads via adapter token, populates Content.attachments
- Empty-text events with attachments are now forwarded (previously dropped)
- Tests: 11 new (parse, download success/skip/oversized, URL encoding)

Refs: openabdev#731 (Feishu pattern)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extends src/gateway.rs attachment handling to transcribe audio attachments
via the existing STT pipeline (previously only Discord/Slack adapters
went through download_and_transcribe; Custom Gateway adapters got no
audio path even though stt::transcribe was available).

When a gateway adapter (Feishu, Google Chat, etc.) sends an Attachment
with attachment_type = "audio", core now:
1. Decodes base64 → audio bytes
2. Calls stt::transcribe with the configured SttConfig
3. Wraps the transcript as a ContentBlock::Text:
   "[Voice message transcript]: ..."

The audio branch is gated on stt_config.enabled — if STT is disabled in
config, audio attachments fall through unchanged (same as before).

Threads stt_config through GatewayParams and run_gateway_adapter.

This closes the audio attachment gap left by the (now-closed) PR openabdev#726
without re-introducing the HTTP MediaStore proxy approach. Pairs with
the Google Chat adapter audio download (separate PR) — once both land,
Google Chat voice/audio attachments work end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses #4 must-fix items:

#1+#2 Webhook timeout safety:
- Spawn background tokio task for attachment downloads so the webhook
  returns 200 within Google Chat's 30s deadline regardless of how long
  downloads take.
- Add 30s per-request timeout to all Media API GET calls — a single
  hung connection can no longer stall the download task indefinitely.
- Refactor event emission into send_googlechat_event helper to share
  between sync (no-attachment) and async (background download) paths.

#4 Text file caps (matches Discord/Slack):
- TEXT_FILE_COUNT_CAP = 5: skip text files past the 5th with a warning.
- TEXT_TOTAL_CAP = 1 MB: skip text files that would push the running
  aggregate past 1 MB with a warning.
- Image and audio attachments are not capped (same as Discord/Slack).

#6 STT silent failure:
- When stt::transcribe returns None, push a fallback ContentBlock::Text
  ("[Voice message — transcription failed for ...]") so the agent
  knows a voice message was attempted and can ask the user to re-send.
  Previously the failure was silent and confusing.

Skipped from issue #4: #3 (streaming download), #5 (cross-adapter
refactor — adapters stay independent per design), #7-openabdev#10 (cosmetic).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@canyugs canyugs force-pushed the feat/gateway-googlechat-attachments branch from 3b95c0a to a928269 Compare May 14, 2026 03:04
@github-actions github-actions Bot removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label May 14, 2026
…ip, add spawn panic logging

- media_url: preserve `/` as literal path separators per Google Chat
  Media API's RFC 6570 reserved expansion (`{+resourceName}`). Previously
  all `/` were encoded as `%2F` which is fragile.
- download_googlechat_file: base64-encode raw bytes directly instead of
  round-tripping through String::from_utf8_lossy which silently replaces
  invalid bytes with U+FFFD.
- Spawned attachment download task: log an error if the task panics so
  silent message drops are diagnosable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@canyugs canyugs force-pushed the feat/gateway-googlechat-attachments branch from a928269 to 2837335 Compare May 14, 2026 03:40
canyugs and others added 2 commits May 14, 2026 12:06
…decode fallback

- Remove `"env"` from TEXT_EXTS whitelist to prevent credential exposure
  if a user accidentally uploads a .env file.
- Audio base64 decode failure now produces a fallback text block
  ("[Voice message — decode failed for X]") instead of silently dropping.
- Audio attachments when STT is disabled now log at debug level instead
  of being silently discarded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n to spawn path

- Flatten nested if/else in File download cap check using early
  continue, improving readability.
- Defer text .to_string() allocation to the tokio::spawn path so the
  no-attachment fast path avoids a heap allocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added closing-soon PR missing Discord Discussion URL — will auto-close in 3 days and removed needs-rebase pending-contributor labels May 14, 2026
@masami-agent
Copy link
Copy Markdown
Contributor

PR Review: #762

Reviewed commit: b05ba347e638cbdb47f0985cabf14fc53d4800b9

Summary

  • Problem: Google Chat adapter lacks inbound attachment support (image/file/audio), and Custom Gateway adapters have no audio→STT path in core.
  • Approach: Follows PR feat(gateway): feishu image and text file attachment support #731 base64 pattern — gateway downloads via Media API + SA token, compresses/encodes, emits via existing WebSocket attachment channel. STT reuses existing stt::transcribe in core.
  • Risk level: Medium (new async download path in webhook handler, touches core gateway audio processing)

Core Assessment

  1. Problem clearly stated: ✅ (exemplary PR description with architecture diagram)
  2. Approach appropriate: ✅ (reuses proven feat(gateway): feishu image and text file attachment support #731 pattern, no new infrastructure)
  3. Alternatives considered: ✅ (explicitly mentions closed PR feat(gateway): support images and audio for LINE/Telegram #726 MediaStore proxy approach)
  4. Best approach for now: ✅ (minimal, well-scoped, known limitations documented)

Review Summary (Traffic Light)

🟢 INFO

  • Clean separation: parse_attachments (sync) → tokio::spawn → download (async) → send_googlechat_event helper. Easy to follow.
  • Good use of Content-Length pre-check before downloading full body — avoids buffering oversized responses.
  • TEXT_EXTS whitelist is sensible; .env correctly excluded to prevent credential exposure.
  • media_url RFC 6570 reserved expansion is correct — preserves / as literal path separators while encoding other specials.
  • STT fallback messages in src/gateway.rs are user-friendly: both decode failure and transcription failure produce visible feedback to the agent.
  • Tests cover parsing (image/audio/file/drive-skip/missing-resource), download success, extension rejection, and oversized Content-Length rejection. Good coverage.
  • PR body is exemplary — clear design decisions, E2E testing matrix, known limitations, and breaking change assessment.
  • CI: all 10 checks pass (check ×2, gateway, 7 smoke-tests).

🟡 NIT

  1. Double-spawn pattern for panic logging (gateway/src/adapters/googlechat.rs)

    The outer tokio::spawn wraps an inner handle.await just to log panics:

    let handle = tokio::spawn(async move { /* download work */ });
    tokio::spawn(async move {
        if let Err(e) = handle.await { error!(...) }
    });

    This works but is unconventional. A single spawn that logs on JoinError at the call site (or catch_unwind inside) would be simpler. Not a correctness issue — just slightly surprising to read.

  2. Aggregate cap check after download (gateway/src/adapters/googlechat.rs)

    The text file aggregate cap (TEXT_TOTAL_CAP) is checked after download_googlechat_file returns. This means the file that pushes over the cap is still fully downloaded (wasted bandwidth up to 512 KB). Minor since it is bounded by FILE_MAX_DOWNLOAD, but checking Content-Length against remaining budget before downloading would be slightly more efficient.

  3. Unused content_type in Image/File variants (gateway/src/adapters/googlechat.rs)

    GoogleChatMediaRef::Image and ::File store content_type but never use it (only ::Audio passes it to download_googlechat_audio). The #[allow(dead_code)] on the enum suppresses the warning, but the field could be removed from those variants or used for future MIME-based decisions.

🔴 SUGGESTED CHANGES

None. No critical issues found.

Verdict

APPROVE — Well-written, follows established patterns, handles failure modes gracefully, and includes solid test coverage. The minor nits above are non-blocking suggestions for future cleanup.

@obrutjack
Copy link
Copy Markdown
Collaborator

Re-review (Stage 2): #762 — All Issues Resolved ✅

Reviewed at commit: b05ba34

Fix Verification

Original Finding Status Commit
🔴 "env" in TEXT_EXTS ✅ Fixed — removed 9363202
🔴 base64 decode silent drop ✅ Fixed — match with Err arm + fallback text block 9363202
🟡 media_url() encodes / ✅ Fixed — / now preserved per RFC 6570 2837335
🟡 STT disabled silent discard ✅ Fixed — explicit "audio" arm with tracing::debug! 9363202
🟡 Text file cap readability ✅ Fixed — early continue pattern b05ba34

Additional Improvements (not requested, but welcome)

  • Removed lossy UTF-8 round-trip in download_googlechat_file — now base64-encodes raw bytes directly. Good catch, prevents silent data corruption on non-UTF8 text files.
  • Added panic logging for the spawned download task — improves observability.
  • Deferred text.to_string() to spawn path — avoids allocation in no-attachment fast path.

Verdict: APPROVE

All critical and minor findings have been addressed cleanly. Code quality is high, tests are comprehensive, and the contributor went beyond the minimum fixes with additional improvements. Ready for maintainer merge.

@canyugs Nice work addressing all the feedback. 👏

obrutjack
obrutjack previously approved these changes May 14, 2026
Copy link
Copy Markdown
Collaborator

@obrutjack obrutjack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review findings addressed. Architecture is sound, tests comprehensive, and security concerns resolved. Approved.

- Replace double-spawn panic logging with single spawn +
  catch_unwind — more idiomatic, same observability.
- Remove unused content_type from Image/File variants of
  GoogleChatMediaRef; only Audio needs it. Drops #[allow(dead_code)]
  on the enum.
- Pass remaining aggregate budget to download_googlechat_file so
  Content-Length is checked against the budget before downloading,
  avoiding wasted bandwidth on files that would exceed the cap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label May 14, 2026
@chaodu-agent
Copy link
Copy Markdown
Collaborator

LGTM ✅ — Well-architected feature with solid test coverage. Non-blocking suggestions below.

What This PR Does

Adds inbound attachment support (image / text file / audio + STT) to the Google Chat gateway adapter, plus completes the audio → STT path in src/gateway.rs for all Custom Gateway adapters.

How It Works

  • Webhook parses attachment[], returns 200 immediately, spawns background download task
  • Images: resize ≤1200px JPEG q75, GIF passthrough, max 10MB
  • Text files: extension whitelist, max 512KB/file, 5-file cap, 1MB aggregate
  • Audio: forwarded as-is for core STT pipeline, max 25MB
  • Core: "audio" match arm with proper error handling — decode failure and STT failure both produce fallback text blocks

Findings

# Severity Finding Location
1 🟡 download_googlechat_file post-download size check uses FILE_MAX_DOWNLOAD instead of max_size — chunked responses without Content-Length can bypass aggregate budget googlechat.rs ~L491
2 🟡 Video MIME attachments silently classified as File → rejected by extension whitelist with no user-facing feedback googlechat.rs parse_attachments
3 🟡 No concurrency bound on spawned download tasks — N simultaneous audio uploads = N×25MB memory webhook handler
4 🟡 needs-rebase label present — confirm merge is clean PR metadata
Finding Details

🟡 F1: Post-download aggregate budget check

After resp.bytes(), the check is bytes.len() as u64 > FILE_MAX_DOWNLOAD but should be bytes.len() as u64 > max_size to enforce the remaining budget when Content-Length is absent. Bounded by 512KB per file so not a security issue, just semantic precision.

🟡 F2: Video MIME silent drop

parse_attachments routes non-image/non-audio content types to File variant. Video files (e.g. .mp4) then fail the text extension whitelist silently. Consider logging at info! level or adding a user-facing note that video is not yet supported.

🟡 F3: Unbounded concurrent downloads

Each spawned task can buffer up to 25MB. A semaphore or bounded channel would prevent memory spikes under burst load. Recommend tracking as a follow-up issue.

🟡 F4: Rebase status

Label indicates possible conflicts. Please confirm clean merge before merging.

What's Good (🟢)
  • All prior critical findings (.env in whitelist, base64 decode silent drop, media_url / encoding, STT-disabled silent discard) verified fixed
  • tokio::spawn + immediate 200 correctly handles Google Chat 30s webhook deadline
  • catch_unwind on spawned task with panic logging improves observability
  • src/gateway.rs change is minimal and correct — proper match arms with fallback text
  • Comprehensive test coverage: parse, download success/skip/oversized, URL encoding
  • remaining_budget parameter for text file downloads is a good design
  • Drive-sourced attachments explicitly skipped with clear follow-up path

All 🟡 items are non-blocking suggestions. Prior 🔴 findings have been resolved. Ready for maintainer merge.

Reviewed by: 超渡法師 (coordinator), 口渡法師, 擺渡法師, 普渡法師

…ipped video attachments

- download_googlechat_file: post-download size check now uses max_size
  (min of FILE_MAX_DOWNLOAD and remaining_budget) instead of only
  FILE_MAX_DOWNLOAD, ensuring TEXT_TOTAL_CAP is respected even when
  Content-Length header is absent.
- parse_attachments: video/ MIME type now gets an explicit info! log
  and is skipped early, instead of silently failing the text extension
  whitelist downstream.
@thepagent thepagent enabled auto-merge (squash) May 14, 2026 23:45
@chaodu-agent chaodu-agent disabled auto-merge May 15, 2026 00:29
@thepagent thepagent merged commit 0079352 into openabdev:main May 15, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pending-screening PR awaiting automated screening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants