Honor LLM_TIMEOUT, expose generation params, and harden black-box reports by VoidChecksum · Pull Request #549 · usestrix/strix

VoidChecksum · 2026-06-09T07:44:51Z

Four small, independently-verified LLM/reporting fixes. Per maintainer request these are bundled into a single PR rather than four; each commit is self-contained and references its tracking issue, so they can be split or cherry-picked individually if preferred.

1. `fix(llm)`: honor `LLM_TIMEOUT` during scans, not just warm-up (closes #426)

LLM_TIMEOUT is documented as the request timeout and is the recommended workaround for slow local models, but it was only applied to the warm-up call in strix/interface/main.py. Scan calls go through the SDK's LiteLLM model, which invokes litellm.acompletion without a timeout and falls back to LiteLLM's module default (6000s), so export LLM_TIMEOUT=600 had no effect on the actual run. _configure_litellm_request_timeout now sets litellm.request_timeout from settings inside configure_sdk_model_defaults.

2. `fix(reporting)`: drop fabricated `code_locations` in black-box scans (closes #321)

In a black-box scan no source tree exists, yet create_vulnerability_report accepted code_locations unconditionally, letting the model fabricate file paths/line numbers into the customer-facing report. The existing is_whitebox flag is now threaded into the tool run-context (it propagates to child agents via dict(parent_ctx)), and code_locations are dropped when the scan is black-box; black-box guidance was added to the tool docstring and system prompt.

3. `feat(llm)`: expose `temperature`, `top_p`, `max_tokens` (closes #514)

Adds optional STRIX_LLM_TEMPERATURE / STRIX_LLM_TOP_P / STRIX_LLM_MAX_TOKENS, threaded through make_model_settings into the SDK ModelSettings. All unset by default → provider defaults, so no behavior change. Params a model rejects are dropped automatically (litellm.drop_params is already enabled).

4. `docs(local-models)`: document context-window sizing (closes #286)

Documents that agentic prompts easily exceed small local-runtime context windows (Ollama defaults to ~4096 tokens) and how to raise it. Docs only.

Verification

ruff 0.11.13 (lint + format), mypy 1.16.0 (strict, --install-types), and bandit are green on the full changed set.
Behavioral checks run locally for each change: the timeout setter, the black-box code_locations drop (and whitebox-preserve), and generation-param threading with None defaults.
strix --help and importing every changed module both succeed.
No test harness is added — the project verifies statically, so introducing pytest infra would be out of scope for these fixes.

Strix's agentic prompts (large system prompt + tool schema + growing history) exceed the small default context many local runtimes use (Ollama defaults to ~4096 tokens), which silently clips context and manifests as looping agents, truncated/unexecuted tool calls, and mid-scan failures. Add a "Context Window Sizing" section to the local-models docs: the 4096 pitfall, how to raise it (Ollama OLLAMA_CONTEXT_LENGTH / Modelfile num_ctx, LM Studio, llama.cpp -c, vLLM --max-model-len), symptoms of clipping for self-diagnosis, recommended minimum per scan mode, and a note on the VRAM tradeoff. Fixes usestrix#286

Add optional generation parameters so users can steer model behavior, which is particularly useful for local / OpenAI-compatible models that need a lower temperature for steadier tool calling (see usestrix#514).

- STRIX_LLM_TEMPERATURE, STRIX_LLM_TOP_P, STRIX_LLM_MAX_TOKENS (all unset by default -> provider defaults, so no behavior change).
- Threaded through make_model_settings into the SDK ModelSettings used for the scan. Params a given model rejects are dropped automatically (litellm.drop_params is already enabled).
- Documented in docs/advanced/configuration.mdx.

Closes usestrix#514
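The "unset means provider default" behavior in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: `ModelSettings` here is a stand-in dataclass rather than the SDK class, and `read_generation_params` and `request_kwargs` are hypothetical helpers. Only the three environment variable names come from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModelSettings:
    """Stand-in for the SDK's ModelSettings; only the three new fields are modeled."""

    temperature: float | None = None
    top_p: float | None = None
    max_tokens: int | None = None


def read_generation_params(env: Mapping[str, str]) -> ModelSettings:
    """Hypothetical helper: variables that are not set stay None."""
    temperature = env.get("STRIX_LLM_TEMPERATURE")
    top_p = env.get("STRIX_LLM_TOP_P")
    max_tokens = env.get("STRIX_LLM_MAX_TOKENS")
    return ModelSettings(
        temperature=float(temperature) if temperature is not None else None,
        top_p=float(top_p) if top_p is not None else None,
        max_tokens=int(max_tokens) if max_tokens is not None else None,
    )


def request_kwargs(settings: ModelSettings) -> dict[str, float | int]:
    """Forward only explicitly set params; None leaves the choice to the provider."""
    candidates = {
        "temperature": settings.temperature,
        "top_p": settings.top_p,
        "max_tokens": settings.max_tokens,
    }
    return {name: value for name, value in candidates.items() if value is not None}
```

With an empty environment `request_kwargs` returns an empty dict, which is the "no behavior change" case. The PR itself passes the values through to `ModelSettings` and lets the SDK/LiteLLM layer handle `None`, so the explicit filtering step above is an assumption made for clarity.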

In a black-box scan no source tree is available, yet create_vulnerability_report accepted code_locations unconditionally and render_vulnerability_md emitted a "Code Analysis" section from them. The model could therefore fabricate file paths, line numbers, and snippets into the customer-facing report (usestrix#321). Thread the existing is_whitebox flag into the tool run-context (it propagates to child agents via dict(parent_ctx)) and drop code_locations when the scan is black-box. Also add black-box guidance to the reporting tool docstring and the system prompt so the model does not assert source locations it cannot see. Fixes usestrix#321

LLM_TIMEOUT is documented (docs/advanced/configuration.mdx) as the request timeout for LLM calls and is offered as the workaround for slow local models, but it was only applied to the warm-up call in strix/interface/main.py. Scan LLM calls go through the SDK's LiteLLM model, which invokes litellm.acompletion without a timeout and falls back to LiteLLM's module default (6000s) -- so `export LLM_TIMEOUT=600` had no effect on the actual run. Set litellm.request_timeout from settings in configure_sdk_model_defaults so the documented setting takes effect for LiteLLM-routed scans. Fixes usestrix#426

greptile-apps · 2026-06-09T07:50:54Z

Greptile Summary

This PR bundles four self-contained changes: honoring LLM_TIMEOUT in scan calls (guarded by model_fields_set so users without the env var keep LiteLLM's default); dropping fabricated code_locations in true black-box scans while preserving them for repository-type targets via the new source_in_scope flag; exposing temperature/top_p/max_tokens as optional generation params with Pydantic range constraints and None defaults; and documenting context-window sizing for local models.

- LLM_TIMEOUT fix: _configure_litellm_request_timeout is called only when LLM_TIMEOUT is explicitly in the environment (model_fields_set check), so users who never set it are not silently capped at the 300 s Pydantic default.
- Black-box code_locations fix: source_in_scope = is_whitebox or (type == "repository") is threaded into the root context and inherited by child agents via dict(parent_ctx) in _start_child_runner; the reporting tool drops code_locations only when this flag is falsy.
- Generation params: Pydantic ge/le constraints on temperature and top_p surface bad values at config-load time; litellm.drop_params handles provider-level incompatibilities.

Confidence Score: 4/5

Safe to merge with awareness that litellm.request_timeout is now mutated as a module-level global on every scan start, which can affect concurrent in-flight requests if two scans with different timeouts overlap.

The _configure_litellm_request_timeout call writes to litellm.request_timeout, a module-level variable read by every acompletion call. Because it is invoked inside run_strix_scan rather than once at process startup, two concurrent scans with different LLM_TIMEOUT values can race to overwrite it mid-flight, potentially applying the wrong timeout to an already in-flight request. The model_fields_set guard correctly avoids the 300 s default regression, but the global-mutation race is a real production risk for any host running parallel scans.

strix/config/models.py — the new _configure_litellm_request_timeout function and its call site in configure_sdk_model_defaults.

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/config/models.py | Adds `_configure_litellm_request_timeout` called conditionally via `model_fields_set` check; extends the pre-existing module-level litellm global mutation pattern to `request_timeout`, which is called on every scan start and affects all in-flight concurrent requests. |
| strix/config/settings.py | Adds `temperature`, `top_p`, `max_tokens` optional fields with correct Pydantic ge/le constraints and None defaults; no behavior change when unset. |
| strix/core/runner.py | Correctly introduces `source_in_scope` (covers both `local_code` and `repository` scan types), threads it into the root context for child propagation, and wires the three new generation params into `make_model_settings`. |
| strix/tools/reporting/tool.py | Guards `code_locations` on the new `source_in_scope` context key, correctly preserving them for repository and local-code scans while dropping fabricated paths for true black-box targets. |
| strix/core/inputs.py | Adds `temperature`, `top_p`, `max_tokens` as optional kwargs to `make_model_settings` and passes them unconditionally to `ModelSettings`; None values are handled by the SDK/litellm layer. |
| strix/agents/prompts/system_prompt.jinja | Adds one-line black-box reporting instruction to the execution guidelines; no logic change. |
| docs/advanced/configuration.mdx | Documents the three new generation params; temperature range `0.0–2.0` is the constraint enforced by Strix but provider-specific narrower ranges (e.g. Anthropic: `0.0–1.0`) are not mentioned. |
| docs/llm-providers/local.mdx | Adds context-window sizing guidance for Ollama, llama.cpp, LM Studio, and vLLM; docs-only change. |

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
docs/advanced/configuration.mdx:33-35
**Out-of-range temperature causes API errors, not silent drops**

The description says "Parameters unsupported by a given model are dropped automatically," but `litellm.drop_params` only removes parameters that the model does not accept **at all** — it does not sanitize out-of-range values. A user who sets `STRIX_LLM_TEMPERATURE=1.5` and points Strix at an Anthropic model (valid range: `0.0–1.0`) will get a provider API error at runtime rather than the graceful behaviour the copy implies. Clarifying that the statement refers to unsupported param *names* rather than out-of-range *values* would prevent confusion.
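The distinction drawn in this issue, between dropping unsupported parameter names and rejecting out-of-range values, can be illustrated with a small sketch. The capability table and both helper functions are hypothetical and do not reproduce LiteLLM's internals. The only figure taken from the review is Anthropic's `0.0–1.0` temperature range.

```python
from __future__ import annotations

# Hypothetical capability table for illustration only; real providers differ.
# Each entry maps a param name to (low, high); high=None means "no upper bound".
SUPPORTED_PARAMS = {
    "example-anthropic": {"temperature": (0.0, 1.0), "max_tokens": (1, None)},
    "example-no-temperature": {"max_tokens": (1, None)},
}


def drop_unsupported(model: str, params: dict[str, float]) -> dict[str, float]:
    """Name-based dropping: remove keys the model does not accept at all."""
    supported = SUPPORTED_PARAMS[model]
    return {name: value for name, value in params.items() if name in supported}


def out_of_range(model: str, params: dict[str, float]) -> list[str]:
    """Names the model accepts, but whose values fall outside its range."""
    bad = []
    for name, value in params.items():
        bounds = SUPPORTED_PARAMS[model].get(name)
        if bounds is None:
            continue  # unsupported name: a dropping concern, not a range concern
        low, high = bounds
        if value < low or (high is not None and value > high):
            bad.append(name)
    return bad


params = {"temperature": 1.5, "max_tokens": 1024}
kept = drop_unsupported("example-anthropic", params)
```

Here `kept` still contains `temperature=1.5`, because the name is supported, while `out_of_range("example-anthropic", kept)` flags it. Under this model, that value would reach the provider and fail there, which is the case the documentation wording should call out.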

Reviews (2): Last reviewed commit: "Address review feedback on #549"

greptile-apps · 2026-06-09T07:50:57Z

+    if not inner.get("is_whitebox") and code_locations:
+        # Black-box scan: no source tree is available, so any file paths /
+        # line numbers / snippets in code_locations can only be fabricated.
+        # Drop them so a hallucinated "Code Analysis" section can never reach
+        # the customer-facing report (#321).
+        logger.info("Black-box scan: dropping code_locations from report %r", title)
+        code_locations = None


is_whitebox=False covers repository scans that have cloned source

is_whitebox is False for any target whose type is not "local_code" — including "repository" scans where the code is actually cloned to /workspace. For those scans the agent receives the workspace path in its task prompt (see build_root_task) and can legitimately read the source and populate accurate code_locations. After this change those valid locations are silently dropped, producing vulnerability reports with no code_locations even though real line numbers were identified.

The guard should distinguish between scans that are truly source-free (e.g., web_application, ip_address) and scans that simply use the "repository" target type. A concrete failure: a user scans a cloned repo via the "repository" target type; the agent identifies an injected SQL query at src/db.py:42; the report arrives at the customer with code_locations removed and a log line reading "Black-box scan: dropping code_locations".

Path: strix/tools/reporting/tool.py, lines 492-498
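One way to express the distinction this comment asks for is a separate flag that tracks whether source is available, independent of `is_whitebox`. The sketch below is a simplified illustration under stated assumptions: `build_root_context`, `child_context`, and `filter_code_locations` are hypothetical names, and the context is reduced to a plain dict. The target type strings and the `dict(parent_ctx)` propagation come from the PR discussion.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger(__name__)

# Target types for which a source tree is mounted, per the review discussion.
SOURCE_TARGET_TYPES = frozenset({"local_code", "repository"})


def build_root_context(target_type: str) -> dict[str, Any]:
    """Hypothetical root context: is_whitebox stays local_code-only."""
    return {
        "is_whitebox": target_type == "local_code",
        "source_in_scope": target_type in SOURCE_TARGET_TYPES,
    }


def child_context(parent_ctx: dict[str, Any]) -> dict[str, Any]:
    """Child agents inherit the flags through a shallow copy of the parent."""
    return dict(parent_ctx)


def filter_code_locations(
    ctx: dict[str, Any], title: str, code_locations: list[dict[str, Any]] | None
) -> list[dict[str, Any]] | None:
    """Drop code_locations only when no source tree is in scope."""
    if code_locations and not ctx.get("source_in_scope"):
        logger.info("No source in scope: dropping code_locations from report %r", title)
        return None
    return code_locations


locations = [{"file": "src/db.py", "line": 42}]
repo_ctx = child_context(build_root_context("repository"))
web_ctx = child_context(build_root_context("web_application"))
```

With these contexts, `filter_code_locations(repo_ctx, "SQL injection", locations)` keeps the `src/db.py:42` entry even though `is_whitebox` is false, while the same call with `web_ctx` returns `None`.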

greptile-apps · 2026-06-09T07:50:59Z

+def _configure_litellm_request_timeout(timeout: int) -> None:
+    """Apply the configured ``LLM_TIMEOUT`` to LiteLLM-routed scan calls.
+
+    The SDK's LiteLLM model invokes ``litellm.acompletion`` without an explicit
+    per-request timeout, so without this it falls back to LiteLLM's module
+    default and the documented ``LLM_TIMEOUT`` only affected the warm-up call
+    (``strix/interface/main.py``). Setting the module-level default makes
+    ``export LLM_TIMEOUT=600`` take effect for the actual scan, restoring the
+    documented behavior for slow local / self-hosted models.
+    """
+    import litellm
+
+    # litellm doesn't re-export request_timeout in its public surface, but it is
+    # the module-level default it reads for each acompletion call.
+    litellm.request_timeout = timeout  # type: ignore[attr-defined]


Effective default timeout drops from ~6000 s to 300 s for all existing users

Before this PR, scan calls used LiteLLM's built-in module default (~6000 s); after it they use the LLM_TIMEOUT Pydantic default of 300 s. Any user who has not explicitly set LLM_TIMEOUT — including existing deployments and the project's own CI — will now hit a 300 s per-request wall where they previously had ~6000 s. For anyone running a slow local model (Ollama, llama.cpp) this will cause ReadTimeout errors on the first multi-step scan call that exceeds five minutes, with no indication that the timeout was silently tightened.

Path: strix/config/models.py, lines 132-146
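A gate of the following shape would avoid the regression described here by touching the module default only when the user actually supplied a value. The follow-up commit does this with `llm.model_fields_set`; the sketch approximates that by checking the environment mapping directly, and uses a `SimpleNamespace` as a stand-in for the `litellm` module so it runs without LiteLLM installed. The function names are hypothetical.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Mapping

# Stand-in for the litellm module; 6000 mirrors the default quoted in this PR.
fake_litellm = SimpleNamespace(request_timeout=6000)


def explicit_timeout(env: Mapping[str, str]) -> int | None:
    """Return the timeout only if LLM_TIMEOUT was explicitly provided."""
    raw = env.get("LLM_TIMEOUT")
    return int(raw) if raw is not None else None


def configure_request_timeout(
    env: Mapping[str, str], module: SimpleNamespace = fake_litellm
) -> None:
    """Override the module-level default only for an explicit setting."""
    timeout = explicit_timeout(env)
    if timeout is not None:
        module.request_timeout = timeout
```

Calling `configure_request_timeout({})` leaves the 6000 s default in place, while `configure_request_timeout({"LLM_TIMEOUT": "600"})` lowers it to 600 s, so the 300 s settings default never applies implicitly.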

greptile-apps · 2026-06-09T07:50:59Z

    llm = settings.llm
    set_tracing_disabled(True)
    _configure_litellm_compatibility()
+    _configure_litellm_request_timeout(llm.timeout)


Module-level global mutated on every run_strix_scan call

configure_sdk_model_defaults (and now _configure_litellm_request_timeout) writes to litellm's module-level state. If a host process runs two concurrent scans that were started with different LLM_TIMEOUT values, the second call to configure_sdk_model_defaults will overwrite litellm.request_timeout for the first scan's in-flight requests. The pattern is pre-existing for api_key/api_base, but extending it to request_timeout (now called per scan, not once at startup) makes the race window much more likely to affect real requests.

Path: strix/config/models.py, line 66
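The race described in this comment can be reproduced in miniature with two coroutines that share one module-level value. This is a toy model rather than LiteLLM's behavior: `shared` stands in for module state, `fake_completion` stands in for an LLM call, and the read is deliberately placed after a yield to widen the window.

```python
from __future__ import annotations

import asyncio
from types import SimpleNamespace

# Stand-in for process-wide module state such as a default request timeout.
shared = SimpleNamespace(request_timeout=6000)


async def fake_completion(timeout: int | None = None) -> int:
    """Stand-in for an LLM call: returns the timeout it would have used."""
    await asyncio.sleep(0)  # yield so the other scan can run before the read
    return timeout if timeout is not None else shared.request_timeout


async def scan_with_global(timeout: int) -> int:
    shared.request_timeout = timeout  # per-scan write to shared state
    return await fake_completion()


async def scan_with_explicit(timeout: int) -> int:
    return await fake_completion(timeout=timeout)  # per-call value, no shared write


async def main() -> tuple[list[int], list[int]]:
    racy = await asyncio.gather(scan_with_global(600), scan_with_global(30))
    safe = await asyncio.gather(scan_with_explicit(600), scan_with_explicit(30))
    return list(racy), list(safe)


racy, safe = asyncio.run(main())
```

In this toy model the first scan ends up using the second scan's timeout (`racy` is `[30, 30]`), whereas passing the value per call keeps each scan's own setting (`safe` is `[600, 30]`). Whether a per-call timeout can be threaded through the SDK is not established in this thread.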

greptile-apps · 2026-06-09T07:51:00Z

    reasoning_effort: ReasoningEffort = Field(default="high", alias="STRIX_REASONING_EFFORT")
    timeout: int = Field(default=300, alias="LLM_TIMEOUT")
+    temperature: float | None = Field(default=None, alias="STRIX_LLM_TEMPERATURE")


No range validation on temperature and top_p

Pydantic will accept any float for both fields. Out-of-range values (e.g., STRIX_LLM_TEMPERATURE=3.5) will be forwarded to LiteLLM. For most providers this causes an API error at runtime rather than a startup-time config failure, and users have no immediate feedback that the value is wrong. top_p values outside [0, 1] are universally invalid. Adding ge=0.0 / le=2.0 (temperature) and ge=0.0, le=1.0 (top_p) constraints would surface problems early.

Path: strix/config/settings.py, lines 38-40
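The effect of the suggested bounds can be sketched without Pydantic. The code below is a stdlib stand-in for `ge`/`le` field constraints, with hypothetical function names. It shows only the intended behavior, which is that an invalid value fails when configuration is loaded rather than at the provider. The `max_tokens >= 1` bound is the one added in the follow-up commit.

```python
from __future__ import annotations


def check_bounds(
    name: str, value: float | None, *, ge: float, le: float | None = None
) -> float | None:
    """Return value unchanged if it is None or within [ge, le]; else raise."""
    if value is None:
        return None  # unset: provider default applies, nothing to validate
    if value < ge or (le is not None and value > le):
        upper = "inf" if le is None else le
        raise ValueError(f"{name}={value} is outside [{ge}, {upper}]")
    return value


def validate_generation_params(
    temperature: float | None = None,
    top_p: float | None = None,
    max_tokens: int | None = None,
) -> dict[str, float | None]:
    """Apply the ranges discussed in the review at config-load time."""
    return {
        "temperature": check_bounds("STRIX_LLM_TEMPERATURE", temperature, ge=0.0, le=2.0),
        "top_p": check_bounds("STRIX_LLM_TOP_P", top_p, ge=0.0, le=1.0),
        "max_tokens": check_bounds("STRIX_LLM_MAX_TOKENS", max_tokens, ge=1),
    }
```

With this check, `validate_generation_params(temperature=3.5)` raises `ValueError` immediately, while a value such as `1.5` passes Strix's bound and may still be rejected by a provider with a narrower range.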

- reporting: gate code_locations on whether source is in scope (local_code OR repository), not is_whitebox. is_whitebox is False for repository targets even though their source is cloned to /workspace, so the prior guard silently dropped valid code_locations on repo scans. runner now threads source_in_scope; create_vulnerability_report keys off it.
- llm timeout: only override litellm.request_timeout when LLM_TIMEOUT is explicitly set (llm.model_fields_set), so users who never set it keep LiteLLM's ~6000s default instead of being silently capped at the 300s default.
- settings: bound STRIX_LLM_TEMPERATURE to [0,2], STRIX_LLM_TOP_P to [0,1], STRIX_LLM_MAX_TOKENS to >=1 so invalid values fail at config load instead of at the provider; docs note the ranges.

VoidChecksum · 2026-06-09T08:18:00Z

Thanks for the review — addressed in 3804421.

1. is_whitebox=False covers repository scans (P1) — fixed. Correct catch: collect_local_sources mounts source for both local_code and repository, but is_whitebox is local_code-only, so the guard would have dropped valid code_locations on repo scans. The runner now threads source_in_scope = local_code OR repository, and create_vulnerability_report keys off that instead of is_whitebox. Repository and local-code scans keep their locations; only truly source-free scans (URL/IP/domain) drop them.

2. Default timeout 6000s → 300s regression (P1) — fixed. _configure_litellm_request_timeout is now only called when LLM_TIMEOUT is explicitly set (llm.model_fields_set). Users who never set it keep LiteLLM's built-in default; the documented setting still takes effect for scans when provided. This directly avoids regressing the slow-local-model case from #426.

3. Module-level global mutation / concurrent scans (P2) — intentionally out of scope. As noted, this is the pre-existing pattern for api_key/api_base in the same function, and Strix runs one scan per process (CLI). Reworking SDK-global config into per-scan state is a broader refactor that shouldn't ride along with these fixes — happy to file a separate issue if you'd like it tracked.

4. No range validation on temperature/top_p (P2) — fixed. Added ge=0.0, le=2.0 (temperature), ge=0.0, le=1.0 (top_p), and ge=1 (max_tokens) so bad values fail at config load; docs note the ranges.

All changes verified locally: ruff (lint + format), mypy --strict (with --install-types), bandit green, plus focused behavioral tests for the source-in-scope guard, the timeout gating, and the new bounds.

bearsyankees · 2026-06-10T12:24:33Z

@greptile

VoidChecksum added 4 commits June 9, 2026 07:42

VoidChecksum wants to merge 5 commits into usestrix:main from VoidChecksum:fix/llm-config-and-blackbox-reporting

2 participants
