Honor LLM_TIMEOUT, expose generation params, and harden black-box reports#549
Honor LLM_TIMEOUT, expose generation params, and harden black-box reports#549VoidChecksum wants to merge 5 commits into
Conversation
Strix's agentic prompts (large system prompt + tool schema + growing history) exceed the small default context many local runtimes use (Ollama defaults to ~4096 tokens), which silently clips context and manifests as looping agents, truncated/unexecuted tool calls, and mid-scan failures. Add a "Context Window Sizing" section to the local-models docs: the 4096 pitfall, how to raise it (Ollama OLLAMA_CONTEXT_LENGTH / Modelfile num_ctx, LM Studio, llama.cpp -c, vLLM --max-model-len), symptoms of clipping for self-diagnosis, recommended minimum per scan mode, and a note on the VRAM tradeoff. Fixes usestrix#286
Add optional generation parameters so users can steer model behavior — particularly useful for local / OpenAI-compatible models that need a lower temperature for steadier tool calling (see usestrix#514). - STRIX_LLM_TEMPERATURE, STRIX_LLM_TOP_P, STRIX_LLM_MAX_TOKENS (all unset by default -> provider defaults, so no behavior change). - Threaded through make_model_settings into the SDK ModelSettings used for the scan. Params a given model rejects are dropped automatically (litellm.drop_params is already enabled). - Documented in docs/advanced/configuration.mdx. Closes usestrix#514
In a black-box scan no source tree is available, yet create_vulnerability_report accepted code_locations unconditionally and render_vulnerability_md emitted a "Code Analysis" section from them. The model could therefore fabricate file paths, line numbers, and snippets into the customer-facing report (usestrix#321). Thread the existing is_whitebox flag into the tool run-context (it propagates to child agents via dict(parent_ctx)) and drop code_locations when the scan is black-box. Also add black-box guidance to the reporting tool docstring and the system prompt so the model does not assert source locations it cannot see. Fixes usestrix#321
LLM_TIMEOUT is documented (docs/advanced/configuration.mdx) as the request timeout for LLM calls and is offered as the workaround for slow local models, but it was only applied to the warm-up call in strix/interface/main.py. Scan LLM calls go through the SDK's LiteLLM model, which invokes litellm.acompletion without a timeout and falls back to LiteLLM's module default (6000s) -- so `export LLM_TIMEOUT=600` had no effect on the actual run. Set litellm.request_timeout from settings in configure_sdk_model_defaults so the documented setting takes effect for LiteLLM-routed scans. Fixes usestrix#426
Greptile SummaryThis PR bundles four self-contained fixes: honoring
Confidence Score: 4/5Safe to merge with awareness that The strix/config/models.py — the new Important Files Changed
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
docs/advanced/configuration.mdx:33-35
**Out-of-range temperature causes API errors, not silent drops**
The description says "Parameters unsupported by a given model are dropped automatically," but `litellm.drop_params` only removes parameters that the model does not accept **at all** — it does not sanitize out-of-range values. A user who sets `STRIX_LLM_TEMPERATURE=1.5` and points Strix at an Anthropic model (valid range: `0.0–1.0`) will get a provider API error at runtime rather than the graceful behaviour the copy implies. Clarifying that the statement refers to unsupported param *names* rather than out-of-range *values* would prevent confusion.
Reviews (2): Last reviewed commit: "Address review feedback on #549" | Re-trigger Greptile |
| if not inner.get("is_whitebox") and code_locations: | ||
| # Black-box scan: no source tree is available, so any file paths / | ||
| # line numbers / snippets in code_locations can only be fabricated. | ||
| # Drop them so a hallucinated "Code Analysis" section can never reach | ||
| # the customer-facing report (#321). | ||
| logger.info("Black-box scan: dropping code_locations from report %r", title) | ||
| code_locations = None |
There was a problem hiding this comment.
is_whitebox=False covers repository scans that have cloned source
is_whitebox is False for any target whose type is not "local_code" — including "repository" scans where the code is actually cloned to /workspace. For those scans the agent receives the workspace path in its task prompt (see build_root_task) and can legitimately read the source and populate accurate code_locations. After this change those valid locations are silently dropped, producing vulnerability reports with no code_locations even though real line numbers were identified.
The guard should distinguish between scans that are truly source-free (e.g., web_application, ip_address) and scans that simply use the "repository" target type. A concrete failure: a user scans a cloned repo via the "repository" target type; the agent identifies an injected SQL query at src/db.py:42; the report arrives at the customer with code_locations removed and a log line reading "Black-box scan: dropping code_locations".
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/tools/reporting/tool.py
Line: 492-498
Comment:
**`is_whitebox=False` covers repository scans that have cloned source**
`is_whitebox` is `False` for any target whose type is not `"local_code"` — including `"repository"` scans where the code is actually cloned to `/workspace`. For those scans the agent receives the workspace path in its task prompt (see `build_root_task`) and can legitimately read the source and populate accurate `code_locations`. After this change those valid locations are silently dropped, producing vulnerability reports with no `code_locations` even though real line numbers were identified.
The guard should distinguish between scans that are truly source-free (e.g., `web_application`, `ip_address`) and scans that simply use the `"repository"` target type. A concrete failure: a user scans a cloned repo via the `"repository"` target type; the agent identifies an injected SQL query at `src/db.py:42`; the report arrives at the customer with `code_locations` removed and a log line reading "Black-box scan: dropping code_locations".
How can I resolve this? If you propose a fix, please make it concise.| def _configure_litellm_request_timeout(timeout: int) -> None: | ||
| """Apply the configured ``LLM_TIMEOUT`` to LiteLLM-routed scan calls. | ||
|
|
||
| The SDK's LiteLLM model invokes ``litellm.acompletion`` without an explicit | ||
| per-request timeout, so without this it falls back to LiteLLM's module | ||
| default and the documented ``LLM_TIMEOUT`` only affected the warm-up call | ||
| (``strix/interface/main.py``). Setting the module-level default makes | ||
| ``export LLM_TIMEOUT=600`` take effect for the actual scan, restoring the | ||
| documented behavior for slow local / self-hosted models. | ||
| """ | ||
| import litellm | ||
|
|
||
| # litellm doesn't re-export request_timeout in its public surface, but it is | ||
| # the module-level default it reads for each acompletion call. | ||
| litellm.request_timeout = timeout # type: ignore[attr-defined] |
There was a problem hiding this comment.
Effective default timeout drops from ~6000 s to 300 s for all existing users
Before this PR, scan calls used LiteLLM's built-in module default (~6000 s); after it they use the LLM_TIMEOUT Pydantic default of 300 s. Any user who has not explicitly set LLM_TIMEOUT — including existing deployments and the project's own CI — will now hit a 300 s per-request wall where they previously had ~6000 s. For anyone running a slow local model (Ollama, llama.cpp) this will cause ReadTimeout errors on the first multi-step scan call that exceeds five minutes, with no indication that the timeout was silently tightened.
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/config/models.py
Line: 132-146
Comment:
**Effective default timeout drops from ~6000 s to 300 s for all existing users**
Before this PR, scan calls used LiteLLM's built-in module default (~6000 s); after it they use the `LLM_TIMEOUT` Pydantic default of 300 s. Any user who has not explicitly set `LLM_TIMEOUT` — including existing deployments and the project's own CI — will now hit a 300 s per-request wall where they previously had ~6000 s. For anyone running a slow local model (Ollama, llama.cpp) this will cause `ReadTimeout` errors on the first multi-step scan call that exceeds five minutes, with no indication that the timeout was silently tightened.
How can I resolve this? If you propose a fix, please make it concise.| llm = settings.llm | ||
| set_tracing_disabled(True) | ||
| _configure_litellm_compatibility() | ||
| _configure_litellm_request_timeout(llm.timeout) |
There was a problem hiding this comment.
Module-level global mutated on every
run_strix_scan call
configure_sdk_model_defaults (and now _configure_litellm_request_timeout) writes to litellm's module-level state. If a host process runs two concurrent scans that were started with different LLM_TIMEOUT values, the second call to configure_sdk_model_defaults will overwrite litellm.request_timeout for the first scan's in-flight requests. The pattern is pre-existing for api_key/api_base, but extending it to request_timeout (now called per scan, not once at startup) makes the race window much more likely to affect real requests.
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/config/models.py
Line: 66
Comment:
**Module-level global mutated on every `run_strix_scan` call**
`configure_sdk_model_defaults` (and now `_configure_litellm_request_timeout`) writes to `litellm`'s module-level state. If a host process runs two concurrent scans that were started with different `LLM_TIMEOUT` values, the second call to `configure_sdk_model_defaults` will overwrite `litellm.request_timeout` for the first scan's in-flight requests. The pattern is pre-existing for `api_key`/`api_base`, but extending it to `request_timeout` (now called per scan, not once at startup) makes the race window much more likely to affect real requests.
How can I resolve this? If you propose a fix, please make it concise.| reasoning_effort: ReasoningEffort = Field(default="high", alias="STRIX_REASONING_EFFORT") | ||
| timeout: int = Field(default=300, alias="LLM_TIMEOUT") | ||
| temperature: float | None = Field(default=None, alias="STRIX_LLM_TEMPERATURE") |
There was a problem hiding this comment.
No range validation on
temperature and top_p
Pydantic will accept any float for both fields. Out-of-range values (e.g., STRIX_LLM_TEMPERATURE=3.5) will be forwarded to LiteLLM. For most providers this causes an API error at runtime rather than a startup-time config failure, and users have no immediate feedback that the value is wrong. top_p values outside [0, 1] are universally invalid. Adding ge=0.0 / le=2.0 (temperature) and ge=0.0, le=1.0 (top_p) constraints would surface problems early.
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/config/settings.py
Line: 38-40
Comment:
**No range validation on `temperature` and `top_p`**
Pydantic will accept any `float` for both fields. Out-of-range values (e.g., `STRIX_LLM_TEMPERATURE=3.5`) will be forwarded to LiteLLM. For most providers this causes an API error at runtime rather than a startup-time config failure, and users have no immediate feedback that the value is wrong. `top_p` values outside `[0, 1]` are universally invalid. Adding `ge=0.0` / `le=2.0` (temperature) and `ge=0.0, le=1.0` (top_p) constraints would surface problems early.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
- reporting: gate code_locations on whether source is in scope (local_code OR repository), not is_whitebox. is_whitebox is False for repository targets even though their source is cloned to /workspace, so the prior guard silently dropped valid code_locations on repo scans. runner now threads source_in_scope; create_vulnerability_report keys off it. - llm timeout: only override litellm.request_timeout when LLM_TIMEOUT is explicitly set (llm.model_fields_set), so users who never set it keep LiteLLM's ~6000s default instead of being silently capped at the 300s default. - settings: bound STRIX_LLM_TEMPERATURE to [0,2], STRIX_LLM_TOP_P to [0,1], STRIX_LLM_MAX_TOKENS to >=1 so invalid values fail at config load instead of at the provider; docs note the ranges.
|
Thanks for the review — addressed in 1. 2. Default timeout 6000s → 300s regression (P1) — fixed. 3. Module-level global mutation / concurrent scans (P2) — intentionally out of scope. As noted, this is the pre-existing pattern for 4. No range validation on All changes verified locally: |
|
@greptile |
Four small, independently-verified LLM/reporting fixes. Per maintainer request these are bundled into a single PR rather than four; each commit is self-contained and references its tracking issue, so they can be split or cherry-picked individually if preferred.
1.
fix(llm): honorLLM_TIMEOUTduring scans, not just warm-up (closes #426)LLM_TIMEOUTis documented as the request timeout and is the recommended workaround for slow local models, but it was only applied to the warm-up call instrix/interface/main.py. Scan calls go through the SDK's LiteLLM model, which invokeslitellm.acompletionwithout a timeout and falls back to LiteLLM's module default (6000s), soexport LLM_TIMEOUT=600had no effect on the actual run._configure_litellm_request_timeoutnow setslitellm.request_timeoutfrom settings insideconfigure_sdk_model_defaults.2.
fix(reporting): drop fabricatedcode_locationsin black-box scans (closes #321)In a black-box scan no source tree exists, yet
create_vulnerability_reportacceptedcode_locationsunconditionally, letting the model fabricate file paths/line numbers into the customer-facing report. The existingis_whiteboxflag is now threaded into the tool run-context (it propagates to child agents viadict(parent_ctx)), andcode_locationsare dropped when the scan is black-box; black-box guidance was added to the tool docstring and system prompt.3.
feat(llm): exposetemperature,top_p,max_tokens(closes #514)Adds optional
STRIX_LLM_TEMPERATURE/STRIX_LLM_TOP_P/STRIX_LLM_MAX_TOKENS, threaded throughmake_model_settingsinto the SDKModelSettings. All unset by default → provider defaults, so no behavior change. Params a model rejects are dropped automatically (litellm.drop_paramsis already enabled).4.
docs(local-models): document context-window sizing (closes #286)Documents that agentic prompts easily exceed small local-runtime context windows (Ollama defaults to ~4096 tokens) and how to raise it. Docs only.
Verification
ruff 0.11.13(lint + format),mypy 1.16.0(strict,--install-types), andbanditare green on the full changed set.code_locationsdrop (and whitebox-preserve), and generation-param threading withNonedefaults.strix --helpand importing every changed module both succeed.pytestinfra would be out of scope for these fixes.