feat: support kagi web search and page extract #8435
Code Review
This pull request introduces Kagi Search API integration by adding KagiWebSearchTool and KagiExtractWebPageTool, along with corresponding configuration options, localization updates, and unit tests. The review feedback highlights two key areas for improvement: first, a bug in _kagi_search where it only parses the "search" field, which will cause the test_kagi_search_collects_secondary_fields unit test to fail; second, a recommendation to add defensive type checking for the urls parameter in KagiExtractWebPageTool to handle cases where a string is passed instead of a list.
    data = await response.json()
    body = data.get("data", {})
    primary_results = body.get("search", [])
    results: list[SearchResult] = []

    for item in primary_results:
        results.append(
            SearchResult(
                title=item.get("title", ""),
                url=item.get("url"),
                snippet=item.get("snippet", ""),
            )
        )

    return results
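The unit test `test_kagi_search_collects_secondary_fields` expects `_kagi_search` to also collect data from the `adjacent_question` and `infobox` fields. The current `_kagi_search` implementation reads only the `search` field, so this test will fail.

I suggest changing `_kagi_search` to loop over all three fields and to skip items whose `url` is empty, so that the test passes and the search results are richer.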
    - data = await response.json()
    - body = data.get("data", {})
    - primary_results = body.get("search", [])
    - results: list[SearchResult] = []
    - for item in primary_results:
    -     results.append(
    -         SearchResult(
    -             title=item.get("title", ""),
    -             url=item.get("url"),
    -             snippet=item.get("snippet", ""),
    -         )
    -     )
    - return results
    + data = await response.json()
    + body = data.get("data", {})
    + results: list[SearchResult] = []
    + for key in ("search", "adjacent_question", "infobox"):
    +     for item in body.get(key, []):
    +         if not item.get("url"):
    +             continue
    +         results.append(
    +             SearchResult(
    +                 title=item.get("title", ""),
    +                 url=item.get("url", ""),
    +                 snippet=item.get("snippet", ""),
    +             )
    +         )
    + return results
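The merging behavior requested in this comment can be tried in isolation with the rough sketch below. It is not the PR's code: `SearchResult` here is a hypothetical stand-in for the project's own type, `collect_results` and `sample_body` are illustrative names, and the payload shape (each section being a list of dicts with `title`, `url`, and `snippet`) is an assumption taken from the suggestion above rather than from the Kagi API documentation.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical stand-in for the project's SearchResult type."""

    title: str
    url: str
    snippet: str


# Response sections to merge, in the order named in the review comment.
RESULT_KEYS = ("search", "adjacent_question", "infobox")


def collect_results(body: dict) -> list[SearchResult]:
    """Merge entries from every known section, skipping items without a URL."""
    results: list[SearchResult] = []
    for key in RESULT_KEYS:
        # `or []` also covers a section that is present but null.
        for item in body.get(key) or []:
            url = item.get("url")
            if not url:
                continue
            results.append(
                SearchResult(
                    title=item.get("title", ""),
                    url=url,
                    snippet=item.get("snippet", ""),
                )
            )
    return results


# Assumed sample payload: one primary hit, one related question, one
# infobox entry that has no URL and should therefore be dropped.
sample_body = {
    "search": [{"title": "A", "url": "https://example.com/a", "snippet": "s"}],
    "adjacent_question": [{"title": "Q", "url": "https://example.com/q"}],
    "infobox": [{"title": "No link"}],
}
merged = collect_results(sample_body)
```

Keeping the section names in one tuple means a later change to the set of merged fields touches a single line, and the URL check guarantees that every returned result is linkable.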
    urls = kwargs.get("urls", [])
    if not urls:
        return "Error: urls must be a non-empty list."
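Defensive programming: if the LLM or an external caller accidentally passes `urls` as a string, `urls[:10]` will slice the string and iterate over it character by character, producing incorrect request parameters.

I suggest adding a type check: if `urls` is a string, convert it to a single-element list, and make sure the value is a non-empty list.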
    - urls = kwargs.get("urls", [])
    - if not urls:
    -     return "Error: urls must be a non-empty list."
    + urls = kwargs.get("urls", [])
    + if isinstance(urls, str):
    +     urls = [urls]
    + if not isinstance(urls, list) or not urls:
    +     return "Error: urls must be a non-empty list."
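One way to keep this check out of the tool body is a small normalizing helper, sketched below. The name `normalize_urls` and the filtering of blank or non-string entries are illustrative additions that go slightly beyond the suggestion; the cap of 10 mirrors the `urls[:10]` slice mentioned in the comment.

```python
def normalize_urls(urls: object, limit: int = 10) -> list[str]:
    """Return at most `limit` cleaned URLs, or an empty list if the input is unusable."""
    # A bare string is treated as a single URL rather than a sequence of characters.
    if isinstance(urls, str):
        urls = [urls]
    if not isinstance(urls, list):
        return []
    # Drop non-string and blank entries (an assumption beyond the review comment).
    cleaned = [u.strip() for u in urls if isinstance(u, str) and u.strip()]
    return cleaned[:limit]
```

A caller would then return the existing `"Error: urls must be a non-empty list."` message whenever the helper yields an empty list.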
Hey - I've found 1 issue, and left some high level feedback:
- The `_kagi_search` implementation currently only maps the `search` array from the Kagi response, but the new `test_kagi_search_collects_secondary_fields` expects `adjacent_question` and `infobox` entries to be merged into the results, so either the implementation or the test needs to be updated to align on the intended behavior.
- For `_kagi_search` and `_kagi_extract`, consider raising a more specific exception type (or a shared custom error) instead of `Exception`, so callers can distinguish Kagi API failures from other errors more reliably.
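As a rough illustration of the second point, a shared error type could look like the sketch below. `KagiAPIError` and `describe_failure` are hypothetical names that do not appear in the PR; the message format follows the `Kagi extract failed: {reason}, status: {response.status}` string visible in the diff.

```python
class KagiAPIError(Exception):
    """Hypothetical error type for failed Kagi API calls."""

    def __init__(self, operation: str, status: int, reason: str) -> None:
        super().__init__(f"Kagi {operation} failed: {reason}, status: {status}")
        self.operation = operation
        self.status = status
        self.reason = reason


def describe_failure(exc: Exception) -> str:
    """Branch on the error type instead of parsing the message text."""
    if isinstance(exc, KagiAPIError):
        return f"Error: Kagi API returned HTTP {exc.status} during {exc.operation}."
    return f"Error: unexpected failure: {exc}"
```

Carrying the status code as an attribute lets a caller decide, for example, whether a failure is worth retrying without inspecting the message string.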
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_kagi_search` implementation currently only maps the `search` array from the Kagi response, but the new `test_kagi_search_collects_secondary_fields` expects `adjacent_question` and `infobox` entries to be merged into the results, so either the implementation or the test needs to be updated to align on the intended behavior.
- For `_kagi_search` and `_kagi_extract`, consider raising a more specific exception type (or a shared custom error) instead of `Exception`, so callers can distinguish Kagi API failures from other errors more reliably.
## Individual Comments
### Comment 1
<location path="astrbot/core/tools/web_search_tools.py" line_range="869-870" />
<code_context>
+ f"Kagi extract failed: {reason}, status: {response.status}",
+ )
+ data = await response.json()
+ results: list[dict] = data.get("data", [])
+ if not results:
+ raise ValueError("Error: Kagi extract does not return any results.")
+ return results
</code_context>
<issue_to_address>
**issue (bug_risk):** Unhandled ValueError from `_kagi_extract` will bubble up and likely break the tool call flow.
In `_kagi_extract`, an empty result set raises `ValueError`, but `KagiExtractWebPageTool.call` doesn’t handle it, so a valid-but-empty response becomes an uncaught exception while other error paths return user-facing error strings. Either return an empty list here and let `call` turn that into a friendly error (like `KagiWebSearchTool`), or catch this `ValueError` in `call` and map it to a `ToolExecResult`.
</issue_to_address>
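The first option from this comment (return an empty list and let the caller produce the friendly message) might look roughly like the following. This is a simplified sketch, not the PR's code: `kagi_extract_stub` is a hypothetical stand-in that simulates a valid-but-empty response instead of making an HTTP request, and `call` returns a plain string where the real tool would return a `ToolExecResult`.

```python
import asyncio


async def kagi_extract_stub(urls: list[str]) -> list[dict]:
    """Hypothetical stand-in for _kagi_extract: returns an empty list
    instead of raising when the API yields no data."""
    data = {"data": []}  # simulated valid-but-empty response
    return data.get("data") or []


async def call(urls: list[str]) -> str:
    """Simplified tool entry point in which every failure path returns a string."""
    try:
        results = await kagi_extract_stub(urls)
    except Exception as exc:  # network or API failure
        return f"Error: Kagi extract failed: {exc}"
    if not results:
        return "Error: Kagi extract did not return any results."
    return "\n\n".join(str(item) for item in results)


message = asyncio.run(call(["https://example.com"]))
```

With this shape, an empty response and a transport failure both surface as user-facing error strings, so neither can escape the tool call as an uncaught exception.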
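I have been using Kagi's search for a long time and recently saw that they launched the Kagi API service. Since AstrBot also supports web search, I had an AI (MiMo V2.5 Pro) write the code to add support for it.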
https://kagi.com/api/docs/openapi
Modifications
The main changes are in `astrbot/core/tools/web_search_tools.py`.
Screenshots or Test Results
Checklist
- 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
- 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
- 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
- 😮 My changes do not introduce malicious code.
Summary by Sourcery
Add Kagi as a configurable web search provider with both search and page extraction capabilities, integrating it into AstrBot’s tooling and configuration system.
New Features:
Enhancements:
Tests: