feat: support kagi web search and page extract #8435

Open
imlonghao wants to merge 1 commit into AstrBotDevs:master from imlonghao:feat/kagi

@imlonghao imlonghao commented May 30, 2026

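I have been using Kagi for search for a long time, and I recently saw that they launched the Kagi API service. Since AstrBot already supports web search, I had an AI (MiMo V2.5 Pro) write the code to add Kagi support.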

https://kagi.com/api/docs/openapi

Modifications

The main changes are in astrbot/core/tools/web_search_tools.py.

  • This is NOT a breaking change.

Screenshots or Test Results

[Screenshots: PixPin_2026-05-30_16-17-27, PixPin_2026-05-30_16-18-18, PixPin_2026-05-30_16-19-08]

Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add Kagi as a configurable web search provider with both search and page extraction capabilities, integrating it into AstrBot’s tooling and configuration system.

New Features:

  • Introduce Kagi-based web search and web page extraction tools, including API key configuration and provider selection in settings.
  • Support Kagi search and extract tools in the main agent’s web search tool injection flow and citation-aware message rendering in the dashboard.

Enhancements:

  • Normalize legacy web search configuration to migrate single Kagi API keys into the new list-based format.
  • Register Kagi tools as builtin function tools alongside existing web search providers.
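
The legacy-key normalization listed under Enhancements could look roughly like the sketch below. It is not the PR's code: the config key name `websearch_kagi_key` and the helper `normalize_kagi_keys` are hypothetical, chosen only to illustrate migrating a single string key into a list.

```python
def normalize_kagi_keys(cfg: dict, key: str = "websearch_kagi_key") -> dict:
    """Return a copy of cfg where `key` holds a list of non-empty API keys.

    `key` is a hypothetical config name; the real one may differ.
    """
    out = dict(cfg)
    value = out.get(key)
    if value is None:
        # Nothing configured yet: start from an empty list.
        out[key] = []
    elif isinstance(value, str):
        # Legacy format: a single key stored as a plain string.
        out[key] = [value.strip()] if value.strip() else []
    else:
        # Already list-like: drop empty or non-string entries.
        out[key] = [v.strip() for v in value if isinstance(v, str) and v.strip()]
    return out
```

Returning a copy rather than mutating the input keeps the migration easy to test; whether the real code mutates the config in place is not stated in this PR page.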

Tests:

  • Add unit tests covering Kagi search and extract behavior, error handling, HTTP interaction, key configuration, and tool registration in the agent and function tool manager.

@dosubot dosubot Bot added labels on May 30, 2026: size:L (this PR changes 100-499 lines, ignoring generated files); area:core (the bug / feature is about astrbot's core, backend); area:provider (the bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner).

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces Kagi Search API integration by adding KagiWebSearchTool and KagiExtractWebPageTool, along with corresponding configuration options, localization updates, and unit tests. The review feedback highlights two key areas for improvement: first, a bug in _kagi_search where it only parses the "search" field, which will cause the test_kagi_search_collects_secondary_fields unit test to fail; second, a recommendation to add defensive type checking for the urls parameter in KagiExtractWebPageTool to handle cases where a string is passed instead of a list.

Comment on lines +834 to +848
    data = await response.json()
    body = data.get("data", {})
    primary_results = body.get("search", [])
    results: list[SearchResult] = []

    for item in primary_results:
        results.append(
            SearchResult(
                title=item.get("title", ""),
                url=item.get("url"),
                snippet=item.get("snippet", ""),
            )
        )

    return results

high

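In the unit test test_kagi_search_collects_secondary_fields, the test expects _kagi_search to also collect data from the adjacent_question and infobox fields. However, the current _kagi_search implementation only reads the search field, so this unit test will fail.

I suggest changing _kagi_search to loop over these three fields and to skip items with an empty url, so that the unit test passes and the search results are richer.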

Suggested change
-   data = await response.json()
-   body = data.get("data", {})
-   primary_results = body.get("search", [])
-   results: list[SearchResult] = []
-   for item in primary_results:
-       results.append(
-           SearchResult(
-               title=item.get("title", ""),
-               url=item.get("url"),
-               snippet=item.get("snippet", ""),
-           )
-       )
-   return results
+   data = await response.json()
+   body = data.get("data", {})
+   results: list[SearchResult] = []
+   for key in ("search", "adjacent_question", "infobox"):
+       for item in body.get(key, []):
+           if not item.get("url"):
+               continue
+           results.append(
+               SearchResult(
+                   title=item.get("title", ""),
+                   url=item.get("url", ""),
+                   snippet=item.get("snippet", ""),
+               )
+           )
+   return results
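
To see the effect of the suggested loop outside the tool, here is a minimal standalone sketch. The `SearchResult` dataclass is a stand-in for the project's own type, `collect_results` is a hypothetical helper name, and the payload shape is assumed only from the field names mentioned in this review, not from the Kagi API documentation.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:  # hypothetical stand-in for the project's SearchResult
    title: str
    url: str
    snippet: str


def collect_results(payload: dict) -> list[SearchResult]:
    """Merge the three result arrays, skipping items that have no URL."""
    body = payload.get("data") or {}
    results: list[SearchResult] = []
    for key in ("search", "adjacent_question", "infobox"):
        # `or []` also covers a field that is present but null.
        for item in body.get(key) or []:
            url = item.get("url")
            if not url:
                continue
            results.append(
                SearchResult(
                    title=item.get("title", ""),
                    url=url,
                    snippet=item.get("snippet", ""),
                )
            )
    return results
```

With this shape, an `adjacent_question` entry that lacks a `url` is dropped, while `search` and `infobox` entries with URLs are kept in field order.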

Comment on lines +944 to +946
    urls = kwargs.get("urls", [])
    if not urls:
        return "Error: urls must be a non-empty list."

medium

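Defensive programming: if the LLM or an external caller accidentally passes urls as a string, using urls[:10] directly slices the string and iterates over it character by character, which produces incorrect request parameters.

I suggest adding a type check: if urls is a string, convert it to a single-element list, and make sure the value is a non-empty list.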

Suggested change
-   urls = kwargs.get("urls", [])
-   if not urls:
-       return "Error: urls must be a non-empty list."
+   urls = kwargs.get("urls", [])
+   if isinstance(urls, str):
+       urls = [urls]
+   if not isinstance(urls, list) or not urls:
+       return "Error: urls must be a non-empty list."
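
The string pitfall described above can be isolated in a small helper. This is a sketch under assumptions: `normalize_urls` is a hypothetical name, and the limit of 10 is taken from the `urls[:10]` slice mentioned in the comment.

```python
def normalize_urls(urls: object, limit: int = 10) -> list[str]:
    """Coerce a tool argument into a list of at most `limit` URL strings."""
    if isinstance(urls, str):
        # A bare string would otherwise be sliced and iterated per character.
        urls = [urls]
    if not isinstance(urls, list):
        return []
    # Keep only non-empty string entries, trimmed of surrounding whitespace.
    cleaned = [u.strip() for u in urls if isinstance(u, str) and u.strip()]
    return cleaned[:limit]
```

A caller can then treat an empty return value as the "urls must be a non-empty list" error case, which keeps the validation in one place.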

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The _kagi_search implementation currently only maps the search array from the Kagi response, but the new test_kagi_search_collects_secondary_fields expects adjacent_question and infobox entries to be merged into the results, so either the implementation or the test needs to be updated to align on the intended behavior.
  • For _kagi_search and _kagi_extract, consider raising a more specific exception type (or a shared custom error) instead of Exception, so callers can distinguish Kagi API failures from other errors more reliably.
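
One way to act on the second point is a small dedicated exception. The sketch below is illustrative only: `KagiAPIError` and `describe_failure` are hypothetical names, and the message format mirrors the "Kagi extract failed: ..., status: ..." string quoted in this review.

```python
class KagiAPIError(Exception):
    """Hypothetical error type for failed Kagi API calls."""

    def __init__(self, operation: str, status: int, reason: str) -> None:
        super().__init__(f"Kagi {operation} failed: {reason}, status: {status}")
        self.operation = operation
        self.status = status
        self.reason = reason


def describe_failure(exc: Exception) -> str:
    """Map an exception to a user-facing error string."""
    if isinstance(exc, KagiAPIError):
        # API-level failure: safe to surface the status and reason.
        return f"Error: {exc}"
    # Anything else is unexpected; avoid leaking internals.
    return "Error: unexpected failure during web search."
```

Keeping the status code as an attribute lets callers branch on it (for example, to rotate API keys on an authentication failure) without parsing the message.
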
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_kagi_search` implementation currently only maps the `search` array from the Kagi response, but the new `test_kagi_search_collects_secondary_fields` expects `adjacent_question` and `infobox` entries to be merged into the results, so either the implementation or the test needs to be updated to align on the intended behavior.
- For `_kagi_search` and `_kagi_extract`, consider raising a more specific exception type (or a shared custom error) instead of `Exception`, so callers can distinguish Kagi API failures from other errors more reliably.

## Individual Comments

### Comment 1
<location path="astrbot/core/tools/web_search_tools.py" line_range="869-870" />
<code_context>
+                    f"Kagi extract failed: {reason}, status: {response.status}",
+                )
+            data = await response.json()
+            results: list[dict] = data.get("data", [])
+            if not results:
+                raise ValueError("Error: Kagi extract does not return any results.")
+            return results
</code_context>
<issue_to_address>
**issue (bug_risk):** Unhandled ValueError from `_kagi_extract` will bubble up and likely break the tool call flow.

In `_kagi_extract`, an empty result set raises `ValueError`, but `KagiExtractWebPageTool.call` doesn’t handle it, so a valid-but-empty response becomes an uncaught exception while other error paths return user-facing error strings. Either return an empty list here and let `call` turn that into a friendly error (like `KagiWebSearchTool`), or catch this `ValueError` in `call` and map it to a `ToolExecResult`.
</issue_to_address>

Comment on lines +869 to +870
results: list[dict] = data.get("data", [])
if not results:

issue (bug_risk): Unhandled ValueError from _kagi_extract will bubble up and likely break the tool call flow.

In _kagi_extract, an empty result set raises ValueError, but KagiExtractWebPageTool.call doesn’t handle it, so a valid-but-empty response becomes an uncaught exception while other error paths return user-facing error strings. Either return an empty list here and let call turn that into a friendly error (like KagiWebSearchTool), or catch this ValueError in call and map it to a ToolExecResult.
