scholaraio

Webtools Integration (Optional)

ScholarAIO is agent-first: users talk to an agent, and the agent orchestrates local ScholarAIO skills.
If you also want live web search or page extraction, ScholarAIO can integrate the backend daemons used by AnterCreeper/claude-webtools as an external capability layer.

When to use this

Native ScholarAIO entrypoint

ScholarAIO now provides a native URL-ingest command:

scholaraio ingest-link https://example.com/page

This command:

  1. Calls a running qt-web-extractor service.
  2. Pulls rendered page content instead of only raw HTML source.
  3. Writes extracted Markdown into a temporary document inbox.
  4. Reuses the existing ScholarAIO document ingest pipeline.
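
The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: ingest_link, fetch_markdown, and ingest_inbox are hypothetical names, not ScholarAIO's actual API, and the extractor client and ingest pipeline are injected as plain callables so the flow can be read (and run) without a live qt-web-extractor service.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ingest_link(
    url: str,
    fetch_markdown: Callable[[str], str],
    ingest_inbox: Callable[[Path], list[str]],
) -> list[str]:
    """Fetch rendered Markdown for url, stage it in a temp inbox, then ingest."""
    # Steps 1-2: ask the extractor service for rendered page content.
    markdown = fetch_markdown(url)
    with tempfile.TemporaryDirectory(prefix="scholaraio-inbox-") as tmp:
        inbox = Path(tmp)
        # Step 3: write the extracted Markdown into a temporary inbox.
        (inbox / "page.md").write_text(markdown, encoding="utf-8")
        # Step 4: hand the inbox to the existing document ingest pipeline.
        return ingest_inbox(inbox)


# Hypothetical stand-ins for the extractor client and the ingest pipeline.
def fake_fetch(url: str) -> str:
    return f"# Example\n\nSource: {url}\n"


def fake_ingest(inbox: Path) -> list[str]:
    return sorted(p.name for p in inbox.glob("*.md"))


result = ingest_link("https://example.com/page", fake_fetch, fake_ingest)
```

Injecting the two callables keeps the sketch independent of whichever transport (MCP or legacy HTTP) actually reaches the extractor.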

In practice, this means scholaraio ingest-link can ingest:

The current command resolves qt-web-extractor in this order:

Preferred MCP Transports

For agent workflows, prefer the MCP endpoints exposed by the optional webtools daemons when available. This keeps browser rendering/search state on a local or remote service and lets agents use the same tools without installing Qt WebEngine everywhere.

websearch:
  transport: mcp
  mcp_url: http://127.0.0.1:8765/mcp
  api_key: your_key     # optional; sent as Bearer auth
  mcp_tool: search_bing # optional; default

webextract:
  transport: mcp
  mcp_url: http://127.0.0.1:8766/mcp
  api_key: your_key   # optional; sent as Bearer auth
  mcp_tool: fetch_url # optional; default

If mcp_url is omitted, ScholarAIO derives it from the corresponding base_url by appending /mcp. The older HTTP paths remain supported:

websearch:
  transport: http
  base_url: http://127.0.0.1:8765
  api_key: your_key

webextract:
  transport: http
  base_url: http://127.0.0.1:8766
  api_key: your_key
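
The mcp_url fallback described above (derive the MCP endpoint from base_url by appending /mcp) can be sketched as follows. The function name resolve_mcp_url is hypothetical, and stripping a trailing slash from base_url is an assumption made here for tidiness, not documented ScholarAIO behavior.

```python
from typing import Optional


def resolve_mcp_url(base_url: Optional[str], mcp_url: Optional[str] = None) -> str:
    """Return the explicit mcp_url if set, else base_url with /mcp appended."""
    if mcp_url:
        return mcp_url
    if not base_url:
        raise ValueError("either mcp_url or base_url must be configured")
    # Assumption: tolerate a trailing slash so we never produce "//mcp".
    return base_url.rstrip("/") + "/mcp"


search_url = resolve_mcp_url("http://127.0.0.1:8765")
extract_url = resolve_mcp_url("http://127.0.0.1:8766", "http://127.0.0.1:8766/mcp")
```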

Environment fallbacks are also supported:

ScholarAIO’s MCP client follows the MCP Streamable HTTP lifecycle: initialize -> notifications/initialized -> tools/list or tools/call. It sends Bearer auth when configured and honors the protocol version negotiated by the server.
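
To make the lifecycle concrete, here is a minimal sketch of the JSON-RPC messages and request headers such a client would send, without performing any network I/O. It is not ScholarAIO's client code: build_headers and lifecycle_messages are hypothetical helpers, the clientInfo values are placeholders, the protocol version shown is just one published MCP revision, and the "url" argument name for fetch_url is an assumption about the daemon's tool schema.

```python
from typing import Any, Optional

# One published MCP revision; the server may negotiate a different one.
PROTOCOL_VERSION = "2025-03-26"


def build_headers(api_key: Optional[str] = None,
                  session_id: Optional[str] = None) -> dict[str, str]:
    """Headers for a Streamable HTTP POST to the MCP endpoint."""
    headers = {
        "Content-Type": "application/json",
        # The server may answer with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if session_id:
        # Echo the session id if the server issued one during initialize.
        headers["Mcp-Session-Id"] = session_id
    return headers


def lifecycle_messages(tool: str, arguments: dict[str, Any]) -> list[dict[str, Any]]:
    """initialize -> notifications/initialized -> tools/call, in send order."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},
        }},
        # Notifications carry no id and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
         "params": {"name": tool, "arguments": arguments}},
    ]


messages = lifecycle_messages("fetch_url", {"url": "https://example.com/page"})
headers = build_headers(api_key="your_key")
```

A real client would POST each message to mcp_url in order and switch to the protocol version returned in the initialize result before issuing further requests.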

Protocol references:

Agent MCP Registration

You can also expose the same daemon tools directly to MCP-capable agents. The repository-level .mcp.json is an agent-neutral server inventory for hosts that support that convention. Other agents, including Codex, may need the same servers registered in their own MCP config store. Do not commit real secrets in project files.

Generic MCP endpoints:

Server         URL                          Tool
web-search     http://127.0.0.1:8765/mcp    search_bing
web-extractor  http://127.0.0.1:8766/mcp    fetch_url

Claude Code can consume the project .mcp.json directly, or register the same servers with:

claude mcp add --transport http web-search http://127.0.0.1:8765/mcp
claude mcp add --transport http web-extractor http://127.0.0.1:8766/mcp

If the daemons require an API key, pass it as a Bearer header:

claude mcp add --transport http web-search http://127.0.0.1:8765/mcp \
  --header "Authorization: Bearer your-search-key"
claude mcp add --transport http web-extractor http://127.0.0.1:8766/mcp \
  --header "Authorization: Bearer your-extractor-key"

Codex / OpenAI Codex CLI uses its own MCP registry. Register the same Streamable HTTP servers with:

codex mcp add web-search --url http://127.0.0.1:8765/mcp
codex mcp add web-extractor --url http://127.0.0.1:8766/mcp

If the daemons require an API key, name the environment variable that holds the Bearer token:

codex mcp add web-search --url http://127.0.0.1:8765/mcp \
  --bearer-token-env-var GUILESS_BING_SEARCH_API_KEY
codex mcp add web-extractor --url http://127.0.0.1:8766/mcp \
  --bearer-token-env-var QT_WEB_EXTRACTOR_API_KEY

The equivalent Codex config shape in ~/.codex/config.toml is:

[mcp_servers.web-search]
url = "http://127.0.0.1:8765/mcp"

[mcp_servers.web-extractor]
url = "http://127.0.0.1:8766/mcp"
bearer_token_env_var = "QT_WEB_EXTRACTOR_API_KEY"

Project-scoped .mcp.json example for agents that support it:

{
  "mcpServers": {
    "web-search": {
      "type": "http",
      "url": "${GUILESS_BING_SEARCH_MCP_URL:-http://127.0.0.1:8765/mcp}",
      "headers": {
        "Authorization": "Bearer ${GUILESS_BING_SEARCH_API_KEY:-}"
      }
    },
    "web-extractor": {
      "type": "http",
      "url": "${QT_WEB_EXTRACTOR_MCP_URL:-http://127.0.0.1:8766/mcp}",
      "headers": {
        "Authorization": "Bearer ${QT_WEB_EXTRACTOR_API_KEY:-}"
      }
    }
  }
}
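
The ${VAR:-default} placeholders in the example above follow shell-style default expansion. How a given host expands them is host-specific; the sketch below only illustrates the semantics assumed here (use the variable when it is set and non-empty, otherwise the default). The expand helper is hypothetical and is not part of ScholarAIO or any agent host.

```python
import re

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value: str, env: dict[str, str]) -> str:
    """Expand ${NAME} and ${NAME:-default} using the mapping env."""
    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:
            return current
        # Assumption: an unset variable without a default expands to "".
        return default if default is not None else ""

    return _PLACEHOLDER.sub(replace, value)


url = expand("${QT_WEB_EXTRACTOR_MCP_URL:-http://127.0.0.1:8766/mcp}", {})
auth = expand("Bearer ${QT_WEB_EXTRACTOR_API_KEY:-}",
              {"QT_WEB_EXTRACTOR_API_KEY": "secret"})
```

Note that with no key set, the Authorization value under these semantics is "Bearer " with an empty token, so the daemon must tolerate that or the header should be dropped when authentication is disabled.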

Known tool names:

Verification commands:

claude mcp list
codex mcp list --json

  1. Install and configure the backend services:
    • qt-web-extractor for rendered URL/PDF extraction, preferably exposed via MCP
    • optional GUILessBingSearch for search-first workflows, preferably exposed via MCP when using agent tooling
  2. Keep ScholarAIO as the authoritative local knowledge pipeline (ingest/index/enrich).
  3. In agent workflows:
    • use ScholarAIO first for reproducible local evidence;
    • use webtools only when freshness or external coverage is required.
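
The local-first policy in step 3 can be sketched as a small routing function. Everything here is hypothetical: gather_evidence, the need_fresh flag, and the min_local_hits threshold are illustrative stand-ins for whatever criteria an agent uses to decide that freshness or external coverage is required.

```python
from typing import Callable, Iterable


def gather_evidence(
    query: str,
    local_search: Callable[[str], Iterable[str]],
    web_search: Callable[[str], Iterable[str]],
    need_fresh: bool = False,
    min_local_hits: int = 1,
) -> list[str]:
    """Query the local library first; call webtools only when needed."""
    hits = list(local_search(query))
    # Fall through to the web only for freshness or missing coverage.
    if need_fresh or len(hits) < min_local_hits:
        hits.extend(web_search(query))
    return hits


# Hypothetical stand-ins for a local ScholarAIO search and a webtools search.
web_calls: list[str] = []


def fake_local(query: str) -> list[str]:
    return [f"local:{query}"]


def fake_web(query: str) -> list[str]:
    web_calls.append(query)
    return [f"web:{query}"]


local_only = gather_evidence("turbulence", fake_local, fake_web)
with_web = gather_evidence("turbulence", fake_local, fake_web, need_fresh=True)
```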

qt-web-extractor is an external daemon, not a built-in ScholarAIO fetcher. ScholarAIO delegates browser rendering to that service, through MCP or the legacy HTTP endpoint, and then continues with its own ingest pipeline.

Operational guidelines