ScholarAIO uses two config files:
| File | Tracked | Purpose |
|---|---|---|
config.yaml |
Yes | Default settings |
config.local.yaml |
No (git-ignored) | API keys and local overrides |
LLM API key lookup order:
config.local.yaml → llm.api_keySCHOLARAIO_LLM_API_KEYllm.backend:
openai-compat: DEEPSEEK_API_KEY → OPENAI_API_KEYanthropic: ANTHROPIC_API_KEYgoogle: GOOGLE_API_KEY → GEMINI_API_KEYconfig.local.yamlllm:
api_key: "sk-your-key-here"
ingest:
mineru_api_key: "your-mineru-token" # compatibility alias; MINERU_TOKEN is preferred
s2_api_key: "your-semantic-scholar-key" # optional
zotero:
api_key: "your-zotero-key" # optional
library_id: "1234567" # optional
You can also keep the token out of YAML entirely and set MINERU_TOKEN in the environment. MINERU_API_KEY is still accepted as a compatibility alias.
Default: DeepSeek (deepseek-chat) via OpenAI-compatible protocol.
llm:
model: deepseek-chat
base_url: https://api.deepseek.com
ingest:
extractor: robust # regex + LLM (default)
# Other options: auto, regex, llm
embed:
source: modelscope # default (China)
# source: huggingface # for international users
ScholarAIO can sync its data/ directory to a remote machine through rsync.
scholaraio backup run always invokes SSH in batch mode (-o BatchMode=yes), so password prompts and host-key confirmation prompts are intentionally disabled.
backup:
source_dir: data
targets:
lab:
host: backup.example.com
user: alice
path: /srv/scholaraio
port: 22
identity_file: ~/.ssh/id_ed25519
mode: default
compress: true
enabled: true
exclude:
- "*.tmp"
- "metrics.db"
mode supports default, append, and append-verify.Use default for the full ScholarAIO data/ tree, especially when it includes mutable files such as SQLite databases.
append / append-verify for append-only artifacts where the remote copy is expected to be a prefix of the local file.identity_file in config.local.yaml when possible.known_hosts entry ahead of time; otherwise backup run will fail fast instead of waiting for interactive input.Recommended split:
# config.yaml
backup:
source_dir: data
targets:
lab:
host: 192.168.31.229
user: lzmo
path: /srv/scholaraio
port: 1393
mode: default
compress: true
enabled: true
# config.local.yaml
backup:
targets:
lab:
identity_file: ~/.ssh/id_ed25519
# password: your-ssh-password # Optional fallback when the server does not accept your key
Recommended first-run checklist:
known_hosts:
ssh-keyscan -p 1393 192.168.31.229 >> ~/.ssh/known_hostsssh -i ~/.ssh/id_ed25519 -p 1393 lzmo@192.168.31.229 truepassword in config.local.yaml; ScholarAIO will switch to internal non-interactive askpass mode automatically.scholaraio backup run lab --dry-runFor agent workflows, prefer MCP endpoints for both live search and rendered web/PDF extraction:
websearch:
transport: mcp
mcp_url: http://127.0.0.1:8765/mcp
api_key: "optional-token"
mcp_tool: search_bing
webextract:
transport: mcp
mcp_url: http://127.0.0.1:8766/mcp
api_key: "optional-token"
mcp_tool: fetch_url
The legacy HTTP endpoints are still supported:
websearch:
transport: http
base_url: http://127.0.0.1:8765
api_key: "optional-token"
webextract:
transport: http
base_url: http://127.0.0.1:8766
api_key: "optional-token"
Paper2Any is an optional external extension. ScholarAIO keeps the OpenDCAI/Paper2Any checkout outside tracked source, normally under data/runtime/extensions/paper2any/Paper2Any, and talks to it through a lightweight MCP sidecar:
paper2any:
transport: mcp
mcp_url: http://127.0.0.1:8770/mcp
root: null
base_url: http://127.0.0.1:8000
api_key: "optional-sidecar-token"
backend_api_key: "optional-upstream-backend-token"
Agent workflows should start the sidecar with:
scholaraio paper2any setup
scholaraio paper2any mcp-serve
scholaraio paper2any backend-serve # optional, only when a FastAPI workflow is needed
scholaraio paper2any status
If the user wants the agent to prepare Paper2Any’s isolated upstream Python runtime as well, the agent can run scholaraio paper2any setup --install-runtime.
published/ is a local, git-ignored archive for final audited deliverables. The publish-site command can generate a separate static site from published/*/metadata.json.
publish:
site_output_dir: ~/generated-report
# published_dir: published
Run:
scholaraio publish-site
By default, PDFs and generated source ZIPs are copied into the output site so it can be deployed as a standalone GitHub Pages repository. Use --symlink only for local preview.