scholaraio

Installation

Requirements

Install from PyPI

# Core installation
pip install scholaraio

# Full installation (embed + topics + import + pdf + office + draw)
pip install "scholaraio[full]"

Then run:

scholaraio setup

Install from Source

git clone https://github.com/zimoliao/scholaraio.git
cd scholaraio

# Core only (search, export, audit)
pip install -e .

# Full installation (embed + topics + import + pdf + office + draw)
pip install -e ".[full]"

Use the source install path when you want to inspect the codebase, edit the package locally, or contribute changes upstream.

Optional Dependencies

Extra    What it adds
embed    Semantic search (sentence-transformers + FAISS)
topics   BERTopic topic modeling
pdf      PyMuPDF-based PDF fallback and long-PDF utilities
import   EndNote / Zotero import
office   DOCX / PPTX / XLSX ingest and inspection
draw     Python helpers for Mermaid and custom SVG drawing; Graphviz dot and Inkscape are system tools, checked by scholaraio setup check
full     Core research workflow extras: embed + topics + import + pdf + office + draw
dev      Development tools (pytest, ruff, mypy)
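
Extras can also be combined in one install using pip's standard bracket syntax, for example pip install "scholaraio[embed,pdf]". The sketch below builds such a requirement string from the extras listed above. The helper build_requirement is hypothetical, written only for illustration, and is not part of scholaraio.

```python
# Hypothetical helper (not part of scholaraio): build a pip requirement
# string from the extras named in the table above.
KNOWN_EXTRAS = ("embed", "topics", "pdf", "import", "office", "draw", "full", "dev")


def build_requirement(extras=(), package="scholaraio"):
    """Return e.g. 'scholaraio[embed,pdf]' for use with `pip install`."""
    unknown = [e for e in extras if e not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    # Drop duplicates while keeping the order the caller asked for.
    selected = list(dict.fromkeys(extras))
    if not selected:
        return package
    return f"{package}[{','.join(selected)}]"


if __name__ == "__main__":
    # Quote the result so the shell does not interpret the brackets.
    print("pip install", f'"{build_requirement(["embed", "pdf"])}"')
```

Quoting the requirement matters in shells such as zsh, where unquoted square brackets are treated as glob patterns.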

Setup Wizard

Run the interactive setup wizard to configure API keys and directories:

scholaraio setup

Or check what’s already configured:

scholaraio setup check

setup check is the most complete initial diagnostic surface. It covers:

Setup guidance recommends MinerU whenever a MinerU path is available (a local service, or mineru-open-api with a token). Docling, then PyMuPDF, form the fallback chain when MinerU is not usable or when the user explicitly prefers a lighter parser.
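
The preference order described above can be sketched as a simple first-match selection. This is an illustration only: the parser names come from the text, but choose_parser, its parameters, and the idea that a "lighter" preference simply skips MinerU are assumptions, not scholaraio's actual implementation.

```python
# Illustrative sketch only: the parser names come from the text above, but
# the function and its parameters are hypothetical, not scholaraio's API.
FALLBACK_CHAIN = ("mineru", "docling", "pymupdf")


def choose_parser(available, prefer_light=False):
    """Pick the first usable parser in preference order.

    available: set of parser names that are usable on this machine.
    prefer_light: if True, skip MinerU and start from the lighter parsers
                  (an assumed reading of "prefers a lighter parser").
    """
    chain = FALLBACK_CHAIN[1:] if prefer_light else FALLBACK_CHAIN
    for name in chain:
        if name in available:
            return name
    raise RuntimeError("no PDF parser available")


if __name__ == "__main__":
    # MinerU is unavailable here, so the chain falls through to Docling.
    print(choose_parser({"docling", "pymupdf"}))
```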

Cost transparency:

Agent Setup

If you want to know which path to use for Claude Code, Codex, OpenClaw, Cursor, or other agents, see:

That guide separates:

Embedding Model

The embedding model (Qwen3-Embedding-0.6B, ~1.2 GB) downloads automatically on first use. For users outside China, set embed.source: huggingface in config.yaml.
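
Assuming the dotted form embed.source denotes a source key nested under an embed section (the usual convention; the full layout of config.yaml is not shown here), the sketch below prints what that entry would look like. The helper render_setting is hypothetical and exists only to make the nesting explicit.

```python
# Hypothetical helper: expand a dotted setting name into nested
# YAML-style text, assuming each dot introduces one level of nesting.
def render_setting(dotted_key, value, indent=2):
    """Render ('embed.source', 'huggingface') as indented key/value text."""
    parts = dotted_key.split(".")
    lines = []
    for depth, part in enumerate(parts):
        prefix = " " * (indent * depth)
        if depth == len(parts) - 1:
            # Innermost key carries the value.
            lines.append(f"{prefix}{part}: {value}")
        else:
            # Outer keys only open a nested section.
            lines.append(f"{prefix}{part}:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_setting("embed.source", "huggingface"))
```

If config.yaml already has an embed section, the source line would go inside it rather than in a second embed block.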