Status: Historical compatibility-window record
Last Updated: 2026-04-24
Scope: historical execution order for directory and path migration. This file preserves the compatibility-window migration record; it is no longer the source of truth for the active breaking-cleanup runtime.
2026-04-24 breaking cleanup note:
- docs/development/breaking-compat-cleanup-plan.md
- scholaraio migrate finalize --confirm

This document defines the recommended execution order for migrating ScholarAIO toward the target directory structure described in docs/development/directory-structure-spec.md.
This is an implementation-order document, not a vision document. Its job is to answer:
This document should be read together with:
- docs/development/migration-mechanism-spec.md

That companion document defines the control-plane contract (instance.json, migration.lock, journal, verification, cleanup gating). This document defines when that machinery must appear in the execution order.
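To make the cleanup-gating idea concrete, here is a minimal sketch rather than the actual control-plane implementation. It assumes a hypothetical verify.json of the form {"status": "passed"} inside a journal directory and a hypothetical archive/ subdirectory; the function name is also illustrative. The real schema and layout are defined in the companion spec.

```python
import json
import shutil
from pathlib import Path


def archive_cleanup_candidates(journal_dir: Path, candidates: list[Path]) -> list[Path]:
    """Archive legacy paths into the journal, but only after verification passed.

    Assumption: verify.json holds {"status": "passed"} on success. This schema
    and the "archive" directory name are hypothetical.
    """
    verify_file = journal_dir / "verify.json"
    if not verify_file.exists():
        raise RuntimeError("cleanup blocked: no verification record")
    if json.loads(verify_file.read_text()).get("status") != "passed":
        raise RuntimeError("cleanup blocked: verification did not pass")

    archive_dir = journal_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)

    archived = []
    for candidate in candidates:
        if candidate.exists():
            target = archive_dir / candidate.name
            # Move instead of delete, so the legacy data stays recoverable.
            shutil.move(str(candidate), str(target))
            archived.append(target)
    return archived
```

The point of the sketch is the ordering: the verification check runs before anything is touched, and candidates are moved into the journal instead of being deleted.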
The sequence below is based on direct inspection of the current codebase and on path-related regression tests revalidated on 2026-04-19 after the earlier 2026-04-16, 2026-04-17, and 2026-04-18 audits.
The entries in this subsection describe the compatibility-window state that was audited before the 2026-04-24 breaking cleanup. Parenthetical legacy import paths below are historical targets from that window; they are not active public facades in the current release generation.
- scholaraio/core/config.py (scholaraio.config compatibility alias)
- scholaraio/cli.py
- scholaraio/projects/workspace.py (scholaraio.workspace compatibility alias)
- scholaraio/services/insights.py (scholaraio.insights compatibility alias)
- scholaraio/services/setup.py (scholaraio.setup compatibility alias)
- scholaraio/services/index.py (scholaraio.index compatibility alias)
- scholaraio/services/loader.py (scholaraio.loader compatibility alias)
- scholaraio/services/topics.py (scholaraio.topics compatibility alias)
- scholaraio/services/vectors.py (scholaraio.vectors compatibility alias)
- scholaraio/stores/explore.py (scholaraio.explore compatibility alias)
- scholaraio/services/diagram.py (scholaraio.diagram compatibility alias)
- scholaraio/stores/proceedings.py (scholaraio.proceedings compatibility alias)
- scholaraio/services/patent_fetch.py (scholaraio.patent_fetch compatibility alias)
- scholaraio/providers/arxiv.py (scholaraio.sources.arxiv compatibility alias)
- scholaraio/providers/endnote.py (scholaraio.sources.endnote compatibility alias)
- scholaraio/providers/zotero.py (scholaraio.sources.zotero compatibility alias)
- scholaraio/providers/mineru.py (scholaraio.ingest.mineru compatibility alias)
- scholaraio/providers/pdf_fallback.py (scholaraio.ingest.pdf_fallback compatibility alias)
- scholaraio/services/ingest_metadata/extractor.py (scholaraio.ingest.extractor compatibility alias)
- scholaraio/services/ingest/parser_matrix_benchmark.py (scholaraio.ingest.parser_matrix_benchmark compatibility alias)
- scholaraio/services/ingest/proceedings_volume.py (scholaraio.ingest.proceedings compatibility alias)
- scholaraio/providers/uspto_odp.py (scholaraio.uspto_odp compatibility alias)
- scholaraio/providers/uspto_ppubs.py (scholaraio.uspto_ppubs compatibility alias)
- scholaraio/services/backup.py (scholaraio.backup compatibility alias)
- scholaraio/sources/webtools.py
- scholaraio/toolref/paths.py
- scholaraio/toolref/_legacy_snapshot.py
- scholaraio/stores/citation_styles.py (scholaraio.citation_styles compatibility alias)
- scholaraio/services/translate.py (scholaraio.translate compatibility alias)
- scholaraio/services/ingest/pipeline.py (scholaraio.ingest.pipeline compatibility alias)
- .qwen/QWEN.md
- clawhub.yaml
- .cursor/rules/scholaraio.mdc

Historical audited facts from the compatibility-window generation:
The following bullets intentionally preserve the pre-cleanup compatibility
language used during migration planning. In the active release generation, legacy
public import facades have been removed; see docs/development/breaking-compat-cleanup-plan.md
and docs/guide/agent-reference.md for the current contract.
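As background for the "module alias" wording used in this record, one common Python way to make a legacy import path resolve to a relocated implementation is to register the legacy dotted name in sys.modules. The sketch below uses hypothetical package and module names (demo_pkg and friends), and it is an illustration of the general technique, not a statement of how ScholarAIO implemented its aliases.

```python
import importlib
import sys
import types

# Hypothetical names: they stand in for a relocated implementation
# module and its legacy import path.
package = types.ModuleType("demo_pkg")
package.__path__ = []  # mark it as a package so dotted names are accepted
sys.modules["demo_pkg"] = package

implementation = types.ModuleType("demo_pkg.services.index")
implementation.search = lambda query: f"results for {query}"
sys.modules["demo_pkg.services.index"] = implementation

# The alias: the legacy dotted name maps to the very same module object.
sys.modules["demo_pkg.index"] = implementation

legacy = importlib.import_module("demo_pkg.index")

# Both names share one object, so a monkeypatch applied through the
# legacy path is visible through the new path as well.
legacy.search = lambda query: "patched"
```

This is why an alias (as opposed to a copy or a thin wrapper module) keeps imports and monkeypatches pointed at a single implementation: there is only one module object behind both names.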
- scholaraio/core/config.py now exposes a much broader runtime-path accessor surface and routes ensure_dirs() through those accessors, which lowers path-migration risk but does not yet remove direct path construction in downstream modules
- scholaraio/core/config.py now resolves index_db, metrics_db_path, and topics_model_dir through logical state_root subdirectories for fresh installs, while still auto-detecting existing legacy data/index.db, data/metrics.db, and data/topic_model/ stores
- scholaraio/core/config.py locks current root-level config.yaml discovery behavior; scholaraio.config remains the legacy import path for compatibility
- scholaraio/projects/workspace.py now owns the workspace paper-index layout contract and supports both legacy root papers.json and future-compatible refs/papers.json; scholaraio.workspace remains the compatibility alias
- scholaraio/cli.py now routes workspace-root defaults through _workspace_root() and defers workspace index existence checks to scholaraio.workspace
- scholaraio/services/setup.py:531-537 now checks current runtime directories via Config accessors instead of fixed string literals
- scholaraio/services/index.py now owns keyword search, proceedings search, registry lookup, citation graph helpers, and unified search orchestration; scholaraio.index remains a module alias so CLI and tests can keep using the public legacy import path
- scholaraio/services/loader.py now owns L1-L4 layered paper loading, agent notes, TOC enrichment, and L3 extraction; scholaraio.loader remains a module alias so CLI and tests can keep using the public legacy import path
- scholaraio/services/topics.py now owns BERTopic fitting, topic browsing, reduction/merge helpers, model persistence, and visualizations; scholaraio.topics remains a module alias so CLI, explore, workspace, and tests can keep using the public legacy import path
- scholaraio/services/vectors.py now owns embedding backend selection, vector index maintenance, FAISS helpers, and semantic search; scholaraio.vectors remains a module alias so CLI, topics, explore, and tests can keep using the public legacy import path
- scholaraio/stores/explore.py now follows cfg.explore_root; scholaraio.explore remains the compatibility alias, fresh instances default to data/libraries/explore/, while existing legacy data/explore/ remains auto-detected
- scholaraio/services/diagram.py now routes default output through cfg.workspace_figures_dir; fresh instances default to workspace/_system/figures/
- scholaraio/stores/proceedings.py now owns proceedings storage iteration and proceedings DB path helpers; scholaraio.proceedings remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/services/patent_fetch.py:download_patent_pdf now prefers cfg.patent_inbox_dir; fresh no-config defaults use data/spool/inbox-patent, while legacy data/inbox-patent remains auto-detected through Config
- scholaraio/providers/arxiv.py now owns arXiv search, metadata fetch, and PDF download helpers; scholaraio.sources.arxiv remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/providers/endnote.py now owns EndNote XML/RIS parsing and PDF attachment discovery; scholaraio.sources.endnote remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/providers/zotero.py now owns Zotero Web API and local SQLite parsing; scholaraio.sources.zotero remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/providers/mineru.py now owns MinerU local/cloud PDF parsing helpers and its module CLI; scholaraio.ingest.mineru remains a module alias/import-compatible CLI delegator so existing imports, monkeypatches, and python -m scholaraio.ingest.mineru still target the same implementation
- scholaraio/providers/pdf_fallback.py now owns Docling/PyMuPDF fallback parsing; scholaraio.ingest.pdf_fallback remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/services/ingest_metadata/extractor.py now owns Stage-1 metadata extraction modes; scholaraio.ingest.extractor remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/services/ingest/parser_matrix_benchmark.py now owns parser matrix benchmarking helpers; scholaraio.ingest.parser_matrix_benchmark remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/services/ingest/proceedings_volume.py now owns proceedings volume preparation, split-plan application, and clean-plan application; scholaraio.ingest.proceedings remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/providers/uspto_odp.py now owns the USPTO ODP API client; scholaraio.uspto_odp remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/providers/uspto_ppubs.py now owns the USPTO PPUBS session/search/PDF-export client; scholaraio.uspto_ppubs remains a module alias so existing imports and monkeypatches still target the same implementation
- scholaraio/services/ingest/pipeline.py now exposes the ingest pipeline compatibility facade, and routes inbox/pending/proceedings defaults through small accessor helpers (_inbox_dir, _pending_dir, _proceedings_dir, etc.), with fresh queue defaults under data/spool/
- scholaraio/services/backup.py plus scholaraio/core/config.py confirm backup still defaults to syncing data/, not the full runtime root
- scholaraio/providers/webtools.py forms the external-adapter seam; websearch supports both legacy HTTP /search and GUILessBingSearch MCP search_bing, while webextract supports both legacy HTTP /extract and qt-web-extractor MCP fetch_url
- scholaraio/toolref/paths.py:9-20 now follows cfg.toolref_root; fresh instances default to data/libraries/toolref/, while existing legacy data/toolref/ remains auto-detected
- scholaraio/toolref/_legacy_snapshot.py:111-130 preserves the same cfg.toolref_root behavior in a parallel legacy implementation
- scholaraio/stores/citation_styles.py:253-255 now follows cfg.citation_styles_dir; fresh instances default to data/libraries/citation_styles/, while existing legacy data/citation_styles/ remains auto-detected
- scholaraio/services/translate.py now resolves portable translation bundles through cfg.translation_bundle_root; fresh instances default to workspace/_system/translation-bundles/
- scholaraio/services/ingest/pipeline.py still concentrates the compatibility surface for queue/proceedings orchestration, but its default directory resolution now flows through explicit helper functions instead of raw literals
- .qwen/QWEN.md:9-13 and clawhub.yaml:16-127 confirm that skill discovery is rooted in .claude/skills/
- tests/test_cursor_rules.py:8-27, tests/test_academic_writing_skills.py:80-87, tests/test_workspace.py, tests/test_explore.py:42-43, and tests/test_translate.py:230-320 lock the current discovery and path contracts, including workspace legacy/future compatibility plus configured translation-bundle overrides

The following test batches were executed successfully as migration-baseline verification:
python -m pytest -q \
tests/test_cursor_rules.py \
tests/test_writing_docs_alignment.py \
tests/test_academic_writing_skills.py \
tests/test_skill_routing_smoke.py \
tests/test_workspace.py \
tests/test_config.py
python -m pytest -q \
tests/test_explore.py \
tests/test_translate.py \
tests/test_webtools_source.py \
tests/test_ingest_link_cli.py \
tests/test_proceedings.py \
tests/test_cli_messages.py
# 2026-04-17 revalidation after additional develop-branch merges
python -m pytest -q \
tests/test_config.py \
tests/test_explore.py \
tests/test_translate.py \
tests/test_proceedings.py \
tests/test_metrics.py
python -m pytest -q \
tests/test_patent_tools.py \
tests/test_backup.py \
tests/test_diagram.py \
tests/test_document.py
Observed result:
- 283 passing tests on 2026-04-18
- 165 passing tests plus 100 passing tests and 3 skips

These constraints are already enforced by current code, wrappers, and tests. The migration sequence MUST treat them as frozen until their replacements are intentionally designed and tested.
Compatibility-window behavior in scholaraio/core/config.py (before removal of
the legacy scholaraio.config import path):
- load_config() resolves paths relative to the directory containing config.yaml
- _find_config_file() searches upward for config.yaml
- ~/.scholaraio/config.yaml

Implication:
- config.yaml and config.local.yaml MUST remain valid at runtime-instance root during early and middle migration phases
- config/ MUST NOT happen before config discovery is redesigned

Current wrappers and tests assume fixed root-level entry points:
- AGENTS.md
- CLAUDE.md
- AGENTS_CN.md
- .qwen/QWEN.md
- .cursor/rules/
- .clinerules
- .windsurfrules
- .github/copilot-instructions.md
- .claude-plugin/
- clawhub.yaml

Implication:
Current audited behavior:
- .claude/skills/ is the canonical skill source
- .agents/skills, .qwen/skills, and skills are compatibility aliases
- clawhub.yaml registers skill paths as .claude/skills/<name>
- .qwen/QWEN.md explicitly instructs Qwen to use .qwen/skills/
- .cursor/rules/scholaraio.mdc references .claude/skills/*/SKILL.md

Implication:
- SKILL.md files into scholaraio/

Current code assumes both of the following runtime top-level directories exist:
- data/
- workspace/

Implication:
- data/libraries, data/spool, data/state, and related subtrees can only happen after accessor cutover

Migration MUST follow this order:
Direct physical directory moves before consumer cutover are explicitly out of order.
The execution order below follows the actual current coupling, not the desired architecture.
Config Is Now the Runtime Path Authority for First Migration Roots

scholaraio/core/config.py currently exposes accessors for the major runtime roots:
ensure_dirs() now routes through these accessors rather than recreating fixed legacy strings.
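The accessor pattern described here (a fresh default plus auto-detection of an existing legacy location, with ensure_dirs() going through the accessors) can be sketched as follows. This is a minimal illustration under stated assumptions: DemoConfig and _resolve are hypothetical names, and only the explore and toolref default and legacy paths are taken from this record.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DemoConfig:
    """Hypothetical stand-in for a config object that owns runtime paths."""

    root: Path

    def _resolve(self, fresh: str, legacy: str) -> Path:
        # Prefer an existing legacy store; otherwise use the fresh default.
        legacy_path = self.root / legacy
        return legacy_path if legacy_path.exists() else self.root / fresh

    @property
    def explore_root(self) -> Path:
        return self._resolve("data/libraries/explore", "data/explore")

    @property
    def toolref_root(self) -> Path:
        return self._resolve("data/libraries/toolref", "data/toolref")

    def ensure_dirs(self) -> None:
        # Directory creation goes through the accessors, so it never
        # recreates a fixed path that the accessors would not return.
        for path in (self.explore_root, self.toolref_root):
            path.mkdir(parents=True, exist_ok=True)
```

With this shape, a fresh instance gets data/libraries/explore/, while an instance that already has data/toolref/ keeps using it and no parallel data/libraries/toolref/ is created.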
Implication:
Current behavior:
- scholaraio/projects/workspace.py now owns the paper-index layout contract and can read both <workspace-root>/<name>/papers.json and <workspace-root>/<name>/refs/papers.json
- scholaraio/projects/workspace.py still creates new workspaces with root papers.json by default, but preserves existing future-compatible refs/papers.json layouts for reads and writes
- scholaraio/cli.py now routes workspace paths through _workspace_root() / cfg.workspace_dir and uses scholaraio.workspace helpers for workspace index detection
- scholaraio/services/insights.py now counts workspaces through scholaraio.workspace.paper_count()
- scholaraio/services/diagram.py defaults generated diagrams through cfg.workspace_figures_dir, which resolves to <workspace>/_system/figures/ for fresh instances
- scholaraio/interfaces/cli/export.py defaults DOCX export through cfg.workspace_docx_output_path, which resolves to <workspace>/_system/output/output.docx for fresh instances
- tests/test_workspace.py now locks both legacy and future-compatible workspace paper-index contracts
- tests/test_translate.py locks both the fresh portable bundle default under workspace/_system/translation-bundles/ and explicit translation_bundle_root overrides

Implication:
- projects/workspace.py
- papers.json path checks

pipeline.py

Current behavior:
- scholaraio/services/ingest/pipeline.py still owns the compatibility surface for queue/proceedings orchestration, but default directory resolution now uses helper accessors wired to Config
- scholaraio/cli.py now routes arXiv downloads through cfg.inbox_dir / _default_inbox_dir()
- scholaraio/services/patent_fetch.py now routes patent downloads through cfg.patent_inbox_dir when available, with legacy fallback preserved for compatibility
- scholaraio/services/setup.py checks configured queue roots, pending spool, and workspace through Config

Implication:
Current audited path helpers:
- scholaraio/stores/explore.py now follows cfg.explore_root with a durable-library fresh default and legacy fallback
- scholaraio/toolref/paths.py now follows cfg.toolref_root with a durable-library fresh default and legacy fallback
- scholaraio/toolref/_legacy_snapshot.py still mirrors the same config-backed root behavior
- scholaraio/stores/citation_styles.py now follows cfg.citation_styles_dir with a durable-library fresh default and legacy fallback
- scholaraio/services/translate.py now resolves portable translation bundles through translation_bundle_root

Implication:
- explore, toolref, citation_styles, and portable translation outputs are safer physical-move candidates than papers

Current behavior:
- services/backup.py syncs cfg.backup_source_dir
- data/
- workspace/, config files, or future migration-control metadata

Implication:
The migration is split into two tracks:
Track A is the critical path and MUST happen first. Track B SHOULD begin only after Track A has established stable accessors and compatibility layers.
Objective:
Actions:
- .claude/skills/ unchanged
- data/ and workspace/ unchanged

Do not do yet:
- config.yaml

Exit criteria:

Config the Complete Path Authority

Objective:
- Config

Primary audit reference:
- docs/development/config-surface-audit.md

Required additions in or around scholaraio/core/config.py:
- inbox_dir
- doc_inbox_dir
- thesis_inbox_dir
- patent_inbox_dir
- proceedings_inbox_dir
- pending_dir
- proceedings_dir
- explore_root
- toolref_root
- citation_styles_dir
- translation_bundle_root
- state_root
- cache_root
- runtime_root

Immediate consumer updates in this phase:
- Config.ensure_dirs() must switch to the new accessors
- scholaraio/services/setup.py directory checks must switch to the new accessors

Rationale:
Exit criteria:
Objective:
Modules in scope:
- scholaraio/stores/explore.py (scholaraio.explore compatibility alias)
- scholaraio/toolref/paths.py
- scholaraio/toolref/_legacy_snapshot.py
- scholaraio/stores/citation_styles.py (scholaraio.citation_styles compatibility alias)
- scholaraio/services/translate.py (scholaraio.translate compatibility alias)

Required changes:
- cfg._root / "data" / ... constructions with Config accessors
- citation_styles_dir accessor behavior: fresh durable-library default plus legacy fallback until cleanup is complete
- workspace/translation-ws/ in helper implementations; route through translation_bundle_root

Why this phase comes early:
- pipeline.py

Exit criteria:
- tests/test_explore.py and tests/test_translate.py still pass

Objective:
- cfg._root / "workspace" and similar constructions from non-orchestration interface code

Modules in scope:
- scholaraio/cli.py
- scholaraio/services/insights.py (scholaraio.insights compatibility alias)

Required changes:
- cfg.workspace_dir everywhere instead of re-constructing workspace root
- workspace/figures/ and workspace/output.docx behind explicit helpers or compatibility rules instead of raw literals

Why this phase is separate from A2:
Exit criteria:
- cmd_ws, _resolve_ws_paper_ids, arXiv inbox download, patent fetch defaults, and insights workspace listing no longer hardcode raw runtime paths where an accessor exists

Objective:
- workspace/<name>/papers.json contract

Current constraint:
- papers.json at workspace root as the canonical paper-ref index

Required changes:
- scholaraio/projects/workspace.py to become the single authority for workspace layout
- papers.json
- refs/papers.json
- workspace.yaml manifest is introduced, treat it as additive first, not replacing legacy files immediately

Do not do yet:
- workspace/papers.json

Rationale:
For the compatibility window and the next migration design pass, the workspace-topology direction is now fixed:
- workspace/translation-ws/ -> workspace/_system/translation-bundles/
- workspace/figures/ -> workspace/_system/figures/
- workspace/output.* -> workspace/_system/output/
- workspace.yaml envelope is schema_version, optional name / description / tags, optional explicit mounts, and optional outputs; it MUST NOT replace root papers.json or future-compatible refs/papers.json
- outputs.default_dir stays workspace-relative, and shared-store mounts are logical IDs rather than physical paths
- explore remains a shared store for the compatibility window; if workspace-local mounts are added later, they MUST be explicit manifest-declared opt-ins and SHOULD start read-only
- .claude/skills/ remains the canonical skill source and is not a migration target
- workspace/<name>/outputs/

Exit criteria:
- workspace.py owns the layout contract
- tests/test_workspace.py still pass

Implementation status (2026-04-20):
- workspace.py, cli.py, and insights.py now consume the compatibility layer instead of assuming only root papers.json
- projects/workspace.py:read_manifest() now parses workspace.yaml when present, normalizes supported schema-v1 metadata, preserves unknown top-level keys, rejects path-like shared-store mounts, and treats newer schema versions as opaque metadata instead of rewriting them blindly
- interfaces/cli/workspace.py now surfaces additive workspace.yaml metadata in ws list / ws show, while keeping manifest-declared mounts informational only and not turning them into active runtime routing
- refs/papers.json workspace

pipeline.py

Objective:
Modules in scope:
- scholaraio/services/ingest/pipeline.py (scholaraio.ingest.pipeline compatibility alias)
- scholaraio/cli.py
- scholaraio/services/setup.py (scholaraio.setup compatibility alias)

Required changes:
- run_pipeline()
- import_external()
- _move_to_pending()

Why this is late:
- pipeline.py is the densest operational hub for runtime directories

Exit criteria:
- tests/test_ingest_link_cli.py and tests/test_proceedings.py still pass

Objective:
Current central state locations:
- index.db
- metrics.db
- topic_model/

Required changes:
- data/state/search/
- data/state/metrics/
- data/state/topics/

Why this phase precedes major library moves:
- papers

Exit criteria:
Implementation status (2026-04-19):
- Config now exposes logical search_state_dir, metrics_state_dir, and topics_state_dir
- data/state/search/index.db, data/state/metrics/metrics.db, and data/state/topics/
- data/index.db, data/metrics.db, and data/topic_model/ are still discovered automatically when present

Objective:
Required changes:
- .scholaraio-control/ directory
- instance.json
- migration.lock

Required reference:
- docs/development/migration-mechanism-spec.md

Why this phase exists here:
Exit criteria:
Historical implementation status during the compatibility window (2026-04-23, superseded by the 2026-04-24 breaking cleanup):
The bullets below are retained as an execution log. Statements that a legacy module path “remains” an alias refer to the compatibility-window implementation, not to the active release generation.
- Config exposes .scholaraio-control/, instance.json, migration.lock, and journal-root accessors
- instance.json with legacy_implicit state when metadata is absent
- migration.lock exists, and scholaraio migrate status|recover --clear-lock provides the recovery surface
- instance.json.layout_version is newer than the running program supports, while migrate status remains available for diagnosis
- .scholaraio-control/migrations/<migration-id>/, and migrate status reports the current journal inventory
- scholaraio migrate plan creates a non-executing journal-backed inventory record (plan.json) with store-level target metadata and planned legacy-move records
- scholaraio migrate verify refreshes verify.json and records component-aware checks covering papers/workspaces/index-registry/keyword-search/citation-style loadability/explore openability/toolref current-version resolution/proceedings search/translation-resume inventory
- scholaraio migrate run --store citation_styles --confirm, toolref, explore, proceedings, spool, and papers copy legacy stores into their current targets and record cleanup candidates
- scholaraio migrate cleanup enforces a passed-verify gate, records preview/confirm journal steps, and archives explicit cleanup candidates under the migration journal instead of deleting them directly

Objective:
Recommended move order:
- citation_styles
- toolref
- explore

Target subtree:
- data/libraries/

Recommended approach:
Why these three come first:
- papers
- explore and writing/skill discovery separately

Exit criteria:

proceedings as a Durable Library

Objective:
- data/proceedings into the durable-library subtree only after pipeline consumers no longer depend on fixed raw paths

Target:
- data/libraries/proceedings/

Why this is not grouped with A7:
- proceedings is more tightly coupled to ingest orchestration than toolref, explore, or citation_styles

Why this still happens before papers:
Exit criteria:
Implementation status (2026-04-20):
- Config.proceedings_dir resolves to data/libraries/proceedings/, existing data/proceedings/ remains readable as a legacy fallback, migrate run --store proceedings --confirm copies the full tree, and migrate cleanup --confirm archives the legacy tree into the migration journal

Objective:
- data/spool/ only after pipeline consumers have been fully abstracted

Recommended target:
- data/spool/inbox
- data/spool/inbox-thesis
- data/spool/inbox-patent
- data/spool/inbox-doc
- data/spool/inbox-proceedings
- data/spool/pending

Notes:
Exit criteria:
Implementation status (2026-04-20):
- Config queue roots resolve to data/spool/inbox* and data/spool/pending, existing legacy data/inbox* and data/pending queues remain readable as legacy fallbacks, migrate run --store spool --confirm copies all queue roots into data/spool/, and migrate cleanup --confirm archives the legacy queue roots into the migration journal

papers Last

Objective:
Target:
- data/libraries/papers/

Why this is last:
- papers is used by the largest number of modules
- papers affects search, vectors, topics, workspace references, notes, export, enrich, audit, translate, and many CLI flows

Required preconditions:
- data/papers by string/path convention

Exit criteria:
- papers physical location is no longer special-cased anywhere outside configuration and explicit migration tooling

Implementation status (2026-04-20):
- Config.papers_dir resolves to data/libraries/papers/, legacy-default aliases such as data/papers continue to auto-detect existing legacy libraries, migrate run --store papers --confirm copies the full paper tree, and migrate cleanup --confirm archives the legacy paper tree into the migration journal

Objective:
Actions:
- AGENTS.md, CLAUDE.md, README, setup docs, and relevant skills to describe the final runtime layout plus explicit legacy auto-detection

Exit criteria:
Do not do yet:
- Config legacy fallback readers in this branch
- papers.json compatibility

Implementation status (2026-04-20):
- data/libraries/, data/spool/, and data/state/ defaults while documenting legacy auto-detection and explicit migration tooling

Track B is intentionally later and slower than Track A.

gui/ Immediately

Low-risk action:
- gui/ as a reserved source directory at any time

Constraint:
- gui/ MUST remain presentation-only

Implementation status (2026-04-20):
- gui/README.md exists and documents that GUI code must remain presentation-oriented and must not become the source of truth for runtime layout or business behavior

Recommended target namespaces:
- scholaraio/core/
- scholaraio/providers/
- scholaraio/stores/
- scholaraio/projects/
- scholaraio/services/
- scholaraio/interfaces/
- scholaraio/compat/

Method:
Why not earlier:
Historical implementation status during the compatibility window (2026-04-23; superseded by the 2026-04-24 breaking cleanup):
- core / providers / stores / projects / services namespaces, and legacy public module paths were still compatibility aliases at that time

Recommended order:
- toolref/*
- explore.py
- citation_styles.py
- proceedings.py
- sources/webtools.py
- uspto_odp.py
- uspto_ppubs.py
- workspace.py
- diagram.py
- translate.py
- insights.py

Additional constraint for webtools:
- websearch, webextract, and ingest-link

Late movers:
- cli.py
- services/ingest/pipeline.py (ingest/pipeline.py compatibility alias)

Reason:
Historical implementation status during the compatibility window (2026-04-23; superseded by the 2026-04-24 breaking cleanup):
The bullets below are retained as historical execution notes. Statements that a legacy module path “remains” an alias do not describe the active breaking-cleanup release generation.
- scholaraio.core.config, scholaraio.core.log, scholaraio.stores.citation_styles, scholaraio.stores.toolref, scholaraio.stores.explore, scholaraio.stores.proceedings, scholaraio.providers.arxiv, scholaraio.providers.endnote, scholaraio.providers.zotero, scholaraio.providers.mineru, scholaraio.providers.pdf_fallback, scholaraio.providers.webtools, scholaraio.providers.uspto_odp, scholaraio.providers.uspto_ppubs, scholaraio.projects.workspace, scholaraio.services.audit, scholaraio.services.backup, scholaraio.services.citation_check, scholaraio.services.diagram, scholaraio.services.document, scholaraio.services.export, scholaraio.services.index, scholaraio.services.loader, scholaraio.services.migration_control, scholaraio.services.patent_fetch, scholaraio.services.setup, scholaraio.services.topics, scholaraio.services.translate, scholaraio.services.vectors, scholaraio.services.insights, scholaraio.services.ingest_metadata (including extractor), scholaraio.services.ingest.parser_matrix_benchmark, and scholaraio.services.ingest.proceedings_volume now re-export the existing implementations
- config implementation has moved to scholaraio.core.config; scholaraio.config remains a module alias so legacy monkeypatch/import paths still target the real implementation
- log implementation has moved to scholaraio.core.log; scholaraio.log remains a module alias so legacy monkeypatch/import paths still target the real implementation
- audit implementation has moved to scholaraio.services.audit; scholaraio.audit remains a module alias so legacy monkeypatch/import paths still target the real implementation
- citation_styles implementation has moved to scholaraio.stores.citation_styles; scholaraio.citation_styles remains a module alias so legacy monkeypatch/import paths still target the real implementation
- papers implementation has moved to scholaraio.stores.papers; scholaraio.papers remains a module alias so legacy monkeypatch/import paths still target the real implementation
- proceedings implementation has moved to scholaraio.stores.proceedings; scholaraio.proceedings remains a module alias so legacy monkeypatch/import paths still target the real implementation
- webtools implementation has moved to scholaraio.providers.webtools; scholaraio.sources.webtools remains a module alias so legacy monkeypatch/import paths still target the real implementation
- arxiv implementation has moved to scholaraio.providers.arxiv; scholaraio.sources.arxiv remains a module alias so legacy monkeypatch/import paths still target the real implementation
- endnote implementation has moved to scholaraio.providers.endnote; scholaraio.sources.endnote remains a module alias so legacy monkeypatch/import paths still target the real implementation
- zotero implementation has moved to scholaraio.providers.zotero; scholaraio.sources.zotero remains a module alias so legacy monkeypatch/import paths still target the real implementation
- mineru implementation has moved to scholaraio.providers.mineru; scholaraio.ingest.mineru remains a module alias and module-CLI delegator so legacy monkeypatch/import and python -m paths still target the real implementation
- pdf_fallback implementation has moved to scholaraio.providers.pdf_fallback; scholaraio.ingest.pdf_fallback remains a module alias so legacy monkeypatch/import paths still target the real implementation
- ingest.extractor implementation has moved to scholaraio.services.ingest_metadata.extractor; scholaraio.ingest.extractor remains a module alias so legacy monkeypatch/import paths still target the real implementation
- parser_matrix_benchmark implementation has moved to scholaraio.services.ingest.parser_matrix_benchmark; scholaraio.ingest.parser_matrix_benchmark remains a module alias so legacy monkeypatch/import paths still target the real implementation
- ingest.proceedings volume implementation has moved to scholaraio.services.ingest.proceedings_volume; scholaraio.ingest.proceedings remains a module alias so legacy monkeypatch/import paths still target the real implementation
- uspto_odp implementation has moved to scholaraio.providers.uspto_odp; scholaraio.uspto_odp remains a module alias so legacy monkeypatch/import paths still target the real implementation
- uspto_ppubs implementation has moved to scholaraio.providers.uspto_ppubs; scholaraio.uspto_ppubs remains a module alias so legacy monkeypatch/import paths still target the real implementation
- explore implementation has moved to scholaraio.stores.explore; scholaraio.explore remains a module alias so legacy monkeypatch/import paths still target the real implementation
- workspace implementation has moved to scholaraio.projects.workspace; scholaraio.workspace remains a module alias so legacy monkeypatch/import paths still target the real implementation
- translate implementation has moved to scholaraio.services.translate; scholaraio.translate remains a module alias so legacy monkeypatch/import paths still target the real implementation
- insights implementation has moved to scholaraio.services.insights; scholaraio.insights remains a module alias so legacy monkeypatch/import paths still target the real implementation
- metrics implementation has moved to scholaraio.services.metrics; scholaraio.metrics remains a module alias so legacy monkeypatch/import paths still target the real implementation
- backup implementation has moved to scholaraio.services.backup; scholaraio.backup remains a module alias so legacy monkeypatch/import paths still target the real implementation
- citation_check implementation has moved to scholaraio.services.citation_check; scholaraio.citation_check remains a module alias so legacy monkeypatch/import paths still target the real implementation
- diagram implementation has moved to scholaraio.services.diagram; scholaraio.diagram remains a module alias so legacy monkeypatch/import paths still target the real implementation
- document implementation has moved to scholaraio.services.document; scholaraio.document remains a module alias so legacy monkeypatch/import paths still target the real implementation
- export implementation has moved to scholaraio.services.export; scholaraio.export remains a module alias so legacy monkeypatch/import paths still target the real implementation
- index implementation has moved to scholaraio.services.index; scholaraio.index remains a module alias so legacy monkeypatch/import paths still target the real implementation
- loader implementation has moved to scholaraio.services.loader; scholaraio.loader remains a module alias so legacy monkeypatch/import paths still target the real implementation
- topics implementation has moved to scholaraio.services.topics; scholaraio.topics remains a module alias so legacy monkeypatch/import paths still target the real implementation
- vectors implementation has moved to scholaraio.services.vectors; scholaraio.vectors remains a module alias so legacy monkeypatch/import paths still target the real implementation
- migration_control implementation has moved to scholaraio.services.migration_control; scholaraio.migration_control remains a module alias so legacy monkeypatch/import paths still target the real implementation
- patent_fetch implementation has moved to scholaraio.services.patent_fetch; scholaraio.patent_fetch remains a module alias so legacy monkeypatch/import paths still target the real implementation
- setup implementation has moved to scholaraio.services.setup; scholaraio.setup remains a module alias so legacy monkeypatch/import paths still target the real implementation
- toolref implementation has moved to scholaraio.stores.toolref; scholaraio.toolref remains a package alias, including legacy submodule aliases such as scholaraio.toolref.storage
- ingest.metadata implementation has moved to scholaraio.services.ingest_metadata; scholaraio.ingest.metadata remains a package alias, including legacy submodule aliases such as scholaraio.ingest.metadata._api
- (cli.py and ingest/pipeline.py) are now covered by Phase B3 facade splits; the ingest pipeline compatibility facade now lives at scholaraio.services.ingest.pipeline, while scholaraio.ingest.pipeline remains a module alias

cli.py and pipeline.py Last

Recommended target:
- interfaces/cli/ for command registration and per-domain handlers
- services/ingest/ for pipeline orchestration and ingest subflows

Precondition:
Implementation status (2026-04-20):
- insights command handling has moved to scholaraio.interfaces.cli.insights; scholaraio.cli.cmd_insights remains the parser-facing callable
- metrics command handling has moved to scholaraio.interfaces.cli.metrics; scholaraio.cli.cmd_metrics remains the parser-facing callable
- translate command handling has moved to scholaraio.interfaces.cli.translate; scholaraio.cli.cmd_translate remains the parser-facing callable
- backup command handling has moved to scholaraio.interfaces.cli.backup; scholaraio.cli.cmd_backup remains the parser-facing callable
- style command handling has moved to scholaraio.interfaces.cli.style; scholaraio.cli.cmd_style remains the parser-facing callable
- document command handling has moved to scholaraio.interfaces.cli.document; scholaraio.cli.cmd_document remains the parser-facing callable
- export command handling has moved to scholaraio.interfaces.cli.export; scholaraio.cli.cmd_export and existing _cmd_export_* helper aliases remain available
- diagram command handling has moved to scholaraio.interfaces.cli.diagram; scholaraio.cli.cmd_diagram and existing diagram helper aliases remain available
- setup command handling has moved to scholaraio.interfaces.cli.setup; scholaraio.cli.cmd_setup remains the parser-facing callable
- index command handling has moved to scholaraio.interfaces.cli.index; scholaraio.cli.cmd_index remains the parser-facing callable
- search and search-author command handling has moved to scholaraio.interfaces.cli.search; scholaraio.cli.cmd_search and scholaraio.cli.cmd_search_author remain parser-facing callables
- top-cited command handling has moved to scholaraio.interfaces.cli.citations; scholaraio.cli.cmd_top_cited remains the parser-facing callable
- rename command handling has moved to scholaraio.interfaces.cli.rename; scholaraio.cli.cmd_rename remains the parser-facing callable
- audit command handling has moved to scholaraio.interfaces.cli.audit; scholaraio.cli.cmd_audit remains the parser-facing callable
- refs, citing, and shared-refs command handling has moved to scholaraio.interfaces.cli.graph; the corresponding scholaraio.cli.cmd_* callables remain parser-facing aliases
- toolref command handling has moved to scholaraio.interfaces.cli.toolref; scholaraio.cli.cmd_toolref remains the parser-facing callable
- show command handling has moved to scholaraio.interfaces.cli.show; scholaraio.cli.cmd_show remains the parser-facing callable
- citation-check command handling has moved to scholaraio.interfaces.cli.citation_check; scholaraio.cli.cmd_citation_check remains the parser-facing callable
- migrate command handling has moved to scholaraio.interfaces.cli.migrate; scholaraio.cli.cmd_migrate remains the parser-facing callable
- proceedings command handling has moved to scholaraio.interfaces.cli.proceedings; scholaraio.cli.cmd_proceedings remains the parser-facing callable
- import-endnote command handling has moved to scholaraio.interfaces.cli.import_endnote; scholaraio.cli.cmd_import_endnote remains the parser-facing callable
- import-zotero command handling has moved to scholaraio.interfaces.cli.import_zotero; scholaraio.cli.cmd_import_zotero and the existing _import_zotero_collections_as_workspaces helper alias remain available
- fsearch command handling has moved to scholaraio.interfaces.cli.fsearch; scholaraio.cli.cmd_fsearch and existing arXiv lookup helper aliases remain available
- ws command handling has moved to scholaraio.interfaces.cli.workspace; scholaraio.cli.cmd_ws remains the parser-facing callable
- embed, vsearch, and usearch command handling has moved to scholaraio.interfaces.cli.retrieval; the corresponding scholaraio.cli.cmd_* callables remain parser-facing aliases
- repair command handling has moved to scholaraio.interfaces.cli.repair; scholaraio.cli.cmd_repair remains the parser-facing callable
- pipeline command handling has moved to scholaraio.interfaces.cli.pipeline; scholaraio.cli.cmd_pipeline remains the parser-facing callable
- backfill-abstract command handling has moved to scholaraio.interfaces.cli.backfill_abstract; scholaraio.cli.cmd_backfill_abstract remains the parser-facing callable
- topics command handling has moved to scholaraio.interfaces.cli.topics; scholaraio.cli.cmd_topics and the existing _write_all_viz helper alias remain available
- refetch command handling has moved to scholaraio.interfaces.cli.refetch; scholaraio.cli.cmd_refetch remains the parser-facing callable
- enrich-toc and enrich-l3 command handling has moved to scholaraio.interfaces.cli.enrich; scholaraio.cli.cmd_enrich_* and existing enrichment helper aliases remain available
- arxiv search and arxiv fetch command handling has moved to scholaraio.interfaces.cli.arxiv; the corresponding scholaraio.cli.cmd_arxiv_* callables remain parser-facing aliases
- websearch and webextract command handling has moved to scholaraio.interfaces.cli.web; scholaraio.cli.cmd_web* and the existing _terminal_preview helper alias remain available
- explore command handling has moved to scholaraio.interfaces.cli.explore; scholaraio.cli.cmd_explore and the existing _explore_root helper alias remain available
- ingest-link command handling has moved to scholaraio.interfaces.cli.ingest_link; scholaraio.cli.cmd_ingest_link and existing ingest-link helper aliases remain available
- patent-fetch and patent-search command handling has moved to scholaraio.interfaces.cli.patent; the corresponding scholaraio.cli.cmd_patent_* callables remain parser-facing aliases
- attach-pdf command handling has moved to scholaraio.interfaces.cli.attach_pdf; scholaraio.cli.cmd_attach_pdf and the existing _batch_convert_pdfs helper alias remain available
- scholaraio.interfaces.cli.arguments; legacy scholaraio.cli._add_result_limit_arg, _resolve_result_limit, _resolve_top, and _add_filter_args remain aliases
- scholaraio.interfaces.cli.output; legacy scholaraio.cli._print_search_result, _print_search_next_steps, _format_match_tag, and _format_citations remain aliases
- scholaraio.interfaces.cli.dependencies; legacy scholaraio.cli._INSTALL_HINTS and _check_import_error remain aliases and keep the old logging monkeypatch point
- scholaraio.interfaces.cli.paths; legacy scholaraio.cli._resolve_ws_paper_ids, _workspace_root, and _default_inbox_dir remain aliases and keep old UI/helper monkeypatch points
- scholaraio.interfaces.cli.paper; legacy scholaraio.cli._lookup_registry_by_candidates, _resolve_paper, _print_header, and _enrich_show_header remain aliases and keep old UI/logging/helper monkeypatch points
- scholaraio.interfaces.cli.search_metrics; legacy scholaraio.cli._record_search_metrics remains an alias and keeps the old logging monkeypatch point
- scholaraio.interfaces.cli.runtime; legacy scholaraio.cli.main remains the script entrypoint alias and keeps old config/UI/parser monkeypatch points
- scholaraio.interfaces.cli.parser; legacy scholaraio.cli._build_parser remains an alias and dynamically binds parser defaults from current scholaraio.cli command/helper aliases
- scholaraio.services.ingest.pipeline; current CLI-interface consumers import that target namespace, while scholaraio.ingest.pipeline remains the same module object for legacy imports and monkeypatch paths
- scholaraio.services.ingest.paths; legacy private helpers such as scholaraio.ingest.pipeline._inbox_dir remain aliases during the compatibility window
- StepResult, StepDef, and InboxCtx have moved to scholaraio.services.ingest.types; scholaraio.ingest.pipeline keeps the same public names as compatibility aliases
- scholaraio.services.ingest.assets; legacy private helper names in scholaraio.ingest.pipeline remain aliases
- scholaraio.services.ingest.identifiers; legacy scholaraio.ingest.pipeline._collect_existing_* and _normalize_arxiv_id remain aliases
- scholaraio.services.ingest.detection; legacy pipeline private helper names remain aliases
- scholaraio.services.ingest.documents; legacy pipeline private helper names remain aliases
- scholaraio.services.ingest.registry; legacy _ensure_registry_schema, _update_registry, and _registry_migrated remain aliases
- scholaraio.services.ingest.cleanup; legacy scholaraio.ingest.pipeline._cleanup_inbox remains an alias
- scholaraio.services.ingest.pending; legacy scholaraio.ingest.pipeline._move_to_pending remains an alias
- scholaraio.services.ingest.proceedings; proceedings volume preparation has moved to scholaraio.services.ingest.proceedings_volume; legacy scholaraio.ingest.pipeline._ingest_proceedings_ctx and scholaraio.ingest.proceedings remain aliases
- toc, l3, translate, refetch, embed, and index have moved to scholaraio.services.ingest.steps; legacy public scholaraio.ingest.pipeline.step_* names remain aliases
- scholaraio.services.ingest.batch_assets; legacy _move_batch_images and _flatten_cloud_batch_output remain aliases
- scholaraio.services.ingest.batch_postprocess; legacy _postprocess_convert remains an alias
- scholaraio.services.ingest.batch_postprocess; legacy _batch_postprocess remains an alias and still honors pipeline-level step/UI monkeypatches
- scholaraio.services.ingest.batch_convert; legacy batch_convert_pdfs remains an alias and still honors pipeline-level helper/UI monkeypatches
- scholaraio.services.ingest.external_import; legacy import_external remains an alias and still honors pipeline-level helper/step/UI monkeypatches
- scholaraio.services.ingest.step_registry; legacy STEPS, PRESETS, _DOC_INBOX_STEPS, and _OFFICE_EXTENSIONS remain aliases
- scholaraio.services.ingest.inbox_orchestration; legacy _process_inbox remains an alias and still honors pipeline-level step/helper/UI/logging monkeypatches
- scholaraio.services.ingest.pipeline_runner; legacy run_pipeline remains an alias and still honors pipeline-level step/helper/UI/logging monkeypatches
- scholaraio.services.ingest.inbox_steps; legacy step_office_convert remains an alias and still honors pipeline-level logging monkeypatches
- scholaraio.services.ingest.inbox_steps; legacy step_mineru remains an alias and still honors pipeline-level UI/logging monkeypatches
- scholaraio.services.ingest.inbox_steps; legacy step_extract_doc and step_extract remain aliases and still honor pipeline-level helper/UI/logging monkeypatches
- scholaraio.services.ingest.inbox_steps; legacy step_dedup remains an alias and still honors pipeline-level helper/UI/logging monkeypatches
- scholaraio.services.ingest.inbox_steps; legacy step_ingest remains an alias and still honors pipeline-level helper/UI/logging monkeypatches

The following implementation details should remain deferred until the earlier phases above are complete:
- workspace.yaml field schema beyond the minimal additive metadata/mount envelope above
- explore mounts after the shared-store compatibility window

The safe execution order is:
- Config the complete path authority
- pipeline.py
- proceedings as a durable library
- papers last

Anything that starts by directly renaming data/, workspace/, or .claude/skills/ is not aligned with the audited current codebase.
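
The status entries above repeatedly describe a legacy import path that "remains a module alias so legacy monkeypatch/import paths still target the real implementation". The sketch below illustrates one way that property can be obtained in Python: registering the same module object under both names in sys.modules. It is an illustration under stated assumptions, not ScholarAIO's actual mechanism; the package name demo_pkg, the helpers make_package and install_alias, and the functions _now and record are all hypothetical.

```python
import importlib
import sys
import types

# Hypothetical stand-in for a relocated implementation module. The source is
# exec'd into the module so its functions resolve globals inside that module.
IMPL_SOURCE = '''
def _now():
    return "real"

def record():
    # _now is looked up in this module's globals at call time.
    return _now()
'''


def make_package(name):
    """Register an empty placeholder package under ``name`` (hypothetical helper)."""
    pkg = types.ModuleType(name)
    pkg.__path__ = []  # marks the module as a package for the import system
    sys.modules[name] = pkg
    return pkg


def install_alias(name, target):
    """Make ``name`` resolve to the very same module object as ``target``."""
    sys.modules[name] = target
    parent_name, _, attr = name.rpartition(".")
    if parent_name in sys.modules:
        setattr(sys.modules[parent_name], attr, target)


make_package("demo_pkg")
make_package("demo_pkg.services")

impl = types.ModuleType("demo_pkg.services.metrics")
exec(IMPL_SOURCE, impl.__dict__)
install_alias("demo_pkg.services.metrics", impl)  # new canonical location
install_alias("demo_pkg.metrics", impl)           # legacy import path

legacy = importlib.import_module("demo_pkg.metrics")
target = importlib.import_module("demo_pkg.services.metrics")

# Patching through the legacy path changes what the real implementation calls,
# because both names refer to one module object.
legacy._now = lambda: "patched"
result = target.record()
```

Because both names point at one module object, a test that patches an attribute through the old path is observed by code in the new location. A shim that merely re-exported names (for example with a from-import) would not have this property, since patching the shim would rebind only the shim's own attribute.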