ScholarAIO Directory Structure Specification

Status: Current layout specification

Last Updated: 2026-04-24

Scope: repository layout, runtime instance layout, agent-surface placement, and migration constraints for future refactors.

2026-04-24 status note:

1. Purpose

This document defines the target directory structure for ScholarAIO and the compatibility constraints that MUST be preserved while migrating from the current layout.

This is a refactoring specification, not a release note. It exists to:

2. Normative Language

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as requirement levels for future refactors.

3. Design Principles

3.1 Source vs Runtime

ScholarAIO MUST distinguish between:

The repository root and the runtime-instance root MAY be the same directory in local-clone mode. In plugin mode, the runtime-instance root MAY instead be ~/.scholaraio/.
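The two modes can be illustrated with a small resolver. This is a minimal sketch, not ScholarAIO's actual discovery logic: the function name resolve_instance_root and the rule "a directory is a local-clone instance root if it contains config.yaml" are assumptions made for illustration only.

```python
from pathlib import Path


def resolve_instance_root(start: Path, home: Path) -> Path:
    """Return the directory that config and user data resolve against.

    Assumption (not stated in this spec): a directory counts as a
    local-clone instance root when it contains config.yaml. Otherwise the
    plugin-mode root under the user's home directory is used.
    """
    if (start / "config.yaml").is_file():
        # Local-clone mode: repository root and instance root coincide.
        return start
    # Plugin mode: the instance root lives at ~/.scholaraio/.
    return home / ".scholaraio"
```

A caller would typically pass Path.cwd() and Path.home(); taking both as parameters keeps the function easy to test.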

3.2 Lifecycle Separation

Directories MUST be partitioned by lifecycle and ownership, not only by feature name. At minimum, the design MUST distinguish:

3.3 Stable Agent Entry Points

Agent host discovery relies on fixed file locations. Therefore:

4. Repository Root Specification

The repository root is the top-level project tree used by contributors and agent hosts.

4.1 Required Root-Level Integration Surface

The following files or directories MUST remain at repository root:

Rationale:

4.2 Canonical Skill Placement

The canonical skill source MUST be:

The following compatibility entry points MUST continue to resolve to the same skill set:

These MAY remain symlinks or MAY become equivalent wrapper directories, but they MUST continue to expose the same skill inventory.

scholaraio/ MUST NOT become the canonical physical home of SKILL.md files.

4.3 Target Repository Layout

The target repository layout is:

repo-root/
├── AGENTS.md
├── CLAUDE.md
├── AGENTS_CN.md
├── .claude/skills/
├── .agents/skills -> ../.claude/skills
├── .qwen/QWEN.md
├── .qwen/skills -> ../.claude/skills
├── .cursor/rules/
├── .clinerules
├── .windsurfrules
├── .github/copilot-instructions.md
├── .claude-plugin/
├── clawhub.yaml
├── scholaraio/
├── gui/
├── docs/
├── tests/
└── scripts/
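The requirement in 4.2 that compatibility entry points expose the same skill inventory lends itself to an automated check. The sketch below assumes each skill is a subdirectory containing a SKILL.md file; that per-skill layout and the helper names are illustrative assumptions, while the three paths are taken from the layout above.

```python
from pathlib import Path

# Paths taken from the target repository layout.
CANONICAL = Path(".claude/skills")
COMPAT_ENTRY_POINTS = (Path(".agents/skills"), Path(".qwen/skills"))


def skill_inventory(skills_dir: Path) -> set[str]:
    """Names of subdirectories that contain a SKILL.md file.

    Assumption: one skill per subdirectory. A missing directory yields
    an empty inventory. Symlinked directories are followed.
    """
    if not skills_dir.is_dir():
        return set()
    return {p.name for p in skills_dir.iterdir() if (p / "SKILL.md").is_file()}


def mismatched_entry_points(repo_root: Path) -> list[str]:
    """Entry points whose inventory differs from the canonical source."""
    expected = skill_inventory(repo_root / CANONICAL)
    return [
        entry.as_posix()
        for entry in COMPAT_ENTRY_POINTS
        if skill_inventory(repo_root / entry) != expected
    ]
```

Because the check compares inventories rather than link targets, it passes for both symlinks and equivalent wrapper directories.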

4.4 scholaraio/ Package Layout

The Python package SHOULD evolve toward the following second-level structure:

scholaraio/
├── core/
├── providers/
├── stores/
├── projects/
├── services/
├── interfaces/
└── compat/

The intended responsibilities are:

4.5 gui/

gui/ MUST be reserved as a top-level source directory for the future presentation shell.

gui/:

5. Runtime Instance Root Specification

The runtime-instance root is the directory relative to which ScholarAIO resolves config and user data.

5.1 Current Compatibility Constraint

Until config discovery is redesigned, the following files MUST remain valid at the runtime-instance root:

Rationale:

5.2 Current Compatibility Top Level

For compatibility with the current codebase, the runtime-instance root MUST continue to support:

This applies both in repository-local mode and in plugin mode.

In addition, future migration-capable versions SHOULD reserve a root-level control directory:

5.3 Target Runtime Layout

Within those top-level compatibility anchors, the target runtime layout is:

instance-root/
├── config.yaml
├── config.local.yaml
├── data/
│   ├── libraries/
│   ├── spool/
│   ├── state/
│   ├── cache/
│   └── runtime/
├── .scholaraio-control/
└── workspace/

The purpose of each subtree is defined below.
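As a rough illustration, the directory part of this layout could be scaffolded as follows. The function name scaffold_instance is hypothetical, and the sketch deliberately leaves config.yaml and config.local.yaml alone, since their contents are outside the scope of this document.

```python
from pathlib import Path

# Directories from the target runtime layout (config files are not created).
TARGET_DIRS = (
    "data/libraries",
    "data/spool",
    "data/state",
    "data/cache",
    "data/runtime",
    ".scholaraio-control",
    "workspace",
)


def scaffold_instance(instance_root: Path) -> list[Path]:
    """Create any missing target directories and return the ones created."""
    created = []
    for rel in TARGET_DIRS:
        path = instance_root / rel
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(path)
    return created
```

Running it twice on the same root is harmless: the second call finds every directory present and returns an empty list.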

5.4 Root-Level Control Metadata

.scholaraio-control/ is the reserved root-level control directory for migration and instance metadata.

It SHOULD contain control-plane artifacts such as:

It MUST NOT be treated as part of:

Rationale:

The detailed contract for this directory is defined in:

6. data/ Subtree Specification

6.1 data/libraries/

data/libraries/ contains durable, user-meaningful knowledge stores.

Target second-level layout:

data/libraries/
├── papers/
├── proceedings/
├── explore/
├── toolref/
└── citation_styles/

Requirements:

6.2 data/spool/

data/spool/ contains queued work items awaiting later processing or manual review.

Target second-level layout:

data/spool/
├── inbox/
├── inbox-thesis/
├── inbox-patent/
├── inbox-doc/
├── inbox-proceedings/
└── pending/

Requirements:

6.3 data/state/

data/state/ contains persistent internal state that is important to the application but is not itself a user-facing library.

Target second-level layout:

data/state/
├── search/
├── metrics/
├── topics/
└── sessions/

Examples:

Requirements:

6.4 data/cache/

data/cache/ contains rebuildable derived data.

Target second-level layout:

data/cache/
├── parser/
├── previews/
├── vectors/
└── topics/

Requirements:

6.5 data/runtime/

data/runtime/ contains temporary runtime artifacts.

Target second-level layout:

data/runtime/
├── tmp/
├── locks/
└── sockets/

Requirements:

7. workspace/ Subtree Specification

7.1 Workspace as Independent Project Boundary

workspace/ MUST be treated as a first-class project root, not merely as a paper-subset helper.

Each workspace MAY contain:

Therefore, workspace/ MUST NOT be modeled only as a view over data/libraries/papers/.

7.2 Target Workspace Layout

The target layout for a user workspace is:

workspace/<name>/
├── workspace.yaml
├── refs/
│   ├── papers.json
│   ├── explore.json
│   └── toolref.json
├── notes/
├── drafts/
├── outputs/
├── runs/
└── .git/

This reference shape is not a rigid scaffold. Named workspaces remain user-owned project roots and MAY contain additional files or subdirectories beyond the examples above.

Requirements:

7.2.1 Minimal workspace.yaml Envelope

For the next design pass, the minimal additive workspace.yaml envelope SHOULD be:

schema_version: 1
name: turbulence-review
description: Drafting workspace for a turbulence review article
tags:
  - review
  - turbulence
mounts:
  explore: []
  toolref: []
outputs:
  default_dir: outputs/

Interpretation rules:

7.2.2 Manifest Validation and Normalization Rules

The minimal workspace.yaml envelope above SHOULD follow these validation and normalization rules:

7.3 Reserved Workspace Namespace

System-generated workspaces or workspace-like output trees SHOULD use a reserved namespace under workspace/.

Recommended form:

workspace/_system/

Examples:

Legacy compatibility outputs such as workspace/translation-ws/, workspace/figures/, or root-level files like workspace/output.docx MAY remain temporarily, but system-owned or cross-workspace outputs SHOULD converge under workspace/_system/. Only outputs that are explicitly scoped to one named workspace SHOULD prefer workspace/<name>/outputs/.

8. Decoupling Rules

8.1 Between Top-Level Runtime Trees

8.2 Inside the Python Package

8.3 Skills and Agent Surfaces

9. Multi-Agent Discovery and Registration Constraints

The following constraints are mandatory:

9.1 Canonical Skill Source

9.2 Host-Specific Discovery Paths

The following discovery surfaces MUST continue to work:

9.3 Migration Rule

Any refactor that changes the physical location or wrapper path of skills MUST update:

A directory-structure migration is not complete until those discovery surfaces have been verified to still work.

10. Compatibility Mapping for Refactor Planning

The current codebase still uses legacy paths. During migration, the following logical mapping SHOULD be adopted:

Current path                 Target logical location
data/papers/                 data/libraries/papers/
data/proceedings/            data/libraries/proceedings/
data/explore/                data/libraries/explore/
data/toolref/                data/libraries/toolref/
data/citation_styles/        data/libraries/citation_styles/
data/inbox*                  data/spool/*
data/pending/                data/spool/pending/
data/index.db                data/state/search/index.db
data/metrics.db              data/state/metrics/metrics.db
data/topic_model/            data/state/topics/ or data/cache/topics/, depending on rebuild policy
workspace/translation-ws/    workspace/_system/translation-bundles/
workspace/figures/           workspace/_system/figures/
workspace/output.*           workspace/_system/output/

This mapping is a migration target, not a requirement for an all-at-once rename.
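One way to make the mapping usable during an incremental migration is a lookup that translates legacy relative paths. The sketch below is illustrative: the names LEGACY_TO_TARGET and to_target are hypothetical, data/inbox* is read as "keep the directory name, move it under data/spool/", and the data/topic_model/ and workspace/output.* rows are left out because their targets depend on policy or filename decisions the mapping does not fix.

```python
from pathlib import PurePosixPath

# Rows of the mapping with a single, unambiguous target.
LEGACY_TO_TARGET = {
    "data/papers": "data/libraries/papers",
    "data/proceedings": "data/libraries/proceedings",
    "data/explore": "data/libraries/explore",
    "data/toolref": "data/libraries/toolref",
    "data/citation_styles": "data/libraries/citation_styles",
    "data/pending": "data/spool/pending",
    "data/index.db": "data/state/search/index.db",
    "data/metrics.db": "data/state/metrics/metrics.db",
    "workspace/translation-ws": "workspace/_system/translation-bundles",
    "workspace/figures": "workspace/_system/figures",
}


def to_target(legacy: str) -> str:
    """Translate a legacy relative path; unmapped paths come back unchanged."""
    path = PurePosixPath(legacy)
    parts = path.parts
    # data/inbox* -> data/spool/* (assumed: same name, new parent).
    if len(parts) >= 2 and parts[0] == "data" and parts[1].startswith("inbox"):
        return str(PurePosixPath("data/spool", *parts[1:]))
    for old, new in LEGACY_TO_TARGET.items():
        old_path = PurePosixPath(old)
        if path == old_path or old_path in path.parents:
            return str(PurePosixPath(new) / path.relative_to(old_path))
    return legacy
```

Matching on whole path components means a sibling such as data/papers_extra/ is not mistaken for data/papers/.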

11. Migration Constraints

The migration MUST be incremental.

Before any large directory move, ScholarAIO SHOULD first:

  1. centralize all runtime directory access through config accessors
  2. stop constructing sibling runtime paths with raw cfg._root / "data" / ... expressions in feature modules
  3. introduce compatibility shims or alias paths where needed
  4. update tests, agent wrappers, and host setup docs in the same change set
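Steps 1 to 3 can be pictured as a single accessor object that feature modules call instead of joining paths themselves. This is a minimal sketch under stated assumptions: the class name RuntimePaths, its property names, and the "use the target location once it exists, otherwise fall back to the legacy one" policy are hypothetical and not taken from the current codebase.

```python
from pathlib import Path


class RuntimePaths:
    """Single place that knows where runtime directories live."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def _prefer_target(self, target: str, legacy: str) -> Path:
        """Compatibility shim: target path if present, else legacy path."""
        target_path = self._root / target
        return target_path if target_path.exists() else self._root / legacy

    @property
    def papers_dir(self) -> Path:
        return self._prefer_target("data/libraries/papers", "data/papers")

    @property
    def index_db(self) -> Path:
        return self._prefer_target("data/state/search/index.db", "data/index.db")
```

With accessors like these, a later directory move changes one class rather than every module that previously built its own path from the instance root.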

The current codebase is not yet ready for an atomic layout flip. Therefore:

12. Non-Goals

This specification does not define:

Those belong in companion architecture and execution documents.

13. Immediate Governance Outcome

Until superseded by a later approved version, future refactors SHOULD treat this document as the governing directory-structure target for: