
ScholarAIO Migration Mechanism Specification

Status: Compatibility-window mechanism specification

Last Updated: 2026-04-24

Scope: implementation-facing migration control-plane specification for future runtime-layout upgrades.

2026-04-24 breaking cleanup note:

2026-04-23 implementation note:

1. Purpose

This document defines the minimum migration mechanisms ScholarAIO SHOULD implement before performing a real runtime-layout move for existing users.

This is not the directory-vision document and not the user-facing migration story document. Its job is narrower:

This document exists so that future implementation work does not jump directly from “new directory idea” to “move user files”.

2. Relationship to Other Migration Docs

This document is a companion to the existing migration documents:

This document fills the missing middle layer:

3. Agreed Product Decisions

The following product decisions are treated as settled assumptions for the first migration-capable implementation.

3.1 Offline Migration Only

The first real runtime-layout migration MUST be an offline migration.

Meaning:

Rationale:

3.2 One-Way Compatibility

Compatibility for the first migration generation MUST be one-way only.

Meaning:

Rationale:

3.3 No Large Automatic Move on Startup

Large runtime-layout migration MUST remain an explicit user operation.

Meaning:

Small additive metadata upgrades MAY happen automatically if they are local, reversible, and do not relocate user content.

3.4 Multi-Root Merge Is Out of Scope for V1

The first migration generation MAY assume a single canonical runtime root chosen by the user or by the existing config-resolution path.

Meaning:

This is an intentional simplification, not a claim that multiple roots never exist.

4. Control Metadata Root

ScholarAIO SHOULD reserve a hidden control directory at the runtime-instance root:

instance-root/
├── config.yaml
├── config.local.yaml
├── data/
├── workspace/
└── .scholaraio-control/

Recommended contents:

.scholaraio-control/
├── instance.json
├── migration.lock
└── migrations/
    └── <migration-id>/
        ├── plan.json
        ├── steps.jsonl
        ├── verify.json
        ├── rollback.json
        └── summary.md
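
The layout above could be captured in one small path helper so that no other module hard-codes these names. This is a minimal sketch, not the project's actual API: the class name `ControlPaths` and its attribute names are hypothetical, and only the directory and file names come from the tree above.

```python
from dataclasses import dataclass
from pathlib import Path

# Directory name taken from the layout above.
CONTROL_DIR_NAME = ".scholaraio-control"


@dataclass(frozen=True)
class ControlPaths:
    """Hypothetical helper that resolves control-plane paths for one instance root."""

    root: Path

    @property
    def control_dir(self) -> Path:
        return self.root / CONTROL_DIR_NAME

    @property
    def instance_file(self) -> Path:
        return self.control_dir / "instance.json"

    @property
    def lock_file(self) -> Path:
        return self.control_dir / "migration.lock"

    def journal_dir(self, migration_id: str) -> Path:
        # One journal directory per migration run.
        return self.control_dir / "migrations" / migration_id

    def journal_file(self, migration_id: str, name: str) -> Path:
        # name is one of plan.json, steps.jsonl, verify.json, rollback.json, summary.md
        return self.journal_dir(migration_id) / name
```

Keeping the names in one place means a later rename of the control directory touches a single constant.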

4.1 Why a Dedicated Hidden Control Directory

The control metadata SHOULD NOT be mixed into data/ or workspace/ because those trees are themselves migration targets.

It SHOULD NOT use a generic filename such as layout.json at the instance root, because layout.json already has an established meaning inside paper directories for MinerU-derived layout artifacts.

The control metadata SHOULD live at a stable root-level location that works in both:

5. Instance Metadata: instance.json

instance.json is the minimum durable record that tells ScholarAIO what kind of runtime root it is opening.
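
As an illustration of how a reader of this file might behave, the sketch below loads instance.json if it exists and returns `None` otherwise. The function name `load_instance_metadata` is hypothetical, and the sketch deliberately does not assume any particular field names, since the schema is defined elsewhere in this section.

```python
import json
from pathlib import Path
from typing import Optional


def load_instance_metadata(instance_root: Path) -> Optional[dict]:
    """Return the parsed instance.json, or None when the root has no such file.

    Hypothetical helper: how a missing file is interpreted is left to the caller.
    """
    path = instance_root / ".scholaraio-control" / "instance.json"
    if not path.is_file():
        return None
    with path.open("r", encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        # A list or scalar at top level cannot describe an instance.
        raise ValueError(f"{path} must contain a JSON object")
    return data
```

Returning `None` rather than raising keeps the "no metadata yet" case distinct from the "metadata is corrupt" case, which a caller would likely want to handle differently.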

5.1 Required Responsibilities

instance.json SHOULD answer four questions:

The exact schema MAY evolve, but the first version SHOULD include fields equivalent to:

Recommended meanings:

5.3 Startup Behavior

The startup rules SHOULD be:

5.4 File-Naming Constraint

The root-level layout marker MUST NOT be a plain layout.json.

Reason:

6. Migration Lock: migration.lock

migration.lock is the mechanism that turns “offline migration only” into an enforceable runtime rule.

6.1 Required Responsibilities

The lock MUST:

The lock file SHOULD record:

mode MAY distinguish states such as plan, run, or rollback if needed later. For V1, run is sufficient.
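
One possible way to create the lock is an exclusive file creation, so that two processes cannot both believe they hold it. This is a sketch under assumptions: apart from `mode`, the recorded fields (`migration_id`, `pid`, `started_at`) and the names `acquire_migration_lock` and `MigrationLockError` are hypothetical.

```python
import json
import os
import time
from pathlib import Path


class MigrationLockError(RuntimeError):
    """Raised when migration.lock already exists (hypothetical error type)."""


def acquire_migration_lock(control_dir: Path, migration_id: str, mode: str = "run") -> Path:
    """Create migration.lock exclusively and record who holds it."""
    lock_path = control_dir / "migration.lock"
    payload = {
        "mode": mode,                  # "run" is sufficient for V1
        "migration_id": migration_id,  # hypothetical field
        "pid": os.getpid(),            # hypothetical field
        "started_at": time.time(),     # hypothetical field
    }
    control_dir.mkdir(parents=True, exist_ok=True)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so at most
        # one caller can succeed in creating the lock.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError as exc:
        raise MigrationLockError(f"lock already held: {lock_path}") from exc
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push the lock contents to disk before migrating
    return lock_path
```

Exclusive creation is chosen here over "check, then write" because the check and the write would otherwise be two separate steps with a race between them.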

6.3 Command Gating Rule

While migration.lock exists, normal ScholarAIO commands SHOULD fail fast unless they belong to the migration/recovery surface.

For V1, the simplest rule is:

This is intentionally conservative. The goal is not to maximize availability during migration. The goal is to minimize accidental mixed-state access.
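
A conservative gate of this kind might look like the sketch below. It assumes, hypothetically, that the migration/recovery surface is identified by a top-level command name (`migrate`); the names `MIGRATION_SURFACE`, `InstanceLockedError`, and `check_command_allowed` are illustrative only.

```python
from pathlib import Path

# Hypothetical: top-level command names that belong to the migration/recovery surface.
MIGRATION_SURFACE = frozenset({"migrate"})


class InstanceLockedError(RuntimeError):
    """Raised when a normal command is attempted while migration.lock exists."""


def check_command_allowed(instance_root: Path, command: str) -> None:
    """Fail fast for non-migration commands while the lock is present."""
    lock_path = instance_root / ".scholaraio-control" / "migration.lock"
    if lock_path.exists() and command not in MIGRATION_SURFACE:
        raise InstanceLockedError(
            f"'{command}' is unavailable while a migration is in progress "
            f"(lock: {lock_path})"
        )
```

Calling this once at the command dispatcher, before any store is opened, keeps the gating rule in one place instead of scattering checks across commands.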

6.4 Stale Lock Handling

The first implementation SHOULD treat stale locks cautiously.

Minimum behavior:

The system MUST NOT silently delete a lock just because a timestamp looks old.
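
The rule above can be made concrete by separating "how old is the lock" from "may the lock be removed". In this sketch the age is informational only, and removal requires an explicit operator decision. Both function names and the `operator_confirmed` parameter are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Optional


def lock_age_seconds(lock_path: Path, now: Optional[float] = None) -> float:
    """Report the lock's age from its modification time. Informational only."""
    current = time.time() if now is None else now
    return max(0.0, current - lock_path.stat().st_mtime)


def clear_lock(lock_path: Path, *, operator_confirmed: bool) -> bool:
    """Remove the lock only on explicit operator confirmation.

    Age is deliberately not a parameter: an old timestamp alone
    never justifies deletion.
    """
    if not operator_confirmed:
        return False
    lock_path.unlink()
    return True
```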

7. Migration Journal

Each migration run SHOULD create a dedicated journal directory under:

.scholaraio-control/migrations/<migration-id>/

The journal is the durable record of what the migration attempted and what actually happened.

7.1 Required Responsibilities

The journal MUST make it possible to answer:

plan.json

The frozen migration plan for this run.

It SHOULD record at least:

steps.jsonl

An append-only execution log.

Each entry SHOULD include:

JSONL is preferred because it is easy to append safely and easy to inspect incrementally.
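
A minimal sketch of the append-only pattern follows. Each call writes exactly one JSON object per line and flushes it to disk, so a crash leaves earlier entries intact. The function names and the example entry fields (`step`, `status`) are hypothetical; this document's own field list takes precedence.

```python
import json
import os
from pathlib import Path


def append_step(journal_dir: Path, entry: dict) -> None:
    """Append one JSON object as a single line to steps.jsonl."""
    journal_dir.mkdir(parents=True, exist_ok=True)
    line = json.dumps(entry, sort_keys=True) + "\n"
    # Mode "a" only ever adds to the end; earlier entries are never rewritten.
    with open(journal_dir / "steps.jsonl", "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())


def read_steps(journal_dir: Path) -> list:
    """Read all entries back, skipping blank lines."""
    path = journal_dir / "steps.jsonl"
    if not path.exists():
        return []
    with open(path, "r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For example, `append_step(journal, {"step": "move-data", "status": "started"})` followed by a second call with `"status": "done"` yields two lines that can be inspected with any line-oriented tool while the migration is still running.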

verify.json

The structured verification result.

It SHOULD record:

rollback.json

The rollback recipe or rollback-relevant state.

This file does not require a perfect reverse operation graph in V1. It MUST at least preserve enough information to support deterministic recovery decisions.

summary.md

A human-readable report.

It SHOULD explain in plain language:

7.3 Journal Lifetime

The migration journal MUST survive until cleanup completes.

The journal MUST NOT be deleted immediately after a successful run.

8. Verification Contract

Verification is the final safety gate between “data moved” and “migration accepted”.

8.1 Minimum Verification Scope

The first implementation SHOULD verify at least:

8.2 Verification Philosophy

Verification SHOULD focus on user-visible system health, not just filesystem existence.

That means the verification target is not:

It is:

For compatible store-by-store migration runs, verification MAY additionally distinguish:

8.3 Success Rule

A migration MUST NOT be marked fully accepted until verification succeeds.
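
Expressed as code, the success rule is a gate that defaults to "not accepted". The sketch below treats a missing, unreadable, or non-passing verify.json the same way. The status value `"passed"` and the function name `is_migration_accepted` are assumptions; the actual status vocabulary is defined by the verify command.

```python
import json
from pathlib import Path

# Hypothetical status value that denotes a successful verification.
ACCEPTED_STATUS = "passed"


def is_migration_accepted(journal_dir: Path) -> bool:
    """Return True only when verify.json exists and reports success."""
    verify_path = journal_dir / "verify.json"
    if not verify_path.is_file():
        return False  # never verified, so never accepted
    try:
        result = json.loads(verify_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # an unreadable result is not a success
    return isinstance(result, dict) and result.get("status") == ACCEPTED_STATUS
```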

Implication:

Compatibility-window nuance:

9. Product Flow Requirements

9.1 Startup Flow

The desired startup behavior is:

  1. detect legacy-compatible layout
  2. continue working normally
  3. inform the user that migration is available
  4. do not perform the large move automatically

This matches the compatibility-first strategy already defined in the migration strategy document.
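
The four startup steps can be sketched as a function that only produces a notice and never touches user files. The detection rule used here (a root without instance.json counts as legacy-compatible) is an assumption for illustration, as are the function names.

```python
from pathlib import Path


def is_legacy_layout(instance_root: Path) -> bool:
    # Assumption: a root with no instance.json is treated as legacy-compatible.
    return not (instance_root / ".scholaraio-control" / "instance.json").is_file()


def startup_messages(instance_root: Path) -> list:
    """Return notices to show the user. Performs no writes and no moves."""
    messages = []
    if is_legacy_layout(instance_root):
        # Steps 2 and 3: keep working normally, but tell the user.
        messages.append(
            "A runtime-layout migration is available. "
            "Run `migrate plan` to preview it."
        )
    # Step 4: nothing is moved here; migration stays an explicit operation.
    return messages
```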

9.2 migrate plan

migrate plan SHOULD:

migrate plan MUST NOT mutate user data.

9.3 migrate run

migrate run SHOULD:

If migration fails mid-run, the runtime root MUST remain recoverable and MUST NOT be silently treated as fully migrated.

9.4 migrate verify

migrate verify SHOULD:

verify.json.status SHOULD support at least:

This command is especially useful when the user wants extra confidence before cleanup.

9.5 migrate cleanup

migrate cleanup SHOULD:

Cleanup MUST be a separate step from migration execution.

9.6 Rollback Surface

Whether rollback is fully automatic or partly operator-driven MAY vary in V1.

However, the implementation MUST support one of the following:

The system MUST NOT leave operators with only an unstructured partial filesystem and no durable record of what happened.

10. Safety Constraints for Planning and Inventory

10.1 Planning Must Be Non-Executing

migrate plan and inventory logic MUST NOT execute arbitrary user-provided code as part of discovery.
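
One way to keep discovery non-executing is to read only directory entries and classify them by name, without importing, evaluating, or opening anything. The sketch below does this for the top level of an instance root and reports unrecognized entries rather than dropping them. The known names come from the instance-root layout in section 4; the function name `inventory_top_level` and the result shape are hypothetical.

```python
import os
from pathlib import Path

# Entry names from the instance-root layout in section 4.
KNOWN_TOP_LEVEL = frozenset(
    {"config.yaml", "config.local.yaml", "data", "workspace", ".scholaraio-control"}
)


def inventory_top_level(instance_root: Path) -> dict:
    """Classify top-level entries by name only. Reads no file contents."""
    known, unknown = [], []
    for entry in sorted(os.scandir(instance_root), key=lambda e: e.name):
        if entry.name in KNOWN_TOP_LEVEL:
            known.append(entry.name)
        else:
            # Unknown entries are reported, never deleted or moved.
            unknown.append(entry.name)
    return {"known": known, "unknown": unknown}
```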

Implication:

10.2 Unknown Files Are Reported, Not Dropped

If planning encounters files that do not match a known schema, it SHOULD:

It MUST NOT:

10.3 Current Backup Command Is Not Sufficient as the Only Safety Net

Migration protection MUST NOT rely only on the current backup feature, because current backup defaults still focus on data/ rather than the full runtime root.

Migration needs its own lock, journal, and verification path even if backup integration is added later.

11. Non-Goals for the First Implementation

The first implementation does not need to solve all future migration concerns.

It MAY explicitly defer:

The V1 goal is narrower:

12. Minimum Deliverables Before Real Physical Migration

Before ScholarAIO performs a real runtime-layout move for existing users, the codebase SHOULD have all of the following:

Without these pieces, ScholarAIO would still be doing a refactor, not a user-safe upgrade.