Blueprint · 2026

S2M2 Three-Stage Training Architecture

This document describes the design and rationale of the three-stage training plan (Stage A / B / C) for the S2M2 Polarization-Aware model. It focuses on the architectural design of the training pipeline and does not cover experimental results or performance numbers.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

When a stereo matching model is composed of “base model + auxiliary correction module”, joint training from the start runs into a fundamental training-order problem. The three-stage training plan is designed to solve precisely this problem.

1.1 Problem: the correction task is impossible

If the base model has never seen the target domain (e.g., the pretrained weights have never seen synthetic transparent-object data), training the correction module under such conditions produces:

[Figure: flow showing why the correction task is impossible]

Root cause: the upper bound of the correction is far smaller than the base model’s error, so even a perfect correction can fix only a small portion of that error. The network discovers that “always outputting the upper bound” is a local optimum. Once the output sits in the saturated region of tanh, the gradient is near zero, so the network cannot go on to learn spatial structure.

Core constraint: training the correction module while the base model has no understanding of the target domain makes the task itself impossible. The base model’s error must first be pushed below the correction’s upper bound before the correction task becomes feasible.
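The saturation argument can be checked numerically. The sketch below assumes the correction has the form max_correction * tanh(raw), which is consistent with the tanh bound described above but is not spelled out in this document. The function names and the example numbers are illustrative, not measured values.

```python
import math


def bounded_correction(raw: float, max_correction: float) -> float:
    """Correction bounded to (-max_correction, +max_correction) by tanh (assumed form)."""
    return max_correction * math.tanh(raw)


def correction_grad(raw: float, max_correction: float) -> float:
    """Derivative of the bounded correction with respect to raw."""
    return max_correction * (1.0 - math.tanh(raw) ** 2)


def best_residual(base_error: float, max_correction: float) -> float:
    """Error left over after the largest correction the bound allows."""
    return max(0.0, abs(base_error) - max_correction)


# Hypothetical numbers, for illustration only.
max_correction = 2.0  # pixels
base_error = 20.0     # pixels

for raw in (0.0, 2.0, 5.0, 10.0):
    # The correction approaches the bound while its gradient shrinks toward zero.
    print(raw, bounded_correction(raw, max_correction), correction_grad(raw, max_correction))

print("residual after best correction:", best_residual(base_error, max_correction))
```

With a base error ten times the bound, most of the error remains even at full correction, and the gradient that would let the network refine its output has almost vanished by the time it gets there.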


2. Architecture: Three-Stage Training Plan

[Figure: three-stage training plan, Stage A/B/C flow]


3. Three-Stage Components and Design

3.1 Stage A — Baseline Domain Adaptation

Item | Setting
Goal | Let the base model first adapt to the target data domain
Input | Pretrained S2M2 weights
Training scope | Train only the base model; no polarization module
Expected effect | Error over transparent regions drops substantially below the correction’s upper bound
Key role | Establishes a reasonable foundation so the subsequent correction task becomes feasible

3.2 Stage B — Polarization Only Learning

Item | Setting
Goal | Let the correction module learn the pol_diff -> disparity correction mapping
Input | Stage A checkpoint
Training scope | Freeze the base model; train only the correction module
Warp mode | GT warp (ensures pol_diff quality)
Key role | base_err is already below max_correction, so the task is feasible

Freezing the base model also closes off a shortcut: if the base were trainable, the optimizer could lower the loss by adjusting the base’s disparity directly, and the correction module would never be forced to learn from pol_diff.

3.3 Stage C — Finetune Integration

Item | Setting
Goal | Joint fine-tuning so the base and correction work together
Input | Stage B checkpoint
Training scope | Everything unfrozen; joint fine-tune
Warp mode | Switch to pred warp (simulates real inference)
Key role | Both modules are already pretrained, so joint tuning does not shortcut

4. Why Three Stages Solve the Problem

Problem | How the three-stage plan handles it
Base error too large; correction cannot move it | Stage A first pushes the base error below the correction’s upper bound
Correction does not learn (task impossible) | Stage B trains the correction under conditions where the task is feasible
Shortcut problem (base steals credit) | Stage B freezes the base, forcing the correction to learn from pol_diff
Joint training collapse | By Stage C both sides are pretrained, so joint tuning is stable

The key feasibility constraint: before training, confirm max_correction >= base_error; otherwise the network cannot solve the task and will converge to a constant output. The role of Stage A is exactly to push base_error below max_correction.
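The feasibility check can be run as a small script before Stage B. This is a minimal sketch under stated assumptions: base_error is taken as the mean absolute disparity error over transparent-region pixels (the document does not specify the aggregation), the saturation threshold of 3.0 on the raw pre-tanh output is a hypothetical choice, and all function names are illustrative.

```python
from typing import Sequence


def masked_mean_abs_error(pred: Sequence[float], gt: Sequence[float],
                          mask: Sequence[bool]) -> float:
    """Mean absolute disparity error over pixels where mask is True."""
    errors = [abs(p - g) for p, g, m in zip(pred, gt, mask) if m]
    if not errors:
        raise ValueError("mask selects no pixels")
    return sum(errors) / len(errors)


def is_feasible(base_error: float, max_correction: float) -> bool:
    """Feasibility criterion from the text: max_correction >= base_error."""
    return max_correction >= base_error


def saturation_fraction(raw_outputs: Sequence[float], threshold: float = 3.0) -> float:
    """Fraction of raw (pre-tanh) outputs with |raw| above threshold.

    tanh(3) is about 0.995, so values past the default threshold are
    effectively pinned at the bound. The threshold is an assumed value.
    """
    if not raw_outputs:
        raise ValueError("no raw outputs given")
    return sum(1 for r in raw_outputs if abs(r) > threshold) / len(raw_outputs)


# Toy values for illustration only.
pred = [10.0, 12.0, 30.0]
gt = [10.5, 11.0, 20.0]
transparent = [True, True, False]

base_error = masked_mean_abs_error(pred, gt, transparent)
print("base_error:", base_error, "feasible:", is_feasible(base_error, max_correction=2.0))
print("saturated fraction:", saturation_fraction([0.1, -4.0, 5.0, 2.0]))
```

If the check fails, the text above implies more Stage A training is needed before the correction module is trained. A rising saturated fraction during Stage B would be the matching warning sign at training time.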


5. Tensor / Training State Dimensions

Stage | Trainable modules | Frozen modules | Source of warp disparity
Stage A | base model (S2M2) | none | not applicable (no polarization module)
Stage B | correction module | base model | GT disparity
Stage C | base model + correction module | none | predicted disparity
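The per-stage training state in the table above can also be written as data, which makes it easy to drive a training script from one place. The sketch below is an assumed encoding: StageConfig, STAGES, requires_grad, and the module labels "base" and "correction" are hypothetical names, not part of any S2M2 code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StageConfig:
    name: str
    init_from: str                # weights the stage starts from
    trainable: Tuple[str, ...]    # modules that receive gradient updates
    frozen: Tuple[str, ...]       # modules present but not updated
    warp_source: Optional[str]    # "gt", "pred", or None when no warp is computed


STAGES = (
    # Stage A: base only; the polarization/correction module is not enabled.
    StageConfig("A", "pretrained S2M2 weights", ("base",), (), None),
    # Stage B: base frozen, correction trained with GT warp.
    StageConfig("B", "Stage A checkpoint", ("correction",), ("base",), "gt"),
    # Stage C: everything unfrozen, pred warp as at inference.
    StageConfig("C", "Stage B checkpoint", ("base", "correction"), (), "pred"),
)


def requires_grad(stage: StageConfig, module: str) -> bool:
    """True if the named module should be updated in this stage."""
    return module in stage.trainable


for stage in STAGES:
    print(stage.name, stage.trainable, stage.frozen, stage.warp_source)
```

In a real training loop, the value returned by requires_grad would typically be applied to each parameter of the corresponding module before the optimizer is built.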

6. Polarization Injection Points

Polarization information enters the model at a single injection point: the correction module, via the warp-based pol_diff. The three stages are a “training pipeline design”; they do not change the injection point itself, but progressively bring the modules around it into a trainable state:

  • Stage A: the correction module that hosts the injection point is not yet enabled.
  • Stage B: the injection point is enabled; pol_diff is computed with GT warp; only the correction module is trained.
  • Stage C: the injection point is enabled; pol_diff is computed with pred warp; the full model is jointly trained.
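The difference between GT warp and pred warp can be illustrated on a single image row. This is a sketch under several assumptions that the document does not state: rectified stereo with the left view as reference, so left pixel x matches right pixel x - d; a single-channel polarization feature per pixel; and pol_diff defined as the left value minus the warped right value. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def warp_row(right_row: Sequence[float], disparity: Sequence[float]) -> List[float]:
    """Sample right_row at x - d with linear interpolation, clamped to the borders."""
    width = len(right_row)
    out = []
    for x, d in enumerate(disparity):
        src = min(max(x - d, 0.0), width - 1.0)
        x0 = int(math.floor(src))
        x1 = min(x0 + 1, width - 1)
        w = src - x0
        out.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return out


def pol_diff_row(left_row: Sequence[float], right_row: Sequence[float],
                 disparity: Sequence[float]) -> List[float]:
    """Assumed definition: left polarization value minus warped right value."""
    warped = warp_row(right_row, disparity)
    return [l - r for l, r in zip(left_row, warped)]


def select_warp_disparity(stage: str, gt_disp: Sequence[float],
                          pred_disp: Sequence[float]) -> Sequence[float]:
    """Stage B warps with GT disparity, Stage C with predicted disparity."""
    if stage == "B":
        return gt_disp
    if stage == "C":
        return pred_disp
    raise ValueError("Stage A has no polarization module, so no warp is computed")


# Toy row: the left view is the right view shifted by a true disparity of 2.
right = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
left = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
gt_disp = [2.0] * 6
pred_disp = [1.0] * 6  # a deliberately wrong prediction

print("Stage B:", pol_diff_row(left, right, select_warp_disparity("B", gt_disp, pred_disp)))
print("Stage C:", pol_diff_row(left, right, select_warp_disparity("C", gt_disp, pred_disp)))
```

In this toy row the GT warp aligns the two views exactly, so any remaining pol_diff would come from the scene itself. With the wrong predicted disparity, misalignment leaks into pol_diff, which is the condition Stage C exposes the correction module to.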

7. Design Decisions and Rationale

Decision | Rationale
Add Stage A (Domain Adaptation) | Pretrained weights have not seen the target domain; base_error must first be pushed below max_correction
Freeze base model in Stage B | Avoids the base taking a shortcut and stealing credit; forces correction to learn from pol_diff
GT warp in Stage B | Ensures pol_diff quality so the correction learns semantics under perfect alignment
Unfreeze everything in Stage C joint fine-tune | Both modules are pretrained, so joint fine-tuning does not shortcut
Switch to pred warp in Stage C | Simulates the real inference setting
Three-stage as a general pattern | Domain Adaptation -> Module-specific Learning -> Joint Fine-tune applies to any “base + auxiliary module” architecture

8. Highlights

  • Breaks an impossible task through training order: first lower the base error, then train the correction, so the correction is not forced to converge to a constant output while the base has no understanding of the domain.
  • Explicit feasibility constraint: before training, use max_correction >= base_error as the task-feasibility criterion, and treat saturation of the raw (pre-tanh) output as a danger signal.
  • Freezing the base prevents the shortcut: Stage B freezes the base, forcing the correction to truly learn from pol_diff rather than letting the base take the shortcut and steal credit.
  • Pretrain each side before joint fine-tuning: by Stage C both modules are in place, avoiding the situation where one side has not yet learned and is overwhelmed by the other.
  • Transferable general training pattern: Domain Adaptation -> Module-specific Learning -> Joint Fine-tune applies to any two-part “base + auxiliary module” architecture.
