Blueprint · 2026

S2M2 Curriculum Learning Training Architecture

This document describes the Curriculum Learning plan for training S2M2 from scratch (Stage 1 basic stereo / Stage 2 polarized transparent objects) and its rationale. It focuses on the architecture and data strategy of curriculum learning and does not cover experimental results or performance numbers.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

When a stereo matching model must learn to use polarization, an auxiliary signal, to handle transparent objects, the training strategy itself determines whether the model actually uses that signal. The Curriculum Learning plan is designed to make polarization an indispensable cue.

1.1 Problem: Domain Adaptation kills the value of the auxiliary signal

If the training path is “pretrained weights + domain adaptation”:

  • After domain adaptation, the base model has learned to “handle transparent objects on this dataset”, and the error over transparent regions drops substantially.
  • As a result the base model is already good enough; it does not need polarization to help.
  • pol_diff tells the model “this is a transparent object”, but the base model already knows how to handle transparent objects.
  • Polarization becomes a redundant signal.

Figure: Flow showing how Domain Adaptation kills the value of polarization

Core insight: “can be used” is not the same as “needs to be used”. The pol_diff signal is available, but if the base model can already handle transparent objects, it has no incentive to learn the signal and will ignore it. To force reliance on polarization, the model must first learn the basics without ever seeing transparent objects; transparent objects are then introduced as difficult samples. This ordering is the design goal of Curriculum Learning.
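One way to make the gap between “can be used” and “needs to be used” measurable is an input ablation: evaluate the same trained model on transparent-object pixels twice, once with the real polarization input and once with that input neutralized, then compare the two errors. The sketch below is not part of the plan described here. The function name, the relative-increase formula, and the idea of neutralizing the input are assumptions made for illustration only.

```python
def polarization_reliance(err_with_pol: float, err_without_pol: float) -> float:
    """Relative increase in transparent-region error when polarization is removed.

    Both arguments are mean errors over transparent-object pixels, measured on
    the same trained model (hypothetical metric, e.g. mean disparity error).
    A value near 0 suggests the signal is redundant for the model; a large
    positive value suggests the model depends on it.
    """
    if err_with_pol <= 0:
        raise ValueError("err_with_pol must be positive")
    if err_without_pol < 0:
        raise ValueError("err_without_pol must be non-negative")
    return (err_without_pol - err_with_pol) / err_with_pol
```

Under this reading, the redundancy described in section 1.1 would show up as a reliance value close to zero after domain adaptation.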


2. Architecture: Two Stages of Curriculum Learning

Figure: Two-stage Curriculum Learning architecture


3. Components and Stage Design

3.1 Stage 1 — Basic Stereo Learning

Item         Setting
Data         Generic stereo dataset / scenes without transparent objects
Goal         Learn basic stereo matching
Model state  The model does not know what a “transparent object” is

Stage 1 trains from scratch (not pretrained) to build only generic stereo matching ability, deliberately avoiding any contact with transparent-object samples.

3.2 Stage 2 — Polarization Glass Learning

Item  Setting
Data  Polarized transparent-object scenes
Goal  Learn to use polarization to handle transparent objects
Key   Transparent objects are brand-new difficult samples; polarization is the only new cue

Since the model never sees transparent objects in Stage 1, they are brand-new difficult samples in Stage 2; the model is forced to rely on polarization to handle them rather than memorizing visual features.

3.3 Training-Order Logic of Curriculum Learning

Figure: Curriculum Learning training-order logic

Training from scratch (rather than pretrained + domain adaptation) is intended to prevent the base model from learning to handle transparent objects on its own before polarization is introduced, ensuring that polarization is indispensable in Stage 2.
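The ordering constraints above can be written down as a small, checkable schedule. This is a minimal sketch under stated assumptions: `StageConfig`, `build_curriculum`, `validate_curriculum`, `run_curriculum`, the dataset labels, and the `train_stage` callback are hypothetical names, not part of S2M2 or any library. It encodes only what the text states: Stage 1 starts from scratch without transparent objects or polarization, and Stage 2 continues from Stage 1 with polarized transparent-object scenes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class StageConfig:
    name: str
    dataset: str                    # hypothetical dataset label
    has_transparent_objects: bool
    has_polarization: bool
    init_from: Optional[str]        # None means random init (from scratch)


def build_curriculum() -> List[StageConfig]:
    stage1 = StageConfig(
        name="stage1_basic_stereo",
        dataset="generic_stereo",
        has_transparent_objects=False,
        has_polarization=False,
        init_from=None,
    )
    stage2 = StageConfig(
        name="stage2_polarized_glass",
        dataset="polarized_transparent_scenes",
        has_transparent_objects=True,
        has_polarization=True,
        init_from=stage1.name,
    )
    return [stage1, stage2]


def validate_curriculum(stages: List[StageConfig]) -> None:
    """Check the ordering constraints the curriculum depends on."""
    if not stages:
        raise ValueError("curriculum needs at least one stage")
    if stages[0].init_from is not None:
        raise ValueError("first stage must start from scratch, not pretrained weights")
    seen = set()
    for stage in stages:
        # Transparent objects may only appear together with polarization,
        # so the model never learns them from visual features alone.
        if stage.has_transparent_objects and not stage.has_polarization:
            raise ValueError(f"{stage.name}: transparent objects without polarization")
        if stage.init_from is not None and stage.init_from not in seen:
            raise ValueError(f"{stage.name}: init_from must name an earlier stage")
        seen.add(stage.name)


def run_curriculum(
    stages: List[StageConfig],
    train_stage: Callable[[StageConfig, Optional[object]], object],
) -> object:
    """Train the stages in order, passing each stage's weights to the next."""
    validate_curriculum(stages)
    weights = None  # Stage 1 receives None, i.e. no pretrained weights
    for stage in stages:
        weights = train_stage(stage, weights)
    return weights
```

A caller would supply its own `train_stage(stage, weights)` function; the validator rejects any schedule that shows transparent objects before polarization is available.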


4. Data Efficiency Ablation

Stage 2 is run at three different data volumes to answer “how much polarization data is worth collecting”:

Setting   Stage 2 scene count
Config 1  1000 scenes
Config 2  3000 scenes
Config 3  6000 scenes

The purpose of this ablation is to quantify the return on investment of polarization-data collection.
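A minimal sketch of how the three configurations could be drawn from one pool of Stage 2 scenes. The source specifies only the three scene counts. Making the subsets nested (the 1000-scene set contained in the 3000-scene set, and so on) is an assumption chosen here so that differences between configurations reflect added data rather than a different sample. The function name and scene identifiers are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def nested_ablation_subsets(
    scene_ids: Sequence[str],
    sizes: Sequence[int] = (1000, 3000, 6000),
    seed: int = 0,
) -> Dict[int, List[str]]:
    """Return one scene list per size, each a prefix of the same shuffled order."""
    if len(set(scene_ids)) != len(scene_ids):
        raise ValueError("scene_ids must be unique")
    if not sizes or max(sizes) > len(scene_ids):
        raise ValueError("need at least one size, none larger than the scene pool")
    shuffled = list(scene_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return {size: shuffled[:size] for size in sorted(sizes)}
```

Because every subset is a prefix of the same shuffled list, rerunning with the same seed reproduces the same three configurations.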


5. Tensor / Data Dimensions

Stage    Input data                           Input format
Stage 1  Generic stereo dataset               Generic stereo pair (no transparent objects, no polarization)
Stage 2  Polarized transparent-object scenes  Polarized stereo pair (I∥, I⊥)

6. Polarization Injection Points

Curriculum Learning is a “training-data curriculum” design and does not itself add Pol injection points:

  • Stage 1: input contains no polarization signal; the model only learns basic stereo.
  • Stage 2: the polarization signal arrives with the data through the existing Pol injection points (6-channel input, refinement injection, etc.). It is the only new cue in transparent-object scenes, which forces the model to learn to use it.
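To make the Stage 2 input format concrete, here is a dependency-free sketch of one possible reading of the “6-channel input”: the two polarization images I∥ and I⊥ of a single view, each assumed to have 3 channels, stacked along the channel axis. That reading, the definition of `pol_diff` as the per-pixel difference I∥ − I⊥, and both function names are assumptions; the source does not define them. How Stage 1 inputs, which carry no polarization, map onto the same layout is not specified and is left out. Nested lists stand in for the tensors a real pipeline would use.

```python
from typing import List, Tuple

Image = List[List[List[float]]]  # channels x height x width, as nested lists


def _shape(img: Image) -> Tuple[int, int, int]:
    return (len(img), len(img[0]), len(img[0][0]))


def pack_six_channel(i_par: Image, i_perp: Image) -> Image:
    """Stack I_par and I_perp along the channel axis (3 + 3 = 6 channels)."""
    if _shape(i_par) != _shape(i_perp):
        raise ValueError("I_par and I_perp must have the same shape")
    if len(i_par) != 3:
        raise ValueError("expected 3-channel images (assumption of this sketch)")
    return i_par + i_perp  # list concatenation acts on the channel axis


def pol_diff(i_par: Image, i_perp: Image) -> Image:
    """Per-pixel difference I_par - I_perp (assumed definition of pol_diff)."""
    if _shape(i_par) != _shape(i_perp):
        raise ValueError("I_par and I_perp must have the same shape")
    return [
        [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(i_par, i_perp)
    ]
```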

7. Design Decisions and Rationale

Decision                                            Rationale
Adopt Curriculum Learning                           Pretrained weights + domain adaptation make polarization redundant; curriculum learning avoids this
Stage 1 from scratch, no transparent objects        Ensures transparent objects are brand-new difficult samples in Stage 2 and polarization is the only new cue
Use a generic stereo dataset for Stage 1            Build basic stereo matching ability
Introduce polarized transparent objects in Stage 2  Transparent objects are difficult samples that force the model to rely on polarization rather than memorize visual features
Data Efficiency Ablation (1k / 3k / 6k)             Quantify the return on investment of polarization-data collection

8. Highlights

  • Forces the model to rely on polarization through curriculum design: first learn basic stereo on data without transparent objects, then introduce transparent objects so that polarization becomes the only new cue for the difficult samples.
  • Avoids the auxiliary-signal redundancy trap: by not using pretrained + domain adaptation, the base model is prevented from learning to handle transparent objects on its own before polarization is introduced.
  • “Can be used” is not “needs to be used”: explicitly distinguishes “the signal exists” from “the signal is learned and used”, and uses training order to ensure the latter.
  • Quantifies data return on investment: an ablation over three polarization-data volumes (1k / 3k / 6k) turns “how much polarization data is needed” into a measurable basis for decisions.
