S2M2 Curriculum Learning Training Architecture

1. Design Goals

When a stereo matching model must learn to use polarization (an auxiliary signal) to handle transparent objects, the training strategy determines whether the model actually uses that signal. The Curriculum Learning plan is designed to make the polarization signal an indispensable cue.

1.1 Problem: Domain Adaptation kills the value of the auxiliary signal

If the training path is “pretrained weights + domain adaptation”:

After domain adaptation, the base model has learned to “handle transparent objects on this dataset”, and the error over transparent regions drops substantially.
As a result the base model is already good enough; it does not need polarization to help.
pol_diff tells the model “this is a transparent object”, but the base model already knows how to handle transparent objects.
Polarization becomes a redundant signal.

Figure: Flow showing how Domain Adaptation kills polarization value

Core insight: “can be used” is not the same as “needs to be used”. The pol_diff signal is available, but if the base model can already handle transparent objects, the model has no incentive to learn to use it. To force reliance on polarization, the model must first learn the basics without ever seeing transparent objects, and transparent objects must then be introduced as difficult samples. This is the design goal of Curriculum Learning.
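One way to make the “can be used” versus “needs to be used” distinction measurable is to compare the error on transparent regions with the polarization cue intact and with it removed at test time. The following is a minimal sketch under stated assumptions, not part of the plan itself: the `Predictor` interface, the `(cue, target)` sample layout, zeroing as the ablation, and the toy models are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical sample layout: (polarization_cue, ground_truth_disparity).
Sample = Tuple[float, float]
# Hypothetical model interface: maps a polarization cue to a predicted disparity.
Predictor = Callable[[float], float]


def mean_abs_error(predict: Predictor, samples: Sequence[Sample], ablate: bool) -> float:
    """Mean absolute disparity error; with ablate=True the cue is replaced by 0.0."""
    total = 0.0
    for cue, target in samples:
        total += abs(predict(0.0 if ablate else cue) - target)
    return total / len(samples)


def reliance_gap(predict: Predictor, samples: Sequence[Sample]) -> float:
    """Error increase when the polarization cue is removed at test time."""
    ablated = mean_abs_error(predict, samples, ablate=True)
    intact = mean_abs_error(predict, samples, ablate=False)
    return ablated - intact


# Toy transparent-region samples whose target depends on the cue.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def ignores_cue(cue: float) -> float:
    # Never reads the cue: the signal exists but is not used.
    return 4.0


def uses_cue(cue: float) -> float:
    # Output depends on the cue: the signal is needed.
    return 2.0 * cue


gap_ignoring = reliance_gap(ignores_cue, samples)
gap_using = reliance_gap(uses_cue, samples)
```

A gap near zero suggests the model treats polarization as redundant; a large gap suggests it relies on the cue. The threshold separating the two would have to be chosen per experiment.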

2. Architecture: Two Stages of Curriculum Learning

Figure: Two-stage Curriculum Learning architecture

3. Components and Stage Design

3.1 Stage 1 — Basic Stereo Learning

Item	Setting
Data	Generic stereo dataset / scenes without transparent objects
Goal	Learn basic stereo matching
Model state	The model does not know what a “transparent object” is

Stage 1 trains from scratch (not pretrained) to build only generic stereo matching ability, deliberately avoiding any contact with transparent-object samples.

3.2 Stage 2 — Polarization Glass Learning

Item	Setting
Data	Polarized transparent-object scenes
Goal	Learn to use polarization to handle transparent objects
Key	Transparent objects are brand-new difficult samples; polarization is the only new cue

Because the model never sees transparent objects in Stage 1, they are brand-new difficult samples in Stage 2; the model is forced to rely on polarization to handle them rather than memorizing visual features.

3.3 Training-Order Logic of Curriculum Learning

Figure: Curriculum Learning training-order logic

Training from scratch (rather than pretrained + domain adaptation) is intended to prevent the base model from learning to handle transparent objects on its own before polarization is introduced, ensuring that polarization is indispensable in Stage 2.
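The two-stage order can be written down as a small schedule plus a scene filter. This is an illustrative sketch only: the `Scene` and `Stage` classes, their field names, and the `init` labels are hypothetical, and reading Stage 2 as continuing from the Stage 1 weights is an assumption about the curriculum rather than something stated explicitly above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scene:
    name: str
    transparent_objects: bool  # scene contains transparent objects
    polarization: bool         # scene provides a polarized stereo pair


@dataclass(frozen=True)
class Stage:
    name: str
    init: str                  # where the weights come from (hypothetical labels)
    transparent_objects: bool  # whether this stage's scenes contain transparent objects
    polarization: bool         # whether this stage's input carries polarization


CURRICULUM = [
    # Stage 1: from scratch, generic stereo, no transparent objects, no polarization.
    Stage("stage1_basic_stereo", init="scratch",
          transparent_objects=False, polarization=False),
    # Stage 2: assumed to continue from Stage 1; polarized transparent-object scenes.
    Stage("stage2_polarization_glass", init="stage1_weights",
          transparent_objects=True, polarization=True),
]


def select_scenes(stage: Stage, scenes: List[Scene]) -> List[Scene]:
    """Keep only scenes whose properties match what the stage is allowed to see."""
    selected = []
    for scene in scenes:
        if scene.transparent_objects != stage.transparent_objects:
            continue
        if scene.polarization != stage.polarization:
            continue
        selected.append(scene)
    return selected


scenes = [
    Scene("street", transparent_objects=False, polarization=False),
    Scene("office_glass_rgb", transparent_objects=True, polarization=False),
    Scene("lab_glass_pol", transparent_objects=True, polarization=True),
]
stage1_names = [s.name for s in select_scenes(CURRICULUM[0], scenes)]
stage2_names = [s.name for s in select_scenes(CURRICULUM[1], scenes)]
```

The filter is deliberately strict: a transparent-object scene without polarization is excluded from both stages, so the model never meets transparent objects without the polarization cue.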

4. Data Efficiency Ablation

Stage 2 is tested with different data volumes to answer “how much polarization data is worth collecting”:

Setting	Stage 2 scene count
Config 1	1000 scenes
Config 2	3000 scenes
Config 3	6000 scenes

The purpose of this ablation is to quantify the return on investment of polarization-data collection.
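One way to build the three configurations is to draw nested subsets from a single shuffled pool, so that differences between configurations come from data volume rather than from which scenes happened to be drawn. This is a sketch under assumptions: the document does not say whether the subsets are nested, and the scene ID format, the pool size of 6000, and the seed are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def nested_subsets(scene_ids: Sequence[str], sizes: Sequence[int], seed: int = 0) -> Dict[int, List[str]]:
    """Return one subset per size; each smaller subset is a prefix of the larger ones."""
    if max(sizes) > len(scene_ids):
        raise ValueError("requested more scenes than the pool contains")
    shuffled = list(scene_ids)
    # A fixed seed keeps the subsets reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return {size: shuffled[:size] for size in sorted(sizes)}


# Hypothetical pool of Stage 2 scene IDs.
pool = [f"scene_{i:04d}" for i in range(6000)]
subsets = nested_subsets(pool, sizes=(1000, 3000, 6000))
```

With nested subsets, Config 2 contains every scene of Config 1, and Config 3 contains every scene of Config 2.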

5. Tensor / Data Dimensions

Stage	Input data	Input format
Stage 1	Generic stereo dataset	Generic stereo pair (no transparent objects, no polarization)
Stage 2	Polarized transparent-object scenes	Polarized stereo pair (I∥, I⊥)
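The table lists input formats but not dimensions. As a rough illustration, assuming each of I∥ and I⊥ is a 3-channel image and that the two are concatenated channel-wise for one view (neither assumption is confirmed here), the Stage 2 input for that view has 6 channels. The `Image` layout and the helper names below are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]
Image = List[List[Pixel]]  # rows of pixels; each pixel is a tuple of channel values


def stack_polarized(i_par: Image, i_perp: Image) -> Image:
    """Concatenate the channels of I_par and I_perp pixel by pixel."""
    same_height = len(i_par) == len(i_perp)
    same_width = all(len(a) == len(b) for a, b in zip(i_par, i_perp))
    if not (same_height and same_width):
        raise ValueError("I_par and I_perp must have the same height and width")
    return [
        [p + q for p, q in zip(row_par, row_perp)]  # tuple concatenation: 3 + 3 channels
        for row_par, row_perp in zip(i_par, i_perp)
    ]


def shape(image: Image) -> Tuple[int, int, int]:
    """Return (height, width, channels)."""
    return len(image), len(image[0]), len(image[0][0])


# Toy 2x2 images with 3 channels each.
i_par = [[(0.9, 0.8, 0.7), (0.6, 0.5, 0.4)], [(0.3, 0.2, 0.1), (0.0, 0.1, 0.2)]]
i_perp = [[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)], [(0.7, 0.8, 0.9), (1.0, 0.9, 0.8)]]
stacked = stack_polarized(i_par, i_perp)
```

Under these assumptions a Stage 1 view stays at 3 channels, while a Stage 2 view carries 6.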

6. Polarization Injection Points

Curriculum Learning is a “training-data curriculum” design and does not itself add Pol injection points:

Stage 1: input contains no polarization signal; the model only learns basic stereo.
Stage 2: the polarization signal enters the existing Pol injection point (6-channel input / refinement injection, etc.) together with the data; it then becomes the only new cue for transparent-object scenes, forcing the model to learn to use it.

7. Design Decisions and Rationale

Decision	Rationale
Adopt Curriculum Learning	Pretrained + domain adaptation makes polarization redundant, so curriculum learning is used instead
Stage 1 from scratch, no transparent objects	Ensures transparent objects are brand-new difficult samples in Stage 2 and polarization is the only new cue
Use a generic stereo dataset for Stage 1	Build basic stereo matching ability
Introduce polarized transparent objects in Stage 2	Transparent objects are difficult samples that force the model to rely on polarization rather than memorize visual features
Data Efficiency Ablation (1k / 3k / 6k)	Quantify the return on investment of polarization-data collection

8. Highlights

Forces the model to rely on polarization through curriculum design: first learn basic stereo on data without transparent objects, then introduce transparent objects so that polarization becomes the only new cue for the difficult samples.
Avoids the auxiliary-signal redundancy trap: by not using pretrained + domain adaptation, the base model is prevented from learning to handle transparent objects on its own before polarization is introduced.
“Can be used” is not “needs to be used”: explicitly distinguishes “the signal exists” from “the signal is learned and used”, and uses training order to ensure the latter.
Quantifies data return on investment: with three polarization-data volumes (1k / 3k / 6k) as an ablation, “how much polarization data is needed” becomes a measurable basis for decisions.