1. Design Goals
When a stereo matching model must learn to use polarization (an auxiliary signal) to handle transparent objects, the training strategy itself determines whether the model actually uses that signal. The Curriculum Learning plan is designed to make polarization an indispensable cue rather than an optional one.
1.1 Problem: Domain Adaptation kills the value of the auxiliary signal
If the training path is “pretrained weights + domain adaptation”:
- After domain adaptation, the base model has learned to “handle transparent objects on this dataset”, and the error over transparent regions drops substantially.
- As a result the base model is already good enough; it does not need polarization to help.
- pol_diff tells the model “this is a transparent object”, but the base model already knows how to handle transparent objects.
- Polarization becomes a redundant signal.
Core insight: “can be used” is not the same as “needs to be used”. The pol_diff signal exists, but if the base model can already handle transparent objects, it has little incentive to learn and use this signal. To force the model to rely on polarization, it must first learn the basics without ever seeing transparent objects, and transparent objects must then be introduced as difficult samples. This is exactly the design goal of Curriculum Learning.
2. Architecture: Two Stages of Curriculum Learning
3. Components and Stage Design
3.1 Stage 1 — Basic Stereo Learning
| Item | Setting |
|---|---|
| Data | Generic stereo dataset / scenes without transparent objects |
| Goal | Learn basic stereo matching |
| Model state | The model does not know what a “transparent object” is |
Stage 1 trains from scratch (not pretrained) to build only generic stereo matching ability, deliberately avoiding any contact with transparent-object samples.
3.2 Stage 2 — Polarization Glass Learning
| Item | Setting |
|---|---|
| Data | Polarized transparent-object scenes |
| Goal | Learn to use polarization to handle transparent objects |
| Key | Transparent objects are brand-new difficult samples; polarization is the only new cue |
Since Stage 1 has never seen transparent objects, they appear in Stage 2 as brand-new difficult samples; the model is forced to rely on polarization to handle them rather than memorizing visual features.
3.3 Training-Order Logic of Curriculum Learning
Training from scratch (rather than pretrained + domain adaptation) is intended to prevent the base model from learning to handle transparent objects on its own before polarization is introduced, ensuring that polarization is indispensable in Stage 2.
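The ordering constraint above can be written down as a small, checkable schedule. This is a minimal sketch, not the project's training code: the `Stage` class, the stage names, and `check_curriculum` are hypothetical names introduced here for illustration. It encodes only the two rules stated in this section: the first stage starts from random initialization, and no stage exposes transparent objects without also providing polarization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (hypothetical structure, for illustration only)."""
    name: str
    data: str                      # free-text description of the training data
    has_transparent_objects: bool  # do transparent objects appear in the data?
    has_polarization: bool         # is the polarization signal part of the input?
    init_from: Optional[str]       # None means random init (training from scratch)


def build_curriculum() -> List[Stage]:
    """Return the two stages in training order."""
    return [
        Stage("stage1_basic_stereo",
              "generic stereo, no transparent objects",
              has_transparent_objects=False, has_polarization=False,
              init_from=None),
        Stage("stage2_polarization_glass",
              "polarized transparent-object scenes",
              has_transparent_objects=True, has_polarization=True,
              init_from="stage1_basic_stereo"),
    ]


def check_curriculum(stages: List[Stage]) -> None:
    """Raise ValueError if the schedule breaks the ordering rules."""
    if stages[0].init_from is not None:
        raise ValueError("the first stage must train from scratch")
    for prev, cur in zip(stages, stages[1:]):
        if cur.init_from != prev.name:
            raise ValueError(f"{cur.name} must start from {prev.name}")
    for stage in stages:
        # Transparent objects without polarization would let the base model
        # learn to handle them on its own, which the curriculum must prevent.
        if stage.has_transparent_objects and not stage.has_polarization:
            raise ValueError(f"{stage.name} shows transparent objects without polarization")


if __name__ == "__main__":
    curriculum = build_curriculum()
    check_curriculum(curriculum)
    for stage in curriculum:
        print(stage.name, "<-", stage.init_from or "random init")
```

Running such a check before launching training makes the "pretrained + domain adaptation" path fail loudly instead of silently undermining the role of polarization.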
4. Data Efficiency Ablation
Stage 2 is tested with different data volumes to answer “how much polarization data is worth collecting”:
| Setting | Stage 2 scene count |
|---|---|
| Config 1 | 1000 scenes |
| Config 2 | 3000 scenes |
| Config 3 | 6000 scenes |
The purpose of this ablation is to quantify the return on investment of polarization-data collection.
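One way to build the three Stage 2 subsets is sketched below. It assumes, beyond what this document states, that the subsets are nested (the 1000-scene set is contained in the 3000-scene set, which is contained in the 6000-scene set) and drawn with a fixed seed, so that differences between configurations reflect data volume rather than which scenes were sampled. The function name `nested_subsets` and the scene-ID format are hypothetical.

```python
import random
from typing import Dict, List, Sequence

# Scene counts for Config 1 / Config 2 / Config 3 from the table above.
SCENE_COUNTS = (1000, 3000, 6000)


def nested_subsets(scene_ids: Sequence[str],
                   counts: Sequence[int] = SCENE_COUNTS,
                   seed: int = 0) -> Dict[int, List[str]]:
    """Return one subset per count; each smaller subset is a prefix of the larger ones."""
    if max(counts) > len(scene_ids):
        raise ValueError("not enough scenes for the largest configuration")
    shuffled = list(scene_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return {n: shuffled[:n] for n in counts}


if __name__ == "__main__":
    all_scenes = [f"scene_{i:05d}" for i in range(6000)]  # placeholder IDs
    subsets = nested_subsets(all_scenes)
    for n, ids in subsets.items():
        print(n, len(ids))
```

If independent random subsets are preferred over nested ones, the same function can be called once per configuration with a different seed and a single count.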
5. Tensor / Data Dimensions
| Stage | Input data | Input format |
|---|---|---|
| Stage 1 | Generic stereo dataset | Generic stereo pair (no transparent objects, no polarization) |
| Stage 2 | Polarized transparent-object scenes | Polarized stereo pair (I∥, I⊥) |
6. Polarization Injection Points
Curriculum Learning is a “training-data curriculum” design and does not itself add Pol injection points:
- Stage 1: input contains no polarization signal; the model only learns basic stereo.
- Stage 2: the polarization signal enters the existing Pol injection point (6-channel input / refinement injection, etc.) together with the data; it then becomes the only new cue for transparent-object scenes, forcing the model to learn to use it.
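For the 6-channel input option, the sketch below shows how a polarized pair (I∥, I⊥) for one camera view could be assembled. It rests on assumptions not confirmed by this document: each image is an RGB array of shape (H, W, 3), the six channels are the two images concatenated along the channel axis, and pol_diff is the plain per-pixel difference I∥ − I⊥. The function names are hypothetical, and NumPy is used only for illustration.

```python
import numpy as np


def pol_diff(i_par: np.ndarray, i_perp: np.ndarray) -> np.ndarray:
    """Per-pixel difference I_par - I_perp (assumed definition of pol_diff)."""
    # Cast first so unsigned 8-bit inputs do not wrap around on subtraction.
    return i_par.astype(np.float32) - i_perp.astype(np.float32)


def six_channel_input(i_par: np.ndarray, i_perp: np.ndarray) -> np.ndarray:
    """Concatenate two (H, W, 3) images into one (H, W, 6) float32 array."""
    if i_par.shape != i_perp.shape or i_par.ndim != 3 or i_par.shape[-1] != 3:
        raise ValueError("expected two images of identical shape (H, W, 3)")
    return np.concatenate([i_par, i_perp], axis=-1).astype(np.float32)


if __name__ == "__main__":
    h, w = 4, 5
    i_par = np.full((h, w, 3), 200, dtype=np.uint8)   # placeholder pixel values
    i_perp = np.full((h, w, 3), 120, dtype=np.uint8)  # placeholder pixel values
    print(six_channel_input(i_par, i_perp).shape)
    print(pol_diff(i_par, i_perp).mean())
```

How Stage 1 inputs, which carry no polarization, are fed to the same network is left open here, since this document does not specify it.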
7. Design Decisions and Rationale
| Decision | Rationale |
|---|---|
| Adopt Curriculum Learning | Pretrained + domain adaptation makes polarization redundant (see 1.1); a staged curriculum keeps it necessary |
| Stage 1 from scratch, no transparent objects | Ensures transparent objects are brand-new difficult samples in Stage 2 and polarization is the only new cue |
| Use a generic stereo dataset for Stage 1 | Build basic stereo matching ability |
| Introduce polarized transparent objects in Stage 2 | Transparent objects are difficult samples that force the model to rely on polarization rather than memorize visual features |
| Data Efficiency Ablation (1k / 3k / 6k) | Quantify the return on investment of polarization-data collection |
8. Highlights
- Forces the model to rely on polarization through curriculum design: first learn basic stereo on data without transparent objects, then introduce transparent objects so that polarization becomes the only new cue for the difficult samples.
- Avoids the auxiliary-signal redundancy trap: by not using pretrained + domain adaptation, the base model is prevented from learning to handle transparent objects on its own before polarization is introduced.
- “Can be used” is not “needs to be used”: explicitly distinguishes “the signal exists” from “the signal is learned and used”, and uses training order to ensure the latter.
- Quantifies data return on investment: with three polarization-data volumes (1k / 3k / 6k) as an ablation, “how much polarization data is needed” becomes a measurable basis for decisions.