1. Design Goals
1.1 Training/inference distribution mismatch
Dual-stream polarization stereo matching requires warping the right image into the left view when computing polarization features:
- Training: use GT disparity for warping → perfect alignment; the model learns “clean” polarization features.
- Inference: no GT disparity is available, only the predicted disparity → predictions contain errors, and pol_diff is contaminated with geometric noise.
If the model has only ever seen “cleanly aligned” pol_diff during training, it has never been exposed to the “noisy pol_diff” that actually appears at inference, and there is a gap between the two input distributions.
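The effect can be seen in a toy 1-D example. The sketch below is not the project's warping code: `warp_right_to_left` and `mean_abs_diff` are hypothetical helpers, a plain intensity row stands in for the polarization channels, and it assumes the usual rectified-stereo convention that left pixel x matches right pixel x - d.

```python
def warp_right_to_left(right_row, disparity_row):
    """Sample right_row at x - d for every left pixel x (nearest neighbour, edge-clamped)."""
    width = len(right_row)
    warped = []
    for x, d in enumerate(disparity_row):
        src = min(max(x - int(round(d)), 0), width - 1)
        warped.append(right_row[src])
    return warped


def mean_abs_diff(a, b, start):
    """Mean absolute difference, skipping the first `start` border pixels."""
    pairs = list(zip(a, b))[start:]
    return sum(abs(p - q) for p, q in pairs) / len(pairs)


width, true_d = 12, 2
left_row = [x * x for x in range(width)]
# Build a right view that is consistent with a constant disparity of 2 pixels.
right_row = [left_row[min(x + true_d, width - 1)] for x in range(width)]

clean = warp_right_to_left(right_row, [true_d] * width)      # GT disparity
noisy = warp_right_to_left(right_row, [true_d + 1] * width)  # 1-pixel disparity error

clean_diff = mean_abs_diff(left_row, clean, start=3)
noisy_diff = mean_abs_diff(left_row, noisy, start=3)
print(clean_diff, noisy_diff)  # 0.0 13.0
```

With the GT disparity the warped row matches the left row exactly, while a one-pixel error already leaves a residual that has nothing to do with the scene's polarization properties.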
1.2 Approach
Make training-time pol_diff carry noise similar to inference error by injecting Gaussian noise into the GT disparity, so the model sees noisy polarization features during training. Use curriculum learning to move gradually from clean samples to noisy samples, so that training is not disrupted from the start.
2. Architecture
3. Components and Modules
3.1 Model side: noise-injection logic
StereoDualStream adds a disparity_noise_std parameter and accepts noise_ratio in forward:
    import torch
    import torch.nn as nn

    class StereoDualStream(nn.Module):
        def __init__(self, ..., disparity_noise_std: float = 2.0):
            super().__init__()
            self.disparity_noise_std = disparity_noise_std  # noise std in pixels

        def forward(self, left, right, ..., noise_ratio: float = 0.0):
            # Training-inference consistency: warp with the GT disparity by default
            disparity_for_pol = disparity_gt
            if self.training and disparity_gt is not None and noise_ratio > 0:
                # Single draw per forward call: the whole batch is noised or kept clean
                if torch.rand(1).item() < noise_ratio:
                    # Add Gaussian noise to simulate inference error
                    noise = torch.randn_like(disparity_gt) * self.disparity_noise_std
                    disparity_for_pol = disparity_gt + noise
            pol_feat = self.pol_encoder(left, right, disparity_for_pol)
Behavior:
- Noise is only considered when self.training is true, disparity_gt is available, and noise_ratio > 0.
- The noise decision is drawn once per forward call (torch.rand(1).item() < noise_ratio), so each batch is noised as a whole with probability noise_ratio; the decision is per sample only when the batch size is 1.
- Noise form: disparity_gt + N(0, disparity_noise_std²), drawn independently per pixel by torch.randn_like.
- The noised disparity_for_pol is passed to pol_encoder for warp alignment.
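The snippet above calls torch.rand(1) once per forward call, so all samples in a batch share the same decision. If an independent decision per sample is wanted, a mask-based variant might look like the sketch below. `add_disparity_noise_per_sample` is a hypothetical helper and not part of StereoDualStream; it assumes disparity_gt has shape (B, 1, H, W).

```python
import torch


def add_disparity_noise_per_sample(disparity_gt, noise_std, noise_ratio):
    """Noise each sample of a (B, 1, H, W) disparity batch independently.

    Hypothetical helper for illustration; returns the disparity to warp with
    and a boolean mask of shape (B, 1, 1, 1) marking the noised samples.
    """
    batch = disparity_gt.shape[0]
    # One uniform draw per sample; the (B, 1, 1, 1) shape broadcasts over H and W.
    draws = torch.rand(batch, 1, 1, 1, device=disparity_gt.device)
    noisy_mask = draws < noise_ratio
    noise = torch.randn_like(disparity_gt) * noise_std
    disparity_for_pol = torch.where(noisy_mask, disparity_gt + noise, disparity_gt)
    return disparity_for_pol, noisy_mask


torch.manual_seed(0)
gt = torch.full((8, 1, 4, 6), 10.0)
out, mask = add_disparity_noise_per_sample(gt, noise_std=2.0, noise_ratio=0.5)
print(out.shape, mask.flatten().tolist())
```

Whether per-sample or per-batch noising works better is not something the design above settles; the variant only makes the choice explicit.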
3.2 Training-side parameters
Curriculum-learning parameters:
    parser.add_argument('--disparity_noise_std', type=float, default=2.0,
                        help='Disparity noise std (pixels)')
    parser.add_argument('--noise_warmup_steps', type=int, default=20000,
                        help='Steps to warmup noise_ratio from 0 to max')
    # argparse %-formats help strings, so a literal percent sign is written as '%%'
    parser.add_argument('--max_noise_ratio', type=float, default=0.5,
                        help='Maximum noise ratio (0.5 = 50%% noisy samples)')
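As a quick check of how these flags behave, the sketch below rebuilds the three arguments in a standalone parser and adds a small range check. `build_parser` and `validate_noise_args` are hypothetical names, and the validation rules are assumptions rather than something the training script is known to enforce.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Noise-curriculum options (sketch)")
    parser.add_argument('--disparity_noise_std', type=float, default=2.0,
                        help='Disparity noise std (pixels)')
    parser.add_argument('--noise_warmup_steps', type=int, default=20000,
                        help='Steps to warmup noise_ratio from 0 to max')
    # '%%' is needed because argparse applies %-formatting to help strings.
    parser.add_argument('--max_noise_ratio', type=float, default=0.5,
                        help='Maximum noise ratio (0.5 = 50%% noisy samples)')
    return parser


def validate_noise_args(args):
    """Hypothetical sanity checks; not part of the original training script."""
    if args.disparity_noise_std < 0:
        raise ValueError("disparity_noise_std must be non-negative")
    if args.noise_warmup_steps < 0:
        raise ValueError("noise_warmup_steps must be non-negative")
    if not 0.0 <= args.max_noise_ratio <= 1.0:
        raise ValueError("max_noise_ratio must lie in [0, 1]")
    return args


# Override one flag and keep the defaults for the other two.
args = validate_noise_args(build_parser().parse_args(['--max_noise_ratio', '0.3']))
print(args.disparity_noise_std, args.noise_warmup_steps, args.max_noise_ratio)
```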
3.3 Curriculum-learning function
noise_ratio grows linearly with the training step:
    def _get_noise_ratio(self, step: int) -> float:
        """Linear growth: 0 -> max_noise_ratio over warmup_steps"""
        if step >= self.args.noise_warmup_steps:
            return self.args.max_noise_ratio
        else:
            return self.args.max_noise_ratio * (step / self.args.noise_warmup_steps)
3.4 Curriculum-learning progression
    Step     noise_ratio   Training sample distribution
    ────────────────────────────────────────────────────────
    0        0.00          100% GT disparity
    10000    0.25          75% GT, 25% noisy
    20000    0.50          50% GT, 50% noisy
    60000    0.50          50% GT, 50% noisy (held)
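The rows above can be reproduced with a standalone copy of the warmup rule. The sketch below uses a free function, `noise_ratio_at`, with the default values from section 3.2; the name is hypothetical, and the 60000-step run length is assumed only for illustration.

```python
def noise_ratio_at(step, warmup_steps=20000, max_noise_ratio=0.5):
    """Standalone copy of the linear warmup rule from section 3.3."""
    if step >= warmup_steps:
        return max_noise_ratio
    return max_noise_ratio * (step / warmup_steps)


for step in (0, 10000, 20000, 60000):
    ratio = noise_ratio_at(step)
    print(f"{step:>6}  {ratio:.2f}  {1 - ratio:.0%} GT, {ratio:.0%} noisy")

# Expected number of noised forward calls over steps 0..59999:
# the warmup ramp contributes about 5000, the held phase 40000 * 0.5 = 20000.
expected_noisy_calls = sum(noise_ratio_at(s) for s in range(60000))
print(round(expected_noisy_calls, 2))
```

Under these assumptions roughly 25000 of the 60000 forward calls, a little under 42%, would use a noised disparity, which is lower than max_noise_ratio because of the ramp.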
3.5 Multi-Update Inference
In addition to training improvements, inference can also use multiple pol_feat updates:
Single update: pol_update_iters = [11]
Progressive update: pol_update_iters = [6, 12, 18]
Rationale: each update yields more accurate disparity → better pol_diff alignment → the next update is more accurate.
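A minimal sketch of how such a schedule could drive a refinement loop is shown below. It assumes that pol_update_iters lists the 1-based refinement iterations after which pol_feat is recomputed from the current predicted disparity; `run_refinement`, `refine_step`, and `compute_pol_feat` are hypothetical stand-ins, and the toy update rule says nothing about the real model's accuracy.

```python
def run_refinement(num_iters, pol_update_iters, refine_step, compute_pol_feat,
                   init_disparity=0.0):
    """Iterate refine_step and refresh pol_feat at the listed iterations."""
    disparity = init_disparity
    pol_feat = None  # assumption: no polarization feature before the first refresh
    refresh_log = []
    for it in range(1, num_iters + 1):
        disparity = refine_step(disparity, pol_feat)
        if it in pol_update_iters:
            pol_feat = compute_pol_feat(disparity)
            refresh_log.append(it)
    return disparity, refresh_log


TRUE_DISPARITY = 20.0
alignment_errors = []


def refine_step(disparity, pol_feat):
    # Toy update: move 20% of the way toward the true value on every iteration.
    return disparity + 0.2 * (TRUE_DISPARITY - disparity)


def compute_pol_feat(disparity):
    # Toy "feature": record the disparity error the warp would be computed with.
    error = abs(TRUE_DISPARITY - disparity)
    alignment_errors.append(error)
    return error


final_disparity, refresh_log = run_refinement(
    num_iters=22, pol_update_iters=[6, 12, 18],
    refine_step=refine_step, compute_pol_feat=compute_pol_feat)
print(refresh_log, [round(e, 2) for e in alignment_errors])
```

In this toy setting each later refresh warps with a smaller disparity error than the one before, which is the intuition behind the progressive schedule.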
4. Data Flow
A single training step:
- Get the current step; call _get_noise_ratio(step) → noise_ratio.
- Enter model.forward(..., noise_ratio=noise_ratio).
- Roll the die once for the forward call: torch.rand(1) < noise_ratio.
  - Hit → disparity_for_pol = disparity_gt + randn_like(disparity_gt) * disparity_noise_std.
  - Miss → disparity_for_pol = disparity_gt.
- pol_encoder(left, right, disparity_for_pol) warps using disparity_for_pol and computes pol_feat.
- The rest follows the dual-stream architecture (fusion → Corr + GRU → disparity).
5. Tensor Dimensions
| Tensor | Dimensions / Type | Description |
|---|---|---|
| disparity_gt | (B, 1, H, W) | GT disparity map |
| noise | (B, 1, H, W) | torch.randn_like(disparity_gt) * disparity_noise_std |
| disparity_for_pol | (B, 1, H, W) | Noised (or original) disparity map passed to pol_encoder |
| noise_ratio | float (scalar) | Current-step noise probability, range 0 → max_noise_ratio |
| step | int (scalar) | Current training step |
6. Hyperparameters
| Parameter | Default | Suggested Range | Too small | Too large |
|---|---|---|---|---|
| disparity_noise_std | 2.0 | 1.5–3.0 | Effect not noticeable | Training instability |
| noise_warmup_steps | 20000 | 15k–25k | Too fast to adapt | Wastes time |
| max_noise_ratio | 0.5 | 0.3–0.6 | Insufficient effect | Too few GT samples |
Monitoring Metrics
| Metric | Description |
|---|---|
| train/noise_ratio | Current curriculum progress |
7. Design Decisions and Rationale
7.1 Why inject noise into GT disparity
At inference, pol_diff carries geometric noise from the predicted-disparity error. Adding Gaussian noise to the GT disparity during training simulates this inference error, so the model “sees” noisy polarization features and the training/inference distribution gap narrows.
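A first-order expansion makes the link between disparity error and geometric noise explicit. The notation is an assumption introduced here: $I_R$ is any right-view channel used to build pol_diff, $d(x)$ is the true disparity, left pixel $x$ matches right pixel $x - d(x)$, and the disparity used for warping is $\hat d(x) = d(x) + \varepsilon(x)$.

```latex
\begin{aligned}
W_{\hat d}(x) &= I_R\bigl(x - d(x) - \varepsilon(x)\bigr) \\
              &\approx I_R\bigl(x - d(x)\bigr) - \varepsilon(x)\,\partial_x I_R\bigl(x - d(x)\bigr),
\qquad \varepsilon(x) \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

To first order, the misalignment term is proportional to the disparity error and to the local image gradient, so it is largest at edges and vanishes in flat regions. During training $\sigma$ plays the role of disparity_noise_std.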
7.2 Why curriculum learning instead of full-strength noise from the start
If many noisy samples appear at the very beginning of training, the model has not yet learned the basics of clean polarization features and is already disturbed by noise, which may prevent stable convergence. Curriculum learning grows noise_ratio linearly from 0 to max_noise_ratio: clean samples build the foundation early on, and noise is introduced gradually later to adapt to the inference setting.
7.3 Why max_noise_ratio is 0.5 rather than 1.0
Keeping about half of the training inputs as clean GT lets the model see both “clean” and “noisy” polarization features. With max_noise_ratio = 1.0, every input after warmup would be noised, and the model would lose its learning signal for clean polarization features.
7.4 Why Gaussian noise
Inference error is assumed to be a roughly zero-mean random perturbation, and zero-mean Gaussian noise N(0, σ²) is a simple approximation of it. disparity_noise_std sets the noise magnitude σ in pixels and should be chosen close to the disparity error actually observed at inference.
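One way to pick a value is to measure the residuals of an existing model on held-out data. The sketch below assumes flat lists of predicted and GT disparities at valid pixels; `estimate_disparity_noise_std` is a hypothetical helper and the numbers are made up for illustration, not measured results.

```python
import statistics


def estimate_disparity_noise_std(pred_disparities, gt_disparities):
    """Return (mean, std) of the per-pixel disparity error pred - gt."""
    errors = [p - g for p, g in zip(pred_disparities, gt_disparities)]
    return statistics.mean(errors), statistics.pstdev(errors)


# Made-up values standing in for validation predictions and ground truth.
pred = [10.5, 19.0, 31.5, 38.0, 52.0]
gt = [10.0, 20.0, 30.0, 40.0, 50.0]

error_mean, error_std = estimate_disparity_noise_std(pred, gt)
print(f"mean error {error_mean:.2f} px, std {error_std:.2f} px")
```

A mean far from zero would suggest that the zero-mean assumption does not hold for that model, in which case the Gaussian approximation deserves a second look.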
7.5 Pairing with Multi-Update at inference
Training-side: the model is made noise-tolerant. Inference-side: multiple pol_update_iters (e.g., [6, 12, 18]) provide progressive alignment; each update yields more accurate disparity and better pol_diff alignment. Together they reduce the training/inference gap.
8. Highlights
- Training simulates inference error: Gaussian noise is injected into the GT disparity so that training-time pol_diff carries geometric noise similar to inference error, exposing the model to noisy polarization features during training and shrinking the training/inference distribution gap.
- Curriculum-style progressive noising: noise_ratio grows linearly from 0 to max_noise_ratio; clean samples build the foundation early on and noise is introduced later, avoiding disrupting convergence from the start.
- Coexistence of clean and noisy samples: max_noise_ratio is set to 0.5, keeping about half of the samples as clean GT so the model retains learning signal for both clean and noisy polarization features.
- Zero-mean Gaussian as a physical approximation: N(0, σ²) approximates statistically zero-mean inference disparity error, and disparity_noise_std directly corresponds to noise magnitude, making it easy to tune to actual error levels.
- Training and inference work together: the training side makes the model noise-tolerant; the inference side improves alignment quality through progressive multi-step pol_update_iters. Both ends jointly reduce the training/inference gap.