Curriculum Disparity Noise Mechanism — Po-Ting Lin (林柏廷)

1. Design Goals

1.1 Training/inference distribution mismatch

Dual-stream polarization stereo matching requires warping the right image into the left view when computing polarization features:

Training: use GT disparity for warping → perfect alignment; the model learns “clean” polarization features.
Inference: no GT disparity is available, only the predicted disparity → predictions contain errors, and pol_diff is contaminated with geometric noise.

If the model has only ever seen “cleanly aligned” pol_diff during training, it has never been exposed to the “noisy pol_diff” that actually appears at inference, and there is a gap between the two input distributions.
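
The gap can be illustrated with a toy 1D example. The sketch below is not the model's warping code: it assumes nearest-neighbour sampling at x - d, uses plain intensity values as a stand-in for polarization measurements, and the helper name warp_right_to_left is hypothetical.

```python
def warp_right_to_left(right_row, disparity_row):
    """Sample the right row at x - d (nearest neighbour), clamped to the row bounds."""
    width = len(right_row)
    warped = []
    for x, d in enumerate(disparity_row):
        src = min(max(int(round(x - d)), 0), width - 1)
        warped.append(right_row[src])
    return warped

# Toy 1D scene: the left row is the right row shifted by a true disparity of 2 px.
right_row = [0.0, 0.1, 0.4, 0.9, 0.3, 0.2, 0.7, 0.5]
true_disparity = 2
width = len(right_row)
left_row = [right_row[min(max(x - true_disparity, 0), width - 1)] for x in range(width)]

gt_disparity = [float(true_disparity)] * width
pred_disparity = [d + 1.0 for d in gt_disparity]  # a constant 1 px prediction error

diff_clean = [abs(l - w) for l, w in zip(left_row, warp_right_to_left(right_row, gt_disparity))]
diff_noisy = [abs(l - w) for l, w in zip(left_row, warp_right_to_left(right_row, pred_disparity))]

print(max(diff_clean))  # 0.0: GT disparity aligns the two views exactly
print(max(diff_noisy))  # greater than 0: the disparity error shows up as a spurious difference
```

With GT disparity the difference is zero everywhere; with a 1 px error the same scene produces nonzero differences that have nothing to do with polarization.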

Figure: Root cause of training/inference distribution mismatch

1.2 Approach

Inject Gaussian noise into the GT disparity so that training-time pol_diff also carries noise similar to inference error, and the model sees noisy polarization features during training. Use curriculum learning to move gradually from clean samples to noisy ones, so that noise does not disrupt the early stage of training.

2. Architecture

Figure: Curriculum Disparity Noise flow

3. Components and Modules

3.1 Model side: noise-injection logic

StereoDualStream adds a disparity_noise_std parameter and accepts noise_ratio in forward:

import torch
import torch.nn as nn

class StereoDualStream(nn.Module):
    def __init__(self, ..., disparity_noise_std: float = 2.0):
        super().__init__()
        self.disparity_noise_std = disparity_noise_std  # noise std in pixels

    def forward(self, left, right, ..., noise_ratio: float = 0.0):
        # disparity_gt is presumably among the elided arguments (None at inference).
        # Training-inference consistency: warp with GT disparity by default.
        disparity_for_pol = disparity_gt
        if self.training and disparity_gt is not None and noise_ratio > 0:
            # One draw per forward call: the whole batch is either noised or left clean.
            if torch.rand(1).item() < noise_ratio:
                # Add per-pixel Gaussian noise to simulate inference error
                noise = torch.randn_like(disparity_gt) * self.disparity_noise_std
                disparity_for_pol = disparity_gt + noise

        pol_feat = self.pol_encoder(left, right, disparity_for_pol)

Behavior:

Noise is only considered when self.training is true, disparity_gt is available, and noise_ratio > 0.
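Noise is applied with probability noise_ratio (torch.rand(1) < noise_ratio). As written, the random number is drawn once per forward call, so the decision covers the whole batch: either every sample in the batch is noised or none is.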
Noise form: disparity_gt + N(0, disparity_noise_std²), with noise generated by torch.randn_like.
The noised disparity_for_pol is passed to pol_encoder for warp alignment.
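
In the excerpt above, torch.rand(1) is drawn once per forward call, so a batch is noised as a whole. If a per-sample decision is wanted, one option is an independent coin flip for each sample. The sketch below shows that logic in plain Python on lists of disparities; the function name noise_disparities and the list-based layout are illustrative assumptions, not part of StereoDualStream.

```python
import random

def noise_disparities(disparity_batch, noise_ratio, noise_std, rng):
    """Return a noised copy of the batch plus the per-sample decisions.

    disparity_batch: list of samples, each a flat list of GT disparities (pixels).
    Each sample gets its own coin flip, so a batch can mix clean and noisy samples.
    """
    noised_batch, was_noised = [], []
    for sample in disparity_batch:
        hit = rng.random() < noise_ratio
        if hit:
            # Independent zero-mean Gaussian noise per disparity value
            sample = [d + rng.gauss(0.0, noise_std) for d in sample]
        else:
            sample = list(sample)  # copy so the caller's GT is never aliased
        noised_batch.append(sample)
        was_noised.append(hit)
    return noised_batch, was_noised

rng = random.Random(0)
batch = [[10.0, 12.0, 15.0] for _ in range(8)]
noised, flags = noise_disparities(batch, noise_ratio=0.5, noise_std=2.0, rng=rng)
print(sum(flags), "of", len(flags), "samples were noised")
```

On tensors, the same idea would likely be a Bernoulli mask of shape (B, 1, 1, 1) broadcast over the noise; that variant is untested here.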

3.2 Training-side parameters

Curriculum-learning parameters:

parser.add_argument('--disparity_noise_std', type=float, default=2.0,
                    help='Disparity noise std (pixels)')
parser.add_argument('--noise_warmup_steps', type=int, default=20000,
                    help='Steps to warmup noise_ratio from 0 to max')
parser.add_argument('--max_noise_ratio', type=float, default=0.5,
                    help='Maximum noise ratio (0.5 = 50%% noisy samples)')
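
A minimal, self-contained way to try these flags is sketched below. The build_parser wrapper and the example values are assumptions for illustration; only the three flag names and defaults come from the listing above. Note that argparse applies %-formatting to help strings, so a literal percent sign has to be written as %%.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Curriculum disparity noise options")
    parser.add_argument('--disparity_noise_std', type=float, default=2.0,
                        help='Disparity noise std (pixels)')
    parser.add_argument('--noise_warmup_steps', type=int, default=20000,
                        help='Steps to warm up noise_ratio from 0 to max')
    # argparse %-formats help strings, so a literal percent sign is written as %%.
    parser.add_argument('--max_noise_ratio', type=float, default=0.5,
                        help='Maximum noise ratio (0.5 = 50%% noisy samples)')
    return parser

# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(['--max_noise_ratio', '0.3', '--noise_warmup_steps', '15000'])
print(args.disparity_noise_std, args.noise_warmup_steps, args.max_noise_ratio)  # 2.0 15000 0.3
```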

3.3 Curriculum-learning function

noise_ratio grows linearly with the training step:

def _get_noise_ratio(self, step: int) -> float:
    """Linear growth: 0 -> max_noise_ratio over warmup_steps"""
    if step >= self.args.noise_warmup_steps:
        return self.args.max_noise_ratio
    else:
        return self.args.max_noise_ratio * (step / self.args.noise_warmup_steps)

3.4 Curriculum-learning progression

Step         noise_ratio    Training sample distribution
────────────────────────────────────────────────────────
    0        0.00           100% GT disparity
10000        0.25            75% GT, 25% noisy
20000        0.50            50% GT, 50% noisy
60000        0.50            50% GT, 50% noisy (held)
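
The rows above follow directly from the linear schedule with the default settings. A standalone version of the schedule (the function name noise_ratio_at and its keyword defaults are chosen here for illustration) reproduces them:

```python
def noise_ratio_at(step: int, warmup_steps: int = 20000, max_ratio: float = 0.5) -> float:
    """Linear ramp from 0 to max_ratio over warmup_steps, then held constant."""
    if step >= warmup_steps:
        return max_ratio
    return max_ratio * (step / warmup_steps)

for step in (0, 10000, 20000, 60000):
    ratio = noise_ratio_at(step)
    print(f"{step:>5}  {ratio:.2f}  {100 * (1 - ratio):.0f}% GT, {100 * ratio:.0f}% noisy")
```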

3.5 Multi-Update Inference

In addition to training improvements, inference can also use multiple pol_feat updates:

Single update: pol_update_iters = [11]
Progressive update: pol_update_iters = [6, 12, 18]

Rationale: each update yields more accurate disparity → better pol_diff alignment → the next update is more accurate.
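
The control flow can be sketched as follows, under several assumptions not confirmed by this document: pol_update_iters lists 1-based refinement iterations after which pol_feat is recomputed from the latest predicted disparity, no polarization features exist before the first update, and 24 refinement iterations are run in total. refine_step and compute_pol_feat are hypothetical stand-ins for the GRU update and the polarization encoder, not the model's API.

```python
def run_inference(num_iters, pol_update_iters, refine_step, compute_pol_feat, init_disparity):
    """Iterative refinement that refreshes pol_feat at the listed iterations."""
    disparity = init_disparity
    pol_feat = None  # assumption: no polarization features until the first scheduled update
    update_log = []
    for it in range(1, num_iters + 1):
        disparity = refine_step(disparity, pol_feat)
        if it in pol_update_iters:
            # Re-warp with the latest prediction so later iterations see better-aligned features.
            pol_feat = compute_pol_feat(disparity)
            update_log.append(it)
    return disparity, update_log

def toy_refine(disparity, pol_feat):
    return disparity + 1  # placeholder for one refinement update

def toy_pol_feat(disparity):
    return {"warped_with": disparity}  # placeholder for warp + encode

_, single = run_inference(24, [11], toy_refine, toy_pol_feat, init_disparity=0)
_, progressive = run_inference(24, [6, 12, 18], toy_refine, toy_pol_feat, init_disparity=0)
print(single, progressive)  # [11] [6, 12, 18]
```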

4. Data Flow

A single training step:

Get the current step; call _get_noise_ratio(step) → noise_ratio.
Enter model.forward(..., noise_ratio=noise_ratio).
Draw once per forward call: torch.rand(1) < noise_ratio. The outcome applies to the whole batch.
- Hit → disparity_for_pol = disparity_gt + randn_like(disparity_gt) * disparity_noise_std.
- Miss → disparity_for_pol = disparity_gt.
pol_encoder(left, right, disparity_for_pol) warps using disparity_for_pol and computes pol_feat.
The rest follows the dual-stream architecture (fusion → Corr + GRU → disparity).
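
To get a feel for how often the noisy branch is taken over a whole run, the steps above can be simulated without the model. The sketch assumes a 60000-step run (the last step shown in the progression table) and one draw per forward call; the names get_noise_ratio and simulate are illustrative.

```python
import random

def get_noise_ratio(step, warmup_steps, max_ratio):
    """Same linear schedule as _get_noise_ratio, written as a free function."""
    if step >= warmup_steps:
        return max_ratio
    return max_ratio * (step / warmup_steps)

def simulate(total_steps, warmup_steps, max_ratio, seed=0):
    """Return (observed, expected) fraction of forward calls that use noised disparity."""
    rng = random.Random(seed)
    hits = 0
    expected = 0.0
    for step in range(total_steps):
        ratio = get_noise_ratio(step, warmup_steps, max_ratio)
        expected += ratio
        if rng.random() < ratio:  # one draw per forward call
            hits += 1
    return hits / total_steps, expected / total_steps

observed, expected = simulate(total_steps=60000, warmup_steps=20000, max_ratio=0.5)
print(f"observed {observed:.3f}, expected {expected:.3f}")
```

Under these assumptions roughly 42% of forward calls see noised disparity over the whole run, somewhat below max_noise_ratio because of the warmup ramp.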

5. Tensor Dimensions

Tensor	Dimensions / Type	Description
`disparity_gt`	(B, 1, H, W)	GT disparity map
`noise`	(B, 1, H, W)	`torch.randn_like(disparity_gt) * disparity_noise_std`
`disparity_for_pol`	(B, 1, H, W)	Noised (or original) disparity map passed to `pol_encoder`
`noise_ratio`	`float` (scalar)	Current-step noise probability, range 0 → `max_noise_ratio`
`step`	`int` (scalar)	Current training step

6. Hyperparameters

Parameter	Default	Suggested Range	Too small	Too large
`disparity_noise_std`	2.0	1.5–3.0	Effect not noticeable	Training instability
`noise_warmup_steps`	20000	15k–25k	Noise ramps up before clean features are learned	Noise arrives late, leaving fewer steps at the full ratio
`max_noise_ratio`	0.5	0.3–0.6	Insufficient effect	Too few clean GT samples

Monitoring Metrics

Metric	Description
`train/noise_ratio`	Current curriculum progress

7. Design Decisions and Rationale

7.1 Why inject noise into GT disparity

At inference, pol_diff carries geometric noise from the predicted-disparity error. Adding Gaussian noise to the GT disparity during training simulates this inference error, so the model “sees” noisy polarization features and the training/inference distribution gap narrows.

7.2 Why curriculum learning instead of full-strength noise from the start

If many noisy samples appear at the very beginning of training, the model is disturbed by noise before it has learned the basics of clean polarization features, which may prevent stable convergence. Curriculum learning grows noise_ratio linearly from 0 to max_noise_ratio: clean samples build the foundation early on, and noise is introduced gradually afterwards so the model adapts to the inference setting.

7.3 Why max_noise_ratio is 0.5 rather than 1.0

Keeping about half of the training samples on clean GT disparity lets the model see both “clean” and “noisy” polarization features. A ratio of 1.0 would leave no clean samples once warmup ends, and the model may lose its learning signal for clean polarization features.

7.4 Why Gaussian noise

Inference disparity error is assumed to be a roughly zero-mean random perturbation, for which Gaussian noise N(0, σ²) is a natural first approximation. disparity_noise_std controls the noise magnitude and should be set close to the disparity error actually observed at inference.
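
Written out, the noise model implied by the code in 3.1 and the schedule in 3.3 is roughly the following. The symbols are notation introduced here for convenience: σ is disparity_noise_std, r_max is max_noise_ratio, T is noise_warmup_steps, t is the training step, ε is drawn independently per pixel (as torch.randn_like does), and m is the on/off gate drawn once per forward call in training mode when GT disparity is available.

```latex
\[
\tilde{d}(x) = d_{\mathrm{gt}}(x) + m\,\varepsilon(x),
\qquad \varepsilon(x) \sim \mathcal{N}(0, \sigma^{2}),
\qquad m \sim \mathrm{Bernoulli}\big(r(t)\big),
\qquad r(t) = r_{\max}\,\min\!\left(1, \frac{t}{T}\right)
\]
```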

7.5 Pairing with Multi-Update at inference

Training-side: the model is made noise-tolerant. Inference-side: multiple pol_update_iters (e.g., [6, 12, 18]) provide progressive alignment; each update yields more accurate disparity and better pol_diff alignment. Together they reduce the training/inference gap.

8. Highlights

Training simulates inference error: Gaussian noise is injected into the GT disparity so that training-time pol_diff carries geometric noise similar to inference error, exposing the model to noisy polarization features during training and shrinking the training/inference distribution gap.
Curriculum-style progressive noising: noise_ratio grows linearly from 0 to max_noise_ratio; clean samples build the foundation early on and noise is introduced gradually afterwards, so early convergence is not disrupted.
Coexistence of clean and noisy samples: max_noise_ratio is set to 0.5, keeping about half of the samples as clean GT so the model retains learning signal for both clean and noisy polarization features.
Zero-mean Gaussian as a physical approximation: N(0, σ²) approximates statistically zero-mean inference disparity error, and disparity_noise_std directly corresponds to noise magnitude, making it easy to tune to actual error levels.
Training and inference work together: the training side makes the model noise-tolerant; the inference side improves alignment quality through progressive multi-step pol_update_iters. Both ends jointly reduce the training/inference gap.