Scheduled Polarization Residual Architecture

1. Design Goals

When a polarization residual is injected into the correlation volume as a static residual, i.e. with the same strength at every GRU iteration:

corr_enhanced = corr + pol_residual(pol_corr)

polarization is injected at the wrong time and with the wrong strength:

Iteration	Problem
Early	Stereo is still aligning coarse geometry; adding the pol residual directly amplifies early noise
Late	RAFT has entered refinement, where pol should matter most, but a fixed strength gives it no more weight than in the early iterations

In other words, a fixed-strength residual makes pol merely a distractor throughout the entire process, rather than a refinement tool.

The design goal of this architecture is to let the residual strength grow with the GRU iterations so that pol takes effect at the right moment: it barely intervenes early on, which protects the stereo geometry, and is fully injected only later, as a refinement prior.

2. Architecture Mechanism: Scheduled Residual

The core mechanism is to replace the fixed residual with an “iteration-scheduled” residual:

# Scheduled Residual
alpha = i / max(iters - 1, 1)  # 0 → 1
corr_enhanced = corr + alpha * self.pol_residual(pol_corr)

where i is the 0-based index of the current GRU iteration and iters is the total number of iterations (e.g. 24). alpha grows linearly from 0 at the first iteration to 1 at the last; the max(..., 1) guard only avoids division by zero when iters = 1.
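As a quick sanity check of the schedule, the sketch below tabulates alpha for a few iterations. The helper name linear_alpha is hypothetical and introduced only for illustration; iters = 24 is the example value from the text.

```python
def linear_alpha(i: int, iters: int) -> float:
    """Linear schedule from the text: 0 at the first iteration, 1 at the last."""
    return i / max(iters - 1, 1)


iters = 24  # example value from the text
schedule = [linear_alpha(i, iters) for i in range(iters)]

# Print a few representative iterations (early, middle, late).
for i in (0, 1, 12, 22, 23):
    print(f"iter {i:2d}: alpha = {schedule[i]:.3f}")
```

Note that with iters = 1 the guard yields alpha = 0, so a single-iteration run never injects the residual.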

The PolCorrResidual module consists of three convolutional layers plus a learnable scale, with the last layer initialized to 0, outputting the scaled residual Δcorr.

3. Three-Phase α Schedule Philosophy

A single formula gives the α schedule three roles, one per phase:

Phase	α Value	Role	Description
Early	α ≈ 0	Protect stereo geometry	Equivalent to plain RAFT-Stereo; pretrained stereo is not disrupted by polarization
Mid	α gradually grows	Pol becomes auxiliary evidence	Stereo already has a reasonable disparity; pol only nudges it (at boundaries and specular regions)
Late	α → 1	Pol = refinement prior	RAFT itself is making small corrections; pol’s scale, semantics, and timing all match

Summary of design philosophy: the α schedule turns pol from a “distractor throughout the process” into a “refinement tool at the right moment”.

Early phase: the stereo backbone is still aligning coarse geometry. With α≈0 the model behaves equivalently to plain RAFT-Stereo, avoiding amplification of early noise by the pol residual.
Mid phase: stereo has obtained a reasonable disparity. α gradually increases, and pol serves as auxiliary evidence performing small nudging on boundary and specular regions.
Late phase: RAFT itself is only making small corrections, so α → 1, and pol’s strength (scale), semantics, and timing all match what refinement needs.

4. Data Flow

[Figure: Scheduled Residual data flow]

The key point is that alpha is a function of iteration, recomputed at every iteration, so early iterations inject almost no pol while late iterations inject it fully.
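The per-iteration recomputation can be sketched as a plain loop. This is a simplified illustration under stated assumptions, not the model's actual forward pass: corr is held fixed across iterations (a RAFT-style model would look it up again from the current disparity), the GRU update block is omitted, and the function name scheduled_injection, the tensor sizes, and the 1×1 convolution standing in for pol_residual are all hypothetical.

```python
import torch
import torch.nn as nn


def scheduled_injection(corr, pol_corr, pol_residual, iters=24):
    """Return corr_enhanced for every iteration (update block omitted)."""
    enhanced = []
    for i in range(iters):
        alpha = i / max(iters - 1, 1)  # recomputed at every iteration
        corr_enhanced = corr + alpha * pol_residual(pol_corr)
        # A real model would pass corr_enhanced to the GRU update block here.
        enhanced.append(corr_enhanced)
    return enhanced


torch.manual_seed(0)
B, pol_dim, corr_dim, H, W = 1, 8, 36, 16, 24  # illustrative sizes only
corr = torch.randn(B, corr_dim, H, W)
pol_corr = torch.randn(B, pol_dim, H, W)
stand_in = nn.Conv2d(pol_dim, corr_dim, 1)  # hypothetical stand-in for pol_residual

with torch.no_grad():
    steps = scheduled_injection(corr, pol_corr, stand_in)
    full = corr + stand_in(pol_corr)

# First iteration: plain corr. Last iteration: fully injected residual.
print(torch.equal(steps[0], corr))
print(torch.allclose(steps[-1], full))
```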

5. Components and Modules

5.1 PolCorrResidual

import torch
import torch.nn as nn


class PolCorrResidual(nn.Module):
    """Maps polarization correlation features to a scaled residual Δcorr."""

    def __init__(self, pol_dim, corr_dim, hidden_dim=64, init_scale=0.1):
        super().__init__()  # must run before submodules/parameters are assigned
        self.net = nn.Sequential(
            nn.Conv2d(pol_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, corr_dim, 1),  # project to corr dimension
        )
        # Learnable scalar applied to the residual
        self.scale = nn.Parameter(torch.tensor(init_scale))
        # Initialize the last layer to 0 → initial Δcorr = 0
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, pol_corr):
        # (B, pol_dim, H, W) → (B, corr_dim, H, W)
        return self.scale * self.net(pol_corr)

net: three convolutional layers (3×3 → 3×3 → 1×1); the final 1×1 convolution projects features to corr_dim.
scale: learnable scalar, init_scale=0.1.
The last layer is initialized to 0, so Δcorr ≈ 0 at the start of training and learning begins from a stable starting point.
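The zero-initialization claim can be checked directly. The sketch below repeats the module so that it runs on its own (the structure follows section 5.1; super().__init__() is required by nn.Module), and the tensor sizes are illustrative assumptions, not values from the source.

```python
import torch
import torch.nn as nn


class PolCorrResidual(nn.Module):
    def __init__(self, pol_dim, corr_dim, hidden_dim=64, init_scale=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(pol_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, corr_dim, 1),
        )
        self.scale = nn.Parameter(torch.tensor(init_scale))
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, pol_corr):
        return self.scale * self.net(pol_corr)


torch.manual_seed(0)
pol_dim, corr_dim = 8, 36  # illustrative sizes, not from the source
module = PolCorrResidual(pol_dim, corr_dim)
pol_corr = torch.randn(2, pol_dim, 16, 24)

delta_corr = module(pol_corr)
print(delta_corr.shape)               # (B, corr_dim, H, W)
print(delta_corr.abs().max().item())  # residual magnitude at initialization

# The zero-initialized layer still receives gradient, so it can move away from 0.
delta_corr.sum().backward()
print(module.net[-1].bias.grad.abs().sum().item() > 0)
```

At initialization the residual is exactly zero regardless of alpha, so the model's output matches the unmodified stereo network until the last layer has been updated.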

5.2 Schedule coefficient alpha

alpha = i / max(iters - 1, 1) is a fixed function of iteration, not a learnable parameter. It is recomputed from the current index at every GRU iteration.

6. Tensor Dimensions

Tensor	Shape / Type	Description
`pol_corr`	`(B, pol_dim, H, W)`	Output of PolCorrBlock
`Δcorr`	`(B, corr_dim, H, W)`	Residual output of `pol_residual`
`alpha`	scalar (plain value, not a parameter)	`i / max(iters-1, 1)`, range [0, 1]
`corr_enhanced`	`(B, corr_dim, H, W)`	`corr + alpha * Δcorr`

7. Hyperparameters

Hyperparameter	Value	Description
`pol_levels`	4	Number of pyramid levels in the polarization volume
`pol_radius`	4	Lookup radius of the polarization volume
`iters`	24	Number of GRU iterations (also determines the length of the α schedule)
`hidden_dim`	64	Number of channels in the intermediate layer of `PolCorrResidual`
`init_scale`	0.1	Initial value of the learnable `scale` parameter
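One way to keep these values together is a small configuration object. The sketch below is an assumption for illustration only: the class name ScheduledResidualConfig and the alpha helper method are hypothetical, while the field names and defaults are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledResidualConfig:  # hypothetical container name
    pol_levels: int = 4      # pyramid levels in the polarization volume
    pol_radius: int = 4      # lookup radius of the polarization volume
    iters: int = 24          # GRU iterations; also the length of the alpha schedule
    hidden_dim: int = 64     # intermediate channels in PolCorrResidual
    init_scale: float = 0.1  # initial value of the learnable scale

    def alpha(self, i: int) -> float:
        """Linear schedule value for 0-based iteration index i."""
        return i / max(self.iters - 1, 1)


cfg = ScheduledResidualConfig()
print(cfg)
print(cfg.alpha(0), cfg.alpha(cfg.iters - 1))
```

Because alpha depends on iters, running with a different iteration count rescales the schedule; for example, with iters = 12 the schedule reaches 1 at index 11.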

8. Design Decisions and Rationale

Decision	Rationale
Introduce a linear schedule `alpha = i / max(iters - 1, 1)`	Lets the pol residual strength grow with GRU iterations, aligned with the refinement timing
Early phase α≈0	Protects pretrained stereo geometry from being disturbed by the pol residual
Late phase α→1	RAFT does only small corrections in late iterations, the right moment for pol to act as a refinement prior
`alpha` is a fixed function, not learnable	The schedule is a prior structure determined directly by the iteration; no learning needed
Last layer of `PolCorrResidual` initialized to 0	`Δcorr≈0` at the start of training, learning starts from a stable point
UpdateBlock is kept as in the original RAFT	Fully preserves pretrained capability

9. Highlights

A linear iteration schedule alpha = i / max(iters - 1, 1) grows the polarization residual strength from 0 to 1, so pol only intervenes at the “right moment”.
Three phases in one shot: early phase α≈0 protects stereo geometry, mid phase α grows for auxiliary evidence, late phase α→1 becomes a refinement prior.
alpha is a fixed function of iteration rather than a learnable parameter: the schedule is a structural prior determined directly by the iteration index, with zero extra parameters.
The scheduled residual turns pol from a “distractor throughout the process” into a “refinement tool at the right moment”, aligned with RAFT’s small-correction behavior in late iterations.
The last layer of PolCorrResidual is initialized to 0, giving Δcorr ≈ 0 at the start of training, and the original RAFT UpdateBlock is reused, maximally preserving pretrained capability.