Blueprint · 2026

Scheduled Residual Polarization Residual Architecture

Model class: `StereoPolVolumeV2B`
Subtitle: Scheduled Residual (iteration-scheduled polarization residual)
Document type: Architecture design specification (design only, no experimental results)

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

Injecting a polarization (pol) residual into the correlation volume as a static residual, that is, with a fixed strength across all GRU iterations:

corr_enhanced = corr + pol_residual(pol_corr)

injects polarization “at the wrong time with the wrong strength”:

| Iteration | Problem |
| --- | --- |
| Early | Stereo is still aligning coarse geometry; adding the pol residual directly amplifies early noise |
| Late | RAFT has entered refinement, but with a fixed strength the pol residual is not “more important” there, only “equally important” |

In other words, a fixed-strength residual makes pol merely a distractor throughout the entire process, rather than a refinement tool.

The design goal of this architecture is: let the residual strength grow with GRU iterations, so that pol takes effect at the “correct moment” — barely intervening early on to protect stereo geometry, and only being fully injected later as a refinement prior.


2. Architecture Mechanism: Scheduled Residual

The core mechanism is to replace the fixed residual with an “iteration-scheduled” residual:

# Scheduled Residual
alpha = i / max(iters - 1, 1)  # 0 → 1
corr_enhanced = corr + alpha * self.pol_residual(pol_corr)

where i is the current GRU iteration index and iters is the total number of iterations (e.g. 24). alpha is 0 at the first iteration and 1 at the last, growing linearly.
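
The same rule can be written as equations. The notation here is assumed for illustration and is not part of the original design: N stands for iters, C_i and P_i for corr and pol_corr at iteration i, f_θ for the convolutional stack of PolCorrResidual, and s for its learnable scale.

```latex
\alpha_i = \frac{i}{\max(N - 1,\, 1)}, \qquad i = 0, 1, \dots, N - 1

\tilde{C}_i = C_i + \alpha_i \, s \, f_\theta(P_i)
```

With N = 24 this gives α_0 = 0, α_12 = 12/23 ≈ 0.52, and α_23 = 1. The max(·, 1) guard only matters for N = 1, where it avoids a division by zero and yields α_0 = 0, so a single-iteration run uses the unmodified correlation volume.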

The PolCorrResidual module consists of three convolutional layers plus a learnable scale, with the last layer initialized to 0, outputting the scaled residual Δcorr.


3. Three-Phase α Schedule Philosophy

The α schedule does three things at once with a single formula:

| Phase | α Value | Role | Description |
| --- | --- | --- | --- |
| Early | α ≈ 0 | Protect stereo geometry | Equivalent to plain RAFT-Stereo; pretrained stereo is not disrupted by polarization |
| Mid | α gradually grows | Pol becomes auxiliary evidence | Stereo already has a reasonable disparity; pol only nudges (boundaries, specular) |
| Late | α → 1 | Pol = refinement prior | RAFT itself is doing small corrections; pol’s scale/semantic/timing all match |

Summary of design philosophy: the α schedule turns pol from a “distractor throughout the process” into a “refinement tool at the right moment”.

  • Early phase: the stereo backbone is still aligning coarse geometry. With α≈0 the model behaves equivalently to plain RAFT-Stereo, avoiding amplification of early noise by the pol residual.
  • Mid phase: stereo has obtained a reasonable disparity. α gradually increases, and pol serves as auxiliary evidence performing small nudging on boundary and specular regions.
  • Late phase: RAFT itself is only making small corrections, so α → 1, and pol’s scale, semantics, and timing all match what refinement needs.
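
To make the three phases concrete, the sketch below evaluates the schedule at a few iterations for iters = 24. The helper name `linear_alpha` and the sampled indices are illustrative choices; the spec does not define numeric boundaries between the early, mid, and late phases.

```python
def linear_alpha(i: int, iters: int) -> float:
    """Linear schedule from the spec: 0 at i = 0, 1 at i = iters - 1."""
    if not 0 <= i < iters:
        raise ValueError(f"iteration index {i} outside [0, {iters})")
    return i / max(iters - 1, 1)


iters = 24
schedule = [linear_alpha(i, iters) for i in range(iters)]

# Sample a few iterations across the run (indices chosen for illustration only).
for i in (0, 6, 12, 18, 23):
    print(f"iter {i:2d}: alpha = {schedule[i]:.3f}")
# iter  0: alpha = 0.000
# iter  6: alpha = 0.261
# iter 12: alpha = 0.522
# iter 18: alpha = 0.783
# iter 23: alpha = 1.000
```

Only the very first iteration has α exactly 0; every later iteration already injects a small, linearly growing share of the residual.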

4. Data Flow

[Figure: Scheduled Residual data flow]

The key point is that alpha is a function of iteration, recomputed at every iteration, so early iterations inject almost no pol while late iterations inject it fully.
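
The per-iteration flow can be sketched as a plain loop. This is a simplified illustration, not the model's forward pass: the function name `refine` and the four callables are hypothetical stand-ins for the correlation lookup, the polarization lookup, PolCorrResidual, and the update block, and scalars stand in for tensors. A real RAFT-Stereo update block also consumes a hidden state and context features, which are omitted here.

```python
from typing import Callable, List, Tuple


def refine(
    disparity: float,
    lookup_corr: Callable[[float], float],      # stand-in for the stereo correlation lookup
    lookup_pol_corr: Callable[[float], float],  # stand-in for the polarization lookup
    pol_residual: Callable[[float], float],     # stand-in for PolCorrResidual
    update_block: Callable[[float], float],     # stand-in for the GRU update (returns a delta)
    iters: int = 24,
) -> Tuple[float, List[float]]:
    """Run `iters` refinement steps and return (final disparity, alphas used)."""
    alphas = []
    for i in range(iters):
        corr = lookup_corr(disparity)
        pol_corr = lookup_pol_corr(disparity)
        alpha = i / max(iters - 1, 1)  # recomputed at every iteration, 0 -> 1
        corr_enhanced = corr + alpha * pol_residual(pol_corr)
        disparity = disparity + update_block(corr_enhanced)
        alphas.append(alpha)
    return disparity, alphas


# Toy usage with scalar stand-ins (all values are made up for illustration).
target = 10.0
final, alphas = refine(
    disparity=0.0,
    lookup_corr=lambda d: target - d,
    lookup_pol_corr=lambda d: 1.0,
    pol_residual=lambda p: 0.1 * p,
    update_block=lambda c: 0.5 * c,
    iters=4,
)
print(alphas)
```

In the toy run the first step is driven by the stereo term alone, since alpha is 0 there, and the polarization term only reaches full weight on the last step.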


5. Components and Modules

5.1 PolCorrResidual

import torch
import torch.nn as nn


class PolCorrResidual(nn.Module):
    """Maps polarization correlation features to a scaled residual Δcorr."""

    def __init__(self, pol_dim, corr_dim, hidden_dim=64, init_scale=0.1):
        super().__init__()  # required before registering submodules and parameters
        self.net = nn.Sequential(
            nn.Conv2d(pol_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, corr_dim, 1),  # project to corr dimension
        )
        self.scale = nn.Parameter(torch.tensor(init_scale))  # learnable scalar
        # Initialize the last layer to 0 → initial Δcorr ≈ 0
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, pol_corr):
        # (B, pol_dim, H, W) -> (B, corr_dim, H, W)
        return self.scale * self.net(pol_corr)

  • net: three convolutional layers (3×3 → 3×3 → 1×1); the final 1×1 convolution projects features to corr_dim.
  • scale: learnable scalar, init_scale=0.1.
  • The last layer is initialized to 0, so Δcorr ≈ 0 at the start of training and learning begins from a stable starting point.
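
The zero-initialization claim can be checked with a short script. This is a minimal sketch assuming PyTorch is installed; the batch size, spatial size, and the values pol_dim = 8 and corr_dim = 12 are arbitrary choices for the check, not values from the design. The class is repeated so the block runs on its own, and this copy calls `super().__init__()`, which `nn.Module` requires before submodules are assigned.

```python
import torch
import torch.nn as nn


class PolCorrResidual(nn.Module):
    def __init__(self, pol_dim, corr_dim, hidden_dim=64, init_scale=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(pol_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, corr_dim, 1),
        )
        self.scale = nn.Parameter(torch.tensor(init_scale))
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, pol_corr):
        return self.scale * self.net(pol_corr)


torch.manual_seed(0)
pol_dim, corr_dim = 8, 12                  # arbitrary sizes for this check
module = PolCorrResidual(pol_dim, corr_dim)
pol_corr = torch.randn(2, pol_dim, 4, 6)   # (B, pol_dim, H, W)

delta = module(pol_corr)                   # Δcorr at initialization
print(delta.shape)                         # (B, corr_dim, H, W)
print(bool((delta == 0).all()))            # residual is exactly zero before training

# Gradients still reach the zero-initialized last layer, so it can start learning.
delta.sum().backward()
last_bias_grad = module.net[-1].bias.grad
first_weight_grad = module.net[0].weight.grad
print(bool((last_bias_grad != 0).all()), bool((first_weight_grad == 0).all()))
```

On the first backward pass only the last layer receives a nonzero gradient, because the gradient to the earlier layers is multiplied by the last layer's zero weights. The earlier layers begin to update once the last layer has moved away from zero.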

5.2 Schedule coefficient alpha

alpha = i / max(iters - 1, 1) is a fixed function of iteration, not a learnable parameter. It is recomputed from the current index at every GRU iteration.


6. Tensor Dimensions

| Tensor | Shape / Type | Description |
| --- | --- | --- |
| pol_corr | (B, pol_dim, H, W) | Output of PolCorrBlock |
| Δcorr | (B, corr_dim, H, W) | Residual output of pol_residual |
| alpha | scalar (plain value, not a parameter) | i / max(iters-1, 1), range [0, 1] |
| corr_enhanced | (B, corr_dim, H, W) | corr + alpha * Δcorr |

7. Hyperparameters

| Hyperparameter | Value | Description |
| --- | --- | --- |
| pol_levels | 4 | Number of pyramid levels in the polarization volume |
| pol_radius | 4 | Lookup radius of the polarization volume |
| iters | 24 | Number of GRU iterations (also determines the length of the α schedule) |
| hidden_dim | 64 | Number of channels in the intermediate layer of PolCorrResidual |
| init_scale | 0.1 | Initial value of the learnable scale parameter |
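
One way to carry these values around is a small config object. The class name `ScheduledResidualConfig` and its `alpha` helper are hypothetical and not part of the design; the sketch mainly shows why iters also sets the length of the α schedule.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScheduledResidualConfig:
    """Hypothetical container for the hyperparameters listed above."""
    pol_levels: int = 4
    pol_radius: int = 4
    iters: int = 24
    hidden_dim: int = 64
    init_scale: float = 0.1

    def __post_init__(self):
        if self.iters < 1:
            raise ValueError("iters must be at least 1")

    def alpha(self, i: int) -> float:
        # Same formula as the spec; the schedule always spans the full run.
        return i / max(self.iters - 1, 1)


cfg = ScheduledResidualConfig()
short = replace(cfg, iters=12)  # same settings, but only 12 GRU iterations

# The same iteration index sits at a different point of the schedule.
print(round(cfg.alpha(11), 3))  # 0.478 with 24 iterations
print(short.alpha(11))          # 1.0 with 12 iterations
```

Because α depends on i relative to iters, running with a different iteration count than the one used for training changes how strongly each individual iteration injects polarization.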

8. Design Decisions and Rationale

| Decision | Rationale |
| --- | --- |
| Introduce a linear schedule alpha = i / max(iters - 1, 1) | Lets the pol residual strength grow with GRU iterations, aligned with the refinement timing |
| Early phase α≈0 | Protects pretrained stereo geometry from being disturbed by the pol residual |
| Late phase α→1 | RAFT does only small corrections in late iterations, the right moment for pol to act as a refinement prior |
| alpha is a fixed function, not learnable | The schedule is a prior structure determined directly by the iteration; no learning needed |
| Last layer of PolCorrResidual initialized to 0 | Δcorr≈0 at the start of training, learning starts from a stable point |
| UpdateBlock keeps the original RAFT | Fully preserves pretrained capability |

9. Highlights

  • A linear iteration schedule alpha = i / max(iters - 1, 1) grows the polarization residual strength from 0 to 1, so pol only intervenes at the “right moment”.
  • Three phases in one shot: early phase α≈0 protects stereo geometry, mid phase α grows for auxiliary evidence, late phase α→1 becomes a refinement prior.
  • alpha is a fixed function of iteration rather than a learnable parameter — the schedule serves as a prior structure determined directly by the iteration index, with zero extra parameters.
  • The polarization residual turns pol from a “distractor throughout the process” into a “refinement tool at the right moment”, exactly aligned with RAFT’s small-correction behavior in late iterations.
  • The last layer of PolCorrResidual is initialized to 0, giving Δcorr ≈ 0 at the start of training, and the original RAFT UpdateBlock is reused, maximally preserving pretrained capability.
