Blueprint · 2026

Pol-Conditioned Correlation Residual Architecture

Model class: `StereoPolVolumeV2A`
Subtitle: Pol-Conditioned Corr Residual (Additive Bias in Disparity Space)
Document type: Architecture design specification (design only, no experimental results)

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

When the polarization signal participates in stereo matching, both the “manner” and the “space” in which it intervenes directly affect stability:

  • If polarization only intervenes in the spatial domain ([H,W]) via attention or a multiplicative gate, it cannot distinguish “which disparity” matters — the entire disparity dimension is scaled uniformly.
  • A multiplicative gate scales the correlation values up or zeroes them out globally, breaking the inductive bias that RAFT has built around the correlation volume.

The design goal of this architecture is to let polarization intervene without breaking RAFT’s inductive bias: polarization does not interfere via a multiplicative gate, but injects a bias into the correlation volume as an additive residual.

pol_corr → PolCorrResidual → Δcorr
corr_enhanced = corr + α × Δcorr  (Additive Bias in Disparity Space)

Since the correlation volume itself lives in disparity space, this residual is an “additive bias in disparity space”, and the UpdateBlock remains the original RAFT, so its inductive bias is preserved.


2. Architecture

Figure: Overall architecture of Pol-Conditioned Corr Residual

Data Flow

  1. The left and right images pass through FeatureEncoder to produce fmap1 / fmap2, which form the standard CorrBlock.
  2. The left and right pol images are each downsampled to 1/4 resolution and fed into PolCorrBlock to compute the polarization difference volume.
  3. The output of PolCorrBlock is fed into PolCorrResidual to produce the correlation residual Δcorr.
  4. corr_enhanced = corr + α × Δcorr additively injects the residual into the correlation volume.
  5. corr_enhanced is fed into the original RAFT UpdateBlock, which is left unmodified.
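The five steps above can be walked through with stand-in tensors to check shapes. This is only an illustrative sketch: the random tensors that replace the CorrBlock lookup and the PolCorrResidual output, the channel counts, the 3-channel pol input, the use of average pooling for the 1/4 downsampling, and the helper names `downsample_quarter` and `inject_residual` are all assumptions that this blueprint does not specify.

```python
import torch
import torch.nn.functional as F


def downsample_quarter(img):
    """Reduce spatial size to 1/4 (average pooling is an assumed choice)."""
    return F.avg_pool2d(img, kernel_size=4, stride=4)


def inject_residual(corr, delta_corr, alpha=1.0):
    """Step 4: additive injection, corr_enhanced = corr + alpha * delta_corr."""
    return corr + alpha * delta_corr


B, H, W = 2, 64, 96
corr_dim = 36  # assumed channel count of the looked-up correlation features

# Step 2: hypothetical 3-channel pol image, downsampled to 1/4 resolution.
pol_left = torch.rand(B, 3, H, W)
pol_small = downsample_quarter(pol_left)  # (B, 3, H/4, W/4)

# Stand-ins for the CorrBlock lookup (step 1) and PolCorrResidual output (step 3).
corr = torch.randn(B, corr_dim, H // 4, W // 4)
delta_corr = torch.randn(B, corr_dim, H // 4, W // 4)

# Step 4: the residual is added, so the shape seen by UpdateBlock is unchanged.
corr_enhanced = inject_residual(corr, delta_corr, alpha=0.1)
print(pol_small.shape, corr_enhanced.shape)
```

Because the injection is a plain addition of same-shaped tensors, the UpdateBlock in step 5 receives an input with the same layout as in plain RAFT-Stereo, which is what allows it to be reused unchanged.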

3. Components and Modules

3.1 PolCorrResidual

import torch
import torch.nn as nn


class PolCorrResidual(nn.Module):
    def __init__(self, pol_dim, corr_dim, hidden_dim=64, init_scale=0.1):
        super().__init__()  # required before registering submodules and parameters
        self.net = nn.Sequential(
            nn.Conv2d(pol_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, corr_dim, 1),  # project to corr dimension
        )
        # Learnable scalar controlling the overall strength of the residual
        self.scale = nn.Parameter(torch.tensor(init_scale))
        # Initialize the last layer to 0 → initial Δcorr ≈ 0
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, pol_corr):
        # (B, pol_dim, H, W) → (B, corr_dim, H, W)
        return self.scale * self.net(pol_corr)

Design points:

  • net: three convolutional layers (3×3 → 3×3 → 1×1); the final 1×1 convolution projects features to corr_dim (aligned with the correlation volume channels).
  • scale: a learnable scalar parameter, initialized as init_scale=0.1.
  • The last layer’s weights and bias are initialized to 0, so at the start of training Δcorr is 0 and corr_enhanced equals corr: the initial behavior matches plain RAFT-Stereo without pol. The residual is then learned gradually, which gives training a stable starting point.
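  • forward returns scale * net(pol_corr), i.e. the scaled residual. Since the returned value is already multiplied by scale, the learnable scale appears to play the role of α in corr + α × Δcorr.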
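The zero-initialization point can be checked numerically. The sketch below repeats the module from this section so it runs on its own, then feeds it random stand-in tensors; the sizes (`pol_dim = corr_dim = 36`, a 16×24 feature map) and the random loss are arbitrary assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn


class PolCorrResidual(nn.Module):
    def __init__(self, pol_dim, corr_dim, hidden_dim=64, init_scale=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(pol_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_dim, corr_dim, 1),
        )
        self.scale = nn.Parameter(torch.tensor(init_scale))
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, pol_corr):
        return self.scale * self.net(pol_corr)


torch.manual_seed(0)
pol_dim, corr_dim = 36, 36  # assumed sizes, not specified by the blueprint
module = PolCorrResidual(pol_dim, corr_dim)

pol_corr = torch.randn(2, pol_dim, 16, 24)  # stand-in for the PolCorrBlock output
corr = torch.randn(2, corr_dim, 16, 24)     # stand-in for the correlation features

delta_corr = module(pol_corr)
corr_enhanced = corr + delta_corr  # delta_corr already includes the learnable scale

# Backpropagate an arbitrary loss to see which parameters receive gradient at init.
loss = (corr_enhanced * torch.randn_like(corr)).sum()
loss.backward()
last_grad = module.net[-1].weight.grad.abs().sum().item()
first_grad = module.net[0].weight.grad.abs().sum().item()
print(delta_corr.abs().max().item(), last_grad, first_grad)
```

One consequence worth noting: because the last layer starts at zero, only that layer receives a nonzero gradient on the first backward pass, and the earlier convolutions begin to update once the last layer has moved away from zero.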

3.2 Additive Bias in Disparity Space

The key idea of this architecture is the combination of “additive bias” and “disparity space”:

  • Additive: corr + α × Δcorr, with no multiplicative interference. Addition does not scale the original correlation values up or zero them out — it only shifts them, thereby preserving RAFT’s existing understanding of correlation.
  • Disparity space: the residual acts directly on the correlation volume, and each channel/index of the correlation volume corresponds to a disparity candidate. Therefore the pol correction naturally carries disparity semantics, rather than being purely spatial.
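A toy single-pixel example may make the two points concrete. The numbers below are made up for illustration and do not come from any model: a positive gate that is shared across the disparity dimension rescales every candidate equally and so cannot change which candidate scores highest, while a per-candidate additive residual can.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy correlation scores over 4 disparity candidates at one pixel (made-up values).
corr = [0.2, 0.9, 0.8, 0.1]

# Spatial multiplicative gate: one positive value per pixel, shared by all candidates.
gate = 0.5
gated = [gate * c for c in corr]

# Additive residual in disparity space: one value per disparity candidate.
delta = [0.0, -0.3, 0.1, 0.0]
shifted = [c + d for c, d in zip(corr, delta)]

print(argmax(corr), argmax(gated), argmax(shifted))
```

The gated scores keep the same best candidate as the original scores, whereas the additive residual moves the best candidate from index 1 to index 2.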

4. Tensor Dimensions

| Tensor | Shape / Parameter | Description |
| --- | --- | --- |
| PolCorrResidual input pol_corr | (B, pol_dim, H, W) | From PolCorrBlock |
| Intermediate layer in net | hidden_dim=64 | Two 3×3 convolutions |
| Δcorr output | (B, corr_dim, H, W) | Projected to corr dimension |
| scale | scalar | Learnable parameter, init 0.1 |
| corr_enhanced | (B, corr_dim, H, W) | corr + α × Δcorr |

5. Hyperparameters

| Hyperparameter | Value | Description |
| --- | --- | --- |
| pol_levels | 4 | Number of pyramid levels in the polarization volume |
| pol_radius | 4 | Lookup radius of the polarization volume |
| iters | 24 | Number of GRU iterations |
| hidden_dim | 64 | Number of channels in the intermediate layer of PolCorrResidual |
| init_scale | 0.1 | Initial value of the learnable scale parameter |
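These values can be related to channel counts under one assumption. In RAFT-Stereo, the features looked up from the correlation pyramid have corr_levels × (2 × corr_radius + 1) channels. If PolCorrBlock follows the same lookup convention (this blueprint does not state that it does), pol_dim would follow from pol_levels and pol_radius as in this small sketch; `lookup_channels` is a hypothetical helper name.

```python
def lookup_channels(levels, radius):
    """Channels of a 1D pyramid lookup: a window of 2*radius + 1 samples per level."""
    return levels * (2 * radius + 1)


# Assumption: PolCorrBlock uses the RAFT-Stereo lookup convention.
pol_dim = lookup_channels(levels=4, radius=4)
print(pol_dim)
```

With pol_levels=4 and pol_radius=4 this gives 36 channels, the same count a RAFT-Stereo correlation lookup produces with 4 levels and radius 4.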

6. Design Decisions and Rationale

| Decision | Rationale |
| --- | --- |
| Use an additive residual instead of a multiplicative gate | Addition does not break RAFT’s inductive bias; multiplication does |
| Apply the residual to the correlation volume | The correlation volume is disparity space; the correction carries disparity semantics |
| Initialize the last layer of PolCorrResidual to 0 | Δcorr ≈ 0 at the start of training, so behavior is close to plain RAFT-Stereo and learning starts from a stable point |
| scale set as a learnable parameter (init 0.1) | Lets the model decide the overall strength of the pol residual |
| UpdateBlock fully reuses the original RAFT | Downstream remains unchanged, maximally preserving pretrained capability |
| Pol images downsampled to 1/4 | Aligned with the correlation volume resolution |

7. Highlights

  • Polarization is injected into the correlation volume as an additive residual, only shifting and never globally scaling correlation values, fully preserving RAFT’s existing understanding of correlation.
  • The residual acts directly on the correlation volume rather than the spatial domain, so the polarization correction naturally carries disparity semantics and can distinguish “which disparity”.
  • The last layer of PolCorrResidual is initialized to 0, so Δcorr ≈ 0 at the start of training and the model learns the polarization residual from a stable starting point equivalent to plain RAFT-Stereo.
  • A single learnable scalar scale controls the overall strength of the polarization residual, letting the model decide how much to trust pol.
  • UpdateBlock fully reuses the original RAFT with no modifications, maximally preserving pretrained capability.
