Blueprint · 2026

Polarization Volume Architecture

A specification of the `StereoPolVolume` model: by analogy with RAFT-Stereo's Correlation Volume, the polarization difference is pre-computed for all disparity candidates as a Polarization Volume, so that building the polarization features does not depend on any disparity estimate; the current estimate is used only to sample the pre-computed volume.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

Computing the polarization difference (pol_diff) via warp suffers from a fundamental contradiction:

Ideal case:  pol_diff = warp(I_left, GT_disp) - I_right     → perfectly aligned ✓
Actual case: pol_diff = warp(I_left, pred_disp) - I_right   → contains error ✗

Core contradiction: computing pol_diff needs disparity, but disparity is precisely what we want to predict.

Using GT disparity to warp yields a perfectly aligned pol_diff; but at inference time only predicted disparity is available, and the warped pol_diff necessarily contains error. This creates a training/inference inconsistency.
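The gap between the two cases can be reproduced on a single toy image row. The sketch below is illustrative only: the sine-wave row, the constant disparity, the 1.5 px prediction error, and the `warp` helper are all assumptions, not part of the model.

```python
import numpy as np

def warp(row, disp):
    """Sample row at x + disp with linear interpolation (edges are clamped)."""
    xs = np.arange(row.size, dtype=np.float64)
    return np.interp(xs + disp, xs, row)

w, d_gt = 32, 3.0
xs = np.arange(w, dtype=np.float64)
i_left = np.sin(0.3 * xs)        # toy intensity row, not real polarization data
i_right = warp(i_left, d_gt)     # right view synthesised from a constant true disparity

d_pred = d_gt + 1.5              # hypothetical prediction that is 1.5 px off
ideal = warp(i_left, d_gt) - i_right     # GT disparity: residual vanishes
actual = warp(i_left, d_pred) - i_right  # predicted disparity: residual remains

print(np.abs(ideal).max(), np.abs(actual).max())
```

The first value is zero by construction, while the second is not: this non-zero residual is the error a warp-based pol_diff carries at inference time.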

Key Insight

In a rectified stereo system, corresponding points in the left and right images always lie on the same horizontal line (epipolar line).

We can therefore borrow the Correlation Volume idea from RAFT-Stereo: without knowing disparity in advance, pre-compute pol_diff for “all disparity candidates” into a volume, and later query it inside the GRU loop using the current disparity. The design goal of this architecture is to eliminate pol_diff’s dependence on a disparity estimate.


2. Solution: Polarization Volume

Analogously to RAFT-Stereo’s Correlation Volume, pre-compute pol_diff for all disparities:

# Approach that requires knowing disparity
pol_diff = warp(left, d_pred) - right

# Polarization Volume approach (no disparity needed)
pol_volume[d] = shift(left, d) - right   # for all d in [0, max_disp]

Because pol_volume is pre-computed over “all disparity candidates,” its construction does not depend on any disparity estimate: the same volume results whatever the current prediction is. The estimate is used only afterwards, to select which entries of the volume are sampled.
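A minimal NumPy sketch of this construction follows. It assumes single-channel inputs laid out as (B, H, W) and stores the volume in all-pairs form, so the entry for disparity d at left pixel x is left[x] - right[x - d]; the function names are hypothetical. The final comparison shows that one disparity slice of the volume holds the same values as shift(left, d) - right, indexed by the left pixel rather than the right one.

```python
import numpy as np

def build_pol_volume(left_ds, right_ds):
    """All-pairs difference per image row.

    left_ds, right_ds: (B, H, W) single-channel arrays (an assumed layout).
    Returns (B*H, 1, W, W) with volume[r, 0, x1, x2] = left[x1] - right[x2].
    """
    b, h, w = left_ds.shape
    diff = left_ds[:, :, :, None] - right_ds[:, :, None, :]
    return diff.reshape(b * h, 1, w, w)

def candidate(volume, d):
    """Entries for integer disparity d: left[x] - right[x - d] for x >= d."""
    w = volume.shape[-1]
    x = np.arange(d, w)
    return volume[:, 0, x, x - d]

rng = np.random.default_rng(0)
left_ds = rng.standard_normal((2, 3, 8))
right_ds = rng.standard_normal((2, 3, 8))
volume = build_pol_volume(left_ds, right_ds)   # no disparity argument anywhere

d = 2
# shift(left, d) - right, restricted to the columns where both images overlap
shifted = (left_ds[..., d:] - right_ds[..., : 8 - d]).reshape(6, 8 - d)
print(volume.shape, np.allclose(candidate(volume, d), shifted))
```

Note that `build_pol_volume` takes only the two images, which is the point of the design: nothing in the construction can differ between training and inference.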


3. Architecture

Figure: StereoPolVolume overall architecture

Data Flow

  1. The left and right images are encoded by FeatureEncoder into fmap1 / fmap2, which form a CorrBlock (the standard RAFT-Stereo correlation volume).
  2. The left and right images are also downsampled 4x with AvgPool to left_ds / right_ds, which form a PolCorrBlock (the polarization-difference volume).
  3. The left image is additionally processed by ContextEncoder to produce context and the initial hidden state.
  4. Enter a 24-iteration GRU loop: each iteration queries both CorrBlock (yielding corr) and PolCorrBlock (yielding pol) using the current disparity.
  5. UpdateBlockWithPol concatenates corr, pol, and disp as input and produces the disparity increment Δdisp.
  6. disp = disp + Δdisp, iteratively updated.
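The control flow of steps 4 to 6 can be sketched on one toy image row. Everything below is a stand-in chosen for illustration: the learned UpdateBlockWithPol is replaced by a hand-written rule (`toy_update`) that steps toward the smallest absolute polarization difference in the lookup window, the "features" for the correlation volume are the raw pixel values, disparities are integers, and there is no pyramid or interpolation. Only the structure matters: both volumes are built once before the loop, and each iteration queries them with the current disparity.

```python
import numpy as np

RADIUS, ITERS = 4, 24   # pol_radius and iters from the hyperparameter table

def lookup(volume, disp, radius=RADIUS):
    """Sample volume[x, x - disp[x] + k] for k in [-radius, radius] (indices clamped)."""
    w = volume.shape[0]
    x = np.arange(w)[:, None]
    k = np.arange(-radius, radius + 1)[None, :]
    x2 = np.clip(x - disp[:, None] + k, 0, w - 1)
    return volume[x, x2]                         # (W, 2 * radius + 1)

def toy_update(corr, pol, disp, radius=RADIUS):
    """Hypothetical stand-in for UpdateBlockWithPol (ignores corr and disp)."""
    k_best = np.argmin(np.abs(pol), axis=1) - radius
    return -k_best                               # moving the sample by k changes disp by -k

w, d_true = 48, 6
x = np.arange(w)
left = x + 0.01 * x ** 2                         # strictly increasing toy row
right = (x + d_true) + 0.01 * (x + d_true) ** 2  # right[x] equals the left signal at x + d_true

corr_volume = left[:, None] * right[None, :]     # built once, before the loop
pol_volume = left[:, None] - right[None, :]      # built once, before the loop

disp = np.zeros(w, dtype=np.int64)
for _ in range(ITERS):
    corr = lookup(corr_volume, disp)             # query CorrBlock with current disp
    pol = lookup(pol_volume, disp)               # query PolCorrBlock with current disp
    disp = disp + toy_update(corr, pol, disp)    # disp = disp + delta

print(disp[10:15])
```

Pixels near the left border have no valid match inside the image and are not handled by this toy rule, so only interior pixels should be inspected.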

4. Components and Modules

4.1 CorrBlock vs PolCorrBlock

Item        | CorrBlock                 | PolCorrBlock
------------|---------------------------|------------------------
Computation | dot(fmap1[x], fmap2[x-d]) | left[x] - right[x-d]
Meaning     | Feature similarity        | Polarization difference
Dimensions  | (B*H, 1, W, W)            | (B*H, 1, W, W)
Query       | Sample by disp            | Sample by disp

  • CorrBlock computes the inner product (dot product) of left and right features, measuring “feature similarity”—the core of standard RAFT-Stereo.
  • PolCorrBlock computes per-pixel differences between the (downsampled) left and right images along the same epipolar line, measuring “polarization difference.” The two are structurally symmetric, both forming all-pairs volumes of shape (B*H, 1, W, W), and both are sampled by the current disparity.
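The symmetry between the two blocks can be made concrete side by side. This is a sketch with random data and assumed input layouts, (B, C, H, W) for feature maps and (B, H, W) for the downsampled single-channel images; any scaling of the dot product is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W = 2, 8, 3, 5
fmap1 = rng.standard_normal((B, C, H, W))
fmap2 = rng.standard_normal((B, C, H, W))
left_ds = rng.standard_normal((B, H, W))
right_ds = rng.standard_normal((B, H, W))

# CorrBlock: inner product over channels for every (x1, x2) pair on the same row
corr = np.einsum("bchi,bchj->bhij", fmap1, fmap2).reshape(B * H, 1, W, W)

# PolCorrBlock: pixel difference for every (x1, x2) pair on the same row
pol = (left_ds[:, :, :, None] - right_ds[:, :, None, :]).reshape(B * H, 1, W, W)

# Same shape, same indexing: row r, left pixel x, right pixel x - d
x, d = 3, 2
print(corr.shape == pol.shape)
print(np.isclose(corr[0, 0, x, x - d], fmap1[0, :, 0, x] @ fmap2[0, :, 0, x - d]))
print(np.isclose(pol[0, 0, x, x - d], left_ds[0, 0, x] - right_ds[0, 0, x - d]))
```

Because both volumes share the (row, left x, right x) layout, a single disparity-indexed sampling routine can serve both.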

4.2 UpdateBlockWithPol

Compared with the standard RAFT-Stereo UpdateBlock, UpdateBlockWithPol accepts an additional input, pol_corr (the polarization features sampled from PolCorrBlock, called pol elsewhere in this document). It feeds concat(corr, pol, disp) to the GRU, which then produces the disparity increment.
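A minimal sketch of the fusion step is given below. The helper name `fuse_inputs`, the (B, C, H, W) layout, and the channel count of 36 per lookup (levels * (2 * radius + 1), following the RAFT-Stereo convention) are assumptions; any convolutions applied before or after the concatenation are left out.

```python
import numpy as np

def fuse_inputs(corr, pol, disp):
    """concat(corr, pol, disp) along the channel axis; every input is (B, C_i, H, W)."""
    return np.concatenate([corr, pol, disp], axis=1)

B, H, W = 2, 6, 10
lookup_channels = 4 * (2 * 4 + 1)               # assumed: levels * (2 * radius + 1) = 36
corr = np.zeros((B, lookup_channels, H, W))     # placeholder sampled correlation features
pol = np.ones((B, lookup_channels, H, W))       # placeholder sampled polarization features
disp = np.full((B, 1, H, W), 2.0)               # current disparity, one channel

fused = fuse_inputs(corr, pol, disp)
print(fused.shape)   # (2, 73, 6, 10)
```

Under these assumptions the GRU input carries 36 + 36 + 1 = 73 channels, compared with 37 for a corr-plus-disp input without the polarization branch.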


5. Tensor Dimensions

Tensor             | Dimensions                                   | Description
-------------------|----------------------------------------------|----------------------------------------------
fmap1 / fmap2      | FeatureEncoder outputs                       | Used to form CorrBlock
corr volume        | (B*H, 1, W, W)                               | All-pairs feature similarity
pol volume         | (B*H, 1, W, W)                               | All-pairs polarization difference
left_ds / right_ds | 1/4 resolution                               | After AvgPool 4x downsampling
Query output       | Controlled by pol_levels=4 and pol_radius=4  | corr / pol features sampled per GRU iteration
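How pol_levels and pol_radius could determine the query output size is sketched below for a single image row. It assumes PolCorrBlock follows the RAFT-Stereo lookup convention: a pyramid that halves the right-image axis at each level, and 2 * radius + 1 linearly interpolated taps per level, for levels * (2 * radius + 1) values per pixel. The function names are hypothetical, and `np.interp` clamps at the borders, which may differ from the padding used in a real implementation.

```python
import numpy as np

def build_pyramid(volume, levels=4):
    """Halve the last (right-image x) axis by average pooling at each level."""
    pyramid = [volume]
    for _ in range(levels - 1):
        v = pyramid[-1]
        w2 = v.shape[-1] // 2
        pyramid.append(v[..., : 2 * w2].reshape(*v.shape[:-1], w2, 2).mean(axis=-1))
    return pyramid

def lookup(pyramid, disp, radius=4):
    """Per left pixel x, sample level i at (x - disp) / 2**i + k for k in [-radius, radius]."""
    w1 = pyramid[0].shape[0]
    offsets = np.arange(-radius, radius + 1)
    feats = []
    for i, level in enumerate(pyramid):
        grid = np.arange(level.shape[-1])
        centers = (np.arange(w1) - disp) / 2 ** i
        rows = [np.interp(c + offsets, grid, row) for c, row in zip(centers, level)]
        feats.append(np.stack(rows))
    return np.concatenate(feats, axis=-1)

rng = np.random.default_rng(0)
left_row, right_row = rng.standard_normal(32), rng.standard_normal(32)
volume = left_row[:, None] - right_row[None, :]   # (W, W) pol volume for one row
pyramid = build_pyramid(volume, levels=4)
feats = lookup(pyramid, disp=np.full(32, 5.0), radius=4)

print([p.shape[-1] for p in pyramid])   # [32, 16, 8, 4]
print(feats.shape)                      # (32, 36)
```

With pol_levels=4 and pol_radius=4 this gives 4 * 9 = 36 values per pixel per query.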

6. Hyperparameters

Parameter  | Value                    | Description
-----------|--------------------------|-----------------------------------------------------
pol_volume | enabled                  | Enable the Polarization Volume architecture
pretrained | raftstereo-sceneflow.pth | SceneFlow pre-trained weights
pol_levels | 4                        | Number of pyramid levels for the Polarization Volume
pol_radius | 4                        | Query sampling radius
iters      | 24                       | GRU iterations
batch_size | 8                        | Training batch size
num_steps  | 60000                    | Training steps
lr         | 0.0003                   | Learning rate
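For reference, the table can be captured as a small configuration object. The class name and the derived `pol_channels` property are hypothetical; the property assumes the RAFT-Stereo convention of levels * (2 * radius + 1) sampled values per pixel.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StereoPolVolumeConfig:
    """Hyperparameters from the table above (class name is illustrative)."""
    pol_volume: bool = True
    pretrained: str = "raftstereo-sceneflow.pth"
    pol_levels: int = 4
    pol_radius: int = 4
    iters: int = 24
    batch_size: int = 8
    num_steps: int = 60000
    lr: float = 0.0003

    @property
    def pol_channels(self) -> int:
        # Assumed lookup width: one window of 2 * radius + 1 taps per pyramid level
        return self.pol_levels * (2 * self.pol_radius + 1)

cfg = StereoPolVolumeConfig()
print(cfg.pol_channels)   # 36
```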

7. Design Decisions and Rationale

Decision                                                       | Rationale
---------------------------------------------------------------|--------------------------------------------------------------------
Pre-compute pol_diff like a Correlation Volume                 | Leverages epipolar geometry; disparity need not be known in advance
AvgPool 4x the polarization images before building the volume | Matches the corr-volume resolution and reduces compute
Make pol and corr structurally symmetric                       | Allows sampling via the same disparity-query mechanism
UpdateBlockWithPol uses concat fusion                          | The most direct multi-source fusion

Core Advantage

Warp-based pol_diff requires a disparity input (GT or predicted), so a gap exists between “ideal alignment” and “actual alignment,” and training and inference behave inconsistently. The Polarization Volume instead uses shift(left, d) - right to pre-compute the polarization difference for all disparity candidates; building the volume never needs disparity, and the current estimate only selects which entries are sampled. As a result:

  • Disparity is not required to compute polarization differences.
  • There is no “ideal vs actual alignment” gap.
  • The polarization-feature computation behaves identically at training and inference.

8. Highlights

  • Volume replaces warp for polarization computation: by analogy with the Correlation Volume, pol_diff for all disparity candidates is pre-computed into a Polarization Volume, so computing pol_diff no longer has the circular dependency of “needing disparity to compute pol_diff while disparity is exactly what we want to predict.”
  • Training and inference consistent: constructing the polarization volume does not depend on any disparity estimate, so for a given image pair the pol_volume is identical at training and inference, eliminating the “ideal vs actual alignment” gap.
  • Exploits the epipolar geometric constraint: leverages the property that corresponding points in a rectified stereo system lie on the same horizontal line, using shift in place of warp and encoding this geometric prior directly into volume construction.
  • Symmetric design with the Correlation Volume: PolCorrBlock is structurally symmetric with CorrBlock—both are all-pairs volumes of shape (B*H, 1, W, W) and can be sampled with the same disparity-query mechanism.
  • Downsampling reduces compute: polarization images are AvgPool 4x downsampled before volume construction, matching the correlation-volume resolution and keeping compute under control, with the intent of retaining the coarse, large-scale polarization signal.
  • Multi-source fusion within iterations: UpdateBlockWithPol consumes both similarity and polarization difference per GRU iteration via concat(corr, pol, disp), letting the two signals jointly guide disparity convergence.
