Blueprint · 2026

Per-Disparity Gate Architecture

Model class: `StereoPolVolumeV2D`
Subtitle: Per-Disparity Gate (disparity-aware polarization gating implemented with 3D Conv)
Document type: Architecture design specification (design only, no experimental results)

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

When polarization participates in stereo matching and gating is performed only in the spatial domain ([H,W]), the gate applies the same value to every disparity candidate at a pixel, so it cannot answer “which disparity candidate is implausible”.

The core shift in this architecture is:

Pol does not just tell the model “this place is important” (spatial gate); it tells the model “this disparity is implausible” (per-disparity gate).

In other words, the role of polarization shifts from an “importance mask” to a “validity judge”:

| | Spatial gate | Per-Disparity gate (this architecture) |
|---|---|---|
| Gate dimension | [H,W] | [H,W,D] |
| Can it distinguish disparity? | No | Yes |
| Pol role | importance mask | validity judge |
| Consistent with stereo physics | partial | fully consistent |

The gate in this architecture has shape [H,W,D], so it can give a separate validity judgement to each disparity candidate.
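The difference between the two gate types can be illustrated with a toy example for a single pixel. This is only a sketch with made-up numbers (the correlation values and gate values below are assumptions, not outputs of the model): a spatial gate multiplies every disparity candidate by the same positive factor, so the best-scoring candidate cannot change, whereas a per-disparity gate can down-weight one specific candidate.

```python
import torch

# Toy correlation over D = 5 disparity candidates at one pixel (made-up values).
# Candidate 3 is a false match that scores slightly higher than the true match (1).
corr = torch.tensor([0.1, 0.8, 0.2, 0.9, 0.1])  # shape [D]

# Spatial gate: one positive scalar per pixel, shared by all disparity candidates.
spatial_gate = torch.tensor(0.3)
after_spatial = corr * spatial_gate

# Per-disparity gate: one value per candidate; here it down-weights candidate 3.
per_disp_gate = torch.tensor([1.0, 1.0, 1.0, 0.5, 1.0])
after_per_disp = corr * per_disp_gate

best_before = int(corr.argmax())              # 3 (the false match wins)
best_spatial = int(after_spatial.argmax())    # 3 (uniform scaling keeps the ranking)
best_per_disp = int(after_per_disp.argmax())  # 1 (the false match is suppressed)
```

The spatial gate rescales the whole disparity profile and leaves its ranking untouched; only a gate with a disparity dimension can change which candidate wins.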


2. Architecture Design

Step 1: Disparity-Aware Pol Volume

# For each disparity candidate d:
right_at_d = sample(right, x - d)  # value of the right image at disparity d
pol_diff_d = left - right_at_d     # disparity-aware pol_diff

# Physical meaning:
# - d = d_gt (correct): pol_diff reflects material characteristics
# - d ≠ d_gt (wrong / false match): pol_diff reflects geometric misalignment

For each disparity candidate d, shift the right image by d and subtract it from the left image to obtain pol_diff_d under that disparity hypothesis. Stacking the results across all d forms a pol volume with a disparity dimension.

Physical meaning:

  • When d = d_gt (correct disparity): left and right correspond to the same physical point, and pol_diff reflects the true material polarization characteristics.
  • When d ≠ d_gt (wrong / false match): left and right correspond to different physical points, and pol_diff reflects the difference caused by geometric misalignment.

Therefore the distribution of the pol volume along the disparity dimension itself carries the information of “which d is the plausible match”.
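A minimal PyTorch sketch of Step 1 could look like the following. The function name `build_pol_volume`, the `max_disp` parameter, integer disparities, and zero fill for out-of-range columns are all assumptions made for illustration; the blueprint does not specify how `sample` is implemented, and fractional disparities would need interpolation (for example with `torch.nn.functional.grid_sample`).

```python
import torch

def build_pol_volume(left: torch.Tensor, right: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Hypothetical helper: left, right are [B, C, H, W]; returns [B, C, D, H, W], D = max_disp."""
    W = left.shape[-1]
    slices = []
    for d in range(max_disp):
        # right_at_d[..., x] = right[..., x - d]; columns x < d have no source pixel and stay 0.
        right_at_d = torch.zeros_like(right)
        right_at_d[..., d:] = right[..., : W - d]
        slices.append(left - right_at_d)  # pol_diff under disparity hypothesis d
    return torch.stack(slices, dim=2)     # stack along a new disparity dimension

# Synthetic pair with a known disparity of 2: right[x] = left[x + 2] (with wrap-around).
left = torch.rand(1, 3, 4, 16)
right = torch.roll(left, shifts=-2, dims=-1)
pol_volume = build_pol_volume(left, right, max_disp=8)  # [1, 3, 8, 4, 16]
```

In this synthetic pair the right image is an exact shifted copy, so the slice at d = 2 is exactly zero for columns x >= 2. With real polarization images the residual at the correct disparity is not expected to vanish; as described above, it would reflect material characteristics instead.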

Step 2: Per-Disparity Gate (3D Conv)

self.pol_gate = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(8, 1, kernel_size=1),
    nn.Sigmoid()
)
# Input:  pol_volume [B, 3, D, H, W]
# Output: [B, 1, D, H, W]; squeezing the channel dimension gives gate [B, D, H, W]

  • Use 3D convolution to process the pol volume of shape [B, 3, D, H, W]; the kernel spans both the disparity dimension and the spatial dimensions.
  • The first layer Conv3d(3, 8, k=3) extracts disparity-aware features; the second layer Conv3d(8, 1, k=1) projects to a single channel.
  • The final Sigmoid outputs values in [0, 1]. After the singleton channel dimension is squeezed, the gate has shape [B, D, H, W], so each (d, h, w) position has an independent gating value.
  • pol_gate_hidden sets the number of hidden channels (8 in this design).
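The gate can be wrapped in a small module to make the shape handling explicit. This is a sketch, not the blueprint's actual class: the name `PerDisparityGate` and the `in_channels` argument are hypothetical, and the explicit `squeeze(1)` is added here because `Conv3d(8, 1, ...)` leaves a singleton channel dimension.

```python
import torch
import torch.nn as nn

class PerDisparityGate(nn.Module):
    """Hypothetical wrapper around the pol_gate stack from Step 2."""

    def __init__(self, in_channels: int = 3, pol_gate_hidden: int = 8):
        super().__init__()
        self.pol_gate = nn.Sequential(
            nn.Conv3d(in_channels, pol_gate_hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(pol_gate_hidden, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, pol_volume: torch.Tensor) -> torch.Tensor:
        gate = self.pol_gate(pol_volume)  # [B, 1, D, H, W]
        return gate.squeeze(1)            # [B, D, H, W]

gate_net = PerDisparityGate()
pol_volume = torch.randn(2, 3, 8, 16, 20)  # [B, 3, D, H, W]
gate = gate_net(pol_volume)                # [2, 8, 16, 20]
```

With `padding=1` and a kernel size of 3, the D, H, and W extents are preserved, so the gate lines up element by element with a correlation volume of the same [B, D, H, W] shape.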

Step 3: Residual Modulation

# Does not break the RGB stereo baseline
corr_mod = corr * (1.0 + alpha * (gate - 0.5) * 2)
# gate=0.5 is neutral, <0.5 suppresses, >0.5 enhances

  • The gate is remapped to [-1, 1] via (gate - 0.5) * 2:
    • gate = 0.5 → 0 (neutral, does not change corr).
    • gate < 0.5 → negative (suppresses the correlation of that disparity).
    • gate > 0.5 → positive (enhances the correlation of that disparity).
  • corr_mod = corr * (1 + α · 2·(gate-0.5)): the modulation is multiplicative, with a scale factor that stays within [1 - α, 1 + α] and is symmetric around the neutral point 1.0, so the gate can only nudge the baseline correlation rather than replace it.
  • alpha (e.g. 0.2) limits the modulation amplitude: with α = 0.2 the scale factor ranges from 0.8 to 1.2, so the RGB stereo baseline is not broken.
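The three regimes can be checked numerically with a small sketch. The function name `modulate` is hypothetical; the formula is the one from Step 3, and the all-ones correlation tensor is only a convenient test input.

```python
import torch

def modulate(corr: torch.Tensor, gate: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    """Residual modulation: scale corr by a factor in [1 - alpha, 1 + alpha]."""
    return corr * (1.0 + alpha * (gate - 0.5) * 2.0)

corr = torch.ones(1, 4, 2, 2)                          # [B, D, H, W], all ones for readability
neutral = modulate(corr, torch.full_like(corr, 0.5))   # factor 1.0, corr unchanged
suppressed = modulate(corr, torch.zeros_like(corr))    # factor 1 - alpha = 0.8
enhanced = modulate(corr, torch.ones_like(corr))       # factor 1 + alpha = 1.2
```

Setting `alpha=0.0` makes the factor exactly 1 everywhere, which recovers the unmodified correlation and gives a simple way to switch the polarization branch off.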

3. Design Principles Followed by This Architecture

This architecture follows three design principles when polarization intervenes in stereo matching:

| Principle | Content | How this architecture follows it |
|---|---|---|
| 1 | Pol does not enter fnet (feature extraction) | Pol only acts on the cost volume and does not touch feature extraction |
| 2 | Pol does not only perform spatial [H,W] gating | The gate is [H,W,D], carrying a disparity dimension |
| 3 | Pol can only act in a disparity-aware space | Both the pol volume and the gate explicitly carry a disparity index |

4. Architecture (Data Flow)

[Figure: Per-Disparity Gate data flow]
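The data flow through Steps 1 to 3 can also be traced as a short shape-level sketch. Several things here are assumptions for illustration: the correlation volume is taken to be a plain [B, D, H, W] tensor sharing the same D candidates as the pol volume, disparities are integers, out-of-range columns are zero-filled, and random tensors stand in for real polarization images and RGB correlation. How the modulated correlation is fed to the RAFT UpdateBlock is not shown.

```python
import torch
import torch.nn as nn

B, D, H, W, alpha = 1, 8, 12, 16, 0.2

pol_left = torch.rand(B, 3, H, W)   # stand-in polarization input, left view
pol_right = torch.rand(B, 3, H, W)  # stand-in polarization input, right view
corr = torch.rand(B, D, H, W)       # stand-in for the RGB correlation volume

# Step 1: disparity-aware pol volume, one pol_diff slice per candidate d.
slices = []
for d in range(D):
    right_at_d = torch.zeros_like(pol_right)
    right_at_d[..., d:] = pol_right[..., : W - d]
    slices.append(pol_left - right_at_d)
pol_volume = torch.stack(slices, dim=2)  # [B, 3, D, H, W]

# Step 2: per-disparity gate via 3D convolution.
pol_gate = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(8, 1, kernel_size=1),
    nn.Sigmoid(),
)
gate = pol_gate(pol_volume).squeeze(1)   # [B, D, H, W]

# Step 3: residual modulation of the correlation volume.
corr_mod = corr * (1.0 + alpha * (gate - 0.5) * 2.0)  # same shape as corr
```

Feature extraction (fnet) does not appear in this path, and the only place polarization touches the matching pipeline is the multiplicative factor applied to `corr`.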


5. Tensor Dimensions

| Tensor | Shape | Description |
|---|---|---|
| pol_volume | [B, 3, D, H, W] | Disparity-aware pol volume, input to 3D Conv |
| pol_gate first layer output | [B, 8, D, H, W] | Conv3d(3→8, k=3) |
| gate | [B, D, H, W] | Per-disparity gate, sigmoid → [0,1] |
| corr | [B, ..., D, H, W] | Correlation volume |
| corr_mod | same as corr | corr * (1 + α·2·(gate-0.5)) |
| alpha | scalar | Modulation amplitude, e.g. 0.2 |

6. Hyperparameters

| Hyperparameter | Value | Description |
|---|---|---|
| pol_gate_hidden | 8 | Number of hidden channels in pol_gate |
| pol_alpha | 0.2 | Modulation amplitude of Residual Modulation |
| pol_levels | 4 | Number of pyramid levels in the polarization volume |
| pol_radius | 4 | Lookup radius of the polarization volume |
| iters | 24 | Number of GRU iterations |
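One convenient way to carry these values around is a small frozen dataclass. The class name `PolGateConfig` is hypothetical and not part of the blueprint; the field names and defaults are taken from the table above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PolGateConfig:
    """Hypothetical config container; defaults mirror the hyperparameter table."""
    pol_gate_hidden: int = 8   # hidden channels in pol_gate
    pol_alpha: float = 0.2     # modulation amplitude of Residual Modulation
    pol_levels: int = 4        # pyramid levels in the polarization volume
    pol_radius: int = 4        # lookup radius of the polarization volume
    iters: int = 24            # number of GRU iterations

cfg = PolGateConfig()
# Ablation variant: pol_alpha = 0 makes the modulation factor exactly 1 (RGB-only behavior).
rgb_only = replace(cfg, pol_alpha=0.0)
```

Freezing the dataclass keeps a configuration from being mutated mid-run, and `replace` produces ablation variants without touching the defaults.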

7. Design Decisions and Rationale

| Decision | Rationale |
|---|---|
| Pol uses a per-disparity gate [H,W,D] | Enables pol to judge “which disparity is implausible”, consistent with stereo physics |
| Compute left - sample(right, x-d) for each d | Correct d reflects material, wrong d reflects geometric misalignment; the distribution itself carries validity information |
| Use 3D Conv to process the pol volume | The kernel must span both disparity and spatial dimensions to learn disparity-aware gating |
| pol_gate has only 2 layers, hidden=8 | Stays lightweight, serving as mechanism validation |
| Remap gate as (gate-0.5)*2 with multiplicative modulation | Symmetric enhancement/suppression around the neutral point 1.0, does not break the RGB baseline |
| alpha (e.g. 0.2) limits modulation amplitude | Ensures pol acts as a correction rather than a dominant signal, protecting the stereo baseline |
| UpdateBlock keeps the original RAFT design | Does not change downstream components, preserves pretrained capability |
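The "lightweight" claim for the two-layer gate can be made concrete by counting its learnable parameters. This sketch rebuilds the `pol_gate` stack from Step 2 with the default 3 input channels and 8 hidden channels.

```python
import torch.nn as nn

pol_gate = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(8, 1, kernel_size=1),
    nn.Sigmoid(),
)

# First layer:  3 * 8 * 3*3*3 weights + 8 biases = 656
# Second layer: 8 * 1 * 1*1*1 weights + 1 bias   = 9
n_params = sum(p.numel() for p in pol_gate.parameters())
print(n_params)  # 665
```

The parameter count is small, but the 3D convolution still runs over the full [D, H, W] volume, so activation memory and compute scale with the number of disparity candidates.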

Why this direction matters

  1. Targets gross errors (D1/D3): this architecture does not perform smooth refinement; instead it suppresses false matches by directly lowering the correlation of implausible disparities.
  2. Physically consistent: pol provides a judgement of “under this disparity hypothesis, do left and right come from the same physical point”.
  3. Interpretable intervention point: the design makes explicit where pol provides information that RGB cannot, namely as a validity judge along the disparity dimension of the cost volume.

8. Highlights

  • Shifts the role of polarization from “importance mask” to “validity judge”: the gate carries a disparity dimension [H,W,D] and can judge whether each disparity candidate is plausible.
  • The Disparity-Aware Pol Volume computes left - sample(right, x-d) for each d; correct d reflects material and wrong d reflects geometric misalignment, so the pol volume distribution naturally carries validity information.
  • 3D convolution processes the pol volume with kernels spanning both disparity and spatial dimensions, which is necessary to learn disparity-aware gating.
  • Residual Modulation remaps the gate as (gate-0.5)*2 to symmetrically enhance or suppress correlation around the neutral point 1.0, with alpha limiting the amplitude, so that polarization acts as a correction rather than a dominant signal.
  • The mechanism directly targets gross errors (false matches) by lowering the correlation of implausible disparities rather than performing smooth refinement, and it reuses the original RAFT UpdateBlock to preserve pretrained capability.
