1. Design Goals
When polarization participates in stereo matching, if gating is only performed in the spatial domain ([H,W]), the gate treats the entire disparity dimension uniformly and cannot answer “which disparity candidate is implausible”.
The core shift in this architecture is:
Pol does not just tell the model “this place is important” (spatial gate); it tells the model “this disparity is implausible” (per-disparity gate).
In other words, the role of polarization shifts from an “importance mask” to a “validity judge”:
| | Spatial gate | Per-Disparity gate (this architecture) |
|---|---|---|
| gate dimension | [H,W] | [H,W,D] |
| Can it distinguish disparity | ❌ | ✅ |
| Pol role | importance mask | validity judge |
| Consistent with stereo physics | partial | fully consistent |
The gate in this architecture is [H,W,D], able to give a separate validity judgement to “each disparity candidate”.
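The difference between the two gate shapes can be illustrated at a single pixel. The following is a minimal sketch with made-up correlation and gate values, and it uses plain multiplicative gating (not the residual form described later) purely for illustration. A positive spatial gate rescales all disparity candidates equally and so cannot change their ranking, while a per-disparity gate can.

```python
# Toy illustration at one pixel; all numbers are invented for the example.
corr_at_pixel = [0.9, 0.4, 0.85]   # correlation for d = 0, 1, 2
spatial_gate = 0.3                 # one scalar per pixel ([H,W] gate)
per_disp_gate = [0.2, 1.0, 1.0]    # one value per candidate ([H,W,D] gate)


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


spatially_gated = [c * spatial_gate for c in corr_at_pixel]
per_disp_gated = [c * g for c, g in zip(corr_at_pixel, per_disp_gate)]

best_raw = argmax(corr_at_pixel)        # d = 0 wins before gating
best_spatial = argmax(spatially_gated)  # still d = 0: uniform scaling keeps the ranking
best_per_disp = argmax(per_disp_gated)  # d = 2: the d = 0 candidate was suppressed
print(best_raw, best_spatial, best_per_disp)
```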
2. Architecture Design
Step 1: Disparity-Aware Pol Volume
```python
# For each disparity candidate d:
right_at_d = sample(right, x - d)  # value of the right image at disparity d
pol_diff_d = left - right_at_d     # disparity-aware pol_diff

# Physical meaning:
# - d = d_gt (correct):             pol_diff reflects material characteristics
# - d ≠ d_gt (wrong / false match): pol_diff reflects geometric misalignment
```
For each disparity candidate d, shift the right image by d and subtract it from the left image to obtain pol_diff_d under that disparity hypothesis. Stacking the results across all d forms a pol volume with a disparity dimension.
Physical meaning:
- When `d = d_gt` (correct disparity): left and right correspond to the same physical point, and `pol_diff` reflects the true material polarization characteristics.
- When `d ≠ d_gt` (wrong / false match): left and right correspond to different physical points, and `pol_diff` reflects the difference caused by geometric misalignment.
Therefore the distribution of the pol volume along the disparity dimension itself carries the information of “which d is the plausible match”.
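One possible way to build such a volume is sketched below. It is not necessarily how this architecture implements `sample`: the function name `build_pol_volume` is hypothetical, and the sketch assumes rectified inputs of shape `[B, 3, H, W]`, integer disparities `0 .. max_disp - 1`, and zero fill for columns where `x - d` falls outside the right image.

```python
import torch


def build_pol_volume(left: torch.Tensor, right: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Stack left - right(x - d) over integer disparities.

    left, right: [B, C, H, W]  ->  returns [B, C, D, H, W] with D = max_disp.
    """
    B, C, H, W = left.shape
    volume = left.new_zeros(B, C, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, :, d] = left - right
        else:
            # right_at_d[x] = right[x - d]; columns x < d have no match and stay zero
            volume[:, :, d, :, d:] = left[:, :, :, d:] - right[:, :, :, :-d]
    return volume


# Synthetic check: the right view is the left view shifted by a true disparity of 2.
torch.manual_seed(0)
left = torch.randn(1, 3, 4, 8)
right = torch.roll(left, shifts=-2, dims=-1)   # right[x] = left[x + 2] (wraps at the border)
vol = build_pol_volume(left, right, max_disp=4)
print(vol.shape)                               # [B, 3, D, H, W]
print(vol[:, :, 2, :, 2:].abs().max())         # zero residual at the true disparity
```

In this synthetic case the residual vanishes at the true disparity because both views carry identical values. With real polarization data the residual at `d_gt` would instead reflect material properties, as described above. Sub-pixel disparities would need interpolated sampling (for example with `torch.nn.functional.grid_sample`) rather than slicing.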
Step 2: Per-Disparity Gate (3D Conv)
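```python
# Inside the model's __init__ (requires: import torch.nn as nn)
self.pol_gate = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1),  # spans disparity and spatial dims
    nn.ReLU(),
    nn.Conv3d(8, 1, kernel_size=1),             # project to a single channel
    nn.Sigmoid(),
)
# Input:  pol_volume [B, 3, D, H, W]
# Output: [B, 1, D, H, W]; squeeze the channel dim to obtain gate [B, D, H, W]
```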
- Use 3D convolution to process the pol volume of shape `[B, 3, D, H, W]`; the kernel spans both the disparity dimension and the spatial dimensions.
- The first layer `Conv3d(3, 8, k=3)` extracts disparity-aware features; the second layer `Conv3d(8, 1, k=1)` projects to a single channel.
- The final Sigmoid outputs `gate [B, D, H, W]`: each (d, h, w) position has an independent [0, 1] gating value.
- `pol_gate_hidden=8` is the number of hidden channels (configurable via `pol_gate_hidden`).
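The gate can be wrapped in a small module to check the shapes end to end. This is a sketch under assumptions: the class name `PolGate` and its constructor arguments are hypothetical, and the explicit `squeeze(1)` is added here because `Conv3d(8, 1, ...)` returns `[B, 1, D, H, W]`, which must lose its singleton channel to match the stated `[B, D, H, W]` gate shape.

```python
import torch
import torch.nn as nn


class PolGate(nn.Module):
    """Per-disparity gate: [B, 3, D, H, W] -> [B, D, H, W] with values in [0, 1]."""

    def __init__(self, in_channels: int = 3, hidden: int = 8):
        super().__init__()
        self.pol_gate = nn.Sequential(
            nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, pol_volume: torch.Tensor) -> torch.Tensor:
        # Drop the singleton channel left by the final 1-channel projection.
        return self.pol_gate(pol_volume).squeeze(1)


pol_volume = torch.randn(2, 3, 4, 6, 8)   # B=2, C=3, D=4, H=6, W=8
with torch.no_grad():
    gate = PolGate()(pol_volume)
print(gate.shape)                          # [B, D, H, W]
```

Because `padding=1` matches `kernel_size=3`, the first layer preserves `D`, `H`, and `W`, so the gate aligns element-wise with a correlation volume of the same disparity and spatial extent.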
Step 3: Residual Modulation
```python
# Does not break the RGB stereo baseline
corr_mod = corr * (1.0 + alpha * (gate - 0.5) * 2)
# gate=0.5 is neutral, <0.5 suppresses, >0.5 enhances
```
- The gate is remapped to `[-1, 1]` via `(gate - 0.5) * 2`:
  - `gate = 0.5` → 0 (neutral, does not change corr).
  - `gate < 0.5` → negative (suppresses the correlation of that disparity).
  - `gate > 0.5` → positive (enhances the correlation of that disparity).
- `corr_mod = corr * (1 + α · 2·(gate-0.5))` modulates the correlation multiplicatively but symmetrically around the neutral multiplier 1.0, so the baseline as a whole is not shifted.
- `alpha` (e.g. 0.2) limits the modulation amplitude: the multiplier stays within `[1 - alpha, 1 + alpha]` (that is, `[0.8, 1.2]` for `alpha = 0.2`), ensuring that the RGB stereo baseline is not broken.
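The effect of the formula can be checked with a few scalar values. The helper name `modulate` below is hypothetical; it applies the same expression as the snippet above to a single correlation value.

```python
def modulate(corr: float, gate: float, alpha: float = 0.2) -> float:
    """Residual modulation: corr * (1 + alpha * 2 * (gate - 0.5))."""
    return corr * (1.0 + alpha * (gate - 0.5) * 2)


# Multiplier applied to corr = 1.0 at the extremes and at the neutral point.
suppressed = modulate(1.0, gate=0.0)   # lower bound of the multiplier, 1 - alpha
neutral = modulate(1.0, gate=0.5)      # exactly the unmodified correlation
enhanced = modulate(1.0, gate=1.0)     # upper bound of the multiplier, 1 + alpha
print(suppressed, neutral, enhanced)
```

Setting `alpha = 0` recovers the unmodified correlation for any gate value, which is one way to confirm that the baseline path is untouched. Note that reading `gate < 0.5` as suppression assumes positive correlation values; for a negative `corr`, a multiplier below 1 moves the value toward zero instead.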
3. Design Principles Followed by This Architecture
This architecture follows three design principles when polarization intervenes in stereo matching:
| Principle | Content | How this architecture follows it |
|---|---|---|
| 1 | Pol does not enter fnet (feature extraction) | Pol only acts on the cost volume and does not touch feature extraction |
| 2 | Pol does not only perform spatial [H,W] gating | The gate is [H,W,D], carrying a disparity dimension |
| 3 | Pol can only act in a disparity-aware space | Both the pol volume and the gate explicitly carry a disparity index |
4. Architecture (Data Flow)
5. Tensor Dimensions
| Tensor | Shape | Description |
|---|---|---|
| `pol_volume` | `[B, 3, D, H, W]` | Disparity-aware pol volume, input to 3D Conv |
| `pol_gate` first layer output | `[B, 8, D, H, W]` | `Conv3d(3→8, k=3)` |
| `gate` | `[B, D, H, W]` | Per-disparity gate, sigmoid → [0,1] |
| `corr` | `[B, ..., D, H, W]` | Correlation volume |
| `corr_mod` | same as `corr` | `corr * (1 + α·2·(gate-0.5))` |
| `alpha` | scalar | Modulation amplitude, e.g. 0.2 |
6. Hyperparameters
| Hyperparameter | Value | Description |
|---|---|---|
| `pol_gate_hidden` | 8 | Number of hidden channels in `pol_gate` |
| `pol_alpha` | 0.2 | Modulation amplitude of Residual Modulation |
| `pol_levels` | 4 | Number of pyramid levels in the polarization volume |
| `pol_radius` | 4 | Lookup radius of the polarization volume |
| `iters` | 24 | Number of GRU iterations |
7. Design Decisions and Rationale
| Decision | Rationale |
|---|---|
| Pol uses a per-disparity gate [H,W,D] | Enables pol to judge “which disparity is implausible”, consistent with stereo physics |
| Compute `left - sample(right, x-d)` for each d | Correct d reflects material, wrong d reflects geometric misalignment; the distribution itself carries validity information |
| Use 3D Conv to process the pol volume | The kernel must span both disparity and spatial dimensions to learn disparity-aware gating |
| `pol_gate` only 2 layers, hidden=8 | Stays lightweight, serving as mechanism validation |
| Remap gate as `(gate-0.5)*2` with multiplicative modulation | Symmetric enhancement/suppression around the neutral point 1.0, does not break RGB baseline |
| `alpha` (e.g. 0.2) limits modulation amplitude | Ensures pol acts as “correction” rather than “dominant”, protecting the stereo baseline |
| UpdateBlock keeps the original RAFT | Does not change downstream, preserves pretrained capability |
Why this direction matters
- Targets gross errors (D1/D3): this architecture does not do smooth refinement but “suppresses false matches” — directly lowering the correlation of implausible disparities.
- Physically consistent: pol provides a judgement of “under this disparity hypothesis, do left and right come from the same physical point”.
- Interpretable intervention point: clearly indicates at which step pol provides information that RGB cannot — namely as a validity judge along the disparity dimension of the cost volume.
8. Highlights
- Shifts the role of polarization from “importance mask” to “validity judge”: the gate carries a disparity dimension [H,W,D] and can judge whether each disparity candidate is plausible.
- The Disparity-Aware Pol Volume computes `left - sample(right, x-d)` for each d; correct d reflects material and wrong d reflects geometric misalignment, so the pol volume distribution naturally carries validity information.
- 3D convolution processes the pol volume with kernels spanning both disparity and spatial dimensions, which is necessary to learn disparity-aware gating.
- Residual Modulation remaps the gate as `(gate-0.5)*2` to symmetrically enhance or suppress correlation around the neutral point 1.0, with `alpha` limiting the amplitude, ensuring polarization acts as a “correction” rather than “dominant”.
- The mechanism directly targets gross errors (false matches) by lowering the correlation of implausible disparities rather than performing smooth refinement, and reuses the original RAFT UpdateBlock to preserve pretrained capability.