1. Design Goals
The use of polarization information in stereo matching can occur at two levels: as a post-correlation intervention that corrects the correlation volume after it has formed, or as an early fusion that lets polarization participate before feature extraction.
The fundamental limitation of post-correlation intervention is this: no matter how residuals, schedules, or gating are added after the correlation step, polarization can only act after matching has already happened, and cannot change the matching features themselves.
The design goal of this architecture is to let polarization information enter feature extraction and directly influence matching itself: by feeding polarization in before features are extracted, the matching features learned by the feature encoder are directly shaped by polarization.
The core change is as follows:
```python
# Conventional approach
fmap1 = fnet(left)   # 3 channels
fmap2 = fnet(right)  # 3 channels
# Pol can only intervene after corr is formed

# Pol-in-Feature (this architecture)
pol_diff = left - right
fmap1 = fnet(concat(left, pol_diff))   # 6 channels
fmap2 = fnet(concat(right, pol_diff))  # 6 channels
# Pol directly influences feature extraction
```
Concrete steps:
- Compute pol_diff = left - right (3-channel polarization difference).
- Concatenate pol_diff with the original image along the channel dimension: left path concat(left, pol_diff), right path concat(right, pol_diff), each 6 channels.
- Feed the 6-channel tensor into the feature encoder fnet to obtain fmap1 / fmap2.
In this way, polarization participates before features are extracted, and the matching features learned by the feature encoder are directly influenced by polarization.
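The three steps above can be traced on a toy input. The following is a minimal, framework-free sketch: images are plain nested lists shaped (C, H, W) with no batch dimension, the helper names `subtract` and `concat_channels` are hypothetical, and the pixel values are made up. A real pipeline would presumably use framework tensors shaped (B, C, H, W) and concatenate along the channel dimension.

```python
# Images are nested lists shaped (C, H, W); this stands in for real tensors.

def subtract(a, b):
    """Elementwise a - b for two (C, H, W) nested lists."""
    return [
        [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def concat_channels(a, b):
    """Stack the channels of b after the channels of a."""
    return a + b


# Toy 3-channel, 1x2 inputs (values are made up for illustration).
left = [[[10, 20]], [[30, 40]], [[50, 60]]]
right = [[[1, 2]], [[3, 4]], [[5, 6]]]

pol_diff = subtract(left, right)             # 3 channels
left_in = concat_channels(left, pol_diff)    # 6 channels, left-path fnet input
right_in = concat_channels(right, pol_diff)  # 6 channels, right-path fnet input
```

Channels 3 to 5 are identical in `left_in` and `right_in` because both paths receive the same pol_diff; only the first three channels differ between the two encoder inputs.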
2. Design Principles
This architecture is deliberately minimal, so that a single variable is isolated and the effectiveness of early fusion can be validated on its own:
- No new branch.
- No attention.
- No residual.
- No gating.
- One question only: can polarization influence feature matching?
No sophisticated module is introduced; the goal is to answer the single question “does feeding polarization into the feature extraction stage actually help?”
The guiding principle: polarization does not act as a standalone spatial gate; it enters the feature extraction stage, where it can influence matching.
3. Architecture (Data Flow)
Note that pol_diff is fed to both the left and right paths, and both paths share the same pol_diff.
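The stage ordering can be written down as a small wiring sketch. Everything here is an assumption made for illustration: the function name `forward_sketch`, the stage signatures, and the stub stages are hypothetical, and the update step is heavily simplified (RAFT's real update operator also consumes a hidden state and context features, which are omitted here).

```python
def forward_sketch(left_in, right_in, fnet, corr_block, update_block, iters=24):
    """Wire the stages in the order described; all callables are placeholders."""
    fmap1 = fnet(left_in)    # the same encoder is applied to both sides
    fmap2 = fnet(right_in)
    corr = corr_block(fmap1, fmap2)
    disparity = None         # placeholder for the initial estimate
    for _ in range(iters):
        disparity = update_block(corr, disparity)
    return disparity


# Stub stages that only record what they receive.
calls = []


def stub_fnet(x):
    calls.append(("fnet", x))
    return "feat(" + x + ")"


def stub_corr(f1, f2):
    calls.append(("corr", f1, f2))
    return "corr"


def stub_update(corr, disparity):
    calls.append(("update",))
    return 0 if disparity is None else disparity + 1


result = forward_sketch("left6", "right6", stub_fnet, stub_corr, stub_update)
```

The recorded calls show the encoder running once per side before the correlation volume is built, followed by `iters` update steps. Nothing downstream of fnet needs to know that its inputs now have 6 channels.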
4. Components and Modules
- pol_diff computation: pol_diff = left - right, a 3-channel polarization difference tensor, shared between the left and right paths.
- fnet (Feature Encoder): input changed from 3 channels to 6 channels; runs once for each side, producing fmap1 / fmap2.
- CorrBlock: builds the correlation volume from fmap1 / fmap2, following the original RAFT design.
- UpdateBlock / GRU Loop: reuses the original RAFT update unit and iteration loop to produce disparity.
5. Tensor Dimensions
| Tensor | Dimensions | Description |
|---|---|---|
| left / right | (B, 3, H, W) | Original RGB images |
| pol_diff | (B, 3, H, W) | left - right, 3-channel polarization difference |
| concat(left, pol_diff) | (B, 6, H, W) | Left-path fnet input |
| concat(right, pol_diff) | (B, 6, H, W) | Right-path fnet input |
| fnet.conv1 input channels | 3 → 6 | Changed from 3 to 6 channels |
| fmap1 / fmap2 | fnet output features | Used to build CorrBlock |
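The shape bookkeeping in the table can be checked without any tensor library. This is a small sketch that operates on shape tuples only; the helper `concat_shape` and the sizes chosen for B, H, and W are hypothetical.

```python
def concat_shape(a, b, dim=1):
    """Shape after concatenating shapes a and b along `dim`.

    All other dimensions must match, as they must for a real concat.
    """
    if len(a) != len(b):
        raise ValueError("rank mismatch")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != dim and x != y:
            raise ValueError(f"size mismatch at dim {i}: {x} vs {y}")
    out = list(a)
    out[dim] = a[dim] + b[dim]
    return tuple(out)


B, H, W = 2, 64, 96              # hypothetical sizes
image_shape = (B, 3, H, W)       # left / right
pol_diff_shape = image_shape     # elementwise left - right keeps the shape
fnet_input_shape = concat_shape(image_shape, pol_diff_shape)
```

Because pol_diff is an elementwise difference, it always matches the image in batch size and spatial resolution, so the concatenation can only change the channel count.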
6. Hyperparameters
| Hyperparameter | Value | Description |
|---|---|---|
| pol_levels | 4 | Pol pyramid levels |
| pol_radius | 4 | Pol lookup radius |
| iters | 24 | GRU iterations |
| curriculum | enabled | Curriculum training schedule |
7. Design Decisions and Rationale
| Decision | Rationale |
|---|---|
| Polarization enters feature extraction (early fusion) | The only injection point that can directly influence matching itself |
| Inject polarization via 6-channel concat | The most direct form of early fusion, no extra modules needed |
| Left and right paths share the same pol_diff | pol_diff is a single tensor formed by left minus right; there is only one copy |
| No branch / attention / residual / gating | Isolates a single variable to purely test “can polarization influence feature matching” |
| Reuses RAFT’s original CorrBlock and UpdateBlock | Only the input to feature extraction is changed; downstream is untouched |
8. Implementation Notes
Changing fnet’s input from 3 to 6 channels changes the weight shape of fnet.conv1: the input-channel dimension of its kernel grows from 3 to 6, so the pretrained fnet.conv1 weights, trained for a 3-channel input, no longer fit. When loading pretrained weights, this layer is skipped and trained from scratch. This is a cost to be aware of in this design: the first convolutional layer loses its pretrained initialization.
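One common way to skip the incompatible layer is to keep only the checkpoint entries whose shapes match the new model. The sketch below illustrates that idea and is not necessarily how this project loads weights: `split_loadable` and `FakeTensor` are hypothetical names, the parameter keys are invented, and the 64 output channels and 7x7 kernel are assumptions chosen only to make the shapes concrete.

```python
from collections import namedtuple

# Stand-in for a framework tensor; only the .shape attribute is used.
FakeTensor = namedtuple("FakeTensor", ["shape"])


def split_loadable(pretrained, model_state):
    """Split checkpoint entries into shape-compatible ones and skipped names."""
    loadable, skipped = {}, []
    for name, tensor in pretrained.items():
        target = model_state.get(name)
        if target is not None and tuple(tensor.shape) == tuple(target.shape):
            loadable[name] = tensor
        else:
            skipped.append(name)
    return loadable, skipped


# Hypothetical shapes laid out as (out_channels, in_channels, kH, kW).
pretrained = {
    "fnet.conv1.weight": FakeTensor((64, 3, 7, 7)),   # trained on 3-channel input
    "fnet.layer1.weight": FakeTensor((64, 64, 3, 3)),
}
model_state = {
    "fnet.conv1.weight": FakeTensor((64, 6, 7, 7)),   # new 6-channel input
    "fnet.layer1.weight": FakeTensor((64, 64, 3, 3)),
}

loadable, skipped = split_loadable(pretrained, model_state)
```

With PyTorch, the `loadable` dictionary could then be passed to `load_state_dict(loadable, strict=False)`, which tolerates the missing conv1 weight and leaves that layer at its fresh initialization.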
9. Highlights
- Earliest possible polarization injection point: polarization participates before features are extracted, the only way to directly change the matching features themselves, breaking through the ceiling of post-correlation corrections.
- Zero extra modules: no new branch, attention, residual, or gating; early fusion is achieved purely through channel concatenation, the cleanest control design for validating the early fusion hypothesis.
- Single-variable isolation: deliberately minimal design so that “can polarization influence feature matching” becomes the sole testable variable.
- Downstream fully reused: CorrBlock and UpdateBlock are untouched; all changes are concentrated at the input side of feature extraction, minimizing the architectural change surface.