Blueprint · 2026

Pol-in-Feature Early Fusion Architecture

An early fusion architecture that injects polarization information into the feature encoder via 6-channel concatenation during feature extraction, allowing polarization to directly influence stereo matching itself.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

The use of polarization information in stereo matching can occur at two levels: as a post-correlation intervention that corrects the correlation volume after it has formed, or as an early fusion that lets polarization participate before feature extraction.

The fundamental limitation of post-correlation intervention is this: no matter how residuals, schedules, or gating are added after the correlation step, polarization can only act after matching has already happened, and cannot change the matching features themselves.

The design goal of this architecture is to let polarization enter feature extraction itself: because polarization is fed in before features are extracted, the matching features learned by the feature encoder are shaped by it directly.

The core change is as follows:

# Conventional approach
fmap1 = fnet(left)   # 3 channels
fmap2 = fnet(right)  # 3 channels
# Pol can only intervene after corr is formed

# Pol-in-Feature (this architecture)
pol_diff = left - right
fmap1 = fnet(concat(left, pol_diff))   # 6 channels
fmap2 = fnet(concat(right, pol_diff))  # 6 channels
# Pol directly influences feature extraction

Concrete steps:

  1. Compute pol_diff = left - right (3-channel polarization difference).
  2. Concatenate pol_diff with the original image along the channel dimension: left path concat(left, pol_diff), right path concat(right, pol_diff), each 6 channels.
  3. Feed the 6-channel tensor into the feature encoder fnet to obtain fmap1 / fmap2.

In this way, polarization participates before features are extracted, and the matching features learned by the feature encoder are directly influenced by polarization.
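The three steps above can be sketched as a small helper. This is a minimal sketch that uses NumPy arrays as stand-ins for framework tensors; the function name build_fusion_inputs and the example sizes are assumptions made for illustration, not part of the original design.

```python
import numpy as np

def build_fusion_inputs(left, right):
    """Return the 6-channel encoder inputs for the left and right paths.

    left, right: arrays of shape (B, 3, H, W).
    """
    if left.shape != right.shape or left.ndim != 4 or left.shape[1] != 3:
        raise ValueError("expected two (B, 3, H, W) arrays of equal shape")
    pol_diff = left - right                               # step 1: (B, 3, H, W), one shared copy
    left_in = np.concatenate([left, pol_diff], axis=1)    # step 2: (B, 6, H, W)
    right_in = np.concatenate([right, pol_diff], axis=1)  # step 2: (B, 6, H, W)
    return left_in, right_in                              # step 3: both go to fnet

# Illustrative sizes only (B=2, H=8, W=16).
rng = np.random.default_rng(0)
left = rng.random((2, 3, 8, 16), dtype=np.float32)
right = rng.random((2, 3, 8, 16), dtype=np.float32)
left_in, right_in = build_fusion_inputs(left, right)
print(left_in.shape, right_in.shape)
```

Channels 0 to 2 of each input hold the original image and channels 3 to 5 hold pol_diff, so the last three channels are identical on both paths.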


2. Design Principles

This architecture is deliberately minimal, so that a single variable is isolated and the effect of early fusion can be validated on its own:

  • No new branch.
  • No attention.
  • No residual.
  • No gating.
  • One question only: can polarization influence feature matching?

No sophisticated module is introduced; the goal is to answer the single question “does feeding polarization into the feature extraction stage actually help?”

The guiding principle: polarization does not perform spatial gating on its own; it enters the feature extraction stage, where it can influence matching.


3. Architecture (Data Flow)

[Figure: Pol-in-Feature Early Fusion data flow]

Note that the same pol_diff tensor is fed to both the left and right paths.
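As a rough illustration of this data flow up to the correlation volume, the sketch below wires the pieces together in NumPy. Everything in it is an assumption made for illustration: toy_fnet is a hypothetical one-layer stand-in for the real feature encoder, and corr_volume_1d assumes a stereo-style all-pairs correlation along the width axis, scaled by the square root of the channel count. The GRU update loop is omitted.

```python
import numpy as np

def toy_fnet(x, weight):
    """Hypothetical stand-in for fnet: a single 1x1 convolution.

    x: (B, 6, H, W), weight: (C, 6)  ->  (B, C, H, W)
    """
    return np.einsum("oc,bchw->bohw", weight, x)

def corr_volume_1d(fmap1, fmap2):
    """All-pairs correlation along the width axis (assumed form).

    fmap1, fmap2: (B, C, H, W)  ->  (B, H, W, W)
    """
    channels = fmap1.shape[1]
    return np.einsum("bchi,bchj->bhij", fmap1, fmap2) / np.sqrt(channels)

def forward_features(left, right, weight):
    """left/right (B, 3, H, W) -> fmap1, fmap2, correlation volume."""
    pol_diff = left - right  # one tensor, shared by both paths
    # The same encoder weights are applied to each side.
    fmap1 = toy_fnet(np.concatenate([left, pol_diff], axis=1), weight)
    fmap2 = toy_fnet(np.concatenate([right, pol_diff], axis=1), weight)
    return fmap1, fmap2, corr_volume_1d(fmap1, fmap2)

rng = np.random.default_rng(0)
left = rng.standard_normal((1, 3, 4, 10))
right = rng.standard_normal((1, 3, 4, 10))
weight = rng.standard_normal((32, 6))  # 32 feature channels, illustrative
fmap1, fmap2, corr = forward_features(left, right, weight)
print(fmap1.shape, corr.shape)
```

Because pol_diff is part of the encoder input, any change to it changes fmap1 and fmap2, and therefore the correlation volume built from them.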


4. Components and Modules

  • pol_diff computation: pol_diff = left - right, a 3-channel polarization difference tensor, shared between the left and right paths.
  • fnet (Feature Encoder): input changed from 3 channels to 6 channels; runs once for each side, producing fmap1 / fmap2.
  • CorrBlock: builds the correlation volume from fmap1 / fmap2, following the original RAFT design.
  • UpdateBlock / GRU Loop: reuses the original RAFT update unit and iteration loop to produce disparity.

5. Tensor Dimensions

| Tensor | Dimensions | Description |
|---|---|---|
| left / right | (B, 3, H, W) | Original RGB images |
| pol_diff | (B, 3, H, W) | left - right, 3-channel polarization difference |
| concat(left, pol_diff) | (B, 6, H, W) | Left-path fnet input |
| concat(right, pol_diff) | (B, 6, H, W) | Right-path fnet input |
| fnet.conv1 input channels | 3 → 6 | Changed from 3 to 6 channels |
| fmap1 / fmap2 | fnet output features | Used to build CorrBlock |

6. Hyperparameters

| Hyperparameter | Description |
|---|---|
| pol_levels = 4 | Pol pyramid levels |
| pol_radius = 4 | Pol lookup radius |
| iters = 24 | GRU iterations |
| curriculum | Curriculum training schedule enabled |
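One way to carry these settings in code is a small frozen config object. This is only a sketch: the class name PolInFeatureConfig, the fnet_in_channels field, and the choice to represent curriculum as a boolean are assumptions; the values themselves come from the table above and from the Tensor Dimensions section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolInFeatureConfig:
    """Hypothetical container for the hyperparameters listed above."""
    pol_levels: int = 4        # Pol pyramid levels
    pol_radius: int = 4        # Pol lookup radius
    iters: int = 24            # GRU iterations
    curriculum: bool = True    # curriculum training schedule enabled
    fnet_in_channels: int = 6  # image (3) + pol_diff (3)

cfg = PolInFeatureConfig()
print(cfg)
```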

7. Design Decisions and Rationale

| Decision | Rationale |
|---|---|
| Polarization enters feature extraction (early fusion) | The only injection point that can directly influence matching itself |
| Inject polarization via 6-channel concat | The most direct form of early fusion, no extra modules needed |
| Left and right paths share the same pol_diff | pol_diff is a single tensor formed by left minus right; there is only one copy |
| No branch / attention / residual / gating | Isolates a single variable to purely test “can polarization influence feature matching” |
| Reuses RAFT’s original CorrBlock and UpdateBlock | Only the input to feature extraction is changed; downstream is untouched |

8. Implementation Notes

Changing fnet’s input from 3 to 6 channels changes the weight shape of fnet.conv1, so the pretrained fnet.conv1 weights, which were trained for a 3-channel input, are incompatible with the new 6-channel input. When loading pretrained weights, this layer is skipped and must be trained from scratch. This is a cost to be aware of in this design: the first convolutional layer loses its pretrained initialization.
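The skipping step can be sketched as a filter over a checkpoint dictionary. This is a minimal sketch under stated assumptions: plain dicts of NumPy arrays stand in for framework state dicts, the helper name split_loadable is hypothetical, and the key names and the 64-filter 7x7 kernel shape are illustrative rather than taken from the original.

```python
import numpy as np

def split_loadable(pretrained, model_state, skip_prefixes=("fnet.conv1.",)):
    """Split pretrained entries into those safe to load and those to skip.

    An entry is skipped if its name starts with a skipped prefix, if the
    model has no such entry, or if the shapes differ.
    """
    loadable, skipped = {}, []
    for name, tensor in pretrained.items():
        target = model_state.get(name)
        same_shape = target is not None and target.shape == tensor.shape
        if name.startswith(tuple(skip_prefixes)) or not same_shape:
            skipped.append(name)
        else:
            loadable[name] = tensor
    return loadable, skipped

# Illustrative checkpoint (3-channel conv1) and new model (6-channel conv1).
pretrained = {
    "fnet.conv1.weight": np.zeros((64, 3, 7, 7), dtype=np.float32),
    "fnet.conv1.bias": np.zeros((64,), dtype=np.float32),
    "fnet.layer1.weight": np.zeros((64, 64, 3, 3), dtype=np.float32),
}
model_state = {
    "fnet.conv1.weight": np.zeros((64, 6, 7, 7), dtype=np.float32),
    "fnet.conv1.bias": np.zeros((64,), dtype=np.float32),
    "fnet.layer1.weight": np.zeros((64, 64, 3, 3), dtype=np.float32),
}
loadable, skipped = split_loadable(pretrained, model_state)
print(sorted(loadable), skipped)
```

In a PyTorch setting, the filtered dictionary would typically be passed to load_state_dict with strict=False, so that the skipped first layer keeps its fresh initialization.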


9. Highlights

  • Earliest possible polarization injection point: polarization participates before features are extracted, which is the only way to change the matching features themselves and so get past the ceiling of post-correlation corrections.
  • Zero extra modules: no new branch, attention, residual, or gating; early fusion is achieved purely through channel concatenation, the cleanest control design for validating the early fusion hypothesis.
  • Single-variable isolation: deliberately minimal design so that “can polarization influence feature matching” becomes the sole testable variable.
  • Downstream fully reused: CorrBlock and UpdateBlock are untouched; all changes are concentrated at the input side of feature extraction, minimizing the architectural change surface.
