Blueprint · 2026

Implicit Polarization Input Architecture

A specification of the "implicit polarization input" approach: without adding any polarization encoder, feed the cross-polarized image pair `I∥` / `I⊥` directly into a standard RAFT-Stereo as ordinary left/right images.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

The physical scenario this architecture targets is the following. On transparent surfaces (glass), the left camera (I∥, behind a 0° polarizer) captures strong specular reflections while the right camera (I⊥, behind a 90° polarizer) suppresses them, so I∥ >> I⊥. On diffuse backgrounds, I∥ ≈ I⊥. Transparent surfaces are nearly invisible to standard depth sensing, but the polarized image pair carries an exploitable brightness-difference signal in glass regions.

Implicit polarization input is the most direct, lowest-cost way to exploit this physical signal:

  • No architectural changes: still uses standard RAFT-Stereo.
  • No polarization encoder added: no polarization-specific module, no dual-stream design.
  • The only change is the input data: the polarized pair I∥ (left) and I⊥ (right) is fed in directly as ordinary left/right images.

The question this design seeks to answer is: “Without any architectural modifications, can the network learn glass-region cues from the polarized image pair on its own?” “Implicit” means that no dedicated module processes the polarization information explicitly; the network is instead expected to learn to exploit it during training.


2. Architecture

In terms of architecture, implicit polarization input is equivalent to standard RAFT-Stereo; only the input differs:

[Figure: Implicit polarization input data flow]

The polarized pair I∥ / I⊥ replaces the usual left/right images fed into the network, while the architecture itself is unchanged.


3. Components and Modules

This method adds no new modules. It uses only the three standard RAFT-Stereo components:

  1. Feature Encoder (fnet): weights shared between left and right; processes I∥ and I⊥ separately.
  2. Context Encoder (cnet): sees only I∥ (left).
  3. Correlation Pyramid + GRU: iteratively refines the disparity.

Because none of these components is dedicated to polarization, any use of the polarization signal has to be learned implicitly during training.


4. Data Flow

  1. The polarized pair I∥, I⊥ is fed directly as the left/right input.
  2. I∥ and I⊥ pass through the shared-weight fnet separately → fmap1, fmap2.
  3. I∥ passes through cnet → context + hidden state.
  4. The Correlation Pyramid + GRU iteratively produce the disparity.
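
The four steps above can be read as a wiring diagram. The following is a minimal sketch of that wiring, not RAFT-Stereo's actual code: `fnet`, `cnet`, and `update` are hypothetical callables standing in for the feature encoder, the context encoder, and the correlation-pyramid-plus-GRU update, and starting from zero disparity is an assumption. The toy stand-ins exist only so the wiring can be run and inspected.

```python
def implicit_polarization_forward(i_par, i_perp, fnet, cnet, update, iters=16):
    """Wire the polarized pair through a RAFT-Stereo-style pipeline.

    fnet, cnet and update are hypothetical stand-ins (assumptions),
    not the real RAFT-Stereo modules.
    """
    # Steps 1-2: the same fnet (shared weights) sees each image separately.
    fmap1 = fnet(i_par)
    fmap2 = fnet(i_perp)
    # Step 3: only the left image (I_par) feeds the context encoder.
    context, hidden = cnet(i_par)
    # Step 4: iterative refinement, assumed to start from zero disparity.
    disparity = 0.0
    for _ in range(iters):
        hidden, delta = update(hidden, context, fmap1, fmap2, disparity)
        disparity += delta
    return disparity


# Toy stand-ins so the sketch runs; they record which image each encoder saw.
calls = []

def toy_fnet(image):
    calls.append(("fnet", image))
    return f"fmap({image})"

def toy_cnet(image):
    calls.append(("cnet", image))
    return f"context({image})", "hidden0"

def toy_update(hidden, context, fmap1, fmap2, disparity):
    # Pretend every iteration adds a fixed 0.5 px to the disparity.
    return hidden, 0.5

result = implicit_polarization_forward(
    "I_par", "I_perp", toy_fnet, toy_cnet, toy_update
)
print(calls)
print(result)
```

The recorded calls make the asymmetry visible: fnet is applied to both images, while cnet is applied to I∥ only.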

5. Tensor Dimensions

Identical to standard RAFT-Stereo:

  • Inputs I∥ / I⊥: 640 × 480.
  • fnet outputs fmap1, fmap2: downsampled feature maps.
  • Output disparity: matches the input resolution.

Since no module is added, the model parameter count is identical to standard RAFT-Stereo.
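
As a rough illustration of the shape bookkeeping, the sketch below computes the spatial size of the fnet feature maps for the stated input size. Two things here are assumptions rather than facts from this blueprint: that "640 × 480" means width 640 and height 480, and that the encoder downsamples by a factor of 4 (the text above only says "downsampled"). `feature_map_shape` is a hypothetical helper, and channel counts are left out.

```python
def feature_map_shape(height, width, downsample_factor=4):
    """Spatial size of an encoder feature map for a given input size.

    downsample_factor=4 is an assumption; change it to match the real config.
    """
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("input size must be divisible by the downsample factor")
    return height // downsample_factor, width // downsample_factor


# Assumption: "640 x 480" is width x height.
in_h, in_w = 480, 640
fmap_h, fmap_w = feature_map_shape(in_h, in_w)

print("fmap1 / fmap2 spatial size:", (fmap_h, fmap_w))
print("disparity output size:", (in_h, in_w))
```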


6. Hyperparameters

Parameter      Value                      Description
pretrained     raftstereo-sceneflow.pth   SceneFlow pretrained weights
glass_weight   3.0                        Loss weighting for glass regions
lr             0.00005                    Learning rate
batch_size     8                          Training batch size
num_steps      50000                      Training steps
iters          16                         GRU iterations
scheduler      cosine                     Cosine learning rate schedule
d1_weight      0.2                        D1 metric weight
Precision      FP32                       More stable than BF16
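
To make two of these settings concrete, here is a minimal sketch of how `lr`, `num_steps`, the cosine schedule, and `glass_weight` might combine. The exact forms are assumptions: the schedule is taken to decay from the base rate to zero with no warmup, and the loss is taken to be a mean absolute disparity error in which glass pixels are multiplied by `glass_weight`. `cosine_lr` and `glass_weighted_l1` are hypothetical helpers, not functions from RAFT-Stereo.

```python
import math

BASE_LR = 5e-5        # lr from the table
NUM_STEPS = 50_000    # num_steps from the table
GLASS_WEIGHT = 3.0    # glass_weight from the table


def cosine_lr(step, base_lr=BASE_LR, num_steps=NUM_STEPS):
    """Cosine decay from base_lr to 0 (assumed: no warmup, floor at 0)."""
    progress = min(max(step, 0), num_steps) / num_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def glass_weighted_l1(pred, target, glass_mask, glass_weight=GLASS_WEIGHT):
    """Mean absolute error with glass pixels up-weighted (assumed loss form)."""
    total = 0.0
    for p, t, is_glass in zip(pred, target, glass_mask):
        weight = glass_weight if is_glass else 1.0
        total += weight * abs(p - t)
    return total / len(pred)


# Learning rate at the start, midpoint, and end of training.
print(cosine_lr(0), cosine_lr(NUM_STEPS // 2), cosine_lr(NUM_STEPS))

# Two pixels: a glass pixel off by 1 px and a background pixel off by 2 px.
print(glass_weighted_l1([1.0, 2.0], [0.0, 0.0], [True, False]))
```

With this form, a 1 px error on glass contributes more to the loss than a 2 px error on the background, which is the intended effect of the weighting.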

7. Design Decisions and Rationale

7.1 Why an Implicit, Architecture-Free Approach

Implicit polarization input is the lowest-cost path to exploiting the signal. If the network can leverage polarization on its own, there is no need to design a complex polarization encoder. The approach also cleanly separates the contribution of “polarization information itself” from that of “architectural complexity”: the architecture is identical to ordinary stereo matching, and the sole variable is “whether the input contains a polarization difference”.

7.2 The Fundamental Limit of the Implicit Approach: BatchNorm Washes Out the Polarization Signal

The core problem with implicit polarization input is that the first layer of fnet contains BatchNorm. The value of the polarization signal lies in the macroscopic brightness-magnitude difference between I∥ and I⊥, and BN normalizes the input distribution and washes this difference away:

[Figure: BatchNorm washes out the polarization signal]

In other words, the polarization signal fed in by implicit polarization input has its macroscopic difference effectively normalized away after passing through the BN in fnet, and the network can only fall back on micro-structure for matching. For texture-less transparent glass, that fallback does not hold.
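
The washing-out effect can be illustrated with a toy calculation. The sketch below is a simplification under a stated assumption: each image is normalized with its own mean and standard deviation. How closely a BatchNorm layer in a given training setup matches this depends on how its statistics are gathered, which the sketch does not model. The scanline values and the 4× brightness gain are made up for illustration.

```python
import statistics


def normalize(pixels):
    """Zero-mean, unit-variance normalization using the image's own statistics.

    Assumption: statistics are computed per image, not shared across the pair.
    """
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    return [(p - mean) / std for p in pixels]


# Toy scanlines: same micro-structure, but I_par is 4x brighter than I_perp.
i_perp = [0.1, 0.2, 0.4, 0.3]
i_par = [4.0 * p for p in i_perp]

# Macroscopic brightness gap before normalization.
raw_gap = statistics.fmean(i_par) - statistics.fmean(i_perp)

# Largest per-pixel gap after each image is normalized on its own.
norm_gap = max(
    abs(a - b) for a, b in zip(normalize(i_par), normalize(i_perp))
)

print("gap before normalization:", raw_gap)
print("gap after normalization:", norm_gap)
```

Before normalization the two scanlines differ by a large brightness offset; afterwards they are numerically indistinguishable, so only the shared micro-structure remains.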

7.3 Applicability Boundary of the Implicit Approach

Because BN washes out the macroscopic magnitude difference of the polarization signal, implicit input cannot guarantee that the network “actively exploits” the polarization cue. For the polarization signal to be used effectively, polarization processing must completely bypass BN and operate from the raw |I∥ - I⊥|, which calls for an explicit polarization encoder. Implicit input is therefore suited to “low-cost exploration of whether polarization information has value”, but not as a final solution that depends on the polarization signal.
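
For contrast, the raw cue mentioned above is simple to compute when it is taken before any normalization. The sketch below is only an illustration of |I∥ - I⊥| on toy data, not a design for the explicit encoder. It assumes the two scanlines are already pixel-aligned, which a real stereo pair is not, and `THRESHOLD` is a hypothetical cut-off chosen for this toy data only.

```python
def raw_polarization_difference(i_par, i_perp):
    """Per-pixel |I_par - I_perp| computed on raw, unnormalized intensities."""
    if len(i_par) != len(i_perp):
        raise ValueError("scanlines must have the same length")
    return [abs(a - b) for a, b in zip(i_par, i_perp)]


# Toy scanline: pixels 2-4 are glass (strong specular term in I_par only),
# the rest is diffuse background where the two images nearly agree.
i_perp = [0.30, 0.32, 0.10, 0.12, 0.11, 0.31]
i_par = [0.31, 0.33, 0.80, 0.85, 0.82, 0.30]

diff = raw_polarization_difference(i_par, i_perp)

THRESHOLD = 0.5  # hypothetical cut-off, valid only for this toy data
glass_like = [d > THRESHOLD for d in diff]

print(diff)
print(glass_like)
```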


8. Highlights

  • Zero-modification path to using polarization: no new modules are added; only the input data is replaced, and the model parameter count is identical to a standard stereo matching network.
  • Clean variable isolation: the architecture is identical to ordinary stereo matching, with the only variable being “whether the input contains a polarization difference”, so the contribution of the polarization information itself can be evaluated in isolation.
  • Explicit physical assumption: grounded in the polarization physics of “I∥ >> I⊥ in glass regions and I∥ ≈ I⊥ on diffuse backgrounds”, turning glass visibility into a brightness difference between the image pair.
  • Identifies the fundamental BatchNorm limit: the first BN layer of fnet normalizes away the macroscopic magnitude difference of the polarization signal, which marks the applicability boundary of the implicit approach. For texture-less glass, implicit input is not enough to guarantee that the polarization signal is exploited.
