Blueprint · 2026

Learnable Polarization Volume Architecture

Model class: `StereoLearnablePol`
Document type: architecture design specification (design only; no experimental conclusions)

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

In polarization stereo matching, the most direct approach is to compute the difference between the left and right polarization images using a fixed formula:

pol_diff[x, d] = I_∥[x] - I_⊥[x-d]
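As a point of reference, the fixed formula can be sketched in PyTorch as follows. The function name `fixed_pol_diff`, the `(H, W)` input layout, and the choice to zero-fill positions where `x - d` falls outside the image are illustrative assumptions, not part of the blueprint.

```python
import torch


def fixed_pol_diff(i_par: torch.Tensor, i_perp: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Hand-crafted volume: out[y, x, d] = i_par[y, x] - i_perp[y, x - d].

    i_par, i_perp: (H, W) intensity images (left = parallel, right = perpendicular).
    Returns an (H, W, max_disp) tensor. Entries with x - d < 0 have no
    right-image counterpart and are left at 0 (an assumption of this sketch).
    """
    h, w = i_par.shape
    out = torch.zeros(h, w, max_disp, dtype=i_par.dtype)
    for d in range(max_disp):
        # Left columns x >= d pair with right columns x - d.
        out[:, d:, d] = i_par[:, d:] - i_perp[:, : w - d]
    return out


i_par = torch.arange(12.0).reshape(2, 6)
i_perp = torch.ones(2, 6)
vol = fixed_pol_diff(i_par, i_perp, max_disp=3)
```

Nothing in this computation is trainable, which is the limitation the list below spells out.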

A fixed formula has the following problems:

  • It cannot adapt to different scene conditions (brightness, angle, material).
  • It cannot learn “what constitutes a robust polarization signal.”
  • It is sensitive to noise and prone to false positives.

The design goal of this architecture is to upgrade polarization-signal generation from a “hand-crafted formula” to a “learnable representation”:

Fixed formula:
    pol_diff = I_∥ - I_⊥

Learnable version:
    pol_feat = PolHead(fmap)              # Learn to extract polarization features
    pol_corr = |pol_feat_L - pol_feat_R|  # L1 difference

In other words, instead of directly subtracting raw images, a learnable PolHead first extracts “polarization features” from the feature map, and then the L1 difference between left and right polarization features is computed.


2. Architecture

[Figure: Learnable Polarization Volume overall architecture]

Data Flow

  1. The left and right images are each encoded by FeatureEncoder into fmap1 / fmap2.
  2. On one path, fmap1 / fmap2 enter the standard CorrBlock to form corr_volume.
  3. On the other path, the same feature maps are fed into the shared PolHead (LightweightPolHead), producing pol_feat1 / pol_feat2.
  4. LearnablePolCorrBlock computes pol_volume in a vectorized form from the left and right polarization features.
  5. corr_volume and pol_volume together enter UpdateBlockWithPol, which iteratively outputs disparity.
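The five steps above can be wired together roughly as follows. This is a shape-level sketch only: `TinyEncoder`, `dot_corr_volume`, and `l1_pol_volume` are hypothetical stand-ins for FeatureEncoder, CorrBlock, and LearnablePolCorrBlock, and the all-pairs-per-row layout and channel-mean reduction are assumptions borrowed from common RAFT-Stereo practice rather than details stated in this blueprint.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for FeatureEncoder: RGB -> 256 channels at 1/4 resolution."""

    def __init__(self, out_dim=256):
        super().__init__()
        self.conv = nn.Conv2d(3, out_dim, kernel_size=4, stride=4)

    def forward(self, x):
        return self.conv(x)


def dot_corr_volume(f1, f2):
    """All-pairs dot product along each row: (B, C, H, W) x2 -> (B, H, W, W)."""
    c = f1.shape[1]
    return torch.einsum("bchi,bchj->bhij", f1, f2) / c ** 0.5


def l1_pol_volume(p1, p2):
    """All-pairs L1 difference along each row, averaged over channels: (B, H, W, W)."""
    return (p1.unsqueeze(-1) - p2.unsqueeze(-2)).abs().mean(dim=1)


encoder = TinyEncoder()
# Same layer layout as LightweightPolHead (256 -> 64 -> 32).
pol_head = nn.Sequential(
    nn.Conv2d(256, 64, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True), nn.Conv2d(64, 32, 1)
)

left, right = torch.randn(1, 3, 32, 64), torch.randn(1, 3, 32, 64)
fmap1, fmap2 = encoder(left), encoder(right)              # step 1
corr_volume = dot_corr_volume(fmap1, fmap2)               # step 2
pol_feat1, pol_feat2 = pol_head(fmap1), pol_head(fmap2)   # step 3 (shared weights)
pol_volume = l1_pol_volume(pol_feat1, pol_feat2)          # step 4
# Step 5: corr_volume and pol_volume would both be passed to UpdateBlockWithPol (not sketched).
```

Both volumes end up with the same spatial layout, which is what lets a single update module consume them side by side.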

3. Components and Modules

3.1 Component Overview

| Class | Function | Parameter count |
|---|---|---|
| LightweightPolHead | Lightweight polarization head (256→64→32) | ~18K |
| LearnablePolCorrBlock | Vectorized polarization-volume computation | 0 |
| GlassAwareLoss | Glass-aware auxiliary loss | 0 |
| StereoLearnablePol | Integrated model | base + 18K |

Relative to a pure RGB stereo-matching model, the overall model adds only about 18K parameters (all from LightweightPolHead).

3.2 LightweightPolHead Design

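import torch.nn as nn


class LightweightPolHead(nn.Module):
    """
    Lightweight polarization head (~18K parameters)
    Structure: feature_dim -> 64 -> pol_dim
    """
    def __init__(self, in_dim=256, pol_dim=32):
        super().__init__()  # must run before submodules are registered
        self.net = nn.Sequential(
            nn.Conv2d(in_dim, 64, 1),   # per-pixel channel reduction
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, pol_dim, 1),  # project to polarization features
        )

    def forward(self, fmap):
        # (B, in_dim, H, W) -> (B, pol_dim, H, W)
        return self.net(fmap)
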
  • Input dimension in_dim=256 (feature-map channels), output dimension pol_dim=32.
  • The structure is two 1×1 convolutions 256 → 64 → 32 with BatchNorm and ReLU in between.
  • The 1×1 convolutions are per-pixel channel transforms; they do not mix spatial neighborhoods, deliberately keeping the head lightweight.
  • The left and right images share the same PolHead (shared weights), ensuring the left and right polarization features lie in the same representation space.
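The "about 18K" figure can be checked from the layer sizes listed above. The arithmetic below assumes PyTorch defaults (a bias on every convolution, and a learnable scale and shift in BatchNorm); the helper names are illustrative.

```python
def conv1x1_params(c_in: int, c_out: int, bias: bool = True) -> int:
    """Weights of a 1x1 convolution plus an optional per-channel bias."""
    return c_in * c_out + (c_out if bias else 0)


def batchnorm_params(channels: int) -> int:
    """Learnable scale and shift, one of each per channel."""
    return 2 * channels


in_dim, hidden, pol_dim = 256, 64, 32
total = (
    conv1x1_params(in_dim, hidden)      # 256 -> 64
    + batchnorm_params(hidden)          # BatchNorm over 64 channels
    + conv1x1_params(hidden, pol_dim)   # 64 -> 32
)
print(total)  # 18656
```

Under these assumptions the head has 18,656 learnable parameters, consistent with the rounded figure quoted in this section.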

3.3 LearnablePolCorrBlock

LearnablePolCorrBlock computes the polarization volume in vectorized form, as the L1 difference between the left and right pol_feat. The block itself has no learnable parameters (0 params).
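One way such a parameter-free block could be written is shown below. The function name `build_pol_pyramid`, the all-pairs-per-row layout, the mean over channels, and the average-pooled pyramid along the right-image axis are all assumptions modeled on RAFT-Stereo's 1D correlation pyramid; the blueprint does not fix these details.

```python
import torch
import torch.nn.functional as F


def build_pol_pyramid(pol_feat1, pol_feat2, num_levels=4):
    """Build an L1-difference volume and pool it into a pyramid.

    pol_feat1, pol_feat2: (B, C, H, W) left / right polarization features.
    Returns a list of num_levels tensors; level i has shape (B, H, W, W // 2**i).
    """
    b, c, h, w = pol_feat1.shape
    # Broadcast (B, C, H, W, 1) against (B, C, H, 1, W), then average |diff| over channels.
    vol = (pol_feat1.unsqueeze(-1) - pol_feat2.unsqueeze(-2)).abs().mean(dim=1)
    pyramid = [vol]
    flat = vol.reshape(b * h * w, 1, w)  # one 1D signal per left pixel
    for _ in range(num_levels - 1):
        # Halve the resolution along the right-image (candidate match) axis.
        flat = F.avg_pool1d(flat, kernel_size=2, stride=2)
        pyramid.append(flat.reshape(b, h, w, -1))
    return pyramid


feat = torch.randn(2, 32, 4, 16)
pyramid = build_pol_pyramid(feat, feat.clone(), num_levels=4)
```

The broadcast materializes a `(B, C, H, W, W)` intermediate, so a memory-conscious version would likely process the channel dimension in chunks.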

3.4 Glass-aware Auxiliary Loss

Forces PolHead to learn “polarization features” rather than “texture”:

# pol_volume should be high in glass regions
# pol_volume should be low in background regions
loss = max(0, margin + bg_pol - glass_pol)

This is a margin-based hinge loss: it forces the features output by PolHead to have high differences in glass regions and low differences in background regions. Its role is to give PolHead a clear learning signal and prevent it from degenerating into a generic texture extractor. It is weighted at glass_aware_weight=0.1 during training.
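A minimal sketch of the hinge term, assuming a per-pixel polarization response and a binary glass mask are available. How pol_volume is reduced to one value per pixel, the `margin=1.0` default, and the function name are assumptions of this sketch; the blueprint specifies only the hinge form and the 0.1 weight.

```python
import torch


def glass_aware_loss(pol_response, glass_mask, margin=1.0):
    """Hinge loss: max(0, margin + bg_pol - glass_pol).

    pol_response: (B, H, W) per-pixel polarization response (assumed reduction of pol_volume).
    glass_mask:   (B, H, W) mask, nonzero / True on glass pixels.
    """
    glass = glass_mask.bool()
    if glass.all() or not glass.any():
        # One of the two regions is empty, so the contrast is undefined; contribute nothing.
        return pol_response.new_zeros(())
    glass_pol = pol_response[glass].mean()
    bg_pol = pol_response[~glass].mean()
    return torch.clamp(margin + bg_pol - glass_pol, min=0)


mask = torch.zeros(1, 4, 4, dtype=torch.bool)
mask[:, :2] = True  # top half is glass
good = torch.where(mask, torch.tensor(3.0), torch.tensor(0.0))  # high on glass
bad = torch.where(mask, torch.tensor(0.0), torch.tensor(3.0))   # high on background
loss_good = glass_aware_loss(good, mask)
loss_bad = glass_aware_loss(bad, mask)
```

In training, this term would be added to the main disparity loss as `total = disparity_loss + 0.1 * glass_aware_loss(...)`, where `disparity_loss` is whatever the base model already uses.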


4. Tensor Dimensions

| Tensor | Dimensions / Parameters | Description |
|---|---|---|
| fmap1 / fmap2 | 256 channels | FeatureEncoder outputs |
| PolHead input | in_dim=256 | Feature-map channels |
| PolHead middle layer | 64 | 1×1 Conv + BN + ReLU |
| pol_feat1 / pol_feat2 | pol_dim=32 | Polarization features |
| pol_volume | Controlled by pol_levels=4 and pol_radius=4 | L1-difference volume |

5. Hyperparameters

| Hyperparameter | Value | Description |
|---|---|---|
| pol_dim | 32 | Polarization feature dimension |
| pol_levels | 4 | Number of pyramid levels for the polarization volume |
| pol_radius | 4 | Polarization-volume query radius |
| iters | 24 | GRU iterations |
| glass_aware_weight | 0.1 | Weight on the Glass-aware auxiliary loss |
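These values could be collected in a small configuration object. The `PolConfig` class is a hypothetical container, and the derived `pol_lookup_channels` assumes the polarization volume is queried the way RAFT-Stereo queries its correlation pyramid (2·radius + 1 samples per level along the epipolar line), which this blueprint does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolConfig:
    """Hypothetical container for the hyperparameters in the table above."""

    pol_dim: int = 32
    pol_levels: int = 4
    pol_radius: int = 4
    iters: int = 24
    glass_aware_weight: float = 0.1

    @property
    def pol_lookup_channels(self) -> int:
        # Assumption: 1D lookup with (2 * radius + 1) samples at each pyramid level.
        return self.pol_levels * (2 * self.pol_radius + 1)


cfg = PolConfig()
print(cfg.pol_lookup_channels)  # 36
```

Under that assumption, each GRU iteration would receive 36 polarization channels per pixel alongside the regular correlation features.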

6. Design Decisions and Rationale

| Decision | Rationale |
|---|---|
| Replace the fixed subtraction formula with PolHead | Makes polarization features learnable and adaptable to brightness/angle/material changes |
| PolHead uses 1×1 convolutions, only ~18K params | Keeps it lightweight, with only per-pixel channel transforms |
| Left and right share PolHead | Ensures left and right polarization features lie in the same representation space so the L1 difference is meaningful |
| Use L1 difference instead of inner product | The physical meaning of polarization “difference” is an intensity difference; L1 is closer to that meaning than an inner product |
| Add GlassAwareLoss | Provides PolHead with a clear supervision signal and prevents it from learning generic texture features |
| Feed into UpdateBlockWithPol as the downstream update module | The update module consumes both corr volume and pol volume |

7. Highlights

  • Upgrades the polarization signal from “hand-crafted physical formula” to “learnable representation”: PolHead extracts polarization features before computing the L1 difference, adapting to changes in brightness, angle, and material.
  • LightweightPolHead uses two 1×1 convolutions with only about 18K parameters and performs only per-pixel channel transforms, adding learnability at minimal capacity cost.
  • The left and right images share the same PolHead, ensuring left and right polarization features lie in the same representation space so the L1 difference carries physical meaning.
  • GlassAwareLoss provides a clear supervision signal in margin-hinge form, forcing polarization features to be high in glass regions and low in background regions and preventing PolHead from degenerating into a generic texture extractor.
  • The polarization-volume computation module LearnablePolCorrBlock is vectorized and has zero learnable parameters; all added capacity is concentrated in the feature head.
