Learnable Polarization Volume Architecture

1. Design Goals

In polarization stereo matching, the most direct approach is to compare the left and right polarization images with a fixed formula. For the left image I_∥, the right image I_⊥, pixel x, and candidate disparity d:

pol_diff[x, d] = I_∥[x] - I_⊥[x-d]
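As a point of reference, the fixed formula can be written out for a single scanline. This is a plain-Python sketch, not code from the project. The function name `fixed_pol_diff` is hypothetical, and marking positions where x - d falls outside the image as `None` is an assumed convention.

```python
def fixed_pol_diff(i_par, i_perp, max_disp):
    """Hand-crafted difference volume for one scanline (hypothetical helper).

    i_par:  intensities of the left image I_∥ along the scanline
    i_perp: intensities of the right image I_⊥ along the scanline
    Returns vol[x][d] = i_par[x] - i_perp[x - d], with None where x - d < 0.
    """
    vol = []
    for x in range(len(i_par)):
        row = []
        for d in range(max_disp + 1):
            # x - d is the matching right-image column for disparity d
            row.append(i_par[x] - i_perp[x - d] if x - d >= 0 else None)
        vol.append(row)
    return vol


print(fixed_pol_diff([5.0, 6.0, 7.0], [1.0, 2.0, 3.0], max_disp=1))
# [[4.0, None], [4.0, 5.0], [4.0, 5.0]]
```

Every entry is the same fixed function of raw intensities, whatever the scene; nothing in it can adapt.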

A fixed formula has the following problems:

It cannot adapt to different scene conditions (brightness, angle, material).
It cannot learn what constitutes a robust polarization signal.
It is easily disturbed by noise and false positives.

The design goal of this architecture is to upgrade polarization-signal generation from a “hand-crafted formula” to a “learnable representation”:

Fixed formula:
    pol_diff = I_∥ - I_⊥

Learnable version:
    pol_feat = PolHead(fmap)              # Learn to extract polarization features
    pol_corr = |pol_feat_L - pol_feat_R|  # L1 difference

In other words, instead of directly subtracting raw images, a learnable PolHead first extracts “polarization features” from the feature map, and then the L1 difference between left and right polarization features is computed.
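A toy version of the learnable path: the same weights are applied to both views, and only then are the outputs differenced. Here PolHead is replaced by a single per-pixel linear map (no bias, BatchNorm, or ReLU), so `toy_pol_head` and its weight matrix are hypothetical simplifications, not the actual head.

```python
def toy_pol_head(fmap, weight):
    """Per-pixel linear map, i.e. a 1x1 convolution without bias (hypothetical stand-in).

    fmap:   list of per-pixel feature vectors (length feat_dim each)
    weight: pol_dim rows, each of length feat_dim
    """
    return [[sum(w * f for w, f in zip(row, feat)) for row in weight] for feat in fmap]


def pol_corr(pol_feat_l, pol_feat_r):
    # Elementwise L1 difference between a left and a matched right pol_feat vector
    return [abs(a - b) for a, b in zip(pol_feat_l, pol_feat_r)]


weight = [[1.0, 0.0], [0.5, 0.5]]            # shared by both views
pol_l = toy_pol_head([[1.0, 2.0]], weight)   # left pixel  -> [[1.0, 1.5]]
pol_r = toy_pol_head([[3.0, 1.0]], weight)   # right pixel -> [[3.0, 2.0]]
print(pol_corr(pol_l[0], pol_r[0]))          # [2.0, 0.5]
```

Because training can change `weight`, what gets differenced is learned rather than fixed, and sharing it keeps both outputs in the same space.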

2. Architecture

Figure: Learnable Polarization Volume overall architecture

Data Flow

The left and right images are each encoded by FeatureEncoder into fmap1 / fmap2.
fmap1 / fmap2 are passed to the standard CorrBlock, which builds corr_volume.
The same feature maps are also fed into the shared PolHead (LightweightPolHead), producing pol_feat1 / pol_feat2.
LearnablePolCorrBlock computes pol_volume from the left and right polarization features in vectorized form.
corr_volume and pol_volume are both fed into UpdateBlockWithPol, which iteratively refines and outputs the disparity.
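The five steps can be summarized as a single forward pass. In this sketch every module is passed in as a plain callable, so `stereo_learnable_pol_forward` and its arguments are hypothetical stand-ins. Two details are assumptions borrowed from RAFT-style iterative stereo rather than facts stated here: disparity starts at zero and each iteration adds a residual update, and the lookup of each volume around the current disparity is folded into `update_block`.

```python
def stereo_learnable_pol_forward(left, right, encoder, corr_block, pol_head,
                                 pol_corr_block, update_block, iters=24):
    """Hypothetical wiring of the data flow; all modules are stand-in callables."""
    fmap1, fmap2 = encoder(left), encoder(right)             # FeatureEncoder on each view
    corr_volume = corr_block(fmap1, fmap2)                   # standard CorrBlock
    pol_feat1, pol_feat2 = pol_head(fmap1), pol_head(fmap2)  # one shared PolHead
    pol_volume = pol_corr_block(pol_feat1, pol_feat2)        # parameter-free L1 volume

    disp = 0.0            # assumed zero initialization
    predictions = []
    for _ in range(iters):
        # UpdateBlockWithPol consumes both volumes and returns a residual update
        disp = disp + update_block(corr_volume, pol_volume, disp)
        predictions.append(disp)
    return predictions


head_calls = []


def counting_head(fmap):
    # Records each call so we can confirm one head serves both views
    head_calls.append(fmap)
    return fmap


preds = stereo_learnable_pol_forward(
    1.0, 2.0,
    encoder=lambda img: img,
    corr_block=lambda a, b: a * b,
    pol_head=counting_head,
    pol_corr_block=lambda p, q: abs(p - q),
    update_block=lambda corr, pol, d: 0.5,
)
print(len(preds), preds[-1], len(head_calls))  # 24 12.0 2
```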

3. Components and Modules

3.1 Component Overview

| Class | Function | Parameter count |
|---|---|---|
| `LightweightPolHead` | Lightweight polarization head (256→64→32) | ~18K |
| `LearnablePolCorrBlock` | Vectorized polarization-volume computation | 0 |
| `GlassAwareLoss` | Glass-aware auxiliary loss | 0 |
| `StereoLearnablePol` | Integrated model | base + ~18K |

Relative to a pure RGB stereo-matching model, the overall model adds only about 18K parameters (all from LightweightPolHead).

3.2 LightweightPolHead Design

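import torch.nn as nn

class LightweightPolHead(nn.Module):
    """
    Lightweight polarization head (~18K parameters)
    Structure: feature_dim -> 64 -> pol_dim
    """
    def __init__(self, in_dim=256, pol_dim=32):
        super().__init__()  # required before registering submodules
        # Two 1x1 convs: per-pixel channel transforms, no spatial mixing
        self.net = nn.Sequential(
            nn.Conv2d(in_dim, 64, kernel_size=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, pol_dim, kernel_size=1),
        )

    def forward(self, fmap):
        # fmap: (B, in_dim, H, W) -> pol_feat: (B, pol_dim, H, W)
        return self.net(fmap)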

Input dimension in_dim=256 (feature-map channels), output dimension pol_dim=32.
The structure is two 1×1 convolutions 256 → 64 → 32 with BatchNorm and ReLU in between.
The 1×1 convolutions are per-pixel channel transforms; they do not mix spatial neighborhoods, deliberately keeping the head lightweight.
The left and right images share the same PolHead (shared weights), ensuring the left and right polarization features lie in the same representation space.
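The ~18K figure can be checked by hand. The arithmetic below assumes PyTorch defaults for the layers in the class above (`bias=True` on `nn.Conv2d`, `affine=True` on `nn.BatchNorm2d`); BatchNorm's running mean and variance are buffers, not parameters, so they are not counted. The helper names are hypothetical; for a real module instance the same total should come out of `sum(p.numel() for p in head.parameters())`.

```python
def conv1x1_params(c_in, c_out, bias=True):
    # A 1x1 convolution is a (c_out x c_in) weight matrix plus c_out biases
    return c_in * c_out + (c_out if bias else 0)


def batchnorm_params(channels):
    # Learnable affine scale and shift; running statistics are buffers
    return 2 * channels


def pol_head_params(in_dim=256, hidden=64, pol_dim=32):
    return (conv1x1_params(in_dim, hidden)      # 256*64 + 64 = 16,448
            + batchnorm_params(hidden)          # 2*64       =    128
            + conv1x1_params(hidden, pol_dim))  # 64*32 + 32 =  2,080


print(pol_head_params())  # 18656
```

That is 18,656 parameters (about 18.7K), in line with the rounded ~18K figure; about 88% of them sit in the first 256→64 projection.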

3.3 LearnablePolCorrBlock

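Computes the polarization volume in vectorized form and has no learnable parameters of its own (0 params). Each entry of the volume is the L1 difference between the left pol_feat at a pixel and the right pol_feat at the disparity-shifted position, mirroring the x - d indexing of the fixed formula in Section 1.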
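For one scanline, the contents of such a volume can be spelled out with explicit loops; the real block computes this kind of quantity in vectorized form over the batch, all pixels, and all disparities at once. Two details are assumptions rather than facts from this design: the per-channel L1 differences are averaged over the pol_dim channels into one cost per (x, d), and disparities that fall outside the right image are marked `None`. The function name is hypothetical.

```python
def l1_pol_volume(pol_feat_l, pol_feat_r, max_disp):
    """cost[x][d] = mean_c |pol_feat_l[x][c] - pol_feat_r[x - d][c]| (one scanline).

    pol_feat_l / pol_feat_r: per-pixel polarization feature vectors (pol_dim each).
    Entries with x - d < 0 have no right-image match and are set to None.
    No learnable parameters are involved.
    """
    volume = []
    for x, feat_l in enumerate(pol_feat_l):
        row = []
        for d in range(max_disp + 1):
            if x - d < 0:
                row.append(None)
                continue
            feat_r = pol_feat_r[x - d]
            diffs = [abs(a - b) for a, b in zip(feat_l, feat_r)]
            row.append(sum(diffs) / len(diffs))
        volume.append(row)
    return volume


left = [[1.0, 3.0], [2.0, 2.0]]
right = [[0.0, 1.0], [2.0, 4.0]]
print(l1_pol_volume(left, right, max_disp=1))  # [[1.5, None], [1.0, 1.5]]
```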

3.4 Glass-aware Auxiliary Loss

This auxiliary loss forces PolHead to learn polarization features rather than texture:

# pol_volume should be high in glass regions
# pol_volume should be low in background regions
loss = max(0, margin + bg_pol - glass_pol)

This is a margin-based hinge loss: it pushes the polarization volume built from PolHead's features to be higher in glass regions than in background regions by at least margin, and it is zero once that gap is reached. It gives PolHead an explicit learning signal and keeps it from degenerating into a generic texture extractor. During training it is weighted by glass_aware_weight=0.1.
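A minimal sketch of the hinge and its weighting, under assumptions the text does not fix: `glass_pol` and `bg_pol` are taken as the mean of a per-pixel polarization response over a binary glass mask and its complement, `margin=1.0` is a placeholder value, and the auxiliary term is simply added to the main disparity loss. All function names are hypothetical.

```python
def masked_mean(values, mask):
    # Mean of the values whose mask entry is True (0.0 if the region is empty)
    selected = [v for v, keep in zip(values, mask) if keep]
    return sum(selected) / len(selected) if selected else 0.0


def glass_aware_loss(pol_response, glass_mask, margin=1.0):
    """max(0, margin + bg_pol - glass_pol); margin=1.0 is a placeholder."""
    glass_pol = masked_mean(pol_response, glass_mask)
    bg_pol = masked_mean(pol_response, [not g for g in glass_mask])
    return max(0.0, margin + bg_pol - glass_pol)


def total_loss(disp_loss, pol_response, glass_mask, glass_aware_weight=0.1):
    # Auxiliary term enters with weight glass_aware_weight
    return disp_loss + glass_aware_weight * glass_aware_loss(pol_response, glass_mask)


mask = [True, True, False, False]
print(glass_aware_loss([3.0, 3.0, 0.5, 0.5], mask))  # 0.0  (gap 2.5 >= margin)
print(glass_aware_loss([1.0, 1.0, 0.5, 0.5], mask))  # 0.5  (gap 0.5 < margin)
```

Once the glass response exceeds the background by the margin, the term is zero and no longer pulls on PolHead.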

4. Tensor Dimensions

| Tensor | Dimensions / parameters | Description |
|---|---|---|
| `fmap1` / `fmap2` | 256 channels | FeatureEncoder outputs |
| `PolHead` input | `in_dim=256` | Feature-map channels |
| `PolHead` hidden layer | 64 channels | 1×1 Conv + BN + ReLU |
| `pol_feat1` / `pol_feat2` | `pol_dim=32` channels | Polarization features |
| `pol_volume` | Determined by `pol_levels=4` and `pol_radius=4` | L1-difference volume |
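The table does not spell out the exact shape of `pol_volume`. If `LearnablePolCorrBlock` follows the RAFT-Stereo correlation pyramid, which is an assumption here, the volume is average-pooled by 2 along the disparity axis to form `pol_levels` levels, and each update step samples 2·`pol_radius`+1 entries around the current disparity at every level, giving 4 × 9 = 36 values per pixel with the listed settings. The sketch uses integer (floor) indexing where RAFT-Stereo interpolates linearly, returns 0.0 for out-of-range samples as zero padding would, and its function names are hypothetical.

```python
def build_pyramid(costs, num_levels):
    """costs: one pixel's volume entries over disparities 0..D-1."""
    pyramid = [list(costs)]
    for _ in range(num_levels - 1):
        prev = pyramid[-1]
        # Average-pool pairs of neighboring disparities (stride 2)
        pyramid.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)])
    return pyramid


def lookup(pyramid, disp, radius):
    """Sample 2*radius+1 entries around disp at every level (out of range -> 0.0)."""
    samples = []
    for level, costs in enumerate(pyramid):
        center = disp // (2 ** level)  # disparity at this level's resolution
        for offset in range(-radius, radius + 1):
            idx = center + offset
            samples.append(costs[idx] if 0 <= idx < len(costs) else 0.0)
    return samples


pyr = build_pyramid([float(d) for d in range(8)], num_levels=4)
print([len(level) for level in pyr])        # [8, 4, 2, 1]
print(len(lookup(pyr, disp=4, radius=4)))   # 36 = pol_levels * (2 * pol_radius + 1)
```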

5. Hyperparameters

| Hyperparameter | Value | Description |
|---|---|---|
| `pol_dim` | 32 | Polarization feature dimension |
| `pol_levels` | 4 | Number of pyramid levels for the polarization volume |
| `pol_radius` | 4 | Polarization-volume query radius |
| `iters` | 24 | GRU iterations |
| `glass_aware_weight` | 0.1 | Weight on the Glass-aware auxiliary loss |
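The hyperparameters in the table can be kept together in one immutable config object. This is a hedged sketch: the class name `LearnablePolConfig` and its validation rules are hypothetical, and the project may pass these values differently (for example as command-line arguments).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LearnablePolConfig:
    """Hypothetical container for the hyperparameters in the table."""
    pol_dim: int = 32                 # polarization feature dimension
    pol_levels: int = 4               # pyramid levels of the polarization volume
    pol_radius: int = 4               # lookup radius per level
    iters: int = 24                   # GRU iterations
    glass_aware_weight: float = 0.1   # weight of the glass-aware auxiliary loss

    def __post_init__(self):
        # Reject settings that cannot produce a valid volume or loss
        if self.pol_dim <= 0 or self.pol_levels <= 0 or self.pol_radius < 0:
            raise ValueError("pol_dim and pol_levels must be positive, pol_radius >= 0")
        if self.iters <= 0 or self.glass_aware_weight < 0:
            raise ValueError("iters must be positive and glass_aware_weight >= 0")


default_cfg = LearnablePolConfig()
ablation_cfg = replace(default_cfg, glass_aware_weight=0.0)  # e.g. auxiliary loss off
print(default_cfg.iters, ablation_cfg.glass_aware_weight)   # 24 0.0
```

Freezing the dataclass keeps a run's settings from being mutated mid-training; `dataclasses.replace` re-runs the validation for each derived variant.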

6. Design Decisions and Rationale

| Decision | Rationale |
|---|---|
| Replace the fixed subtraction formula with `PolHead` | Makes polarization features learnable and adaptable to changes in brightness, angle, and material |
| `PolHead` uses 1×1 convolutions, only ~18K params | Keeps it lightweight: only per-pixel channel transforms |
| Left and right share `PolHead` | Puts left and right polarization features in the same representation space, so the L1 difference is meaningful |
| Use L1 difference instead of inner product | Polarization contrast is physically an intensity difference, which an L1 distance models more directly than an inner-product similarity |
| Add `GlassAwareLoss` | Gives `PolHead` an explicit supervision signal and prevents it from learning generic texture features |
| Feed into `UpdateBlockWithPol` as the downstream update module | The update module consumes both the corr volume and the pol volume |

7. Highlights

Upgrades the polarization signal from “hand-crafted physical formula” to “learnable representation”: PolHead extracts polarization features before computing the L1 difference, adapting to changes in brightness, angle, and material.
LightweightPolHead uses two 1×1 convolutions with only about 18K parameters and performs only per-pixel channel transforms, adding learnability at minimal parameter cost.
The left and right images share the same PolHead, so left and right polarization features lie in the same representation space and their L1 difference is meaningful.
GlassAwareLoss provides an explicit supervision signal in margin-hinge form, pushing the polarization volume to be high in glass regions and low in background regions and preventing PolHead from degenerating into a generic texture extractor.
The polarization-volume computation module LearnablePolCorrBlock is vectorized and has zero learnable parameters; all added capacity is concentrated in the feature head.