1. Design Goals
In polarization stereo matching, the most direct approach is to compute the difference between the left and right polarization images using a fixed formula:
pol_diff[x, d] = I_∥[x] - I_⊥[x-d]
A fixed formula has the following problems:
- It cannot adapt to different scene conditions (brightness, angle, material).
- It cannot learn “what constitutes a robust polarization signal.”
- It is easily disturbed by noise and can produce false positives.
The design goal of this architecture is to upgrade polarization-signal generation from a “hand-crafted formula” to a “learnable representation”:
Fixed formula:
pol_diff = I_∥ - I_⊥
Learnable version:
pol_feat = PolHead(fmap) # Learn to extract polarization features
pol_corr = |pol_feat_L - pol_feat_R| # L1 difference
In other words, instead of directly subtracting raw images, a learnable PolHead first extracts “polarization features” from the feature map, and then the L1 difference between left and right polarization features is computed.
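For comparison, the fixed-formula baseline can be sketched as follows. This is only an illustration: it assumes single-channel rectified images stored as (H, W) tensors, the function name fixed_pol_diff and the max_disp parameter are hypothetical, and positions where x-d falls outside the image are simply left at zero.

```python
import torch


def fixed_pol_diff(i_par: torch.Tensor, i_perp: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Hand-crafted baseline: vol[d, y, x] = I_par[y, x] - I_perp[y, x - d].

    i_par, i_perp: (H, W) intensity images (assumed rectified).
    Returns a (max_disp + 1, H, W) volume; out-of-range entries stay 0.
    """
    h, w = i_par.shape
    vol = torch.zeros(max_disp + 1, h, w, dtype=i_par.dtype)
    for d in range(max_disp + 1):
        # Columns x >= d have a valid partner at x - d in the other image.
        vol[d, :, d:] = i_par[:, d:] - i_perp[:, : w - d]
    return vol
```

Nothing in this function is trainable, which is the limitation the learnable head is meant to address.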
2. Architecture
Data Flow
- The left and right images are each encoded by FeatureEncoder into fmap1 / fmap2.
- fmap1 / fmap2 enter the standard CorrBlock to form corr_volume.
- The same feature maps are also fed into the shared PolHead (LightweightPolHead), producing pol_feat1 / pol_feat2.
- LearnablePolCorrBlock computes pol_volume in vectorized form from the left and right polarization features.
- corr_volume and pol_volume together enter UpdateBlockWithPol, which iteratively outputs disparity.
3. Components and Modules
3.1 Component Overview
| Class | Function | Parameter count |
|---|---|---|
| LightweightPolHead | Lightweight polarization head (256→64→32) | ~18K |
| LearnablePolCorrBlock | Vectorized polarization-volume computation | 0 |
| GlassAwareLoss | Glass-aware auxiliary loss | 0 |
| StereoLearnablePol | Integrated model | base + 18K |
Relative to a pure RGB stereo-matching model, the overall model adds only about 18K parameters (all from LightweightPolHead).
3.2 LightweightPolHead Design
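import torch.nn as nn


class LightweightPolHead(nn.Module):
    """
    Lightweight polarization head (~18K parameters)
    Structure: feature_dim -> 64 -> pol_dim
    """

    def __init__(self, in_dim=256, pol_dim=32):
        super().__init__()  # must run before submodules are registered
        self.net = nn.Sequential(
            nn.Conv2d(in_dim, 64, 1),   # per-pixel channel reduction
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, pol_dim, 1),  # project to polarization features
        )

    def forward(self, fmap):
        # (B, in_dim, H, W) -> (B, pol_dim, H, W)
        return self.net(fmap)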
- Input dimension in_dim=256 (feature-map channels), output dimension pol_dim=32.
- The structure is two 1×1 convolutions, 256 → 64 → 32, with BatchNorm and ReLU in between.
- The 1×1 convolutions are per-pixel channel transforms; they do not mix spatial neighborhoods, deliberately keeping the head lightweight.
- The left and right images share the same PolHead (shared weights), ensuring the left and right polarization features lie in the same representation space.
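The "~18K" figure can be checked by counting parameters directly. The sketch below rebuilds the same layer stack as a plain nn.Sequential and applies one instance to both views; the batch size and spatial dimensions of fmap1 / fmap2 are made-up values for illustration.

```python
import torch
import torch.nn as nn

# Same layer stack as LightweightPolHead with in_dim=256, pol_dim=32.
head = nn.Sequential(
    nn.Conv2d(256, 64, 1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 32, 1),
)

n_params = sum(p.numel() for p in head.parameters())
# conv1: 256*64 + 64 = 16448, BN: 64 + 64 = 128, conv2: 64*32 + 32 = 2080
print(n_params)  # 18656

# Shared weights: the same module instance processes both views.
fmap1 = torch.randn(2, 256, 40, 60)  # hypothetical encoder outputs
fmap2 = torch.randn(2, 256, 40, 60)
pol_feat1, pol_feat2 = head(fmap1), head(fmap2)
print(pol_feat1.shape)  # torch.Size([2, 32, 40, 60])
```

One detail to keep in mind: in training mode BatchNorm normalizes each call with that call's own batch statistics, so the two views are normalized separately unless they are concatenated along the batch dimension before passing through the head.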
3.3 LearnablePolCorrBlock
LearnablePolCorrBlock computes the polarization volume in vectorized form as an L1 difference between the left and right pol_feat. The block itself has no learnable parameters (0 params).
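A vectorized L1 difference of this kind might look like the sketch below. It is not the actual LearnablePolCorrBlock: the pol_levels and pol_radius settings suggest the real block builds a pyramid with a local lookup, whereas this version covers a flat disparity range. The function name l1_pol_volume, the max_disp parameter, the mean over channels, and the zero padding at the left border are all assumptions.

```python
import torch
import torch.nn.functional as F


def l1_pol_volume(pol_feat1, pol_feat2, max_disp):
    """vol[b, d, y, x] = mean_c |pol_feat1[b, c, y, x] - pol_feat2[b, c, y, x - d]|.

    pol_feat1, pol_feat2: (B, C, H, W). Returns (B, max_disp + 1, H, W).
    """
    # Zero-pad the left edge so that x - d is always a valid index.
    padded = F.pad(pol_feat2, (max_disp, 0))          # (B, C, H, W + max_disp)
    # Sliding windows along width: entry k at column x holds pol_feat2[x + k - max_disp].
    windows = padded.unfold(3, max_disp + 1, 1)       # (B, C, H, W, max_disp + 1)
    shifted = windows.flip(-1)                        # last index is now the disparity d
    diff = (pol_feat1.unsqueeze(-1) - shifted).abs()  # (B, C, H, W, max_disp + 1)
    return diff.mean(dim=1).permute(0, 3, 1, 2)       # (B, max_disp + 1, H, W)
```

No Python loop over disparities is needed, and the function holds no parameters, consistent with the "0 params" entry above. Entries where x - d falls off the image compare against zeros and should be masked or ignored in practice.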
3.4 Glass-aware Auxiliary Loss
This loss pushes PolHead to learn “polarization features” rather than “texture”:
# pol_volume should be high in glass regions
# pol_volume should be low in background regions
loss = max(0, margin + bg_pol - glass_pol)
This is a margin-based hinge loss: it forces the features output by PolHead to have high differences in glass regions and low differences in background regions. Its role is to give PolHead a clear learning signal and prevent it from degenerating into a generic texture extractor. It is weighted at glass_aware_weight=0.1 during training.
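The hinge term above could be implemented roughly as follows. This is a sketch under stated assumptions, not the actual GlassAwareLoss: it assumes pol_volume has already been reduced to a per-pixel response map, that a boolean glass mask is available, that glass_pol and bg_pol are region means, and that margin=0.5 (the source does not give the margin value).

```python
import torch


def glass_aware_loss(pol_map, glass_mask, margin=0.5):
    """Hinge loss: max(0, margin + bg_pol - glass_pol).

    pol_map: (B, H, W) per-pixel polarization response (assumed reduction of pol_volume).
    glass_mask: (B, H, W) boolean, True on glass pixels.
    """
    glass = glass_mask.to(pol_map.dtype)
    bg = 1.0 - glass
    eps = 1e-6  # avoids division by zero when a region is empty
    glass_pol = (pol_map * glass).sum() / (glass.sum() + eps)
    bg_pol = (pol_map * bg).sum() / (bg.sum() + eps)
    # Zero once glass_pol exceeds bg_pol by at least the margin.
    return torch.clamp(margin + bg_pol - glass_pol, min=0.0)
```

During training the term would be combined as total = main_loss + glass_aware_weight * aux_loss with glass_aware_weight=0.1, where main_loss stands for whatever disparity loss the model uses.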
4. Tensor Dimensions
| Tensor | Dimensions / Parameters | Description |
|---|---|---|
| fmap1 / fmap2 | 256 channels | FeatureEncoder outputs |
| PolHead input | in_dim=256 | Feature-map channels |
| PolHead middle layer | 64 | 1×1 Conv + BN + ReLU |
| pol_feat1 / pol_feat2 | pol_dim=32 | Polarization features |
| pol_volume | Controlled by pol_levels=4 and pol_radius=4 | L1-difference volume |
5. Hyperparameters
| Hyperparameter | Value | Description |
|---|---|---|
| pol_dim | 32 | Polarization feature dimension |
| pol_levels | 4 | Number of pyramid levels for the polarization volume |
| pol_radius | 4 | Polarization-volume query radius |
| iters | 24 | GRU iterations |
| glass_aware_weight | 0.1 | Weight on the Glass-aware auxiliary loss |
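These values can be gathered into a small configuration object. The PolConfig class below is hypothetical, and the derived lookup_channels property assumes a RAFT-Stereo-style 1D lookup that samples 2 * radius + 1 positions per pyramid level; the source does not confirm that the polarization volume is queried this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolConfig:
    pol_dim: int = 32
    pol_levels: int = 4
    pol_radius: int = 4
    iters: int = 24
    glass_aware_weight: float = 0.1

    @property
    def lookup_channels(self) -> int:
        # Assumption: (2 * radius + 1) samples per level, as in RAFT-Stereo lookups.
        return self.pol_levels * (2 * self.pol_radius + 1)


cfg = PolConfig()
print(cfg.lookup_channels)  # 36 under the assumption above
```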
6. Design Decisions and Rationale
| Decision | Rationale |
|---|---|
| Replace the fixed subtraction formula with PolHead | Make polarization features learnable and adaptable to brightness/angle/material changes |
| PolHead uses 1×1 convolutions, only ~18K params | Keep it lightweight, only per-pixel channel transforms |
| Left and right share PolHead | Ensures left and right polarization features lie in the same representation space so the L1 difference is meaningful |
| Use L1 difference instead of inner product | The physical meaning of polarization “difference” is intensity difference; L1 is closer than an inner product |
| Add GlassAwareLoss | Provides PolHead with a clear supervision signal and prevents it from learning generic texture features |
| Feed into UpdateBlockWithPol as the downstream update module | The update module consumes both corr volume and pol volume |
7. Highlights
- Upgrades the polarization signal from “hand-crafted physical formula” to “learnable representation”: PolHead extracts polarization features before computing the L1 difference, adapting to changes in brightness, angle, and material.
- LightweightPolHead uses two 1×1 convolutions with only about 18K parameters and performs only per-pixel channel transforms, gaining learnability at minimal added capacity.
- The left and right images share the same PolHead, ensuring left and right polarization features lie in the same representation space so the L1 difference carries physical meaning.
- GlassAwareLoss provides a clear supervision signal in margin-hinge form, forcing polarization features to be high in glass regions and low in background regions and preventing PolHead from degenerating into a generic texture extractor.
- The polarization-volume computation module LearnablePolCorrBlock is vectorized and has zero learnable parameters; all added capacity is concentrated in the feature head.