Minimal Polarization Injection Architecture

1. Design Goals

This architecture introduces polarization information into the stereo matching framework with “minimalism” as the core design principle.

When the polarization features are computed from closed-form physics formulas, pol_diff and pol_ratio already separate glass from non-glass regions well. Under this premise, placing a learnable pre-encoder after the physics features may introduce three issues:

Increased optimization difficulty: additional learnable parameters need more training time to converge.
Redundancy: pol_diff and pol_ratio already have good discriminative power and may not need a further encoder transformation.
Risk of over-parameterization: expanding 6ch to 32ch is not guaranteed to bring real gains.

The design goal of this architecture is therefore a minimalist strategy: use no learnable polarization pre-encoder, retain only the discriminative polarization features (pol_diff and pol_ratio, 6ch in total), and inject them directly into the MotionEncoder.

Design principle: when physics features already have good separability, the simplest option should be tried first.

2. Architecture

Figure: Architecture of Minimal Polarization Injection

Two key points of this architecture:

Polarization features contain only pol_diff and pol_ratio, totaling 6ch (no Sobel edge features).
The 6ch polarization features are injected directly into the MotionEncoder without any learnable encoder in between.

3. Components and Modules

3.1 PolarizationFeaturesV62

Physics-based polarization features, retaining only pol_diff and pol_ratio, 6ch in total, downsampled to H/8.

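import torch
import torch.nn as nn
import torch.nn.functional as F

class PolarizationFeaturesV62(nn.Module):
    """Parameter-free physics features: pol_diff and pol_ratio (6ch total)."""
    def forward(self, left, right):
        # left, right: (B, 3, H, W) image tensors
        pol_diff = torch.abs(left - right)           # 3ch
        pol_ratio = left / (left + right + 1e-6)     # 3ch
        pol_features = torch.cat([pol_diff, pol_ratio], dim=1)  # 6ch
        return F.avg_pool2d(pol_features, kernel_size=8)  # (B, 6, H/8, W/8)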

pol_diff = |I∥ - I⊥| (3ch): polarization intensity, the core feature for glass / non-glass discrimination.
pol_ratio = I∥ / (I∥ + I⊥ + ε) (3ch): polarization ratio, related to the incidence angle; it serves as an auxiliary feature.
The two are concatenated into 6ch and downsampled to H/8 via avg_pool2d(kernel_size=8).
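To make the two formulas concrete, the following pure-Python sketch evaluates them for a single pixel of a single color channel. The helper name pol_features and the sample intensity pairs are hypothetical and chosen only for illustration; they are not measured data and say nothing about real glass statistics.

```python
EPS = 1e-6  # denominator stabilizer, same value as used for pol_ratio above

def pol_features(i_par, i_perp, eps=EPS):
    """Return (pol_diff, pol_ratio) for one pixel of one color channel."""
    pol_diff = abs(i_par - i_perp)
    pol_ratio = i_par / (i_par + i_perp + eps)
    return pol_diff, pol_ratio

# Hypothetical intensity pairs in [0, 1]; illustrative values only.
samples = {
    "equal intensities": (0.40, 0.40),
    "strong imbalance": (0.70, 0.10),
    "both dark": (0.00, 0.00),
}

for name, (i_par, i_perp) in samples.items():
    diff, ratio = pol_features(i_par, i_perp)
    print(f"{name:18s} pol_diff={diff:.3f} pol_ratio={ratio:.3f}")
```

With equal intensities pol_diff is 0 and pol_ratio sits near 0.5; an imbalance moves both away from those values. When both intensities are zero, the ε term yields pol_ratio = 0 instead of a division error.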

3.2 MotionEncoderV62

Extends the MotionEncoder so that the polarization branch directly accepts 6ch polarization features (no pre-encoder).

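class MotionEncoderV62(nn.Module):
    def __init__(self, corr_dim=36, disp_dim=1, pol_dim=6):  # 6ch direct input
        super().__init__()  # must run before submodules are assigned
        # Polarization branch: Conv(6→32) → Conv(32→32)
        self.convp1 = nn.Conv2d(pol_dim, 32, 3, padding=1)
        self.convp2 = nn.Conv2d(32, 32, 3, padding=1)
        # ...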

Polarization branch: Conv(6→32) → Conv(32→32). Input channels: 6.

4. Tensor Dimensions

Item	Shape	Description
pol_diff	(B, 3, H, W)	`|I∥ - I⊥|`
pol_ratio	(B, 3, H, W)	`I∥ / (I∥ + I⊥ + ε)`
pol_features (after concat)	(B, 6, H, W) → (B, 6, H/8, W/8)	avg_pool2d kernel=8
MotionEncoder pol branch input	(B, 6, H/8, W/8)	Direct injection, no pre-encoder
convp1 output	(B, 32, H/8, W/8)	Conv(6→32)
convp2 output	(B, 32, H/8, W/8)	Conv(32→32)
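The shapes in the table can be reproduced with plain integer arithmetic. The sketch below assumes avg_pool2d is called with its default stride (equal to the kernel size), no padding, and floor rounding, and that both convolutions use a 3x3 kernel with padding=1 and stride 1. The helper names pooled_size and trace_shapes and the example resolution are hypothetical.

```python
def pooled_size(n, kernel=8):
    """Output length of avg_pool2d along one axis (stride = kernel, no padding)."""
    return (n - kernel) // kernel + 1

def trace_shapes(batch, height, width, pol_dim=6, hidden=32, kernel=8):
    """List (name, shape) pairs along the polarization path."""
    h8, w8 = pooled_size(height, kernel), pooled_size(width, kernel)
    return [
        ("pol_diff", (batch, 3, height, width)),
        ("pol_ratio", (batch, 3, height, width)),
        ("pol_features (concat)", (batch, pol_dim, height, width)),
        ("pol_features (pooled)", (batch, pol_dim, h8, w8)),
        # 3x3 convolutions with padding=1 and stride 1 keep H and W unchanged.
        ("convp1 output", (batch, hidden, h8, w8)),
        ("convp2 output", (batch, hidden, h8, w8)),
    ]

# Example resolution chosen for illustration only.
for name, shape in trace_shapes(batch=2, height=384, width=512):
    print(f"{name:24s} {shape}")
```

If H or W is not a multiple of 8, the pooled size is rounded down, so the pooled polarization features may need to be checked against the spatial size of the other MotionEncoder inputs.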

5. Parameter Count

Component	Parameters
Polarization branch (pol_dim=6)	—
Total added	~35K
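As a sanity check on the order of magnitude, the parameter count of the two polarization-branch convolutions from Section 3.2 can be worked out by hand. The sketch below assumes standard convolutions with a bias term (the nn.Conv2d default) and covers only convp1 and convp2. Which other layers make up the rest of the ~35K total is not itemized here, so the sketch makes no claim about them. The helper name conv2d_params is hypothetical.

```python
def conv2d_params(in_ch, out_ch, kernel=3, bias=True):
    """Weights plus optional biases of a standard (groups=1) 2-D convolution."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)

convp1 = conv2d_params(6, 32)    # Conv(6→32), 3x3 kernel
convp2 = conv2d_params(32, 32)   # Conv(32→32), 3x3 kernel

print("convp1:", convp1)
print("convp2:", convp2)
print("convp1 + convp2:", convp1 + convp2)
```

Under these assumptions the two layers contribute 1,760 and 9,248 parameters, 11,008 in total.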

6. Hyperparameters

Hyperparameter	Value	Description
pol_dim	6	Number of polarization feature channels (pol_diff 3 + pol_ratio 3)
avg_pool2d kernel	8	Downsample polarization features to H/8
corr_dim	36	Number of RGB correlation channels
disp_dim	1	Number of disparity channels
ε	1e-6	Denominator stabilizer for pol_ratio

7. Design Decisions and Rationale

7.1 Keep only pol_diff and pol_ratio

The polarization features retain only pol_diff (polarization intensity, the core feature for glass discrimination) and pol_ratio (polarization ratio, an auxiliary feature related to incidence angle); no edge-type features are included. The feature set is concise and the model is cleaner.

7.2 No learnable pre-encoder

When physics features already have good separability, adding a learnable pre-encoder may only add optimization difficulty without bringing expressiveness gains. Not every learnable component is better than handcrafted features.

7.3 Simple first

Given that physics features are already discriminative enough, the simplest option is tried first: 6ch physics features are injected directly into the MotionEncoder and absorbed by the two-layer conv of the polarization branch.

7.4 Fewer parameters

This architecture adds roughly 35K parameters. With the pre-encoder and redundant features removed, the added parameter count is at its minimum for this class of polarization injection designs.

8. Highlights

The minimal polarization injection path: 6ch physics features go directly into the MotionEncoder's polarization branch with no learnable pre-module in between, which gives the shortest injection path and should make optimization easier.
A concise feature set: the polarization features retain only the discriminative pol_diff and pol_ratio, so the feature set is free of redundancy.
A handcrafted-over-learnable trade-off: given that the physics features already have good separability, the design explicitly chooses not to add a learnable encoder, avoiding over-parameterization and the extra optimization burden.
A small ~35K parameter overhead: the lowest number of added parameters among polarization injection designs of the same class.