Blueprint · 2026

S2M2 Polarization Injection Points (A/B/C)

This document describes how to inject the polarization (Pol) signal into the S2M2 backbone. It covers the position, the code modification, and the rationale for each of the three injection points A/B/C. It does not cover experimental results or performance numbers.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

For a stereo matching architecture to use the polarization signal to detect transparent objects, the key is to let the polarization signal participate in the core matching computation rather than only being applied as a post-hoc correction. The architectural characteristics of S2M2 make this possible:

  • The matching core can be modulated: the matching computation cv = einsum(...) in S2M2 sits before the Sinkhorn Optimal Transport (OT) step. It is an open interface, so a polarization weight can modulate cv element-wise and thereby influence the entire optimal transport.
  • OT inherently contains an assignment mechanism: the all-pairs correlation compares features in a high-dimensional space, and OT converts those scores into an assignment with a built-in notion of confidence, which is well suited to absorbing the uncertainty information that polarization brings.
  • Cross-attention perceives appearance differences: when the left and right features interact, the MRT cross-attention can directly perceive polarization-induced appearance differences (I∥ vs I⊥).
  • LayerNorm preserves polarization magnitudes: S2M2 uses LayerNorm in DispInit. Unlike BatchNorm, LayerNorm computes its statistics per sample rather than across the batch, so magnitude differences between samples in a batch are not washed out. The polarization signal is essentially the magnitude difference between I∥ and I⊥, so the choice of normalization directly determines whether that signal is preserved.
# S2M2 DispInit
self.layer_norm = nn.LayerNorm(dim, elementwise_affine=True)
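
The normalization point above can be illustrated with a minimal sketch. The token layout [B, N, C], the toy dim = 8, and the helper names ln_first / bn_first are assumptions made for illustration, not S2M2's actual shapes or code. The sketch checks that LayerNorm's output for one sample does not depend on the rest of the batch, while BatchNorm in training mode does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 8  # toy feature width (assumption)
layer_norm = nn.LayerNorm(dim, elementwise_affine=True)
batch_norm = nn.BatchNorm1d(dim)  # training mode by default: uses batch statistics

sample = torch.randn(1, 16, dim)               # one sample with 16 tokens
other_weak = 0.1 * torch.randn(1, 16, dim)     # low-magnitude batch neighbour
other_strong = 10.0 * torch.randn(1, 16, dim)  # high-magnitude batch neighbour

def ln_first(batch):
    # LayerNorm normalizes each token over its last (feature) dimension.
    return layer_norm(batch)[0]

def bn_first(batch):
    # BatchNorm1d expects [B, C, N] and normalizes each channel over B and N.
    return batch_norm(batch.transpose(1, 2)).transpose(1, 2)[0]

batch_a = torch.cat([sample, other_weak])
batch_b = torch.cat([sample, other_strong])

# Same sample, different neighbours: compare the normalized result for sample 0.
ln_same = torch.allclose(ln_first(batch_a), ln_first(batch_b), atol=1e-6)
bn_same = torch.allclose(bn_first(batch_a), bn_first(batch_b), atol=1e-6)
print(ln_same, bn_same)
```

Note that LayerNorm still rescales every token by its own feature statistics, so the sketch only demonstrates independence from the batch, not that absolute magnitudes pass through unchanged.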

This document defines three polarization injection points A / B / C, corresponding respectively to the input layer, the cross-scale fusion layer, and the matching core.


2. Architecture: Positions of the Three Injection Points

Figure: Positions of the three S2M2 polarization injection points A/B/C.


3. Design and Code for the Three Injection Points

3.1 Injection Point C: Correlation Volume (preferred)

# Original
cv = torch.einsum('...hic,...hjc -> ...hij', feature0, feature1)

# Modification: Pol-weighted Correlation
pol_weight = compute_pol_weight(left, right)  # [B, H, W, W]
cv = torch.einsum('...hic,...hjc -> ...hij', feature0, feature1) * pol_weight
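
The source does not define compute_pol_weight, so the following is only one possible sketch. It assumes each view provides a single-channel polarization cue of shape [B, H, W] (for example a degree-of-polarization map) and that candidate matches with similar cues should keep a weight close to 1. The exponential form and the temperature tau are hypothetical choices.

```python
import torch

def compute_pol_weight(pol_left, pol_right, tau=0.1):
    """Hypothetical weight from per-pixel polarization cues.

    pol_left, pol_right: [B, H, W] cues for the left and right view.
    Returns [B, H, W, W] with values in (0, 1]; entry (i, j) compares
    left pixel i with right pixel j on the same image row.
    """
    diff = pol_left.unsqueeze(-1) - pol_right.unsqueeze(-2)  # [B, H, W, W]
    return torch.exp(-diff.abs() / tau)

torch.manual_seed(0)
B, H, W, C = 2, 4, 6, 8
feature0 = torch.randn(B, H, W, C)  # left features
feature1 = torch.randn(B, H, W, C)  # right features
pol_left = torch.rand(B, H, W)
pol_right = pol_left.clone()        # identical cues: the diagonal weight is 1

cv = torch.einsum('...hic,...hjc->...hij', feature0, feature1)  # [B, H, W, W]
pol_weight = compute_pol_weight(pol_left, pol_right)
cv_pol = cv * pol_weight
print(cv.shape, pol_weight.shape)
```

One design point to keep in mind: raw correlation scores can be negative, so a multiplicative weight in (0, 1] pulls both positive and negative scores toward zero. Adding a log-weight as a bias before the normalization step is an alternative worth considering.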

Advantages:

  1. Polarization directly participates in core matching rather than serving as side info.
  2. Modulating before Sinkhorn influences the entire optimal transport.
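
The second advantage can be checked on a toy problem. The sketch below uses a simplified log-domain Sinkhorn with uniform marginals and no dustbin, which is an assumption for illustration and not S2M2's implementation. It suppresses a single entry of cv through pol_weight and measures how much the transport plan changes in a different row.

```python
import torch

def sinkhorn(scores, n_iters=50):
    """Simplified log-domain Sinkhorn: alternate row and column normalization."""
    log_p = scores
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)  # rows sum to 1
        log_p = log_p - torch.logsumexp(log_p, dim=-2, keepdim=True)  # columns sum to 1
    return log_p.exp()

W = 4
cv = 3.0 * torch.eye(W).reshape(1, 1, W, W)  # toy [B, H, W, W] with strong diagonal matches
pol_weight = torch.ones(1, 1, W, W)
pol_weight[0, 0, 0, 0] = 0.0                 # polarization suppresses the score of match (i=0, j=0)

plan_plain = sinkhorn(cv)
plan_pol = sinkhorn(cv * pol_weight)

# Only entry (0, 0) of cv was touched, yet row 1 of the plan changes as well.
row1_change = (plan_plain[0, 0, 1] - plan_pol[0, 0, 1]).abs().max()
print(float(row1_change))
```

Because the row and column constraints couple all entries, a local change to cv redistributes mass across the whole plan, which is the sense in which modulation before Sinkhorn has a global effect.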

3.2 Injection Point B: FeatureFusion (AGFL)

# Original: two-stream fusion
z_out = fusion(cat(z0,z1)) + w*z0 + (1-w)*z1

# Modification: three-stream fusion
z_out = fusion(cat(z0,z1,pol)) + w0*z0 + w1*z1 + w2*pol
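
A runnable sketch of the three-stream form follows. The class name ThreeStreamFusion, the 1x1 convolutions, and the softmax gate are assumptions; the source does not state how AGFL computes w. The softmax is used here because it generalizes the two-stream pair w and (1 - w) to three weights that sum to one at every pixel.

```python
import torch
import torch.nn as nn

class ThreeStreamFusion(nn.Module):
    """Hypothetical three-stream variant of a gated fusion block."""

    def __init__(self, dim):
        super().__init__()
        self.fusion = nn.Conv2d(3 * dim, dim, kernel_size=1)  # mixes all three streams
        self.gate = nn.Conv2d(3 * dim, 3, kernel_size=1)      # one logit per stream

    def forward(self, z0, z1, pol):
        stacked = torch.cat([z0, z1, pol], dim=1)     # [B, 3*dim, H, W]
        w = torch.softmax(self.gate(stacked), dim=1)  # [B, 3, H, W], sums to 1 per pixel
        w0, w1, w2 = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        z_out = self.fusion(stacked) + w0 * z0 + w1 * z1 + w2 * pol
        return z_out, w

torch.manual_seed(0)
block = ThreeStreamFusion(dim=16)
z0, z1, pol = (torch.randn(2, 16, 8, 8) for _ in range(3))
z_out, weights = block(z0, z1, pol)
print(z_out.shape, weights.shape)
```

The pol stream must already have the same channel count and spatial scale as z0 and z1 for the weighted sum to be valid.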

3.3 Injection Point A: CNNEncoder Input

# Original
self.conv0 = nn.Conv2d(3, 16, kernel_size=1)

# Modification: extend the number of input channels
self.conv0 = nn.Conv2d(3 + pol_channels, 16, kernel_size=1)
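
If the encoder starts from pretrained RGB weights, one common way to extend the input layer is to copy the RGB filters and zero-initialize the new polarization channels, so the extended layer initially reproduces the original output. The source does not prescribe this initialization. Treating pol_channels = 2 as the pair (I∥, I⊥) is likewise an assumption made for the sketch.

```python
import torch
import torch.nn as nn

pol_channels = 2  # assumption: one channel each for I_parallel and I_perpendicular

old_conv0 = nn.Conv2d(3, 16, kernel_size=1)                 # stands in for the pretrained layer
new_conv0 = nn.Conv2d(3 + pol_channels, 16, kernel_size=1)  # extended input layer

with torch.no_grad():
    new_conv0.weight.zero_()                  # polarization filters start at zero
    new_conv0.weight[:, :3] = old_conv0.weight  # reuse the RGB filters
    new_conv0.bias.copy_(old_conv0.bias)

torch.manual_seed(0)
rgb = torch.rand(1, 3, 32, 32)
i_par = torch.rand(1, 1, 32, 32)
i_perp = torch.rand(1, 1, 32, 32)
x = torch.cat([rgb, i_par, i_perp], dim=1)  # [B, 3 + pol_channels, H, W]

out_old = old_conv0(rgb)
out_new = new_conv0(x)
print(x.shape, torch.allclose(out_old, out_new, atol=1e-5))
```

During fine-tuning the zero-initialized filters receive gradients like any other weights, so the polarization channels are learned gradually instead of perturbing the pretrained features from the first step.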

4. Tensor Dimensions

| Injection Point | Injected Tensor | Shape / Description |
|---|---|---|
| A | pol_channels | Concatenated with RGB; input channels extended to 3 + pol_channels |
| A | conv0 input | [B, 3 + pol_channels, H, W] |
| B | Pol feature stream | Same scale as z0, z1; participates in three-stream fusion |
| C | feature0 / feature1 | MRT output features; einsum indices ...hic / ...hjc |
| C | cv | [B, ..., H, W, W] (along epipolar i, j dimensions) |
| C | pol_weight | [B, H, W, W], element-wise multiplied with cv |

5. Comparison of Injection Points and Design Decisions

| Injection Point | Position | Pol Role | Participates in Core Matching? |
|---|---|---|---|
| A | CNNEncoder input | Input channel extension; integrated from the first layer | Indirect (via feature extraction) |
| B | FeatureFusion (AGFL) | Third stream in three-stream fusion | Indirect (via cross-scale fusion) |
| C | Correlation Volume | Modulates cv, before Sinkhorn | Directly modulates core matching |

5.1 Design Decisions and Rationale

| Decision | Rationale |
|---|---|
| Prioritize injection point C | Polarization participates directly in core matching instead of acting as side information |
| Injection point C is before Sinkhorn | Modulating cv before OT influences the entire optimal transport, giving the deepest effect |
| Injection point B as the second choice | Three-stream fusion is a smaller change; cross-scale gating can use Pol selectively |
| Injection point A as the simplest option | Minimal change (only the conv0 input channels are modified); polarization is integrated from the first layer |
| Preserve a “direct” path | The S2M2 serial architecture is tightly coupled, so polarization injection keeps a bypass as a safety net against a single point of failure |
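
The "direct" path is not specified further in this blueprint. One possible reading, sketched here under that assumption, is a gate that blends the Pol-weighted correlation with the unmodified one. The function name modulate_with_bypass and the scalar gate are hypothetical.

```python
import torch

def modulate_with_bypass(cv, pol_weight, gate):
    """Blend Pol-weighted and direct correlation.

    gate = 0.0 keeps the direct path (cv unchanged),
    gate = 1.0 applies the full polarization weighting.
    """
    return cv * (gate * pol_weight + (1.0 - gate))

torch.manual_seed(0)
cv = torch.randn(1, 2, 5, 5)         # toy [B, H, W, W]
pol_weight = torch.rand(1, 2, 5, 5)

direct = modulate_with_bypass(cv, pol_weight, gate=0.0)
full = modulate_with_bypass(cv, pol_weight, gate=1.0)
half = modulate_with_bypass(cv, pol_weight, gate=0.5)
print(torch.allclose(direct, cv), torch.allclose(full, cv * pol_weight))
```

With such a gate, a missing or corrupted polarization input can be handled by setting gate to 0, which falls back to the original matching behaviour. The gate could also be made a learnable parameter.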

5.2 Additional Advantages of S2M2 (auxiliary for transparent object detection)

  1. Occlusion output: can be used to detect boundaries of transparent objects (such as glass).
  2. Confidence output: can be used to identify uncertain regions (transparent regions typically fall into this category).
  3. Transformer architecture: cross-attention is naturally suited to handling appearance differences (I∥ vs I⊥).
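
As a rough illustration of the first two items, the occlusion and confidence outputs could be combined into a mask of candidate transparent regions. The sketch assumes both outputs are per-pixel maps in [0, 1] of shape [B, 1, H, W], with high occlusion values meaning "occluded". The thresholds and the function name transparent_candidates are hypothetical.

```python
import torch

def transparent_candidates(occlusion, confidence, occ_thresh=0.5, conf_thresh=0.5):
    """Flag pixels that look like boundaries (high occlusion) or are uncertain (low confidence)."""
    boundary = occlusion > occ_thresh
    uncertain = confidence < conf_thresh
    return boundary | uncertain

occlusion = torch.tensor([[0.9, 0.1], [0.2, 0.1]]).reshape(1, 1, 2, 2)
confidence = torch.tensor([[0.8, 0.3], [0.9, 0.9]]).reshape(1, 1, 2, 2)

mask = transparent_candidates(occlusion, confidence)
print(mask.flatten().tolist())
```

Such a mask only marks candidates. Whether a flagged region is actually transparent would still have to be decided with the polarization cue itself.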

6. Highlights

  • Three injection points cover the whole pipeline: A (input layer), B (cross-scale fusion), and C (matching core), ordered from the smallest to the largest change and selectable as needed.
  • Injection point C brings polarization into core matching: by modulating cv element-wise before Sinkhorn, the polarization signal influences the entire optimal transport rather than serving merely as a post-hoc correction.
  • OT’s built-in assignment mechanism accommodates polarization information: the all-pairs correlation compares features in a high-dimensional space, and OT inherently provides confidence and assignment, which makes it well suited to polarization injection.
  • Multi-output auxiliary detection: S2M2 outputs disparity, occlusion, and confidence at once; the latter two can help detect the boundaries and uncertain regions of transparent objects.
  • A direct safety net is preserved: the serial architecture is tightly coupled, so a direct channel that bypasses the polarization path is kept, which prevents a single point of failure from dragging down the whole pipeline.
