1. Design Goals
1.1 Problem: Real 6ch Input Destroys the Polarization Phase Relationship
When the polarization pair [I∥, I⊥] is fed in as 6-channel real input, the first convolution linearly mixes the 6 channels, and the relative relationship (phase) between I∥ and I⊥ is no longer structurally preserved. The essence of the polarization signal is exactly the relative relationship between I∥ and I⊥, so “destroying the phase at the first layer” loses polarization information at the most critical location.
Design goal: use a Complex-Valued Neural Network to encode the polarization pair as the complex number z = I∥ + i·I⊥, so that the phase relationship is an intrinsic property of the complex number and is structurally preserved inside the encoder until it is converted back to the real domain.
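The encoding can be illustrated for a single pixel with plain Python complex numbers. This is a minimal sketch, not the encoder itself: `encode_pair` and `phase_deg` are hypothetical helper names, and the intensity values are made up for illustration. It shows that the phase depends only on the ratio between I∥ and I⊥, so a common brightness change leaves it untouched while the magnitude scales.

```python
import cmath
import math

def encode_pair(i_par: float, i_perp: float) -> complex:
    """Encode one pixel's polarization pair as z = I_par + i * I_perp."""
    return complex(i_par, i_perp)

def phase_deg(z: complex) -> float:
    """Phase of z in degrees, i.e. atan2(I_perp, I_par)."""
    return math.degrees(cmath.phase(z))

z = encode_pair(0.8, 0.4)          # hypothetical pixel
z_bright = encode_pair(1.6, 0.8)   # same pixel with both intensities doubled

print(phase_deg(z), phase_deg(z_bright))  # the two phases agree
print(abs(z), abs(z_bright))              # the magnitude doubles
```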
1.2 Design Decision: Modify Only the CNNEncoder
Rationale:
- The physical meaning of polarization is most explicit at the raw input.
- Once inside the abstract feature space, the phase relationship no longer matters.
- There is no need to rewrite the whole S2M2 (Transformer, Refiner, etc.).
2. Polarization Phase Distribution Analysis
To understand the meaning of the complex encoding, we first analyze the phase distribution of the rendered data (excluding non-polarized scenes, sampled from 100 examples).
2.1 Key Finding 1: Phase Range
Phase = atan2(I⊥, I∥)
Range: 0° ~ 90° (full first quadrant)
Pixels with I⊥ > I∥: 39.35% (average)
Implication: we cannot assume I∥ is always larger than I⊥; the encoder must handle the full quadrant.
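The two statistics above (the share of pixels with I⊥ > I∥ and the phase histogram) could be computed along these lines. This is a sketch under assumptions: `phase_stats` is a hypothetical helper, intensities are assumed non-negative, and the four pixel pairs are toy values rather than the rendered data.

```python
import math

def phase_stats(pairs, bin_deg=5):
    """Return (fraction of pixels with I_perp > I_par, phase histogram as fractions).

    `pairs` is a list of (I_par, I_perp) tuples with non-negative intensities,
    so every phase falls in [0, 90] degrees.
    """
    n_bins = 90 // bin_deg
    counts = [0] * n_bins
    perp_larger = 0
    for i_par, i_perp in pairs:
        phase = math.degrees(math.atan2(i_perp, i_par))
        idx = min(int(phase // bin_deg), n_bins - 1)  # exactly 90 deg goes in the last bin
        counts[idx] += 1
        if i_perp > i_par:
            perp_larger += 1
    total = len(pairs)
    return perp_larger / total, [c / total for c in counts]

# Toy pixels (not real data): three with I_par > I_perp, one with I_perp > I_par.
pairs = [(1.0, 0.5), (1.0, 0.9), (0.5, 1.0), (1.0, 0.1)]
frac, hist = phase_stats(pairs)
print(frac)     # share of pixels with I_perp > I_par
print(hist[5])  # share of pixels in the 25-30 degree bin
```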
2.2 Key Finding 2: Glass vs Background Phase Distribution
Glass and background are separated using the glass mask:
| | Glass | Background |
|---|---|---|
| Peak | 25°-30° | 45°-50° |
| Mean | 37.51° | 43.82° |
Phase distribution comparison:
| Phase | Glass | BG | Note |
|---|---|---|---|
| 0°-5° | 5.9% | 4.9% | |
| 25°-30° | 16.0% | 5.0% | Glass clearly higher |
| 30°-35° | 15.9% | 6.1% | Glass clearly higher |
| 40°-45° | 13.2% | 22.6% | BG clearly higher |
| 45°-50° | 10.6% | 23.4% | BG clearly higher |
| 85°-90° | 4.3% | 5.2% | |
2.3 Key Finding 3: Physical Interpretation
Scene lighting design:
- Polarized light source (0°) -> illuminates glass -> specular reflection -> I∥ > I⊥ (roughly 1.4x to 2.1x, since tan(phase) = I⊥/I∥) -> phase 25°-35°
- Non-polarized background light -> diffuse reflection -> I∥ ≈ I⊥ -> phase ~45°
Conclusion:
- 45° is the baseline (non-polarized background).
- Shifting toward 0° = glass polarization signal.
- The 85°-90° region appears in both Glass and BG; likely noise or edge cases.
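Because phase = atan2(I⊥, I∥), each phase value corresponds to a fixed intensity ratio I⊥/I∥ = tan(phase). The short calculation below converts the phase bands discussed above into ratios; `perp_to_par_ratio` is a hypothetical helper name.

```python
import math

def perp_to_par_ratio(phase_deg: float) -> float:
    """I_perp / I_par implied by phase = atan2(I_perp, I_par)."""
    return math.tan(math.radians(phase_deg))

for phase in (25, 30, 35, 45):
    print(f"{phase} deg -> I_perp / I_par = {perp_to_par_ratio(phase):.2f}")
# 25 deg -> I_perp / I_par = 0.47
# 30 deg -> I_perp / I_par = 0.58
# 35 deg -> I_perp / I_par = 0.70
# 45 deg -> I_perp / I_par = 1.00
```

In other words, the 25°-35° glass peak corresponds to I⊥ being about 0.47 to 0.70 of I∥, and the 45° baseline to equal intensities.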
3. Architecture: ComplexCNNEncoder End-to-End Flow
4. Components and Modules
4.1 ComplexConv2d (Lite Version)
Complex convolution: the real part and the imaginary part each use one nn.Conv2d, combined according to the complex-multiplication rule.
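    import torch
    import torch.nn as nn

    class ComplexConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
            super().__init__()  # must run before submodules are assigned
            # One real-valued conv each for the real and imaginary part of the kernel.
            self.conv_real = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
            self.conv_imag = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)

        def forward(self, x):  # x is a complex tensor
            # Complex-multiplication rule, expanded into four real convolutions.
            real = self.conv_real(x.real) - self.conv_imag(x.imag)
            imag = self.conv_real(x.imag) + self.conv_imag(x.real)
            return torch.complex(real, imag)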
The Lite version uses 2 Conv2d layers (2x the parameters of a single real Conv2d with the same channel counts), corresponding to complex multiplication (a+bi)(c+di) = (ac-bd) + (ad+bc)i.
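The combination rule in forward follows from the linearity of convolution. Writing the complex kernel as W = W_r + i W_i (the weights of conv_real and conv_imag; the symbols W_r, W_i, x, y are notation introduced here) and the input as z = x + i y:

```latex
\begin{aligned}
W * z &= (W_r + i\,W_i) * (x + i\,y) \\
      &= (W_r * x - W_i * y) + i\,(W_r * y + W_i * x)
\end{aligned}
```

The first bracket is the `real` line of forward and the second is the `imag` line. Since each Conv2d carries its own bias (b_r and b_i), the snippet as written adds an effective complex bias of (b_r - b_i) + i (b_r + b_i).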
4.2 ComplexGELU
Activation is applied only on the magnitude, keeping the phase unchanged.
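    import torch
    import torch.nn.functional as F

    def complex_gelu(x):
        magnitude = x.abs()
        phase = x.angle()
        # GELU acts on the magnitude only; the original phase is reattached unchanged.
        return F.gelu(magnitude) * torch.exp(1j * phase)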
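The difference between a magnitude-only activation and a split activation on the real and imaginary parts can be checked on a single complex value. This is a scalar sketch in plain Python rather than the tensor version: `gelu`, `magnitude_gelu`, and `split_gelu` are hypothetical names, and the input value is made up.

```python
import cmath
import math

def gelu(x: float) -> float:
    """Exact (erf-based) GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def magnitude_gelu(z: complex) -> complex:
    """GELU on |z| only; the phase of z is reused unchanged."""
    return cmath.rect(gelu(abs(z)), cmath.phase(z))

def split_gelu(z: complex) -> complex:
    """GELU applied to the real and imaginary parts independently."""
    return complex(gelu(z.real), gelu(z.imag))

z = complex(0.8, 0.4)  # hypothetical feature value
print(math.degrees(cmath.phase(z)))
print(math.degrees(cmath.phase(magnitude_gelu(z))))  # same phase as z
print(math.degrees(cmath.phase(split_gelu(z))))      # phase has shifted
```

The split variant scales the two parts by different factors, which is what moves the phase.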
4.3 complex_to_real (Key Design)
Converts complex features back to the real domain, outputting a concatenation of 4 kinds of features.
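    import math
    import torch

    def complex_to_real(z):
        magnitude = z.abs()                             # total intensity
        phase_dev = torch.abs(z.angle() - math.pi / 4)  # deviation from 45° = polarization signal
        # Concatenate along the channel dimension: C complex channels -> 4C real channels.
        return torch.cat([magnitude, phase_dev, z.real, z.imag], dim=1)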
The 4 kinds of features:
- magnitude = |z|: total intensity.
- phase_dev = |∠z - 45°|: deviation from 45°, i.e. the strength of the polarization signal.
- z.real: real part.
- z.imag: imaginary part.
5. Tensor Dimensions
| Stage | Tensor | Shape | Type |
|---|---|---|---|
| Input | [I∥, I⊥] | (B, 6, H, W) | real |
| Real -> Complex | z = I∥ + i·I⊥ | (B, 3, H, W) | complex |
| ComplexConv0 | 3 -> 32 ch, k=3 | (B, 32, H, W) | complex |
| ComplexConv1 | 32 -> 64 ch, k=3, stride=2 | (B, 64, H/2, W/2) | complex |
| ComplexConv2 | 64 -> 64 ch, k=3 | (B, 64, H/2, W/2) | complex |
| Complex -> Real | concat(magnitude, phase_dev, real, imag) | (B, 256, H/2, W/2) | real |
| conv1x1 | 256 -> 128 | (B, 128, H/2, W/2) | real |
| RealConv | 128 -> 128, stride=2 | (B, 128, H/4, W/4) | real |
| Output | to Transformer (original S2M2) | (B, 128, H/4, W/4) | real |
Note: at the Complex -> Real stage, 4 kinds of features are concatenated, so 64ch complex features become 4 × 64 = 256 channels of real features.
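The shapes in the table can be traced with the standard convolution output-size formula. The sketch below is an assumption-laden check, not the encoder: `conv_out` and `encoder_shapes` are hypothetical helpers, and padding=1 for every 3x3 convolution as well as kernel size 3 for RealConv are assumptions, since the table does not state them.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_shapes(h: int, w: int):
    """Trace (stage, channels, H, W) through the stages listed in the table.

    Assumed: padding=1 on all 3x3 convs and kernel=3 for RealConv.
    """
    complex_stages = [
        ("ComplexConv0", 32, 3, 1),  # (name, out channels, kernel, stride)
        ("ComplexConv1", 64, 3, 2),
        ("ComplexConv2", 64, 3, 1),
    ]
    shapes = [("Real -> Complex", 3, h, w)]
    for name, ch, k, s in complex_stages:
        h, w = conv_out(h, k, s, 1), conv_out(w, k, s, 1)
        shapes.append((name, ch, h, w))
    shapes.append(("Complex -> Real", 4 * 64, h, w))  # 4 feature kinds x 64 channels
    shapes.append(("conv1x1", 128, h, w))
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes.append(("RealConv", 128, h, w))
    return shapes

for stage in encoder_shapes(64, 96):
    print(stage)
```

For a 64 x 96 input this ends at 128 channels of size 16 x 24, i.e. H/4 x W/4.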
6. Polarization Injection Points
Injection point: CNNEncoder input layer + the entire ComplexCNNEncoder.
The polarization pair is encoded as the complex number z = I∥ + i·I⊥ at the Real -> Complex step and propagates as complex throughout the ComplexCNNEncoder (ComplexConv0/1/2). The polarization phase relationship is preserved as an intrinsic property of the complex number until complex_to_real converts it back to the real domain. Downstream of the injection point is the original S2M2 (Transformer and Refiner unchanged).
Key: the phase (atan2(I⊥, I∥)) carries the relative relationship between I∥ and I⊥. The phase_dev = |∠z - 45°| produced by complex_to_real directly hands “the degree of deviation from the 45° baseline” to the downstream network as a glass-fingerprint feature.
6.1 Roles of Stage 1 and Stage 2
- Stage 1 uses a duplicated [RGB, RGB] input: the phase is constantly 45°, so the model learns the baseline “no polarization = normal background” here.
- Stage 2 uses real polarization data: the phase varies, so the model learns “deviation from 45° = glass present”.
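The Stage 1 baseline can be verified at the input encoding: a duplicated channel value c gives z = c + i·c, whose phase is 45° for any positive c. The sketch below checks only the input encoding, not the features after the complex convolutions; `phase_dev` here is a hypothetical scalar helper and the pixel values are made up.

```python
import cmath
import math

def phase_dev(i_par: float, i_perp: float) -> float:
    """|angle(z) - 45 deg| in radians for z = I_par + i * I_perp."""
    return abs(cmath.phase(complex(i_par, i_perp)) - math.pi / 4)

rgb = (0.2, 0.5, 0.9)                     # one hypothetical Stage 1 pixel
stage1 = [phase_dev(c, c) for c in rgb]   # duplicated [RGB, RGB] input
stage2 = phase_dev(0.8, 0.4)              # hypothetical polarized pixel

print(stage1)  # all (numerically) zero: no deviation from the baseline
print(stage2)  # clearly non-zero deviation
```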
7. Design Decisions and Rationale
7.1 Core Design Decision Table
| Question | Decision | Rationale |
|---|---|---|
| ComplexConv2d version | Lite (2x parameters) | Sufficient to capture cross-polarization interaction; avoids overfitting |
| ComplexGELU style | Preserve phase | Phase carries the I∥/I⊥ relative relationship; destroying phase = losing polarization information |
| Number of C->R features | All 4 | phase_dev is unstable in low-magnitude regions; raw real/imag compensate |
| ComplexBatchNorm | Not added | BN normalizes away the I∥/I⊥ difference, which is exactly the polarization signal |
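The instability of phase_dev at low magnitude, which motivates keeping the raw real and imaginary parts, can be seen by adding the same small perturbation to a bright and a dark value with identical phase. The helper name `phase_shift_deg` and all numbers are illustrative assumptions.

```python
import cmath
import math

def phase_shift_deg(z: complex, noise: complex) -> float:
    """Change in phase (degrees) caused by adding `noise` to z."""
    return abs(math.degrees(cmath.phase(z + noise) - cmath.phase(z)))

noise = complex(0.0, 0.01)    # same small perturbation in both cases
bright = complex(0.8, 0.4)
dark = complex(0.008, 0.004)  # same phase as `bright`, 100x lower magnitude

print(phase_shift_deg(bright, noise))  # well under 1 degree
print(phase_shift_deg(dark, noise))    # tens of degrees
```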
7.2 In-Depth Analysis of BN
BN forces each channel’s mean -> 0 and variance -> 1:
- In glass regions, I∥ > I⊥ (phase below 45°), and this ratio is the polarization signal.
- After BN standardization, this difference is washed out.
- ComplexBN’s whitening is even worse and would destroy phase clustering.
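The washing-out effect can be reproduced with per-channel standardization on toy values. This is a simplified stand-in for BatchNorm (no learnable affine, no eps, statistics over four hypothetical pixels), with `standardize` as a hypothetical helper.

```python
import math
import statistics

def standardize(values):
    """Per-channel standardization to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

i_par = [0.70, 0.80, 0.90, 0.80]    # hypothetical glass pixels
i_perp = [0.35, 0.40, 0.45, 0.40]   # exactly half of i_par at every pixel

par_n = standardize(i_par)
perp_n = standardize(i_perp)
print(par_n)
print(perp_n)  # matches par_n: the 2:1 ratio is no longer visible
```

After standardization the two channels coincide, exactly as they would for a non-polarized pair with I∥ = I⊥.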
Alternatives (if training is unstable):
- Magnitude normalization: z / (|z|.mean() + eps).
- Use GroupNorm (with few groups) in the real domain.
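Magnitude normalization divides every element by one shared positive real scale, so phases are unaffected. A list-based sketch of z / (|z|.mean() + eps) in plain Python follows; `magnitude_normalize` is a hypothetical helper, and which dimensions the mean is taken over in the tensor version is left open by the document.

```python
import cmath

def magnitude_normalize(zs, eps=1e-6):
    """z / (mean(|z|) + eps): one shared positive real scale for all elements."""
    scale = sum(abs(z) for z in zs) / len(zs) + eps
    return [z / scale for z in zs]

zs = [complex(0.8, 0.4), complex(0.1, 0.1), complex(0.2, 0.6)]  # toy values
normed = magnitude_normalize(zs)

for z, n in zip(zs, normed):
    print(cmath.phase(z), cmath.phase(n))  # phase is unchanged per element
print(sum(abs(n) for n in normed) / len(normed))  # mean magnitude is close to 1
```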
7.3 Other Design Decisions
| Decision | Rationale |
|---|---|
| Modify only the CNNEncoder | The physical meaning of polarization is clearest at the raw input; phase no longer matters in abstract feature space; no need to rewrite the whole S2M2 |
| Encode as the complex number z = I∥ + i·I⊥ | 6ch real input destroys the polarization phase relationship at the first Conv; complex makes phase an intrinsic property |
| Encoder handles the full quadrant | Pixels with I⊥ > I∥ make up about 39%; cannot assume I∥ is always larger than I⊥ |
| Progressive design | Modify only the Encoder first; consider extending to a fully complex network only after validation |
8. Highlights
- Phase becomes an intrinsic property: encoding the polarization pair as the complex number z = I∥ + i·I⊥ structurally preserves the relative relationship (phase) between I∥ and I⊥ in the encoder so that it is not linearly mixed away by the first convolution.
- Phase deviation as a glass fingerprint: complex_to_real outputs phase_dev = |∠z - 45°|, directly handing “the degree of deviation from the 45° non-polarization baseline” to the downstream network as a glass feature.
- Data analysis first: before designing the architecture, the phase distribution is analyzed; with ~39% of pixels having I⊥ > I∥, the encoder is designed to handle the full quadrant, avoiding wrong assumptions.
- Phase-preserving activation: ComplexGELU activates only the magnitude and keeps the phase unchanged, avoiding the phase chaos that a split GELU on real/imag separately would cause.
- Deliberately no BatchNorm: BN would normalize away the intensity difference between I∥ and I⊥, which is exactly the polarization signal, so the encoder avoids BN throughout.
- Minimal progressive modification: only the CNNEncoder is replaced; the downstream Transformer and Refiner are reused from the original S2M2, and extension to a fully complex network is considered only after validation.