1. Design Goals
This architecture is a concrete implementation of Two-Pass polarization stereo matching, with two design goals:
- Use RGB polarization input instead of grayscale input to provide richer polarization color information.
- Use a Two-Pass architecture so that Pass 2 performs pol-aware refinement on Pass 1’s geometric result.
To ensure Pass 1 itself does not have the Oracle-Real Gap, Pass 1 uses Pol Volume to inject polarization directly into the correlation space (the V2-C polarization guidance mechanism). On the training side, PIW (pol_inject_warmup) and Dual Loss are introduced so that Pass 1 stabilizes geometric search first, after which Pass 2’s polarization injection is gradually enabled.
2. Architecture (with Data Flow)
Data flow overview
| Stage | Task | Iterations | Polarization mechanism |
|---|---|---|---|
| Pass 1 | Geometric search (V2-C) | 24 GRU iters | Pol Volume injected into correlation space; per-iteration scheduling alpha = i/(iters-1) |
| Between | warp + normalized contrast + PolEncoder | — | Produces pol_feat |
| Pass 2 | Pol-aware context refinement | 6 GRU iters (GRU₂ ≠ GRU₁) | Additive context injection, controlled by PIW |
3. Components and Modules
3.1 Pass 1: V2-C (gradient gating + scheduled residual)
- 24 GRU iterations.
- Polarization is injected directly into the correlation space via Pol Volume, so Pass 1 itself has no Oracle-Real Gap.
- Per-iteration scheduling:
alpha = i / (iters - 1) for iteration index i = 0, ..., iters - 1, so alpha ramps linearly from 0 to 1 within each forward pass. The schedule is active from the first training step and is unaffected by PIW.
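The per-iteration schedule can be sketched as follows. This is a minimal illustration of the formula only; the function name `pass1_alpha` and the handling of `iters < 2` are assumptions, and how alpha scales the scheduled residual inside V2-C is not shown because the text does not specify it.

```python
def pass1_alpha(i: int, iters: int = 24) -> float:
    """Per-iteration polarization weight for Pass 1: alpha = i / (iters - 1)."""
    if iters < 2:
        # Assumption: the ramp is undefined for a single iteration, so use full weight.
        return 1.0
    return i / (iters - 1)


# One alpha value per GRU iteration of Pass 1 (24 iterations).
alphas = [pass1_alpha(i) for i in range(24)]
```

The first iteration gets no polarization residual (alpha = 0) and the last gets the full residual (alpha = 1), independent of the training step.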
3.2 Between: polarization extraction chain
Data flow: disp₁ → warp → normalized contrast → PolEncoder → pol_feat.
- Warp right to the left viewpoint using disp₁.
- Compute normalized contrast.
- Encode it via PolEncoder into pol_feat.
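The first two steps of this chain can be sketched on a single scanline in plain Python. Two things here are assumptions rather than facts from this document: the sign convention (a left pixel at x matches the right pixel at x - d) and the form of the normalized contrast, (L - R) / (L + R + eps). The helper names `warp_row` and `normalized_contrast` are hypothetical, and the real pipeline operates on (B, 3, H, W) tensors.

```python
import math


def warp_row(right_row, disp_row):
    """Warp a right scanline to the left view by sampling at x - d.

    Assumed convention: positive disparity, left pixel x matches right pixel x - d.
    Uses linear interpolation and clamps samples to the row borders.
    """
    w = len(right_row)
    out = []
    for x, d in enumerate(disp_row):
        src = min(max(x - d, 0.0), w - 1.0)  # clamp the sample position
        x0 = int(math.floor(src))
        x1 = min(x0 + 1, w - 1)
        t = src - x0
        out.append((1.0 - t) * right_row[x0] + t * right_row[x1])
    return out


def normalized_contrast(left_row, warped_row, eps=1e-6):
    """Assumed definition: (L - R) / (L + R + eps), element-wise."""
    return [(l - r) / (l + r + eps) for l, r in zip(left_row, warped_row)]


right = [0.0, 1.0, 2.0, 3.0, 4.0]
left = [0.0, 0.0, 1.0, 2.0, 3.0]   # the right row shifted by one pixel
disp1 = [1.0] * 5
right_warped = warp_row(right, disp1)
pol_diff = normalized_contrast(left, right_warped)
```

With a correct disparity the warped right row matches the left row and the contrast is zero; any remaining contrast is the signal passed on to PolEncoder.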
3.3 Pass 2: Pol-aware Context Refinement
- 6 GRU iterations, using GRU₂ (weights independent from GRU₁).
- Additive context injection:
  - context₂ = cnet(left) + PIW * Wc(pol_feat)
  - hidden₂ = cnet_h(left) + PIW * Wh(pol_feat)
3.4 PIW (pol_inject_warmup)
- Linearly ramps from 0 to 1 over 5000 steps.
- Only affects Pass 2’s context injection (does not affect Pass 1’s alpha scheduling).
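A minimal sketch of the warmup coefficient and the additive injection it scales, using flat Python lists in place of feature tensors. The function names `piw` and `inject` are hypothetical, and it is an assumption that the ramp reaches exactly 1 at step 5000 and is clamped afterwards.

```python
def piw(step: int, warmup_steps: int = 5000) -> float:
    """Linear ramp from 0 to 1 over warmup_steps, clamped to [0, 1]."""
    if warmup_steps <= 0:
        # Assumption: a non-positive warmup length disables the ramp.
        return 1.0
    return min(max(step / warmup_steps, 0.0), 1.0)


def inject(base, pol_proj, weight):
    """Additive injection: base + weight * pol_proj, element-wise.

    base stands in for cnet(left) or cnet_h(left), pol_proj for
    Wc(pol_feat) or Wh(pol_feat), both flattened to lists of floats.
    """
    return [b + weight * p for b, p in zip(base, pol_proj)]


# Halfway through warmup, half of the projected polarization feature is added.
context2 = inject([1.0, 2.0], [4.0, -2.0], piw(2500))
```

At step 0 the injection term vanishes, so Pass 2 starts from the plain context features and the polarization contribution grows as training proceeds.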
3.5 Dual Loss
total = 0.3 × L₁(Pass1) + 1.0 × L₂(Pass2).
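As a worked instance of the weighting, the sketch below combines two scalar losses. The function name `dual_loss` is hypothetical, and the per-pass losses are assumed to have been reduced to scalars already.

```python
def dual_loss(loss_pass1: float, loss_pass2: float,
              w1: float = 0.3, w2: float = 1.0) -> float:
    """Weighted sum of the two per-pass losses: w1 * L1 + w2 * L2."""
    return w1 * loss_pass1 + w2 * loss_pass2


# Example: L1 = 2.0 and L2 = 1.0 give 0.3 * 2.0 + 1.0 * 1.0 = 1.6.
total = dual_loss(2.0, 1.0)
```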
4. Tensor Dimensions
| Tensor | Dimensions / Setting | Description |
|---|---|---|
| left / right | (B, 3, H, W) | RGB polarization image pair, B = 8 |
| disp₁ | (B, 1, H, W) | Pass 1 disparity output |
| right_warped | (B, 3, H, W) | Right warped with disp₁ |
| pol_diff | (B, 3, H, W) | Normalized contrast |
| pol_feat | (B, 32, H, W) | PolEncoder output, pol_encoder_channels = 32 |
| Wc(pol_feat) / Wh(pol_feat) | (B, 128, ·, ·) | Projected to context / hidden dimensions |
| context₂ / hidden₂ | (B, 128, ·, ·) | After additive injection |
| disp₂ | (B, 1, H, W) | Pass 2 disparity output |
5. Hyperparameters
| Hyperparameter | Value | Architectural meaning |
|---|---|---|
| Pass 1 iterations | 24 | Pass 1 GRU iterations |
| Pass 2 iterations | 6 | Pass 2 GRU iterations |
| L₁ / L₂ weights | 0.3 / 1.0 | Dual Loss weights |
| Pol pyramid levels / lookup radius | 4 / 4 | Pol Volume setting |
| PIW (pol_inject_warmup) | 5000 | Number of steps to ramp the Pass 2 injection coefficient from 0 to 1 |
| PolEncoder channels | 32 | Number of PolEncoder output channels |
| Pretraining starting point | Scene Flow pretrained | Pretrained weights corresponding to RGB input |
6. Design Decisions and Rationale
| Design | Choice | Rationale |
|---|---|---|
| Polarization input color | RGB | Provides richer polarization color information than grayscale |
| Pass 1 polarization injection | Pol Volume @ correlation space (V2-C) | Pass 1 itself has no Oracle-Real Gap |
| Pass 1 per-iteration scheduling | alpha = i/(iters-1), always fully on | Decoupled from PIW; Pass 1 polarization is not suppressed by warmup |
| Pass 2 injection form | Additive injection (context + hidden) | Additively injects polarization features at the Context Encoder output |
| PIW warmup | 0→1 over 5000 steps | Lets Pass 1 stabilize geometry search first, then gradually enables Pass 2 polarization |
| GRU₂ ≠ GRU₁ | Independent weights | Pass 1 learns geometry search; Pass 2 learns material cues |
| Dual Loss | 0.3 × L₁ + 1.0 × L₂ | disp₁.detach() cuts the gradient from Pass 2, so Pass 1 needs its own loss |
| Pretraining starting point | Scene Flow pretrained | Pretrained weights matching RGB input |
7. Polarization Injection Points
| Injection point | Location | Form | Control |
|---|---|---|---|
| Pass 1 — Pol Volume | correlation space | V2-C: gradient gating + scheduled residual; per-iteration alpha = i/(iters-1) | Always fully on, unaffected by PIW |
| Pass 2 — context₂ | Context Encoder output | additive: cnet(left) + PIW * Wc(pol_feat) | Controlled by PIW (0→1 over 5000 steps) |
| Pass 2 — hidden₂ | GRU initial hidden | additive: cnet_h(left) + PIW * Wh(pol_feat) | Controlled by PIW (one-shot) |
8. Implementation Notes
When implementing the Two-Pass code, the following three correctness issues require special attention:
| Issue | Description | Impact | Fix |
|---|---|---|---|
| Missing top-level metrics | forward_two_pass lacks top-level metrics | KeyError crash | Add metrics2 as top-level |
| warp sign convention | warp_with_disparity uses the opposite disparity sign convention from disp1 | Wrong warp direction | Negate disp1 before warping |
| loss overwritten | metrics['loss'] is overwritten by the metrics2 loop | The logged loss shows Pass 2 instead of total | Restore total_loss.item() after the loop |
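The first and third fixes can be illustrated together with a small metrics-merging sketch. The function name `merge_metrics`, the `pass1/` and `pass2/` key prefixes, and the use of plain floats (rather than tensors with `.item()`) are assumptions made for illustration; they are not the actual `forward_two_pass` code.

```python
def merge_metrics(metrics1: dict, metrics2: dict, total_loss: float) -> dict:
    """Merge per-pass metrics into one flat dict for logging."""
    metrics = {"loss": total_loss}
    for key, value in metrics1.items():
        metrics[f"pass1/{key}"] = value
    for key, value in metrics2.items():
        metrics[key] = value             # top-level copy; overwrites metrics["loss"]
        metrics[f"pass2/{key}"] = value
    metrics["loss"] = total_loss         # restore the total after the loop
    return metrics


merged = merge_metrics({"loss": 2.0, "epe": 1.5}, {"loss": 1.0, "epe": 0.8}, 1.6)
```

Without the final assignment, `merged["loss"]` would report the Pass 2 loss (1.0) instead of the total (1.6).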
In addition, the additive context injection of Pass 2 has three structural weaknesses to be aware of:
- The triple zero-init (PolEncoder’s last layer + pol_ctx_proj + pol_hid_proj) makes the polarization path’s gradient too weak.
- One-shot additive injection is easily overwhelmed by the magnitude of cnet(left).
- When Pass 1 already performs well and Pass 2’s residual is small, the gradient back-propagated to the polarization path becomes weak.
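The zero-init weakness can be seen in a scalar toy model, sketched below under strong assumptions: each module is reduced to a single bias-free weight, with pol_feat = w_enc * x and an injected term PIW * w_proj * pol_feat. The function name and this simplification are hypothetical; the real layers may have biases or non-zero earlier layers, so actual gradients need not be exactly zero.

```python
def pol_path_grads(x, w_enc, w_proj, piw, upstream_grad):
    """Gradients of the loss w.r.t. w_enc and w_proj in the scalar toy model.

    Forward: pol_feat = w_enc * x; injected = piw * w_proj * pol_feat.
    upstream_grad is dLoss/dInjected.
    """
    pol_feat = w_enc * x
    grad_w_proj = upstream_grad * piw * pol_feat      # depends on w_enc
    grad_w_enc = upstream_grad * piw * w_proj * x     # depends on w_proj
    return grad_w_enc, grad_w_proj


# Both weights zero-initialized: each gradient is multiplied by the other zero weight.
both_zero = pol_path_grads(x=2.0, w_enc=0.0, w_proj=0.0, piw=1.0, upstream_grad=3.0)

# Only the encoder's last layer zero-initialized: the encoder receives a gradient.
enc_zero_only = pol_path_grads(x=2.0, w_enc=0.0, w_proj=0.5, piw=1.0, upstream_grad=3.0)
```

In this toy model each weight's gradient is proportional to the other weight, and both are further scaled by PIW and by the upstream gradient, which is consistent with the weak-gradient concerns listed above.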
9. Highlights
- Pass 1 has built-in polarization guidance: polarization is injected directly into the correlation space via Pol Volume, so Pass 1 itself has no Oracle-Real Gap and does not depend on subsequent alignment.
- Decoupled two-level warmup: Pass 1’s per-iteration alpha scheduling is always fully on, while Pass 2’s polarization injection is controlled by PIW and ramps from 0 to 1; the two are decoupled so that geometry stabilizes first before polarization is enabled.
- Dual Loss safeguards Pass 1: disp₁.detach() cuts the gradient flowing back from Pass 2, and Dual Loss provides Pass 1 with independent supervision, ensuring warp quality and stable inputs for polarization extraction.
- RGB polarization input: RGB replaces grayscale, preserving the color information in the polarization signal.
- Division of labor between geometry and material: GRU₁ and GRU₂ have independent weights, focusing respectively on geometry search and material-cue refinement.