Two-Pass RGB Architecture — Po-Ting Lin (林柏廷)

1. Design Goals

This architecture is a concrete implementation of Two-Pass polarization stereo matching, with two design goals:

Use RGB polarization input instead of grayscale input to provide richer polarization color information.
Use a Two-Pass architecture so that Pass 2 performs pol-aware refinement on Pass 1’s geometric result.

To keep Pass 1 itself free of the Oracle-Real Gap, Pass 1 uses Pol Volume to inject polarization directly into the correlation space (the V2-C polarization guidance mechanism). On the training side, PIW warmup and Dual Loss let Pass 1 stabilize its geometric search first, after which the polarization injection of Pass 2 is enabled gradually.

2. Architecture (with Data Flow)

Figure: Two-Pass RGB data flow

Data flow overview

Stage	Task	Iterations	Polarization mechanism
Pass 1	Geometric search (V2-C)	24 GRU iters	Pol Volume injected into correlation space; per-iteration scheduling `alpha = i/(iters-1)`
Between	warp + normalized contrast + PolEncoder	—	Produces pol_feat
Pass 2	Pol-aware context refinement	6 GRU iters (GRU₂ ≠ GRU₁)	Additive context injection, controlled by PIW

3. Components and Modules

3.1 Pass 1: V2-C (gradient gating + scheduled residual)

24 GRU iterations.
Polarization is injected directly into the correlation space via Pol Volume, so Pass 1 itself has no Oracle-Real Gap.
Per-iteration scheduling: alpha = i / (iters - 1) for i = 0, ..., iters - 1, a linear ramp from 0 to 1 across the GRU iterations of each forward pass. The schedule is always active and is unaffected by PIW.
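The per-iteration schedule can be sketched in a few lines. The function name `pass1_alpha_schedule` and the guard for `iters < 2` are assumptions added for illustration; only the formula and the default of 24 iterations come from the text above.

```python
def pass1_alpha_schedule(iters: int = 24) -> list:
    """Return alpha_i = i / (iters - 1) for i = 0 .. iters - 1."""
    if iters < 2:
        # Assumption: the formula divides by zero for iters == 1, so reject it.
        raise ValueError("iters must be at least 2")
    return [i / (iters - 1) for i in range(iters)]


# One value per GRU iteration of Pass 1.
alphas = pass1_alpha_schedule(24)
```

The first iteration gets alpha = 0 and the last gets alpha = 1, independently of the training step, which is what distinguishes this schedule from PIW.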

3.2 Between: polarization extraction chain

Data flow: disp₁ → warp → normalized contrast → PolEncoder → pol_feat.

Warp the right image to the left viewpoint using disp₁.
Compute normalized contrast.
Encode it via PolEncoder into pol_feat.
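The first two steps of this chain can be illustrated on a single image row. This is a sketch under stated assumptions, not the project's implementation: it assumes the common rectified-stereo convention that left pixel x matches right pixel x - d, it uses 1D linear interpolation as a stand-in for a 2D bilinear warp, and it assumes normalized contrast means (L - R) / (L + R + eps), which the text does not define. PolEncoder is omitted, and both function names are hypothetical.

```python
def warp_row_to_left(right_row, disp_row):
    """Sample right_row at x - d(x) with linear interpolation and border clamping."""
    width = len(right_row)
    warped = []
    for x, d in enumerate(disp_row):
        # Assumed convention: left pixel x corresponds to right pixel x - d.
        src = min(max(x - d, 0.0), width - 1.0)
        x0 = int(src)                      # floor, valid because src >= 0
        x1 = min(x0 + 1, width - 1)
        w = src - x0
        warped.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return warped


def normalized_contrast(left_row, warped_row, eps=1e-6):
    """Assumed definition: (L - R) / (L + R + eps), element-wise."""
    return [(l - r) / (l + r + eps) for l, r in zip(left_row, warped_row)]


# Toy row: the left view is the right view shifted by a disparity of 2 pixels.
right_row = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
left_row = [10.0, 10.0, 10.0, 20.0, 30.0, 40.0]
disp1_row = [2.0] * 6

right_warped = warp_row_to_left(right_row, disp1_row)
pol_diff = normalized_contrast(left_row, right_warped)
```

With a correct disparity the warped row matches the left row and the contrast is zero, so any residual contrast in real data reflects either disparity error or a genuine difference between the two views.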

3.3 Pass 2: Pol-aware Context Refinement

6 GRU iterations, using GRU₂ (weights independent from GRU₁).
Additive context injection:
- context₂ = cnet(left) + PIW * Wc(pol_feat)
- hidden₂ = cnet_h(left) + PIW * Wh(pol_feat)
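The effect of the PIW coefficient on the additive injection can be seen with plain vectors. This is a minimal sketch: the lists below are made-up stand-ins for `cnet(left)` and `Wc(pol_feat)` at a single pixel, and `inject_additive` is a hypothetical helper name.

```python
def inject_additive(base, pol_proj, piw):
    """Element-wise base + piw * pol_proj for two equally sized vectors."""
    if len(base) != len(pol_proj):
        raise ValueError("base and pol_proj must have the same length")
    return [b + piw * p for b, p in zip(base, pol_proj)]


cnet_left = [0.5, -1.0, 2.0]   # stand-in for cnet(left)
wc_pol = [0.1, 0.2, -0.3]      # stand-in for Wc(pol_feat)

context2_start = inject_additive(cnet_left, wc_pol, piw=0.0)  # start of warmup
context2_end = inject_additive(cnet_left, wc_pol, piw=1.0)    # after warmup
```

At PIW = 0 the result equals `cnet(left)`, so Pass 2 starts from a purely image-based context; the same form applies to hidden₂ with `cnet_h` and `Wh`.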

3.4 PIW (pol_inject_warmup)

Linearly ramps from 0 to 1 over 5000 steps.
Only affects Pass 2’s context injection (does not affect Pass 1’s alpha scheduling).
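A linear warmup of this kind is commonly written as a clamped ratio. The sketch below assumes the ramp is driven by the global training step and stays at 1 after the warmup ends; the function name `pol_inject_weight` is hypothetical.

```python
def pol_inject_weight(step: int, warmup_steps: int = 5000) -> float:
    """Linear ramp from 0 to 1 over warmup_steps, clamped to [0, 1]."""
    if warmup_steps <= 0:
        # Assumption: a non-positive warmup length disables the ramp.
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))


piw_at_start = pol_inject_weight(0)
piw_halfway = pol_inject_weight(2500)
piw_after = pol_inject_weight(10000)
```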

3.5 Dual Loss

total = 0.3 × L₁(Pass 1) + 1.0 × L₂(Pass 2), where L₁ is the loss on the Pass 1 output and L₂ is the loss on the Pass 2 output.
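As a small worked example of the weighting, the sketch below combines two scalar losses. The loss values are made up, and `dual_loss` is a hypothetical name; how L₁ and L₂ themselves are computed is not specified here.

```python
def dual_loss(loss_pass1: float, loss_pass2: float,
              w1: float = 0.3, w2: float = 1.0) -> float:
    """Weighted sum of the Pass 1 and Pass 2 losses."""
    return w1 * loss_pass1 + w2 * loss_pass2


# Made-up loss values, for illustration only.
total = dual_loss(loss_pass1=2.0, loss_pass2=1.0)
```

With these inputs Pass 1 contributes 0.6 and Pass 2 contributes 1.0 to the total.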

4. Tensor Dimensions

Tensor	Dimensions / Setting	Description
left / right	(B, 3, H, W)	RGB polarization image pair, B = 8
disp₁	(B, 1, H, W)	Pass 1 disparity output
right_warped	(B, 3, H, W)	Right warped with disp₁
pol_diff	(B, 3, H, W)	Normalized contrast
pol_feat	(B, 32, H, W)	PolEncoder output, `pol_encoder_channels = 32`
Wc(pol_feat) / Wh(pol_feat)	(B, 128, ·, ·)	Projected to context / hidden dimensions
context₂ / hidden₂	(B, 128, ·, ·)	After additive injection
disp₂	(B, 1, H, W)	Pass 2 disparity output

5. Hyperparameters

Hyperparameter	Value	Architectural meaning
Pass 1 iterations	24	Pass 1 GRU iterations
Pass 2 iterations	6	Pass 2 GRU iterations
L₁ / L₂ weights	0.3 / 1.0	Dual Loss weights
Pol pyramid levels / lookup radius	4 / 4	Pol Volume setting
PIW (pol_inject_warmup)	5000	Number of steps to ramp the Pass 2 injection coefficient from 0 to 1
PolEncoder channels	32	Number of PolEncoder output channels
Pretraining starting point	Scene Flow pretrained	Pretrained weights corresponding to RGB input

6. Design Decisions and Rationale

Design	Choice	Rationale
Polarization input color	RGB	Provides richer polarization color information than grayscale
Pass 1 polarization injection	Pol Volume @ correlation space (V2-C)	Pass 1 itself has no Oracle-Real Gap
Pass 1 per-iteration scheduling	`alpha = i/(iters-1)`, always fully on	Decoupled from PIW; Pass 1 polarization is not suppressed by warmup
Pass 2 injection form	Additive injection (context + hidden)	Additively injects polarization features at the Context Encoder output
PIW warmup	0→1 over 5000 steps	Lets Pass 1 stabilize geometry search first, then gradually enables Pass 2 polarization
GRU₂ ≠ GRU₁	Independent weights	Pass 1 learns geometry search; Pass 2 learns material cues
Dual Loss	0.3 × L₁ + 1.0 × L₂	disp₁.detach() cuts the gradient flowing back from Pass 2, so Pass 1 needs its own loss
Pretraining starting point	Scene Flow pretrained	Pretrained weights matching RGB input

7. Polarization Injection Points

Injection point	Location	Form	Control
Pass 1 — Pol Volume	correlation space	V2-C: gradient gating + scheduled residual; per-iteration `alpha = i/(iters-1)`	Always fully on, unaffected by PIW
Pass 2 — context₂	Context Encoder output	additive: `cnet(left) + PIW * Wc(pol_feat)`	Controlled by PIW (0→1 over 5000 steps)
Pass 2 — hidden₂	GRU initial hidden	additive: `cnet_h(left) + PIW * Wh(pol_feat)`	Controlled by PIW (one-shot)

8. Implementation Notes

When implementing the Two-Pass code, the following three correctness issues require special attention:

Issue	Description	Impact	Fix
Missing top-level metrics	`forward_two_pass` does not expose metrics at the top level	KeyError crash	Add the metrics2 entries at the top level
warp sign convention	`warp_with_disparity` expects the opposite disparity sign from disp1	Wrong warp direction	Negate disp1 before warping
loss overwritten	`metrics['loss']` is overwritten by the metrics2 loop	The logged loss shows Pass 2 instead of total	Restore `total_loss.item()` after the loop
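The first and third issues in the table interact, which a plain-dictionary sketch makes visible. Everything here is illustrative: `merge_metrics`, the `pass1/` prefix, and the `epe` key are hypothetical, and plain floats stand in for `total_loss.item()`.

```python
def merge_metrics(total_loss: float, metrics1: dict, metrics2: dict) -> dict:
    """Merge per-pass metrics, keeping the total loss under the 'loss' key."""
    metrics = {"loss": total_loss}
    for key, value in metrics1.items():
        metrics[f"pass1/{key}"] = value   # hypothetical prefix for Pass 1
    for key, value in metrics2.items():
        # Promoting Pass 2 metrics to the top level avoids the KeyError,
        # but it also overwrites metrics["loss"] with the Pass 2 loss.
        metrics[key] = value
    # Restore the total after the loop so the logged loss is the total.
    metrics["loss"] = total_loss
    return metrics


merged = merge_metrics(
    total_loss=1.6,
    metrics1={"loss": 2.0, "epe": 1.2},
    metrics2={"loss": 1.0, "epe": 0.9},
)
```

Without the final assignment, `merged["loss"]` would hold the Pass 2 loss rather than the weighted total.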

In addition, the additive context injection of Pass 2 has three structural weaknesses to be aware of:

The triple zero-init (PolEncoder’s last layer + pol_ctx_proj + pol_hid_proj) makes the polarization path’s gradient too weak.
One-shot additive injection is easily overwhelmed by the magnitude of cnet(left).
When Pass 1 already performs well and Pass 2’s residual is small, the gradient back-propagated to the polarization path becomes weak.
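The first weakness can be illustrated with a deliberately simplified scalar model: pol_feat = w_enc * x, followed by two parallel projections w_ctx * pol_feat and w_hid * pol_feat. This sketch assumes no biases and no nonlinearities, which real layers may have, so it shows only the direction of the effect rather than the behavior of the actual network. The function name and the upstream gradients `g_ctx` and `g_hid` are hypothetical.

```python
def pol_path_gradients(x, w_enc, w_ctx, w_hid, g_ctx=1.0, g_hid=1.0):
    """Gradients of a bias-free scalar model of the polarization path.

    Forward: pol_feat = w_enc * x; ctx = w_ctx * pol_feat; hid = w_hid * pol_feat.
    g_ctx and g_hid are the upstream gradients arriving at ctx and hid.
    """
    pol_feat = w_enc * x
    return {
        "w_enc": (g_ctx * w_ctx + g_hid * w_hid) * x,  # flows back through both projections
        "w_ctx": g_ctx * pol_feat,                     # scaled by the encoder output
        "w_hid": g_hid * pol_feat,
    }


# All three weights zero-initialized: every gradient in this model is zero.
all_zero = pol_path_gradients(x=2.0, w_enc=0.0, w_ctx=0.0, w_hid=0.0)

# Only the projections zero-initialized: the projections receive a gradient.
enc_nonzero = pol_path_gradients(x=2.0, w_enc=0.5, w_ctx=0.0, w_hid=0.0)
```

In this toy model each weight's gradient is multiplied by the weights on the other side of it, so zero-initializing both the encoder output and the projections leaves neither side with a signal to start from.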

9. Highlights

Pass 1 has built-in polarization guidance: polarization is injected directly into the correlation space via Pol Volume, so Pass 1 itself has no Oracle-Real Gap and does not depend on subsequent alignment.
Decoupled two-level warmup: Pass 1’s per-iteration alpha schedule is always active, while Pass 2’s polarization injection is controlled by PIW and ramps from 0 to 1. Because the two are decoupled, geometry stabilizes before Pass 2’s polarization injection is enabled.
Dual Loss safeguards Pass 1: disp₁.detach() cuts the gradient flowing back from Pass 2, and Dual Loss provides Pass 1 with independent supervision, ensuring warp quality and stable inputs for polarization extraction.
RGB polarization input: RGB replaces grayscale, preserving the color information in the polarization signal.
Division of labor between geometry and material: GRU₁ and GRU₂ have independent weights, focusing respectively on geometry search and material-cue refinement.