Per-Iteration Polarization Injection Architecture

1. Design Goals

In two-stage polarization stereo matching, when Pass 2 injects polarization additively at the Context Encoder output (additive context injection), the polarization path is structurally weak. There are three root causes:

Multiple zero-init layers on the polarization path push the gradient on that path close to zero.
A static additive bias injected into the context is easily overwhelmed by the magnitude of the context itself.
During the injection-coefficient warmup (0→1), the difference between the Pass 1 and Pass 2 outputs can stay completely flat, making the effect of polarization injection hard to observe.
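A toy scalar model makes the first root cause concrete. Treat two zero-initialized layers in series as y = w2·(w1·x). This is a deliberate simplification, not the real network. Since ∂y/∂w1 = w2·x and ∂y/∂w2 = w1·x, neither weight receives any gradient when both start at zero. By contrast, a single zero-init layer behind a normally initialized one still learns.

```python
import random


def path_grads(w1: float, w2: float, x: float) -> tuple:
    """Analytic gradients of the toy path y = w2 * (w1 * x)."""
    dy_dw1 = w2 * x  # gradient reaching the first layer
    dy_dw2 = w1 * x  # gradient reaching the second layer
    return dy_dw1, dy_dw2


rng = random.Random(0)
x = 1.5  # arbitrary nonzero input feature

# Both layers zero-initialized: neither weight receives any gradient.
zero_init = path_grads(0.0, 0.0, x)

# Only the last layer zero-initialized: the last layer still gets a gradient,
# and once it moves off zero the first layer starts receiving one too.
last_zero = path_grads(rng.gauss(0.0, 0.02), 0.0, x)

# Normal init on both layers: both gradients are nonzero.
normal_init = path_grads(rng.gauss(0.0, 0.02), rng.gauss(0.0, 0.02), x)

print(zero_init)  # (0.0, 0.0)
```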

The design goal of this architecture is to let polarization participate explicitly in every Pass 2 update: promote pol_feat from an “additive bias on context” to a “first-class input of GRU₂”, explicitly fed in at every iteration.

2. Architecture (with Data Flow)

Per-Iteration Pol Injection data flow

Data flow overview

Stage	Task	Iterations	Polarization mechanism
Pass 1	Geometric search (V2-C)	24 GRU iters	Pol Volume @ correlation space
Between passes	warp + normalized contrast + PolEncoder	—	Produces pol_feat (64ch)
Pass 2	UpdateBlockWithPolFeat	6 GRU iters	pol_feat as a first-class input of GRU₂, fed in at every iteration

In Pass 2, context₂ / hidden₂ return to standard RAFT (no injection); polarization is carried entirely by the GRU₂ input concatenation.
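The control flow in the data-flow table can be sketched as a minimal skeleton. It uses hypothetical stand-in callables (`gru1_step`, `build_pol_feat`, `gru2_step`) and a scalar "disparity" in place of tensors. It only illustrates two points: pol_feat is computed once between the passes, and it is then fed to GRU₂ at every one of the Pass 2 iterations.

```python
from typing import Callable, List, Tuple


def two_pass_forward(
    init_disp: float,
    gru1_step: Callable[[float], float],
    build_pol_feat: Callable[[float], List[float]],
    gru2_step: Callable[[float, List[float]], float],
    piw: float,
    pass1_iters: int = 24,
    pass2_iters: int = 6,
) -> Tuple[float, float]:
    # Pass 1: geometric search with GRU1. Polarization enters only through the
    # Pol Volume inside gru1_step (not modeled in this sketch).
    disp = init_disp
    for _ in range(pass1_iters):
        disp = gru1_step(disp)
    disp1 = disp

    # Between passes: warp + normalized contrast + PolEncoder, computed once.
    pol_feat = build_pol_feat(disp1)
    pol_input = [piw * v for v in pol_feat]  # PIW * pol_feat

    # Pass 2 (assumed to refine starting from disp1): the same pol_input is
    # handed to GRU2 at every iteration; context2 / hidden2 carry no polarization.
    for _ in range(pass2_iters):
        disp = gru2_step(disp, pol_input)
    return disp1, disp
```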

3. Components and Modules

3.1 PolEncoder

5 layers, 64 channels, residual structure, normal init (not zero-init).
Encodes normalized contrast into pol_feat (64 channels).
The deeper, wider design provides ample capacity, and normal init keeps the polarization path’s gradient from starting near zero.
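The following is one possible layer plan for a 5-layer / 64ch residual PolEncoder. The arrangement (a stem followed by two residual blocks of two convolutions each), the 3×3 kernel size, and the 2-channel contrast input are all assumptions for illustration. The sketch counts parameters and shows why the residual layers must keep 64 → 64 channels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvSpec:
    in_ch: int
    out_ch: int
    kernel: int = 3           # assumed 3x3 kernels
    residual_block: int = -1  # -1 = stem (no skip); k >= 0 = member of residual block k

    def n_params(self) -> int:
        # weights + bias of a standard 2D convolution
        return self.in_ch * self.out_ch * self.kernel * self.kernel + self.out_ch


def pol_encoder_plan(contrast_ch: int, width: int = 64) -> List[ConvSpec]:
    # Layer 1: stem projects the normalized contrast to `width` channels.
    plan = [ConvSpec(contrast_ch, width)]
    # Layers 2-5: two residual blocks of two width->width convs each;
    # the skip connection x + f(x) requires in_ch == out_ch inside each block.
    for block in range(2):
        plan += [ConvSpec(width, width, residual_block=block) for _ in range(2)]
    return plan


plan = pol_encoder_plan(contrast_ch=2)  # contrast_ch=2 is a hypothetical input size
print(len(plan), plan[-1].out_ch, sum(layer.n_params() for layer in plan))
# 5 64 148928
```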

3.2 UpdateBlockWithPolFeat (GRU₂)

The Pass 2 GRU update unit; its weights are independent of those of GRU₁ in Pass 1.
GRU₂ input is cat[motion, context₂, PIW * pol_feat], where PIW is the polarization injection coefficient that ramps from 0 to 1 during warmup.
Input dimension is 320 (the standard 256 plus 64 channels of pol_feat), making pol_feat a first-class input of GRU₂.
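Here is a per-pixel view of the GRU₂ input concatenation as a plain-Python sketch. The 128 + 128 split of the standard 256 channels into motion and context₂ is an assumption that mirrors the RAFT update block. `gru2_input` is a hypothetical helper name.

```python
from typing import List


def gru2_input(motion: List[float], context2: List[float],
               pol_feat: List[float], piw: float) -> List[float]:
    """Channel-wise cat[motion, context2, PIW * pol_feat] at a single pixel."""
    return motion + context2 + [piw * v for v in pol_feat]


# Assumed split of the standard 256 channels: 128 motion + 128 context (RAFT-style).
motion = [0.1] * 128
context2 = [0.2] * 128
pol_feat = [1.0] * 64  # PolEncoder output, 64 channels

x = gru2_input(motion, context2, pol_feat, piw=0.25)
print(len(x))  # 320
```

The input width stays at 320 regardless of PIW. During warmup the coefficient only scales the last 64 channels, so GRU₂'s weight shapes never change.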

3.3 Polarization path

pol_feat is concatenated directly into the GRU₂ input; it is no longer projected to context / hidden dimensions.
Therefore no context / hidden polarization projection layers are needed.

3.4 Pass 2 learning rate

The Pass 2 learning rate is set to 0.5× the base rate (pass2_lr_mult = 0.5) to reduce the volatility of the Pass 2 loss.
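The sketch below shows how pass2_lr_mult could be applied through optimizer parameter groups. Which parameters count as "Pass 2" is an assumption here. The hypothetical prefix `update_block2.` stands for the UpdateBlockWithPolFeat (GRU₂) weights, and `build_param_groups` is an illustrative helper, not part of the original code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical name prefix for the Pass 2 (GRU2) module; real names may differ.
PASS2_PREFIXES = ("update_block2.",)


def build_param_groups(named_params: Iterable[Tuple[str, object]],
                       base_lr: float,
                       pass2_lr_mult: float = 0.5) -> List[Dict]:
    """Split parameters into a base group and a Pass 2 group with a scaled LR."""
    base, pass2 = [], []
    for name, param in named_params:
        (pass2 if name.startswith(PASS2_PREFIXES) else base).append(param)
    return [
        {"params": base, "lr": base_lr},
        {"params": pass2, "lr": base_lr * pass2_lr_mult},
    ]
```

With PyTorch, the returned list could be passed to an optimizer directly, for example `torch.optim.AdamW(build_param_groups(model.named_parameters(), base_lr))`. PyTorch optimizers accept a list of dicts with a per-group `lr`.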

4. Tensor Dimensions

Tensor	Dimensions / Setting	Description
left / right	(B, 3, H, W)	RGB polarization image pair, B = 8
disp₁	(B, 1, H, W)	Pass 1 disparity output
pol_feat	(B, 64, H, W)	PolEncoder output (`pol_encoder_channels = 64`)
context₂	(B, ·, ·, ·)	Standard RAFT, no injection
GRU₂ input	320 = 256 + 64	`cat[motion, context₂, PIW * pol_feat]`
disp₂	(B, 1, H, W)	Pass 2 disparity output

Derivation of GRU₂ input dimension: standard input 256 (motion + context₂) + 64 (pol_feat) = 320.

5. Hyperparameters

Hyperparameter	Value	Architectural meaning
Pass 1 iterations	24	Pass 1 GRU iterations
Pass 2 iterations	6	Pass 2 GRU iterations
L₁ / L₂ weights	0.3 / 1.0	Dual Loss weights (L₁: Pass 1 loss, L₂: Pass 2 loss)
Pol pyramid levels / lookup radius	4 / 4	Pol Volume setting
PIW (pol_inject_warmup)	5000	Number of steps to ramp the Pass 2 polarization injection coefficient from 0 to 1
PolEncoder channels	64	Number of PolEncoder output channels
Pass 2 LR multiplier	0.5	Pass 2 LR = 0.5 × base; reduces L₂ volatility
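The dual loss weighting from the table can be written out as follows. This assumes L₁ and L₂ are the already aggregated Pass 1 and Pass 2 losses. How each per-pass loss is aggregated over its GRU iterations is not specified here.

```python
def dual_loss(loss_pass1: float, loss_pass2: float,
              w1: float = 0.3, w2: float = 1.0) -> float:
    """Total objective L = w1 * L1 + w2 * L2 (defaults from the table)."""
    # Pass 2 carries full weight; Pass 1 stays as a down-weighted term,
    # so the Pass 1 geometric search still receives direct supervision.
    return w1 * loss_pass1 + w2 * loss_pass2
```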

6. Design Decisions and Rationale

Design	Choice	Rationale
Injection timing	per-iteration (every GRU iteration)	Polarization participates explicitly in every iteration and is not overwritten after a single early injection
Injection form	concat into GRU₂ input	Makes polarization a first-class input rather than a hidden bias on context
PolEncoder init	normal init	Avoids the polarization path’s gradient being near zero
PolEncoder depth / width	5 layers / 64ch / residual	Provides ample capacity
Polarization projection layers	No context / hidden projection layers	GRU₂ consumes pol_feat directly; projection to context / hidden dimensions is unnecessary
Pass 2 LR	0.5 × base	Reduces Pass 2 loss volatility
context₂ / hidden₂	Standard RAFT (no injection)	Concentrates polarization at the GRU₂ input instead of spreading it across several injection points that can drown each other out

Architecture validation design

pol_feat detach ablation

--pol_feat_detach flag: detaches pol_feat, cutting all gradient that flows back through it (into PolEncoder and anything upstream).
If the Pass 1 / Pass 2 output difference disappears once detach is enabled, this indicates that the polarization branch’s gradient was actually being used.
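The following toy example illustrates what the ablation switch does. The `--pol_feat_detach` flag is the one described above. The scalar chain out = w · pol_feat, pol_feat = a · contrast and the helper `toy_encoder_grad` are hypothetical. In PyTorch the switch would typically amount to `pol_feat = pol_feat.detach()` before GRU₂ consumes it.

```python
import argparse


def toy_encoder_grad(w: float, contrast: float, detach: bool) -> float:
    """d(out)/d(a) for the toy chain out = w * pol_feat, pol_feat = a * contrast.

    `a` stands in for PolEncoder weights and `w` for the GRU2 weights that read
    pol_feat. Detaching pol_feat turns it into a constant for backprop.
    """
    if detach:
        return 0.0  # chain rule stops at pol_feat: nothing reaches `a`
    return w * contrast  # d(out)/d(pol_feat) * d(pol_feat)/d(a)


parser = argparse.ArgumentParser()
parser.add_argument("--pol_feat_detach", action="store_true",
                    help="cut the gradient flowing back through pol_feat")
args = parser.parse_args(["--pol_feat_detach"])  # example command line

print(args.pol_feat_detach, toy_encoder_grad(0.8, 2.0, args.pol_feat_detach))
# True 0.0
```

Detach leaves the forward pass unchanged: GRU₂ still sees pol_feat, but PolEncoder no longer learns from the Pass 2 loss.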

Glass-only tail statistics

New tail statistics restricted to the glass segmentation mask: glass_epe_median, glass_epe_p90, glass_epe_p95.
Recorded per pass, e.g. pass1_glass_epe_p90 vs. pass2_glass_epe_p90.
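Here is a minimal sketch of the glass-only tail statistics. It assumes a flat list of per-pixel EPE values and a boolean glass mask of the same length. `glass_tail_stats` is a hypothetical helper, and the metric key names follow the ones listed above.

```python
import statistics
from typing import Dict, Sequence


def glass_tail_stats(epe: Sequence[float], glass_mask: Sequence[bool],
                     pass_name: str) -> Dict[str, float]:
    """Median / P90 / P95 of per-pixel EPE restricted to glass pixels."""
    glass = [e for e, m in zip(epe, glass_mask) if m]
    if len(glass) < 2:
        raise ValueError("need at least two glass pixels for percentiles")
    # method="inclusive" interpolates linearly between order statistics,
    # matching numpy.percentile's default linear interpolation.
    q = statistics.quantiles(glass, n=100, method="inclusive")
    return {
        f"{pass_name}_glass_epe_median": statistics.median(glass),
        f"{pass_name}_glass_epe_p90": q[89],
        f"{pass_name}_glass_epe_p95": q[94],
    }
```

Calling it once per pass, for example with `pass_name="pass1"` and `pass_name="pass2"`, yields directly comparable keys such as pass1_glass_epe_p90 and pass2_glass_epe_p90.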

7. Polarization Injection Points

Injection point	Location	Form	Control
Pass 1 — Pol Volume	correlation space	V2-C: gradient gating + scheduled residual	Always fully on
Pass 2 — GRU₂ input	UpdateBlockWithPolFeat input concatenation	per-iteration first-class: `cat[motion, context₂, PIW * pol_feat]`	Controlled by PIW (0→1 over 5000 steps)
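The PIW coefficient that gates the Pass 2 injection point can be computed as shown below. The source only states a 0 → 1 ramp over 5000 steps, so the linear shape is an assumption. The Pass 1 Pol Volume is not gated and stays fully on.

```python
def pol_inject_coeff(step: int, warmup: int = 5000) -> float:
    """PIW at a given training step, assuming a linear 0 -> 1 ramp."""
    if warmup <= 0:
        return 1.0  # no warmup: inject at full strength immediately
    return min(max(step / warmup, 0.0), 1.0)  # clamp to [0, 1]


for step in (0, 1250, 2500, 5000, 10000):
    print(step, pol_inject_coeff(step))
```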

In Pass 2, polarization is injected only at the GRU₂ input (concatenated at every iteration); context₂ / hidden₂ receive no polarization injection and remain in standard RAFT form.

8. Highlights

Polarization becomes a first-class input: pol_feat is concatenated directly into the GRU₂ input (256→320 dim) and participates explicitly in every iteration. It is no longer hidden in a context bias, nor overwritten after a single early injection.
Centralized polarization path: context₂ / hidden₂ return to standard RAFT, and polarization is carried entirely by the GRU₂ input. This avoids spreading the polarization signal across multiple injection points whose contributions could drown each other out.
Gradient-healthy PolEncoder: a 5-layer / 64ch / residual / normal-init design provides ample capacity and ensures the polarization path’s gradient is not near zero.
Verifiable polarization contribution: the built-in pol_feat detach ablation and glass-only tail metrics (P90/P95) make it directly observable whether the polarization branch is actually being used.