Blueprint · 2026

Per-Iteration Polarization Injection Architecture

A two-stage polarization stereo matching architecture that promotes polarization features to a first-class input of the Pass 2 GRU and explicitly concatenates them at every iteration.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

In two-stage polarization stereo matching, additive context injection in Pass 2 (adding polarization features to the Context Encoder output) leaves the polarization path structurally weak. There are three root causes:

  • Multiple zero-init layers on the polarization path push the gradient on that path close to zero.
  • A static additive bias injected into the context is easily overwhelmed by the magnitude of the context itself.
  • During the injection-coefficient warmup (0→1), the difference between Pass 1 and Pass 2 outputs can be completely flat, making the polarization injection’s effect hard to observe.
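The first and third causes can be illustrated with a toy scalar model. This is only a sketch: the two-weight path, the names w_in / w_out / alpha, and the scalar setting are assumptions made for illustration, not the actual additive-injection layers.

```python
# Toy scalar model of an additive polarization path (illustrative assumption):
#   out = context + alpha * w_out * (w_in * pol)
# alpha plays the role of the 0->1 injection warmup coefficient.

def pol_path_grads(pol, w_in, w_out, alpha):
    """Return (d out / d w_in, d out / d w_out) for the toy path above."""
    hidden = w_in * pol
    grad_w_in = alpha * w_out * pol   # chain rule: passes through w_out
    grad_w_out = alpha * hidden       # chain rule: passes through w_in
    return grad_w_in, grad_w_out

# Two zero-init layers: each one blocks the other's gradient, even after warmup.
print(pol_path_grads(pol=0.8, w_in=0.0, w_out=0.0, alpha=1.0))  # (0.0, 0.0)

# One zero-init layer: the upstream weight w_in still receives no gradient.
print(pol_path_grads(pol=0.8, w_in=0.5, w_out=0.0, alpha=1.0))  # (0.0, 0.4)
```

In this toy setting, stacking zero-init layers leaves the path with no gradient at all, and while alpha is still 0 the injected term contributes nothing to the output, which is consistent with a flat Pass 1 / Pass 2 difference during warmup.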

The design goal of this architecture is to let polarization participate explicitly in every Pass 2 update: promote pol_feat from an “additive bias on context” to a “first-class input of GRU₂”, explicitly fed in at every iteration.


2. Architecture (with Data Flow)

Figure: Per-Iteration Pol Injection data flow

Data flow overview

Stage | Task | Iterations | Polarization mechanism
Pass 1 | Geometric search (V2-C) | 24 GRU iters | Pol Volume @ correlation space
Between | warp + normalized contrast + PolEncoder | - | Produces pol_feat (64ch)
Pass 2 | UpdateBlockWithPolFeat | 6 GRU iters | pol_feat as a first-class input of GRU₂, fed in at every iteration

In Pass 2, context₂ / hidden₂ return to standard RAFT (no injection); polarization is carried entirely by the GRU₂ input concatenation.
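The control flow in the table can be sketched with placeholder callables. The function name two_pass_forward and the three callables are hypothetical, and starting Pass 2 from the Pass 1 disparity is an assumption; only the iteration counts and the "pol_feat at every Pass 2 iteration" rule come from the table.

```python
from typing import Any, Callable, Tuple


def two_pass_forward(
    left: Any,
    right: Any,
    pass1_step: Callable[[Any, Any, Any], Any],      # hypothetical: one GRU1 update
    make_pol_feat: Callable[[Any, Any, Any], Any],   # hypothetical: warp + contrast + PolEncoder
    pass2_step: Callable[[Any, Any, float], Any],    # hypothetical: one GRU2 update
    piw: float,
    pass1_iters: int = 24,
    pass2_iters: int = 6,
) -> Tuple[Any, Any]:
    """Run Pass 1, build pol_feat once, then feed it to every Pass 2 update."""
    state = None
    for _ in range(pass1_iters):
        state = pass1_step(left, right, state)
    disp1 = state

    # "Between" stage: pol_feat is computed once from the Pass 1 result.
    pol_feat = make_pol_feat(left, right, disp1)

    # Pass 2: the same pol_feat (scaled by the warmup coefficient inside the
    # step) is passed in at every iteration, not injected once up front.
    state = disp1  # assumption: Pass 2 refines the Pass 1 disparity
    for _ in range(pass2_iters):
        state = pass2_step(state, pol_feat, piw)
    return disp1, state
```

The point of the sketch is the second loop: pol_feat is an argument of each update call rather than something folded into the state before the loop begins.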


3. Components and Modules

3.1 PolEncoder

  • 5 layers, 64 channels, residual structure, normal init (not zero-init).
  • Encodes normalized contrast into pol_feat (64 channels).
  • Uses a deeper and wider design to provide ample capacity; normal init ensures the polarization path’s gradient is not near zero.
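One way to realize "5 layers, 64 channels, residual" is a stem convolution followed by two two-convolution residual blocks. This is a sketch in PyTorch under assumptions: the class names, the single-channel contrast input, the 3×3 kernels, and the ReLU placement are all hypothetical, and "normal init" is read here as leaving PyTorch's default non-zero initialization in place.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Two 3x3 convolutions with an identity skip connection."""

    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.relu(self.conv1(x))
        y = self.conv2(y)
        return self.relu(x + y)


class PolEncoderSketch(nn.Module):
    """Stem conv + 2 residual blocks = 5 conv layers, 64 channels throughout."""

    def __init__(self, in_ch: int = 1, ch: int = 64):
        super().__init__()
        # in_ch=1 is an assumption about the normalized-contrast input.
        self.stem = nn.Conv2d(in_ch, ch, kernel_size=3, padding=1)
        self.block1 = ResBlock(ch)
        self.block2 = ResBlock(ch)
        self.relu = nn.ReLU()
        # No zero-init anywhere: default (non-zero) initialization is kept.

    def forward(self, contrast: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.stem(contrast))
        return self.block2(self.block1(x))  # pol_feat: (B, 64, H, W)
```

Because no layer starts at zero, the first backward pass already delivers a non-zero gradient to the stem, which is the property this section asks for.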

3.2 UpdateBlockWithPolFeat (GRU₂)

  • The Pass 2 GRU update unit; its weights are independent from those of GRU₁ in Pass 1.
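  • GRU₂ input is cat[motion, context₂, PIW * pol_feat], where PIW in this expression denotes the injection coefficient that is ramped from 0 to 1 over pol_inject_warmup steps.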
  • Input dimension is 320 (the standard 256 plus 64 channels of pol_feat), making pol_feat a first-class input of GRU₂.
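The concatenation can be sketched as a RAFT-style convolutional GRU whose input is widened from 256 to 320 channels. Treat this as a sketch under assumptions: the 128 / 128 split of the standard 256 into motion and context, the 128-channel hidden state, the single-convolution disparity head, and the class names are illustrative choices, not the blueprint's fixed implementation.

```python
import torch
import torch.nn as nn


class ConvGRU(nn.Module):
    """Convolutional GRU cell in the style used by RAFT."""

    def __init__(self, hidden_dim: int, input_dim: int):
        super().__init__()
        cat_dim = hidden_dim + input_dim
        self.convz = nn.Conv2d(cat_dim, hidden_dim, kernel_size=3, padding=1)
        self.convr = nn.Conv2d(cat_dim, hidden_dim, kernel_size=3, padding=1)
        self.convq = nn.Conv2d(cat_dim, hidden_dim, kernel_size=3, padding=1)

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))   # update gate
        r = torch.sigmoid(self.convr(hx))   # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q


class UpdateBlockWithPolFeatSketch(nn.Module):
    """GRU2 sketch: input = cat[motion, context2, piw * pol_feat] (320 ch)."""

    def __init__(self, hidden_dim=128, motion_dim=128, context_dim=128, pol_dim=64):
        super().__init__()
        self.input_dim = motion_dim + context_dim + pol_dim  # 128 + 128 + 64 = 320
        self.gru = ConvGRU(hidden_dim, self.input_dim)
        # Hypothetical head: predicts a 1-channel disparity update.
        self.disp_head = nn.Conv2d(hidden_dim, 1, kernel_size=3, padding=1)

    def forward(self, hidden, motion, context, pol_feat, piw: float):
        # pol_feat enters the GRU input directly at every call; there is
        # no projection to context or hidden dimensions.
        x = torch.cat([motion, context, piw * pol_feat], dim=1)
        hidden = self.gru(hidden, x)
        return hidden, self.disp_head(hidden)
```

With piw = 0 the 64 polarization channels are all zero, so the block behaves like a standard update on motion and context; as piw approaches 1 the GRU gates see pol_feat at full scale in each of the 6 Pass 2 iterations.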

3.3 Polarization path

  • pol_feat is concatenated directly into the GRU₂ input; it is no longer projected to context / hidden dimensions.
  • Therefore no context / hidden polarization projection layers are needed.

3.4 Pass 2 learning rate

  • The Pass 2 learning rate is set to 0.5× the base (pass2_lr_mult = 0.5) to reduce the volatility of Pass 2 loss.
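A minimal sketch of how the multiplier could be applied is to split parameters into two groups by name. The helper name build_param_groups and the "update_block2." prefix are hypothetical; which modules count as "Pass 2" is a choice this blueprint leaves to the implementation. The returned list of dicts follows the per-parameter-group format that PyTorch optimizers accept.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    base_lr: float,
    pass2_lr_mult: float = 0.5,
    pass2_prefixes: Sequence[str] = ("update_block2.",),  # hypothetical module prefix
) -> List[Dict[str, Any]]:
    """Split parameters into a base-LR group and a Pass 2 group at base_lr * mult."""
    prefixes = tuple(pass2_prefixes)
    base_group, pass2_group = [], []
    for name, param in named_params:
        if name.startswith(prefixes):
            pass2_group.append(param)
        else:
            base_group.append(param)
    return [
        {"params": base_group, "lr": base_lr},
        {"params": pass2_group, "lr": base_lr * pass2_lr_mult},
    ]
```

In a PyTorch training script the result could be passed as the first argument of an optimizer, for example with `model.named_parameters()` as the input iterable.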

4. Tensor Dimensions

Tensor | Dimensions / Setting | Description
left / right | (B, 3, H, W) | RGB polarization image pair, B = 8
disp₁ | (B, 1, H, W) | Pass 1 disparity output
pol_feat | (B, 64, H, W) | PolEncoder output (pol_encoder_channels = 64)
context₂ | (B, ·, ·, ·) | Standard RAFT, no injection
GRU₂ input | 320 = 256 + 64 | cat[motion, context₂, PIW * pol_feat]
disp₂ | (B, 1, H, W) | Pass 2 disparity output

Derivation of GRU₂ input dimension: standard input 256 (motion + context₂) + 64 (pol_feat) = 320.


5. Hyperparameters

Hyperparameter | Value | Architectural meaning
Pass 1 iterations | 24 | Pass 1 GRU iterations
Pass 2 iterations | 6 | Pass 2 GRU iterations
L₁ / L₂ weights | 0.3 / 1.0 | Dual Loss weights
Pol pyramid levels / lookup radius | 4 / 4 | Pol Volume setting
PIW (pol_inject_warmup) | 5000 | Number of steps to ramp the Pass 2 polarization injection coefficient from 0 to 1
PolEncoder channels | 64 | Number of PolEncoder output channels
Pass 2 LR multiplier | 0.5 | Pass 2 LR = 0.5 × base, reduces Pass 2 loss (L₂) volatility
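For reference, the table can be captured as a small config object. This is a sketch: pol_inject_warmup, pol_encoder_channels, and pass2_lr_mult are names used in this blueprint, while the remaining field names, the class name, and the reading of the Dual Loss as a weighted sum of the two pass losses are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlueprintConfig:
    pass1_iters: int = 24            # Pass 1 GRU iterations
    pass2_iters: int = 6             # Pass 2 GRU iterations
    loss_w1: float = 0.3             # weight on the Pass 1 loss (hypothetical field name)
    loss_w2: float = 1.0             # weight on the Pass 2 loss (hypothetical field name)
    pol_pyramid_levels: int = 4      # Pol Volume pyramid levels
    pol_lookup_radius: int = 4       # Pol Volume lookup radius
    pol_inject_warmup: int = 5000    # steps to ramp the injection coefficient 0 -> 1
    pol_encoder_channels: int = 64   # PolEncoder output channels
    pass2_lr_mult: float = 0.5       # Pass 2 LR = 0.5 x base


def dual_loss(loss_pass1: float, loss_pass2: float, cfg: BlueprintConfig) -> float:
    """Assumed form of the Dual Loss: weighted sum of the two pass losses."""
    return cfg.loss_w1 * loss_pass1 + cfg.loss_w2 * loss_pass2
```

Under this reading, the Pass 2 output dominates the objective while Pass 1 still receives direct supervision at a lower weight.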

6. Design Decisions and Rationale

Design | Choice | Rationale
Injection timing | per-iteration (every GRU iteration) | Polarization participates explicitly in every iteration and is not overwritten after a single early injection
Injection form | concat into GRU₂ input | Makes polarization a first-class input rather than a hidden bias on context
PolEncoder init | normal init | Avoids the polarization path’s gradient being near zero
PolEncoder depth / width | 5 layers / 64ch / residual | Provides ample capacity
Polarization projection layers | No context / hidden projection layers | GRU₂ consumes pol_feat directly; projection to context / hidden dimensions is unnecessary
Pass 2 LR | 0.5 × base | Reduces Pass 2 loss volatility
context₂ / hidden₂ | Standard RAFT (no injection) | Concentrates the polarization path on GRU₂ input, avoiding dispersion and mutual drowning

Architecture validation design

pol_feat detach ablation

  • --pol_feat_detach flag: detaches pol_feat so that no gradient flows back through it into the polarization branch.
  • If Pass 1 / Pass 2 output differences disappear after detach, it confirms that the gradient of the polarization branch was being used.
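The effect of the detach can be checked on a two-layer stand-in. This sketch assumes PyTorch; the single-convolution "encoder" and "consumer" are placeholders for PolEncoder and GRU₂, not the real modules, and the helper name is hypothetical.

```python
import torch
import torch.nn as nn


def encoder_grad_norm(detach: bool) -> float:
    """Gradient norm reaching the stand-in encoder, with or without detach."""
    torch.manual_seed(0)
    encoder = nn.Conv2d(1, 64, kernel_size=3, padding=1)   # stand-in for PolEncoder
    consumer = nn.Conv2d(64, 1, kernel_size=3, padding=1)  # stand-in for GRU2
    contrast = torch.randn(2, 1, 8, 8)

    pol_feat = encoder(contrast)
    if detach:
        # Forward values are unchanged; only the backward path is cut.
        pol_feat = pol_feat.detach()

    loss = consumer(pol_feat).abs().mean()
    loss.backward()

    grad = encoder.weight.grad
    return 0.0 if grad is None else grad.norm().item()
```

With detach=True the encoder receives no gradient at all, so it stays at its initial weights while the consumer still sees its (untrained) features. Comparing the two training runs is what isolates the contribution of the learned polarization branch.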

Glass-only tail statistics

  • New segmentation-tail statistics: glass_epe_median, glass_epe_p90, glass_epe_p95.
  • Recorded per pass: pass1_glass_epe_p90 vs pass2_glass_epe_p90.
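A minimal sketch of the glass-only tail statistics, in plain Python. The flat per-pixel lists, the helper names, and the linear-interpolation percentile convention are assumptions; the metric names follow the ones listed above.

```python
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def glass_tail_stats(
    epe: Sequence[float], glass_mask: Sequence[bool], prefix: str = ""
) -> Dict[str, float]:
    """Median / P90 / P95 of end-point error restricted to glass pixels."""
    glass = [e for e, is_glass in zip(epe, glass_mask) if is_glass]
    return {
        f"{prefix}glass_epe_median": percentile(glass, 50),
        f"{prefix}glass_epe_p90": percentile(glass, 90),
        f"{prefix}glass_epe_p95": percentile(glass, 95),
    }
```

Calling it once per pass with prefix="pass1_" and prefix="pass2_" yields the pass1_glass_epe_p90 vs pass2_glass_epe_p90 comparison.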

7. Polarization Injection Points

Injection point | Location | Form | Control
Pass 1: Pol Volume | correlation space | V2-C: gradient gating + scheduled residual | Always fully on
Pass 2: GRU₂ input | UpdateBlockWithPolFeat input concatenation | per-iteration first-class: cat[motion, context₂, PIW * pol_feat] | Controlled by PIW (0→1 over 5000 steps)

The Pass 2 polarization injection point is at the “GRU₂ input (concat, every iteration)”; context₂ / hidden₂ have no polarization injection and remain in standard RAFT form.
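The Pass 2 control can be sketched as a step-dependent coefficient. The blueprint only fixes the endpoints (0 to 1 over 5000 steps); the linear shape of the ramp and the function name are assumptions.

```python
def pol_inject_coeff(step: int, warmup_steps: int = 5000) -> float:
    """Injection coefficient for Pass 2: 0 at step 0, 1 from warmup_steps onward.

    A linear ramp is assumed here; only the 0 -> 1 endpoints are specified.
    """
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))


print(pol_inject_coeff(0))      # 0.0
print(pol_inject_coeff(2500))   # 0.5
print(pol_inject_coeff(5000))   # 1.0
```

The returned value is the scalar that multiplies pol_feat inside cat[motion, context₂, PIW * pol_feat], so early in training GRU₂ sees zeros in the polarization channels, and the full feature only after warmup.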


8. Highlights

  • Polarization becomes a first-class input: pol_feat is concatenated directly into the GRU₂ input (256→320 dim), participating explicitly in every iteration. It is no longer hidden in a context bias, and not overwritten after a single early injection.
  • Centralized polarization path: context₂ / hidden₂ return to standard RAFT, and polarization is carried entirely by the GRU₂ input. This avoids dispersing the polarization signal across multiple injection points where they could drown each other out.
  • Gradient-healthy PolEncoder: a 5-layer / 64ch / residual / normal-init design provides ample capacity and ensures the polarization path’s gradient is not near zero.
  • Verifiable polarization contribution: built-in pol_feat detach ablation and glass-only tail metrics (P90/P95) make “whether the polarization branch is actually being used” a directly observable indicator.
