Blueprint · 2026

Two-Pass RGB Architecture

A two-stage stereo matching architecture that takes RGB polarization images as input, injects polarization into the correlation space for geometric search in Pass 1, and performs pol-aware refinement via additive context injection in Pass 2.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

This architecture is a concrete implementation of Two-Pass polarization stereo matching, with two design goals:

  • Use RGB polarization input instead of grayscale input to provide richer polarization color information.
  • Use a Two-Pass architecture so that Pass 2 performs pol-aware refinement on Pass 1’s geometric result.

To keep Pass 1 itself free of the Oracle-Real Gap, Pass 1 uses Pol Volume to inject polarization directly into the correlation space (the V2-C polarization guidance mechanism). On the training side, PIW warmup and Dual Loss let Pass 1 stabilize geometric search first; the polarization injection of Pass 2 is then enabled gradually.


2. Architecture (with Data Flow)

Two-Pass RGB data flow

Data flow overview

Stage | Task | Iterations | Polarization mechanism
Pass 1 | Geometric search (V2-C) | 24 GRU iters | Pol Volume injected into correlation space; per-iteration scheduling alpha = i/(iters-1)
Between | warp + normalized contrast + PolEncoder | - | Produces pol_feat
Pass 2 | Pol-aware context refinement | 6 GRU iters (GRU₂ ≠ GRU₁) | Additive context injection, controlled by PIW

3. Components and Modules

3.1 Pass 1: V2-C (gradient gating + scheduled residual)

  • 24 GRU iterations.
  • Polarization is injected directly into the correlation space via Pol Volume, so Pass 1 itself has no Oracle-Real Gap.
  • Per-iteration scheduling: alpha = i / (iters - 1) ramps linearly from 0 to 1 across the iterations of Pass 1. The schedule is always fully on: it is not scaled by PIW.
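
The per-iteration schedule above can be written out in a few lines. This is only a minimal sketch: the function name alpha_schedule and the handling of the single-iteration case are my assumptions, not part of the blueprint.

```python
from typing import List


def alpha_schedule(iters: int) -> List[float]:
    """Per-iteration Pol Volume weight: alpha_i = i / (iters - 1)."""
    if iters < 2:
        # Assumption: the ramp is undefined for a single iteration,
        # so fall back to full weight.
        return [1.0] * iters
    return [i / (iters - 1) for i in range(iters)]


# Pass 1 runs 24 GRU iterations, so alpha goes from 0 at i = 0 to 1 at i = 23.
alphas = alpha_schedule(24)
```

Because the schedule depends only on the iteration index, it is the same at every training step, which is what keeps it independent of PIW.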

3.2 Between: polarization extraction chain

Data flow: disp₁ → warp → normalized contrast → PolEncoder → pol_feat.

  • Warp the right image to the left viewpoint using disp₁.
  • Compute normalized contrast.
  • Encode it via PolEncoder into pol_feat.
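
To make the first two steps of the chain concrete, here is a rough sketch on a single scanline of a single channel, in plain Python rather than tensor code. Two things are assumptions on my part: the sign convention (a positive disparity d means left pixel x matches right pixel x - d) and the definition of normalized contrast as (L - R) / (L + R + eps), which the blueprint does not spell out. The helper names are hypothetical, and PolEncoder is not shown.

```python
import math


def warp_row_to_left(right_row, disp_row):
    """Resample one right-image scanline into the left view.

    Assumed convention: left pixel x matches right pixel x - d.
    Uses linear interpolation and clamps samples to the image border.
    """
    width = len(right_row)
    warped = []
    for x, d in enumerate(disp_row):
        src = min(max(x - d, 0.0), width - 1.0)  # clamp to valid range
        x0 = int(math.floor(src))
        x1 = min(x0 + 1, width - 1)
        w = src - x0  # fractional part for interpolation
        warped.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return warped


def normalized_contrast(left_row, warped_row, eps=1e-6):
    """Assumed definition: (L - R) / (L + R + eps), element-wise."""
    return [(l - r) / (l + r + eps) for l, r in zip(left_row, warped_row)]


right = [0.0, 1.0, 2.0, 3.0, 4.0]
disp1 = [1.0, 1.0, 1.0, 1.0, 1.0]
right_warped = warp_row_to_left(right, disp1)
pol_diff = normalized_contrast([0.0, 0.0, 1.0, 2.0, 3.0], right_warped)
```

In the full model the same operation would run per channel over (B, 3, H, W) tensors. If the warp helper in a given codebase expects the opposite sign, disp₁ has to be negated before the call.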

3.3 Pass 2: Pol-aware Context Refinement

  • 6 GRU iterations, using GRU₂ (weights independent from GRU₁).
  • Additive context injection:
    • context₂ = cnet(left) + PIW * Wc(pol_feat)
    • hidden₂ = cnet_h(left) + PIW * Wh(pol_feat)
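
A per-pixel sketch of the additive injection, in plain Python. I am assuming here that Wc and Wh act as per-pixel linear maps (equivalent to 1×1 convolutions) from the PolEncoder channels to the context / hidden channels; the blueprint only says they project to those dimensions. The names project and inject are hypothetical, and spatial resolution is ignored.

```python
def project(weights, feat):
    """Per-pixel linear projection: weights is (out x in), feat is an in-vector."""
    return [sum(w * f for w, f in zip(row, feat)) for row in weights]


def inject(base, pol_proj, piw):
    """Additive injection at one pixel: base + piw * pol_proj."""
    return [b + piw * p for b, p in zip(base, pol_proj)]


# Toy sizes: 3 polarization channels projected to 2 context channels.
Wc = [[1.0, 0.0, 1.0],
      [0.0, 2.0, 0.0]]
pol_feat_px = [1.0, 2.0, 1.0]
cnet_left_px = [1.0, 1.0]

context2_off = inject(cnet_left_px, project(Wc, pol_feat_px), piw=0.0)
context2_half = inject(cnet_left_px, project(Wc, pol_feat_px), piw=0.5)
```

With PIW = 0 the result equals cnet(left), so Pass 2 starts out as a plain refinement stage; the polarization term only grows in as PIW ramps up.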

3.4 PIW (pol_inject_warmup)

  • Linearly ramps from 0 to 1 over 5000 steps.
  • Only affects Pass 2’s context injection (does not affect Pass 1’s alpha scheduling).
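
The warmup coefficient itself is a clamped linear ramp. A minimal sketch, assuming the step counter starts at 0; the function name piw_coefficient and the guard for a non-positive warmup length are mine.

```python
def piw_coefficient(step, warmup_steps=5000):
    """Linear ramp from 0 to 1 over warmup_steps, then held at 1."""
    if warmup_steps <= 0:
        # Assumption: no warmup configured means injection is fully on.
        return 1.0
    return min(max(step / warmup_steps, 0.0), 1.0)


halfway = piw_coefficient(2500)
```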

3.5 Dual Loss

  • total = 0.3 × L₁(Pass1) + 1.0 × L₂(Pass2).
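
As a sketch, the weighting is a two-term sum. The per-pass losses are taken as already-computed scalars, since the blueprint does not specify how each one is formed; the function name dual_loss is hypothetical.

```python
def dual_loss(loss_pass1, loss_pass2, w1=0.3, w2=1.0):
    """total = w1 * L1(Pass 1) + w2 * L2(Pass 2), with the blueprint's default weights."""
    return w1 * loss_pass1 + w2 * loss_pass2


total = dual_loss(1.0, 2.0)
```

The Pass 1 term matters because disp₁ is detached before the warp: without it, no gradient from the Pass 2 loss would reach Pass 1.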

4. Tensor Dimensions

Tensor | Dimensions / Setting | Description
left / right | (B, 3, H, W) | RGB polarization image pair, B = 8
disp₁ | (B, 1, H, W) | Pass 1 disparity output
right_warped | (B, 3, H, W) | Right image warped with disp₁
pol_diff | (B, 3, H, W) | Normalized contrast
pol_feat | (B, 32, H, W) | PolEncoder output, pol_encoder_channels = 32
Wc(pol_feat) / Wh(pol_feat) | (B, 128, ·, ·) | Projected to context / hidden dimensions
context₂ / hidden₂ | (B, 128, ·, ·) | After additive injection
disp₂ | (B, 1, H, W) | Pass 2 disparity output

5. Hyperparameters

Hyperparameter | Value | Architectural meaning
Pass 1 iterations | 24 | Pass 1 GRU iterations
Pass 2 iterations | 6 | Pass 2 GRU iterations
L₁ / L₂ weights | 0.3 / 1.0 | Dual Loss weights
Pol pyramid levels / lookup radius | 4 / 4 | Pol Volume setting
PIW (pol_inject_warmup) | 5000 | Number of steps to ramp the Pass 2 injection coefficient from 0 to 1
PolEncoder channels | 32 | Number of PolEncoder output channels
Pretraining starting point | Scene Flow pretrained | Pretrained weights corresponding to RGB input

6. Design Decisions and Rationale

Design | Choice | Rationale
Polarization input color | RGB | Provides richer polarization color information than grayscale
Pass 1 polarization injection | Pol Volume @ correlation space (V2-C) | Pass 1 itself has no Oracle-Real Gap
Pass 1 per-iteration scheduling | alpha = i/(iters-1), always fully on | Decoupled from PIW; Pass 1 polarization is not suppressed by warmup
Pass 2 injection form | Additive injection (context + hidden) | Additively injects polarization features at the Context Encoder output
PIW warmup | 0→1 over 5000 steps | Lets Pass 1 stabilize geometry search first, then gradually enables Pass 2 polarization
GRU₂ ≠ GRU₁ | Independent weights | Pass 1 learns geometry search; Pass 2 learns material cues
Dual Loss | 0.3 × L₁ + 1.0 × L₂ | disp₁.detach() cuts the gradient from Pass 2, so Pass 1 needs its own loss
Pretraining starting point | Scene Flow pretrained | Pretrained weights matching RGB input

7. Polarization Injection Points

Injection point | Location | Form | Control
Pass 1: Pol Volume | correlation space | V2-C: gradient gating + scheduled residual; per-iteration alpha = i/(iters-1) | Always fully on, unaffected by PIW
Pass 2: context₂ | Context Encoder output | additive: cnet(left) + PIW * Wc(pol_feat) | Controlled by PIW (0→1 over 5000 steps)
Pass 2: hidden₂ | GRU initial hidden | additive: cnet_h(left) + PIW * Wh(pol_feat) | Controlled by PIW (one-shot)

8. Implementation Notes

When implementing the Two-Pass code, the following three correctness issues require special attention:

Issue | Description | Impact | Fix
Missing top-level metrics | forward_two_pass lacks top-level metrics | KeyError crash | Add metrics2 as top-level
warp sign convention | The disparity sign convention of warp_with_disparity | Wrong warp direction | Negate disp1 before warping
loss overwritten | metrics['loss'] is overwritten by the metrics2 loop | The logged loss shows Pass 2 instead of total | Restore total_loss.item() after the loop
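
The first and third issues both concern how the metrics dictionary is assembled, so a small sketch may help. This is not the actual forward_two_pass code: the helper name merge_metrics, the "pass1_" prefix, and the "epe" key are hypothetical, and total_loss is assumed to be a plain float (that is, already converted with .item()).

```python
def merge_metrics(total_loss, metrics1, metrics2):
    """Build the logged metrics dict for a two-pass forward step."""
    metrics = {"loss": total_loss}
    # Keep Pass 1 metrics under a prefix so they do not collide with Pass 2.
    for key, value in metrics1.items():
        metrics["pass1_" + key] = value
    # Expose Pass 2 metrics at the top level, where callers look them up.
    for key, value in metrics2.items():
        metrics[key] = value  # this may overwrite metrics["loss"]
    # Restore the combined loss after the loop so logging shows the total.
    metrics["loss"] = total_loss
    return metrics


logged = merge_metrics(2.3, {"loss": 1.0, "epe": 0.9}, {"loss": 2.0, "epe": 0.5})
```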

In addition, the additive context injection of Pass 2 has three structural weaknesses to be aware of:

  1. The triple zero-init (PolEncoder’s last layer + pol_ctx_proj + pol_hid_proj) makes the polarization path’s gradient too weak.
  2. One-shot additive injection is easily overwhelmed by the magnitude of cnet(left).
  3. When Pass 1 already performs well and Pass 2’s residual is small, the gradient back-propagated to the polarization path becomes weak.

9. Highlights

  • Pass 1 has built-in polarization guidance: polarization is injected directly into the correlation space via Pol Volume, so Pass 1 itself has no Oracle-Real Gap and does not depend on subsequent alignment.
  • Decoupled two-level warmup: Pass 1’s per-iteration alpha scheduling is always fully on, while Pass 2’s polarization injection is controlled by PIW and ramps from 0 to 1; the two are decoupled so that geometry stabilizes first before polarization is enabled.
  • Dual Loss safeguards Pass 1: disp₁.detach() cuts the gradient flowing back from Pass 2, and Dual Loss provides Pass 1 with independent supervision, ensuring warp quality and stable inputs for polarization extraction.
  • RGB polarization input: RGB replaces grayscale, preserving the color information in the polarization signal.
  • Division of labor between geometry and material: GRU₁ and GRU₂ have independent weights, focusing respectively on geometry search and material-cue refinement.
