1. Design Goals
In two-stage polarization stereo matching, when Pass 2 uses additive context injection (additively injecting polarization at the Context Encoder output), the polarization path is structurally weak. The root causes are threefold:
- Multiple zero-init layers on the polarization path push the gradient on that path close to zero.
- A static additive bias injected into the context is easily overwhelmed by the magnitude of the context itself.
- During the injection-coefficient warmup (0→1), the difference between Pass 1 and Pass 2 outputs can be completely flat, making the polarization injection’s effect hard to observe.
The design goal of this architecture is to let polarization participate explicitly in every Pass 2 update: promote pol_feat from an “additive bias on context” to a “first-class input of GRU₂”, explicitly fed in at every iteration.
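The first root cause can be illustrated with a toy scalar model. This is only a sketch: two stacked scalar "layers" `w1` and `w2` stand in for the zero-init layers on the polarization path, and the input `x`, the target, and the squared-error loss are hypothetical choices made for the illustration.

```python
def two_layer_grads(w1, w2, x=1.5, target=1.0):
    """Gradients of L = (w2 * w1 * x - target)**2 with respect to w1 and w2."""
    y = w2 * w1 * x
    dL_dy = 2.0 * (y - target)
    # Chain rule: each layer's gradient is scaled by the other layer's weight.
    return dL_dy * w2 * x, dL_dy * w1 * x


# Both layers zero-initialized: each gradient is multiplied by a zero weight.
zero_init_grads = two_layer_grads(0.0, 0.0)

# Ordinary non-zero initialization: both layers receive a usable gradient.
regular_init_grads = two_layer_grads(0.5, -0.8)
```

With both weights at zero, neither layer receives a gradient at the first step even though the loss is non-zero, which is the failure mode the non-zero PolEncoder initialization is meant to avoid.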
2. Architecture (with Data Flow)
Data flow overview
| Stage | Task | Iterations | Polarization mechanism |
|---|---|---|---|
| Pass 1 | Geometric search (V2-C) | 24 GRU iters | Pol Volume @ correlation space |
| Between | warp + normalized contrast + PolEncoder | — | Produces pol_feat (64ch) |
| Pass 2 | UpdateBlockWithPolFeat | 6 GRU iters | pol_feat as a first-class input of GRU₂, fed in at every iteration |
In Pass 2, context₂ / hidden₂ return to standard RAFT (no injection); polarization is carried entirely by the GRU₂ input concatenation.
3. Components and Modules
3.1 PolEncoder
- 5 layers, 64 channels, residual structure, normal init (not zero-init).
- Encodes normalized contrast into `pol_feat` (64 channels).
- Uses a deeper and wider design to provide ample capacity; normal init ensures the polarization path’s gradient is not near zero.
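A minimal PyTorch sketch of an encoder matching this description follows. The layer layout (one stem convolution plus two residual blocks of two convolutions each), the 3×3 kernels, the ReLU activations, and the single-channel normalized-contrast input are all assumptions, and "normal init" is read here as keeping PyTorch's default, non-zero convolution initialization. The name `PolEncoderSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class PolEncoderSketch(nn.Module):
    """Illustrative 5-conv, 64-channel residual encoder (layout assumed)."""

    def __init__(self, in_channels: int = 1, channels: int = 64):
        super().__init__()
        # Layer 1: lift the normalized-contrast map to the working width.
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        # Layers 2-3 and 4-5: two residual blocks of two convolutions each.
        self.block1 = self._block(channels)
        self.block2 = self._block(channels)
        self.act = nn.ReLU()
        # No zero-init anywhere: every conv keeps PyTorch's default init.

    @staticmethod
    def _block(channels: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, contrast: torch.Tensor) -> torch.Tensor:
        x = self.act(self.stem(contrast))
        x = self.act(x + self.block1(x))  # residual connection 1
        x = self.act(x + self.block2(x))  # residual connection 2
        return x  # pol_feat: (B, 64, H, W)


torch.manual_seed(0)
encoder = PolEncoderSketch()
pol_feat = encoder(torch.randn(2, 1, 32, 48))

# Backpropagate a dummy loss and inspect the gradient at the first layer.
pol_feat.sum().backward()
stem_grad_norm = encoder.stem.weight.grad.norm().item()
```

Checking `stem_grad_norm` right after initialization is a quick way to see that gradient reaches the first layer of the polarization path.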
3.2 UpdateBlockWithPolFeat (GRU₂)
- The Pass 2 GRU update unit; its weights are independent from those of GRU₁ in Pass 1.
- GRU₂ input is `cat[motion, context₂, PIW * pol_feat]`.
- Input dimension is 320 (the standard 256 plus 64 channels of pol_feat), making pol_feat a first-class input of GRU₂.
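The per-iteration concatenation can be sketched as below. The sketch rests on several assumptions: a RAFT-style convolutional GRU cell, a hidden size of 128, a 128 + 128 split of the standard 256 input channels between motion and context₂, and all tensors already at the same spatial resolution. `ConvGRUSketch` is a hypothetical stand-in, not the actual `UpdateBlockWithPolFeat`, and in a real update loop the motion features would be recomputed at each iteration.

```python
import torch
import torch.nn as nn


class ConvGRUSketch(nn.Module):
    """RAFT-style convolutional GRU cell with a 320-channel input."""

    def __init__(self, hidden_dim: int = 128, input_dim: int = 320):
        super().__init__()
        in_ch = hidden_dim + input_dim
        self.convz = nn.Conv2d(in_ch, hidden_dim, kernel_size=3, padding=1)
        self.convr = nn.Conv2d(in_ch, hidden_dim, kernel_size=3, padding=1)
        self.convq = nn.Conv2d(in_ch, hidden_dim, kernel_size=3, padding=1)

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))  # update gate
        r = torch.sigmoid(self.convr(hx))  # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q


torch.manual_seed(0)
B, H, W = 2, 8, 12
motion = torch.randn(B, 128, H, W)    # assumed channel count
context2 = torch.randn(B, 128, H, W)  # assumed channel count
pol_feat = torch.randn(B, 64, H, W)
hidden2 = torch.zeros(B, 128, H, W)
piw = 0.5  # warmup coefficient in [0, 1]

gru2 = ConvGRUSketch()
for _ in range(6):  # Pass 2 runs 6 GRU iterations
    # pol_feat is concatenated again at every iteration.
    gru2_input = torch.cat([motion, context2, piw * pol_feat], dim=1)
    hidden2 = gru2(hidden2, gru2_input)
```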
3.3 Polarization path
- pol_feat is concatenated directly into the GRU₂ input; it is no longer projected to context / hidden dimensions.
- Therefore no context / hidden polarization projection layers are needed.
3.4 Pass 2 learning rate
- The Pass 2 learning rate is set to 0.5× the base (`pass2_lr_mult = 0.5`) to reduce the volatility of Pass 2 loss.
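One common way to apply such a multiplier is through per-parameter-group learning rates, which PyTorch optimizers accept as a list of dicts. The sketch below only builds that list. The helper name `build_param_groups`, the base learning rate, and the decision about which parameters count as Pass 2 are assumptions, since the text does not specify them.

```python
def build_param_groups(pass1_params, pass2_params, base_lr, pass2_lr_mult=0.5):
    """Return optimizer parameter groups with a scaled Pass 2 learning rate."""
    return [
        {"params": list(pass1_params), "lr": base_lr},
        {"params": list(pass2_params), "lr": base_lr * pass2_lr_mult},
    ]


# Placeholder strings stand in for real parameter tensors in this sketch.
param_groups = build_param_groups(
    pass1_params=["pass1.weight"],
    pass2_params=["gru2.weight", "pol_encoder.weight"],
    base_lr=2e-4,  # hypothetical base learning rate
)
```

With real parameters, the resulting list can be passed directly as the first argument of an optimizer such as `torch.optim.AdamW`.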
4. Tensor Dimensions
| Tensor | Dimensions / Setting | Description |
|---|---|---|
| left / right | (B, 3, H, W) | RGB polarization image pair, B = 8 |
| disp₁ | (B, 1, H, W) | Pass 1 disparity output |
| pol_feat | (B, 64, H, W) | PolEncoder output (pol_encoder_channels = 64) |
| context₂ | (B, ·, ·, ·) | Standard RAFT, no injection |
| GRU₂ input | 320 = 256 + 64 | cat[motion, context₂, PIW * pol_feat] |
| disp₂ | (B, 1, H, W) | Pass 2 disparity output |
Derivation of GRU₂ input dimension: standard input 256 (motion + context₂) + 64 (pol_feat) = 320.
5. Hyperparameters
| Hyperparameter | Value | Architectural meaning |
|---|---|---|
| Pass 1 iterations | 24 | Pass 1 GRU iterations |
| Pass 2 iterations | 6 | Pass 2 GRU iterations |
| L₁ / L₂ weights | 0.3 / 1.0 | Dual Loss weights for the Pass 1 loss (L₁) and the Pass 2 loss (L₂) |
| Pol pyramid levels / lookup radius | 4 / 4 | Pol Volume setting |
| PIW (pol_inject_warmup) | 5000 | Number of steps over which the Pass 2 polarization injection coefficient ramps from 0 to 1; in cat[motion, context₂, PIW * pol_feat], PIW denotes that coefficient |
| PolEncoder channels | 64 | Number of PolEncoder output channels |
| Pass 2 LR multiplier | 0.5 | Pass 2 LR = 0.5 × base, reduces Pass 2 loss (L₂) volatility |
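The PIW warmup can be sketched as a function of the training step. The table only states that the coefficient goes from 0 to 1 over 5000 steps, so the linear ramp shape and the function name `piw_coefficient` are assumptions.

```python
def piw_coefficient(step: int, warmup_steps: int = 5000) -> float:
    """Injection coefficient: 0 at step 0, 1 from warmup_steps onward."""
    if warmup_steps <= 0:
        return 1.0  # warmup disabled: injection fully on
    # Linear ramp (assumed shape), clamped to [0, 1].
    return min(1.0, max(0.0, step / warmup_steps))


coeff_start = piw_coefficient(0)
coeff_mid = piw_coefficient(2500)
coeff_end = piw_coefficient(5000)
coeff_late = piw_coefficient(20000)
```

The returned value is the scalar that multiplies pol_feat before it is concatenated into the GRU₂ input.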
6. Design Decisions and Rationale
| Design | Choice | Rationale |
|---|---|---|
| Injection timing | per-iteration (every GRU iteration) | Polarization participates explicitly in every iteration and is not overwritten after a single early injection |
| Injection form | concat into GRU₂ input | Makes polarization a first-class input rather than a hidden bias on context |
| PolEncoder init | normal init | Avoids the polarization path’s gradient being near zero |
| PolEncoder depth / width | 5 layers / 64ch / residual | Provides ample capacity |
| Polarization projection layers | No context / hidden projection layers | GRU₂ consumes pol_feat directly; projection to context / hidden dimensions is unnecessary |
| Pass 2 LR | 0.5 × base | Reduces Pass 2 loss volatility |
| context₂ / hidden₂ | Standard RAFT (no injection) | Concentrates the polarization path on GRU₂ input, avoiding dispersion and mutual drowning |
Architecture validation design
pol_feat detach ablation
- `--pol_feat_detach` flag: cuts the gradient of pol_feat.
- If Pass 1 / Pass 2 output differences disappear after detach, it confirms that the gradient of the polarization branch was being used.
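What the detach flag does to the gradient can be sketched with two toy parameters. Here `encoder_weight` stands in for the PolEncoder and `gru_weight` for GRU₂; both names, the helper `maybe_detach`, and the toy loss are hypothetical.

```python
import torch


def maybe_detach(pol_feat: torch.Tensor, pol_feat_detach: bool) -> torch.Tensor:
    """Cut the gradient flowing back into the encoder when the flag is set."""
    return pol_feat.detach() if pol_feat_detach else pol_feat


def encoder_grad(pol_feat_detach: bool):
    encoder_weight = torch.ones(3, requires_grad=True)  # stand-in for PolEncoder
    gru_weight = torch.ones(3, requires_grad=True)      # stand-in for GRU₂
    pol_feat = 2.0 * encoder_weight
    loss = (maybe_detach(pol_feat, pol_feat_detach) * gru_weight).sum()
    loss.backward()
    return encoder_weight.grad, gru_weight.grad


grad_enc_on, grad_gru_on = encoder_grad(pol_feat_detach=False)
grad_enc_off, grad_gru_off = encoder_grad(pol_feat_detach=True)
```

With the flag set, the downstream consumer still sees the same pol_feat values and still receives its own gradient, while the encoder gets none (`grad_enc_off` is `None`).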
Glass-only tail statistics
- New segmentation-tail statistics: `glass_epe_median`, `glass_epe_p90`, `glass_epe_p95`.
- Recorded per pass: `pass1_glass_epe_p90` vs `pass2_glass_epe_p90`.
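The tail statistics can be sketched with the standard library alone. The assumptions are that per-pixel end-point errors and a boolean glass mask are available as flat sequences, and that percentiles use linear interpolation between order statistics. The helper name `glass_tail_stats` and the `prefix` argument are hypothetical.

```python
import statistics


def glass_tail_stats(epe, glass_mask, prefix=""):
    """Median, P90 and P95 of the end-point error over glass pixels only."""
    glass = [e for e, is_glass in zip(epe, glass_mask) if is_glass]
    if len(glass) < 2:
        raise ValueError("need at least two glass pixels for percentiles")
    # 99 cut points; index k - 1 holds the k-th percentile.
    pct = statistics.quantiles(glass, n=100, method="inclusive")
    return {
        f"{prefix}glass_epe_median": statistics.median(glass),
        f"{prefix}glass_epe_p90": pct[89],
        f"{prefix}glass_epe_p95": pct[94],
    }


# Toy data: 101 glass pixels with errors 0..100, plus 10 non-glass outliers
# that the mask excludes from the statistics.
epe = [float(i) for i in range(101)] + [1000.0] * 10
mask = [True] * 101 + [False] * 10
pass1_stats = glass_tail_stats(epe, mask, prefix="pass1_")
```

Calling the helper once per pass with `prefix="pass1_"` and `prefix="pass2_"` yields the keys compared above.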
7. Polarization Injection Points
| Injection point | Location | Form | Control |
|---|---|---|---|
| Pass 1 — Pol Volume | correlation space | V2-C: gradient gating + scheduled residual | Always fully on |
| Pass 2 — GRU₂ input | UpdateBlockWithPolFeat input concatenation | per-iteration first-class: cat[motion, context₂, PIW * pol_feat] | Controlled by PIW (0→1 over 5000 steps) |
The Pass 2 polarization injection point is at the “GRU₂ input (concat, every iteration)”; context₂ / hidden₂ have no polarization injection and remain in standard RAFT form.
8. Highlights
- Polarization becomes a first-class input: pol_feat is concatenated directly into the GRU₂ input (256→320 dim), participating explicitly in every iteration. It is no longer hidden in a context bias, and it is not overwritten after a single early injection.
- Centralized polarization path: context₂ / hidden₂ return to standard RAFT, and polarization is carried entirely by the GRU₂ input. This avoids dispersing the polarization signal across multiple injection points where they could drown each other out.
- Gradient-healthy PolEncoder: a 5-layer / 64ch / residual / normal-init design provides ample capacity and ensures the polarization path’s gradient is not near zero.
- Verifiable polarization contribution: built-in pol_feat detach ablation and glass-only tail metrics (P90/P95) make “whether the polarization branch is actually being used” a directly observable indicator.