Blueprint · 2026

Polarization-Guided Segmented Matching System

This document describes the architectural design of the Polarization-Guided Segmented Matching system: the Step 1-3 pipeline, injection point, Soft/Hard modes, and the `inject_polarization_confidence` code. It focuses on the architecture and design rationale and does not cover experimental results or performance numbers.

  • stereo matching
  • polarization
  • RAFT-Stereo

Using these blueprints

Everything here is an architecture proposal I designed and chose to publish openly. Free to use, adapt, or build on — no permission needed.

If one turns out useful and crediting is convenient, a link back to this site is appreciated. It's never required.

1. Design Goals

1.1 Core Insight: Polarization Inherently Cannot Do Alignment

Polarization images cannot be aligned by correlation: over transparent regions such as glass, I∥ and I⊥ have completely different brightness, so correlation cannot find matching points and the polarization cost volume has no peak at the ground-truth (GT) disparity.

But this “defect” can be turned into an “advantage”: the polarization signal directly tells the model “this is glass; do not trust the correlation”. The design goal of this system is to use polarization as a physical confidence arbiter, overriding the confidence produced by stereo matching.

1.2 Problem: OT-Inferred Confidence Is Unreliable

In the original S2M2, confidence is inferred by Optimal Transport. The problem is that on synthetic data the glass regions are also assigned high confidence, so the model cannot use confidence to identify “regions where correlation should not be trusted”. The polarization-guided system instead overrides confidence directly using polarization physics rules.

| Method | Confidence source | Characteristics |
|---|---|---|
| Original S2M2 | OT-inferred | Glass also gets high confidence on synthetic data, so it is unreliable |
| Polarization-guided | Polarization physics rule | Direct override; does not depend on the model learning it |

2. Architecture: Three Steps of Segmented Matching

[Figure: three-step polarization-guided segmented matching flow]


3. Components and Modules (Three Steps in Detail)

3.1 Step 1 — Polarization Detection

  • pol_diff = mean(|I∥_RGB - I⊥_RGB|, dim=channel): averaging the absolute per-channel difference over RGB is the most stable choice (B > G > R).
  • I∥ = left (left camera), I⊥ = right (right camera).
  • glass_prob = sigmoid(20 * (pol_diff - 0.05)): convert pol_diff into glass probability with a sigmoid.

3.2 Step 2 — Dilation + Confidence Override

  • glass_prob = gaussian_blur(glass_prob, kernel=21): dilate the glass probability map.
  • Γ_modified = Γ * (1 - glass_prob): confidence over glass regions is suppressed, regardless of what OT produces.

3.3 Step 3 — Global Refinement Propagates Automatically

  • Disparity from non-glass regions (high confidence) propagates into glass regions (low confidence).
  • The disparity from surrounding aligned regions is interpolated into the glass regions.
  • Similar to a bilateral filter: neighbors that are spatially close and feature-similar have the largest influence.
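
The propagation idea can be illustrated with a toy one-dimensional model. This is only a sketch of confidence-weighted averaging, not GlobalRefiner's actual implementation: the function name `propagate_1d`, the `radius` and `sigma` parameters, and the sample values are assumptions made for illustration, and the feature-similarity term of a bilateral filter is left out for brevity.

```python
import math


def propagate_1d(disparity, confidence, radius=3, sigma=2.0):
    """Confidence-weighted spatial averaging along one scanline (toy model)."""
    n = len(disparity)
    out = []
    for x in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, x - radius), min(n, x + radius + 1)):
            # Weight = neighbor confidence * spatial Gaussian falloff.
            w = confidence[j] * math.exp(-((j - x) ** 2) / (2.0 * sigma ** 2))
            num += w * disparity[j]
            den += w
        # With no confident neighbor in range, keep the original value.
        out.append(num / den if den > 0.0 else disparity[x])
    return out


# Scanline with a two-pixel glass patch whose matched disparity (3.0) is wrong.
disparity = [10.0, 10.0, 10.0, 3.0, 3.0, 10.0, 10.0, 10.0]
# Confidence after the polarization override: glass pixels suppressed to 0.
confidence = [1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0]

refined = propagate_1d(disparity, confidence)
print(refined)  # glass pixels now take the disparity of their confident neighbors
```

Because the glass pixels carry zero weight, their wrong disparity never enters the average, and the values inside the patch are filled entirely from the surrounding non-glass pixels.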

3.4 Why Dilation Is Needed

Specular reflection produces a measurable polarization signal only where the Fresnel reflection is strongly polarized (near the Brewster angle). In the center of a glass surface viewed at near-normal incidence, the polarization difference may be very small, so the pol_diff mask may cover only part of the glass and must be dilated outward.

Gaussian blur was chosen because it is smoother than max_pool and gives a gradual confidence falloff at edges. A 21x21 kernel reaches 10 px on each side of a detected pixel at 1/4 resolution, which corresponds to roughly 40 px in the original image. That is enough to cover regions that the Brewster-angle detection misses.
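
The reach of the blur can be checked with a short calculation. The sketch below builds a normalized 1D Gaussian kernel using the default sigma that torchvision documents for `gaussian_blur` when none is given (0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8, which is 3.5 for a kernel of 21), then evaluates how a hard mask edge falls off outside the mask. The helper names are hypothetical, and the hard-edged mask is a simplification that ignores the softness of the sigmoid and the image-border padding.

```python
import math


def gaussian_kernel_1d(kernel_size=21, sigma=None):
    """Normalized 1D Gaussian taps; default sigma follows torchvision's documented rule."""
    if sigma is None:
        sigma = 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8  # 3.5 for kernel_size=21
    half = (kernel_size - 1) // 2
    taps = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def falloff_outside_edge(distance, kernel):
    """Blurred mask value `distance` pixels outside a hard mask edge.

    The mask is 1 at positions <= 0 and 0 beyond; the tap at offset i
    samples position distance + i.
    """
    half = (len(kernel) - 1) // 2
    offsets = range(-half, half + 1)
    return sum(w for i, w in zip(offsets, kernel) if distance + i <= 0)


kernel = gaussian_kernel_1d()
reach_quarter_res = (len(kernel) - 1) // 2  # 10 px at 1/4 resolution
reach_full_res = 4 * reach_quarter_res      # 40 px in the original image

# Blurred glass probability 1..11 px outside the detected edge (1/4-resolution pixels).
profile = [falloff_outside_edge(d, kernel) for d in range(1, 12)]
for d, value in enumerate(profile, start=1):
    print(f"{d:2d} px outside edge -> {value:.3f}")
```

The profile decays smoothly and reaches zero beyond 10 px, so the 40 px figure is the maximum reach of the kernel; the suppression close to that limit is weak.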


4. Parameter Design

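| Parameter | Value | Source / Rationale |
|---|---|---|
| pol_diff computation | mean(abs(I∥ - I⊥), dim=C) | averaging the absolute per-channel difference is most stable (B > G > R) |
| threshold | 0.05 | glass=0.124, non-glass=-0.016, separation=0.14; pick the lower-middle |
| k | 20 | sigmoid steepness: glass_prob is 0.5 at the threshold, falls toward 0 below 0.05 and rises toward 1 above 0.1 (about 0.27 at pol_diff = 0 and 0.73 at pol_diff = 0.1) |
| Dilation method | Gaussian blur 21x21 | smoother than max_pool; ~10 px extension at 1/4 resolution (~40 px in the original image) |
| Injection location | After Γ from OT, before global refinement | minimal change |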
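
To make the effect of threshold = 0.05 and k = 20 concrete, the following sketch evaluates the sigmoid at a few pol_diff values, including the glass mean of 0.124 quoted above. The function name `glass_prob` mirrors the variable used in this document; the choice of sample points is an assumption made for illustration.

```python
import math


def glass_prob(pol_diff, threshold=0.05, k=20.0):
    """Sigmoid mapping from polarization difference to glass probability."""
    return 1.0 / (1.0 + math.exp(-k * (pol_diff - threshold)))


for value in (0.0, 0.05, 0.10, 0.124):
    p = glass_prob(value)
    # (1 - p) is the factor that Soft Mode multiplies onto the OT confidence.
    print(f"pol_diff={value:.3f}  glass_prob={p:.3f}  conf multiplier={1.0 - p:.3f}")

# pol_diff=0.000  glass_prob=0.269  conf multiplier=0.731
# pol_diff=0.050  glass_prob=0.500  conf multiplier=0.500
# pol_diff=0.100  glass_prob=0.731  conf multiplier=0.269
# pol_diff=0.124  glass_prob=0.815  conf multiplier=0.185
```

With k = 20 the transition is gradual rather than a step: even at zero polarization difference the confidence is scaled by about 0.73, so a larger k would be needed if a sharper cut-off around the threshold is wanted.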

5. Soft Mode and Hard Mode

The system provides two confidence-override modes:

| Mode | Override rule | Description |
|---|---|---|
| Soft Mode | conf_modified = conf * (1 - glass_prob) | continuous decay; higher glass probability suppresses confidence more |
| Hard Mode | conf = 0.1 if pol_diff > threshold | binarized; glass regions are set to 0.1 directly (below GlobalRefiner’s 0.2 threshold) |

The 0.1 in Hard Mode is intentionally below GlobalRefiner’s internal 0.2 threshold for “trustworthy regions”, ensuring that glass regions are treated as untrustworthy and trigger propagation.
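
The difference between the two modes can be seen on single pixel values. The sketch below applies both rules to scalars for clarity; the function names, the `glass_conf` parameter, and the sample OT confidence of 0.9 are assumptions made for illustration, and the Gaussian dilation step is omitted.

```python
import math

# GlobalRefiner's trust threshold as stated in the text above.
REFINER_TRUST_THRESHOLD = 0.2


def soft_override(conf, pol_diff, threshold=0.05, k=20.0):
    """Soft Mode: scale confidence by (1 - glass_prob)."""
    glass_prob = 1.0 / (1.0 + math.exp(-k * (pol_diff - threshold)))
    return conf * (1.0 - glass_prob)


def hard_override(conf, pol_diff, threshold=0.05, glass_conf=0.1):
    """Hard Mode: force a fixed low confidence wherever pol_diff exceeds the threshold."""
    return glass_conf if pol_diff > threshold else conf


ot_conf = 0.9  # hypothetical high confidence produced by OT
for pol_diff in (0.02, 0.08, 0.124):
    soft = soft_override(ot_conf, pol_diff)
    hard = hard_override(ot_conf, pol_diff)
    print(
        f"pol_diff={pol_diff:.3f}  soft={soft:.3f} "
        f"(untrusted: {soft < REFINER_TRUST_THRESHOLD})  "
        f"hard={hard:.3f} (untrusted: {hard < REFINER_TRUST_THRESHOLD})"
    )
```

For a moderate signal such as pol_diff = 0.08, Soft Mode leaves the confidence above 0.2 while Hard Mode pushes it to 0.1, which is the case where Hard Mode's guarantee of triggering propagation matters.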


6. Implementation Code

Insert before the global_refiner call (about 10 lines):

import torch
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur


def inject_polarization_confidence(left, right, pred_conf, threshold=0.05, k=20):
    """
    Override confidence using the polarization signal so that global refinement
    propagates automatically.

    Args:
        left: (B, C, H, W) I∥ image, in [0, 255]
        right: (B, C, H, W) I⊥ image, in [0, 255]
        pred_conf: (B, 1, H/4, W/4) confidence output by OT
        threshold: pol_diff threshold
        k: sigmoid steepness

    Returns:
        modified confidence, same shape as pred_conf
    """
    # 1. Compute pol_diff (original resolution). Cast to float first so that
    #    uint8 inputs do not wrap around on subtraction.
    pol_diff = (left.float() - right.float()).abs().mean(dim=1, keepdim=True) / 255.0  # (B,1,H,W)

    # 2. Downsample to 1/4 resolution (to match confidence)
    pol_diff_4x = F.interpolate(pol_diff, size=pred_conf.shape[-2:], mode='bilinear', align_corners=True)

    # 3. Sigmoid -> glass probability
    glass_prob = torch.sigmoid(k * (pol_diff_4x - threshold))

    # 4. Gaussian blur for dilation (sigma is left at torchvision's default)
    glass_prob = gaussian_blur(glass_prob, kernel_size=[21, 21])

    # 5. Override confidence: suppress it where glass is likely
    return pred_conf * (1 - glass_prob)

7. Tensor Dimensions

| Tensor | Shape | Description |
|---|---|---|
| left (I∥) / right (I⊥) | (B, C, H, W) | original resolution, range [0, 255] |
| pol_diff | (B, 1, H, W) | per-channel mean, normalized by /255 |
| pol_diff_4x | (B, 1, H/4, W/4) | downsampled to match confidence |
| glass_prob | (B, 1, H/4, W/4) | after sigmoid, dilated by gaussian_blur(21x21) |
| pred_conf (Γ) | (B, 1, H/4, W/4) | confidence output by OT |
| output conf_modified | (B, 1, H/4, W/4) | pred_conf * (1 - glass_prob) |

8. Polarization Injection Points

Injection location: after Γ (confidence) from OT, before global refinement.

[Figure: polarization confidence injection location]

This is the minimal-change injection point: it does not modify any S2M2 network weights, only inserting about 10 lines of confidence override before the GlobalRefiner call. The polarization signal does not participate in feature extraction or matching; it is a pure physical rule that overrides confidence, letting GlobalRefiner’s propagation mechanism automatically bring disparity from non-glass regions into glass regions.
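
The wiring of the injection point can be sketched as a thin wrapper around three stages. The stage interfaces here are hypothetical stand-ins, not S2M2's real API: `match_and_score` is assumed to return a disparity and an OT confidence, and `global_refiner` is assumed to take both. Only the position of the override call reflects the design described above.

```python
from typing import Any, Callable, Tuple


def run_with_override(
    left: Any,
    right: Any,
    match_and_score: Callable[[Any, Any], Tuple[Any, Any]],  # hypothetical matching + OT stage
    override_confidence: Callable[[Any, Any, Any], Any],     # e.g. inject_polarization_confidence
    global_refiner: Callable[[Any, Any], Any],               # hypothetical refinement stage
) -> Any:
    """Run matching, override the confidence, then refine."""
    disparity, confidence = match_and_score(left, right)
    # The only inserted step: the polarization rule replaces the OT confidence.
    confidence = override_confidence(left, right, confidence)
    return global_refiner(disparity, confidence)


# Stand-in stages that record the call order, for demonstration only.
calls = []


def fake_match(l, r):
    calls.append("match")
    return "disparity", 0.9


def fake_override(l, r, conf):
    calls.append("override")
    return conf * 0.1


def fake_refiner(disp, conf):
    calls.append("refine")
    return disp, conf


result = run_with_override("I_par", "I_perp", fake_match, fake_override, fake_refiner)
print(calls)  # ['match', 'override', 'refine']
```

Keeping the override as a separate callable means the matching and refinement stages stay untouched, which matches the zero-weight-change goal of this design.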


9. Design Decisions and Rationale

| Decision | Rationale |
|---|---|
| Turn “polarization cannot align” into an advantage | Polarization directly tells the model “this is glass; do not trust the correlation” |
| Override confidence with a physical rule | OT-inferred confidence is also high over glass on synthetic data, so it is unreliable |
| Inject after Γ and before refinement | Minimal change; no training, no weight changes |
| pol_diff via per-channel mean | averaging the absolute per-channel difference is the most stable (B > G > R) |
| threshold = 0.05 | glass=0.124, non-glass=-0.016, separation=0.14; pick the lower-middle |
| k = 20 | sigmoid steepness: glass_prob is 0.5 at the threshold, falls toward 0 below 0.05 and rises toward 1 above 0.1 (about 0.27 at pol_diff = 0 and 0.73 at pol_diff = 0.1) |
| Dilate with Gaussian blur (not max_pool) | Smoother, with gradual confidence falloff at edges |
| Provide both Soft / Hard modes | Soft for continuous decay; Hard for binarization set below GlobalRefiner’s 0.2 threshold |

9.1 Expected Applicable Conditions

Driven purely by physical rules; no training required. By design, it is expected to deliver the most value in scenarios where “correlation truly fails over glass regions” (real-world, pretrained models).


10. Highlights

  • Turns a defect into an advantage: polarization inherently cannot do alignment, yet it is used as a physical arbiter signaling “this region is untrustworthy” and directly overriding confidence.
  • Zero training, zero weight changes: only about 10 lines of confidence override are inserted before the GlobalRefiner call; no network weights are modified.
  • Leverages the existing propagation mechanism: after confidence over glass regions is suppressed, GlobalRefiner’s propagation automatically interpolates disparity from nearby non-glass regions into the glass regions.
  • Dilation fills the Brewster-angle blind spot: a Gaussian blur dilates the glass probability map outward to cover the glass-center regions where, under near-normal incidence, the polarization difference is too weak to be detected.
  • Soft / Hard dual modes: Soft preserves gradual decay; Hard binarizes and is intentionally set below GlobalRefiner’s internal 0.2 threshold to guarantee propagation is triggered.
