Polarization-Guided Segmented Matching System

1. Design Goals

1.1 Core Insight: Polarized Image Pairs Cannot Be Aligned by Correlation

Polarized image pairs cannot be aligned by correlation. Over transparent regions such as glass, I∥ and I⊥ have completely different brightness, so correlation cannot find matching points, and the polarization cost volume has no peak at the ground-truth (GT) disparity.

But this “defect” can be turned into an “advantage”: the polarization signal directly tells the model “this is glass; do not trust the correlation”. The design goal of this system is to use polarization as a physical confidence arbiter, overriding the confidence produced by stereo matching.

1.2 Problem: OT-Inferred Confidence Is Unreliable

In the original S2M2, confidence is inferred by Optimal Transport (OT). The problem is that on synthetic data, glass regions are also assigned high confidence, so the model cannot use confidence to identify regions where correlation should not be trusted. The polarization-guided system instead overrides confidence directly using polarization physics rules.

Method	Confidence source	Characteristics
Original S2M2	OT-inferred	Glass also gets high confidence on synthetic data — unreliable
Polarization-guided	Polarization physics rule	Direct override; does not depend on the model learning it

2. Architecture: Three Steps of Segmented Matching

[Figure: Three-step polarization-guided segmented matching flow]

3. Components and Modules (Three Steps in Detail)

3.1 Step 1 — Polarization Detection

pol_diff = mean(|I∥_RGB - I⊥_RGB|, dim=channel), with images scaled to [0, 1]: averaging the per-channel absolute differences is the most stable choice (B > G > R).
I∥ is the left image (left camera) and I⊥ is the right image (right camera).
glass_prob = sigmoid(20 * (pol_diff - 0.05)): convert pol_diff into glass probability with a sigmoid.
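A minimal PyTorch sketch of Step 1, assuming the same conventions as the implementation in Section 6 (images in [0, 255], pol_diff scaled to [0, 1]); the helper name `glass_probability` is hypothetical. It also tabulates what the default k = 20 and threshold = 0.05 produce for a few pol_diff values.

```python
import torch


def glass_probability(left, right, threshold=0.05, k=20.0):
    """Step 1 sketch (hypothetical helper): pol_diff and sigmoid glass probability.

    left, right: (B, C, H, W) I∥ / I⊥ images in [0, 255], any dtype.
    Returns (pol_diff, glass_prob), both of shape (B, 1, H, W).
    """
    # Cast to float first so uint8 inputs do not wrap around on subtraction.
    diff = (left.float() - right.float()).abs()
    # Average the absolute difference over the color channels, scale to [0, 1].
    pol_diff = diff.mean(dim=1, keepdim=True) / 255.0
    # Sigmoid centered on the threshold; k controls how sharp the transition is.
    glass_prob = torch.sigmoid(k * (pol_diff - threshold))
    return pol_diff, glass_prob


# How the default k = 20, threshold = 0.05 map representative pol_diff values.
samples = torch.tensor([0.0, 0.05, 0.1, 0.124, 0.2])
probs = torch.sigmoid(20.0 * (samples - 0.05))
for d, p in zip(samples.tolist(), probs.tolist()):
    print(f"pol_diff={d:.3f} -> glass_prob={p:.3f}")
```

The transition is fairly soft: pol_diff = 0 still maps to glass_prob ≈ 0.27 (so in Soft mode a non-glass pixel keeps roughly 73% of its confidence), 0.05 maps to exactly 0.5, 0.1 to ≈ 0.73, and the glass value of 0.124 from the parameter table to ≈ 0.81. Because I∥ and I⊥ are compared at the same pixel coordinates without disparity warping, pol_diff will also respond wherever parallax shifts textured content between the two views; this sketch does not try to separate that effect from glass.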

3.2 Step 2 — Dilation + Confidence Override

glass_prob = gaussian_blur(glass_prob, kernel=21): dilate the glass probability map.
Γ_modified = Γ * (1 - glass_prob): confidence over glass regions is suppressed, regardless of what OT produces.

3.3 Step 3: Propagation in Global Refinement

GlobalRefiner then propagates disparity from non-glass regions (high confidence) into glass regions (low confidence): disparity from the surrounding, correctly aligned regions is interpolated into the glass. The behavior is similar to a bilateral filter, in which neighbors that are spatially close and similar in feature space have the largest influence.
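To make the bilateral-filter analogy concrete, here is a toy, loop-based confidence-weighted fill. It illustrates the analogy only and is not GlobalRefiner's actual mechanism; `propagate_disparity` and its parameters (`radius`, `sigma_s`, `sigma_f`) are hypothetical. Each output disparity is a weighted average of neighboring disparities, with weight = neighbor confidence × spatial Gaussian × feature-similarity Gaussian, so a glass pixel whose confidence has been driven to zero contributes nothing and is filled from its trusted neighbors.

```python
import torch


def propagate_disparity(disp, conf, feat, radius=3, sigma_s=2.0, sigma_f=0.1):
    """Toy confidence-weighted, bilateral-style fill (illustration, not GlobalRefiner).

    disp, conf: (H, W) tensors; feat: (C, H, W) feature map.
    """
    H, W = disp.shape
    offsets = torch.arange(-radius, radius + 1)
    dy, dx = torch.meshgrid(offsets, offsets, indexing="ij")
    # Spatial term: nearby neighbors count more.
    w_spatial = torch.exp(-(dy**2 + dx**2).float() / (2 * sigma_s**2))
    out = torch.empty_like(disp)
    for y in range(H):
        for x in range(W):
            # Neighbor coordinates, clamped at the image border.
            yy = (y + dy).clamp(0, H - 1)
            xx = (x + dx).clamp(0, W - 1)
            # Feature term: neighbors with similar features count more.
            f_dist = (feat[:, yy, xx] - feat[:, y, x][:, None, None]).pow(2).sum(dim=0)
            w_feat = torch.exp(-f_dist / (2 * sigma_f**2))
            # Confidence term: suppressed (glass) pixels contribute nothing.
            w = conf[yy, xx] * w_spatial * w_feat
            out[y, x] = (w * disp[yy, xx]).sum() / w.sum().clamp_min(1e-8)
    return out


# Toy scene: disparity 10 on the left, 20 on the right, and a glass column 4
# holding a bogus value whose confidence has been overridden to zero.
disp = torch.full((5, 9), 10.0)
disp[:, 5:] = 20.0
disp[:, 4] = 50.0
conf = torch.ones(5, 9)
conf[:, 4] = 0.0
same_feat = torch.zeros(1, 5, 9)   # glass features match both sides
left_like = torch.zeros(1, 5, 9)
left_like[:, :, 5:] = 1.0          # glass features match only the left side
print(propagate_disparity(disp, conf, same_feat)[2, 4].item())
print(propagate_disparity(disp, conf, left_like)[2, 4].item())
```

With uniform features, the glass column between the disparity-10 and disparity-20 regions is filled with their symmetric average, 15; when the glass pixel's features match only the left region, it takes that region's disparity instead.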

3.4 Why Dilation Is Needed

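Specular reflection only produces a strong polarization signal where the reflected light is strongly polarized, i.e. near the Brewster angle, where the Fresnel reflectance for p-polarized light drops to zero. Near normal incidence (typically the center of a glass pane facing the camera), the s- and p-reflectances are almost equal, so the polarization difference may be very small. The pol_diff mask may therefore cover only part of the glass and must be dilated outward.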

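Reason for choosing Gaussian blur: it is smoother than max_pool and gives a gradual confidence falloff at the edges. At 1/4 resolution, a 21x21 kernel has a support radius of 10 px, i.e. up to about 40 px of extension in the original image, which is intended to cover the regions that the Brewster-angle detection misses. With torchvision's default σ (0.15 · 21 + 0.35 = 3.5 px at 1/4 resolution), most of the kernel weight lies within about 2σ ≈ 7 px, so the outer part of that band receives only a weak tail.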
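The reach of the 21-tap blur can be checked with a 1-D sketch. It rebuilds the kernel with torchvision's default σ rule (0.15 · k + 0.35) and, unlike torchvision, uses zero padding; the glass layout (a 40 px pane at 1/4 resolution whose outer 8 px on each side were detected while the 24 px center was missed) is a made-up example.

```python
import torch
import torch.nn.functional as F


def gaussian_kernel1d(kernel_size=21, sigma=None):
    # Same default sigma rule as torchvision's gaussian_blur: 0.15 * k + 0.35.
    if sigma is None:
        sigma = 0.15 * kernel_size + 0.35
    x = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    weights = torch.exp(-x**2 / (2 * sigma**2))
    return weights / weights.sum()


def blur1d(mask, kernel_size=21):
    """Blur a 1-D mask with zero padding (torchvision pads with reflection)."""
    kernel = gaussian_kernel1d(kernel_size).view(1, 1, -1)
    return F.conv1d(mask.view(1, 1, -1), kernel, padding=kernel_size // 2).view(-1)


scale = 4            # confidence lives at 1/4 resolution
radius = 21 // 2     # support radius of the 21-tap kernel
print(f"support radius: {radius} px at 1/4 res = {radius * scale} px full res")

# 1/4-res row: glass spans columns 20..59, but only columns 20..27 and 52..59
# produced a pol_diff response; the 24 px center was missed.
mask = torch.zeros(80)
mask[20:28] = 1.0
mask[52:60] = 1.0
blurred = blur1d(mask)
for col in (24, 30, 37, 40):
    print(col, round(blurred[col].item(), 4))
```

In this layout, column 40, which is 12 to 13 px from the nearest detected pixel at 1/4 resolution, stays exactly 0 because it lies outside the 10 px support. A central gap wider than about 20 px at 1/4 resolution (about 80 px in the original image) is therefore not fully reached by the blur alone.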

4. Parameter Design

Parameter	Value	Source / Rationale
pol_diff computation	`mean(|I∥ - I⊥|, dim=C)`	per-channel difference averaged is most stable (B > G > R)
threshold	0.05	glass=0.124, non-glass=-0.016, separation=0.14; pick the lower-middle
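k	20	sigmoid steepness; soft transition: glass_prob ≈ 0.27 at pol_diff = 0, 0.5 at 0.05, 0.73 at 0.1, 0.81 at 0.124
Dilation method	Gaussian blur 21x21	smoother than max_pool; ~10 px extension at 1/4 resolution (~40 px in the original image)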
Injection location	After Γ from OT, before global refinement	minimal change

5. Soft Mode and Hard Mode

The system provides two confidence-override modes:

Mode	Override rule	Description
Soft Mode	`conf_modified = conf * (1 - glass_prob)`	continuous decay; higher glass probability suppresses confidence more
Hard Mode	`conf = 0.1 if pol_diff > threshold`	binarized; glass regions are set to 0.1 directly (below GlobalRefiner’s 0.2 threshold)

The 0.1 in Hard Mode is intentionally below GlobalRefiner’s internal 0.2 threshold for “trustworthy regions”, ensuring that glass regions are treated as untrustworthy and trigger propagation.

6. Implementation Code

Insert before the global_refiner call (about 10 lines):

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF


def inject_polarization_confidence(left, right, pred_conf, threshold=0.05, k=20):
    """
    Override confidence using the polarization signal so that global refinement
    propagates automatically.

    Args:
        left: (B, C, H, W) I∥ image, in [0, 255]
        right: (B, C, H, W) I⊥ image, in [0, 255]
        pred_conf: (B, 1, H/4, W/4) confidence output by OT
        threshold: pol_diff threshold
        k: sigmoid steepness

    Returns:
        modified confidence, (B, 1, H/4, W/4)
    """
    # 1. Compute pol_diff (original resolution). Cast to float first: with uint8
    #    inputs, left - right would wrap around instead of going negative.
    pol_diff = (left.float() - right.float()).abs().mean(dim=1, keepdim=True) / 255.0  # (B,1,H,W)

    # 2. Downsample to 1/4 resolution (to match confidence). 'area' averages every
    #    pixel in each 4x4 block; bilinear would read only the pixels nearest each sample.
    pol_diff_4x = F.interpolate(pol_diff, size=pred_conf.shape[-2:], mode='area')

    # 3. Sigmoid -> glass probability
    glass_prob = torch.sigmoid(k * (pol_diff_4x - threshold))

    # 4. Gaussian blur for dilation (21x21 kernel; torchvision's default sigma is 3.5)
    glass_prob = TF.gaussian_blur(glass_prob, kernel_size=21)

    # 5. Override confidence: Γ_modified = Γ * (1 - glass_prob)
    return pred_conf * (1 - glass_prob)
```

7. Tensor Dimensions

Tensor	Shape	Description
left (I∥) / right (I⊥)	(B, C, H, W)	original resolution, range [0, 255]
pol_diff	(B, 1, H, W)	mean of |I∥ - I⊥| over channels, divided by 255 (range [0, 1])
pol_diff_4x	(B, 1, H/4, W/4)	downsampled to match confidence
glass_prob	(B, 1, H/4, W/4)	after sigmoid, dilated by gaussian_blur(21x21)
pred_conf (Γ)	(B, 1, H/4, W/4)	confidence output by OT
output conf_modified	(B, 1, H/4, W/4)	`pred_conf * (1 - glass_prob)`

8. Polarization Injection Points

Injection location: after Γ (confidence) from OT, before global refinement.

[Figure: Polarization confidence injection location]

This is the minimal-change injection point: it does not modify any S2M2 network weights, only inserting about 10 lines of confidence override before the GlobalRefiner call. The polarization signal does not participate in feature extraction or matching; it is a pure physical rule that overrides confidence, letting GlobalRefiner’s propagation mechanism automatically bring disparity from non-glass regions into glass regions.

9. Design Decisions and Rationale

Decision	Rationale
Turn “polarization cannot align” into an advantage	Polarization directly tells the model “this is glass; do not trust the correlation”
Override confidence with a physical rule	OT-inferred confidence is also high over glass on synthetic data — unreliable
Inject after Γ and before refinement	Minimal change; no training, no weight changes
pol_diff via per-channel mean	per-channel difference averaged is the most stable (B > G > R)
threshold = 0.05	glass=0.124, non-glass=-0.016, separation=0.14; pick the lower-middle
k = 20	sigmoid steepness; gives a soft transition (glass_prob ≈ 0.27 at pol_diff = 0, 0.5 at 0.05, 0.73 at 0.1)
Dilate with Gaussian blur (not max_pool)	Smoother, with gradual confidence falloff at edges
Provide both Soft / Hard modes	Soft for continuous decay; Hard for binarization set below GlobalRefiner’s 0.2 threshold

9.1 Expected Applicable Conditions

Driven purely by physical rules; no training required. By design, it is expected to deliver the most value in scenarios where correlation genuinely fails over glass regions (e.g. real-world images processed by pretrained models).

10. Highlights

Turns a defect into an advantage: polarized image pairs inherently cannot be aligned by correlation, yet the polarization signal is used as a physical arbiter that marks a region as untrustworthy and directly overrides its confidence.
Zero training, zero weight changes: only about 10 lines of confidence override are inserted before the GlobalRefiner call; no network weights are modified.
Leverages the existing propagation mechanism: after confidence over glass regions is suppressed, GlobalRefiner’s propagation automatically interpolates disparity from nearby non-glass regions into the glass regions.
Dilation fills the Brewster-angle blind spot: a Gaussian blur dilates the glass probability map outward to cover the glass-center regions where, under near-normal incidence, the polarization difference is too weak to be detected.
Soft / Hard dual modes: Soft preserves gradual decay; Hard binarizes and is intentionally set below GlobalRefiner’s internal 0.2 threshold to guarantee propagation is triggered.