1. Design Goals
1.1 Core Insight: Polarization Inherently Cannot Do Alignment
Polarization cannot be used for alignment: I∥ and I⊥ differ strongly in brightness over transparent regions such as glass, so correlation cannot find matching points and the polarization cost volume has no peak at the ground-truth (GT) disparity.
But this “defect” can be turned into an “advantage”: the polarization signal directly tells the model “this is glass; do not trust the correlation”. The design goal of this system is to use polarization as a physical confidence arbiter, overriding the confidence produced by stereo matching.
1.2 Problem: OT-Inferred Confidence Is Unreliable
In the original S2M2, confidence is inferred by Optimal Transport. The problem is that on synthetic data the glass regions are also assigned high confidence, so the model cannot use confidence to identify “regions where correlation should not be trusted”. The polarization-guided system instead overrides confidence directly using polarization physics rules.
| Method | Confidence source | Characteristics |
|---|---|---|
| Original S2M2 | OT-inferred | Glass also gets high confidence on synthetic data — unreliable |
| Polarization-guided | Polarization physics rule | Direct override; does not depend on the model learning it |
2. Architecture: Three Steps of Segmented Matching
3. Components and Modules (Three Steps in Detail)
3.1 Step 1 — Polarization Detection
- pol_diff = mean(|I∥_RGB - I⊥_RGB|, dim=channel): the per-channel absolute difference, averaged over channels, is the most stable (B > G > R).
- I∥ = left (left camera), I⊥ = right (right camera).
- glass_prob = sigmoid(20 * (pol_diff - 0.05)): convert pol_diff into a glass probability with a sigmoid.
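To make Step 1 concrete, the following sketch applies the two formulas to single pixels in plain Python. It is only an illustration: the pixel values are invented, inputs are assumed to be already scaled to [0, 1], and the helper names `pol_diff_pixel` and `glass_probability` are hypothetical rather than part of S2M2.

```python
import math


def pol_diff_pixel(par_rgb, perp_rgb):
    """Mean absolute per-channel difference for one pixel (inputs in [0, 1])."""
    return sum(abs(a - b) for a, b in zip(par_rgb, perp_rgb)) / len(par_rgb)


def glass_probability(pol_diff, threshold=0.05, k=20.0):
    """Sigmoid mapping from pol_diff to a glass probability."""
    return 1.0 / (1.0 + math.exp(-k * (pol_diff - threshold)))


# Hypothetical pixels, invented for illustration only.
glass_like = pol_diff_pixel((0.60, 0.55, 0.50), (0.45, 0.42, 0.40))
matte_like = pol_diff_pixel((0.30, 0.30, 0.30), (0.29, 0.30, 0.31))

print(f"glass-like: pol_diff={glass_like:.3f}, prob={glass_probability(glass_like):.3f}")
print(f"matte-like: pol_diff={matte_like:.3f}, prob={glass_probability(matte_like):.3f}")
```

A pixel whose pol_diff equals the threshold maps to exactly 0.5; larger differences move the probability above 0.5 and smaller ones below it.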
3.2 Step 2 — Dilation + Confidence Override
- glass_prob = gaussian_blur(glass_prob, kernel=21): dilate the glass probability map.
- Γ_modified = Γ * (1 - glass_prob): confidence over glass regions is suppressed, regardless of what OT produces.
3.3 Step 3 — Global Refinement Propagates Automatically
- Disparity from non-glass regions (high confidence) propagates into glass regions (low confidence).
- The disparity from surrounding aligned regions is interpolated into the glass regions.
- Similar to a bilateral filter: neighbors that are spatially close and feature-similar have the largest influence.
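The propagation behaviour described in these bullets can be illustrated with a toy 1D filter. This is not GlobalRefiner's implementation, which is not shown in this document; it is a minimal sketch that assumes each output is a weighted average of its neighbours, with weights formed from confidence, spatial distance, and feature similarity. The function name and the `sigma_s` / `sigma_f` parameters are hypothetical.

```python
import math


def propagate_1d(disparity, confidence, feature, sigma_s=3.0, sigma_f=0.1):
    """Confidence-weighted, bilateral-style averaging of a 1D disparity row."""
    n = len(disparity)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            w_feat = math.exp(-((feature[i] - feature[j]) ** 2) / (2 * sigma_f ** 2))
            w = confidence[j] * w_space * w_feat
            num += w * disparity[j]
            den += w
        # Fall back to the input value if no neighbour carries any weight.
        out.append(num / den if den > 0 else disparity[i])
    return out


# Toy row: a wall at disparity 10 with a 3-pixel "glass" patch matched wrongly at 3.
disparity = [10.0] * 4 + [3.0] * 3 + [10.0] * 4
feature = [0.5] * 11                              # identical appearance everywhere
suppressed = [1.0] * 4 + [0.0] * 3 + [1.0] * 4    # glass confidence overridden to 0
unsuppressed = [1.0] * 11                         # confidence left untouched

refined = propagate_1d(disparity, suppressed, feature)
baseline = propagate_1d(disparity, unsuppressed, feature)
print([round(d, 2) for d in refined])
print([round(d, 2) for d in baseline])
```

With the glass confidence set to zero, only the surrounding wall pixels contribute, so the patch is filled with the wall disparity. With confidence left untouched, the wrong glass matches still pull the result away from the wall.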
3.4 Why Dilation Is Needed
Specular reflection produces a usable polarization signal only where the reflected light is strongly polarized (near the Brewster angle). In the center of a glass surface with near-normal incidence, the polarization difference may be very small, so the pol_diff mask may cover only part of the glass and must be dilated outward.
Gaussian blur was chosen because it is smoother than max_pool and gives a gradual confidence falloff at edges. A 21-tap kernel reaches 10 px to each side at 1/4 resolution, which corresponds to roughly 40 px of extension in the original image, enough to cover regions that the Brewster-angle detection misses.
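The extent arithmetic and the blur-versus-max_pool comparison can be checked with a small 1D experiment in plain Python. This is a sketch under stated assumptions: sigma uses the formula torchvision applies when `sigma` is omitted, 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8, borders are handled by clamping for simplicity, and all helper names are hypothetical.

```python
import math


def gaussian_kernel_1d(kernel_size, sigma):
    """Normalized 1D Gaussian kernel with an odd number of taps."""
    half = (kernel_size - 1) // 2
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(signal, kernel):
    """Convolve with the kernel, clamping indices at the borders."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = min(max(i + t - half, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def max_pool_1d(signal, kernel_size):
    """Stride-1 max pooling with the window truncated at the borders."""
    half = kernel_size // 2
    n = len(signal)
    return [max(signal[max(i - half, 0):min(i + half + 1, n)]) for i in range(n)]


kernel_size = 21
half_width = (kernel_size - 1) // 2                  # reach at 1/4 resolution
full_res_extent = half_width * 4                     # reach in the original image
sigma = 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8    # assumed default sigma

# A 10-pixel detected glass strip (at 1/4 resolution) inside a 70-pixel row.
mask = [0.0] * 30 + [1.0] * 10 + [0.0] * 30
blurred = blur_1d(mask, gaussian_kernel_1d(kernel_size, sigma))
pooled = max_pool_1d(mask, kernel_size)

print(half_width, full_res_extent, round(sigma, 2))
print([round(v, 3) for v in blurred[18:31]])
print(pooled[18:31])
```

In this toy row the blurred map rises gradually over the 10 pixels next to the strip, while max pooling jumps straight from 0 to 1. The blur also lowers the values inside a narrow strip below 1, which max pooling does not, so the two are not interchangeable for thin detections.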
4. Parameter Design
| Parameter | Value | Source / Rationale |
|---|---|---|
| pol_diff computation | mean(|I∥ - I⊥|, dim=C) | per-channel difference averaged is most stable (B > G > R) |
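| threshold | 0.05 | glass=0.124, non-glass=-0.016, separation=0.14; chosen slightly below the midpoint (0.054) |
| k | 20 | sigmoid steepness: values below 0.05 are pushed toward 0 and values above 0.1 toward 1; at k = 20 the transition is gradual (glass_prob ≈ 0.27 at pol_diff = 0, ≈ 0.73 at 0.1) |
| Dilation method | Gaussian blur 21x21 | smoother than max_pool; ~10 px extension at 1/4 resolution (~40 px in the original image) |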
| Injection location | After Γ from OT, before global refinement | minimal change |
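The threshold and steepness rows can be checked by hand. The worked arithmetic below takes the glass and non-glass values exactly as reported in the table and uses the standard logistic function σ(z) = 1 / (1 + e^(-z)); nothing else is assumed.

```latex
\begin{align*}
\text{separation} &= 0.124 - (-0.016) = 0.140, &
\text{midpoint} &= \tfrac{1}{2}\bigl(0.124 + (-0.016)\bigr) = 0.054,\\
\sigma\bigl(20\,(0 - 0.05)\bigr) &= \sigma(-1) \approx 0.269, &
\sigma\bigl(20\,(0.1 - 0.05)\bigr) &= \sigma(1) \approx 0.731,\\
\sigma\bigl(20\,(-0.016 - 0.05)\bigr) &= \sigma(-1.32) \approx 0.211, &
\sigma\bigl(20\,(0.124 - 0.05)\bigr) &= \sigma(1.48) \approx 0.815.
\end{align*}
```

With k = 20 the sigmoid does not saturate within the observed range. In Soft Mode, a pixel with pol_diff = 0 still has its confidence scaled by about 1 - 0.269 = 0.731, and a pixel at the reported glass mean keeps about 1 - 0.815 = 0.185 of its confidence. A larger k would sharpen the transition if a closer approximation to a 0/1 mask is wanted.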
5. Soft Mode and Hard Mode
The system provides two confidence-override modes:
| Mode | Override rule | Description |
|---|---|---|
| Soft Mode | conf_modified = conf * (1 - glass_prob) | continuous decay; higher glass probability suppresses confidence more |
| Hard Mode | conf = 0.1 if pol_diff > threshold | binarized; glass regions are set to 0.1 directly (below GlobalRefiner’s 0.2 threshold) |
The 0.1 in Hard Mode is intentionally below GlobalRefiner’s internal 0.2 threshold for “trustworthy regions”, ensuring that glass regions are treated as untrustworthy and trigger propagation.
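The two override rules can be written as scalar functions for comparison. This is a minimal sketch: the function names are hypothetical, `REFINER_TRUST_THRESHOLD` simply records the 0.2 value quoted above for GlobalRefiner, and the rules are applied per pixel rather than on tensors.

```python
REFINER_TRUST_THRESHOLD = 0.2  # value quoted in the text for GlobalRefiner


def soft_override(conf, glass_prob):
    """Soft Mode: scale confidence down in proportion to the glass probability."""
    return conf * (1.0 - glass_prob)


def hard_override(conf, pol_diff, threshold=0.05, glass_conf=0.1):
    """Hard Mode: force a fixed low confidence wherever pol_diff exceeds the threshold."""
    return glass_conf if pol_diff > threshold else conf


# Same input confidence, two different strengths of polarization evidence.
print(soft_override(0.9, 0.8))    # strong evidence: falls below the trust threshold
print(soft_override(0.9, 0.5))    # borderline evidence: stays above it
print(hard_override(0.9, 0.12))   # above threshold: forced to 0.1
print(hard_override(0.9, 0.01))   # below threshold: left unchanged
```

The sketch shows the practical difference between the modes. Hard Mode always lands below the trust threshold once pol_diff exceeds the threshold, whereas Soft Mode only does so when the product conf * (1 - glass_prob) happens to fall under 0.2.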
6. Implementation Code
Insert before the global_refiner call (about 10 lines):

    import torch
    import torch.nn.functional as F
    from torchvision.transforms.functional import gaussian_blur


    def inject_polarization_confidence(left, right, pred_conf, threshold=0.05, k=20):
        """
        Override confidence using the polarization signal so that global refinement
        propagates automatically.

        Args:
            left: (B, C, H, W) I∥ image, in [0, 255]
            right: (B, C, H, W) I⊥ image, in [0, 255]
            pred_conf: (B, 1, H/4, W/4) confidence output by OT
            threshold: pol_diff threshold
            k: sigmoid steepness
        Returns:
            modified confidence, same shape as pred_conf
        """
        # 1. Compute pol_diff at the original resolution. Cast to float first so
        #    that uint8 inputs do not wrap around on subtraction.
        pol_diff = (left.float() - right.float()).abs().mean(dim=1, keepdim=True) / 255.0  # (B,1,H,W)
        # 2. Downsample to 1/4 resolution (to match confidence)
        pol_diff_4x = F.interpolate(pol_diff, size=pred_conf.shape[-2:], mode='bilinear', align_corners=True)
        # 3. Sigmoid -> glass probability
        glass_prob = torch.sigmoid(k * (pol_diff_4x - threshold))
        # 4. Gaussian blur for dilation (sigma is left at the torchvision default)
        glass_prob = gaussian_blur(glass_prob, kernel_size=[21, 21])
        # 5. Override confidence
        return pred_conf * (1 - glass_prob)
7. Tensor Dimensions
| Tensor | Shape | Description |
|---|---|---|
| left (I∥) / right (I⊥) | (B, C, H, W) | original resolution, range [0, 255] |
| pol_diff | (B, 1, H, W) | per-channel mean, normalized by /255 |
| pol_diff_4x | (B, 1, H/4, W/4) | downsampled to match confidence |
| glass_prob | (B, 1, H/4, W/4) | after sigmoid, dilated by gaussian_blur(21x21) |
| pred_conf (Γ) | (B, 1, H/4, W/4) | confidence output by OT |
| output conf_modified | (B, 1, H/4, W/4) | pred_conf * (1 - glass_prob) |
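The shape bookkeeping in the table can be expressed as a small helper for sanity checks. This is only a sketch: `expected_shapes` is a hypothetical name, and it assumes H and W are divisible by 4, whereas the implementation above takes the low-resolution size directly from `pred_conf`.

```python
def expected_shapes(batch, channels, height, width, scale=4):
    """Return the tensor shapes listed in the table for a given input size."""
    h4, w4 = height // scale, width // scale
    return {
        "left/right": (batch, channels, height, width),
        "pol_diff": (batch, 1, height, width),
        "pol_diff_4x": (batch, 1, h4, w4),
        "glass_prob": (batch, 1, h4, w4),
        "pred_conf": (batch, 1, h4, w4),
        "conf_modified": (batch, 1, h4, w4),
    }


# Example: a batch of two 480x640 RGB pairs.
shapes = expected_shapes(batch=2, channels=3, height=480, width=640)
for name, shape in shapes.items():
    print(f"{name:14s} {shape}")
```

Only the first two entries live at the original resolution; everything after the downsampling step shares the shape of the OT confidence map.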
8. Polarization Injection Points
Injection location: after Γ (confidence) from OT, before global refinement.
This is the minimal-change injection point: it does not modify any S2M2 network weights, only inserting about 10 lines of confidence override before the GlobalRefiner call. The polarization signal does not participate in feature extraction or matching; it is a pure physical rule that overrides confidence, letting GlobalRefiner’s propagation mechanism automatically bring disparity from non-glass regions into glass regions.
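The ordering of the injection can be sketched with stand-in stages. None of the callables below are S2M2's real API: `match_fn`, `override_fn`, and `refine_fn` are hypothetical placeholders for the matching stage with OT confidence, the polarization override, and GlobalRefiner respectively, and the stub refiner is a deliberately crude fill-in.

```python
def run_with_override(left, right, match_fn, override_fn, refine_fn):
    """Matching -> polarization override of confidence -> refinement."""
    disparity, conf = match_fn(left, right)   # unchanged network stage
    conf = override_fn(left, right, conf)     # the only inserted step
    return refine_fn(disparity, conf)         # unchanged refinement stage


# Stub stages on a 3-pixel "image"; the last pixel plays the role of glass.
def fake_match(left, right):
    return [5.0, 5.0, 1.0], [0.9, 0.9, 0.9]


def fake_override(left, right, conf):
    glass_prob = [0.0, 0.0, 1.0]
    return [c * (1.0 - g) for c, g in zip(conf, glass_prob)]


def fake_refine(disparity, conf, trust=0.2):
    trusted = [d for d, c in zip(disparity, conf) if c >= trust]
    fill = sum(trusted) / len(trusted)
    return [d if c >= trust else fill for d, c in zip(disparity, conf)]


result = run_with_override(None, None, fake_match, fake_override, fake_refine)
print(result)
```

Because the override only rewrites the confidence passed between two existing stages, it can be switched on or off without touching either stage.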
9. Design Decisions and Rationale
| Decision | Rationale |
|---|---|
| Turn “polarization cannot align” into an advantage | Polarization directly tells the model “this is glass; do not trust the correlation” |
| Override confidence with a physical rule | OT-inferred confidence is also high over glass on synthetic data — unreliable |
| Inject after Γ and before refinement | Minimal change; no training, no weight changes |
| pol_diff via per-channel mean | per-channel difference averaged is the most stable (B > G > R) |
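| threshold = 0.05 | glass=0.124, non-glass=-0.016, separation=0.14; chosen slightly below the midpoint (0.054) |
| k = 20 | sigmoid steepness so that values below 0.05 move toward 0 and values above 0.1 toward 1 (glass_prob ≈ 0.27 at pol_diff = 0, ≈ 0.73 at 0.1) |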
| Dilate with Gaussian blur (not max_pool) | Smoother, with gradual confidence falloff at edges |
| Provide both Soft / Hard modes | Soft for continuous decay; Hard for binarization set below GlobalRefiner’s 0.2 threshold |
9.1 Expected Applicable Conditions
The method is driven purely by physical rules and requires no training. By design, it is expected to deliver the most value where “correlation truly fails over glass regions”, that is, on real-world data with pretrained models.
10. Highlights
- Turns a defect into an advantage: polarization inherently cannot do alignment, yet it is used as a physical arbiter signaling “this region is untrustworthy” and directly overriding confidence.
- Zero training, zero weight changes: only about 10 lines of confidence override are inserted before the GlobalRefiner call; no network weights are modified.
- Leverages the existing propagation mechanism: after confidence over glass regions is suppressed, GlobalRefiner’s propagation automatically interpolates disparity from nearby non-glass regions into the glass regions.
- Dilation fills the Brewster-angle blind spot: a Gaussian blur dilates the glass probability map outward to cover the glass-center regions where, under near-normal incidence, the polarization difference is too weak to be detected.
- Soft / Hard dual modes: Soft preserves gradual decay; Hard binarizes and is intentionally set below GlobalRefiner’s internal 0.2 threshold to guarantee propagation is triggered.