PIDS — Physics-Informed Deep Stereo — Po-Ting Lin (林柏廷)

Physics-Informed Deep Stereo (PIDS) · Late 2025 – 2026-02-26 (~5 months) · 45+ training experiments, 60+ chapters of development logs.

1. Project goal

A stereo camera system using active asymmetric polarization (left I∥ / right I⊥) to recover the depth of transparent obstacles — glass doors, acrylic panels.

2. Reasons for closure

2.1 Physical ceiling: stereo matching intrinsically fails on transparent surfaces

On glass, stereo matching converges to the background depth, not the glass surface depth.
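
For reference, the standard rectified-stereo relation makes the consequence explicit. The symbols f (focal length in pixels), B (baseline), Z (depth), and d (disparity) are not defined in the source and are introduced here only for illustration.

```latex
d = \frac{f B}{Z}, \qquad
Z_{\text{bg}} > Z_{\text{glass}} \;\Rightarrow\; d_{\text{bg}} < d_{\text{glass}}
```

A matcher that locks onto background texture seen through the pane therefore reports the smaller disparity, and the glass is assigned the background depth.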

The root cause is not “lack of texture” but left–right inconsistency (violation of photometric consistency):

- Specular reflections are view-dependent.
- Compound light paths (refraction + internal reflection + refraction) are extremely sensitive to viewing angle.
- The left and right cameras observe completely different patterns.
- Polarization can only change the ratio of reflection; it cannot create matchable feature points.

This is an optical ceiling that cannot be broken by algorithms or larger models.
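
The point that polarization only changes the reflection ratio can be illustrated with the Fresnel equations for an ideal air-to-glass interface. This is a minimal sketch, assuming a refractive index of 1.5 (a typical value for common glass, not a measurement from this project) and a single smooth interface with no internal reflections; the function name is hypothetical.

```python
import math

def fresnel_reflectance(theta_i_deg, n1=1.0, n2=1.5):
    """Return (R_s, R_p) power reflectances for a smooth dielectric interface."""
    ti = math.radians(theta_i_deg)
    # Snell's law gives the transmitted angle inside the glass.
    tt = math.asin(n1 * math.sin(ti) / n2)
    cos_i, cos_t = math.cos(ti), math.cos(tt)
    # Amplitude coefficients for s- and p-polarized light.
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    return r_s ** 2, r_p ** 2

# Brewster's angle for the assumed index: R_p vanishes here.
brewster_deg = math.degrees(math.atan(1.5))

for angle in (0.0, 30.0, brewster_deg, 80.0):
    rs, rp = fresnel_reflectance(angle)
    print(f"{angle:5.1f} deg  R_s={rs:.4f}  R_p={rp:.4f}")
```

At normal incidence the two components are equal (about 4% each for n = 1.5), so the polarization state carries no contrast there. The ratio changes with angle, but neither component adds texture that both cameras could match.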

2.2 The polarization signal lacks a matchable peak in correlation space

Metric	Value	Problem
Peak-to-Mean Ratio	0.963	below 1.0: the correlation at the ground-truth (GT) location is lower than the mean
Peak Sharpness	negative	fully inverted (a dip rather than a peak)
I∥/I⊥ Ratio	1.168	a difference exists, but it is too small

→ The polarization signal has no sharp, matchable peak in correlation space.
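
To make the two correlation metrics concrete, here is a minimal Python sketch. The source does not give the exact formulas, so both definitions are assumptions: peak-to-mean ratio is taken as the correlation at the GT location divided by the mean over all candidates, and peak sharpness as the negative discrete second difference at the GT location. The function names and the sample curves are hypothetical.

```python
import numpy as np

def peak_to_mean_ratio(corr, gt_index):
    """Correlation at the GT index divided by the mean over all candidates (assumed definition)."""
    corr = np.asarray(corr, dtype=np.float64)
    return float(corr[gt_index] / corr.mean())

def peak_sharpness(corr, gt_index):
    """Negative second difference at the GT index (assumed definition).

    Positive means a local peak, negative means a local dip.
    gt_index must be an interior index so both neighbours exist.
    """
    corr = np.asarray(corr, dtype=np.float64)
    return float(2.0 * corr[gt_index] - corr[gt_index - 1] - corr[gt_index + 1])

# Hypothetical 1-D correlation curves over 9 candidate disparities.
matchable = np.array([0.2, 0.2, 0.3, 0.5, 0.9, 0.5, 0.3, 0.2, 0.2])
inverted = np.array([0.5, 0.5, 0.5, 0.5, 0.4, 0.5, 0.5, 0.5, 0.5])
GT = 4

for name, curve in (("matchable", matchable), ("inverted", inverted)):
    print(name, peak_to_mean_ratio(curve, GT), peak_sharpness(curve, GT))
```

Under these definitions a matchable signal has a ratio above 1 and positive sharpness, while the values in the table (ratio below 1, negative sharpness) correspond to the inverted case.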

2.3 Polarization forces us into “surface reconstruction” — which is equally unsolvable

Moving to surface reconstruction was not a voluntary decision to “abandon” stereo; it was a forced outcome. Active asymmetric polarization deliberately creates a left–right difference, which breaks photometric consistency. Stereo matching, which finds the same scene point by aligning the left and right images, therefore becomes fundamentally impossible, and surface reconstruction (rather than “measuring depth via correspondence”) is the only remaining option.

But once forced into surface reconstruction, PIDS faces two problems that cannot be circumvented:

(a) Insufficient reflective-surface coverage. Polarization and active probing both rely on specular reflection, which on glass only appears within a specific incidence-angle region, so it cannot cover the entire glass surface. Even reframed as a “probing” approach, the method yields only scattered reflective patches and cannot measure depth over the full glass surface.

(b) The same theoretical risk as ClearGrasp. Surface reconstruction must rely on surface normals, and our method still does not solve the problem of inferring normals via a neural network — bringing us right back to the same unsolved problem faced by ClearGrasp and similar methods. Polarization provides no shortcut around it.
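
Point (a) follows from mirror geometry. A minimal sketch, assuming an ideal flat pane, a single point emitter, and pinhole cameras (all simplifications; every name and the layout are hypothetical): the specular point is found by mirroring the emitter across the glass plane and intersecting the line from the camera to that mirror image with the plane.

```python
import numpy as np

def specular_point_on_plane(source, camera, plane_z):
    """Point on the plane z = plane_z where light from `source` reflects into `camera`.

    Assumes source and camera lie on the same side of the plane (z < plane_z).
    """
    source = np.asarray(source, dtype=np.float64)
    camera = np.asarray(camera, dtype=np.float64)
    mirrored = source.copy()
    mirrored[2] = 2.0 * plane_z - source[2]  # mirror image of the source
    # Intersect the camera -> mirrored-source line with the plane.
    t = (plane_z - camera[2]) / (mirrored[2] - camera[2])
    return camera + t * (mirrored - camera)

def incidence_angle_deg(point, origin):
    """Angle between the ray origin -> point and the plane normal (z axis)."""
    ray = np.asarray(point, dtype=np.float64) - np.asarray(origin, dtype=np.float64)
    return float(np.degrees(np.arccos(abs(ray[2]) / np.linalg.norm(ray))))

# Hypothetical layout in metres: emitter between two cameras, glass 2 m ahead.
emitter = [0.0, 0.0, 0.0]
left_cam = [-0.06, 0.0, 0.0]
right_cam = [0.06, 0.0, 0.0]
p_left = specular_point_on_plane(emitter, left_cam, plane_z=2.0)
p_right = specular_point_on_plane(emitter, right_cam, plane_z=2.0)
print(p_left, p_right)
```

In this idealized setup each camera receives the specular return from a single point, and the two cameras see it at different points on the pane. A real emitter of finite size would widen the point into a patch, but not into full-surface coverage.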

Closure judgment

After 45+ experiments spanning three generations of architecture — RAFT-Stereo (PIDS 1.x), Two-Pass RGB (PIDS 2.0), and S2M2 (PIDS 3.0) — we confirm that “polarization + stereo matching” is the wrong question: the failure lies in optical principles, not engineering implementation. Cutting losses and closing the project is the correct call; proving that this path is a dead end is itself a valuable research conclusion.

3. Legacy assets

Hardware — dual-camera system (2× Raspberry Pi Global Shutter Camera, IMX296), polarizer set (0° / 0° / 90° linear polarizing film), calibration jigs and fixtures, camera synchronization circuit (XVS Master/Slave).

Software — calibration programs (including dark / flat-field radiometric calibration), a polarization image-processing library, Mitsuba 3 polarized rendering scripts (v1 → v7, including the physical-polarizer geometry architecture), Blender scene-randomization tools, and a benchmark / data-quality-check system (5-point quality validation).

Knowledge & documentation

An Architecture Design Compendium extracted from the 17,399-line development log, documenting every architecture design across three categories (renderer, depth model, training strategy) in chronological order:
- Renderer architectures v3.1.0 → v7.2 (physical-polarizer geometry, parallel optical axis, textured RGB Stokes rendering)
- The full depth-model main line: Baseline RAFT-Stereo → Dual-Stream → Polarization Volume V1–V2-E → PIDS 2.0 (Two-Pass RGB) → PIDS V3 (Dual Volume + FiLM) → V4 (True Dual-Stream) → V5 (Cost Concat) → V6 (Glass-Aware) → PIDS 3.0 / S2M2 (Transformer, 6-channel, ComplexCNNEncoder)
- Training strategies (curriculum sampling, staged freezing/unfreezing, Directional Impulse Descent, sensor-realism augmentation)
- Version-evolution tables, naming-conflict notes, and the three architectural “iron rules” for polarization injection

Other knowledge assets:
- Practical polarization-optics know-how
- The destructive effect of neural-network normalization (BatchNorm / LayerNorm / L2 / per-image p99) on physical signals
- A cross-GPU-architecture (Ada / Hopper / Blackwell) TF32 consistency checklist
- A grounded understanding of the theoretical limits of transparent-object depth measurement
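
The normalization item can be illustrated with a small NumPy example. This is a minimal sketch, assuming the two polarization channels differ only by a global gain equal to the 1.168 ratio reported in section 2.2, and that each image is normalized independently by its own 99th percentile. The arrays and function name are hypothetical and do not reproduce the project's pipeline.

```python
import numpy as np

def normalize_p99(img):
    """Divide an image by its own 99th-percentile value (per-image p99 normalization)."""
    return img / np.percentile(img, 99)

rng = np.random.default_rng(0)
i_perp = rng.uniform(0.1, 1.0, size=(64, 64))  # synthetic I-perpendicular image
i_par = 1.168 * i_perp                         # synthetic I-parallel image, global gain only

# Mean channel ratio before and after independent per-image normalization.
ratio_before = float(np.mean(i_par / i_perp))
ratio_after = float(np.mean(normalize_p99(i_par) / normalize_p99(i_perp)))
print(ratio_before, ratio_after)
```

After per-image normalization the ratio collapses to 1, so the physical intensity difference between the two channels is no longer visible to the network.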

Closing note

As a research project, PIDS successfully proved that one path does not work — and that itself is a valuable scientific contribution. We now know: polarized stereo vision cannot measure the surface depth of transparent objects, because stereo matching presupposes left–right photometric consistency, while transparent objects are inherently view-dependent and inconsistent. Once forced into surface reconstruction, it then fails due to insufficient reflective coverage and the unsolved normal-inference problem — bearing the same theoretical risk as ClearGrasp.

Many of these assets fed directly into later work — the Architecture Design Compendium became the Blueprints collection, and the TF32 consistency finding became its own writeup.