Purpose-built to generate stereo training data for deep-learning models that perceive transparent objects in indoor scenes — glass doors, windows, partitions, drinkware on tables, and similar surfaces that defeat standard depth sensors.
It is a standalone GPU pipeline that pairs a Blender procedural scene randomizer with the Mitsuba 3 physically-based renderer (`cuda_ad_rgb` variant). Each scene is a closed six-wall chamber (floor, ceiling, and four walls; fully enclosed, with no open windows or skybox) populated with a table, randomized furniture, and one or more glass slabs or cups. Because the chamber is closed, the only light sources are the in-scene back-wall LED panel and the ceiling fill lights, so the reflection and refraction on the glass surfaces are well-controlled and reproducible across scenes. The output is a complete, ready-to-train dataset: rectified stereo pairs with pixel-perfect depth, disparity, and glass-region masks, each generated from a reproducible per-scene seed.
Designed for training tasks such as:
- transparent-object stereo matching / depth estimation
- glass-region segmentation
- depth-completion on transparent surfaces
- ablation studies that need physically-plausible glass interactions and large scene variety
For each scene the pipeline outputs:
- RGB stereo pair: `left.exr` and `right.exr`
- Depth (left viewpoint, in meters)
- Disparity (in pixels, ray-depth → Z-depth corrected)
- Glass region masks — left view (training, aligned to GT disparity), right view, and left ∩ right intersection (strict evaluation)
- Per-scene JSON with camera + lighting parameters and seed
The pipeline has three stages:
- Blender — random closed-room scenes with a table and 1+ glass slabs, exported to OBJ.
- Mitsuba 3 — render left / right / depth / disparity / mask in a single pass.
- EXR tooling — convert HDR EXR outputs to viewable PNG.
```
mitsuba-stereo-renderer/
├── Renderer/
│   └── stereo_renderer.py            # Stereo renderer
├── modelling/
│   ├── blender_glass_randomizer.py   # Blender scene randomizer (main)
│   ├── merge_glass_objects.py        # Import a glass-cup collection from an external .blend
│   ├── check_blend_objects.py        # Diagnostic: inspect collections / verify naming contract
│   └── calibration/                  # Coordinate calibration tools
│       ├── fix_object_origins.py     # Reset origin to bottom-center, apply transforms
│       └── scale_objects_1to10.py    # 1:10 scale-down for asset packs in real-world units
├── textures/
│   ├── manifest.json                 # Texture asset index
│   ├── furniture/                    # Furniture textures
│   ├── floor/                        # Floor textures
│   └── wall/                         # Wall + ceiling shared textures
├── tools/
│   ├── exr_to_png.py                 # EXR -> PNG conversion
│   └── exr_viewer.py                 # Interactive EXR inspector
├── requirements.txt
├── LICENSE
└── README.md
```
- Python 3.8+
- Mitsuba 3 with the `cuda_ad_rgb` variant (CUDA GPU recommended)
- Dr.Jit, NumPy, OpenCV (`cv2`)
- Blender 4.0+ (only needed for scene generation)
- Optional: `OpenEXR` / `Imath` for `tools/exr_to_png.py`

```bash
pip install -r requirements.txt
```

The renderer auto-selects the first available variant from `cuda_ad_rgb` → `cuda_rgb` → `llvm_ad_rgb` → `scalar_rgb`.
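A minimal sketch of how such a fallback chain can be implemented; it is not the renderer's own code. It relies on `mi.variants()` (the variants compiled into the installed Mitsuba build) and `mi.set_variant()`, and assumes a variant that is compiled in can still fail at activation time (for example a CUDA variant on a machine without a usable GPU), so it catches that and moves on. The helper names are hypothetical.

```python
from typing import Optional, Sequence

# Preference order stated in this README (fastest first).
PREFERRED_VARIANTS = ("cuda_ad_rgb", "cuda_rgb", "llvm_ad_rgb", "scalar_rgb")


def pick_variant(available: Sequence[str],
                 preferred: Sequence[str] = PREFERRED_VARIANTS) -> Optional[str]:
    """Return the first preferred variant that is present in `available`."""
    for name in preferred:
        if name in available:
            return name
    return None


def set_best_variant() -> str:
    """Activate the best usable variant (requires Mitsuba 3 to be installed)."""
    import mitsuba as mi  # imported lazily so pick_variant() works without Mitsuba

    remaining = list(PREFERRED_VARIANTS)
    while remaining:
        name = pick_variant(mi.variants(), remaining)
        if name is None:
            break
        try:
            mi.set_variant(name)  # can raise at runtime, e.g. CUDA variant without a GPU
            return name
        except Exception:
            remaining.remove(name)  # try the next variant in the chain
    raise RuntimeError("no usable Mitsuba variant found")
```

Keeping the selection logic in a pure function makes the ordering easy to test without a GPU or even without Mitsuba installed.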
Origin: ground projection of the camera center.
Axes (scene coordinates):

```
X: left(-) <-> right(+)       [horizontal]
Y: depth (0=camera, +=away)   [depth]
Z: height (0=floor, +=up)     [height]
```

Scene layout:

```
              Y+ (depth)
              ^
              |  back wall @ Y = CHAMBER_DEPTH
              |  ============================
              |
              |  table @ Y = TABLE_DEPTH
              |  +-----+-----+
              |  |   glass   |
  X- <-----+--+--+-----+-----+---> X+
              |
  +---------------------------+
  |  camera @ (0, 680, 100)   |
  +---------------------------+
```
| Parameter | Value | Description |
|---|---|---|
| `CAMERA_DEPTH` | 680 mm | Camera depth (near back wall) |
| `CAMERA_HEIGHT` | 100 mm | Camera height above floor |
| `TABLE_DEPTH` | 350 mm | Camera-to-table depth (default lookat) |
| `TABLE_SURFACE_HEIGHT` | 100 mm | Lookat height (matches camera height) |
| `CHAMBER_DEPTH` | 800 mm | Max scene depth |
| `CHAMBER_WIDTH` | 600 mm | Scene width |
| `CHAMBER_HEIGHT` | 400 mm | Ceiling height |
| `BASELINE` | 65 mm | Stereo baseline |
OBJ → Mitsuba transform: Scene (X, Y, Z) → Mitsuba (X, Z, -Y) via scale(0.001) @ rotate([1,0,0], -90°).
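To check the axis mapping, here is a NumPy sketch (not the renderer's own code) that builds the same matrix, a right-handed rotation of -90° about X followed by a millimetre-to-metre scale, and applies it to points. With the camera at scene (0, 680, 100) mm it yields Mitsuba (0, 0.1, -0.68) m, i.e. scene Z (height) becomes Mitsuba's Y-up axis.

```python
import numpy as np


def scene_to_mitsuba(points_mm: np.ndarray) -> np.ndarray:
    """Map scene-frame points (mm, Z-up) to the Mitsuba frame (m, Y-up).

    Equivalent to scale(0.001) @ rotate([1, 0, 0], -90 deg), applied to row vectors.
    """
    theta = np.deg2rad(-90.0)
    c, s = np.cos(theta), np.sin(theta)
    # Right-handed rotation about the X axis by theta: (X, Y, Z) -> (X, Z, -Y) for -90 deg.
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, c, -s],
                      [0.0, s, c]])
    scale = 0.001 * np.eye(3)  # millimetres -> metres
    m = scale @ rot_x
    return np.asarray(points_mm, dtype=float) @ m.T
```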
The randomizer builds a closed chamber (6-wall enclosure: floor, ceiling, front/back/left/right walls) sized CHAMBER_WIDTH × CHAMBER_HEIGHT × CHAMBER_DEPTH = 600 × 400 × 800 mm, places a table inside it, scatters furniture, and inserts 1+ glass slabs or cups. Walls and floor get random textures from the texture pool; the ceiling shares the wall texture and carries area emitters for fill lighting.
The randomizer auto-discovers candidate meshes from named Blender collections in your .blend file. Prepare these collections (each may be empty if you don't want that category):
| Collection | Purpose |
|---|---|
| `Glass` | Glass cup objects (the renderer treats all meshes here as transparent and expects the material `Glass_Clear`) |
| `Tables` | Candidate tables (one is picked at random per scene; or override `table.source_names` in the `CONFIG` dict directly) |
| `Furniture` | Floor furniture: chairs, stools, lamps, radiators, trolleys, etc. |
| `Cabinets` | Large cabinets / shelves (rotation-locked placement) |
| `Decor` | Tabletop decorations: vases, small props |
Any collection that is missing or empty causes that category to be skipped — the script will not crash. Use modelling/check_blend_objects.py to verify your .blend matches the contract.
The renderer requires the glass material to be named exactly `Glass_Clear` (compared case-insensitively); name the material on every glass mesh accordingly.
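If you want a quick pre-flight check in your own tooling, the contract above can be expressed as a small validator. This is a hypothetical sketch, not `check_blend_objects.py`: the pure `check_contract()` works on a plain dict, and `gather_from_blender()` fills that dict from the open `.blend` using standard `bpy` calls (`bpy.data.collections`, `Collection.all_objects`, `Object.material_slots`), so it only runs inside Blender.

```python
from typing import Dict, List, Tuple

# Collection names from the table above; each may legitimately be missing or empty.
CONTRACT_COLLECTIONS = ("Glass", "Tables", "Furniture", "Cabinets", "Decor")
GLASS_MATERIAL = "glass_clear"  # compared case-insensitively

SceneIndex = Dict[str, List[Tuple[str, List[str]]]]  # collection -> [(object, [materials])]


def check_contract(scene: SceneIndex) -> List[str]:
    """Return human-readable findings; an empty list means everything matches."""
    problems = []
    for coll in CONTRACT_COLLECTIONS:
        if not scene.get(coll):
            problems.append(f"note: collection '{coll}' missing or empty (category skipped)")
    for obj_name, materials in scene.get("Glass", []):
        if not any(m.lower() == GLASS_MATERIAL for m in materials):
            problems.append(f"error: glass mesh '{obj_name}' has no Glass_Clear material")
    return problems


def gather_from_blender() -> SceneIndex:
    """Collect mesh and material names from the open .blend (run inside Blender)."""
    import bpy  # only available in Blender's bundled Python

    scene: SceneIndex = {}
    for coll in CONTRACT_COLLECTIONS:
        c = bpy.data.collections.get(coll)
        if c is None:
            continue
        scene[coll] = [
            (obj.name, [s.material.name for s in obj.material_slots if s.material])
            for obj in c.all_objects if obj.type == "MESH"
        ]
    return scene
```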
Drop 4K tileable JPG / PNG textures into:
```
textures/furniture/*.{jpg,png}
textures/floor/*.{jpg,png}
textures/wall/*.{jpg,png}
```
manifest.json indexes them. If a texture is missing, the material falls back to flat color.
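The flat-color fallback amounts to "pick a texture if one exists, otherwise signal that there is none". A hypothetical sketch of that selection: it scans the category folder directly instead of reading `manifest.json` (whose schema is not documented here), sorts the candidates so a seeded RNG picks reproducibly, and returns `None` when the caller should fall back to a flat color.

```python
import random
from pathlib import Path
from typing import Optional

TEXTURE_EXTS = {".jpg", ".png"}


def pick_texture(texture_dir: str, category: str, rng: random.Random) -> Optional[Path]:
    """Pick a random texture for `category`, or None to request the flat-color fallback."""
    folder = Path(texture_dir) / category
    # Sorting makes the choice depend only on the RNG state, not on filesystem order.
    candidates = sorted(p for p in folder.glob("*") if p.suffix.lower() in TEXTURE_EXTS)
    return rng.choice(candidates) if candidates else None
```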
```bash
blender your_scene.blend --background --python modelling/blender_glass_randomizer.py -- \
    --seed 42 \
    --count 500 \
    --output ./scenes_output \
    --texture_dir ./textures
```

Output: one `scene_XXXX.obj` (with `.mtl`) per scene; textures are copied via `path_mode='COPY'`.
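Blender ignores everything after the bare `--` but leaves it in `sys.argv`, so a script run this way usually slices those arguments off before handing them to `argparse`. A minimal sketch of that pattern, assuming only the four flags shown above; the defaults are placeholders and this is not the randomizer's actual parser.

```python
import argparse
import sys
from typing import List, Optional


def parse_blender_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse only the arguments that follow Blender's '--' separator."""
    argv = sys.argv if argv is None else argv
    script_args = argv[argv.index("--") + 1:] if "--" in argv else []
    parser = argparse.ArgumentParser(description="scene randomizer arguments (sketch)")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--count", type=int, default=1)
    parser.add_argument("--output", default="./scenes_output")
    parser.add_argument("--texture_dir", default="./textures")
    return parser.parse_args(script_args)
```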
Glass material is always named Glass_Clear — the renderer uses exact-name matching.
If your glass asset pack contains meshes that produce intricate caustics (e.g. tulip-shaped wine glasses, whisky tumblers with thick bases) or have extremely high vertex counts, those scenes may remain visibly noisy even at 60K+ SPP. After importing the asset pack with `merge_glass_objects.py`, exclude those meshes with `--exclude name1,name2,...`. There is no built-in blacklist; what to exclude depends on the asset pack.
| Item | Value |
|---|---|
| Mitsuba variant | cuda_ad_rgb (with fallbacks) |
| Integrator | path |
| Resolution | 640 × 480 |
| SPP | 1024 (batched 64) |
| Max ray depth | 16 |
| Sensor | Sony IMX296LQR-C (5.023 × 3.754 mm) |
| Focal length | 6 mm |
| Baseline | 65 mm |
| Horizontal FOV | 45.4° |
| Glass IOR | 1.5 (roughness 0.02) |
| Glass type | thin or thick (CLI flag) |
Lighting:
- Back-wall LED area emitter (camera-side wall)
- Four ceiling area emitters embedded in the ceiling mesh
Cameras: parallel optical axes (non-converging), with optional gentle glass-tracking lookat (GLASS_LOOKAT_STRENGTH=0.3, capped at ±30 mm).
```bash
# Single scene
python Renderer/stereo_renderer.py \
    --scene scene_0001.obj \
    --output ./output

# Batch render an entire directory
python Renderer/stereo_renderer.py \
    --input_dir ./scenes_output \
    --output ./output \
    --max_scenes 100

# Multi-GPU parallel
python Renderer/stereo_renderer.py \
    --input_dir ./scenes_output \
    --output ./output \
    --num_gpus 4
```

| Flag | Description |
|---|---|
| `--seed N` | MC sampler seed (reproducible per-scene randomization) |
| `--spp` | Override default SPP (1024) |
| `--glass_type {thin,thick}` | Glass BSDF model |
| `--max_scenes` | Cap on number of scenes |
| `--skip N` | Skip first N scenes (multi-GPU sharding) |
| `--scene_list FILE` | Whitelist of scene names (one `scene_XXXX` per line) |
| `--no-camera-jitter` | Disable per-scene camera randomization |
| `--num_gpus N` | Multi-GPU mode (subprocess per GPU, isolated via `CUDA_VISIBLE_DEVICES`) |
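`--num_gpus` already launches one process per GPU internally. If you need to shard by hand instead (for example across machines), the same idea can be sketched with `--skip` and `--max_scenes`. This is a hypothetical helper, and it assumes `--max_scenes` counts scenes *after* `--skip` has been applied, so each process renders one contiguous slice.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

Job = Tuple[Dict[str, str], List[str]]  # (environment, command line)


def shard_commands(total: int, num_gpus: int, input_dir: str, output: str) -> List[Job]:
    """Split `total` scenes into contiguous shards, one renderer process per GPU."""
    per_gpu = -(-total // num_gpus)  # ceiling division
    jobs: List[Job] = []
    for gpu in range(num_gpus):
        skip = gpu * per_gpu
        count = min(per_gpu, total - skip)
        if count <= 0:
            break  # fewer scenes than GPUs: later GPUs get nothing
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}  # isolate one GPU
        cmd = [sys.executable, "Renderer/stereo_renderer.py",
               "--input_dir", input_dir, "--output", output,
               "--skip", str(skip), "--max_scenes", str(count)]
        jobs.append((env, cmd))
    return jobs


def launch(jobs: List[Job]) -> None:
    """Start all shards in parallel and wait for every one to finish."""
    procs = [subprocess.Popen(cmd, env=env) for env, cmd in jobs]
    for p in procs:
        p.wait()
```

For example, `launch(shard_commands(500, 4, "./scenes_output", "./output"))` would start four processes covering scenes 0-124, 125-249, 250-374 and 375-499.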
Per-scene augmentation (seed = hash of scene name):
- `CAMERA_X_JITTER`: ±15 mm
- `CAMERA_DEPTH_JITTER`: 0 to +20 mm (back-only)
- `CAMERA_HEIGHT_JITTER`: ±10 mm
- Glass-tracking lookat: clamped to ±30 mm
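A sketch of reproducible per-scene augmentation under stated assumptions. The README says only "hash of scene name", so the hash choice here is an assumption: `hashlib` is used because Python's built-in `hash()` on strings is randomized per process (`PYTHONHASHSEED`) and would break reproducibility across runs and GPU workers. The lookat rule, moving a fraction `GLASS_LOOKAT_STRENGTH` of the way toward the glass offset and then clamping, is likewise an interpretation of the parameters listed above, and all function names are hypothetical.

```python
import hashlib
import random
from dataclasses import dataclass

CAMERA_X_JITTER = 15.0       # mm, symmetric
CAMERA_DEPTH_JITTER = 20.0   # mm, "back-only": sampled in [0, +20]
CAMERA_HEIGHT_JITTER = 10.0  # mm, symmetric
GLASS_LOOKAT_STRENGTH = 0.3
GLASS_LOOKAT_CLAMP = 30.0    # mm


@dataclass
class CameraJitter:
    dx: float       # lateral offset (mm)
    d_depth: float  # depth offset (mm)
    dz: float       # height offset (mm)


def scene_rng(scene_name: str) -> random.Random:
    """Deterministic RNG seeded from the scene name (stable across processes)."""
    digest = hashlib.sha256(scene_name.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "little"))


def sample_jitter(scene_name: str) -> CameraJitter:
    rng = scene_rng(scene_name)
    return CameraJitter(
        dx=rng.uniform(-CAMERA_X_JITTER, CAMERA_X_JITTER),
        d_depth=rng.uniform(0.0, CAMERA_DEPTH_JITTER),
        dz=rng.uniform(-CAMERA_HEIGHT_JITTER, CAMERA_HEIGHT_JITTER),
    )


def glass_lookat_shift(glass_offset_mm: float) -> float:
    """Shift the lookat a fraction of the way toward the glass, clamped to +/-30 mm."""
    shift = GLASS_LOOKAT_STRENGTH * glass_offset_mm
    return max(-GLASS_LOOKAT_CLAMP, min(GLASS_LOOKAT_CLAMP, shift))
```

Applying the same lookat shift to both cameras keeps their optical axes parallel, which matches the non-converging setup described in the render configuration.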
```
{output_dir}/{scene_name}/
├── left.exr                  # Left camera RGB (float32)
├── right.exr                 # Right camera RGB (float32)
├── depth.exr                 # GT depth (meters, left viewpoint)
├── disparity.exr             # GT disparity (pixels)
├── glass_mask_left.png       # Glass mask, left view: pixel-aligned to depth.exr / disparity.exr (use this for training)
├── glass_mask_right.png      # Glass mask, right view (auxiliary)
├── glass_mask.png            # Pixel-wise (left ∩ right): strict evaluation, drops monocular-only glass regions
├── params.json               # Camera poses, intensities, seed, SPP
├── {scene_name}_report.json  # Per-scene quality report
└── preview/
    ├── left.png
    └── right.png
```
The depth and disparity ground truth are rendered from the left camera viewpoint (depth_camera_position() returns the left camera position). This means:
- Training loss on glass regions → use `glass_mask_left.png`. It is on the same pixel grid as `disparity.exr` and `depth.exr`, so boolean-indexing the disparity with this mask works directly.
- Strict / "two-view consistent" evaluation → use `glass_mask.png` (intersection). It excludes glass pixels that are visible from only one camera (occluded in the other), where stereo disparity is fundamentally undefined. Note that this is a pixel-wise AND of the two view masks, not a disparity-warped intersection; it is conservative but cheap and works well in most cases.
- Right-view inspection / debugging only → `glass_mask_right.png`. Don't pair this with the left-view depth.
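A sketch of glass-region evaluation following the rules above. `masked_epe()` is plain NumPy; the loader uses OpenCV, which in recent builds only decodes EXR when `OPENCV_IO_ENABLE_OPENEXR` is set before `cv2` is first imported. Two assumptions are labeled in the code: that a single-channel disparity can be read from the first channel of `disparity.exr`, and that any nonzero mask pixel means glass. The function names are hypothetical.

```python
import os

import numpy as np


def masked_epe(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float:
    """Mean absolute disparity error over masked, finite GT pixels (nan if none)."""
    sel = mask.astype(bool) & np.isfinite(gt)
    if not sel.any():
        return float("nan")
    return float(np.abs(pred[sel] - gt[sel]).mean())


def load_gt(scene_dir: str, mask_name: str = "glass_mask_left.png"):
    """Load GT disparity and a glass mask; pass mask_name='glass_mask.png' for strict eval."""
    # Must be set before cv2 is first imported in this process to take effect.
    os.environ.setdefault("OPENCV_IO_ENABLE_OPENEXR", "1")
    import cv2

    disp = cv2.imread(os.path.join(scene_dir, "disparity.exr"), cv2.IMREAD_UNCHANGED)
    mask = cv2.imread(os.path.join(scene_dir, mask_name), cv2.IMREAD_GRAYSCALE)
    if disp is None or mask is None:
        raise FileNotFoundError(f"missing disparity.exr or {mask_name} in {scene_dir}")
    if disp.ndim == 3:
        disp = disp[..., 0]  # assumption: all channels hold the same disparity value
    return disp.astype(np.float32), mask > 0  # assumption: nonzero = glass
```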
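Mitsuba's depth AOV gives the ray distance (Euclidean distance from the camera center), but disparity needs the Z distance (perpendicular to the image plane):

```
Z          = ray_depth × cos(angle)
cos(angle) = focal / sqrt(focal² + dx² + dy²)
disparity  = baseline × focal / Z
```

Here `focal` is the focal length in pixels, `dx` and `dy` are the pixel's offsets from the principal point in pixels, and `baseline` and `Z` are in the same length unit.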
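The three formulas above, applied to a whole depth map in NumPy. This is a sketch rather than the renderer's code: it assumes the principal point sits at the image center and that `dx`, `dy` are measured to pixel centers (`u + 0.5`); adjust both if your camera model differs.

```python
from typing import Optional, Tuple

import numpy as np


def ray_depth_to_disparity(ray_depth_m: np.ndarray, focal_px: float, baseline_m: float,
                           cx: Optional[float] = None, cy: Optional[float] = None
                           ) -> Tuple[np.ndarray, np.ndarray]:
    """Convert a Euclidean ray-distance map (H x W) to Z-depth and disparity in pixels."""
    h, w = ray_depth_m.shape
    cx = w / 2.0 if cx is None else cx  # assumption: principal point at image center
    cy = h / 2.0 if cy is None else cy
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)  # pixel centers
    dx, dy = u - cx, v - cy
    cos_angle = focal_px / np.sqrt(focal_px**2 + dx**2 + dy**2)
    z = ray_depth_m * cos_angle
    with np.errstate(divide="ignore"):
        disparity = baseline_m * focal_px / z  # zero-depth pixels would become inf
    return z, disparity
```

Skipping the cosine correction would overestimate Z toward the image corners and bias disparity low there; the error grows with the off-axis angle.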
Sample disparity ranges:
- near (depth = 350 mm) → ~142 px
- far (depth = 800 mm) → ~62 px
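These sample values follow directly from the render configuration (6 mm lens, 5.023 mm sensor width imaged at 640 px, 65 mm baseline). The arithmetic, which also reproduces the 45.4° horizontal FOV:

```python
import math

SENSOR_WIDTH_MM = 5.023   # Sony IMX296LQR-C sensor width
IMAGE_WIDTH_PX = 640
FOCAL_MM = 6.0
BASELINE_MM = 65.0

focal_px = FOCAL_MM * IMAGE_WIDTH_PX / SENSOR_WIDTH_MM  # ~764.5 px
hfov_deg = 2.0 * math.degrees(math.atan(SENSOR_WIDTH_MM / (2.0 * FOCAL_MM)))


def disparity_px(z_mm: float) -> float:
    """Disparity in pixels for a point at Z-depth z_mm (baseline and Z both in mm)."""
    return BASELINE_MM * focal_px / z_mm


for z in (350.0, 800.0):
    print(f"Z = {z:.0f} mm -> disparity {disparity_px(z):.1f} px")
print(f"horizontal FOV {hfov_deg:.1f} deg")
```

Running it prints about 142 px at 350 mm, about 62 px at 800 mm, and a 45.4° FOV, matching the figures above.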
```bash
# Single file
python tools/exr_to_png.py output/scene_0001/left.exr

# Directory batch
python tools/exr_to_png.py --dir ./output -o ./preview

# Unified range (compare left vs. right at equivalent exposure)
python tools/exr_to_png.py --unified left.exr right.exr -o ./preview

# Interactive viewer
python tools/exr_viewer.py output/scene_0001/left.exr
```

Loader fallback chain: OpenEXR → `mitsuba.Bitmap` → `cv2.imread`.
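The point of a unified range is that both views share one exposure mapping, so equal radiance produces equal pixel values and left/right brightness differences stay visible. A NumPy sketch of that idea, assuming a percentile-based shared white point and the standard sRGB transfer curve; `tools/exr_to_png.py` may normalize differently, and the function names are hypothetical.

```python
import numpy as np


def unified_white(*images: np.ndarray, percentile: float = 99.5) -> float:
    """One shared white point for several linear HDR images."""
    stacked = np.concatenate([img.ravel() for img in images])
    return float(max(np.percentile(stacked, percentile), 1e-8))  # avoid divide-by-zero


def to_srgb8(linear: np.ndarray, white: float) -> np.ndarray:
    """Scale linear HDR by `white`, clip to [0, 1], and apply the sRGB transfer curve."""
    x = np.clip(linear / white, 0.0, 1.0)
    srgb = np.where(x <= 0.0031308, 12.92 * x, 1.055 * np.power(x, 1 / 2.4) - 0.055)
    return np.round(srgb * 255.0).astype(np.uint8)
```

Usage: `w = unified_white(left, right)`, then `to_srgb8(left, w)` and `to_srgb8(right, w)`. Normalizing each image separately would instead stretch both to full range and hide exposure differences between the views.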
The renderer performs exact matching against glass_clear (case-insensitive), not keyword fuzzy matching, to avoid misclassifying names like glass_table or glass_shelf. Materials are partitioned into:
- glass → `dielectric` BSDF (IOR = 1.5, roughness = 0.02)
- ceiling (name matches `ceiling` / `roof` / `top` / `sky` / `plafond` / `techo`) → diffuse + area emitter
- other → textured diffuse, falling back to flat color
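A sketch of the three-way partition. The exact, case-insensitive glass match comes from the text above. How the ceiling keywords are matched is not specified, so this version splits the material name into alphanumeric tokens (an assumption); a plain substring test would also flag names such as `Wood_Tabletop` as ceiling because they contain `top`.

```python
import re

GLASS_NAME = "glass_clear"
CEILING_KEYWORDS = {"ceiling", "roof", "top", "sky", "plafond", "techo"}


def classify_material(name: str) -> str:
    """Return 'glass', 'ceiling', or 'other' for an OBJ/MTL material name."""
    if name.lower() == GLASS_NAME:
        return "glass"  # exact match: 'glass_table' or 'glass_shelf' are NOT glass
    # Assumption: keyword match on whole tokens, e.g. 'Ceiling_Paint' -> {'ceiling', 'paint'}.
    tokens = set(re.split(r"[^a-z0-9]+", name.lower()))
    if tokens & CEILING_KEYWORDS:
        return "ceiling"
    return "other"
```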
Split OBJ files are written to tempfile.mkdtemp() and cleaned up automatically after rendering.
Questions, bug reports, and feature requests are welcome:
- GitHub Issues — please open an issue on this repository for anything reproducible (rendering bugs, scene-generation edge cases, documentation gaps, etc.). Include your Mitsuba / Blender / GPU info and a minimal repro when possible.
- Email the author — for private inquiries, collaboration, or research questions, reach out via Gmail: [email protected].
The repository ships with code only: it intentionally does not include a fully-populated .blend file or a complete texture set, because most furniture / glass / texture assets the author uses originate from third-party packs whose licenses don't allow public redistribution.
If you have an academic or time-critical need (paper deadline, thesis, course project, ablation reproduction) and don't have the time to source and prepare your own assets, email [email protected] and the author can — at his discretion — share a working bundle (.blend + matching textures) for personal academic use. The bundle is provided as-is and remains subject to the upstream licenses of each individual asset; please do not redistribute it further or include it in a public release without verifying the license of every asset yourself.
Pull requests are also welcome, especially for additional asset packs, calibration helpers, or non-glass material support.
Copyright (c) 2025-2026 Po-Ting Lin
Released under the MIT License (see LICENSE).