Stereo Transformer
Related Articles from SNS
StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
arXiv:2605.09989v2. Abstract: Recent advances in robot imitation learning have produced powerful visuomotor policies that manipulate diverse objects from visual inputs. However, monocular observations lack depth information, which is critical for precise manipulation in cluttered or geometrically complex scenes. Explicit depth maps and point clouds are often noisy and fragile in real-world manipulation.
RCM-ACT: Imitation Learning with Dynamic RCM Calibration for Autonomous Intraocular Foreign Body Removal
arXiv:2508.19191v3. Abstract: Intraocular foreign body removal demands millimeter-level precision in confined intraocular spaces, yet existing robotic systems predominantly rely on manual teleoperation with steep learning curves. To address the challenges of autonomous manipulation, particularly kinematic uncertainties from variable motion scaling and Remote Center of Motion (RCM) point variation, we propose RCM-ACT, an imitation learning framework for autonomous...
HORUS: A Mixed Reality Interface for Managing Teams of Mobile Robots
arXiv:2506.02622v2. Abstract: Mixed Reality (MR) interfaces have been extensively explored for controlling mobile robots, but there is limited research on their application to managing teams of robots. This paper presents HORUS: Holistic Operational Reality for Unified Systems, a Mixed Reality interface offering a comprehensive set of tools for managing multiple mobile robots simultaneously. HORUS enables operators to monitor individual robot statuses, visualize sensor...
Unpaired RGB-Thermal Gaussian-Splatting Using Visual Geometric Transformers
arXiv:2606.05491v1. Abstract: Multi-modal novel view synthesis (NVS) combining RGB and thermal imagery enables precise 3D scene reconstruction with visual and thermal information. However, existing methods typically rely on precisely calibrated RGB-thermal image pairs or stereo setups, limiting scalability and practical deployment.
A Survey of 3D Reconstruction with Event Cameras
Abstract: Event cameras are rapidly emerging as powerful vision sensors for 3D reconstruction, uniquely capable of asynchronously capturing per-pixel brightness changes. Compared to traditional frame-based cameras, event cameras produce sparse yet temporally dense data streams, enabling robust and accurate 3D reconstruction even under challenging conditions such as high-speed motion, low illumination, and extreme dynamic range scenarios. These capabilities offer...
PlayStation Architecture
A quick introduction: Sony knew that 3D hardware could get very messy to develop for. Thus, their debut console keeps its design simple and practical… although this may come at a cost!
Magenta RealTime 2: Open and Local Live Music Models
We’re excited to share Magenta RealTime 2 (MRT2), a state-of-the-art open model and efficient real-time inference engine that enables you to build and play AI musical instruments on your laptop! To get started, download the apps on your MacBook (requires Apple Silicon). Unlike other large generative music models that work offline to turn a prompt into a track, MRT2 is a live, interactive model that you can control with MIDI and audio, in addition to text.
The Apple Car Is Finally Here
Transportation has never been a Ferrari’s real purpose. Sure, you can drive one, although not literally you, because you probably can’t afford one. For the few who can, it is an automobile to be seen idling at a stoplight before prancing away, or parked at a luxury-hotel valet stand, inspiring desire and jealousy. For normal people, a Ferrari is a symbol: of power, control, precision, and wealth, but also of the longing for those virtues, and of the idea that they are virtues in the first place. The Ferrari is the quintessential bedroom-poster car, captured in a glossy photo pinned on a wall in a teenage boy’s bedroom like a photo of a scantily clad woman: an unachievable object of desire. If a Ferrari is an object of spectacle, an Apple device is an object of function. The Apple product, whether it’s a laptop, music player, smartphone, tablet, speaker, or watch, is designed to dissolve into its context and melt into ordinary life. Frictionless, intuitive, and transparent: in its ideal form, an Apple product ceases to feel like an object at all, and instead facilitates an activity. An iPhone or MacBook