Home › Knowledge Base › Waymo Perception

Waymo Perception

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

StandardE2E: A Unified Framework for End-to-End Autonomous Driving Datasets

arXiv:2606.04271v1 Announce Type: new Abstract: Autonomous driving has shifted from modular perception-prediction-planning stacks toward end-to-end (E2E) models that map sensor inputs directly to vehicle control, often regularized by auxiliary tasks such as 3D detection, motion forecasting, and HD-map perception. Progress is driven by a fast-growing ecosystem of sensor-rich driving datasets, yet each ships its own file formats, APIs, coordinate conventions, and modality coverage, leaving...

arXiv CS 6d ago

NTR: Neural Token Reconstruction for Scene Token Bottleneck in End-to-End Driving

Announce Type: new Abstract: Recent perception-free end-to-end (E2E) autonomous driving methods bypass explicit perception outputs by compressing dense image patch tokens into compact scene tokens for downstream trajectory generation and scoring. While these scene tokens form a compact visual bottleneck for the planner, they receive supervision solely from the planning objective, providing limited constraints on the encoded visual information.

arXiv CS 9d ago

CoMo3R-SLAM: Collaborative Monocular Dense SLAM with Learned 3D Reconstruction Priors for Outdoor Multi-Agent Systems

Announce Type: new Abstract: Collaborative dense SLAM is essential for multi-robot teams to achieve scalable and consistent 3D perception across large-scale outdoor environments. Existing systems typically depend on depth sensors, incurring significant payload, power, and calibration costs. Monocular RGB cameras are a lightweight alternative, but collaborative monocular dense SLAM remains difficult due to scale ambiguity, unreliable inter-agent data association, especially in outdoor scenes...

arXiv CS 9d ago

DVGT: Driving Visual Geometry Transformer

Announce Type: replace Abstract: Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there still lacks a driving-targeted dense geometry perception model that can adapt to different scenarios and camera configurations. To bridge this gap, we propose a Driving Visual Geometry Transformer (DVGT), which reconstructs a global dense 3D point map from a sequence of unposed multi-view visual inputs.

arXiv CS 6d ago

Why Are Large Language Models So Terrible at Video Games?

Large language models (LLMs) have improved so quickly that the benchmarks themselves have evolved, adding more complex problems in an effort to challenge the latest models. Yet LLMs haven’t improved across all domains, and one task remains far outside their grasp: They have no idea how to play video games. While a few have managed to beat a few games (for example, Gemini 2.5 Pro beat Pokemon Blue in May of 2025), these exceptions prove the rule.

Hacker News 9d ago