UniOcc:
A Unified Benchmark for Occupancy Forecasting and Prediction
in Autonomous Driving


A comprehensive, open-source benchmark unifying 2D/3D occupancy labels and per-voxel flow annotations across multiple real-world and synthetic datasets.

Our UniOcc framework enables three representative tasks (occupancy prediction, occupancy forecasting with optional flow, and cooperative occupancy prediction and forecasting) by unifying:

  • File Structure

    • scene_infos.pkl

      A list of dictionaries, each containing the scene name, start and end frame, and other metadata.

    • scene_XXX

      A directory containing the data for a single scenario.

    • YYY.npz

      A NumPy file containing the following data for a single time step.

    Core Data (`.npz` file contents)

    • occ_label

      A 3D occupancy grid (L x W x H) with semantic labels.

    • occ_mask_camera

      A binary 3D grid (L x W x H): 1 if the voxel is within the camera FOV, 0 otherwise.

    • occ_flow_forward

      A per-voxel flow field (L x W x H x 3) whose vectors point to each voxel's coordinate in the next frame.

    • occ_flow_backward

      A per-voxel flow field (L x W x H x 3) whose vectors point to each voxel's coordinate in the previous frame.

    • ego_to_world_transformation

      A 4x4 transformation matrix from the ego-vehicle frame to the world frame.

    • cameras: List of camera objects.
      • name

        e.g. CAM_FRONT

      • filename

        Relative path to image.

      • intrinsics

        3x3 intrinsic matrix.

      • extrinsics

        4x4 camera-to-ego matrix.

    • annotations: List of object annotations.
      • token

        Original object token.

      • agent_to_ego

        4x4 object-to-ego matrix.

      • agent_to_world

        4x4 object-to-world matrix.

      • size

        Bounding-box dimensions in meters (L, W, H).

      • category_id

        Semantic class ID.
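The per-frame layout above can be read with plain NumPy. The following is a minimal sketch, not the official UniOcc loader: the helper names (`write_synthetic_frame`, `load_frame`, `visible_occupied_voxels`) are hypothetical, the semantic ID used for free space is dataset-dependent and therefore passed in as a parameter, and `allow_pickle=True` is an assumption about how list-valued fields such as `cameras` and `annotations` would be stored.

```python
import numpy as np


def write_synthetic_frame(npz_path, shape=(4, 4, 2), free_label=0):
    """Write a tiny stand-in frame using the field names listed above."""
    occ_label = np.full(shape, free_label, dtype=np.uint8)
    occ_label[1, 2, 0] = 3  # occupied voxel (arbitrary class ID), inside the FOV
    occ_label[3, 3, 1] = 5  # occupied voxel, outside the FOV
    occ_mask_camera = np.zeros(shape, dtype=np.uint8)
    occ_mask_camera[:2] = 1  # pretend only the first two rows are visible
    np.savez(
        npz_path,
        occ_label=occ_label,
        occ_mask_camera=occ_mask_camera,
        occ_flow_forward=np.zeros(shape + (3,), dtype=np.float32),
        ego_to_world_transformation=np.eye(4),
    )


def load_frame(npz_path):
    """Load one time step and return all of its arrays as a plain dict."""
    # Assumption: object-typed fields need pickling. Only enable
    # allow_pickle for files from a trusted source.
    with np.load(npz_path, allow_pickle=True) as data:
        return {key: data[key] for key in data.files}


def visible_occupied_voxels(frame, free_label):
    """Return (N, 3) indices of voxels that are occupied and camera-visible."""
    occupied = frame["occ_label"] != free_label
    visible = frame["occ_mask_camera"].astype(bool)
    return np.argwhere(occupied & visible)
```

A typical use would be to call `load_frame` on one `YYY.npz` file inside a `scene_XXX` directory and then mask `occ_label` with `occ_mask_camera` before computing any camera-based metric.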

  • Datasets

    Dataset        | Data Source                      | Length    | Scenarios | Sampling Rate | Resolution  | Flow        | Obj Categories
    ---------------|----------------------------------|-----------|-----------|---------------|-------------|-------------|---------------
    Occ3D nuScenes | nuScenes                         | 9.5 hrs   | 1110      | 2 Hz          | 0.2 / 0.4 m | -           | 17
    Occ3D Waymo    | Waymo                            | 4.0 hrs   | 998       | 10 Hz         | 0.2 / 0.4 m | -           | 15
    SurroundOcc    | nuScenes                         | 9.5 hrs   | 1110      | 2 Hz          | 0.5 m       | -           | 17
    OpenOccupancy  | nuScenes                         | 9.5 hrs   | 1110      | 2 Hz          | 0.1 m       | -           | 17
    CoHFF          | OpenCOOD                         | 0.69 hrs  | 44        | 10 Hz         | 1.0 m       | -           | 10
    UniOcc (Ours)  | nuScenes, Waymo, CARLA, OpenCOOD | 14.2+ hrs | 2152      | 2 Hz / 10 Hz  | 0.2 / 0.4 m | Voxel Level | 10, 15, 17

  • Occupancy Space Utilities

    • GetVoxelCoordinates

      Compute voxel indices in a 3D grid occupied by a transformed bounding box.

    • VoxelToCorners

      Convert voxel indices to 3D bounding-box corner coordinates.

    • OccFrameToEgoFrame

      Transform voxel coordinates between occupancy and ego-centric frames.

    • AlignToCentroid

      Recenter voxel coordinates by subtracting their centroid.

    • RasterizeCoordsToGrid

      Convert a list of voxel coordinates into a binary 3D occupancy grid.

    • AlignWithPCA

      Rotate voxel point clouds to align with principal axes using PCA.

    • SegmentVoxels

      Perform 3D connected-component labeling (CCL) on an occupancy grid.

    • EstimateEgoMotionFromFlows

      Estimate ego-motion from voxel flow fields using RANSAC.

    • TrackOccObjects

      Track objects across frames using voxel flows and estimated ego-motion.

    • BipartiteMatch

      Solve the optimal assignment problem (Hungarian algorithm) to associate objects.
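To make a few of the geometric utilities above concrete, here is a minimal NumPy sketch of the ideas behind the occupancy-to-ego conversion, coordinate rasterization, and PCA alignment. The function names are hypothetical and are not the UniOcc API. The grid constants (0.4 m voxels, a 200 x 200 x 16 grid starting at (-40, -40, -1) m in the ego frame) are illustrative assumptions; substitute the values of the dataset you are using.

```python
import numpy as np

# Illustrative grid parameters (assumptions, not read from UniOcc).
VOXEL_SIZE = 0.4
GRID_ORIGIN = np.array([-40.0, -40.0, -1.0])  # ego-frame position of the grid corner
GRID_SHAPE = (200, 200, 16)


def voxel_to_ego(indices, voxel_size=VOXEL_SIZE, origin=GRID_ORIGIN):
    """Map integer voxel indices to voxel-center coordinates in the ego frame."""
    return (np.asarray(indices, dtype=float) + 0.5) * voxel_size + origin


def ego_to_voxel(points, voxel_size=VOXEL_SIZE, origin=GRID_ORIGIN):
    """Map ego-frame points (meters) to the indices of the voxels containing them."""
    scaled = (np.asarray(points, dtype=float) - origin) / voxel_size
    return np.floor(scaled).astype(int)


def rasterize_coords_to_grid(indices, grid_shape=GRID_SHAPE):
    """Turn a list of voxel indices into a binary grid, dropping out-of-range ones."""
    indices = np.asarray(indices, dtype=int).reshape(-1, 3)
    in_bounds = np.all((indices >= 0) & (indices < np.array(grid_shape)), axis=1)
    kept = indices[in_bounds]
    grid = np.zeros(grid_shape, dtype=bool)
    grid[kept[:, 0], kept[:, 1], kept[:, 2]] = True
    return grid


def align_with_pca(coords):
    """Center points on their centroid and rotate them onto their principal axes."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest-variance axis first
    return centered @ eigvecs[:, order]
```

Centering and PCA alignment give each object a pose-independent shape, which is what makes shape comparisons across frames or against a size prior meaningful.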

    Voxel Flow Computation

    • ComputeFlowsForObjects

      Compute flow vectors for all foreground (dynamic) objects.

    • ComputeFlowsForBackground

      Compute static scene flow due to ego-motion for background voxels.

    • ComputeFlowsForOccupancyGrid

      Combine dynamic and static flow to produce a complete scene flow grid.

    • FindGMMForCategory

      Fit a Gaussian-Mixture Model (GMM) to object dimensions for a given category.

    • ComputeObjectLikelihoods

      Score object shapes against the pretrained GMM to get plausibility probabilities.

    • ComputeTemporalShapeConsistencyByTracking

      Track objects and report mean IoU of consecutive shapes for temporal consistency.

    • ComputeStaticConsistency

      Warp static voxels between frames and compute IoU for background stability.

    • ComputeIoU

      Compute standard intersection-over-union between two occupancy grids.

    • ComputeIoUForCategory

      Compute IoU restricted to voxels of a single semantic class.

    • FillRoadInOcc

      Ensure the bottom-most slice of the occupancy grid contains labeled road voxels.

    • CreateOccHandle

      Build a full Open3D visualizer for rendering occupancy grids.

    • AddFlowToVisHandle

      Draw flow vectors on the Open3D visualizer for motion inspection.

    • AddCenterEgoToVisHandle

      Add the ego vehicle model to the visualizer.

    • VisualizeOcc

      Create a visualizer for a static occupancy grid.

    • VisualizeOccFlow

      Visualize both occupancy and voxel-level flow vectors in 3D.

    • VisualizeOccFlowFile

      Load and visualize data from an .npz file.

    • RotateO3DCamera

      Load camera parameters and apply them to the current visualizer view.
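The background-flow and IoU-based utilities listed above rest on two small computations, sketched below under stated assumptions. A static voxel's apparent motion comes only from ego-motion, so its flow follows from the two `ego_to_world_transformation` matrices; warping static voxels with that flow and comparing them with the next frame gives a ground-truth-free consistency score. All function names are hypothetical stand-ins rather than the UniOcc implementations, flow is assumed to be expressed in voxel units, and returning NaN for an empty union is a convention chosen for this sketch.

```python
import numpy as np


def compute_iou(pred, target):
    """Intersection-over-union of two binary occupancy grids."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return float("nan")  # undefined when both grids are empty
    return float(np.logical_and(pred, target).sum() / union)


def compute_iou_for_category(pred_labels, gt_labels, category_id):
    """IoU restricted to the voxels of one semantic class."""
    return compute_iou(np.asarray(pred_labels) == category_id,
                       np.asarray(gt_labels) == category_id)


def background_flow(static_idx, ego_to_world_t, ego_to_world_next,
                    voxel_size, origin):
    """Flow (voxel units) of static voxels induced purely by ego-motion."""
    static_idx = np.asarray(static_idx, dtype=float).reshape(-1, 3)
    centers = (static_idx + 0.5) * voxel_size + np.asarray(origin, dtype=float)
    homogeneous = np.hstack([centers, np.ones((len(centers), 1))])
    # ego(t) -> world -> ego(t+1)
    ego_t_to_ego_next = np.linalg.inv(ego_to_world_next) @ ego_to_world_t
    moved = (homogeneous @ ego_t_to_ego_next.T)[:, :3]
    return (moved - centers) / voxel_size


def static_consistency(static_t, static_next, ego_to_world_t,
                       ego_to_world_next, voxel_size, origin):
    """Warp static voxels from frame t into frame t+1 and score the overlap."""
    static_t = np.asarray(static_t, dtype=bool)
    idx = np.argwhere(static_t)
    flow = background_flow(idx, ego_to_world_t, ego_to_world_next,
                           voxel_size, origin)
    warped_idx = np.rint(idx + flow).astype(int)
    in_bounds = np.all((warped_idx >= 0)
                       & (warped_idx < np.array(static_t.shape)), axis=1)
    kept = warped_idx[in_bounds]
    warped = np.zeros(static_t.shape, dtype=bool)
    warped[kept[:, 0], kept[:, 1], kept[:, 2]] = True
    # Simplification: voxels entering or leaving the grid are not masked out,
    # so they lower the score even for a perfectly stable background.
    return compute_iou(warped, static_next)
```

For example, if the ego vehicle moves forward by exactly one voxel along x between two frames, every static voxel should receive a flow of about (-1, 0, 0) and a perfectly stable background scores 1.0.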


Abstract

We introduce UniOcc, a comprehensive, unified benchmark for occupancy forecasting (i.e., predicting future occupancy from historical information) and current-frame occupancy prediction from camera images. UniOcc unifies data from multiple real-world datasets (nuScenes, Waymo) and high-fidelity driving simulators (CARLA, OpenCOOD), providing 2D/3D occupancy labels with per-voxel flow annotations and support for cooperative autonomous driving. For evaluation, unlike existing studies that rely on suboptimal pseudo labels, UniOcc incorporates novel metrics that do not depend on ground-truth occupancy, enabling robust assessment of additional aspects of occupancy quality. Through extensive experiments on state-of-the-art models, we demonstrate that large-scale, diverse training data and explicit flow information significantly enhance occupancy prediction and forecasting performance.


Key Ideas and Contributions

1) A first-of-its-kind unified 2D/3D occupancy forecasting and prediction benchmark: it includes flow information for conventional and cooperative driving, unifying real data from nuScenes and Waymo with synthetic data from CARLA and OpenCOOD.
2) A user-friendly platform for current-frame occupancy prediction and multi-frame occupancy forecasting: it enables easy setup, cross-dataset augmentation, and comprehensive occupancy evaluation with or without reference to ground-truth labels.
3) Experiments with our pipeline and evaluation metrics on leading occupancy forecasting and prediction models: they show that (1) incorporating flow information yields performance gains in occupancy forecasting and (2) existing methods face challenges in cross-domain generalization, highlighting avenues for future research.



Qualitative Results of Occupancy Prediction




Quantitative Results of Cross Data Source Training and Evaluation for Occupancy Forecasting


