Occupancy Generation

Overview

The occupancy_generation/ module processes the raw data collected from CARLA to generate high-quality derived modalities. This includes rectifying sensor artifacts to produce refined semantic depth maps, generating surface normals, and constructing dense and resampled panoptic occupancy grids.

Running

We provide a set of bash scripts to automate the generation and visualization pipelines. You can run these scripts directly from the repository root.

Step 1: Modality Generation

To generate all modalities (refined semantic depth, normals, and panoptic occupancy at 0.1 m resolution) in a single pass, run:

bash occupancy_generation/gen_modalities.sh

You can also generate individual modalities by running the corresponding commands separately:

# Semantic Depth Generation
python occupancy_generation/generators/gen_sem_depth.py
# Normal Map Generation
python occupancy_generation/generators/gen_normal.py
# Panoptic Occupancy Generation
python occupancy_generation/generators/gen_pano_occ.py

Step 2: Occupancy Resampling

The raw occupancy grids (vs_0_1) can be resampled to different spatial extents and resolutions. Use the following command to resample the occupancy grids for forward-view or surround-view tasks:

bash occupancy_generation/resample_occ.sh

This script uses generators/resample_occ.py to downsample the occupancy into predefined configurations such as vs_0_2_forward_view and vs_0_4_surround_view.

You can also customize the occupancy ground-truth format; see the Notes under Occupancy Resolution Settings.

Step 3: Visualization

To visualize the generated scene meshes and occupancy grids, run:

bash occupancy_generation/vis_modalities.sh

Occupancy Resolution Settings

Definitions

  • Voxel Size (m): Physical size of each voxel along the (x, y, z) axes.

  • Voxel Origin (m): The coordinate of the minimum corner of the voxel grid in the LiDAR coordinate system.

  • Voxel Range (m): The spatial coverage of the voxel grid along each axis, given as [min, max] in the LiDAR coordinate system.

  • Volume Size: The number of voxels along each (x, y, z) axis, i.e., the resolution of the occupancy grid.
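As an illustration of how these quantities relate, the sketch below maps metric points in the LiDAR coordinate system to integer voxel indices. The function name point_to_voxel and the floor-based, half-open binning convention are assumptions made for this example; they are not taken from the repository code.

```python
import numpy as np

def point_to_voxel(points, voxel_origin, voxel_size, volume_size):
    """Map (N, 3) metric points to integer voxel indices (hypothetical helper).

    Assumes voxel i along an axis covers
    [origin + i * voxel_size, origin + (i + 1) * voxel_size).
    Returns the indices and a boolean mask of points inside the grid.
    """
    points = np.asarray(points, dtype=np.float64)
    origin = np.asarray(voxel_origin, dtype=np.float64)
    idx = np.floor((points - origin) / voxel_size).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < np.asarray(volume_size)), axis=1)
    return idx, inside

# Example with the vs_0_4_surround_view settings from the configuration table.
pts = np.array([
    [0.1, 0.1, 0.1],        # near the sensor
    [-25.5, -25.5, -2.3],   # just inside the minimum corner
    [30.0, 0.0, 0.0],       # beyond the x range of [-25.6, 25.6]
])
idx, inside = point_to_voxel(pts, [-25.6, -25.6, -2.4], 0.4, [128, 128, 16])
print(idx[inside])
print(inside)
```

Points that lie exactly on a voxel boundary are sensitive to floating-point rounding, so the example uses points that sit well inside their voxels.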

Configuration Table

Occupancy grid configurations under different resolutions

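| Config Type          | Voxel Size (m) | Voxel Origin (m)     | Voxel Range (m)                           | Volume Size     |
|----------------------|----------------|----------------------|-------------------------------------------|-----------------|
| vs_0_1 (Raw)         | 0.1            | [-25.6, -25.6, -3]   | [[-25.6, 51.2], [-25.6, 25.6], [-3, 10]]  | [768, 512, 130] |
| vs_0_2_forward_view  | 0.2            | [0, -25.6, -2.4]     | [[0, 51.2], [-25.6, 25.6], [-2.4, 4]]     | [256, 256, 32]  |
| vs_0_4_surround_view | 0.4            | [-25.6, -25.6, -2.4] | [[-25.6, 25.6], [-25.6, 25.6], [-2.4, 4]] | [128, 128, 16]  |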

Notes

  • The Volume Size can be computed per axis as the spatial extent (maximum minus minimum of the Voxel Range) divided by the voxel size.

  • The configuration vs_0_2_forward_view follows the spatial setup of the KITTI-360-SSCBench benchmark, which targets single-view occupancy prediction. The configuration vs_0_4_surround_view follows the spatial setup of the Occ3D-nuScenes benchmark, which targets surround-view occupancy prediction. Please note that the Voxel Range differs slightly from that of these two benchmarks.

  • To generate your customized occupancy ground truth with a different voxel size / voxel origin / volume size, run the following command:

    cd data_collection
    # e.g., to produce vs_0_2_forward_view
    python occ_downsample.py \
      --dataset_dir ../data/CarlaOccV1 \
      --voxel_size 0.2 \
      --save_dir_name vs_0_2_forward_view \
      --voxel_origin 0 -25.6 -2.4 \
      --volume_size 256 256 32
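When choosing a custom voxel origin and volume size, it can help to check that they agree with the intended range. The short sketch below applies the rule from the first note (extent divided by voxel size) to the three configurations in the table. The helper name volume_size_from_range is hypothetical, and rounding is used here only to absorb floating-point error in the division.

```python
def volume_size_from_range(voxel_range, voxel_size):
    """Number of voxels per axis: (max - min) / voxel_size, rounded to an int."""
    return [round((hi - lo) / voxel_size) for lo, hi in voxel_range]

# (voxel size, voxel range) pairs copied from the configuration table.
configs = {
    "vs_0_1": (0.1, [[-25.6, 51.2], [-25.6, 25.6], [-3, 10]]),
    "vs_0_2_forward_view": (0.2, [[0, 51.2], [-25.6, 25.6], [-2.4, 4]]),
    "vs_0_4_surround_view": (0.4, [[-25.6, 25.6], [-25.6, 25.6], [-2.4, 4]]),
}

sizes = {name: volume_size_from_range(rng, vs) for name, (vs, rng) in configs.items()}
for name, size in sizes.items():
    print(name, size)
```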
    

Panoptic Occupancy Labels

Label Format

To reduce spatial redundancy, we adopt a sparse format for the occupancy labels and use the .npz format to store them. The ground truth labels contain the following fields:

  • occupancy: Sparse occupancy labels of shape [N, 4], where each row is [x, y, z, encoded_label], i.e., the indices of an occupied voxel and its encoded label.

  • voxel_size: See above.

  • voxel_origin: See above.

  • volume_size: See above. It is used to decode the occupancy labels back to the dense format.
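The following is a minimal sketch of how the sparse labels could be expanded to a dense grid with NumPy. The helper to_dense is hypothetical, and the fill value for voxels absent from the sparse list (empty_label=0) is an assumption: this document does not state which value marks empty space, and 0 may coincide with a real encoded label. The stored array dtypes are also not specified here, so the sketch casts explicitly. An in-memory .npz file with the four fields listed above stands in for a real label file.

```python
import io
import numpy as np

def to_dense(occupancy, volume_size, empty_label=0):
    """Scatter sparse [x, y, z, encoded_label] rows into a dense uint16 grid.

    empty_label is an assumed fill value for voxels not listed in `occupancy`.
    """
    occupancy = np.asarray(occupancy)
    shape = tuple(int(v) for v in volume_size)
    dense = np.full(shape, empty_label, dtype=np.uint16)
    coords = occupancy[:, :3].astype(np.int64)
    dense[coords[:, 0], coords[:, 1], coords[:, 2]] = occupancy[:, 3].astype(np.uint16)
    return dense

# Build a tiny stand-in label file in memory (values are illustrative only).
buf = io.BytesIO()
np.savez(
    buf,
    occupancy=np.array([[1, 2, 3, 14005], [64, 64, 6, 7000]]),
    voxel_size=0.4,
    voxel_origin=np.array([-25.6, -25.6, -2.4]),
    volume_size=np.array([128, 128, 16]),
)
buf.seek(0)

data = np.load(buf)  # for a real file, pass its path instead of the buffer
dense = to_dense(data["occupancy"], data["volume_size"])
print(dense.shape, dense[1, 2, 3], dense[64, 64, 6])
```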

Label Encoding and Decoding

To unify the storage format of the occupancy labels, we encode the semantic label and the instance label of each voxel into a single combined label of type uint16. The encoding and decoding are defined as follows:

encoded_label = semantic_label * 1000 + instance_label % 1000
semantic_label, instance_label = encoded_label // 1000, encoded_label % 1000

Notes

  • For non-instance classes (i.e., stuff classes), the instance label is set to 0.

  • The semantic classes should be numbered in the range [0, 64], so that the largest encoded label (64 * 1000 + 999 = 64999) still fits in uint16 (maximum 65535).

  • The instance labels should be numbered in the range [0, 999]. Instance labels should be unique within each semantic class, and can be reused across different scenes.
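The encoding and decoding rules, together with the range constraints in the notes, can be sketched with NumPy as follows. The function names encode_labels and decode_labels and the class IDs in the example are illustrative assumptions, not part of the repository API.

```python
import numpy as np

MAX_SEMANTIC = 64  # upper bound stated in the notes above

def encode_labels(semantic, instance):
    """Combine semantic and instance labels into one uint16 label per voxel."""
    # Compute in uint32 first so the multiplication cannot overflow.
    semantic = np.asarray(semantic, dtype=np.uint32)
    instance = np.asarray(instance, dtype=np.uint32)
    if np.any(semantic > MAX_SEMANTIC):
        raise ValueError("semantic labels must lie in [0, 64] to fit in uint16")
    encoded = semantic * 1000 + instance % 1000
    return encoded.astype(np.uint16)

def decode_labels(encoded):
    """Split encoded labels back into (semantic_label, instance_label)."""
    encoded = np.asarray(encoded, dtype=np.uint16)
    return encoded // 1000, encoded % 1000

# Illustrative labels: the class IDs 14 and 7 are arbitrary choices.
semantic = np.array([14, 14, 7, 64])
instance = np.array([5, 1005, 0, 999])  # 0 marks a stuff class
encoded = encode_labels(semantic, instance)
sem_out, inst_out = decode_labels(encoded)
print(encoded)
print(sem_out, inst_out)
```

Because of the modulo in the encoding, instance IDs 5 and 1005 of the same class collapse to the same encoded label, which is why instance labels are expected to stay within [0, 999].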