Abstract

We introduce a novel large-scale scene reconstruction benchmark that applies the recently developed 3D representation approach, Gaussian Splatting, to our expansive U-Scene dataset. U-Scene covers more than one and a half square kilometers in its first version and 3.5 square kilometers in its second, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. The dataset continues to grow and now includes six scenarios. For data acquisition, we employed a DJI Matrice 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. The detailed data acquisition protocol is provided in the supplementary material and covers drone assembly, controller path planning, controller assembly, safety and protection, RTK help, drone data post-processing, and many other details. U-Scene, developed under the auspices of the Chinese University of Hong Kong, Shenzhen, Shenzhen MSU-BIT University, and SZIIT, together with auxiliary residential areas, offers a unique blend of urban and academic environments for advanced spatial analysis. Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also compare these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance and innovation of our work.

Our dataset is now publicly available via the link provided above. The enlarged dataset is divided into six main parts. The first part, shown in the top row of this figure, is SZIIT (the Shenzhen Institute of Information Technology). The second row is the Lower Campus of the Chinese University of Hong Kong, Shenzhen (CUHKSZ). The third row shows the Upper Campus of CUHKSZ and the SMBU (Shenzhen MSU-BIT University) campus. The last row contains two auxiliary residential areas, He Ao Village and LFLS (Longgang Foreign Language School). We used a highly accurate LiDAR to collect the dataset, covering an area of more than 1.5 km². The dataset can be viewed from different angles in the embedded YouTube video. We store the dataset in PLY format at the OneDrive/SharePoint link, with coordinates in the WGS 84 / UTM zone 50N (EPSG:32650) coordinate reference system. To further facilitate use in computer vision and graphics, we will also provide COLMAP datasets and aligned point clouds.
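As a quick illustration of how the released PLY tiles can be consumed, the sketch below loads one tile with Open3D and converts its UTM zone 50N coordinates back to longitude/latitude with pyproj. The file name "SMBU.ply" is a placeholder, and neither library is required by the dataset itself; this is only a minimal usage example.

```python
import numpy as np
import open3d as o3d
from pyproj import Transformer

# Hypothetical file name; the actual tile names in the release may differ.
pcd = o3d.io.read_point_cloud("SMBU.ply")
pts = np.asarray(pcd.points)   # N x 3: easting, northing, height in meters (EPSG:32650)

print("points:", pts.shape[0])
print("easting  range (m):", pts[:, 0].min(), "to", pts[:, 0].max())
print("northing range (m):", pts[:, 1].min(), "to", pts[:, 1].max())

# Convert the planar UTM coordinates back to longitude/latitude (WGS 84) if needed.
to_wgs84 = Transformer.from_crs("EPSG:32650", "EPSG:4326", always_xy=True)
lon, lat = to_wgs84.transform(pts[:, 0], pts[:, 1])
```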
Our dataset provides essential information for quality control and multi-modal analysis and visualization. Using professional tools such as DJI Terra, one can inspect three properties critical for quality control: Reflectivity, Height, and Return. Graph (a) in this figure illustrates reflectivity, which measures the amount of light reflected back to the LiDAR sensor from surfaces or objects. Height, shown in Graph (b), represents a building's altitude relative to the drone's takeoff altitude. Return, presented in Graph (c), indicates the number of light returns detected by the LiDAR. Since our processing keeps only points with at least two returns, moving objects, represented by red dots, are excluded. More visualization results can be explored in our dataset or in the supplementary materials.
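The two-return filter described above can be reproduced on a standard LAS export (e.g. from DJI Terra). The sketch below assumes laspy and a hypothetical file name "scan.las"; it illustrates the filtering rule rather than the exact script used to build the dataset.

```python
import numpy as np
import laspy

# Hypothetical export path for one LiDAR sweep.
las = laspy.read("scan.las")

# Keep only pulses that produced at least two returns; single-return pulses
# (which include most moving objects) are dropped, as described in the caption.
keep = np.asarray(las.number_of_returns) >= 2
las.points = las.points[keep]

print(f"kept {keep.sum()} of {keep.size} points")
las.write("scan_filtered.las")
```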
The dataset prepared for input to neural fields and Gaussian Splatting typically consists of camera poses and images in COLMAP format. The Structure-from-Motion (SfM) pipeline implemented in COLMAP reconstructs camera positions in an arbitrary coordinate frame, which does not align with the LiDAR data in WGS 84 coordinates. This discrepancy poses a significant challenge for geometric alignment measurement and multimodal fusion algorithms: when the inputs live in two different coordinate systems, further validation becomes impractical. To address this, we propose a straightforward yet effective statistical scale-matching method to align the LiDAR point clouds with the camera positions. This step is crucial for the construction of our dataset.
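One common way to realize this kind of alignment is to estimate a similarity transform (scale, rotation, translation) between the COLMAP camera centers and the same cameras' RTK positions in the LiDAR/UTM frame, for example with the Umeyama method. The sketch below is an illustrative implementation under that assumption, not necessarily the exact statistical matching procedure used for the dataset.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform mapping src onto dst, both of shape (N, 3)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                              # avoid reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# colmap_centers: camera centers from the COLMAP reconstruction (arbitrary frame)
# rtk_centers:    positions of the same cameras in the LiDAR/UTM frame
# s, R, t = umeyama_similarity(colmap_centers, rtk_centers)
# aligned_centers = (s * (R @ colmap_centers.T)).T + t   # now comparable to the LiDAR points
```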
This figure shows the design of the drone routing path. The white and orange dots mark the positions where the drone took pictures. The overall path for a scene, shown in Graph (a), is composed of several micro-blocks; one such micro-block is highlighted in orange. Zooming into this orange micro-block gives Graph (b). The total path length of each micro-block is limited by the battery life of the DJI Matrice 300 as well as the power consumption of the LiDAR in windy conditions. For safety reasons, each micro-block typically covers an area of 350 m × 350 m. Each micro-block has five routing paths, providing different photography angles, as illustrated in Graph (c). The first routing path gives a Bird's Eye View (BEV), while the subsequent four paths tilt the camera by 45 degrees towards the horizontal plane; their camera orientations are forward, backward, rightward, and leftward, respectively.
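For concreteness, the sketch below enumerates the five passes of one micro-block as described in the caption: a serpentine sweep flown once with a nadir camera and four more times with the gimbal pitched 45 degrees and yawed in the four horizontal directions. The line spacing is a hypothetical placeholder, not an actual mission parameter.

```python
import numpy as np

BLOCK = 350.0      # micro-block side length in meters (from the caption)
SPACING = 50.0     # hypothetical distance between adjacent flight lines

def serpentine(block=BLOCK, spacing=SPACING):
    """(x, y) waypoints of a lawn-mower sweep over one micro-block."""
    waypoints = []
    for i, x in enumerate(np.arange(0.0, block + 1e-6, spacing)):
        ys = (0.0, block) if i % 2 == 0 else (block, 0.0)
        waypoints.extend((x, y) for y in ys)
    return waypoints

# Gimbal settings per pass: pitch 0 = horizontal, -90 = straight down (nadir).
passes = [
    {"name": "BEV",       "pitch_deg": -90, "yaw_deg": 0},
    {"name": "forward",   "pitch_deg": -45, "yaw_deg": 0},
    {"name": "backward",  "pitch_deg": -45, "yaw_deg": 180},
    {"name": "rightward", "pitch_deg": -45, "yaw_deg": 90},
    {"name": "leftward",  "pitch_deg": -45, "yaw_deg": 270},
]

route = serpentine()
for p in passes:
    print(f"{p['name']:>9}: {len(route)} waypoints, gimbal pitch {p['pitch_deg']} deg")
```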

Point Cloud Data Preview

Point Cloud Data Quantitative Scale

Scene          Size (km²)    Images   Points        DJI Raw Data (GB)
Lower Campus   1.020267099   670      79,767,884    12.5
Upper Campus   0.923096497   715      94,218,901    13.5
SMBU           0.908184476   563      283,31,405    16.2
SZIIT          1.557606058   1215     58,979,628    22.3
HAV            0.815080080   424      26,759,799    7.8
LFLS           1.466664729   1106     98,547,710    19.8
Total          6.668         4693     627,500,327   92.1
Download Dataset

If you are interested in the data, please first obtain the license at the dataset link provided above and send an email to the corresponding author, following the guidance written in the license.

BibTeX