U-Scene

Abstract

Large-scale 3D reconstruction faces several challenges: high memory consumption, a lack of geometry ground truth for comparison, and lighting variations that create a conflict between rendering quality and geometric accuracy. Previous methods make training feasible through memory reduction or spatial division, but often neglect geometric accuracy because the two objectives are difficult to balance. In this work, we address these challenges by proposing TripleS (Separation, Staging, Semantics) Gaussian. Our main contributions are: (1) providing a large-scale dataset with both images and high-accuracy LiDAR point clouds; (2) introducing an attention-aware separation method that partitions the dataset into manageable chunks, facilitating reconstruction and isolating lighting variations; (3) proposing semantic-aware frequency guidance to enhance geometric accuracy using the Gaussian representation; and (4) resolving the conflict between rendering and geometry with a staging strategy during training.

Dataset Introduction

Our dataset is now publicly available via the link provided above. The enlarged dataset is divided into six main parts. The first part, shown in the top portion of the figure, is SZIIT (The Shenzhen Institute of Information Technology). The second row is the Lower Campus, short for the Chinese University of Hong Kong, Lower Campus. The third row displays the Upper Campus of CUHKSZ and the SMBU (Shenzhen MSU-BIT University) Campus. The last row contains two auxiliary residential areas, He Ao Village and LFLS (Longgang Foreign Language School). We used a highly accurate LiDAR to collect the dataset, covering an area of more than 1.5 km². To view the dataset from different angles, one can use the embedded YouTube video. We store the dataset in PLY format on the OneDrive SharePoint link, with coordinates in WGS 84 / UTM zone 50N (EPSG:32650). To further facilitate usage in computer vision and graphics, we will also provide COLMAP datasets and aligned point clouds.
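Because the point clouds are distributed as PLY files, it can help to inspect a file's header before loading hundreds of millions of points. The following is a minimal sketch, not part of the dataset tooling: it relies only on the general PLY convention that the header is ASCII text ending in `end_header`, and the property names in the demo header (`x`, `y`, `z`) are assumptions rather than the confirmed layout of the released files.

```python
import io

def read_ply_header(stream):
    """Parse a PLY header from a binary stream.

    Returns (format name, element counts, property names per element).
    """
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt, counts, props, current = None, {}, {}, None
    for raw in iter(stream.readline, b""):
        tokens = raw.decode("ascii").split()
        if not tokens or tokens[0] == "comment":
            continue
        if tokens[0] == "format":
            fmt = tokens[1]
        elif tokens[0] == "element":
            current = tokens[1]
            counts[current] = int(tokens[2])
            props[current] = []
        elif tokens[0] == "property" and current is not None:
            # The property name is always the last token on the line.
            props[current].append(tokens[-1])
        elif tokens[0] == "end_header":
            return fmt, counts, props
    raise ValueError("missing end_header")

# Hypothetical header for illustration; real files may carry more properties.
demo = io.BytesIO(
    b"ply\nformat binary_little_endian 1.0\ncomment hypothetical example\n"
    b"element vertex 3\nproperty double x\nproperty double y\n"
    b"property double z\nend_header\n"
)
fmt, counts, props = read_ply_header(demo)
print(fmt, counts["vertex"], props["vertex"])
```

For a file on disk, the same function can be called on `open(path, "rb")`; only the header is read, so the check stays cheap even for very large scenes.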

Dataset Information

Our dataset provides essential information for quality control and for multi-modal analysis and visualization. With professional tools such as DJI Terra, one can inspect three properties that are critical for quality control: reflectivity, height, and return. Panel (a) of the figure illustrates reflectivity, which measures the amount of light reflected back to the LiDAR sensor from surfaces or objects. Height, shown in panel (b), represents the building's altitude relative to the drone's takeoff altitude. The return, shown in panel (c), indicates the number of light returns detected by the LiDAR. Since our analysis keeps only points with at least two returns, moving objects, shown as red dots, are excluded. More visualization results can be explored in our dataset or in the supplementary materials.
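The return-based filtering described above can be sketched in a few lines. This is an illustration under assumptions: the per-point field name `num_returns` and the dictionary layout are hypothetical and do not reflect the actual attribute names exported by DJI Terra or stored in the dataset.

```python
def keep_multi_return(points, min_returns=2):
    """Keep only points whose pulse produced at least `min_returns` returns."""
    return [p for p in points if p["num_returns"] >= min_returns]

# Hypothetical points: position plus the number of returns for the pulse.
points = [
    {"xyz": (0.0, 0.0, 10.0), "num_returns": 1},
    {"xyz": (1.0, 0.0, 12.0), "num_returns": 2},
    {"xyz": (2.0, 1.0, 3.0), "num_returns": 3},
]
kept = keep_multi_return(points)
print(len(kept))
```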

Table 1. This table provides detailed comparisons between our dataset and previously collected datasets. "Ptgy" stands for Photogrammetry, which is a non-LiDAR-based data acquisition method. Only real scenes are included in this table.

Dataset	Acquisition	Data Type	Area/Length	Image Number	Points/Triangles	Scenes
KITTI	Car Camera/LiDAR	PC/Image	39.20km	300k	4549M	1
BlockNeRF	Car Camera	Image	-	12k	-	1
MILL 19	UAV Camera	Image	-	3.6k	-	2
UrbanBIS	UAV Ptgy	PC/Mesh/Image	10.78km²	113.3k	2523.8M/284.3M	5
DublinCity	UAV LiDAR	PC/Image	2.00km²	-	260M	1
Hessigheim	UAV Camera/LiDAR	PC/Mesh	0.19km²	-	125.7M/36.76M	1
UrbanScene3D	UAV Camera/LiDAR	PC/Image	3.03km²	31k	120M	6
U(Ours)	UAV Camera/LiDAR	PC/Image	6.67km²	4.6k	627.5M	6

Table 2. This table presents the detailed coverage of each reconstructed scene. We keep the size of each scene at approximately 1 km², which limits the variation in lighting caused by the sun. The density of our point cloud is 20 cm per point. The raw data consists solely of DJI raw data and does not include the post-processed point cloud from DJI Terra. "Avg Height" denotes the average height of the drone's flight path relative to the altitude from which the drone took off. This height is consistently higher than that of the tallest local building. Note that the maximum effective distance for LiDAR detection should be less than 250 m.

Scene	Area in km²	Image Number	Points Number	Raw Data in GB	Avg Height in m	Resolution
Lower Campus	1.020	670	79,767,884	12.5	120	5472 × 3648
Upper Campus	0.923	715	94,218,901	13.5	120	5472 × 3648
HAV	0.815	424	26,759,799	7.8	120	5472 × 3648
LFLS	1.467	1106	98,547,710	19.8	150	5472 × 3648
SMBU	0.908	563	283,31,405	16.2	150	5472 × 3648
SZIIT	1.557	1215	58,979,628	22.3	136	5472 × 3648
Total	6.668	4693	627,500,327	92.1	N/A	N/A

Table 3. This table shows the results obtained when testing our dataset with different methods: two NeRF-based methods and 3DGS (3D Gaussian Splatting). Training time is reported as GPU count multiplied by training time in minutes. To train and evaluate Gaussian Splatting, we used its official implementation. For Instant NGP and NeRFacto, we used the Nerfstudio implementation for both training and evaluation.

Method	Gaussian Splatting				Instant NGP				NeRFacto
Scene	PSNR ↑	SSIM ↑	LPIPS ↓	Time (GPU·min)	PSNR ↑	SSIM ↑	LPIPS ↓	Time (GPU·min)	PSNR ↑	SSIM ↑	LPIPS ↓	Time (GPU·min)
Lower Campus	24.76	0.735	0.343	58	20.76	0.516	0.817	220	17.70	0.455	0.779	1692
Upper Campus	25.49	0.762	0.273	64	20.25	0.522	0.816	392	18.66	0.448	0.734	1704
HAV	26.14	0.805	0.237	62	20.79	0.511	0.792	268	16.95	0.399	0.727	1788
LFLS	22.03	0.678	0.371	71	18.64	0.453	0.856	348	15.05	0.364	0.879	1780
SMBU	23.90	0.784	0.248	63	18.37	0.507	0.810	252	16.61	0.405	0.682	1716
SZIIT	24.21	0.749	0.326	64	19.64	0.551	0.820	276	17.28	0.462	0.781	1732
Avg	24.42	0.752	0.300	63.7	19.74	0.510	0.815	292.7	17.04	0.422	0.764	1735.3
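To make the units in Table 3 concrete, the sketch below shows how a GPU·min figure and a PSNR value can be computed. It uses the standard PSNR definition for pixel values in [0, 1]; the function names and the sample numbers are hypothetical and are not taken from the evaluation code used for the table.

```python
import math

def gpu_minutes(gpu_count, wall_clock_minutes):
    """Training cost as GPU count multiplied by wall-clock minutes."""
    return gpu_count * wall_clock_minutes

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical values: 2 GPUs for 30 minutes, and a two-pixel "image".
cost = gpu_minutes(2, 30)
quality = psnr([0.0, 1.0], [0.1, 0.9])
print(cost, round(quality, 2))
```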

Table 4. Chamfer Distance Between Downsampled LiDAR and Reconstructed Point Cloud

Method	3DGS		Instant NGP		NeRFacto
Scene	Mean ↓	STD ↓	Mean ↓	STD ↓	Mean ↓	STD ↓
Lower Campus	0.079	0.207	0.123	0.378	0.067	0.198
Upper Campus	0.096	0.312	0.082	0.260	0.050	0.170
HAV	0.124	0.305	0.177	0.497	0.065	0.205
LFLS	0.248	0.192	0.228	0.314	0.277	0.245
SMBU	0.186	0.440	0.153	0.458	0.066	0.240
SZIIT	0.064	0.168	0.136	0.438	0.034	0.110
Avg	0.133	0.271	0.149	0.391	0.093	0.194
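As a rough illustration of the metric in Table 4, the sketch below computes the mean and standard deviation of nearest-neighbor distances between two point sets. Several things here are assumptions: the table does not state whether the distance is one-directional or symmetric (this sketch pools both directions), and the brute-force search is only practical for tiny inputs; a KD-tree or similar index would be needed at the scale of the dataset.

```python
import math

def nearest_distances(source, target):
    """Distance from each source point to its nearest target point (brute force)."""
    return [min(math.dist(p, q) for q in target) for p in source]

def chamfer_stats(cloud_a, cloud_b):
    """Mean and population standard deviation of pooled nearest-neighbor distances."""
    d = nearest_distances(cloud_a, cloud_b) + nearest_distances(cloud_b, cloud_a)
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return mean, std

# Hypothetical toy clouds standing in for LiDAR and reconstructed points.
lidar = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
recon = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
mean, std = chamfer_stats(lidar, recon)
print(mean, std)
```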

Data Collection Protocol

The dataset prepared as input for neural fields and Gaussian Splatting typically consists of camera positions and images in COLMAP format. The Structure from Motion (SfM) algorithm implemented in COLMAP recovers camera positions in an arbitrary coordinate frame and scale, which generally does not align with LiDAR data in WGS 84 coordinates. This discrepancy poses a significant challenge for measuring geometric alignment and for multimodal fusion algorithms: when the inputs live in two different coordinate systems, further validation becomes impractical. To address this, we propose a straightforward yet effective statistical scale matching method that aligns LiDAR point clouds with camera positions. This approach is crucial for the construction of our dataset.
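The text does not spell out the scale matching procedure, so the following is only one possible reading, given as a sketch. It assumes that each camera has a position in the SfM frame and a corresponding georeferenced position (for example from drone metadata), that the two frames differ only by a uniform scale and a translation, and that rotation can be ignored. All function names are hypothetical; a full similarity alignment would also estimate rotation.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rms_spread(points, center):
    """Root-mean-square distance of the points from a center."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / len(points))

def estimate_scale_and_shift(sfm_positions, geo_positions):
    """Match the statistical spread of SfM positions to georeferenced ones."""
    c_sfm = centroid(sfm_positions)
    c_geo = centroid(geo_positions)
    scale = rms_spread(geo_positions, c_geo) / rms_spread(sfm_positions, c_sfm)
    return scale, c_sfm, c_geo

def apply_alignment(point, scale, c_sfm, c_geo):
    """Map a point from the SfM frame into the georeferenced frame."""
    return tuple(scale * (point[i] - c_sfm[i]) + c_geo[i] for i in range(3))

# Hypothetical data: the georeferenced frame is the SfM frame scaled by 2
# and shifted by (10, 20, 30).
sfm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
geo = [(10.0, 20.0, 30.0), (12.0, 20.0, 30.0), (10.0, 22.0, 30.0)]
scale, c_sfm, c_geo = estimate_scale_and_shift(sfm, geo)
mapped = apply_alignment((1.0, 0.0, 0.0), scale, c_sfm, c_geo)
print(scale, mapped)
```

Using the ratio of spreads rather than individual point pairs makes the scale estimate less sensitive to noise in any single camera position, which is presumably the appeal of a statistical approach.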

Scene	Size in km²	Image Number	Points Number	DJI Raw Data Size in GB
Lower Campus	1.020267099	670	79,767,884	12.5
Upper Campus	0.923096497	715	94,218,901	13.5
SMBU	0.908184476	563	283,31,405	16.2
SZIIT	1.557606058	1215	58,979,628	22.3
HAV	0.815080080	424	26,759,799	7.8
LFLS	1.466664729	1106	98,547,710	19.8
Total	6.668	4693	627,500,327	92.1

TripleS-Gaussian: Align Geometry and Rendering for Large Scale Scene Reconstruction with Gaussian Representation