Table 1. Detailed comparison between our dataset and previously collected datasets. "Ptgy" stands for photogrammetry, a non-LiDAR data-acquisition method. Only real scenes are included in this table.
| Dataset | Acquisition | Data Type | Area/Length | Image Number | Points/Triangles | Scenes |
|---|---|---|---|---|---|---|
| KITTI | Car camera/LiDAR | PC/Image | 39.20 km | 300k | 4549M | 1 |
| BlockNeRF | Car camera | Image | - | 12k | - | 1 |
| Mill 19 | UAV camera | Image | - | 3.6k | - | 2 |
| UrbanBIS | UAV Ptgy | PC/Mesh/Image | 10.78 km² | 113.3k | 2523.8M/284.3M | 5 |
| DublinCity | UAV LiDAR | PC/Image | 2.00 km² | - | 260M | 1 |
| Hessigheim | UAV camera/LiDAR | PC/Mesh | 0.19 km² | - | 125.7M/36.76M | 1 |
| UrbanScene3D | UAV camera/LiDAR | PC/Image | 3.03 km² | 31k | 120M | 6 |
| U (Ours) | UAV camera/LiDAR | PC/Image | 6.67 km² | 4.6k | 627.5M | 6 |
Table 2. Detailed coverage of scene reconstruction. Each scene is kept at approximately 1 km², which limits the variation in lighting caused by the sun. The density of our point cloud is one point per 20 cm. The raw data consists solely of DJI raw data and does not include the point cloud post-processed with DJI Terra. "Avg Height" denotes the average height of the drone's flight path relative to its takeoff altitude; this height is consistently greater than that of the tallest local building. Note that the maximum effective LiDAR detection range is less than 250 m.
| Scene | Area (km²) | Image Number | Points Number | Raw Data (GB) | Avg Height (m) | Resolution |
|---|---|---|---|---|---|---|
| Lower Campus | 1.020 | 670 | 79,767,884 | 12.5 | 120 | 5472 × 3648 |
| Upper Campus | 0.923 | 715 | 94,218,901 | 13.5 | 120 | 5472 × 3648 |
| HAV | 0.815 | 424 | 26,759,799 | 7.8 | 120 | 5472 × 3648 |
| LFLS | 1.467 | 1106 | 98,547,710 | 19.8 | 150 | 5472 × 3648 |
| SMBU | 0.908 | 563 | 28,331,405 | 16.2 | 150 | 5472 × 3648 |
| SZIIT | 1.557 | 1215 | 58,979,628 | 22.3 | 136 | 5472 × 3648 |
| Total | 6.668 | 4693 | 627,500,327 | 92.1 | - | - |
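The "downsampled" point clouds referenced in Table 4 and the one-point-per-20-cm density above correspond to a standard voxel-grid downsampling step. As an illustrative sketch only (the `voxel_downsample` helper below is hypothetical, not the paper's actual pipeline), one point per 20 cm can be enforced by keeping the centroid of each 0.2 m voxel:

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel: float = 0.20) -> np.ndarray:
    """Reduce an (N, 3) point cloud to one centroid per cubic voxel of side
    `voxel` meters, i.e. roughly one point per `voxel` of spacing."""
    # Integer voxel index for every point.
    keys = np.floor(points / voxel).astype(np.int64)
    # Group points that share a voxel.
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    n = inv.max() + 1
    sums = np.zeros((n, points.shape[1]))
    np.add.at(sums, inv, points)          # accumulate coordinates per voxel
    counts = np.bincount(inv, minlength=n).astype(float)
    return sums / counts[:, None]          # centroid of each occupied voxel
```

With `voxel=0.20` this yields at most one point per 20 cm cell, matching the stated density.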
Table 3. Results obtained when testing our dataset with different methods, including two NeRF-based methods and 3DGS (3D Gaussian Splatting). Training time is reported as GPU count multiplied by training time in minutes (GPU·min). We used the official implementation of Gaussian Splatting for training and evaluation, and the Nerfstudio implementations of Instant-NGP and Nerfacto.
| Method | Gaussian Splatting | | | | Instant NGP | | | | Nerfacto | | | |
| Scene | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Time (GPU·min) | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Time (GPU·min) | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Time (GPU·min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lower Campus | 24.76 | 0.735 | 0.343 | 58 | 20.76 | 0.516 | 0.817 | 220 | 17.70 | 0.455 | 0.779 | 1692 |
| Upper Campus | 25.49 | 0.762 | 0.273 | 64 | 20.25 | 0.522 | 0.816 | 392 | 18.66 | 0.448 | 0.734 | 1704 |
| HAV | 26.14 | 0.805 | 0.237 | 62 | 20.79 | 0.511 | 0.792 | 268 | 16.95 | 0.399 | 0.727 | 1788 |
| LFLS | 22.03 | 0.678 | 0.371 | 71 | 18.64 | 0.453 | 0.856 | 348 | 15.05 | 0.364 | 0.879 | 1780 |
| SMBU | 23.90 | 0.784 | 0.248 | 63 | 18.37 | 0.507 | 0.810 | 252 | 16.61 | 0.405 | 0.682 | 1716 |
| SZIIT | 24.21 | 0.749 | 0.326 | 64 | 19.64 | 0.551 | 0.820 | 276 | 17.28 | 0.462 | 0.781 | 1732 |
| Avg | 24.42 | 0.752 | 0.300 | 63.7 | 19.74 | 0.510 | 0.815 | 292.7 | 17.04 | 0.422 | 0.764 | 1735.3 |
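For reference, the PSNR values in Table 3 follow the standard definition, 10·log₁₀(MAX²/MSE) between a rendered image and its ground-truth photo. A minimal sketch (my own helper, not the evaluation code used by the paper):

```python
import numpy as np

def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images given as float
    arrays with pixel values in [0, max_val]. Higher is better."""
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A uniform error of 0.1 over a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB, which puts the ~17–26 dB range in Table 3 in perspective.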
Table 4. Chamfer distance between the downsampled LiDAR point cloud and the reconstructed point cloud.
| Method | 3DGS | | Instant NGP | | Nerfacto | |
| Scene | Mean ↓ | STD ↓ | Mean ↓ | STD ↓ | Mean ↓ | STD ↓ |
|---|---|---|---|---|---|---|
| Lower Campus | 0.079 | 0.207 | 0.123 | 0.378 | 0.067 | 0.198 |
| Upper Campus | 0.096 | 0.312 | 0.082 | 0.260 | 0.050 | 0.170 |
| HAV | 0.124 | 0.305 | 0.177 | 0.497 | 0.065 | 0.205 |
| LFLS | 0.248 | 0.192 | 0.228 | 0.314 | 0.277 | 0.245 |
| SMBU | 0.186 | 0.440 | 0.153 | 0.458 | 0.066 | 0.240 |
| SZIIT | 0.064 | 0.168 | 0.136 | 0.438 | 0.034 | 0.110 |
| Avg | 0.133 | 0.271 | 0.149 | 0.391 | 0.093 | 0.194 |
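The mean/STD pairs above are statistics over nearest-neighbor distances between the two point sets. As a sketch of one common symmetric Chamfer variant (assuming unsquared Euclidean nearest-neighbor distances pooled in both directions; the paper's exact definition may differ), brute force over small clouds:

```python
import numpy as np

def chamfer_stats(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Mean and standard deviation of symmetric nearest-neighbor distances
    between point sets a (N, 3) and b (M, 3). Brute-force O(N*M); real
    evaluations on millions of LiDAR points would use a KD-tree instead."""
    # Pairwise squared distances between every point in a and every point in b.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    nn_ab = np.sqrt(d2.min(axis=1))  # each point in a -> nearest in b
    nn_ba = np.sqrt(d2.min(axis=0))  # each point in b -> nearest in a
    d = np.concatenate([nn_ab, nn_ba])
    return float(d.mean()), float(d.std())
```

Lower means indicate reconstructed geometry closer to the LiDAR reference; lower STDs indicate fewer outlier points far from any reference point.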