Building footprints are a useful indicator of population density across an area.

A building footprint is simply a polygon that outlines a building. To obtain building footprints, we digitize a polygon over each building in satellite imagery and repeat the process for every visible building in the target area. As a result, we know the number of buildings in that area. This process is very time-consuming: a person can typically digitize only about 0.15 square kilometres in 3 hours.
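To put that rate in perspective, the arithmetic can be sketched as below. This is a rough illustration that assumes the quoted rate (0.15 square kilometres per 3 hours) stays constant; the function name and the 10 square kilometre example area are hypothetical and not taken from the project.

```python
# Manual digitizing rate quoted in the text: about 0.15 km^2 in 3 hours.
AREA_PER_SESSION_KM2 = 0.15
HOURS_PER_SESSION = 3.0


def manual_digitizing_hours(target_area_km2: float) -> float:
    """Estimate the hours one person needs to digitize `target_area_km2`."""
    if target_area_km2 < 0:
        raise ValueError("area must be non-negative")
    rate_km2_per_hour = AREA_PER_SESSION_KM2 / HOURS_PER_SESSION
    return target_area_km2 / rate_km2_per_hour


# Hypothetical 10 km^2 district (illustrative figure, not from the project).
hours = manual_digitizing_hours(10.0)
print(f"{hours:.0f} hours of manual work")
```

At this rate, even a modest district takes roughly 200 person-hours, which is the motivation for automating the extraction.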

Mask R-CNN is a state-of-the-art model for instance segmentation. It is built on top of Faster R-CNN, a region-based CNN for object detection. For each detected object, it returns a bounding box, a class label, a confidence score, and a pixel-level segmentation mask. It has been shown to distinguish objects that are attached to each other.
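Besides the box, label and score, Mask R-CNN predicts a per-pixel mask for every instance. A minimal sketch of how such per-object outputs are commonly post-processed is shown below. The dictionary layout, the field names, the tiny 2x2 masks and both thresholds are assumptions made for illustration; they are not the API of any particular library.

```python
def keep_confident(detections, score_threshold=0.5, mask_threshold=0.5):
    """Drop low-confidence detections and binarize the soft masks."""
    kept = []
    for det in detections:
        if det["score"] < score_threshold:
            continue  # discard detections the model is unsure about
        # Turn per-pixel probabilities into a 0/1 footprint mask.
        binary_mask = [
            [1 if p >= mask_threshold else 0 for p in row]
            for row in det["mask"]
        ]
        kept.append({**det, "mask": binary_mask})
    return kept


# Hypothetical output for one image tile (values invented for illustration).
detections = [
    {"box": [10, 12, 40, 38], "label": "building", "score": 0.92,
     "mask": [[0.9, 0.2], [0.7, 0.8]]},
    {"box": [55, 60, 70, 72], "label": "building", "score": 0.30,
     "mask": [[0.4, 0.1], [0.3, 0.2]]},
]

buildings = keep_confident(detections)
print(len(buildings), "building(s) kept")
```

Because every kept detection carries its own mask, two adjoining buildings remain separate instances instead of merging into one blob.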

Step 1 – Satellite or aerial imagery is used to create the training data.

Step 2 – Creating training samples normally means manually digitizing the building footprints in a chosen area. In this case, to speed up the process, an existing building shapefile of Barcelona was overlaid on the aerial imagery to create the training samples.
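The overlay in Step 2 amounts to turning each footprint polygon into a pixel mask aligned with the image. The sketch below illustrates the idea with a plain ray-casting test. It assumes the polygon has already been converted from map coordinates to pixel coordinates; the function names and the example rectangle are hypothetical, and a real workflow would normally rely on a GIS library for this step.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_footprint(polygon, width, height):
    """Return a height x width grid of 0/1, testing each pixel centre."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical rectangular building in pixel coordinates (x, y).
footprint = [(1, 1), (4, 1), (4, 3), (1, 3)]
mask = rasterize_footprint(footprint, width=5, height=4)
for row in mask:
    print(row)
```

Each image tile paired with masks like this one forms a training sample: the image is the input and the masks are the labels.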

Step 3 – In this step, we train a model on the images and labels generated in the previous step. The framework used is PyTorch with a ResNet50 backbone, and the learning rate for training is determined automatically by the library.

Step 4 – To estimate accuracy, we divide the number of detected building footprints by the number of ground-truth footprints in the same area.
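The accuracy estimate in Step 4 is a simple count ratio, sketched below. The function name and the counts (180 detected, 200 in the ground truth) are hypothetical values chosen for illustration, not results from the project.

```python
def detection_ratio(num_detected: int, num_ground_truth: int) -> float:
    """Detected footprints divided by ground-truth footprints."""
    if num_ground_truth <= 0:
        raise ValueError("ground truth must contain at least one building")
    return num_detected / num_ground_truth


# Hypothetical counts for one test area.
ratio = detection_ratio(180, 200)
print(f"Detected {ratio:.0%} of the ground-truth buildings")
```

Note that this ratio compares counts only. It does not check that each detection overlaps a real building, so false positives can inflate it and the value can exceed 1.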

Reference: Building footprint extraction in a dense area with MaskRCNN — Jakarta, Indonesia

Building Footprint Extraction is a project of IaaC, Institute for Advanced Architecture of Catalonia
developed at the Master in City & Technology (2019/2021) by:
Student: Kushal Saraiya
Faculty: Angelos Chronis, Serjoscha Duering, Nariddh Khean