Machine Learning-based Vineyard Detection with Robosat

Isaac Boates
May 13, 2021

In this article I will detail my experience with using the Robosat library for classifying a single land-use type (vineyards) on satellite imagery from Mapbox.

About Robosat

Robosat is an open source library published by Mapbox for identifying features on satellite imagery. For me, its biggest advantage is that it streamlines the process of data acquisition, labelling and processing by using a Web Map Tile Service (WMTS) directory structure as its accepted data format.

I had a very hard time finding a good, simple explanation of what this directory structure looks like, so suffice it to say that it is a single folder containing subfolders named after zoom levels (1–20), each of which contains folders named after a column index number, each of which contains images named after a row index number. In this way, full coverage of the earth is broken into a grid whose tiles always have the same number of pixels, despite coming together to form a mosaic of increasingly finer resolution as one "zooms in".
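As a concrete sketch, such a tile directory might look like this (the zoom level and index numbers here are just illustrative):

```
tiles/
└── 14/               # zoom level
    ├── 8185/         # column (x) index
    │   ├── 5448.png  # row (y) index, one image per tile
    │   └── 5449.png
    └── 8186/
        └── 5448.png
```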

Visual representation of a WMTS: each successive "layer" (top left to bottom right) is actually a parent folder named after an increasing zoom number, which then contains a subfolder named after the column index number, which then contains an image named after the row index number. Image is taken from the QGIS documentation.

A WMTS then serves these tiles via a typical REST API: the user supplies only the zoom level, column index and row index, and receives the image at that location.
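Robosat handles this addressing internally, but for intuition, the mapping from geographic coordinates to tile indices is a fixed formula (the standard Web Mercator "slippy map" scheme used by OpenStreetMap and Mapbox). A minimal Python sketch:

```python
import math

def deg2tile(lat, lon, zoom):
    """Convert a WGS84 lat/lon to the (column, row) indices of the
    Web Mercator tile containing that point at the given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    col = int((lon + 180.0) / 360.0 * n)
    row = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return col, row

# A point in eastern France (roughly the Alsace wine region):
print(deg2tile(48.58, 7.75, 15))
```

At zoom 15 this yields a tile index in the tens of thousands on each axis, since the world is split into 2¹⁵ × 2¹⁵ tiles.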

One can easily see how this system lends itself well to creating machine learning datasets, because there is already a standardized method of partitioning the world into regular-sized images. Further to this, the images have a known geographic reference, which means that it is trivial to use a geographic vector polygon dataset as training labels.
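Going the other way is what makes rasterizing polygons into per-tile label masks straightforward: every (zoom, column, row) triple maps back to a known geographic bounding box, against which training polygons can be clipped. A minimal sketch:

```python
import math

def tile_bounds(col, row, zoom):
    """Return the (west, south, east, north) WGS84 bounding box of a
    Web Mercator tile, derived from its zoom/column/row address."""
    n = 2 ** zoom
    def corner(c, r):
        lon = c / n * 360.0 - 180.0
        lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * r / n))))
        return lon, lat
    west, north = corner(col, row)          # top-left corner
    east, south = corner(col + 1, row + 1)  # bottom-right corner
    return west, south, east, north

# The whole world is a single tile at zoom 0:
print(tile_bounds(0, 0, 0))
```

Note that the latitude bounds at zoom 0 come out to about ±85.05°, the cutoff of the Web Mercator projection.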

Robosat has been abandoned by Mapbox due to its author leaving the company, but it still works. One limitation to note, however, is that it can only perform binary classifications (i.e. "Present" and "Not Present") for any given land use type.

Vineyard Detection

I wanted to test Robosat out on something that had not already been done in its sample applications, which focus mostly on buildings and roads. I did this in my free time, so the selected feature had to be simple enough that I could personally digitize the training polygons for it. This meant that it had to be somewhat abundant and easily identifiable. Further to this, I wanted it to be something that I knew would be difficult or even impossible for classical pixel-based classification methods to detect. That is why I settled on vineyards. They satisfied all the criteria:

  • Visually distinct, so I can easily digitize training data
  • Abundant in various places (especially France)
  • Difficult, if not impossible, for classical pixel-based remote sensing methods to identify*

*There are indeed augmentation methods and ways that classical methods could be used, such as texture/roughness analysis or using hyperspectral imagery. I still think the former would be rather difficult and the latter is quite expensive, especially if you wanted to scale up to larger and larger study areas. Object-based detection is probably also suitable, but in my experience this method tends to be cumbersome and you usually need rather expensive software to do it effectively.

One last reason why I was interested in using vineyards as the target land cover type was that orchards are found nearby. When photographed from a satellite, orchards can look somewhat similar to vineyards. A human can usually identify the difference: while orchards are also commonly planted in rows, there tends to be a sufficiently distinct gap between the plants to make it clear that the plot is composed of trees and not rows of vines.

An orchard (blue) and a vineyard (red) in close proximity. In this specific example it is rather clear to a human which is which, but it may not always be so obvious. Imagery from Mapbox Satellite.

Data Acquisition

Robosat offers a means of extracting data directly from an OpenStreetMap .pbf dump file (available from Geofabrik) via its extract command, but it is limited to only the default example features (buildings, parking lots and roads). So I opted to use Overpass Turbo, which allowed me to build a custom query to select and download polygons tagged as being a vineyard from OpenStreetMap for a small region of France.
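For reference, an Overpass QL query along these lines fetches all polygons tagged `landuse=vineyard` (here `{{bbox}}` is Overpass Turbo's shortcut for the current map extent):

```
[out:json][timeout:60];
(
  way["landuse"="vineyard"]({{bbox}});
  relation["landuse"="vineyard"]({{bbox}});
);
out body;
>;
out skel qt;
```

Overpass Turbo can then export the result directly as GeoJSON, which is the format the later processing steps expect.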

Once downloaded, I loaded the Mapbox Satellite WMTS layer into QGIS and noticed that not all vineyards were actually digitized, and some features were not quite precisely covering their intended vineyards. Some of the features also covered the gaps in between individual vineyards. This is not a huge problem and it's to be expected when dealing with crowd-sourced data. To be honest I was glad I just had a starting point at all. But ultimately, the labels needed to be accurate and precise, since I knew I wasn't going to have the overwhelming amount of data required to cancel out small errors. So I spent an afternoon cleaning up the existing features and also digitizing missing features around the area.

A sample of the digitized vineyards. Imagery from Mapbox Satellite.

The full training data acquisition and processing pipeline is detailed on the Robosat repository, so I won't go into detail beyond expressing it as simple steps here:

  1. Use the “cover” command to create a .csv file containing the zoom level and row/column indices of the images which will be downloaded to be used as training data.
  2. Use the "download" command to download the images indicated in the cover .csv.
  3. Use the “rasterize” command to convert the training polygons into training masks to be used when training the network on the images.

I compiled these steps into a bash script (which also calls a secondary Python script), plus a step for splitting the data into a training, validation and holdout set:

prepare.sh

#!/bin/bash
set -e
cd ..

# Parse arguments
zoom=$1
frac_train=$2
frac_validate=$3
frac_holdout=$4
mapbox_access_token=$5

# Make folder structure for dataset
mkdir -p vineyards/dataset
mkdir -p vineyards/dataset/training
mkdir -p vineyards/dataset/training/images
mkdir -p vineyards/dataset/training/labels
mkdir -p vineyards/dataset/validation
mkdir -p vineyards/dataset/validation/images
mkdir -p vineyards/dataset/validation/labels
mkdir -p vineyards/dataset/holdout
mkdir -p vineyards/dataset/holdout/images
mkdir -p vineyards/dataset/holdout/labels

./rs cover --zoom $zoom vineyards/data/vineyards.geojson vineyards/data/vineyards-cover.csv

echo "Downloading tiles..."
./rs download --ext png "https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}@2x.png?access_token=$mapbox_access_token" vineyards/data/vineyards-cover.csv vineyards/dataset/holdout/images

echo "Rasterizing..."
./rs rasterize --zoom $zoom --dataset vineyards/config/model-unet.toml vineyards/data/vineyards.geojson vineyards/data/vineyards-cover.csv vineyards/dataset/holdout/labels

echo "Splitting data into train/validate/holdout..."
cd vineyards
python create_dataset.py $zoom $frac_train $frac_validate $frac_holdout
cd ..

create_dataset.py

import os
from random import shuffle
from pathlib import Path
from shutil import move
import argparse
from tqdm import tqdm


def move_image_and_label(img, group):
    # Create the destination z/x folder structure for both images and labels
    os.makedirs(Path("dataset", group, "images", *img.parts[3:-1]), exist_ok=True)
    os.makedirs(Path("dataset", group, "labels", *img.parts[3:-1]), exist_ok=True)

    # Move the image tile out of the holdout folder into its new group
    image_dst = Path("dataset", group, *img.parts[2:])
    move(img, image_dst)

    # The label mask lives at the same z/x/y path, with "images" swapped for "labels"
    label = Path(*[p.replace("images", "labels") for p in img.parts])
    label_dst = Path("dataset", group, "labels", *img.parts[3:])
    move(label, label_dst)


def main(args):
    if abs(sum([args.frac_train, args.frac_validate, args.frac_holdout]) - 1) > 0.00001:
        raise ValueError("'frac_train', 'frac_validate' and 'frac_holdout' must sum to 1.")

    # Everything starts in the holdout folder; move the chosen fractions out of it
    imgs = list(Path("dataset/holdout/images").rglob("*.png"))
    shuffle(imgs)
    validate_imgs_start_idx = int(len(imgs) * args.frac_train)
    holdout_imgs_idx = int(validate_imgs_start_idx + (len(imgs) * args.frac_validate))

    for img in tqdm(imgs[:validate_imgs_start_idx], desc="Training Set:"):
        move_image_and_label(img, "training")
    for img in tqdm(imgs[validate_imgs_start_idx:holdout_imgs_idx], desc="Validation Set:"):
        move_image_and_label(img, "validation")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("zoom", type=int)  # accepted for interface parity with prepare.sh, unused here
    parser.add_argument("frac_train", type=float)
    parser.add_argument("frac_validate", type=float)
    parser.add_argument("frac_holdout", type=float)
    main(parser.parse_args())

Training

I trained my model using a GPU instance from Lambda Labs. After spinning up an instance, getting started was as simple as installing the dependencies (as outlined in the repository's requirements.in file).

I had only 381 images in my training set (and 67 in the validation set), including quite a few hard-negatives of pure forest or built-up areas. So the training only took a few hours on a 2-GPU instance. Of course, I had to do the process a few times as I discovered problems in how I had labelled the data and configured the training. But eventually, after the loss stabilized, I shut it down.

Results

To see how the model performed on a new dataset, I batch-downloaded a large number of image tiles at the same scale, but this time encompassing an area which was nearby, but otherwise totally new to the model. It contained vineyards, orchards, forests, and built-up areas. I then used the trained model to predict where vineyards should be on these tiles.

Despite the very low number of training images, the classification accuracy appears to be quite good. I did not get into quantitative classification accuracy, as I was only interested in playing around with the library to see what could be done. But a rough visual inspection confirms that the model is indeed performing quite well. I am very impressed by how well it performed with such a small training set, and I think that a larger training set would probably yield even better results.
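Had I wanted a quantitative number, a per-pixel comparison of the predicted masks against held-out label masks would have been enough. A minimal sketch in pure Python (the masks here are hypothetical flat lists of 0/1 pixels; in practice one would first read the mask PNGs into arrays):

```python
def binary_mask_metrics(pred, truth):
    """Pixel-wise precision, recall and IoU for two equally-sized
    binary masks given as flat sequences of 0/1 values."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou

# Toy example: 3 true positives, 1 false positive, 1 false negative
pred  = [1, 1, 1, 1, 0, 0, 0, 0]
truth = [1, 1, 1, 0, 1, 0, 0, 0]
print(binary_mask_metrics(pred, truth))  # (0.75, 0.75, 0.6)
```

Intersection-over-union (IoU) is the usual headline metric for segmentation tasks like this one, since plain pixel accuracy is inflated by the abundant "not vineyard" background.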

Sample of model prediction results on new dataset. Imagery from Mapbox Satellite.

Conclusion

Robosat is very effective at doing a binary classification of vineyards vs. everything else. Especially notable is how it was able to do this with such a small training dataset. It is unfortunate that only binary classifications are possible, but I could imagine training multiple binary models, and then using an additional processing script to reconcile conflicts between predictions using ancillary data.
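That reconciliation step could be as simple as a per-pixel argmax over the probability outputs of the several binary models. A hypothetical sketch (the class names and flat probability lists are made up for illustration):

```python
def reconcile(prob_maps, threshold=0.5):
    """Given {class_name: flat list of per-pixel probabilities} from
    several binary models, assign each pixel its most probable class,
    falling back to 'background' when no model is confident."""
    classes = list(prob_maps)
    n_pixels = len(next(iter(prob_maps.values())))
    out = []
    for i in range(n_pixels):
        best = max(classes, key=lambda c: prob_maps[c][i])
        out.append(best if prob_maps[best][i] >= threshold else "background")
    return out

probs = {
    "vineyard": [0.9, 0.4, 0.2],
    "orchard":  [0.3, 0.7, 0.1],
}
print(reconcile(probs))  # ['vineyard', 'orchard', 'background']
```

Ancillary data (soil maps, cadastral parcels, slope) could then be used to break ties or veto implausible assignments on top of this.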
