Orange trees detection with YOLO v5 in UAV Imagery

Object Detection

Object detection is one of the most widely researched topics in the field of machine vision. In simple terms, it deals with identifying and localizing instances of classes such as person, car, bus, or spoon in an image. This is achieved by drawing a bounding box around each instance of the target class.

Object detection is a computer vision technique that works to identify and locate objects within an image or video. Specifically, object detection draws bounding boxes around these detected objects, which allow us to locate where said objects are in (or how they move through) a given scene.

Object detection is commonly confused with image recognition, so before we proceed, it’s important that we clarify the distinctions between them.

Image recognition assigns a label to an image. A picture of a tree receives the label “tree”. A picture of two trees still receives the label “tree”. Object detection, on the other hand, draws a box around each tree and labels each box “tree”. The model predicts where each object is and what label should be applied. In that way, object detection provides more information about an image than recognition.

Here’s an example of how Object detection works in practice:


YOLO (You Only Look Once) is an algorithm proposed by Redmon et al. in a conference paper presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), where it won the OpenCV People’s Choice Award.

Compared to the approach taken by object detection algorithms before YOLO, which repurpose classifiers to perform detection, YOLO proposes the use of an end-to-end neural network that makes predictions of bounding boxes and class probabilities all at once.

Following a fundamentally different approach to object detection, YOLO achieves state-of-the-art results, beating other real-time object detection algorithms by a large margin.

The YOLO algorithm works by dividing the image into an S×S grid of equally sized cells. Each cell is responsible for detecting and localizing the objects whose centers fall inside it. Each cell predicts B bounding boxes, with coordinates relative to the cell, along with the object label and the probability that an object is present in the cell.
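As a rough illustration of the grid assignment described above, the sketch below finds the cell that contains a box center and the center's offset inside that cell. The function names are hypothetical, and the default grid size of 7 is an assumption taken from the original YOLO paper, not something YOLOv5 uses in this form.

```python
def responsible_cell(cx, cy, img_w, img_h, grid_size=7):
    """Return (row, col) of the grid cell that contains the box center."""
    # Scale the center to grid units, then truncate to get the cell index.
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return row, col


def offset_in_cell(cx, cy, img_w, img_h, grid_size=7):
    """Return the center position relative to its cell, each value in [0, 1]."""
    row, col = responsible_cell(cx, cy, img_w, img_h, grid_size)
    x_off = cx / img_w * grid_size - col
    y_off = cy / img_h * grid_size - row
    return x_off, y_off
```

For a 448x448 image, a box centered at pixel (224, 112) falls in row 1, column 3 of a 7x7 grid.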

This process greatly lowers the computation, as both detection and recognition are handled by the cells of the image. However, it produces many duplicate predictions, because multiple cells can predict the same object with different bounding boxes.

YOLO makes use of Non-Maximal Suppression (NMS) to deal with this issue. NMS first looks at the probability scores associated with each detection and selects the bounding box with the highest score. It then suppresses the bounding boxes that overlap this box too strongly, that is, those whose Intersection over Union (IoU) with it exceeds a threshold.

This step is repeated on the remaining boxes until the final bounding boxes are obtained.
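The suppression loop can be sketched in plain Python as follows. This is a minimal, unoptimized illustration and not the code YOLOv5 actually runs; the box format (x1, y1, x2, y2) and the IoU threshold of 0.5 are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept after suppression."""
    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

Two heavily overlapping boxes collapse to the one with the higher score, while a distant box is left untouched.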

YOLOv5 is an open-source project that consists of a family of object detection models and detection methods based on the YOLO model, pre-trained on the COCO dataset. It is maintained by Ultralytics and represents the organization’s open-source research into future computer vision methods.

Within just two months of the release of YOLO v4, another version of YOLO was released, called YOLO v5. It was created by Glenn Jocher, who was already known in the community for his popular PyTorch implementation of YOLO v3.

On June 9, 2020, Jocher announced that his YOLO v5 implementation was publicly released and recommended for use in new projects. However, he did not publish a paper to accompany the initial release.

YOLO v5 differs from all prior releases in that it is a PyTorch implementation rather than a fork of the original Darknet. Like YOLO v4, YOLO v5 has a CSP backbone and a PANet neck. The major improvements include mosaic data augmentation and auto-learned bounding box anchors.

Orange trees UAV Image

The image used in this analysis was obtained from the openaerialmap website. This image is from a plantation of orange trees in the region of São José do Rio Preto — SP — Brazil.

The bounding boxes were collected using the image as a reference. After collection, the complete image was partitioned into several patches, along with the shapefile representing the labels. For YOLO compatibility, the images were converted from .tiff to .jpg, and the coordinates of each label’s bounding box were converted into row and column values.
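The text does not say how the coordinate conversion was implemented, so the following is only a sketch of one common way to do it. It assumes a north-up raster with square pixels whose top-left corner sits at (x_origin, y_origin) in map units; all function and parameter names are hypothetical.

```python
def world_to_pixel(x, y, x_origin, y_origin, pixel_size):
    """Convert map coordinates to (row, col) in the full image."""
    # Columns grow eastward; rows grow southward, hence the flipped sign for y.
    col = int((x - x_origin) / pixel_size)
    row = int((y_origin - y) / pixel_size)
    return row, col


def to_patch_coords(row, col, patch_row0, patch_col0):
    """Shift full-image (row, col) into the frame of one patch."""
    # patch_row0 and patch_col0 are the patch's top-left corner in the full image.
    return row - patch_row0, col - patch_col0
```

With an origin of (1000, 2000) and a pixel size of 0.5 map units, the point (1005, 1990) lands on row 20, column 10.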


Patches and dataframe with annotations are separated into training data and validation data.
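A split of this kind could look like the sketch below. The 80/20 ratio, the fixed seed, and the function name are assumptions made for illustration; the article does not state which proportions were used.

```python
import random


def split_patches(patch_names, valid_fraction=0.2, seed=42):
    """Shuffle patch names reproducibly and split them into train and valid."""
    names = sorted(patch_names)          # stable starting order
    random.Random(seed).shuffle(names)   # seeded, so the split is repeatable
    n_valid = int(len(names) * valid_fraction)
    return names[n_valid:], names[:n_valid]
```

The annotation rows can then be filtered by patch name so that every label follows its image into the same subset.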


To train a Yolo V5 model, a few things need to be downloaded from the internet.

In a notebook, the easiest approach is to download everything and set up your environment with terminal commands run directly from the notebook, as follows:

  • Clone the yolo V5 repository from GitHub

This will create a folder called ‘yolov5’ on your machine. This folder will contain everything you need further on, including pre-trained weights for the model, and a specific directory structure.

  • Install pytorch and other required packages
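The two setup steps above could be scripted roughly as follows. This sketch assumes that git and pip are available and that the packages are installed from the requirements.txt file that ships with the repository; the helper name and the dry_run switch are hypothetical. In a notebook, the same commands can also be typed directly with a leading exclamation mark.

```python
import subprocess
import sys

# Step 1 clones the Ultralytics repository; step 2 installs its requirements.
SETUP_COMMANDS = [
    ["git", "clone", "https://github.com/ultralytics/yolov5"],
    [sys.executable, "-m", "pip", "install", "-r", "yolov5/requirements.txt"],
]


def run_setup(commands=SETUP_COMMANDS, dry_run=True):
    """Print each setup command and, unless dry_run is set, execute it."""
    shown = []
    for cmd in commands:
        line = " ".join(cmd)
        print(line)
        shown.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if the command fails
    return shown
```

Calling run_setup(dry_run=False) performs the actual download and installation.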

You need to create a folder called data at the same level as your yolov5 folder. In this data folder you need to create a folder for images and a folder for labels. Inside each of them, you make a folder for train data and a folder for validation data.

The directory tree for training a Yolo V5 model

If files are not placed in the right directory, you are likely to encounter errors later on.
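The folder layout described above can be created in one go. This is a small sketch using only the standard library; the function name is hypothetical, and root is assumed to be the directory that also contains the yolov5 folder.

```python
from pathlib import Path


def make_data_dirs(root="."):
    """Create data/{images,labels}/{train,valid} under root and return the paths."""
    created = []
    for kind in ("images", "labels"):
        for split in ("train", "valid"):
            folder = Path(root) / "data" / kind / split
            folder.mkdir(parents=True, exist_ok=True)  # safe to rerun
            created.append(folder)
    return created
```

Because exist_ok is set, rerunning the function does not fail or overwrite existing files.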

Yolo V5 Data Format

The images

The images have to be placed directly in the image folders: training images in the data/images/train folder and validation images in the data/images/valid folder. Each image needs a unique file name with a .jpg extension (or another image format).

The labels

The labels have to be in data/labels/train/ or in data/labels/valid/. Each label file has to have the same name as its image, but with “.txt” instead of “.jpg”.
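Since a mismatch between image and label names is easy to miss, a quick check along these lines may help. It is a sketch with a hypothetical function name, and it assumes .jpg images as in this article.

```python
from pathlib import Path


def missing_labels(images_dir, labels_dir, image_ext=".jpg"):
    """Return the names of images that have no matching .txt label file."""
    missing = []
    for image in sorted(Path(images_dir).glob("*" + image_ext)):
        label = Path(labels_dir) / (image.stem + ".txt")
        if not label.exists():
            missing.append(image.name)
    return missing
```

An empty result for both the train and valid folders means every image has a label file with the expected name.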

The bounding boxes have to be listed one per line, with each line containing:

  • the class number of the object in the bounding box (always 0 if only one class)
  • the standardized x coordinate of the bounding box center (relative to the image width)
  • the standardized y coordinate of the bounding box center (relative to the image height)
  • the standardized width of the bounding box
  • the standardized height of the bounding box

Standardization is done by dividing each pixel value by the corresponding image dimension: x coordinates and widths by the image width, y coordinates and heights by the image height. So a bounding box centered on pixel (10, 20) with a width of 30 and a height of 40 on a picture of size (100, 100) would be standardized to (0.1, 0.2, 0.3, 0.4).
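The standardization can be written as a small helper that produces one label line. The function names are hypothetical, and the six-decimal formatting is just a convenient choice for this sketch.

```python
def corners_to_center(x_min, y_min, x_max, y_max):
    """Convert corner coordinates to (center_x, center_y, width, height)."""
    return (x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min


def to_yolo_line(class_id, cx, cy, box_w, box_h, img_w, img_h):
    """Format one bounding box as a YOLO label line with standardized values."""
    x = cx / img_w      # center x as a fraction of the image width
    y = cy / img_h      # center y as a fraction of the image height
    w = box_w / img_w   # box width as a fraction of the image width
    h = box_h / img_h   # box height as a fraction of the image height
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


# The worked example from the text:
print(to_yolo_line(0, 10, 20, 30, 40, 100, 100))
# prints "0 0.100000 0.200000 0.300000 0.400000"
```

Writing one such line per box into the .txt file that matches the image name gives the expected label format.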

Start the Model Training

To start training a Yolo V5 model you need two YAML files.

The first YAML file specifies:

  • where your training data is,
  • where your validation data is,
  • the number of classes (types of objects) that you want to detect,
  • and the names corresponding to those classes.
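For the single-class dataset in this article, the first YAML file might be generated as in the sketch below. The keys train, val, nc and names are the ones YOLOv5 reads; the file name dataset.yaml, the class name orange_tree, and the relative paths (which assume data sits next to the yolov5 folder) are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical contents for a one-class dataset stored next to yolov5/.
DATASET_YAML = """\
train: ../data/images/train
val: ../data/images/valid

nc: 1
names: ['orange_tree']
"""


def write_dataset_yaml(path="dataset.yaml", text=DATASET_YAML):
    """Write the dataset description to disk and return its path."""
    target = Path(path)
    target.write_text(text)
    return target
```

YOLOv5 looks for the labels by replacing the images part of each image path with labels, which is why the folder names matter.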

Training is done with a terminal command, which you can execute from your notebook.

There are multiple hyper-parameters that you can specify, for example the batch size, the number of epochs, and the image size. You then specify the locations of the two YAML files described above. You also specify a name, which matters later on when you look for your results.
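A training command with these options could be assembled as in the following sketch. The flags are options of YOLOv5's train.py, but every value here (image size 640, batch size 16, 100 epochs, the yolov5s weights) is an assumption rather than the setting used in this article, and the second YAML file is assumed to be a model configuration passed with --cfg.

```python
def build_train_command(data_yaml, model_yaml, name,
                        img_size=640, batch_size=16, epochs=100,
                        weights="yolov5s.pt"):
    """Return the argument list for a YOLOv5 training run."""
    return [
        "python", "train.py",
        "--img", str(img_size),           # training image size in pixels
        "--batch-size", str(batch_size),  # images per batch
        "--epochs", str(epochs),          # passes over the training data
        "--data", data_yaml,              # dataset YAML (paths, classes)
        "--cfg", model_yaml,              # model YAML (assumed second file)
        "--weights", weights,             # pre-trained starting weights
        "--name", name,                   # label used in the output folder
    ]


cmd = build_train_command("dataset.yaml", "models/yolov5s.yaml", "yourname")
print(" ".join(cmd))
```

The printed line can be run from inside the yolov5 folder, or passed to subprocess.run with cwd set to that folder.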

Running this line will create a sub-folder in yolov5 that contains the weights of the trained model, which you can then reuse in the detection step.

Those folders will always be created in the same directory: yolov5/runs/exp0_yourname/…

The exp0 prefix increments to exp1, and so on, each time you rerun the “train” command.


After finishing the training, we can observe the training metrics:

Detect orange trees in new images

Now for the final phase, you will want to detect objects in unseen photos. This is done with a terminal command, which generates a new folder with outputs. You can either generate pictures with the bounding boxes drawn on them, or you can generate text files with the locations of the bounding boxes.

This is done as follows:
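A detection command might be assembled like this. The flags are options of YOLOv5's detect.py; the image size and confidence threshold are assumed values, and the weights argument must point at the file produced by your own training run.

```python
def build_detect_command(weights, source, img_size=640,
                         conf_thres=0.4, save_txt=True):
    """Return the argument list for a YOLOv5 detection run."""
    cmd = [
        "python", "detect.py",
        "--weights", weights,              # trained weights from the train step
        "--source", source,                # image file or folder of new images
        "--img", str(img_size),            # inference image size in pixels
        "--conf-thres", str(conf_thres),   # minimum confidence to keep a box
    ]
    if save_txt:
        cmd.append("--save-txt")           # also write boxes to text files
    return cmd
```

Joining the returned list with spaces gives the line to run from inside the yolov5 folder.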


The final results were very impressive, given the small number of samples collected. With YOLOv5 it was possible to create a detection and counting model for orange trees, which facilitates precision agriculture analysis.
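For the counting part, one simple option is to count the lines in the text files written by the detection step, since each line describes one detected box. The sketch below assumes those files were collected in a single folder; the function name is hypothetical.

```python
from pathlib import Path


def count_detections(labels_dir):
    """Return a dict mapping each image name (without extension) to its box count."""
    counts = {}
    for txt in sorted(Path(labels_dir).glob("*.txt")):
        # One non-empty line corresponds to one detected bounding box.
        lines = [ln for ln in txt.read_text().splitlines() if ln.strip()]
        counts[txt.stem] = len(lines)
    return counts
```

Summing the values gives the total number of detected trees across all patches; images without a text file are simply absent from the result.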


