Vision Transformers Architecture

Convolutional neural networks (CNNs) have been the state of the art in computer vision in recent years. Image classification, detection and segmentation models use convolutional filters to extract feature maps from the input images, which the following layers then use to perform their specific tasks. However, the Vision Transformer (ViT), an architecture that does not use convolutions, has been showing encouraging results in tasks such as image classification and object detection. Is the CNN reign coming to an end?

Convolutional Neural Networks

To learn more about CNNs, visit the link below:

Transformers

The paper ‘Attention Is All You Need’ introduces a…


Fires in natural areas are one of the main environmental problems affecting different parts of the world. The Brazilian Pantanal, for example, had in 2020 one of its worst years in terms of the number of fires registered in the biome. FIRMS data obtained through Google Earth Engine show that approximately 60,000 fires were registered in 2020 up to the month of November, well above the numbers registered in previous years.


Hello, in this post we’ll use the temporal variation of NDVI for different land use and land cover types in our study area to create harmonic time series. After that, we’ll use an algorithm to cluster the samples into three classes. For this task, let’s create the script on Google Colab, using the following packages: the Google Earth Engine Python API for obtaining spectral information, ipygee for converting that information into a pandas DataFrame, folium for visualization and tslearn for implementing the clustering algorithm.
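Before setting anything up, the idea behind a harmonic time series can be sketched with NumPy alone. The example below is only an illustration under stated assumptions, not the script from the post: the NDVI values are synthetic, the function name `fit_harmonic` is hypothetical, and the model (a constant, a linear trend and one annual cosine/sine pair fitted by least squares) is one common formulation that may differ from the one used later.

```python
import numpy as np

def fit_harmonic(t_years, ndvi, n_harmonics=1):
    """Least-squares fit of a trend plus annual harmonics to an NDVI series.

    Model: ndvi ~ b0 + b1*t + sum_k [a_k*cos(2*pi*k*t) + b_k*sin(2*pi*k*t)]
    with t expressed in years. Returns the coefficients and the fitted series.
    """
    columns = [np.ones_like(t_years), t_years]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(2 * np.pi * k * t_years))
        columns.append(np.sin(2 * np.pi * k * t_years))
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, ndvi, rcond=None)
    return coeffs, design @ coeffs

# Synthetic NDVI (an assumption, not real data): one sample every 16 days
# over three years, with a known mean, amplitude and phase plus noise.
rng = np.random.default_rng(0)
t = np.arange(0, 3, 16 / 365)
observed = 0.5 + 0.3 * np.cos(2 * np.pi * t - 1.0) + rng.normal(0, 0.02, t.size)

coeffs, fitted = fit_harmonic(t, observed)

# Amplitude and phase of the annual cycle, recovered from the cos/sin pair.
amplitude = np.hypot(coeffs[2], coeffs[3])
phase = np.arctan2(coeffs[3], coeffs[2])
print(f"amplitude={amplitude:.2f}, phase={phase:.2f} rad")
```

The fitted curves (or the amplitude and phase values) for each sample are the kind of input that a time series clustering algorithm could then group into three classes.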

Installing the packages

Let’s install the necessary packages:

!pip install rasterio
!pip install ipygee
!pip install tslearn
!pip install earthengine-api
!pip …


The Grand Ethiopian Renaissance Dam (GERD) is a dam on the Blue Nile River in Ethiopia that has been under construction since 2011. It is in the Benishangul-Gumuz Region of Ethiopia, about 15 km east of the border with Sudan. At 6.45 gigawatts, the dam will be the largest hydroelectric power plant in Africa when completed, as well as the seventh largest in the world.

Location of the Grand Ethiopian Renaissance Dam.

The GERD is located on the Blue Nile, a major tributary of the Nile, the world’s longest river, contributing up to 80% of its water during the rainy season.

The Blue Nile originates at Lake Tana in north-western…


One of the most common ways to perform a supervised classification of satellite images is pixel-based classification. This type of classification has its advantages, such as low computational cost, which makes it feasible for large areas. However, some problems are common, such as salt-and-pepper noise, where isolated pixels of different classes are scattered across the image. Another problem is the spectral similarity between pixels of different classes. One way to mitigate these problems is to use segmentation, where the image is divided into regions that are treated as single elements in the classification…
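To make the contrast with pixel classification concrete, here is a minimal sketch of one simple way to use segments: every pixel receives the most frequent class found inside its segment, which removes isolated misclassified pixels. This is an assumption made for illustration, not necessarily the approach taken in the post; the function name `majority_vote_by_segment` and the two small arrays are hypothetical.

```python
import numpy as np

def majority_vote_by_segment(pixel_labels, segments):
    """Assign every pixel the most frequent class within its segment."""
    smoothed = np.empty_like(pixel_labels)
    for segment_id in np.unique(segments):
        mask = segments == segment_id
        classes, counts = np.unique(pixel_labels[mask], return_counts=True)
        smoothed[mask] = classes[np.argmax(counts)]
    return smoothed

# Toy per-pixel classification with two "salt and pepper" pixels:
# a 1 inside the left region and a 0 inside the right region.
pixel_labels = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
])

# Toy segmentation: segment 1 covers the left half, segment 2 the right half.
segments = np.array([
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
])

smoothed = majority_vote_by_segment(pixel_labels, segments)
print(smoothed)
```

In a real workflow the segments would come from an image segmentation algorithm, and the classifier could also be trained directly on per-segment statistics instead of voting over pixel labels.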


Brazil is one of the main soybean producers in the world, with the state of Mato Grosso being the largest producer and having the largest planted area. Several initiatives stand out in the mapping of soybean crop fields. An example is the GAAF/UNEMAT SojaMaps. In this post, we will try to obtain good results in mapping soybean crop fields using images from ESA Sentinel-1. From Google Earth Engine, we will obtain satellite images and apply the semantic segmentation technique to obtain binary images labeled as soybean and non-soybean.

Semantic Segmentation

In Computer Vision, Semantic Segmentation is a high-level…

Joao Otavio Nascimento Firigato

Deep Learning Computer Vision for Remote Sensing Images
