asper brothers team
Mariusz Interewicz Updated: 2 Feb 2023 8 min to read

Image Recognition in Python based on Machine Learning – Example & Explanation for Image Classification Model

Let’s look at the photo below to understand how image classification works in our brains. The photo can be interpreted as either an old or a young woman. The ambiguity arises because the same image features can be read in two different ways. This illustrates how the brain performs any image classification task: it extracts certain features from the image and then classifies the image according to those features.


picture illusion


How Does Image Recognition Work?

The brain consists of neurons and weighted connections between them. Neural networks, a family of machine learning algorithms, follow the same design: neurons are arranged in layers, and the weights connecting them are updated during training to minimize a specific loss function. Different neural networks mimic different brain functions. For example, recurrent neural networks mimic the memory part of the brain, while convolutional neural networks (CNNs) mimic the functions related to vision and image recognition. CNNs are the main focus of this article.

Applications of Image Recognition

Image recognition is one of the key drivers of today’s technology and can be applied in many domains. In gaming, it enables features that weren’t possible before. For example, face recognition is used in one of the top-selling games, Honor of Kings, to identify users’ ages. Another application is in the medical sector, where models trained on medical images help detect several diseases more easily with minimal human intervention. SkinVision is a healthcare app that can detect skin cancer using only your phone camera. The car industry is also investing heavily in image recognition. It can help predict a car’s speed by monitoring the behaviour and locations of other moving objects. Researchers are also close to image recognition that lets cars see in the dark.



According to their website, “SkinVision introduces an integrated dermatology service as a preventive health medium that helps you stay on top of your skin health.” The app helps detect skin cancer by letting users monitor a mole on their skin and assess the risk themselves. Users take pictures of the problem spots on their skin with their smartphone camera. Using AI, the app takes 30 seconds to scan the picture for signs of cancer and generates a report that rates the risk as low, medium, or high. SkinVision also sets reminders for users to retake the assessment. Image recognition experts keep track of the results, and if a risk is detected, the user is immediately notified to see their doctor.


Implementing image recognition offers transformative benefits for businesses, optimizing everything from inventory management to customer service. Our experience developing software for startups has demonstrated its vast potential. By automating key processes and enhancing data accuracy, this technology not only speeds up operations but also sharpens decision-making with precise analytics. The adoption of image recognition has allowed our clients to achieve rapid growth and improved efficiency, underscoring its crucial role in the competitive digital marketplace.
Mike Jackowski, COO, ASPER BROTHERS


How Does Image Recognition Work in Python?

Image recognition in Python feeds an input image to a neural network. The most popular type of network for image recognition is the convolutional neural network (CNN), which is discussed in detail below. The task falls mainly into two categories:

1. Classification of the image into a single category or multiple categories.

2. Identification of certain objects in an image (for purposes such as detection, segmentation, or object tracking in videos).

Although the final tasks differ, the algorithm used in the neural network is the same. The flow is as follows:

image recognition path


The input image consists of pixels. A grayscale image is represented as a 2D array in which each pixel takes a value from 0 to 255. An RGB (colour) image is represented as a 3D array in which each layer represents one colour channel.
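To make the array representation concrete, here is a minimal sketch using NumPy. The tiny 4x4 images and their pixel values are made up for illustration only.

```python
import numpy as np

# Hypothetical 4x4 grayscale image: one intensity value (0-255) per pixel.
gray = np.array([
    [0,   64, 128, 255],
    [0,   64, 128, 255],
    [0,   64, 128, 255],
    [0,   64, 128, 255],
], dtype=np.uint8)

# Hypothetical 4x4 RGB image: the same grid with three colour channels per pixel.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255  # fill the red channel, leave green and blue at zero

print(gray.shape)  # (4, 4)    -> height, width
print(rgb.shape)   # (4, 4, 3) -> height, width, channels
```

The third dimension of the RGB array is what the input_shape=[64, 64, 3] argument refers to later in the article.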

Let’s discuss the process step by step. For the first three steps, we will describe each layer in terms of three points: purpose, operation, and output.


1. Convolutional layer:

Purpose: Detect certain features in the image.

Operation: The input image is convolved with a feature detector (or filter) to detect certain features in the image. Convolution works in the same manner as in digital signal processing. Feature detector values can be predetermined if you know which features to extract from the image, or they can be initialized randomly, in which case the network training process determines the filter values that best fit the model.

Output: The output of this layer is called a feature map. The feature map is smaller than the input image, which makes computation easier. Part of the image information is lost because of the reduced size. However, this isn’t a problem, because the feature map doesn’t store the original pixel values; its values mark the locations where the filter responds most strongly.
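The operation above can be sketched in a few lines of NumPy. This is only an illustration: the 5x5 image and the vertical-edge filter are hypothetical, and the function slides the filter without flipping it, which is how deep learning libraries typically implement the convolutional layer.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            # Element-wise product of window and filter, summed to one value.
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Hypothetical 5x5 image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical filter that responds to dark-to-bright vertical edges.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): smaller than the 5x5 input
print(feature_map)        # each row is [3. 3. 0.]: high where the edge is
```

The 5x5 input shrinks to a 3x3 feature map, and the large values mark where the filter matched the edge.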

2. ReLU rectifier:

Purpose: Increase non-linearity so that image features become more easily separable. Images are naturally highly non-linear because they contain many details related to intensity, borders, etc. Convolution itself is a linear operation, so this step is crucial.

Operation: The ReLU (rectified linear unit) function is applied to the feature map, replacing every negative value with zero.

Output: The output of this layer is a feature map with higher non-linearity.
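As a quick illustration, ReLU can be written as a one-line NumPy function. The feature map values below are made up.

```python
import numpy as np

def relu(feature_map):
    """Keep positive values and replace negative values with zero."""
    return np.maximum(feature_map, 0)

# Hypothetical feature map containing negative filter responses.
feature_map = np.array([
    [3.0, -2.0],
    [-0.5, 1.5],
])

rectified = relu(feature_map)
print(rectified)
# [[3.  0. ]
#  [0.  1.5]]
```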

3. Maximum Pooling layer:

Purpose: Make feature detection robust to distortion. The goal is to detect a feature even if it differs slightly from the learned pattern or is slightly shifted.

Operation: Maximum pooling takes the maximum value within a window. The window slides across the feature map by a fixed number of steps called the stride.

Output: The output of this layer is a pooled feature map, which has several advantages. It is always smaller than its input. The maximum values are preserved, and these mark the locations of highest similarity with the filter. With a 2x2 window and a stride of 2, 75% of the values, which mostly carry information unrelated to the features, are discarded. In addition, the pooled feature map becomes robust to distortion: if a feature value shifts slightly from its location, the pooled output stays much the same.
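Here is a minimal NumPy sketch of maximum pooling with a 2x2 window and a stride of 2. The 4x4 feature map is hypothetical.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep only the maximum value of each pool_size x pool_size window."""
    out_h = (feature_map.shape[0] - pool_size) // stride + 1
    out_w = (feature_map.shape[1] - pool_size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = feature_map[r:r + pool_size, c:c + pool_size].max()
    return pooled

# Hypothetical 4x4 feature map.
feature_map = np.array([
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)
# [[6. 8.]
#  [3. 4.]]
```

Sixteen values go in and four come out, so three quarters of the values are dropped while the strongest response in each window survives.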

Convolutional and max pooling layers can be repeated more than once, depending on the machine learning problem. Then we add a multilayer perceptron (MLP) to the existing CNN. The main purpose of this step is to combine the extracted features into new attributes that make better class predictions.

4. Flattening

The values of the pooled feature maps are taken row by row and placed in a single column (a 1D vector). The main purpose of this step is to convert the matrix output of the previous layer into a format that an artificial neural network (ANN) can accept.
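In NumPy terms, flattening is a simple reshape. The 2x2 pooled feature map below is a made-up example.

```python
import numpy as np

# Hypothetical 2x2 pooled feature map.
pooled = np.array([
    [6.0, 8.0],
    [3.0, 4.0],
])

# reshape(-1) reads the values row by row into a single 1D vector.
flattened = pooled.reshape(-1)
print(flattened)        # [6. 8. 3. 4.]
print(flattened.shape)  # (4,)
```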

5. Fully Connected Layer

This is an artificial neural network whose input is the flattened layer, followed by a group of fully connected layers and, finally, an output layer that matches the categories we have or the objects that need to be detected.
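The sketch below shows one forward pass through a tiny fully connected network in NumPy. Everything in it is an assumption made for illustration: the layer sizes, the random (untrained) weights, and the use of a sigmoid on the output neuron, which is a common choice when there are two categories.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dense(inputs, weights, bias, activation):
    """Fully connected layer: every input feeds every neuron."""
    return activation(inputs @ weights + bias)

rng = np.random.default_rng(0)

# Hypothetical flattened input with 4 values.
flattened = np.array([6.0, 8.0, 3.0, 4.0])

# Randomly initialized weights; training would adjust these.
w_hidden, b_hidden = rng.normal(scale=0.1, size=(4, 3)), np.zeros(3)
w_out, b_out = rng.normal(scale=0.1, size=(3, 1)), np.zeros(1)

hidden = dense(flattened, w_hidden, b_hidden, relu)    # 3 hidden neurons
probability = dense(hidden, w_out, b_out, sigmoid)     # 1 output neuron

print(hidden.shape)       # (3,)
print(probability.shape)  # (1,)
```

With a sigmoid output, the single value lies between 0 and 1 and can be read as the probability of one of the two categories.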


Practical Example for Creating a Simple Image Classification Model in Python

Let’s discuss a practical example in Python. We will examine a simple classification problem. Data preprocessing and augmentation are basic steps that can be reused in almost any image classification problem with little modification, while the model structure is adapted to the problem at hand. We will go through the code step by step.


Import libraries

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

The ImageDataGenerator class is needed to perform data preprocessing.

Data Preprocessing

train_datagen = ImageDataGenerator( rescale =1./255, zoom_range = 0.2, horizontal_flip = True)
training_set = train_datagen.flow_from_directory('link to dataset directory',target_size = (64, 64),
batch_size = 32,class_mode = 'binary')

The code above prepares the training dataset. Transformations (zoom range, horizontal flip, etc.) are applied to the input images to make them more varied and so reduce overfitting. The first line sets all the transformations that you want to apply to your dataset, such as zooming and flipping. Parameter values can be tuned according to the final accuracy of the model. For more information on all the options available for data preprocessing, check the Keras documentation. The second line reads the training images from the dataset directory, splits them into batches of the given size, and sets the classification mode. The class mode can take one of two options: binary for two categories or categorical for more than two. It also sets the target size to which images are resized before they are fed to the CNN model.
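To see what the rescale and horizontal_flip options do to the pixel values, here is a small NumPy sketch on a made-up 2x2 image. It only imitates the effect of the two options; ImageDataGenerator itself applies the flip randomly to some images, not to all of them.

```python
import numpy as np

# Hypothetical 2x2 grayscale image with values in the 0-255 range.
image = np.array([
    [0,   51],
    [102, 255],
], dtype=np.uint8)

# Effect of rescale=1./255: pixel values are mapped to the 0-1 range.
rescaled = image * (1.0 / 255)

# Effect of a horizontal flip: the columns are mirrored left to right.
flipped = image[:, ::-1]

print(rescaled)  # approximately [[0.  0.2] [0.4 1. ]]
print(flipped)   # [[ 51   0] [255 102]]
```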

test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
target_size = (64, 64),batch_size = 32, class_mode = 'binary')

For the test dataset, the code is different. Test images stand in for the data the model will see in production, so no augmentation such as flip or zoom is applied to them, only feature scaling. The second line is similar to the one used for the training dataset.

Model definition and training

Model definition and training are done in 4 main steps:

1. First Step: Initialize an instance of the Sequential class

cnn = tf.keras.models.Sequential()

2. Second Step: Build the convolutional network

  • Build the initial convolutional layer of the CNN with an input shape matching the target image size. Note that the number of filters and the kernel size can vary.
cnn.add(tf.keras.layers.Conv2D(filters=5, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))


  • Add Maximum pooling layer, where pool size and strides can vary accordingly.
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))


  • Add Convolutional + Maximum pooling layer according to required network architecture.
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))


  • Add Flattening layer
cnn.add(tf.keras.layers.Flatten())


  • Add Artificial Neural Network, where layers and number of neurons can vary accordingly.
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))


  • Add the output layer, where the number of neurons depends on the number of categories.
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))  # sigmoid gives a 0-1 probability for the binary case


3. Third Step: Compiling CNN

  • Compiling the CNN model requires several parameters: the optimizer, the loss function, and the metric used to measure the model’s performance.
 cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])


4. Fourth Step: Training CNN on the training set and evaluation on the testing dataset.

 cnn.fit(x = training_set, validation_data = test_set, epochs = 5)
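To check how the image size changes through the layers defined above, we can trace the shapes by hand. The helper functions below are hypothetical and written for this sketch only; they assume the Keras defaults of no padding and a convolution stride of 1.

```python
def conv_output_size(size, kernel_size, stride=1):
    """Width/height after a convolution with no padding."""
    return (size - kernel_size) // stride + 1

def pool_output_size(size, pool_size, stride):
    """Width/height after max pooling with no padding."""
    return (size - pool_size) // stride + 1

size = 64                                           # input images are 64x64
size = conv_output_size(size, kernel_size=3)        # first Conv2D  -> 62
size = pool_output_size(size, pool_size=2, stride=2)  # first MaxPool -> 31
size = conv_output_size(size, kernel_size=3)        # second Conv2D -> 29
size = pool_output_size(size, pool_size=2, stride=2)  # second MaxPool -> 14

# The second convolutional layer has 32 filters, so flattening yields:
flattened_units = size * size * 32
print(size, flattened_units)  # 14 6272
```

So the dense layer with 128 neurons receives a vector of 6272 values for each image.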


Evaluating Results


image recognition


Above is the output for the last 5 epochs when applying the previous steps, with the same model structure, to a cat-vs-dog dataset. This is a large dataset of photos of cats and dogs of different types and in different poses. The model is trained to identify whether a photo shows a cat or a dog. As the results show, there are two main measures of model accuracy. Training accuracy is the accuracy per epoch measured on the training dataset, and testing accuracy is the accuracy per epoch measured on the testing dataset. Both increase over the epochs, which suggests that the current model structure performs well on our dataset without overfitting or underfitting.
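One simple way to watch for overfitting is to compare the two accuracies epoch by epoch. The sketch below is illustrative only: the numbers are invented, not results from the model above, and the dictionary merely mimics the accuracy and val_accuracy lists that Keras records during training.

```python
def accuracy_gap(history):
    """Training accuracy minus validation accuracy for each epoch."""
    return [
        round(train - val, 4)
        for train, val in zip(history["accuracy"], history["val_accuracy"])
    ]

# Invented numbers showing what overfitting would look like:
# training accuracy keeps rising while validation accuracy stalls.
hypothetical_history = {
    "accuracy":     [0.60, 0.70, 0.75, 0.80, 0.85],
    "val_accuracy": [0.58, 0.68, 0.72, 0.74, 0.73],
}

gaps = accuracy_gap(hypothetical_history)
print(gaps)  # [0.02, 0.02, 0.03, 0.06, 0.12]
```

A gap that stays small while both accuracies rise matches the healthy behaviour described above, while a gap that keeps widening is a sign of overfitting.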



ML/AI applications have become an important part of our lives, taking part in most daily tasks without our noticing. The main advantage of image recognition is that it lets us interact visually with the environment. As a result, image recognition has given us the chance to innovate in multiple domains. These sectors are still open, and the options for innovation are vast. There is therefore always a chance to improve people’s lives, or to start your own business, by carefully examining the market and looking for opportunities to improve it using AI in general and image recognition in particular.

