Image Recognition in Python based on Machine Learning – Example & Explanation for Image Classification Model
How Does Image Recognition Work?
The brain consists of neurons and weighted connections between them. Machine learning algorithms follow a similar design: neurons are arranged in layers, and the weights connecting them are updated according to a specific loss function. Different neural networks mimic different brain functions. For example, recurrent neural networks mimic the memory part of the brain, while convolutional neural networks mimic the brain functions related to vision and image recognition. The latter are the main focus of this article.
Applications of Image Recognition
Image recognition is one of the key drivers of today’s technology, and it can be applied in many domains. In gaming, it enables features that were not possible before. For example, face recognition is used in one of the top-selling games, Honor of Kings, to identify user ages. Another application is in the medical sector, where image recognition models are trained on medical images to detect several diseases more easily and with minimal human intervention. SkinVision is a healthcare app that can detect skin cancer using only your phone camera. The car industry is also investing in image recognition at a fast pace. It can help a car predict speeds by monitoring the behaviour and location of other moving objects. Researchers are also close to image recognition systems that let cars see in the dark.
According to their website, “SkinVision introduces an integrated dermatology service as a preventive health medium that helps you stay on top of your skin health.” The app helps detect skin cancer by letting users monitor a mole on their skin and assess the risk. Users take pictures of the problem spots on their skin with their smartphone camera. Using AI, the app takes 30 seconds to scan the picture for signs of cancer, then generates a report that rates the risk as low, medium, or high. SkinVision sets reminders for users to retake the assessment. Image recognition experts keep track, and if a risk is detected, the user is immediately advised to see their doctor.
How Does Image Recognition Work in Python?
Image recognition in Python feeds an input image to a neural network. The most popular neural network for image recognition is the Convolutional Neural Network (CNN), which is the main focus of this article and is discussed in detail shortly. The task falls mainly into two categories:
1. Classification of the image into a single category or multiple categories.
2. Identification of certain objects in an image (for the purpose of detection, segmentation, object tracking in videos, etc.).
Although the final tasks differ, the algorithm used in the neural network is the same. The flow is as follows:
The input image consists of pixels. If it is a grayscale image (B/W image), it is represented as a 2D array in which each pixel takes a value from 0 to 255. If it is an RGB image (coloured image), it is represented as a 3D array in which each layer holds one colour channel (red, green, or blue).
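To make the array layout concrete, here is a minimal sketch using NumPy. The pixel values and the 4x4 image size are made up for illustration; they do not come from any dataset used in this article.

```python
import numpy as np

# Hypothetical 4x4 grayscale image: one intensity value (0-255) per pixel.
gray = np.array(
    [[0, 50, 100, 150],
     [50, 100, 150, 200],
     [100, 150, 200, 250],
     [150, 200, 250, 255]],
    dtype=np.uint8,
)

# Hypothetical 4x4 RGB image: the third axis holds the red, green and blue channels.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255  # fill the red channel, leave green and blue at 0

print(gray.shape)  # (4, 4)    -> 2D array
print(rgb.shape)   # (4, 4, 3) -> 3D array, one layer per colour
print(rgb[0, 0])   # [255   0   0] -> a pure red pixel
```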
Let’s discuss the process step by step. For each of the first three steps, we will look at three points: purpose, operation, and output.
1. Convolutional layer:
Purpose: Detect certain features in the image.
Operation: The input image is convolved with a feature detector (or filter) to detect certain features in the image. Convolution occurs in the same manner as in digital signal processing. Feature detector values can be predetermined if you know which features to extract from the image, or they can be initialized randomly, in which case the network training process determines the filter values that best fit the model.
Output: The output of this layer is called a feature map. The feature map is smaller than the image, which makes later computation cheaper. Some image information is lost because of the reduced output size. However, this doesn’t cause a problem, because the feature map does not store the original pixels: its values mark the locations where the filter responds most strongly.
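The operation above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not how Keras implements it: the image, the vertical-edge filter, and the function name convolve2d_valid are made up, there is no padding, and the stride is 1. Note that, like most deep learning libraries, the sketch slides the filter without flipping it (strictly speaking, cross-correlation).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Hypothetical 5x5 image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): smaller than the 5x5 input
print(feature_map)        # each row is [3. 3. 0.]: strongest where the edge is
```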
2. ReLU Rectifier:
Purpose: Increase non-linearity so that features can be separated more easily. Images are normally highly non-linear because they contain many details related to intensity, borders, etc. Convolution itself is a linear operation, so this step is crucial.
Operation: A ReLU (rectified linear unit) function is applied to the feature map, replacing every negative value with zero.
Output: The output of this layer is a feature map with higher non-linearity.
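As a small sketch of this step, ReLU keeps positive values and sets negative ones to zero. The feature map values below are made up for illustration.

```python
import numpy as np

def relu(feature_map):
    """Replace every negative value with 0 and keep the rest unchanged."""
    return np.maximum(feature_map, 0)

# Hypothetical feature map with positive and negative filter responses.
feature_map = np.array([[3.0, -1.0],
                        [-2.5, 0.5]])

activated = relu(feature_map)
print(activated)
# [[3.  0. ]
#  [0.  0.5]]
```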
3. Maximum Pooling layer:
Purpose: Recognize features even if they are distorted. The main purpose is to detect a feature even when it differs slightly from the expected one.
Operation: Maximum pooling takes the maximum value inside a window. The window then slides across the feature map by a certain number of steps, called the stride.
Output: The output of this layer is a pooled feature map, which has multiple advantages. The output is always smaller. The maximum values are still present, and these mark the locations of highest similarity with the filter. With a 2x2 window and a stride of 2, for example, 75% of the values, which are unrelated to the feature or otherwise not useful, are removed. In addition, the pooled feature map becomes robust to distortion: it changes little if a feature value is shifted slightly from its location.
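The pooling step can be sketched as follows, assuming a 2x2 window and a stride of 2 (the same values used in the Keras code later in this article). The function name max_pool2d and the feature map values are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep only the largest value inside each pool_size x pool_size window."""
    out_h = (feature_map.shape[0] - pool_size) // stride + 1
    out_w = (feature_map.shape[1] - pool_size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = feature_map[r:r + pool_size, c:c + pool_size].max()
    return pooled

# Hypothetical 4x4 feature map.
feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 2],
                        [2, 0, 3, 4]], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)
# [[4. 2.]
#  [2. 5.]]
print(1 - pooled.size / feature_map.size)  # 0.75 -> 75% of the values are dropped
```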
Convolutional and MaxPool layers can be repeated more than once, depending on the machine learning problem. Then we add a multilayer perceptron (MLP) to the existing CNN. The main purpose of this step is to combine the extracted features into more attributes so that class predictions improve.
4. Flattening:
The values of the pooled feature map are taken row by row and placed in a single column. The main purpose of this step is to convert the matrix output of the previous layer into a format that the ANN accepts.
5. Fully Connected Layer
This is an artificial neural network whose input is the flattened layer. It is followed by a group of fully connected layers and, finally, an output layer sized according to the categories we have or the objects that need to be detected.
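Flattening and the fully connected layers can be sketched together in NumPy. Everything specific here is an assumption made for illustration: the 2x2 pooled feature map, the hidden layer of 3 neurons, the random (untrained) weights, and the single sigmoid output for a two-class problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 pooled feature map from the previous step.
pooled = np.array([[4.0, 2.0],
                   [2.0, 5.0]])

# Flattening: read the values row by row into a single vector.
flat = pooled.flatten()
print(flat)  # [4. 2. 2. 5.]

def dense(x, weights, bias):
    """Fully connected layer: every output neuron sees every input value."""
    return weights @ x + bias

def sigmoid(z):
    """Squash a value into the range (0, 1) so it reads as a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized weights; training would adjust these.
w_hidden, b_hidden = rng.normal(size=(3, 4)), np.zeros(3)
w_out, b_out = rng.normal(size=(1, 3)), np.zeros(1)

hidden = np.maximum(dense(flat, w_hidden, b_hidden), 0)  # hidden layer with ReLU
prob = sigmoid(dense(hidden, w_out, b_out))              # output layer, one neuron

print(prob.shape)  # (1,) -> a single probability for a two-class problem
```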
Practical Example for Creating a Simple Image Classification Model in Python
Let’s discuss a practical example in Python. We will examine a simple classification problem. Data preprocessing and augmentation are basic steps that can be reused in any image classification problem without modification, while the model structure is adapted to the problem at hand. We will write the code step by step and discuss each step as we go.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
The ImageDataGenerator class is needed to perform data preprocessing.
train_datagen = ImageDataGenerator(rescale=1./255, zoom_range=0.2, horizontal_flip=True)
training_set = train_datagen.flow_from_directory('link to dataset directory', target_size=(64, 64), batch_size=32, class_mode='binary')
The code above is used for the training dataset. Transformations such as zoom and horizontal flip are applied to the input images to make them more varied and so reduce overfitting. The first line sets all the parameters that you want to apply to your dataset, including zooming, flipping, etc. Parameter values can be tuned according to the final accuracy of the model. For more information on all the options available for data preprocessing, check the Keras documentation. The second line loads the training dataset from the file system, splits it into batches of a given size, and sets the class mode. The class mode is typically 'binary' when there are two classes or 'categorical' when there are more than two. It also sets the target size of the images fed to the CNN model.
test_datagen = ImageDataGenerator(rescale=1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set', target_size=(64, 64), batch_size=32, class_mode='binary')
For the test dataset, the code is different. The test images stand in for data seen in production, so no augmentation such as flipping or zooming is applied; only feature scaling (rescaling) is used. The second line works the same way as for the training dataset.
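To see what two of these transformations do to the pixel values, here is a minimal NumPy sketch. It is not how ImageDataGenerator is implemented; the tiny 2x3 image is made up, and the flip is applied unconditionally here, whereas the generator applies it at random.

```python
import numpy as np

# Hypothetical 2x3 grayscale image with values in the 0-255 range.
image = np.array([[0, 128, 255],
                  [255, 128, 0]], dtype=np.uint8)

# Rescaling: multiply every pixel by 1/255 so values fall between 0 and 1.
rescaled = image * (1.0 / 255)

# Horizontal flip: reverse the order of the columns (mirror left to right).
flipped = rescaled[:, ::-1]

print(rescaled.min(), rescaled.max())
print(flipped[0])  # first row mirrored: brightest pixel now comes first
```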
Model definition and training
Model definition and training are done in 4 main steps:
1. First Step: Initialize an instance of the Sequential class
cnn = tf.keras.models.Sequential()
2. Second Step: Build the convolutional network
- Build the initial convolutional layer of the CNN with an input shape that matches the target image size. Note that the number of filters and the kernel size can vary with the problem.
cnn.add(tf.keras.layers.Conv2D(filters=5, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))
- Add Maximum pooling layer, where pool size and strides can vary accordingly.
- Add Convolutional + Maximum pooling layer according to required network architecture.
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
- Add Flattening layer
- Add the artificial neural network, where the number of layers and neurons can vary with the problem.
- Add the final output layer, where the number of neurons depends on the number of categories.
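Several of the bullets above are listed without code. The following is one way the whole second step could be assembled, offered as a sketch rather than the exact model used here. The assumptions are: an explicit Input layer instead of the input_shape argument, a single hidden Dense layer with 128 neurons (a made-up size), and one sigmoid output neuron, which fits the 'binary' class mode used earlier.

```python
import tensorflow as tf

cnn = tf.keras.models.Sequential()

# Input: 64x64 RGB images, matching target_size=(64, 64) in the generators.
cnn.add(tf.keras.layers.Input(shape=(64, 64, 3)))

# First convolutional layer followed by maximum pooling.
cnn.add(tf.keras.layers.Conv2D(filters=5, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Second convolutional + maximum pooling block.
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Flattening layer: turn the pooled feature maps into a single vector.
cnn.add(tf.keras.layers.Flatten())

# Artificial neural network: 128 neurons is an assumed, illustrative size.
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

# Output layer: one sigmoid neuron for a two-class (binary) problem.
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

cnn.summary()
```

For a problem with more than two categories, the output layer would instead typically use one neuron per category with a softmax activation.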
3. Third Step: Compiling CNN
- Compiling the CNN model requires several parameters: the optimizer, the loss function, and the metric used to measure the model’s performance.
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
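To give a feel for what the loss and the metric measure, here is a small NumPy sketch of binary cross-entropy and accuracy. It is an illustration, not the Keras implementation; the labels, the predicted probabilities, the clipping constant eps, and the 0.5 threshold are all assumptions.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Average penalty for predicted probabilities; lower is better."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = (y_prob > threshold).astype(int)
    return float(np.mean(predictions == y_true))

# Hypothetical labels (1 = dog, 0 = cat) and model outputs for four images.
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.1])

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(round(loss, 4))  # 0.3375
print(acc)             # 0.75 -> the third image is misclassified
```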
4. Fourth Step: Training CNN on the training set and evaluation on the testing dataset.
cnn.fit(x = training_set, validation_data = test_set, epochs = 5)
The output of the last 5 epochs, obtained by applying the previous steps with the same model structure to a cat-vs-dog dataset, reports two values per epoch. The dataset is a large collection of photos of cats and dogs of different types and in different poses, and the model is trained to tell whether a photo shows a cat or a dog. Two main values indicate the model’s accuracy. Training accuracy is the accuracy per epoch as measured on the training dataset, while testing accuracy is the accuracy per epoch as measured on the testing dataset. In this run, both training and testing accuracy increase over the epochs. This suggests that the current model structure fits the dataset well, without obvious overfitting or underfitting.
ML/AI applications have become an important part of our lives, taking part in most daily tasks without our noticing. The main edge of image recognition is that it lets machines interact visually and actively with their environment. As a result, image recognition has given us the chance to innovate in multiple domains. These sectors are still open, and the options for innovation are far from exhausted. There is therefore always a chance to improve people’s lives, and to start your own business, by carefully examining the market and looking for improvements that AI in general, and image recognition in particular, can deliver.