Image Classification & Object Detection on IBM PowerAI Vision

Illustration-1: Object recognition result for “Nasi Rames” (rice with various mixed toppings) on IBM PowerAI Vision (the deep learning model was trained for 8,000 iterations, with the remaining hyperparameters left at their defaults). The training is based on the Detectron object recognition deep learning architecture, one of the base models supported in PowerAI Vision. The dataset provided to train the model is only 38 labelled images, each containing various objects; all images are then augmented (e.g. blur, rotation, sharpen, flip) to generate 1,400+ images, which become the base dataset for the modeling process.
Illustration-2: Training/modeling result for object detection on the “Nasi rames” dataset in IBM PowerAI Vision, running on IBM Watson Machine Learning Accelerator (PowerAI). The base model is Detectron, with 8,000 iterations (epochs) and the other hyperparameters as illustrated above.
Illustration-3: Login screen of IBM PowerAI Vision, using a provided temporary user ID.
Illustration-4: Welcome screen of PowerAI Vision v1.1.3 on IBM Cloud.
Illustration-5: A sample dataset, created by taking 38 images of ‘Nasi rames’ (typically steamed white rice with a mix of various toppings such as vegetables, chicken, tofu, egg, meat, crackers, and fried onion, served on a plate or in other variations such as on a banana leaf) from a Google image search and uploading them to PowerAI Vision (one image at a time, as a collection of images, or as images compressed in a zip file). The images are then labeled one by one by hand (using the labeling tool provided in PowerAI Vision) to define the areas of the objects that we want to label and later train on.
Illustration-6: Two sample images of ‘Nasi rames’ that have been tagged with multiple objects. Each object that we want to define needs to be labeled one by one per image (the polygon being drawn should follow the boundary of each object as closely as possible), for all the images that we want the dataset to be trained on.
Illustration-7: Available data augmentations in PowerAI Vision, used to enrich the original dataset by creating variations of it.
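The augmentation step (38 originals expanded to 1,400+ images) can be sketched in miniature. This is a toy illustration only, not PowerAI Vision's implementation: an "image" here is a small nested list, and only flips and 90-degree rotations are shown. The real tool also applies blur, sharpen, color shifts, and so on, which multiplies the image count much further.

```python
# Toy sketch of geometric data augmentation: how one source image
# becomes several training variants. Not PowerAI Vision's internals.

def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Return the original image plus a few simple geometric variants."""
    variants = [img, hflip(img)]
    rotated = img
    for _ in range(3):              # 90, 180, and 270 degrees
        rotated = rot90(rotated)
        variants.append(rotated)
        variants.append(hflip(rotated))
    return variants

image = [[1, 2],
         [3, 4]]
dataset = augment(image)
print(len(dataset))   # 8 variants per source image
```

With a handful of additional transforms (blur, sharpen, noise, crops) composed on top, a few dozen labelled originals can plausibly grow into the 1,400+ images mentioned above.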
Illustration-8: Augmented & labeled dataset.
Illustration-9: Base model selection for training our dataset.
Illustration-10: Hyperparameters that we can adjust for the training with a given neural network architecture (in this case: Detectron).
Illustration-11: Progress of the training process based on the Detectron neural network architecture. For this specific use case, with the labeled & augmented dataset and the chosen hyperparameters, training completed in less than one hour after starting (i.e. getting GPU allocation) at 8:28 AM (GMT+7) on May 8, 2019.
Illustration-12: The top-right corner of the screen shows the GPU utilization in IBM PowerAI Vision on IBM Cloud, whether used for training/modeling or for running deployed models. There are 12 GPUs available, and 1 GPU is being used for training.
Illustration-13: Some metrics shown for the trained model. The Loss vs. Iteration graph is one of the important measurements: the loss should get as close to zero as possible, though it cannot reach zero. The model may be optimized for speed or for accuracy.
Illustration-14: Confusion Matrix & PR Curve as part of generated metrics from IBM PowerAI Vision.
Illustration-15: List of deployed models that can be accessed through a REST API.
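A deployed model like those listed above can be called from any HTTP client by POSTing a test image to its inference endpoint. The sketch below builds such a request with Python's standard library; the host, model ID, and endpoint path are assumptions for illustration — the exact URL and any authentication details are shown in the PowerAI Vision UI for each deployed model.

```python
# Sketch of calling a deployed PowerAI Vision model over its REST API.
# BASE_URL and MODEL_ID are hypothetical placeholders; copy the real
# values from the "API endpoint" shown for your deployed model.
import json
import urllib.request

BASE_URL = "https://example-powerai-host/powerai-vision/api"  # hypothetical host
MODEL_ID = "nasi-rames-detectron"                             # hypothetical model ID

def build_inference_request(image_path: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying the test image."""
    boundary = "----powerai-boundary"
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{image_path}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + image_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url=f"{BASE_URL}/dlapis/{MODEL_ID}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_inference_request("nasi_rames_test.jpg")
    with urllib.request.urlopen(req) as resp:   # sends the request
        result = json.load(resp)                # JSON with detected objects
    print(result)
```

The response is a JSON document describing the detected objects and their confidence scores, which is what the web UI renders as the bounding boxes in the test images below.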
Illustration-16a: A test image with identified objects according to the trained model.
Illustration-16b: A test image with identified objects according to the trained model.
