Intelligent Connected Platform

Intelligence at EDGE: Convolution Neural Network based Image Classification using CMSIS-NN kernels on Low-power IoT Edge Device


Driven by the Internet of Things (IoT), a new computing model, edge-cloud computing, is currently evolving. It extends data processing to the edge of a network in addition to computing in a cloud or a central data centre. Edge intelligence (EI) applies machine learning (ML) algorithms across edge computing devices, cloud environments and advanced networking capabilities. As a result, several information technology (IT) and operational technology (OT) industries are moving closer to the edge of the network so that aspects such as real-time networks, cyber security, self-learning solutions and personalized/customized connectivity can be addressed. People want edge computing capability on embedded devices to provide more advanced services, such as voice recognition for smart speakers and face detection and recognition for surveillance cameras. Deep Convolutional Neural Networks (CNNs) are one of the primary methods for image recognition and image classification. CNNs use a variation of multilayer perceptrons that requires minimal pre-processing, based on their shared-weights architecture and translation-invariance characteristics.

Artificial Intelligence @QueSSence

Figure 1: QueSSence platform performing image classification

QueSSence™ is a platform that combines the best of the artificial intelligence and IoT worlds, giving users a comprehensive platform to rapidly develop AI applications. It has optimized Deep Neural Network (DNN) libraries for edge AI architectures. Adding AI capabilities to these smart things (IoT devices) will significantly enhance their functionality and usefulness, especially when the full power of these networked devices is harnessed, a trend that is often called AI on the Edge.

Why run a deep learning model on a microcontroller?

The QueSSence™ platform is built around an ultra-low-power Arm Cortex-M4 processor that runs AI applications with ease. The chip also comes with a secure co-processor and Wi-Fi®, Zigbee® and Bluetooth® connectivity options. This lets QueSSence bring deep learning models into the applications around us. QueSSence takes a pre-trained neural network model from a broad range of the most popular AI frameworks and maps it to an optimized DNN that fits within the memory and processing power constraints of the device.

Overview of CMSIS-NN:

The Arm Cortex-M processor family is a range of scalable, energy-efficient and easy-to-use processors that meet the needs of smart and connected embedded applications. One of the real benefits of Cortex-M is the software ecosystem. Cortex Microcontroller Software Interface Standard (CMSIS) is a vendor-independent hardware abstraction layer for the Cortex-M processor series and defines generic tool interfaces. CMSIS-DSP (Digital Signal Processing) is an important component that provides a DSP library collection with more than 60 functions for various data types: fixed-point (fractional q7, q15, q31) and single precision floating-point (32-bit). The library is optimized for the SIMD instruction set, and programmers can focus on high-level algorithms and rely on the library for audio/image/communication, or any DSP-related low-level firmware implementation.

CMSIS-NN is a collection of optimized neural network library functions for Arm Cortex-M core microcontrollers. It enables neural networks and machine learning algorithms to be pushed into the end nodes of IoT applications.
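To give a feel for what such a kernel computes, here is a minimal, unoptimized sketch of a fully connected layer on 8-bit fixed-point data. The function name fc_q7_reference, its parameter list and the shift-based rescaling are illustrative assumptions, not the CMSIS-NN API; the library's own kernels are optimized and should be used on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a 32-bit accumulator to the signed 8-bit (q7) range. */
int8_t sat_q7(int32_t v)
{
    if (v > 127) return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}

/*
 * Hypothetical reference kernel (not a CMSIS-NN function):
 * out[r] = sat((bias[r] * 2^bias_shift + sum_c in[c] * w[r][c]) >> out_shift)
 * Weights are stored row-major, one row per output neuron.
 */
void fc_q7_reference(const int8_t *in, const int8_t *weights,
                     const int8_t *bias, size_t in_dim, size_t out_dim,
                     unsigned bias_shift, unsigned out_shift, int8_t *out)
{
    for (size_t r = 0; r < out_dim; ++r) {
        /* Align the bias with the fixed-point format of the products. */
        int32_t acc = (int32_t)bias[r] * ((int32_t)1 << bias_shift);

        /* Add half an output LSB so the final shift rounds to nearest. */
        if (out_shift > 0) {
            acc += (int32_t)1 << (out_shift - 1);
        }

        for (size_t c = 0; c < in_dim; ++c) {
            acc += (int32_t)in[c] * (int32_t)weights[r * in_dim + c];
        }

        /* Assumes an arithmetic right shift for negative values,
           which is what mainstream compilers provide. */
        out[r] = sat_q7(acc >> out_shift);
    }
}
```

The accumulator is kept in 32 bits so that intermediate sums cannot overflow, and the result is only narrowed back to 8 bits at the very end.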

The embedded world is putting more and more intelligence into end devices, such as smart speakers and surveillance cameras. "Always-on" hardware can provide solutions on the edge without involving cloud services, avoiding concerns about the availability of an Internet connection and about personal privacy. Building on the Cortex-M Digital Signal Processing (DSP) capabilities, ML workloads see a reported 5x performance boost on the Cortex-M platform with the new CMSIS-NN software framework. If you'd like to know more about it, you can go through the paper on CMSIS-NN for Arm Cortex-M CPUs.

For a model trained with a popular framework such as TensorFlow, Keras or Caffe, the weights and biases are first quantized to 8-bit or 16-bit integers before being deployed to the microcontroller for inference.
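The quantization step can be sketched as follows. This is a minimal example assuming a power-of-two fixed-point scale (a value x is stored as round(x * 2^frac_bits) in 8 bits) with at most 7 fractional bits; the helper names q7_frac_bits and quantize_q7 are hypothetical and not part of any framework or of CMSIS-NN.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the largest number of fractional bits
 * (at most 7) for which the largest-magnitude weight still fits
 * into a signed 8-bit value.
 */
int q7_frac_bits(const float *w, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = (w[i] < 0.0f) ? -w[i] : w[i];
        if (a > max_abs) max_abs = a;
    }

    int frac_bits = 7;
    while (frac_bits > 0 && max_abs * (float)(1 << frac_bits) > 127.0f) {
        --frac_bits;
    }
    return frac_bits;
}

/* Quantize one float to q7: scale, saturate, round to nearest. */
int8_t quantize_q7(float x, int frac_bits)
{
    float scaled = x * (float)(1 << frac_bits);

    if (scaled >= 127.0f) return 127;
    if (scaled <= -128.0f) return -128;

    /* Round half away from zero; the cast truncates toward zero. */
    return (int8_t)((scaled >= 0.0f) ? scaled + 0.5f : scaled - 0.5f);
}
```

With 7 fractional bits, for example, 0.5 is stored as 64 and -0.25 as -32. The same idea applies to 16-bit quantization with a wider integer type and more fractional bits.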

Neural network inference based on CMSIS-NN kernels is reported to achieve a 4.6x improvement in runtime/throughput and a 4.9x improvement in energy efficiency compared to a baseline implementation. The best performance is achieved by using the SIMD instructions of the CPU to exploit the parallelism available on Cortex-M4 and Cortex-M7 core microcontrollers, although a reference implementation without DSP instructions is also available for Cortex-M0 and Cortex-M3.
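To illustrate where the SIMD gain comes from, the sketch below emulates in portable C what a dual 16-bit multiply-accumulate instruction (SMLAD on Cortex-M4/M7) does: two 16-bit products are added to an accumulator in one operation, so a dot product needs roughly half as many multiply-accumulate steps. The function names are hypothetical; on real hardware the CMSIS intrinsic would be used instead of this emulation.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (low, high). */
uint32_t pack_q15x2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/*
 * Portable emulation of a dual 16-bit multiply-accumulate:
 * acc + x.lo * y.lo + x.hi * y.hi, all operands signed.
 */
int32_t smlad_emulated(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);

    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}

/* Dot product of two 16-bit vectors, consuming two elements per step. */
int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc = smlad_emulated(pack_q15x2(a[i], a[i + 1]),
                             pack_q15x2(b[i], b[i + 1]), acc);
    }
    if (i < n) {
        /* Handle the leftover element when n is odd. */
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

On cores without the DSP extension, such as Cortex-M0 and Cortex-M3, the same result has to be computed one product at a time.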

How to Run CMSIS-NN model on the QueSSence:

In this section, we will run the CIFAR10 image classification model on a QueSSence board.

The prerequisites for this are:

The example project (reference design) is available at the following location:

  • CMSIS\NN\Examples\ARM\arm_nn_examples\cifar10

For a description of the board, please refer to the link below.

1. Add a new target

Open arm_nnexamples_cifar10.uvprojx in Keil µVision (MDK-ARM). The example project is created to run on a microcontroller; here the preferred target is the Cortex-M4 based QueSSence board. To run the CIFAR10 image classification project on the QueSSence board, follow the instructions below:

In the project arm_nnexamples_cifar10, right-click the current target, then click "Manage Project Items" in the menu.

Create a new target and give it a name such as "ARMCM4_SP" to help you remember its purpose. Highlight your new target and click "Set as Current Target", then "OK".

2. Configure target options

Open the target options and go to the "Device" tab to choose the target micro-controller.

Redpine modules will be visible once the device is connected.

If you cannot find the device, you need to install the device package. Download the package from the provided link to run on the QueSSence board (redpinesignals/rs14100_1mb).

Go to the "Target" tab and change the external crystal frequency to 12 MHz, and set the on-chip memory areas to match those on the board.

Turn off the compiler optimization to allow an improved debugging experience. A higher level of compiler optimization improves the code by making the software consume fewer resources, but it also cuts down debug information and alters the structure of the code, which makes the code harder to debug.

In the "Debug" tab, select "CMSIS-DAP Debugger", as it is available on the ARMCM4_FP board, then click the "Settings" button to configure it.

If the board is plugged in, the CMSIS-DAP debug adapter appears in the new window. Check that the port "0x2BA01477 ARM CoreSight SW-DP" is selected.

In the "Trace" tab, enter the correct CPU Core Clock speed as specified in your project.

In the "Flash Download" tab, add the "RS14100_1MB_FLASH" programming option to download the binary to the flash memory of the device.

Now confirm the changes and close the "Options for Target" window.

3. Configure to run

There is one more step to configure the micro-controller.

Select the "system_RS1xxxx.c (Startup)" option under the Device tree to configure it.

4. Build and debug

With the board connected to the PC, we are now ready to build and debug the application.

Select the build option.

Select the load option.

Select the debug option.

5. Results

Add output_data to a Watch window to see the results of the CIFAR10 image classification.

Given a 32x32 pixel color input image, the model classifies the object into one of 10 output classes. Each number in output_data denotes the probability for one of the 10 image classes, and the class with the maximum probability is the model's prediction.
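Reading off the prediction amounts to finding the index of the largest value among the 10 outputs. The sketch below assumes that the outputs are signed 8-bit scores and that the class order follows the standard CIFAR-10 label order; both are assumptions to verify against your project, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CIFAR-10 label order (assumed to match the model output). */
const char *const CIFAR10_LABELS[10] = {
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
};

/* Return the index of the largest score; ties keep the first index. */
size_t argmax_q7(const int8_t *scores, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}

/* Map the 10 output scores to the name of the predicted class. */
const char *cifar10_label(const int8_t *scores)
{
    return CIFAR10_LABELS[argmax_q7(scores, 10)];
}
```

For example, if the fourth entry of the output array holds the largest value, the predicted class under this label order is "cat".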