Video Analytics and AI: A partnership born for the 21st Century
In the 21st-century world, video cameras and video analytics software have become commonplace across public spaces, including retail centers, shopping malls, airports, and roads.
In the last 40 years, computer vision and associated research areas have seen active participation from the public, government, and research institutions alike. One reason for this is certainly the creation of massive amounts of public-space video feeds and data. As a result, video analytics, along with its myriad applications, has come to address different requirements across sectors, deploying Artificial Intelligence (AI) and Machine Learning (ML) at its core.
AI-powered video analytics, on the one hand, enables an accurate evaluation of each individual event. On the other, it allows the collection of events, measured statistically and holistically over a long period.
Until now, most of the understanding of video analytics has largely centred on detecting alarm events and abnormal behaviours, be it for public safety, crowd formation, facial recognition, threat detection, intrusion control, or security applications. But precious little is known about the sheer granularity and sophistication that video analytics software and services have acquired with the advent of AI and ML models.
Video Analytics – The Core Themes & Applications
Video analytics today pivots predominantly on four themes: computer vision, demographics, behaviour analysis, and systems. For example, in security and associated applications it is crucial to detect abnormal or suspicious demographic behaviour, almost always in real time.
If one looks at crowd formation and its management in city spaces, video intelligence is required to focus on monitoring normal events (e.g., people entering a mall or an office), and it is often deemed sufficient to capture and store the observation for offline analysis.
Consider this, for example: in urban and semi-urban crowded environments, video analytics is used to collect statistical information about crowd formation and its behaviour (e.g., how many people entered and lingered around a particular spot, which genders or age groups are leading to crowd formation, and which paths the crowd followed).
The modern airport is perhaps the best metaphor for the 21st century: an array of complex, multi-modal, multi-dimensional moving pieces. At an average airport, for example, there is always a need for real-time estimation of congestion and queue length, especially in the departure, security, check-in, and immigration control areas.
AI-powered video analytics was born to address the challenges of such a world
With so many video cameras installed almost ubiquitously across contemporary life, vast amounts of video data are generated. Efficiently searching and retrieving this constant stream of video data is one of the core challenges for video analytics applications.
Videonetics is one of the few organizations to have developed AI-based algorithms readily applicable across security, business, and civil environments. Our expertise, rooted in a deep understanding of Artificial Intelligence and Machine Learning, has enabled us to cater to a cross-spectrum of clients whose requirements range from a simple computer vision application to a full-fledged, sophisticated suite of video analysis applications.
Our use cases amply demonstrate the innovative adaptation of existing video analytics techniques and/or the development of novel approaches rooted in ML and AI. Videonetics’ Unified Video Computing Platform is well suited for dynamic business environments, enabling our clients to detect, track, and recognize objects of interest from multiple videos and, more specifically, to interpret their behaviours and activities. The UVCP accomplishes this by integrating various computer vision and pattern recognition techniques to build a unified, intelligent computational framework for video analytics.
Deep learning, a form of machine learning, and the Convolutional Neural Network (CNN) are two essential technologies that help the AI framework train computer vision algorithms, enabling the engine to look and learn.
Just like a human, a CNN uses computer vision to identify differently shaped objects or subjects by breaking them into pixels and tagging them. It uses these tags to perform mathematical operations, or convolutions, to predict what it is looking at. It is this prediction that we know as AI-powered video analytics.
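As an illustration of the convolution operation described above, here is a minimal pure-Python sketch (not Videonetics’ implementation) that slides a small edge-detecting kernel over a grayscale image; the kernel and image values are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over a grayscale image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Element-wise multiply the kernel with the image patch and sum.
            acc = sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(acc)
        output.append(row)
    return output

# A vertical-edge kernel applied to a tiny image with a bright right half;
# the response peaks exactly where brightness jumps from left to right.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
edge_kernel = [[-1, 1]]
print(convolve2d(image, edge_kernel))
```

A real CNN learns thousands of such kernels from data instead of hand-crafting them, but the sliding multiply-and-sum step is the same.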
The more data the engine trains on, the higher its accuracy. The Videonetics AI engine is one of the most powerful available in the market, with its own purpose-designed CNN to ensure the highest accuracy and performance in challenging environments.
Detection and tracking of objects is a key building block of any video analytics system. The object can be a face, a head, a human, a queue of people, a crowd, or just another inanimate thing! At Videonetics, object detection and tracking are a strong differentiator. Our research team is well versed in the latest methods in object detection, object modeling, and object tracking, so as to generate the right insights for our clients.
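As a sketch of the tracking step, one textbook approach (offered purely as an illustration, not as the Videonetics method) associates detections across frames by greedy intersection-over-union (IoU) matching:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_tracks(tracks, detections, threshold=0.3):
    """Greedily match existing track boxes to new detections by best IoU."""
    matches, unmatched = {}, set(range(len(detections)))
    for tid, tbox in tracks.items():
        best, best_iou = None, threshold
        for d in unmatched:
            score = iou(tbox, detections[d])
            if score > best_iou:
                best, best_iou = d, score
        if best is not None:
            matches[tid] = best
            unmatched.remove(best)
    return matches, unmatched  # unmatched detections may start new tracks

# Two existing tracks and two new-frame detections (hypothetical boxes).
tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(52, 51, 62, 61), (1, 0, 11, 10)]
print(match_tracks(tracks, detections))
```

Production trackers add motion models and appearance features on top of this association step, but IoU matching conveys the core idea.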
Camera calibration is another important building block; however, convenient calibration remains a challenge in its own right. A calibration framework for large networks can include non-overlapping cameras, relying on information coming from multiple dimensions. It is here that developing and deploying the right AI can yield breakthrough solutions for clients.
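To illustrate one calibration building block, the sketch below maps image pixels to ground-plane coordinates with a 3x3 homography; the matrix values here are hypothetical, and in practice would come from a calibration procedure rather than being hand-written.

```python
def apply_homography(H, x, y):
    """Map a pixel (x, y) to ground-plane coordinates via a 3x3 homography."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # perspective divide

# Hypothetical homography: scales pixels to metres with mild perspective,
# so objects lower in the frame (larger y) map to slightly nearer points.
H = [
    [0.01, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, 0.0005, 1.0],
]
print(apply_homography(H, 200, 100))
```

Once pixels map to real-world coordinates, downstream analytics such as distance, speed, and queue-length measurements become physically meaningful.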
Leveraging intelligent video surveillance for crowd formation, crowd dispersion detection, and crowd anomaly detection
Collecting demographic information about rapid crowd formation and/or a dispersing crowd is an important part of video intelligence: for instance, how many people visited a shopping mall, what their gender distribution was, and how long they stayed. Instead of hiring humans to observe customers manually, a computational system can automatically collect demographic information by detecting the presence and analyzing the behaviour of people in the videos captured by cameras.
There are distinct computational approaches to classifying human age and gender. A working system can be developed with heatmaps and navigational reporting to help identify crowd statistics. From a strategic decision-making point of view, this can help clients uncover issues in site navigation or other factors.
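As a simple sketch of how such footfall heatmaps can be computed, the hypothetical example below bins detected person positions into grid cells and reports the busiest cell:

```python
from collections import Counter

def occupancy_heatmap(positions, cell_size=100):
    """Bin (x, y) person positions into grid cells to build a footfall heatmap."""
    heat = Counter()
    for x, y in positions:
        heat[(x // cell_size, y // cell_size)] += 1
    return heat

# Hypothetical person detections accumulated over many frames.
positions = [(120, 40), (150, 80), (130, 60), (420, 300)]
heat = occupancy_heatmap(positions)
print(heat.most_common(1))  # the hottest cell and its visit count
```

Rendered as a colour overlay on the floor plan, such counts become the heatmaps used for navigational reporting.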
Crowd-formation, Queue Length Detection, Queue Limit Exceed Detection and Bottleneck Prevention
In public places, crowd formation indicates congestion, delay, instability, or abnormal events such as a fight, riot, or emergency. Video intelligence is the best tool for gathering crowd information, e.g., the distribution of people throughout spaces, throughput rates, and local densities. Video analytics deploys various crowd counting approaches that use local features to estimate the crowd size and its distribution across a particular spot. The right video algorithm can calculate local occupancy statistics.
Video analytics can also screen hundreds of hours of video for activity patterns to automatically recognize various functional elements, such as walkways, roadways, parking-spots, and doorways, through their interactions with pedestrian and vehicle detections.
Monitoring queue statistics, such as queue length, queue-limit breaches, average wait time in a queue, social distancing in a queue, and average service time, can help clients optimize their crowd formation and space management challenges.
The integration of AI with video analytics helps in the systematic design and validation of a general solution for automated queue statistics estimation and detection. The design can take into account multiple variables such as queue geometry, illumination dynamics, camera viewpoints, people appearances, etc.
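One classical way to relate the queue statistics above, shown here on hypothetical numbers, is Little's law (L = λW), which links average queue length, arrival rate, and average wait time:

```python
def queue_stats(lengths_per_frame, limit):
    """Summarize per-frame queue-length counts taken from a video feed."""
    avg_len = sum(lengths_per_frame) / len(lengths_per_frame)
    breaches = sum(1 for n in lengths_per_frame if n > limit)
    return avg_len, breaches

def estimate_wait_time(avg_queue_length, arrival_rate_per_min):
    """Little's law: average wait W = L / lambda (minutes)."""
    return avg_queue_length / arrival_rate_per_min

# Hypothetical queue-length counts sampled once per frame at a check-in desk.
lengths = [8, 10, 12, 9, 11, 14]
avg_len, breaches = queue_stats(lengths, limit=12)
wait = estimate_wait_time(avg_len, arrival_rate_per_min=2.0)
print(round(avg_len, 2), breaches, round(wait, 2))
```

The per-frame counts would come from the people-detection stage; the arrival rate is itself measurable by counting people crossing a virtual line at the queue entrance.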
And especially in a post-COVID world, one security parameter that has become foundational to any crowd management effort is social distancing.
AI-based image and video analysis addresses the different aspects of monitoring social distancing through intelligent image processing, segmentation, object recognition and the semantic interpretation of the image/video.
Deep learning models do this automatically, monitoring social distancing at successive levels of a crowd or group structure. AI and deep learning have enabled us to create computerized vision systems capable of monitoring social distance at different levels.
These AI and deep learning models let us assess a crowd from three perspectives: knowledge, algorithmic, and implementation.
In state-of-the-art video analytics software such as Videonetics’, these social distancing monitoring algorithms are adept at representing the relevant information and predicting where the risks could be.
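As a minimal sketch of the core distance check behind social distancing monitoring (an illustration, not the Videonetics algorithm), the example below flags pairs of people whose centroids fall closer than a threshold; the pixels-per-metre scale is a hypothetical calibration constant:

```python
import math

def distancing_violations(centroids, min_distance_m, pixels_per_metre):
    """Flag index pairs of people closer than the minimum allowed distance."""
    violations = []
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            (x1, y1), (x2, y2) = centroids[i], centroids[j]
            # Convert pixel distance to metres using the calibration scale.
            dist_m = math.hypot(x2 - x1, y2 - y1) / pixels_per_metre
            if dist_m < min_distance_m:
                violations.append((i, j))
    return violations

# Hypothetical person centroids on a ground-plane-rectified frame.
people = [(100, 100), (140, 100), (400, 400)]
print(distancing_violations(people, min_distance_m=2.0, pixels_per_metre=50))
```

In a deployed system the centroids come from the person detector and the scale from camera calibration, so the same check works across viewpoints.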
Real time alerts for proactive incident response
Group monitoring is an important vector for video analytics. By considering relational connections among people, group modeling can provide more meaningful semantic description of the visual events. AI-based video analytics can analyze groups of people and group activities.
Different aspects can be considered: 1) the life of a group, analyzing how the presence of a group can be detected in crowded situations (i.e., the birth and the death of a group); 2) how a moving group can be tracked (its evolution). In particular, the regions of the environment where human attention is most focused can be detected.
In public places, crowd formation and its size may be an indicator of congestion, delay, instability, or of abnormal events such as a fight, riot, or emergency. Crowd-related information can also provide important business intelligence, such as the distribution of people throughout spaces, throughput rates, unauthorized access to restricted areas, detection of a person of interest, and the formation of fluid crowd densities.
Facial Recognition Alerts
When managing large groups of people, it is always advantageous to know how long it takes for people to move between different spaces, how long they spend in each space, and where they are likely to go as they move from one point to another. Presently, these measures can only be determined manually or through the use of hardware tags (e.g., RFID).
Facial recognition techniques can be used to describe and uniquely identify an individual, drawing on traits such as gender, facial features, and facial geometry. Facial data can be acquired by surveillance cameras at range, without any user co-operation, providing robust identification from a reasonable distance.
An AI-based video analytics system is adept at using facial recognition to determine operational statistics relating to how people move through a space. This capability can be used to locate people who look distinct; these people are then detected at various locations within a spread-out camera network to gradually build operational statistics.
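As a sketch of the matching step behind such re-identification, the toy example below compares a query face embedding against a gallery by cosine similarity; the embeddings, names, and threshold are all hypothetical (real systems use high-dimensional embeddings produced by a trained network):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def identify(query, gallery, threshold=0.8):
    """Return the gallery identity whose embedding best matches the query,
    or None if no match clears the similarity threshold."""
    best_id, best_score = None, threshold
    for person_id, emb in gallery.items():
        score = cosine_similarity(query, emb)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id

# Tiny hypothetical embeddings; real systems use 128-512 dimensions.
gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.9]}
print(identify([0.88, 0.12, 0.05], gallery))
```

Tracking the same embedding as it reappears at different cameras is what lets the system accumulate journey-time and dwell-time statistics without hardware tags.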