Sometimes words don’t tell the whole story. Sometimes, a person’s body language and posture convey a lot more information than words ever can. As humans, we are generally very good at reading one another’s body language and posture. But what about software?
The ability to mathematically represent human posture with software has a wide range of practical uses. A few examples include:
Movement impairment detection in physiotherapy
Pedestrian detection in autonomous cars
Gesture recognition in the game industry
Sign language detection
The problem is that computers are generally not very good at detecting people in images and making sense of their posture. And the techniques that have been developed are often difficult for application developers to use as the basis for new applications that solve real business problems.
But that is starting to change. In this post we examine emerging techniques for analyzing and interpreting human body language and posture, and detail an implementation of these techniques using Spring Cloud Data Flow and TensorFlow. We hope these techniques prove useful to Java and Spring developers, making it easier to integrate pose estimation into existing and new applications.
Human Pose Estimation
Human pose estimation is a collection of computer vision techniques that aim to identify the location of anatomical objects (i.e., body parts) from images. Recent advancements in deep learning along with improved training datasets have significantly improved the accuracy and performance of these techniques.
Fig.1: Human Pose Estimation in Practice
A 2017 paper on multi-person pose estimation by Zhe Cao and others describes state-of-the-art estimation techniques, which, along with the OpenPose and TF-Pose-Estimation reference implementations, have become popular choices for pose estimation applications.
Fig.2: Application of greedy algorithms to refine the pose estimations
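To give a feel for what a greedy refinement step can look like, here is a minimal Java sketch. It is not the code used by OpenPose, TF-Pose-Estimation, or the processor described in this post. It assumes the model has already produced a confidence score for every possible connection between candidates of two body part types (say, elbows and wrists), and it greedily keeps the strongest connections while using each candidate at most once. The class name `GreedyLimbMatcher`, the `Connection` type, and the `minScore` threshold are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: greedy one-to-one matching of body part candidates. */
class GreedyLimbMatcher {

    /** A connection between candidate a of one part type and candidate b of another. */
    static final class Connection {
        final int a;
        final int b;
        final double score;

        Connection(int a, int b, double score) {
            this.a = a;
            this.b = b;
            this.score = score;
        }
    }

    /**
     * scores[i][j] is an assumed confidence that candidate i (e.g. an elbow)
     * and candidate j (e.g. a wrist) belong to the same limb.
     * Connections scoring below minScore are ignored.
     */
    static List<Connection> match(double[][] scores, double minScore) {
        List<Connection> candidates = new ArrayList<>();
        int columns = 0;
        for (int i = 0; i < scores.length; i++) {
            columns = Math.max(columns, scores[i].length);
            for (int j = 0; j < scores[i].length; j++) {
                if (scores[i][j] >= minScore) {
                    candidates.add(new Connection(i, j, scores[i][j]));
                }
            }
        }

        // Consider the strongest connections first.
        candidates.sort(Comparator.comparingDouble((Connection c) -> c.score).reversed());

        boolean[] usedA = new boolean[scores.length];
        boolean[] usedB = new boolean[columns];
        List<Connection> selected = new ArrayList<>();
        for (Connection c : candidates) {
            // Greedy rule: accept a connection only if both endpoints are still free.
            if (!usedA[c.a] && !usedB[c.b]) {
                usedA[c.a] = true;
                usedB[c.b] = true;
                selected.add(c);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up scores for two elbow and two wrist candidates.
        double[][] scores = {{0.9, 0.3}, {0.4, 0.8}};
        for (Connection c : match(scores, 0.2)) {
            System.out.println("A" + c.a + " -> B" + c.b + " (score " + c.score + ")");
        }
    }
}
```

A greedy pass like this is fast and simple, but it does not guarantee the best overall assignment: once a strong connection claims a candidate, weaker connections that might have produced a better total are skipped.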
Pose Estimation Processor and Spring Cloud Data Flow
The Pose Estimation Processor is a real-time, multi-person pose estimation processor for Spring Cloud Data Flow (SCDF). It integrates the TF-Pose-Estimation TensorFlow model for predicting part and limb confidence maps and implements the greedy post-processing steps for refining and assembling the body part candidates into full body poses for all people in an image.
For example, the following SCDF streaming pipeline continuously detects human poses in images dropped in the input-folder directory, augments them with the detected pose skeletons, and stores the result in the output-folder directory.
file-source: file --directory='/input-folder' | pose-estimation --mode=header | file-sink: file --directory='/output-folder' --name-expression='headers[file_name]'
As illustrated in Fig.3, the stream consists of three Spring Cloud Stream applications: `File-Source`, `Pose-Estimation-Processor`, and `File-Sink`.
Fig.3: Image pose estimation SCDF pipeline
The file-source is configured to monitor an input-folder directory. When an image is dropped in the input folder, the file-source app emits a new message with the image embedded in the payload.
The pose-estimation-processor, configured with a pre-trained TensorFlow model, infers human poses from the inbound message and produces an outbound message with the computed scores in the header and a copy of the image augmented with the pose skeletons as payload.
The augmented images are consumed by the file-sink and stored in a preconfigured output directory using the original image file names.
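The augmentation step, in which a copy of the image is overlaid with the detected skeletons, can be pictured with a short sketch. The code below is not taken from the pose-estimation-processor. It is a minimal illustration using only the standard `java.awt` classes, and it assumes that poses arrive as plain pixel coordinates. The class name `SkeletonOverlay` and the `keypoints` and `limbs` array layouts are hypothetical.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical sketch: draw a pose skeleton onto a copy of an image. */
class SkeletonOverlay {

    /**
     * keypoints[k] = {x, y} in pixels; limbs[l] = {fromIndex, toIndex},
     * where both indices point into keypoints. The source image is not modified.
     */
    static BufferedImage draw(BufferedImage source, int[][] keypoints, int[][] limbs) {
        BufferedImage copy = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(source, 0, 0, null);

            // Limbs are drawn as green lines between connected keypoints.
            g.setColor(Color.GREEN);
            g.setStroke(new BasicStroke(3f));
            for (int[] limb : limbs) {
                int[] from = keypoints[limb[0]];
                int[] to = keypoints[limb[1]];
                g.drawLine(from[0], from[1], to[0], to[1]);
            }

            // Keypoints are drawn on top as small red dots.
            g.setColor(Color.RED);
            for (int[] p : keypoints) {
                g.fillOval(p[0] - 3, p[1] - 3, 7, 7);
            }
        } finally {
            g.dispose();
        }
        return copy;
    }

    public static void main(String[] args) {
        // A blank image stands in for a real photo; the coordinates are made up.
        BufferedImage blank = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        int[][] keypoints = {{20, 20}, {80, 20}, {80, 70}};
        int[][] limbs = {{0, 1}, {1, 2}};
        BufferedImage annotated = draw(blank, keypoints, limbs);
        System.out.println(annotated.getWidth() + "x" + annotated.getHeight());
    }
}
```

Drawing onto a copy rather than the original mirrors the behavior described above, where the outbound payload is an augmented copy of the inbound image.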
The three applications in the pipeline can optionally communicate over Apache Kafka, RabbitMQ or other binder implementations.
Fig.4: Real-time pose estimation in action
For a real-time pose-estimation solution, you can consume input images via the out-of-the-box http-source. For video processing, there are a few utility applications, including an input-webcam or video-stream app, which can process payloads as a sequence of images. For example, the pipeline below predicts the human poses in a webcam video-stream and emits the augmented images in near real-time.
stream create --name webcam-pose-stream --definition "webcam --width=320 --height=240 --capture-interval=800 | pose-estimation --mode=header | image-viewer"
The webcam source and the Image-Viewer sink are experimental application starters. Follow these instructions on how to use them with SCDF.
Making Human Pose Estimation Accessible to Developers
Incorporating data science capabilities, such as human pose estimation, in business applications can seem a daunting task for most developers. But with the right tools, such as Spring Cloud Data Flow and TensorFlow, which abstract away some of the complexity of the data science itself, it doesn’t have to be.
As data science and deep learning methodologies mature - which they are doing at a rapid pace - Pivotal is committed to exploring new avenues to make them easily accessible for Spring/Java developers. Your feedback is critical to that progress. Let us know what you think of the approach to human pose estimation described in this post and give it a try yourself. You can reach out to us with feedback and comments on StackOverflow or with questions and pull-request contributions on Gitter.
And don't miss the session Machines Can Learn - a Practical Take on Machine Intelligence Using Spring Cloud Data Flow and TensorFlow at SpringOne Platform 2018 to learn more about this topic. Register for the event today and get $200 off with discount code S1P200_JKelly.