
A Gesture Recognition Machine Learning Project

 

We use a combination of image analysis techniques and a nearest neighbor classifier to design a sign language interpreter. On this website we describe how we collected our data, how we processed it, and what useful features we were able to extract from the processed data. We then describe the different classification methods we tested and how we chose our final classifier. We have also provided a video showing our classifier in action.


    Video    


    Motivation   

The primary goal of our gesture recognition project was to create a system which can identify American Sign Language (ASL) gestures and translate them to the written English alphabet.

 

An estimated 360 million people worldwide suffer from hearing loss. Because the majority of hearing individuals do not understand sign language, communication between the hearing and the deaf can be challenging. A program such as the one we have developed can help break down this barrier.

 

The purpose of this project is to identify machine learning algorithms and classification methods that can accurately recognize signed hand gestures in real time. To achieve this, the input to our algorithm is a single static image of a sign and the output is the identity of the sign.


    Solution    

We were able to successfully collect data of ASL gestures and represent the images by their most important features. We found that the most effective feature set was an array of bit values produced by our image processing pipeline. In this way, we collected a training data set with over ten thousand examples.

 

We were also able to test different supervised machine learning classification methods on our dataset and compare their accuracies, using the Weka 3.6 package. By doing so, we were able to choose the most accurate classifier for our prediction step. We found that for our training data, the 3-Nearest Neighbor classifier was the best choice, with a classification accuracy of 96.78% on test data and a processing time of about 10 seconds.

 

Finally, we successfully combined our data preprocessing code with our chosen high-accuracy classifier in a program that can correctly classify signed gestures from individuals not represented in the training data and return the corresponding letter in real time.


Data Generation

We gathered our own data by taking pictures of individuals of different genders, hand sizes, and skin colors.

Below we show eight instances of photographed ASL signs for the letter “G” that exist in our training data set.  


Feature Extraction

The pre-processing stage prepares the input images and extracts useful features used later with our classification algorithms. 

Our data preprocessing algorithm comprises the following image processing steps: 

 

  1. Initial image is converted to a binary black and white image.

  2. Wrist is detected and cut.

  3. Hand region is segmented.

  4. Hand segment is rotated and oriented properly.

  5. Image is cropped to the frame. If the image is too small or too large, the program resizes it to a predefined window of 38 x 23 pixels.

  6. Image is converted back to RGB format after rotation and local adjustment.

  7. Hand contour is detected.

  8. Hand edges are detected.

  9. Binary representation of the sign is developed.

  10. Hand sign is represented as an array containing all of the binary image's pixel values (1 or 0). 

 

The feature set is the final output of the image processing algorithm described in the above ten steps. 

The feature set is an array consisting of the pixel values in a predefined image frame of size 38 x 23. This gives us an 874-dimensional discrete binary feature space. 
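As a rough illustration of this final representation step (not our exact implementation), the sketch below flattens an image that has already passed through the earlier preprocessing steps into the 874-dimensional bit vector. The class name, the threshold value, and the assignment of 38 rows by 23 columns are assumptions made for this example.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Illustrative sketch: flatten a preprocessed 38 x 23 binary hand image
// into the 874-dimensional bit vector used as the feature set.
public class FeatureExtractor {

    static final int HEIGHT = 38;  // rows in the predefined frame (assumed orientation)
    static final int WIDTH = 23;   // columns in the predefined frame (assumed orientation)

    // Convert the segmented, rotated, and cropped image into a flat
    // array of 0/1 pixel values, scanned row by row.
    public static int[] toBitVector(BufferedImage img) {
        int[] bits = new int[HEIGHT * WIDTH];              // 38 * 23 = 874 features
        for (int y = 0; y < HEIGHT; y++) {
            for (int x = 0; x < WIDTH; x++) {
                int gray = img.getRGB(x, y) & 0xFF;        // any channel of a black-and-white image
                bits[y * WIDTH + x] = gray > 127 ? 1 : 0;  // simple threshold (assumed value)
            }
        }
        return bits;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File(args[0])); // a preprocessed 38 x 23 image
        int[] features = toBitVector(img);
        System.out.println("Feature vector length: " + features.length); // expect 874
    }
}
```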

 


Classifier Testing

We trained learning models with five different algorithms using the Weka 3.6 package. 

Training times, test times, and accuracies were measured and compared for the different methods. All training experiments were performed with 10-fold cross-validation. The five algorithms tested were as follows (a minimal sketch of how such a comparison can be run with the Weka API appears after the list): 

 

  1. Decision Tree

  2. Random Forest

  3. 1 Nearest Neighbor

  4. 3 Nearest Neighbor

  5. Naive Bayes
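The sketch below shows how such a five-way comparison could be run programmatically with the Weka 3.6 Java API. It is a minimal illustration, not our actual experiment script; the ARFF file name and the random seed are placeholders.

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: 10-fold cross-validation of the five classifiers on one dataset.
public class ClassifierComparison {
    public static void main(String[] args) throws Exception {
        // Placeholder file name: 874 binary pixel attributes plus the class (letter) attribute.
        Instances data = DataSource.read("asl_features.arff");
        data.setClassIndex(data.numAttributes() - 1);      // last attribute is the letter

        Classifier[] models = {
            new J48(),            // decision tree
            new RandomForest(),   // random forest
            new IBk(1),           // 1 nearest neighbor
            new IBk(3),           // 3 nearest neighbor
            new NaiveBayes()      // naive Bayes
        };

        for (Classifier model : models) {
            long start = System.currentTimeMillis();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));  // 10-fold CV
            double seconds = (System.currentTimeMillis() - start) / 1000.0;
            System.out.printf("%-15s accuracy = %6.2f%%  time = %.1f s%n",
                    model.getClass().getSimpleName(), eval.pctCorrect(), seconds);
        }
    }
}
```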

 

For the experimental evaluation, we generated three different datasets with varying numbers of classes and instances: 

 

  1. Dataset 1

    • 10 ASL signs corresponding to the following letters in the alphabet: {B, E, F, H, J, L, O, V, W, Y}

    • Letters were chosen such that the signs were most distinct from each other. 

    • 392 instances per letter

    • Total instances within the data set: 3920

  2. Dataset 2

    • 26 ASL signs corresponding to all 26 letters of the English alphabet

    • 168 instances per letter

    • Total instances within the data set: 4368

  3. Dataset 3

    • 26 ASL signs corresponding to all 26 letters of the English alphabet

    • 392 instances per letter

    • Total instances within the data set: 10192

 

Our goal was to create a learner that can recognize static signs with high accuracy. We aimed to optimize both the accuracy and the speed with which the learner can recognize ASL sign gestures. Thus, our measures of success throughout our investigation were the percent accuracy of detection and the time taken to identify signs.


  Results  

Comparing the five classification algorithms we tested, we see that Nearest Neighbor with k=3 is the most accurate. The training time for this algorithm was the fastest, although it had the longest, but still reasonable, test time of about 10 seconds. 

 

Comparing the three datasets, we see that the first dataset is the fastest and has the greatest accuracy. This is because there are only 10 classes in this dataset, and we chose those ten letters to be very distinguishable from each other. The second dataset gives the lowest accuracy because, while datasets 1 and 3 both have 392 instances per letter, it has only 168 instances per letter. Dataset 3 is the best dataset, because it gives good accuracy while being able to classify all the letters in the alphabet. 

 

From our analysis we concluded that we have a suitable feature set, since we can learn reasonably accurate models with greater than 97% accuracy on the test data.  Therefore, we used our learned 3-Nearest Neighbor model to predict hand gestures unknown to the computer, as seen in our video above. 
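The prediction step itself is straightforward once a new image has been run through the same preprocessing pipeline. The sketch below is a minimal illustration using the Weka API; the file names are placeholders, and the unseen sign is assumed to have been exported in the same 874-attribute ARFF format as the training data.

```java
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: train the chosen 3-Nearest Neighbor model and classify one new, preprocessed sign.
public class PredictSign {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("asl_features.arff");  // placeholder training file
        train.setClassIndex(train.numAttributes() - 1);

        IBk knn = new IBk(3);                                    // k = 3 nearest neighbors
        knn.buildClassifier(train);

        Instances unseen = DataSource.read("new_sign.arff");     // one 874-bit feature row
        unseen.setClassIndex(unseen.numAttributes() - 1);
        double label = knn.classifyInstance(unseen.instance(0));
        System.out.println("Predicted letter: " + train.classAttribute().value((int) label));
    }
}
```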

 

 

For a more detailed discussion on our results, please refer to our                             .


  

    About   


Team

Mahdieh Nejati
 

Master of Science in Robotics

McCormick School of Engineering 

Northwestern University


Email: m.nejati@u.northwestern.edu

Phone: (920) 716-7631
GitHub Profile

Personal Webpage
LinkedIn Profile

 

Course

This project and website were developed as part of the curriculum for Northwestern University's EECS 349: Machine Learning course, taught during the spring quarter of 2015 by Professor Doug Downey.
