DEV Community

Omkar Ajnadkar

Posted on Aug 17, 2018 • Originally published at Medium on Aug 16, 2018

Sign Language and Static-Gesture Recognition

#deeplearning #datascience #machinelearning #keras

Gesture recognition is an open problem in the area of machine vision, a field of computer science that enables systems to emulate human vision. Gesture recognition has many applications in improving human-computer interaction, and one of them is in the field of Sign Language Translation, wherein a video sequence of symbolic hand gestures is translated into natural language.

Dataset

The dataset format closely follows the classic MNIST. Each training and test case carries a label (0–25) that maps one-to-one to an alphabetic letter A–Z; there are no cases for 9=J or 25=Z, because those letters involve motion. The training data (27,455 cases) and test data (7,172 cases) are approximately half the size of standard MNIST but otherwise similar: a header row of label, pixel1, pixel2, …, pixel784, with each row representing a single 28x28 pixel image with grayscale values between 0 and 255.
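The label scheme described above can be sketched in a few lines of Python. This is only an illustration: the names `MISSING_LABELS`, `label_to_letter`, and `present_letters` are hypothetical helpers, not part of the dataset or the original code.

```python
import string

# Labels that have no examples in the dataset, as stated above
# (J and Z require motion, so they are not static gestures).
MISSING_LABELS = {9, 25}


def label_to_letter(label: int) -> str:
    """Map a numeric label (0-25) to its letter (A-Z)."""
    if not 0 <= label <= 25:
        raise ValueError(f"label must be in 0..25, got {label}")
    return string.ascii_uppercase[label]


# Letters that actually occur in the data: A-Y without J.
present_letters = [
    label_to_letter(k) for k in range(26) if k not in MISSING_LABELS
]

print(label_to_letter(0), label_to_letter(24), len(present_letters))
```

Note that the labels still run from 0 to 25, which is why the one-hot vectors later in the post have 26 entries even though only 24 letters occur.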

Data Preprocessing

Because the dataset already provides the images as CSV pixel values, we don't need much preprocessing. If the images were in a raw format, we would first have to convert them to arrays before doing anything else. Still, we perform the following steps:

Separate features (784 pixel columns) and output (result label)
Reshape the features
One-hot encode the result

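import numpy as np

# train and test are the pandas DataFrames loaded from the dataset CSVs
y_train = train['label']
y_test = test['label']
X_train = train.drop(['label'], axis=1)
X_test = test.drop(['label'], axis=1)

# Turn each 784-value row into a 28x28 image
X_train = np.array(X_train.iloc[:,:])
X_train = np.array([np.reshape(i, (28,28)) for i in X_train])
X_test = np.array(X_test.iloc[:,:])
X_test = np.array([np.reshape(i, (28,28)) for i in X_test])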

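num_classes = 26
# Flatten the labels to 1-D integer arrays
y_train = np.array(y_train).reshape(-1)
y_test = np.array(y_test).reshape(-1)
# One-hot encode: row k of the identity matrix is the vector for class k
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]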

# Add the single grayscale channel expected by the convolutional layers
X_train = X_train.reshape((27455, 28, 28, 1))
X_test = X_test.reshape((7172, 28, 28, 1))
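To see what the one-hot and reshape steps do, here is a small sketch on toy data. The arrays `labels` and `flat` are made up for illustration and only mimic the layout described in the Dataset section; three classes are used instead of 26 to keep the output readable.

```python
import numpy as np

# Toy labels for three examples, using 3 classes instead of 26.
labels = np.array([0, 2, 1])

# np.eye(3) is the 3x3 identity matrix; indexing it with the labels
# picks row k for class k, which is exactly the one-hot vector.
one_hot = np.eye(3)[labels]
print(one_hot)

# Two fake "images" stored as flat rows of 784 pixel values,
# like the pixel1..pixel784 columns of the CSV.
flat = np.arange(2 * 784).reshape(2, 784)

# -1 lets NumPy infer the number of examples, so the sizes
# 27455 and 7172 do not need to be hard-coded.
images = flat.reshape(-1, 28, 28, 1)
print(images.shape)
```

The reshape is row-major, so pixel 29 of the flat row (index 28) becomes the first pixel of the second image row.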

Model

We will use Keras to build a simple CNN (Convolutional Neural Network).

The CNN has 7 layers in total:

1st Convolutional Layer with relu
1st Max Pooling
2nd Convolutional Layer with relu
2nd Max Pooling
Flattening
1st Fully Connected Layer with relu
Output Layer with sigmoid

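from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense

def model():
  classifier = Sequential()
  # 1st convolution: 8 filters of 3x3 over the 28x28x1 input
  classifier.add(Convolution2D(filters=8,
                               kernel_size=(3,3),
                               strides=(1,1),
                               padding='same',
                               input_shape=(28,28,1),
                               activation='relu',
                               data_format='channels_last'))
  classifier.add(MaxPooling2D(pool_size=(2,2)))
  # 2nd convolution: 16 filters of 3x3
  classifier.add(Convolution2D(filters=16,
                               kernel_size=(3,3),
                               strides=(1,1),
                               padding='same',
                               activation='relu'))
  classifier.add(MaxPooling2D(pool_size=(4,4)))
  classifier.add(Flatten())
  classifier.add(Dense(128, activation='relu'))
  # sigmoid is kept from the original; softmax is the usual pairing
  # with categorical_crossentropy for single-label classification
  classifier.add(Dense(26, activation='sigmoid'))
  classifier.compile(optimizer='adam',
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])
  return classifier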
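It can help to trace how the tensor shape changes through these layers. The sketch below does the arithmetic in plain Python, assuming the Keras defaults that the code above relies on: 'same' padding with stride 1 keeps height and width, and max pooling uses 'valid' padding with a stride equal to the pool size. The helper names are hypothetical and not part of Keras.

```python
def conv_same(shape, filters):
    """'same' padding with stride 1 keeps height and width."""
    h, w, _ = shape
    return (h, w, filters)


def max_pool(shape, pool):
    """'valid' pooling with stride == pool size floors the division."""
    h, w, c = shape
    return (h // pool, w // pool, c)


def conv_params(kernel, channels_in, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * channels_in * filters + filters


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


shape = (28, 28, 1)
total = 0

total += conv_params(3, shape[2], 8)
shape = conv_same(shape, 8)        # (28, 28, 8)
shape = max_pool(shape, 2)         # (14, 14, 8)

total += conv_params(3, shape[2], 16)
shape = conv_same(shape, 16)       # (14, 14, 16)
shape = max_pool(shape, 4)         # (3, 3, 16)

flat_size = shape[0] * shape[1] * shape[2]   # 144 values after Flatten
total += dense_params(flat_size, 128)
total += dense_params(128, 26)

print(shape, flat_size, total)
```

Under these assumptions the 4x4 pooling discards the last two rows and columns of the 14x14 feature map, and the network has 23,162 trainable parameters. Calling `classifier.summary()` on the real model is the way to confirm this.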

Then fit the model on the training set and check the accuracy on the test set.

classifier = model()
classifier.fit(X_train, y_train, batch_size = 100, epochs = 100)
y_pred = classifier.predict(X_test)

Note that y_pred holds an array of 26 values for each test example. We have to find the index of the maximum value in each row and use that index as the predicted label.
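A minimal sketch of that last step with NumPy is shown here on toy data. The functions `predicted_labels` and `accuracy` are hypothetical helpers, and the arrays stand in for `y_pred` and the one-hot `y_test`; three classes are used instead of 26.

```python
import numpy as np


def predicted_labels(scores):
    """Index of the largest score in each row, i.e. the predicted class."""
    return np.argmax(scores, axis=1)


def accuracy(y_true_one_hot, scores):
    """Fraction of rows where the predicted class matches the true class."""
    true_labels = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(predicted_labels(scores) == true_labels))


# Toy network outputs for three examples.
toy_scores = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
])
# True classes 1, 0, 1 in one-hot form.
toy_true = np.eye(3)[[1, 0, 1]]

print(predicted_labels(toy_scores))
print(accuracy(toy_true, toy_scores))
```

With the real arrays, the same calls would be `predicted_labels(y_pred)` and `accuracy(y_test, y_pred)`.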

Result

Training Set Accuracy: 96.06%
Test Set Accuracy: 87.77%

Complete Code with Dataset

blackbird71SR / Small-Deep-Learning-Projects

Small projects with Deep Learning magic! - Predicting Customer Churn in Banking, Predict tags on Stack Overflow, Sign Language Recognition

Neural Networks

1. Predicting Customer Churn In Banking

2. Predict tags on StackOverflow

3. Sign Language and Static-Gesture Recognition

4. You Only Look Once - Photos & Videos
