Self-Supervised Learning

This section explains how to perform pre-training and fine-tuning for isolated sign language recognition (ISLR), along with the datasets required.

Pretraining Dataset

The raw dataset used for pretraining is expected to be in HDF5 format, in order to allow fast frame-level random access when sampling random windows during pretraining.
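Frame-level random access matters because pretraining repeatedly samples short fixed-length windows from long videos. A minimal sketch of that sampling step (plain NumPy, with a toy array standing in for an HDF5 dataset; the function name and signature here are illustrative, not the library's API):

```python
import numpy as np

def sample_window(frames: np.ndarray, window_len: int,
                  rng: np.random.Generator) -> np.ndarray:
    """Sample a random contiguous window of `window_len` frames.

    `frames` has shape (F, K, C): F frames, K keypoints, C channels,
    matching the HDF5 layout used for pretraining.
    """
    num_frames = frames.shape[0]
    if num_frames < window_len:
        raise ValueError(f"video has only {num_frames} frames, need {window_len}")
    start = rng.integers(0, num_frames - window_len + 1)  # inclusive start index
    return frames[start:start + window_len]

rng = np.random.default_rng(0)
video = rng.random((120, 75, 3))      # toy pose clip: 120 frames, 75 keypoints, xyz
window = sample_window(video, 32, rng)
print(window.shape)                   # (32, 75, 3)
```

With real data, `video` would be an HDF5 dataset instead of an in-memory array; only the sampled slice is read from disk.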

HDF5 format

  • For each YouTube channel/playlist, we have different .h5 files.

  • Each .h5 file has 2 groups: keypoints (which holds the actual pose data) and visibility (the confidence scores returned for each keypoint).

  • Each group contains multiple datasets, each named after a YouTube video and holding data of shape (F, K, C), where F is the number of frames in that video chunk, K is the number of keypoints (75 in our dataset), and C is the number of channels (3 for keypoints, 1 for visibility).
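The layout above can be sketched end-to-end with h5py. The file name and video ID below are illustrative; only the keypoints/visibility group names and the (F, K, C) shapes follow the description above:

```python
import numpy as np
import h5py

# Write a toy .h5 file for one channel, following the layout above:
# two groups ("keypoints", "visibility"), one dataset per video chunk.
with h5py.File("toy_channel.h5", "w") as f:
    kp_grp = f.create_group("keypoints")
    vis_grp = f.create_group("visibility")
    # Hypothetical video ID; real files are keyed by the YouTube video.
    kp_grp.create_dataset("video_abc123", data=np.random.rand(100, 75, 3))   # (F, K, 3)
    vis_grp.create_dataset("video_abc123", data=np.random.rand(100, 75, 1))  # (F, K, 1)

# Read it back and check the shapes.
with h5py.File("toy_channel.h5", "r") as f:
    pose = f["keypoints"]["video_abc123"][:]
    conf = f["visibility"]["video_abc123"][:]

print(pose.shape, conf.shape)  # (100, 75, 3) (100, 75, 1)
```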

Generating HDF5 datasets

  • See this script to extract poses from all the given videos using MediaPipe Holistic.

  • Use this script to convert the individual pose files generated above (in .pkl format) to HDF5.
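The conversion step can be sketched as follows. The .pkl schema assumed here (a dict with "keypoints" and "confidences" arrays) is hypothetical and not necessarily what the linked script produces:

```python
import pickle
import numpy as np
import h5py

def pkl_to_h5(pkl_paths, h5_path):
    """Merge per-video pose pickles into one channel-level HDF5 file.

    Assumes each .pkl holds a dict with "keypoints" (F, 75, 3) and
    "confidences" (F, 75, 1) arrays -- a hypothetical schema, not
    necessarily the one the linked script uses.
    """
    with h5py.File(h5_path, "w") as f:
        kp_grp = f.create_group("keypoints")
        vis_grp = f.create_group("visibility")
        for path in pkl_paths:
            with open(path, "rb") as fh:
                pose = pickle.load(fh)
            video_id = path.rsplit("/", 1)[-1].removesuffix(".pkl")
            kp_grp.create_dataset(video_id, data=pose["keypoints"])
            vis_grp.create_dataset(video_id, data=pose["confidences"])

# Demo with one toy pickle:
with open("vid1.pkl", "wb") as fh:
    pickle.dump({"keypoints": np.zeros((50, 75, 3)),
                 "confidences": np.ones((50, 75, 1))}, fh)

pkl_to_h5(["vid1.pkl"], "channel.h5")

with h5py.File("channel.h5", "r") as f:
    shape = f["keypoints"]["vid1"].shape
print(shape)  # (50, 75, 3)
```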

Download datasets

The following sources were scraped for Indian SL raw pretraining data (as mentioned in the OpenHands paper):

Source                                      | Total Hours | Download Size
--------------------------------------------|-------------|--------------
ISH News                                    | 145         | 16 GB
MBM Vadodara                                | 225         | 23 GB
NewzHook                                    | 615         | 61 GB
National Institute of Open Schooling (NIOS) | 115         | 11 GB
SIGN LIBRARY                                | 29          | 3 GB

To download data for the other 9 sign languages mentioned in our new work, use the links below:

Sign Language | Total Duration | Download Size
--------------|----------------|--------------
American      | 879.25 hrs     | 75.68 GB
Australian    | 71.99 hrs      | 6.46 GB
British       | 675.78 hrs     | 56.98 GB
Chinese       | 305.65 hrs     | 25.78 GB
Greek         | 475.18 hrs     | 40.44 GB
Korean        | 426.72 hrs     | 46.45 GB
Russian       | 412.67 hrs     | 35.32 GB
Spanish       | 161.48 hrs     | 12 GB (https://zenodo.org/record/6990583/files/Spanish.zip)
Turkish       | 48.50 hrs      | 3.87 GB

Pre-training

Currently, the library supports pose-based pretraining based on the dense predictive coding (DPC) technique.

  • To perform pre-training, download the config from here

  • Set the root_dir for train_dataset and val_dataset. Usually, the HDF5 data is used for training and an ISLR dataset like INCLUDE (from the Datasets section) is used as the validation set.
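For reference, a minimal sketch of what the dataset portion of such a config might look like (the keys below are illustrative assumptions; the downloaded config is authoritative):

```yaml
# Illustrative sketch only -- mirror the downloaded config's actual schema.
data:
  train_dataset:
    root_dir: /path/to/hdf5_pretraining_data   # directory of channel-level .h5 files
  val_dataset:
    root_dir: /path/to/include_dataset         # labeled ISLR dataset, e.g. INCLUDE
```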

Finally, run the following snippet to perform the pretraining:

import omegaconf
from openhands.apis.dpc import PretrainingModelDPC

# Load the pretraining config downloaded above
cfg = omegaconf.OmegaConf.load("path/to/config.yaml")

# Build the DPC pretraining model from the config and start pretraining
trainer = PretrainingModelDPC(cfg=cfg)
trainer.fit()

Fine-tuning

  • Ensure that the model parameters and pretrained checkpoint path are specified in a new config as shown in this fine-tuning example.

  • Finally, you can perform the fine-tuning using the same snippet from the Training section.

Checkpoints

The following are the checkpoints reported in the paper, which were pretrained using the above-mentioned Indian raw SL data and finetuned on different labeled datasets.

Checkpoint                   | Download
-----------------------------|-----------------
DPC pretrained model         | raw_dpc.zip
Model finetuned on DEVISIGN  | devisign_dpc.zip
Model finetuned on INCLUDE   | include_dpc.zip
Model finetuned on LSA64     | lsa64_dpc.zip
Model finetuned on WLASL2000 | wlasl_dpc.zip