Self-Supervised Learning

This section explains how to perform pre-training and fine-tuning for isolated sign language recognition (ISLR), along with the datasets required.

Pretraining Dataset

The raw dataset used for pretraining is expected to be in HDF5 format, in order to allow fast frame-level random access when sampling random windows during pretraining.
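Frame-level random access matters because pretraining repeatedly samples short fixed-length windows from long videos. A minimal sketch of that sampling step (plain NumPy, with a toy array standing in for an HDF5 dataset; the function name and signature here are illustrative, not the library's API):

```python
import numpy as np

def sample_window(frames: np.ndarray, window_len: int,
                  rng: np.random.Generator) -> np.ndarray:
    """Sample a random contiguous window of `window_len` frames.

    `frames` has shape (F, K, C): F frames, K keypoints, C channels,
    matching the HDF5 layout used for pretraining.
    """
    num_frames = frames.shape[0]
    if num_frames < window_len:
        raise ValueError(f"video has only {num_frames} frames, need {window_len}")
    start = rng.integers(0, num_frames - window_len + 1)  # inclusive start index
    return frames[start:start + window_len]

rng = np.random.default_rng(0)
video = rng.random((120, 75, 3))      # toy pose clip: 120 frames, 75 keypoints, xyz
window = sample_window(video, 32, rng)
print(window.shape)                   # (32, 75, 3)
```

With real data, `video` would be an HDF5 dataset instead of an in-memory array; only the sampled slice is read from disk.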

HDF5 format

  • For each YouTube channel/playlist, we have different .h5 files.

  • Each .h5 file has 2 groups: keypoints (which holds the actual pose data) and visibility (the confidence scores returned for each keypoint).

  • Each group contains multiple datasets, each named after a YouTube video and holding data of shape (F, K, C), where F is the number of frames in that video chunk, K is the number of keypoints (75 in our dataset), and C is the number of channels (3 for keypoints, 1 for visibility).
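The layout above can be sketched end-to-end with h5py. The file name and video ID below are illustrative; only the keypoints/visibility group names and the (F, K, C) shapes follow the description above:

```python
import numpy as np
import h5py

# Write a toy .h5 file for one channel, following the layout above:
# two groups ("keypoints", "visibility"), one dataset per video chunk.
with h5py.File("toy_channel.h5", "w") as f:
    kp_grp = f.create_group("keypoints")
    vis_grp = f.create_group("visibility")
    # Hypothetical video ID; real files are keyed by the YouTube video.
    kp_grp.create_dataset("video_abc123", data=np.random.rand(100, 75, 3))   # (F, K, 3)
    vis_grp.create_dataset("video_abc123", data=np.random.rand(100, 75, 1))  # (F, K, 1)

# Read it back and check the shapes.
with h5py.File("toy_channel.h5", "r") as f:
    pose = f["keypoints"]["video_abc123"][:]
    conf = f["visibility"]["video_abc123"][:]

print(pose.shape, conf.shape)  # (100, 75, 3) (100, 75, 1)
```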

Generating HDF5 datasets

  • See this script to extract poses from all the given videos using MediaPipe Holistic.

  • Use this script to convert the individual pose files generated above (in .pkl format) to HDF5.
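The conversion step can be sketched as follows. The .pkl schema assumed here (a dict with "keypoints" and "confidences" arrays) is hypothetical and not necessarily what the linked script produces:

```python
import pickle
import numpy as np
import h5py

def pkl_to_h5(pkl_paths, h5_path):
    """Merge per-video pose pickles into one channel-level HDF5 file.

    Assumes each .pkl holds a dict with "keypoints" (F, 75, 3) and
    "confidences" (F, 75, 1) arrays -- a hypothetical schema, not
    necessarily the one the linked script uses.
    """
    with h5py.File(h5_path, "w") as f:
        kp_grp = f.create_group("keypoints")
        vis_grp = f.create_group("visibility")
        for path in pkl_paths:
            with open(path, "rb") as fh:
                pose = pickle.load(fh)
            video_id = path.rsplit("/", 1)[-1].removesuffix(".pkl")
            kp_grp.create_dataset(video_id, data=pose["keypoints"])
            vis_grp.create_dataset(video_id, data=pose["confidences"])

# Demo with one toy pickle:
with open("vid1.pkl", "wb") as fh:
    pickle.dump({"keypoints": np.zeros((50, 75, 3)),
                 "confidences": np.ones((50, 75, 1))}, fh)

pkl_to_h5(["vid1.pkl"], "channel.h5")

with h5py.File("channel.h5", "r") as f:
    shape = f["keypoints"]["vid1"].shape
print(shape)  # (50, 75, 3)
```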

Download datasets

The following sources were scraped for Indian SL raw pretraining data (as mentioned in the OpenHands paper):

Source                                      | Total Hours | Download Size
--------------------------------------------|-------------|--------------
ISH News                                    | 145         | 16 GB
MBM Vadodara                                | 225         | 23 GB
NewzHook                                    | 615         | 61 GB
National Institute of Open Schooling (NIOS) | 115         | 11 GB
SIGN LIBRARY                                | 29          | 3 GB

To download data for the other 9 sign languages mentioned in our new work, use the links below:

Sign Language | Total Duration | Download Size
--------------|----------------|--------------
American      | 879.25 hrs     | 75.68 GB
Australian    | 71.99 hrs      | 6.46 GB
British       | 675.78 hrs     | 56.98 GB
Chinese       | 305.65 hrs     | 25.78 GB
Greek         | 475.18 hrs     | 40.44 GB
Korean        | 426.72 hrs     | 46.45 GB
Russian       | 412.67 hrs     | 35.32 GB
Spanish       | 161.48 hrs     | 12 GB (https://zenodo.org/record/6990583/files/Spanish.zip)
Turkish       | 48.50 hrs      | 3.87 GB

Pre-training

Currently, the library supports pose-based pretraining based on the dense predictive coding (DPC) technique.

  • To perform pre-training, download the config from here

  • Set the root_dir for train_dataset and val_dataset. Usually, the HDF5 data is used for training and an ISLR dataset like INCLUDE (from the Datasets section) is used as the validation set.
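For reference, a minimal sketch of what the dataset portion of such a config might look like (the keys below are illustrative assumptions; the downloaded config is authoritative):

```yaml
# Illustrative sketch only -- mirror the downloaded config's actual schema.
data:
  train_dataset:
    root_dir: /path/to/hdf5_pretraining_data   # directory of channel-level .h5 files
  val_dataset:
    root_dir: /path/to/include_dataset         # labeled ISLR dataset, e.g. INCLUDE
```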

Finally, run the following snippet to perform the pretraining:

import omegaconf
from openhands.apis.dpc import PretrainingModelDPC

# Load the pretraining config downloaded above
cfg = omegaconf.OmegaConf.load("path/to/config.yaml")

# Build the DPC pretraining model from the config and start pretraining
trainer = PretrainingModelDPC(cfg=cfg)
trainer.fit()

Fine-tuning

  • Ensure that the model parameters and pretrained checkpoint path are specified in a new config as shown in this fine-tuning example.

  • Finally, you can perform the fine-tuning using the same snippet from the Training section.

Checkpoints

The following are the checkpoints reported in the paper, which were pretrained using the above-mentioned Indian raw SL data and finetuned on different labeled datasets.

Checkpoint                   | Download
-----------------------------|-----------------
DPC pretrained model         | raw_dpc.zip
Model finetuned on DEVISIGN  | devisign_dpc.zip
Model finetuned on INCLUDE   | include_dpc.zip
Model finetuned on LSA64     | lsa64_dpc.zip
Model finetuned on WLASL2000 | wlasl_dpc.zip