Self-Supervised Learning¶
This section explains how to perform self-supervised pre-training and fine-tuning, along with the datasets required, for isolated sign language recognition (ISLR).
Pretraining Dataset¶
The raw dataset used for pretraining is expected to be in HDF5 format, in order to allow fast frame-level random access when sampling random windows during pretraining.
HDF5 format¶
- For each YouTube channel/playlist, there is a separate `.h5` file.
- Each `.h5` file has 2 groups, namely `keypoints` (which has the actual pose data) and `visibility` (confidence scores returned for each keypoint).
- Each group has multiple datasets in it, named after the YouTube video, with data of shape `(F, K, C)`, where `F` is the number of frames in that video chunk, `K` is the number of keypoints (75 in our dataset), and `C` is the number of channels (3 for `keypoints` and 1 for `visibility`).
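To make the layout concrete, here is a minimal sketch (assuming `h5py`, with hypothetical file and video names) that writes the structure described above and then reads a random 30-frame window — the frame-level random-access pattern that motivates using HDF5:

```python
import os
import tempfile
import numpy as np
import h5py

# Hypothetical example file; one .h5 per YouTube channel/playlist.
path = os.path.join(tempfile.mkdtemp(), "channel.h5")

F, K = 120, 75  # frames in this video chunk, keypoints per frame
with h5py.File(path, "w") as f:
    # Two groups, `keypoints` (F, K, 3) and `visibility` (F, K, 1),
    # each holding one dataset per YouTube video.
    f.create_dataset("keypoints/video_id_1", data=np.random.rand(F, K, 3))
    f.create_dataset("visibility/video_id_1", data=np.random.rand(F, K, 1))

# Frame-level random access: only the sampled window is read from disk,
# not the whole video.
with h5py.File(path, "r") as f:
    ds = f["keypoints"]["video_id_1"]
    start = np.random.randint(0, ds.shape[0] - 30)
    window = ds[start:start + 30]  # ndarray of shape (30, 75, 3)
```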
Generating HDF5 datasets¶
See this script to extract poses for all the given videos using MediaPipe Holistic.

Use this script to convert all the above individual pose files (in `.pkl` format) to HDF5 format.
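As a rough sketch of what such a conversion involves (the `.pkl` schema below — a dict with `keypoints` and `visibility` arrays — is an assumption; adapt the keys to the actual output of the pose-extraction script):

```python
import os
import pickle
import tempfile
import numpy as np
import h5py

# Set up a hypothetical per-video .pkl file to convert.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "video_id_1.pkl"), "wb") as fp:
    pickle.dump({"keypoints": np.zeros((50, 75, 3)),
                 "visibility": np.zeros((50, 75, 1))}, fp)

# Collect all .pkl files for one channel into a single .h5 file,
# following the keypoints/visibility group layout described earlier.
h5_path = os.path.join(tmp, "channel.h5")
with h5py.File(h5_path, "w") as f:
    kp_grp = f.create_group("keypoints")
    vis_grp = f.create_group("visibility")
    for name in sorted(os.listdir(tmp)):
        if not name.endswith(".pkl"):
            continue
        video_id = os.path.splitext(name)[0]
        with open(os.path.join(tmp, name), "rb") as fp:
            pose = pickle.load(fp)
        kp_grp.create_dataset(video_id, data=pose["keypoints"])
        vis_grp.create_dataset(video_id, data=pose["visibility"])

with h5py.File(h5_path, "r") as f:
    shapes = (f["keypoints/video_id_1"].shape,
              f["visibility/video_id_1"].shape)
```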
Download datasets¶
The following are the sources scraped for Indian SL raw pretraining data (as mentioned in the OpenHands paper):
| Source | Total Hours | Download Size |
|---|---|---|
| | 145 | |
| | 225 | |
| | 615 | |
| | 115 | |
| | 29 | |
For downloading data for the other 9 sign languages mentioned in our new work, please use the links below:
| Sign Language | Total Duration | Download Size |
|---|---|---|
| American | 879.25 Hrs | |
| Australian | 71.99 Hrs | |
| British | 675.78 Hrs | |
| Chinese | 305.65 Hrs | |
| Greek | 475.18 Hrs | |
| Korean | 426.72 Hrs | |
| Russian | 412.67 Hrs | |
| Spanish | 161.48 Hrs | [12 GB](https://zenodo.org/record/6990583/files/Spanish.zip) |
| Turkish | 48.50 Hrs | |
Pre-training¶
Currently, the library supports pose-based pretraining based on the dense predictive coding (DPC) technique.
To perform pre-training, download the config from here.

Set the `root_dir` for `train_dataset` and `val_dataset`. Usually, the HDF5 data is used for training, and an ISLR dataset like INCLUDE (from the Datasets section) is used as the validation set.
Finally, run the following snippet to perform the pretraining:
```python
import omegaconf
from openhands.apis.dpc import PretrainingModelDPC

# Load the pretraining config and run DPC pretraining
cfg = omegaconf.OmegaConf.load("path/to/config.yaml")
trainer = PretrainingModelDPC(cfg=cfg)
trainer.fit()
```
Fine-tuning¶
Ensure that the model parameters and pretrained checkpoint path are specified in a new config as shown in this fine-tuning example.
Finally, you can perform the fine-tuning using the same snippet from the Training section.
Checkpoints¶
The following are the checkpoints reported in the paper, which were pretrained using the above-mentioned Indian raw SL data and fine-tuned on different labeled datasets:
| Checkpoint | Download |
|---|---|
| DPC pretrained model | |
| Model finetuned on DEVISIGN | |
| Model finetuned on INCLUDE | |
| Model finetuned on LSA64 | |
| Model finetuned on WLASL2000 | |