This guide provides detailed instructions on how to download and configure the training datasets needed to train face recognition models with the CVLFace toolkit. We use `.rec` files, which are similar to HDF5 files and are optimized for high-performance read and write operations.

Before beginning, make sure you have configured the `$DATA_ROOT` directory in your `cvlface/.env` file. This directory serves as the base path for all dataset directories.
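For reference, the `train.rec`/`train.idx` pair is an indexed MXNet RecordIO dataset, which is what enables fast random reads during training. Below is a minimal sketch of reading a single record; it assumes `mxnet` is installed and is purely illustrative, not the CVLFace data loader itself.

```python
# Minimal sketch: random-access read from an indexed RecordIO pair.
# Assumes mxnet is installed; this is not the actual CVLFace data loader.
import mxnet as mx

rec = mx.recordio.MXIndexedRecordIO("train.idx", "train.rec", "r")  # index file + record file, read-only
header, img_bytes = mx.recordio.unpack(rec.read_idx(1))             # jump directly to record 1
img = mx.image.imdecode(img_bytes)                                  # decode the JPEG payload to an NDArray
print(header.label, img.shape)
```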
The datasets are organized into directories whose names match the `rec` field of the corresponding YAML file in the data configs. For example, the `cvlface/research/recognition/code/run_v1/dataset/configs/casia.yaml` file contains the following configuration:
```yaml
data_root: ${oc.env:DATA_ROOT}
rec: 'casia_webface'
color_space: 'RGB'
num_classes: 10572
num_image: 490623
repeated_sampling_cfg: null
semi_sampling_cfg: null
```
The `rec` field in the YAML file corresponds to the dataset's directory name inside `$DATA_ROOT`.
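To see how these two fields combine, the snippet below resolves the config into the dataset path. It is a minimal sketch that assumes `omegaconf` (2.1 or newer) is installed and `DATA_ROOT` is set in the environment; the actual CVLFace loading code may differ.

```python
# Minimal sketch: resolve the dataset directory from the config fields.
# Assumes omegaconf >= 2.1 and that DATA_ROOT is set in the environment.
import os
from omegaconf import OmegaConf

cfg = OmegaConf.load("cvlface/research/recognition/code/run_v1/dataset/configs/casia.yaml")
rec_dir = os.path.join(cfg.data_root, cfg.rec)    # ${oc.env:DATA_ROOT} resolves on access
print(rec_dir)                                    # e.g. /path/to/data_root/casia_webface
print(os.path.join(rec_dir, "train.rec"))         # the file the training pipeline reads
```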
Below is an example directory structure after setup:

```
$DATA_ROOT
├── casia_webface
│   ├── train.rec
│   ├── train.idx
│   └── train.tsv
├── webface260m/WebFace4M
│   ├── train.rec
│   ├── train.idx
│   ├── train.tsv
│   └── ldmk_5points.csv
├── webface260m/WebFace12M
│   ├── train.rec
│   ├── train.idx
│   ├── train.tsv
│   └── ldmk_5points.csv
├── MS1MV2
│   ...
└── custom, etc.
    ...
```
(Note: You do not need to download all datasets. Select the one(s) that best suit your needs.)
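After setting up the dataset(s) you chose, a quick check like the one below confirms that the expected files are in place. This is a hypothetical helper, not part of the toolkit, and it assumes `DATA_ROOT` is exported in your environment.

```python
# Hypothetical sanity check: confirm train.rec/train.idx exist for the chosen datasets.
import os

data_root = os.environ["DATA_ROOT"]                         # same value as in cvlface/.env
for rec_dir in ["casia_webface", "webface260m/WebFace4M"]:  # adjust to the datasets you set up
    for fname in ["train.rec", "train.idx"]:
        path = os.path.join(data_root, rec_dir, fname)
        print(("OK      " if os.path.isfile(path) else "MISSING ") + path)
```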
- CASIA-WebFace: in a terminal, run

  ```bash
  DATA_ROOT=/path/to/data_root
  mkdir -p $DATA_ROOT/casia_webface
  ```

- Download the zip file from https://drive.google.com/file/d/1KxNCrXzln0lal3N4JiYl9cFOIhT78y1l/view and then run

  ```bash
  unzip faces_webface_112x112.zip
  cp faces_webface_112x112/train.rec $DATA_ROOT/casia_webface/train.rec
  cp faces_webface_112x112/train.idx $DATA_ROOT/casia_webface/train.idx
  rm -rf faces_webface_112x112 faces_webface_112x112.zip
  ```

- (Optional) For training with preprocessed landmarks (e.g., models like KP-RPE), `ldmk_5points.csv` needs to be prepared. Run

  ```bash
  python predict_landmark.py --source_dir $DATA_ROOT/casia_webface
  ```
- VGGFace2: in a terminal, run

  ```bash
  DATA_ROOT=/path/to/data_root
  mkdir -p $DATA_ROOT/vgg2
  ```

- Download the zip file from https://drive.google.com/file/d/1dyVQ7X3d28eAcjV3s3o0MT-HyODp_v3R/view and then run

  ```bash
  unzip faces_vgg_112x112.zip
  cp faces_vgg_112x112/train.rec $DATA_ROOT/vgg2/train.rec
  cp faces_vgg_112x112/train.idx $DATA_ROOT/vgg2/train.idx
  rm -rf faces_vgg_112x112 faces_vgg_112x112.zip
  ```

- (Optional) For training with preprocessed landmarks (e.g., models like KP-RPE), `ldmk_5points.csv` needs to be prepared. Run

  ```bash
  python predict_landmark.py --source_dir $DATA_ROOT/vgg2
  ```
- MS1MV2: in a terminal, run

  ```bash
  DATA_ROOT=/path/to/data_root
  mkdir -p $DATA_ROOT/MS1MV2
  ```

- Download the zip file from https://drive.google.com/file/d/1SXS4-Am3bsKSK615qbYdbA_FMVh3sAvR/view and then run

  ```bash
  unzip faces_emore.zip
  cp faces_emore/train.rec $DATA_ROOT/MS1MV2/train.rec
  cp faces_emore/train.idx $DATA_ROOT/MS1MV2/train.idx
  rm -rf faces_emore faces_emore.zip
  ```

- (Optional) For training with preprocessed landmarks (e.g., models like KP-RPE), `ldmk_5points.csv` needs to be prepared. Run

  ```bash
  python predict_landmark.py --source_dir $DATA_ROOT/MS1MV2
  ```
- MS1MV3: in a terminal, run

  ```bash
  DATA_ROOT=/path/to/data_root
  mkdir -p $DATA_ROOT/MS1MV3
  ```

- Download the zip file from https://drive.google.com/file/d/1JgmzL9OLTqDAZE86pBgETtSQL4USKTFy/view and then run

  ```bash
  unzip ms1m-retinaface-t1.zip
  cp ms1m-retinaface-t1/train.rec $DATA_ROOT/MS1MV3/train.rec
  cp ms1m-retinaface-t1/train.idx $DATA_ROOT/MS1MV3/train.idx
  rm -rf ms1m-retinaface-t1 ms1m-retinaface-t1.zip
  ```

- (Optional) For training with preprocessed landmarks (e.g., models like KP-RPE), `ldmk_5points.csv` needs to be prepared. Run

  ```bash
  python predict_landmark.py --source_dir $DATA_ROOT/MS1MV3
  ```
- WebFace4M: in a terminal, run

  ```bash
  DATA_ROOT=/path/to/data_root
  mkdir -p $DATA_ROOT/webface260m/temp
  ```

- Obtain the dataset download link and password from https://www.face-benchmark.org/download.html
- Download all zip files in the `0` folder and place them in one directory (e.g., 0_0.zip, 0_1.zip, 0_2.zip, etc.). The `0` folder corresponds to the WebFace4M split.
- Unzip all zip files into `$DATA_ROOT/webface260m/temp`, e.g.

  ```bash
  unzip -d $DATA_ROOT/webface260m/temp 0_1.zip
  ```

  and so on for each zip file.

- Rename the extracted folder and remove the temporary directory (you can remove the zip files as well):

  ```bash
  mv $DATA_ROOT/webface260m/temp/WebFace260M $DATA_ROOT/webface260m/WebFace4M
  rm -rf $DATA_ROOT/webface260m/temp
  ```

- Bundle all images into a rec file by running the command below (see the conceptual sketch after these steps for what this script does):

  ```bash
  python bundle_images_into_rec.py --source_dir $DATA_ROOT/webface260m/WebFace4M --remove_images
  ```

- (Optional) For training with preprocessed landmarks (e.g., models like KP-RPE), `ldmk_5points.csv` needs to be prepared. Run

  ```bash
  python predict_landmark.py --source_dir $DATA_ROOT/webface260m/WebFace4M
  ```
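For context, the bundling step packs the per-identity image folders into an indexed `train.rec`/`train.idx` pair. The sketch below is conceptual only: it assumes `mxnet` and `opencv-python` are installed, and the actual `bundle_images_into_rec.py` may differ (for example, in how it writes `train.tsv` or orders the records).

```python
# Conceptual sketch of bundling an image-folder dataset into indexed RecordIO.
# Assumes mxnet and opencv-python; the real bundle_images_into_rec.py may differ.
import os
import cv2
import mxnet as mx

source_dir = os.path.join(os.environ["DATA_ROOT"], "webface260m/WebFace4M")
writer = mx.recordio.MXIndexedRecordIO(
    os.path.join(source_dir, "train.idx"),
    os.path.join(source_dir, "train.rec"),
    "w",
)

idx = 0
identities = sorted(d for d in os.listdir(source_dir)
                    if os.path.isdir(os.path.join(source_dir, d)))
for label, identity in enumerate(identities):            # one integer label per identity folder
    id_dir = os.path.join(source_dir, identity)
    for fname in sorted(os.listdir(id_dir)):
        img = cv2.imread(os.path.join(id_dir, fname))     # BGR uint8 image
        if img is None:
            continue                                      # skip unreadable / non-image files
        header = mx.recordio.IRHeader(flag=0, label=label, id=idx, id2=0)
        writer.write_idx(idx, mx.recordio.pack_img(header, img, quality=95, img_fmt=".jpg"))
        idx += 1
writer.close()
```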
- WebFace12M: in a terminal, run

  ```bash
  DATA_ROOT=/path/to/data_root
  mkdir -p $DATA_ROOT/webface260m/temp
  ```

- Obtain the dataset download link and password from https://www.face-benchmark.org/download.html
- Download all zip files in the `0`, `1`, and `2` folders and place them in one directory (e.g., 0_0.zip, 0_1.zip, 0_2.zip, etc.). The `0`, `1`, and `2` folders correspond to the WebFace12M split.
- Unzip all zip files into `$DATA_ROOT/webface260m/temp`, e.g.

  ```bash
  unzip -d $DATA_ROOT/webface260m/temp 0_1.zip
  ```

  and so on for each zip file.

- Rename the extracted folder and remove the temporary directory (you can remove the zip files as well):

  ```bash
  mv $DATA_ROOT/webface260m/temp/WebFace260M $DATA_ROOT/webface260m/WebFace12M
  rm -rf $DATA_ROOT/webface260m/temp
  ```

- Bundle all images into a rec file by running

  ```bash
  python bundle_images_into_rec.py --source_dir $DATA_ROOT/webface260m/WebFace12M --remove_images
  ```

- (Optional) For training with preprocessed landmarks (e.g., models like KP-RPE), `ldmk_5points.csv` needs to be prepared. Run

  ```bash
  python predict_landmark.py --source_dir $DATA_ROOT/webface260m/WebFace12M
  ```