EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

Setup

Codebase preparation (based on fairseq)

# we use fairseq to build the model
git clone https://github.com/facebookresearch/fairseq
cd fairseq
pip install --editable ./

# plug-in for EH-MAM
Replace all the files in examples/data2vec with those from EHMAM.
To use easy-to-hard masking, add the compute_mask_indices_ema_loss function to the data_utils.py file in the original fairseq repo. Its implementation is in ehmam/data_utils.py.
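
Whether the plug-in step took effect can be checked with a short script. The sketch below is a hypothetical helper, not part of EH-MAM or fairseq: `defines_function` is an illustrative name, and the path `fairseq/data/data_utils.py` assumes the standard fairseq checkout layout. It only confirms that a top-level function with the expected name exists; it does not validate the implementation.

```python
import ast
from pathlib import Path


def defines_function(source: str, name: str) -> bool:
    """Return True if `source` defines a top-level function called `name`."""
    tree = ast.parse(source)
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
        for node in tree.body
    )


if __name__ == "__main__":
    # Assumed location of data_utils.py, relative to the fairseq repo root.
    target = Path("fairseq/data/data_utils.py")
    if target.exists():
        found = defines_function(target.read_text(), "compute_mask_indices_ema_loss")
        print("compute_mask_indices_ema_loss present:", found)
    else:
        print(f"{target} not found; run this from the fairseq root")
```

Run it from the fairseq root after copying the function over; a result of `False` suggests the function was not added or was pasted inside another block.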

Data preparation: follow the instructions provided by wav2vec 2.0 for pre-training and fine-tuning data preprocessing.
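
The wav2vec 2.0 preprocessing produces tab-separated manifests (for example train.tsv) whose first line is the audio root directory and whose remaining lines each hold a relative audio path and its number of samples. The sketch below assumes that layout and parses a manifest to report basic statistics. `parse_manifest` is a hypothetical helper, not part of fairseq or EH-MAM, and the example paths and sample counts are made up.

```python
from typing import List, Tuple


def parse_manifest(text: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Parse a wav2vec-style manifest: root dir, then `path<TAB>num_samples` rows."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("manifest is empty")
    root = lines[0].strip()
    entries = []
    for lineno, line in enumerate(lines[1:], start=2):
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated fields")
        entries.append((parts[0], int(parts[1])))
    return root, entries


# Illustrative manifest content (paths and sample counts are made up).
example = "/data/LibriSpeech\n" "a/1.flac\t32000\n" "a/2.flac\t48000\n"
root, entries = parse_manifest(example)
total_seconds = sum(n for _, n in entries) / 16000  # assumes 16 kHz audio
print(root, len(entries), total_seconds)
```

A check like this catches malformed rows before a long pre-training run starts; the directory holding the manifests is what task.data points to.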

Usage

Training

For the full list of hyper-parameters, see the config file selected by --config-dir and --config-name in the command below.

# minimal example to reproduce the model
python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2 \
        --config-name base_audio_only_task task.data=/path/to/manifests &

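Loading the pre-trained model as a Python object

import argparse

from fairseq import checkpoint_utils, utils

# Register the model and task classes under examples/data2vec with fairseq
code_path = "examples/data2vec"
utils.import_user_module(argparse.Namespace(user_dir=code_path))

# Load the checkpoint; returns the list of models, the config, and the task
ckpt_path = "/path/to/the/checkpoint.pt"
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]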

Fine-tuning the pre-trained checkpoint for ASR

# minimal example for fine-tuning with 100 hours of labeled data
python fairseq_cli/hydra_train.py -m \
        --config-dir examples/wav2vec/config/finetuning \
        --config-name base_100h \
        common.user_dir=examples/data2vec \
        task.data=/path/to/labeled/librispeech/ \
        model.w2v_path=/path/to/ehmam.ckpt \
        task.normalize=True
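
The task.normalize=True override asks the data loader to normalize each waveform before it reaches the model, so inputs at fine-tuning time are scaled consistently. The sketch below assumes this corresponds to per-utterance zero-mean, unit-variance scaling (fairseq's raw audio dataset applies a layer norm over the whole waveform, as far as can be told from its source). It uses plain Python lists for illustration; `normalize_waveform` is a hypothetical name and the samples are made up.

```python
import math
from typing import List


def normalize_waveform(samples: List[float], eps: float = 1e-5) -> List[float]:
    """Scale one utterance to zero mean and unit variance (biased variance, as layer norm uses)."""
    if not samples:
        raise ValueError("empty waveform")
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    scale = 1.0 / math.sqrt(var + eps)
    return [(s - mean) * scale for s in samples]


wav = [0.1, -0.2, 0.4, 0.0, -0.3]  # made-up samples
normed = normalize_waveform(wav)
mean_after = sum(normed) / len(normed)
var_after = sum((s - mean_after) ** 2 for s in normed) / len(normed)
print(mean_after, var_after)  # mean near 0, variance near 1
```

When feeding audio to the loaded model outside of fairseq's data pipeline, the same scaling would need to be applied by hand if the checkpoint was trained with normalization enabled.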

Pre-trained checkpoint

The pre-trained checkpoint (without fine-tuning) can be downloaded here.
