Commit 7237cd3

Adding Automatic Speaker Recognition to Summer-Intern-Projects
Desc: The Automatic Speaker Recognition project (referred to as ASR) is designed to recognize and distinguish between the voices of different people. A large dataset of various speakers is used; each audio sample is compared against the existing audio samples and matched to the one whose features are closest. This is achieved with the help of the MFCC, LPC, and LBG algorithms.
Team Members: @Jagathveerendra, @DweejaReddy, Ayush Varma, Gautam Tahilyani, Aditya Undrikar
Mentors: @sibam23, @ThanmayJ, @LuqmanFarooqui
1 parent fd0995e commit 7237cd3

3 files changed, +670 -0 lines changed

@@ -1 +1,61 @@
# Speaker-Recognition
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves.

![image](https://user-images.githubusercontent.com/92499855/137593881-06a6708a-43bf-4cec-bb01-7f21da458ae5.png)

### Workflow of the project
1) Pre-processing of the input audio signal
2) Feature extraction (MFCC or LPC)
3) Feature matching with LBG
4) Training
5) Testing
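
The README does not list the individual pre-processing steps. As a rough sketch, assuming the commonly used chain of pre-emphasis, framing, and Hamming windowing, step 1 could look like the following. The function names and the default values (`alpha=0.97`, 400-sample frames, 160-sample hop) are illustrative assumptions, not taken from the project code.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed default)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n_points):
    """Hamming window coefficients (n_points must be at least 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1)) for n in range(n_points)]


def preprocess(signal, frame_len=400, hop=160):
    """Pre-emphasize, frame, and window a list of samples (hypothetical helper)."""
    emphasized = pre_emphasis(signal)
    window = hamming(frame_len)
    return [[x * w for x, w in zip(frame, window)]
            for frame in frame_signal(emphasized, frame_len, hop)]
```

Each windowed frame would then be passed to the feature extraction step (MFCC or LPC).
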
### Data
* The eight-speaker data set was taken from the [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443).
* In every audio file, the speaker utters "Please call Stella".

### Mel frequency cepstral coefficients
* MFCCs are derived from a type of cepstral representation of the audio clip.
* In MFCC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response.
* We have chosen MFCCs for feature extraction because they show more significant variation from speaker to speaker, since they are derived on a logarithmic scale.
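
To make "equally spaced on the mel scale" concrete, here is a small sketch that computes filterbank band edges. It assumes the widely used conversion `mel = 2595 * log10(1 + f / 700)`; the README does not say which mel formula the project uses, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using a common formula (assumed, not confirmed by the project)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters, f_min, f_max):
    """Return n_filters + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


# Edges for a hypothetical 10-filter bank between 0 Hz and 8 kHz.
edges = mel_band_edges(10, 0.0, 8000.0)
```

The edges are evenly spaced in mel but grow progressively wider in Hz, so the low frequencies get finer resolution than the high ones.
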
### Linear prediction coefficients
* LPCs are another popular feature for speaker recognition. To understand LPCs, we must first understand the autoregressive (AR) model of speech.
* Speech can be modelled as a p-th order AR process. The coefficients of this process characterize the input audio signal.
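
In a p-th order AR model, each sample is approximated as a weighted sum of the previous p samples: `x[n] ≈ a[1]*x[n-1] + ... + a[p]*x[n-p]`. The sketch below estimates those weights with the autocorrelation method and the Levinson-Durbin recursion. This is one standard way to compute LPCs and is an assumption here; the README does not state which estimation method the project uses, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Return (coefficients a[1..order], residual energy) via Levinson-Durbin."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused; a[j] multiplies x[n - j]
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        # Reflection coefficient for the i-th stage.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


# A decaying exponential behaves like a first-order AR process with a[1] close to 0.9.
coeffs, residual = lpc([0.9 ** n for n in range(200)], order=2)
```

In a full pipeline, this would be applied to each windowed frame, giving one coefficient vector per frame.
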
### LBG (Linde-Buzo-Gray) algorithm
* The Linde-Buzo-Gray (LBG) algorithm is used to efficiently design a codebook with minimum distortion and error.
* It is an iterative procedure: the basic idea is to divide the training vectors into groups and find the most representative vector of each group.
* The representative vectors from all groups are gathered to form the codebook.
* We have chosen LBG because the codebook it produces shows minimum distortion.
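
The grouping procedure described above can be sketched as follows. This is a minimal version of the usual split-and-refine form of LBG, assuming squared Euclidean distance, a multiplicative split factor `eps`, and a codebook size that is a power of two; none of these details are confirmed by the README, and the function names are hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def lbg(vectors, size, eps=0.01, tol=1e-6):
    """Build a codebook by repeated splitting; `size` is assumed to be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        # (A codeword at the origin would need an additive perturbation instead.)
        codebook = [[c * factor for c in code]
                    for code in codebook for factor in (1 + eps, 1 - eps)]
        previous = float("inf")
        while True:
            # Assign each training vector to its nearest codeword.
            cells = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                i = min(range(len(codebook)), key=lambda j: squared_distance(codebook[j], v))
                cells[i].append(v)
                distortion += squared_distance(codebook[i], v)
            distortion /= len(vectors)
            # Move each codeword to the centroid of its cell (empty cells stay put).
            codebook = [centroid(cell) if cell else code
                        for cell, code in zip(cells, codebook)]
            if previous - distortion <= tol * max(distortion, 1e-12):
                break
            previous = distortion
    return codebook


# Two well-separated groups of 2-D vectors yield one codeword per group.
training = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
codebook = lbg(training, size=2)
```

In the speaker recognition setting, `vectors` would be the MFCC or LPC vectors of one speaker, giving one codebook per speaker.
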

### Training and Testing
The model is trained on the training data sets by finding the codebooks.
The model is then fed the testing data sets to find out which speaker from the training data sets matches each test sample.
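
As an illustration of the testing step, the sketch below picks the speaker whose codebook gives the lowest average distortion for a test sample's feature vectors. It assumes one codebook per speaker and squared Euclidean distance; the names `average_distortion` and `identify_speaker` and the toy codebooks are hypothetical, not taken from the project code.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    return sum(min(squared_distance(f, code) for code in codebook)
               for f in features) / len(features)


def identify_speaker(features, codebooks):
    """Return the key of the codebook with the smallest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(features, codebooks[name]))


# Toy 2-D codebooks standing in for trained per-speaker codebooks.
codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 6.0]],
}
best_match = identify_speaker([[0.1, 0.0], [0.9, 1.2]], codebooks)
```
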

### Results
The accuracy of the model is 100% for both MFCCs and LPCs on the CSTR VCTK Corpus data set.
### Note
The model shows errors for an audio signal containing a silent part.
### References
1) [Introduction to speaker recognition project](https://minhdo.ece.illinois.edu/teaching/speaker_recognition/speaker_recognition.html)
2) [MFCC](http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/)
3) [Pre-processing and MFCC code reference](https://aadityachapagain.com/2020/08/asr-mfcc-filterbanks/)
4) [LPC reference slides](https://docs.google.com/presentation/d/1hBIF-j9fH92bnA72nzNQhTr5RXCcIK7AA-e6LIHX4Hw/edit#slide=id.gf4f26d30c1_0_13)
5) [K-means clustering](https://github.com/CihanBosnali/Machine-Learning-without-Libraries/blob/master/K-Means-Clustering/K-Means-Clustering-without-ML-libraries.ipynb)
6) [Complete project code reference](https://ccrma.stanford.edu/~orchi/Documents/speaker_recognition_report.pdf)
7) [Basics of signal processing videos](https://youtube.com/playlist?list=PLJ-OcUCIty7evBmHvYRv66RcuziszpSFB)