# Speaker-Recognition
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves.

### Workflow of the project
1) Pre-processing of the input audio signal
2) Feature extraction (MFCC or LPC)
3) Feature matching with LBG
4) Training
5) Testing

### Data
* Recordings of eight speakers were taken from the [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443).
* In every audio file the speaker utters "Please call Stella".

### Mel frequency cepstral coefficients
* MFCCs are derived from a type of cepstral representation of the audio clip.
* In MFCC the frequency bands are equally spaced on the mel scale, which approximates the response of the human auditory system.
* We chose MFCCs for feature extraction because they show significant variation from speaker to speaker, since they are derived on a logarithmic scale.
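The MFCC extraction described above can be sketched as the standard pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT). This is a minimal illustration assuming NumPy/SciPy and a 16 kHz mono signal; the parameter values are common defaults, not necessarily this project's exact settings:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, n_fft=512, n_filt=26, n_ceps=13,
         frame_len=0.025, frame_step=0.01, pre_emph=0.97):
    """Compute MFCCs of a mono signal (illustrative sketch)."""
    # Pre-emphasis boosts high frequencies.
    signal = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Split into overlapping frames and apply a Hamming window.
    flen, fstep = int(frame_len * sr), int(frame_step * sr)
    n_frames = 1 + max(0, (len(signal) - flen) // fstep)
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum of each frame.
    pspec = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Triangular mel-spaced filterbank.
    high = 2595 * np.log10(1 + (sr / 2) / 700)        # Hz -> mel
    hz_pts = 700 * (10 ** (np.linspace(0, high, n_filt + 2) / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT; keep the first n_ceps coefficients.
    feat = np.log(pspec @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

Each row of the returned matrix is the MFCC vector of one frame; these per-frame vectors are what the codebook is later trained on.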
### Linear prediction coefficients
* LPCs are another popular feature for speaker recognition. To understand LPCs, we must first understand the autoregressive (AR) model of speech.
* Speech can be modelled as a p-th order AR process, and the resulting prediction coefficients characterize the input audio signal.
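As an illustration of the AR model above, the LPCs of one frame can be estimated with the autocorrelation method and the Levinson-Durbin recursion. This is a sketch, not the project's exact code; the order `p` is an assumed parameter:

```python
import numpy as np

def lpc(frame, p=12):
    """Estimate p-th order LPCs of a frame via Levinson-Durbin."""
    # Autocorrelation at lags 0..p.
    r = np.array([frame[: len(frame) - k] @ frame[k:] for k in range(p + 1)])
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient from the current prediction error.
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    # a[0] = 1; model: s[n] + a[1]s[n-1] + ... + a[p]s[n-p] = e[n]
    return a
```

In practice the frame is windowed first, and one LPC vector per frame plays the same role as an MFCC vector in the matching stage.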
### LBG (Linde-Buzo-Gray) algorithm
* The Linde-Buzo-Gray (LBG) algorithm is used to efficiently design a codebook with minimum distortion and error.
* It is an iterative procedure: the training vectors are divided into groups, and the most representative vector of each group is found.
* These representative vectors are gathered to form the codebook.
* We chose LBG because the codebook it derives shows minimum distortion.

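The split-and-refine procedure above can be sketched as follows. This is an illustrative NumPy implementation, assuming a power-of-two codebook size and an additive split perturbation; the refinement step is the usual K-means-style centroid update:

```python
import numpy as np

def lbg(vectors, size=8, eps=0.01, n_refine=20):
    """Design a codebook of `size` codevectors from training vectors.

    `size` is assumed to be a power of two, since each pass doubles
    the codebook by splitting every codevector into a perturbed pair.
    """
    codebook = vectors.mean(axis=0, keepdims=True)  # start from the global centroid
    while len(codebook) < size:
        # Split: replace each codevector with two slightly perturbed copies.
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(n_refine):
            # Assign each training vector to its nearest codevector.
            d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            # Move each codevector to the centroid of its group.
            for c in range(len(codebook)):
                pts = vectors[labels == c]
                if len(pts):
                    codebook[c] = pts.mean(axis=0)
    return codebook
```

One codebook is built per speaker from that speaker's training feature vectors.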
### Training and Testing
The model is trained over the training data sets by finding one codebook per speaker.
The testing data sets are then fed to the model, and each test utterance is matched to the training speaker it most closely resembles.

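The matching step above can be done by minimum average distortion: score a test utterance's feature vectors against every speaker's codebook and pick the speaker with the lowest score. A sketch assuming NumPy; the `codebooks` dict mapping speaker id to codebook array is a hypothetical structure, not necessarily the project's:

```python
import numpy as np

def avg_distortion(features, codebook):
    """Mean squared distance from each feature vector to its nearest codevector."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()

def identify(test_features, codebooks):
    """Return the speaker whose codebook gives minimum average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(test_features, codebooks[spk]))
```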
### Results
The model achieves 100% accuracy with both MFCC and LPC features on the CSTR VCTK Corpus data set.
### Note
The model shows errors for audio signals containing silent parts.
### References
1) Introduction to a [speaker recognition project](https://minhdo.ece.illinois.edu/teaching/speaker_recognition/speaker_recognition.html)
2) [MFCC guide](http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/)
3) [Pre-processing and MFCC code reference](https://aadityachapagain.com/2020/08/asr-mfcc-filterbanks/)
4) [LPC reference slides](https://docs.google.com/presentation/d/1hBIF-j9fH92bnA72nzNQhTr5RXCcIK7AA-e6LIHX4Hw/edit#slide=id.gf4f26d30c1_0_13)
5) [K-means clustering](https://github.com/CihanBosnali/Machine-Learning-without-Libraries/blob/master/K-Means-Clustering/K-Means-Clustering-without-ML-libraries.ipynb)
6) [Complete project code reference](https://ccrma.stanford.edu/~orchi/Documents/speaker_recognition_report.pdf)
7) [Basics of signal processing (videos)](https://youtube.com/playlist?list=PLJ-OcUCIty7evBmHvYRv66RcuziszpSFB)