Pierre Dumouchel

In Interspeech

Abstract

In this paper, we describe the systems we developed for the Open Performance Sub-Challenge of the INTERSPEECH 2009 Emotion Challenge. We participated in both the two-class and the five-class emotion detection tasks. For the two-class problem, the best performance was obtained by logistic regression fusion of three systems that use short- and long-term speech features. This fusion achieved an absolute improvement of 2.6% in unweighted recall compared with [6]. For the five-class problem, we submitted two individual systems: a cepstral GMM and a long-term GMM-UBM. The better result came from the cepstral GMM, which produced an absolute improvement of 3.5% compared with [6].
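Unweighted recall averages the per-class recalls so that each emotion class counts equally, however skewed the class distribution is. The following is a minimal Python sketch of that computation. The function names and the toy labels `"A"`/`"B"` are illustrative assumptions and are not part of the challenge tooling.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: every class weighs the same, whatever its size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    totals = Counter(y_true)                                      # examples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)   # correct answers per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Weighted recall (plain accuracy), shown for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, imbalanced two-class example: a classifier that always answers "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10
print(accuracy(y_true, y_pred))                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

The always-"A" classifier looks reasonable on accuracy but only reaches chance level on unweighted recall. This shows why a class-balanced measure is informative when emotion classes are imbalanced.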

Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification

By Najim Dehak, Réda Dehak, Patrick Kenny, Niko Brummer, Pierre Ouellet, Pierre Dumouchel

2009-06-22

In Interspeech

Abstract

This paper presents a new speaker verification system architecture that uses Joint Factor Analysis (JFA) as a feature extractor. In this modeling, JFA is used to define a single low-dimensional space, named the total variability factor space, instead of the separate channel and speaker variability spaces of classical JFA. The main contribution of this approach is the use of the cosine kernel in the new total factor space to design two different systems: the first is based on Support Vector Machines, and the second uses this kernel directly as a decision score. This last scoring method is faster and computationally less complex than other classical methods. We tested several intersession compensation methods in the total factor space and found that the combination of Linear Discriminant Analysis and Within Class Covariance Normalization achieved the best performance.
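Using the cosine kernel directly as a decision score takes only a few lines of NumPy. The sketch below assumes the total-factor vectors have already been extracted. The optional matrix `A` is a hypothetical stand-in for a compensation projection such as LDA followed by WCCN, estimated elsewhere. The function names and the threshold are also illustrative.

```python
import numpy as np


def cosine_score(w_target, w_test, A=None):
    """Cosine similarity between two total-factor vectors.

    If a projection matrix A (shape: factor_dim x reduced_dim) is given,
    both vectors are first mapped to A.T @ w, e.g. by an LDA+WCCN projection.
    """
    w_target = np.asarray(w_target, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    if A is not None:
        w_target = A.T @ w_target
        w_test = A.T @ w_test
    return float(w_target @ w_test / (np.linalg.norm(w_target) * np.linalg.norm(w_test)))


def decide(w_target, w_test, threshold, A=None):
    """Accept the trial when the cosine score reaches a (hypothetical) threshold."""
    return cosine_score(w_target, w_test, A) >= threshold


enrol = np.array([1.0, 2.0, 0.0])
same_direction = np.array([2.0, 4.0, 0.0])   # same direction, different length
orthogonal = np.array([-2.0, 1.0, 0.0])
print(cosine_score(enrol, same_direction))   # ~1.0
print(cosine_score(enrol, orthogonal))       # 0.0
```

In this sketch, scoring is just a dot product of length-normalized vectors, and no per-speaker model has to be trained. This is consistent with the speed advantage described in the abstract.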

Support vector machines and joint factor analysis for speaker verification

By Najim Dehak, Patrick Kenny, Réda Dehak, Ondrej Glembek, Pierre Dumouchel, Lukas Burget, Valiantsina Hubeika, Fabio Castaldo

2009-04-19

In IEEE-ICASSP

Abstract

This article presents several techniques for combining Support Vector Machines (SVMs) and the Joint Factor Analysis (JFA) model for speaker verification. In this combination, the SVMs are applied to different sources of information produced by JFA: the Gaussian Mixture Model supervectors and the speaker and common factors. We found that using the JFA factors gave the best results, especially when within-class covariance normalization is applied in the speaker factor space to compensate for the channel effect. The results of the new combination are comparable to those of other classical JFA scoring techniques.
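Within-class covariance normalization (WCCN) whitens the average within-speaker covariance of the factor vectors. The sketch below assumes one factor vector per row and one speaker label per row. The function names are illustrative, and the normalization used in the paper may differ in detail, for example in how speakers are weighted.

```python
import numpy as np


def within_class_covariance(vectors, labels):
    """Average, over speakers, of each speaker's covariance around its own mean."""
    X = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    speakers = np.unique(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for s in speakers:
        centered = X[labels == s] - X[labels == s].mean(axis=0)
        W += centered.T @ centered / len(centered)
    return W / len(speakers)


def wccn_projection(vectors, labels):
    """Return B such that B @ B.T equals the inverse within-class covariance."""
    W = within_class_covariance(vectors, labels)
    return np.linalg.cholesky(np.linalg.inv(W))


# Synthetic data: 3 speakers x 20 sessions in a 4-dimensional factor space.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
vectors = rng.normal(size=(3, 4))[labels] + rng.normal(size=(60, 4)) @ np.diag([3.0, 1.0, 0.5, 0.1])
B = wccn_projection(vectors, labels)
projected = vectors @ B   # row-vector form of B.T @ x
```

After this projection, the within-speaker covariance of the data is the identity. Directions dominated by session (channel) variability are therefore down-weighted before a kernel is computed.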

Comparison between factor analysis and GMM support vector machines for speaker verification

By Najim Dehak, Réda Dehak, Patrick Kenny, Pierre Dumouchel

2007-09-25

In Proceedings of the speaker and language recognition workshop (IEEE-odyssey 2008)

Abstract

We present a comparison between speaker verification systems based on factor analysis modeling and support vector machines using GMM supervectors as features. All systems used the same acoustic features and were trained and tested on the same data sets. We test two types of kernels (one linear, the other nonlinear) for the GMM support vector machines. The results show that factor analysis using speaker factors gives the best results on the core condition of the NIST 2006 speaker recognition evaluation. The difference is particularly marked on the English language subset. Fusion of all systems gave an equal error rate of 4.2% (all trials) and 3.2% (English trials only).
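The equal error rate (EER) is the operating point at which the false-acceptance and false-rejection rates coincide. The sketch below estimates it by sweeping the decision threshold over the observed scores. It is a simple approximation for illustration, not the NIST scoring tool, and the scores are made up.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: threshold where false acceptance and false rejection are closest."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tar, non]))
    # A trial is accepted when its score is >= threshold.
    frr = np.array([(tar < t).mean() for t in thresholds])   # rejected target trials
    far = np.array([(non >= t).mean() for t in thresholds])  # accepted impostor trials
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2)


targets = [0.9, 0.8, 0.7, 0.3]      # made-up target-trial scores
impostors = [0.6, 0.4, 0.2, 0.1]    # made-up impostor-trial scores
print(equal_error_rate(targets, impostors))   # 0.25
```

Interpolating between thresholds would give a smoother estimate on small trial sets. The discrete sweep keeps the idea visible.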

Kernel combination for SVM speaker verification

By Réda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel

2007-09-25

In Proceedings of the speaker and language recognition workshop (IEEE-odyssey 2008)

Abstract

We present a new approach for constructing the kernels used to build support vector machines for speaker verification. The idea is to construct new kernels by taking linear combinations of several kernels, such as the GLDS and GMM supervector kernels. In this kernel combination, the combination weights are speaker-dependent, unlike the universal weights used in score-level fusion, and no extra data are needed to estimate them. An experiment on the NIST 2006 speaker recognition evaluation dataset (all trials) was performed using three different kernel functions (the GLDS kernel and the linear and Gaussian GMM supervector kernels). We compared our kernel combination to the optimal linear score fusion obtained using logistic regression, which was trained on the same test data. The kernel combination technique achieved an equal error rate of about 5.9%, which is better than that of the optimal score fusion system (about 6.0%).
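A kernel combination amounts to a weighted sum of Gram matrices. With non-negative weights, a sum of positive semi-definite kernels stays positive semi-definite, so the result remains a valid SVM kernel. In this sketch the weights are fixed by hand and the Gaussian width `gamma` is arbitrary. The speaker-dependent weight estimation described in the abstract is not reproduced here.

```python
import numpy as np


def linear_kernel(X, Y):
    """Inner products between rows of X and rows of Y (e.g. scaled GMM supervectors)."""
    return X @ Y.T


def gaussian_kernel(X, Y, gamma=0.5):
    """exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negatives from rounding


def combined_kernel(X, Y, weights, kernels):
    """Weighted sum of kernels; non-negative weights keep the result a valid kernel."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * k(X, Y) for w, k in zip(weights, kernels))


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 10))          # six toy "supervectors"
weights = [0.3, 0.7]                  # hypothetical combination weights
K = combined_kernel(X, X, weights, [linear_kernel, gaussian_kernel])
```

A Gram matrix like `K` can be passed to any SVM implementation that accepts precomputed kernels, for instance scikit-learn's `SVC(kernel="precomputed")`.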

The role of speaker factors in the NIST extended data task

By Patrick Kenny, Najim Dehak, Réda Dehak, Vishwa Gupta, Pierre Dumouchel

2007-09-25

In Proceedings of the speaker and language recognition workshop (IEEE-odyssey 2008)

Abstract

We tested factor analysis models having various numbers of speaker factors on the core condition and the extended data condition of the 2006 NIST speaker recognition evaluation. In order to ensure strict disjointness between training and test sets, the factor analysis models were trained without using any of the data made available for the 2005 evaluation. The factor analysis training set consisted primarily of Switchboard data and so was to some degree mismatched with the 2006 test data (drawn from the Mixer collection). Consequently, our initial results were not as good as those submitted for the 2006 evaluation. However, we found that we could compensate for this by a simple modification to our score normalization strategy, namely by using 1000 z-norm utterances in zt-norm. Our purpose in varying the number of speaker factors was to evaluate the eigenvoice MAP and classical MAP components of the inter-speaker variability model in factor analysis. We found that on the core condition (i.e. 2–3 minutes of enrollment data), only the eigenvoice MAP component plays a useful role. On the other hand, on the extended data condition (i.e. 15–20 minutes of enrollment data), both the classical MAP component and the eigenvoice MAP component proved to be useful, provided that the number of speaker factors was limited. Our best result on the extended data condition (all trials) was an equal error rate of 2.2% and a detection cost of 0.011.
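ZT-norm combines two normalizations. Z-norm uses the statistics of the target model scored against impostor utterances (1000 of them in the setup described above). T-norm uses the statistics of a cohort of impostor models scored against the test utterance, with the cohort scores themselves z-normalized first. The sketch below is minimal, and its argument names and shapes are assumptions for illustration.

```python
import numpy as np


def znorm(score, impostor_scores):
    """Normalize a score by the mean/std of the same model's impostor scores."""
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    return (score - impostor_scores.mean()) / impostor_scores.std()


def ztnorm(raw_score, model_vs_z, cohort_vs_test, cohort_vs_z):
    """ZT-norm of one trial score.

    raw_score      : target model scored against the test utterance
    model_vs_z     : target model vs each z-norm utterance, shape (n_z,)
    cohort_vs_test : each t-norm cohort model vs the test utterance, shape (n_t,)
    cohort_vs_z    : each t-norm cohort model vs each z-norm utterance, shape (n_t, n_z)
    """
    z_score = znorm(raw_score, model_vs_z)
    cohort_vs_z = np.asarray(cohort_vs_z, dtype=float)
    cohort_vs_test = np.asarray(cohort_vs_test, dtype=float)
    # z-normalize every cohort score with that cohort model's own z-norm statistics
    cohort_z = (cohort_vs_test - cohort_vs_z.mean(axis=1)) / cohort_vs_z.std(axis=1)
    # t-norm step: compare the z-normalized trial score with the z-normalized cohort
    return float((z_score - cohort_z.mean()) / cohort_z.std())


print(ztnorm(3.0, [0.0, 2.0], [1.0, 3.0], [[0.0, 2.0], [0.0, 2.0]]))  # 1.0
```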

Linear and non linear kernel GMM SuperVector machines for speaker verification

By Réda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel

2007-08-27

In Proceedings of the european conference on speech communication and technologies (interspeech’07)

Abstract

This paper presents a comparison between Support Vector Machine (SVM) speaker verification systems based on linear and nonlinear kernels defined in the GMM supervector space. We describe how these kernel functions are related and show how the nuisance attribute projection (NAP) technique can be used with both kernels to deal with the session variability problem. We demonstrate the importance of GMM model normalization (M-Norm), especially for the nonlinear kernel. All our experiments were performed on the core condition of the NIST 2006 speaker recognition evaluation (all trials). Our best result (an equal error rate of 6.3%) was obtained using NAP and GMM model normalization with the nonlinear kernel.
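Nuisance attribute projection removes a low-dimensional subspace of session variability from each supervector with the projection P = I - U U^T, where the columns of U are orthonormal nuisance directions. In this sketch, U is taken as the leading principal directions of the speaker-centered training supervectors. The corank and the synthetic data are assumptions, and the paper's exact estimation procedure may differ.

```python
import numpy as np


def nap_projection(supervectors, speaker_labels, corank):
    """Return P = I - U U^T, removing the top `corank` session-variability directions."""
    X = np.asarray(supervectors, dtype=float)
    labels = np.asarray(speaker_labels)
    # Subtract each speaker's mean so what remains is within-speaker (session) variation.
    centered = np.vstack([X[labels == s] - X[labels == s].mean(axis=0)
                          for s in np.unique(labels)])
    # Leading right singular vectors span the dominant session-variability directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    U = vt[:corank].T                      # shape (dim, corank), orthonormal columns
    return np.eye(X.shape[1]) - U @ U.T


# Synthetic data: sessions of each speaker differ only along the first axis.
rng = np.random.default_rng(2)
dim = 5
nuisance = np.eye(dim)[0]
speaker_means = rng.normal(size=(3, dim))
offsets = np.array([-1.0, 0.5, 2.0])
X = np.vstack([m + a * nuisance for m in speaker_means for a in offsets])
labels = np.repeat([0, 1, 2], len(offsets))
P = nap_projection(X, labels, corank=1)
cleaned = X @ P.T                          # apply P to every row
```

After projection, the sessions of each synthetic speaker coincide because their only difference lay in the removed direction. A linear or nonlinear kernel evaluated on the projected supervectors is then insensitive to that nuisance.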
