Najim Dehak

State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and speakers in the wild evaluations

By Jesús Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Leibny Paola García-Perera, Fred Richardson, Réda Dehak, Pedro A. Torres-Carrasquillo, Najim Dehak

2020-12-14

In Computer Speech & Language

State-of-the-art speaker recognition for telephone and video speech: The JHU-MIT submission for NIST SRE18

By Jesús Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, François Grondin, Réda Dehak, Leibny Paola García-Perera, Daniel Povey, Pedro A. Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak

2019-10-28

In Proc. Interspeech 2019

The MIT Lincoln Laboratory 2016 speaker recognition system

By Pedro A. Torres-Carrasquillo, Frederick Richardson, Shahan Nercessian, Douglas Sturim, William Campbell, Youngjune Gwon, Swaroop Vattam, Réda Dehak, Harish Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Raghavendra Reddy Pappagari, Nanxin Chen, Najim Dehak, Ruben Zazo

2016-12-12

In NIST speaker recognition evaluation 2016

Abstract

This document presents the system submission for the group composed of MIT Lincoln Laboratory, Johns Hopkins University (JHU), Laboratoire de Recherche et de Développement de l’EPITA (LRDE) and Universidad Autónoma de Madrid (ATVS). The primary submission is a combination of four systems focused on i-vector systems. Two secondary submissions are also included.

GMM weights adaptation based on subspace approaches for speaker verification

By Najim Dehak, O. Plchot, M. H. Bahari, L. Burget, H. Van hamme, Réda Dehak

2014-06-16

In Odyssey 2014, the speaker and language recognition workshop

Abstract

In this paper, we explored the use of Gaussian Mixture Model (GMM) weights adaptation for speaker verification. We compared two different subspace weight adaptation approaches: Subspace Multinomial Model (SMM) and Non-negative Factor Analysis (NFA). Both techniques achieved similar results and seemed to outperform the retraining maximum likelihood (ML) weight adaptation. However, the training process for the NFA approach is substantially faster than the SMM technique. The i-vector fusion between each weight adaptation approach and the classical i-vector yielded slight improvements on the telephone part of the NIST 2010 Speaker Recognition Evaluation dataset.
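The abstract does not spell out the ML weight adaptation it uses as a baseline. As a minimal sketch, assuming the common formulation in which each mixture weight is re-estimated as the normalized sum of per-frame component posteriors, the update could look like the following. The function name `ml_weight_update` and the toy posteriors are illustrative assumptions, not taken from the paper.

```python
def ml_weight_update(posteriors):
    """Re-estimate GMM mixture weights from per-frame component posteriors.

    posteriors: list of frames, each a list of C responsibilities
                (one per mixture component, summing to 1 per frame).
    Returns C weights, each the component's share of the total posterior mass.
    This is an assumed, textbook-style ML update, not the paper's code.
    """
    if not posteriors:
        raise ValueError("need at least one frame")
    num_components = len(posteriors[0])
    counts = [0.0] * num_components  # zero-order statistics per component
    for frame in posteriors:
        for c, gamma in enumerate(frame):
            counts[c] += gamma
    total = sum(counts)
    return [n / total for n in counts]


# Toy example: three frames, two components (values are made up).
frames = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
weights = ml_weight_update(frames)
```

Subspace approaches such as SMM and NFA constrain these weights to a low-dimensional subspace rather than estimating each one freely; that part is not sketched here.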

The MIT Lincoln Laboratory / JHU / EPITA-LSE LRE17 system

By Fred Richardson, Pedro Torres-Carrasquillo, Jonas Borgstrom, Douglas Sturim, Youngjune Gwon, Jesus Villalba, Jan Trmal, Nanxin Chen, Réda Dehak, Najim Dehak

2014-06-16

In Odyssey 2018, the speaker and language recognition workshop

Unsupervised methods for speaker diarization: An integrated and iterative approach

By S. Shum, Najim Dehak, Réda Dehak, J. Glass

2013-06-07

In IEEE Transactions on Audio, Speech, and Language Processing

Abstract

In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.

MITLL 2012 speaker recognition evaluation system description

By Jonas Borgstrom, William Campbell, Najim Dehak, Réda Dehak, Daniel Garcia-Romero, Kara Greenfield, Alan McCree, Doug Reynolds, Fred Richardson, Elliot Singer, Douglas Sturim, Pedro A. Torres-Carrasquillo

2012-12-01

In NIST speaker recognition evaluation

First attempt at Boltzmann machines for speaker recognition

By M. Sennoussaoui, Najim Dehak, P. Kenny, Réda Dehak, P. Dumouchel

2012-06-01

In Odyssey speaker and language recognition workshop

Abstract

NIST regularly organizes Speaker Recognition Evaluations (SRE), in which systems achieve high accuracy, an indication that this field of research is mature. The latest progress came from the introduction of the low-dimensional i-vector representation and of new classifiers such as Probabilistic Linear Discriminant Analysis (PLDA) and the cosine distance classifier. In this paper, we study some variants of Boltzmann Machines (BM). BMs are used in image processing but remain unexplored in Speaker Verification (SR). Given two utterances, the SR task consists of deciding whether or not they come from the same speaker. Based on this definition, SR can be cast as a two-class classification problem (same-speaker vs. different-speaker). In our first attempt at using BMs, we model each class with one generative Restricted Boltzmann Machine (RBM) and use a symmetric log-likelihood ratio over both models as the decision score. This new approach achieved an Equal Error Rate (EER) of 7% and a minimum Detection Cost Function (DCF) of 0.035 on the female portion of the NIST SRE 2008 data. The objective of this research is mainly to explore a new paradigm, i.e. BMs, without necessarily obtaining better performance than the state-of-the-art system.
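For readers unfamiliar with the Equal Error Rate reported above, the following is a minimal sketch of how an EER can be approximated from trial scores: sweep a decision threshold and find the point where the false rejection rate and the false acceptance rate are closest. The function name `equal_error_rate` and the toy scores are illustrative assumptions; this is not the evaluation tooling used in the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= threshold.
    Returns the mean of the false rejection and false acceptance rates
    at the threshold where the two are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up): higher means "more likely the same speaker".
targets = [2.0, 1.5, 0.2, 3.1]
nontargets = [-1.0, 0.5, -0.3, 0.1]
eer = equal_error_rate(targets, nontargets)
```

With so few trials the estimate is coarse; evaluation toolkits typically interpolate between operating points instead of picking the nearest threshold.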