Xavier Menendez-Pidal - Los Gatos CA, Miyuki Tanaka - Campbell CA, Ruxin Chen - San Jose CA, Duanpei Wu - Sunnyvale CA
Assignee:
Sony Corporation - Tokyo, Sony Electronics Inc. - Park Ridge NJ
International Classification:
G10L 5/06, G10L 9/00, G10L 3/02
US Classification:
704/233
Abstract:
A method for reducing noise distortions in a speech recognition system comprises a feature extractor that includes a noise-suppressor, one or more time cosine transforms, and a normalizer. The noise-suppressor preferably performs a spectral subtraction process early in the feature extraction procedure. The time cosine transforms preferably operate in a centered-mode to each perform a transformation in the time domain. The normalizer calculates and utilizes normalization values to generate normalized features for speech recognition. The calculated normalization values preferably include mean values, left variances and right variances.
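As a rough illustration of the three stages this abstract names, the following pure-Python sketch applies spectral subtraction to a power spectrum, a centered time cosine transform to one feature track, and mean/left-variance/right-variance normalization. It is not the patented implementation: the function names, the 1% spectral floor, the edge padding, the use of a DCT-II over a window centered on each frame, and the reading of "left" and "right" variances as the variances of values below and above the mean are all assumptions.

```python
import math
from typing import List


def spectral_subtract(power: List[float], noise: List[float], floor: float = 0.01) -> List[float]:
    """Subtract a per-bin noise power estimate from a power spectrum.

    Assumption: instead of clipping to zero, each bin keeps at least
    `floor` times its original power.
    """
    return [max(p - n, floor * p) for p, n in zip(power, noise)]


def centered_time_dct(track: List[float], half_width: int, n_coeffs: int) -> List[List[float]]:
    """DCT-II along time over a window of 2*half_width+1 frames centered on each frame.

    Assumption: frames beyond either end of the utterance are replaced by
    the nearest edge frame.
    """
    total = len(track)
    width = 2 * half_width + 1
    result = []
    for t in range(total):
        window = [track[min(max(t + j, 0), total - 1)] for j in range(-half_width, half_width + 1)]
        result.append([
            sum(x * math.cos(math.pi * k * (n + 0.5) / width) for n, x in enumerate(window))
            for k in range(n_coeffs)
        ])
    return result


def normalize_left_right(values: List[float]) -> List[float]:
    """Subtract the mean, then scale by a side-dependent standard deviation.

    Assumption: the "left variance" is computed from values below the mean and
    the "right variance" from values above it.
    """
    mean = sum(values) / len(values)
    below = [v - mean for v in values if v < mean]
    above = [v - mean for v in values if v > mean]
    left_var = sum(d * d for d in below) / len(below) if below else 1.0
    right_var = sum(d * d for d in above) / len(above) if above else 1.0
    return [(v - mean) / math.sqrt(left_var if v < mean else right_var) for v in values]
```

Using separate scales on each side of the mean lets a skewed feature distribution be normalized without one long tail dominating the variance estimate.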
Method For Implementing A Speech Recognition System To Determine Speech Endpoints During Conditions With Background Noise
Duanpei Wu - Sunnyvale CA, Miyuki Tanaka - Campbell CA, Ruxin Chen - San Jose CA, Lex Olorenshaw - Madera CA
Assignee:
Sony Corporation - Tokyo, Sony Electronics Inc. - Park Ridge NJ
International Classification:
G01L 3/00
US Classification:
704/253
Abstract:
A method for implementing a speech recognition system for use during conditions with background noise includes the steps of calculating, in real-time, sequential short-term delta energy parameters for speech energy from a spoken utterance, determining threshold values in the speech energy, and identifying a beginning point and an ending point for the spoken utterance based on the relationship between the threshold values and the short-term delta energy parameters.
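A minimal sketch of the endpoint logic described above, assuming that the short-term delta energy is the difference between consecutive frame energies and that the threshold is a multiple (`factor`) of the largest delta seen in a few leading frames assumed to contain only background noise. The names `frame_energies` and `detect_endpoints` and all default parameter values are hypothetical; the abstract does not give the patent's own threshold rules.

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Sum of squared samples over consecutive, non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_endpoints(energies: Sequence[float], noise_frames: int = 5,
                     factor: float = 3.0, min_threshold: float = 1e-6) -> Optional[Tuple[int, int]]:
    """Return (begin, end) frame indices, or None if no speech onset is found.

    Assumption: the first `noise_frames` delta values are background noise and
    set the threshold; later frames are processed one at a time, as they would
    arrive in real time.
    """
    noise_deltas: List[float] = []
    threshold: Optional[float] = None
    begin: Optional[int] = None
    end: Optional[int] = None
    for t in range(1, len(energies)):
        delta = energies[t] - energies[t - 1]  # short-term delta energy
        if threshold is None:
            noise_deltas.append(abs(delta))
            if len(noise_deltas) == noise_frames:
                threshold = max(factor * max(noise_deltas), min_threshold)
            continue
        if begin is None:
            if delta > threshold:      # sharp rise in energy: beginning point
                begin = t
        elif delta < -threshold:       # sharp fall: the previous frame is the latest ending point
            end = t - 1
    if begin is None:
        return None
    return begin, end if end is not None else len(energies) - 1
```

Because the ending point is updated at every sharp fall, a short pause inside the utterance does not end detection early; only the last fall is kept.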
Speech Detection With Noise Suppression Based On Principal Components Analysis
Duanpei Wu - Sunnyvale CA, Miyuki Tanaka - Campbell CA, Mariscela Amador-Hernandez - San Jose CA
Assignee:
Sony Corporation - Tokyo, Sony Electronics Inc. - Park Ridge NJ
International Classification:
G10L 21/02
US Classification:
704/226
Abstract:
A method for effectively suppressing background noise in a speech detection system comprises a filter bank for separating source speech data into discrete frequency sub-bands to generate filtered channel energy, and a noise suppressor for weighting the frequency sub-bands to improve the signal-to-noise ratio of the resultant noise-suppressed channel energy. The noise suppressor preferably includes a subspace module for using a Karhunen-Loeve transformation to create a subspace based on the background noise, a projection module for generating projected channel energy by projecting the filtered channel energy onto the created subspace, and a weighting module for applying calculated weighting values to the projected channel energy to generate the noise-suppressed channel energy.
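The subspace, projection, and weighting modules can be sketched in pure Python as below. This is an illustration, not the patented design: it assumes the filter-bank channel energies are already computed, builds the Karhunen-Loeve basis from the noise frames' channel correlation matrix (eigen-decomposed here with a small cyclic Jacobi routine to avoid external dependencies), and uses a power-subtraction-style gain, max(y^2 - lambda, 0) / y^2, as the weighting rule. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def jacobi_eigh(sym: Matrix, sweeps: int = 50, tol: float = 1e-18) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) where eigenvectors[i] pairs with eigenvalues[i].
    """
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of A and of the eigenvector matrix
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p, q of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], [[v[k][i] for k in range(n)] for i in range(n)]


def noise_subspace(noise: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Karhunen-Loeve basis of the noise (assumption: from its channel correlation matrix).

    Each eigenvalue is the average noise power along its basis vector.
    """
    dim, count = len(noise[0]), len(noise)
    corr = [[sum(f[i] * f[j] for f in noise) / count for j in range(dim)] for i in range(dim)]
    return jacobi_eigh(corr)


def suppress(frame: Sequence[float], noise_power: Sequence[float], basis: Matrix) -> List[float]:
    """Project one frame of channel energy onto the basis, then weight each coordinate.

    Assumption: the weight is max(y^2 - noise power, 0) / y^2.
    """
    out = []
    for lam, vec in zip(noise_power, basis):
        y = sum(b * x for b, x in zip(vec, frame))  # projected channel energy
        power = y * y
        weight = max(power - lam, 0.0) / power if power > 0.0 else 0.0
        out.append(weight * y)
    return out
```

In practice a numerical library's symmetric eigensolver would replace `jacobi_eigh`; it is included only so the sketch runs on the standard library alone.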
Method And Apparatus For A Parameter Sharing Speech Recognition System
Ruxin Chen - San Jose CA, Miyuki Tanaka - Campbell CA, Duanpei Wu - Sunnyvale CA, Lex S. Olorenshaw - Corte Madera CA
Assignee:
Sony Corporation - Tokyo, Sony Electronics, Inc. - Park Ridge NJ
International Classification:
G10L 7/08
US Classification:
704/254
Abstract:
A method and an apparatus for a parameter sharing speech recognition system are provided. Speech signals are received into a processor of a speech recognition system. The speech signals are processed using a speech recognition system hosting a shared hidden Markov model (HMM) produced by generating a number of phoneme models, some of which are shared. The phoneme models are generated by retaining as a separate phoneme model any triphone model having a number of trained frames available that exceeds a prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having a common biphone exceeds the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceeds the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models having the same center context.
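One way to read the sharing rules is as a back-off cascade, sketched below. The order of the stages, the use of the left biphone (left context plus center phone) as the "common biphone", the `phone_class` table standing in for "equivalent effect on a phonemic context", and the model-name prefixes are all assumptions for illustration; training of the HMM parameters themselves is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triphone = Tuple[str, str, str]  # (left context, center phone, right context)


def share_models(frames: Dict[Triphone, int], threshold: int,
                 phone_class: Dict[str, str]) -> Dict[Triphone, str]:
    """Map every triphone to the (hypothetically named) model that will represent it."""
    assignment: Dict[Triphone, str] = {}
    remaining: List[Triphone] = []
    for (l, c, r), count in frames.items():
        if count > threshold:                       # enough data: keep its own model
            assignment[(l, c, r)] = f"tri:{l}-{c}+{r}"
        else:
            remaining.append((l, c, r))

    # Assumed back-off stages, tried in order; each maps a triphone to a group key.
    stages = [
        lambda l, c, r: f"bi:{l}-{c}",              # common (left) biphone
        lambda l, c, r: f"cls:{phone_class.get(l, l)}-{c}+{phone_class.get(r, r)}",  # equivalent context
    ]
    for key_of in stages:
        groups: Dict[str, List[Triphone]] = defaultdict(list)
        for tri in remaining:
            groups[key_of(*tri)].append(tri)
        for key, members in groups.items():
            if sum(frames[t] for t in members) > threshold:  # pooled frames exceed threshold
                for t in members:
                    assignment[t] = key
        remaining = [t for t in remaining if t not in assignment]

    for l, c, r in remaining:                       # last resort: same center context
        assignment[(l, c, r)] = f"mono:{c}"
    return assignment
```

With a threshold of 50 frames, for example, a triphone s-a+t seen in 120 frames keeps its own model, while s-a+d (30 frames) and s-a+k (40 frames) together exceed the threshold and share one biphone model.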