Speech Enhancement Using Machine Learning
DOI:
https://doi.org/10.48001/joeeed.2024.2111-15
Keywords:
Charlier moments, Charlier polynomials, Deep learning, Machine learning, Speech enhancement
Abstract
Incorporating machine learning, specifically deep learning, into speech enhancement algorithms is an advanced approach to restoring original speech signals from their distorted counterparts. The proposed approach uses the discrete Charlier transform (DCHT), a discrete transform based on Charlier polynomials, to extract spectra from noisy signals, together with a fully connected neural network. Because deep learning handles nonlinear mapping problems well, the system learns contextual information from speech signals and produces enhanced speech with improved quality and intelligibility. The algorithm is tested empirically through self-comparison: the DCHT parameter is varied and tuned to optimize the performance of the speech enhancement models, with evaluation conducted on the TIMIT database. Several speech measures are used for assessment, and the results show that the DCHT-based trained model is effective at enhancing speech signals under specific conditions.
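The transform step described in the abstract might be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the function names `charlier_basis`, `dcht`, and `idcht`, the frame length of 16, and the parameter value `a = 4.0` are illustrative assumptions. The basis is built from the standard three-term recurrence for Poisson-weighted Charlier polynomials and then re-orthonormalized with a QR decomposition, a step assumed here so that the basis truncated to a finite frame is exactly invertible.

```python
import numpy as np


def charlier_basis(size: int, a: float) -> np.ndarray:
    """Return a (size x size) orthonormal basis; row n is the order-n function.

    Hypothetical helper: weighted Charlier functions from the three-term
    recurrence, followed by QR re-orthonormalization (an assumption made
    here to correct for truncating the support to `size` samples).
    """
    x = np.arange(size, dtype=float)
    # log(x!) for x = 0 .. size-1, computed without overflow
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, size)))))

    phi = np.zeros((size, size))
    # Order 0: square root of the Poisson weight e^{-a} a^x / x!
    phi[0] = np.exp(0.5 * (-a + x * np.log(a) - log_fact))
    if size > 1:
        phi[1] = (a - x) / np.sqrt(a) * phi[0]
    # phi_{n+1} = (n + a - x) / sqrt(a (n+1)) * phi_n - sqrt(n / (n+1)) * phi_{n-1}
    for n in range(1, size - 1):
        phi[n + 1] = ((n + a - x) / np.sqrt(a * (n + 1))) * phi[n] \
            - np.sqrt(n / (n + 1)) * phi[n - 1]

    # Re-orthonormalize the columns (one column per order) and fix the signs
    # so that each function keeps the orientation given by the recurrence.
    q, r = np.linalg.qr(phi.T)
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return (q * signs).T


def dcht(frame: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Forward transform: project a signal frame onto the basis."""
    return basis @ frame


def idcht(coeffs: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Inverse transform: rebuild the frame from its coefficients."""
    return basis.T @ coeffs


# Example on a synthetic frame (assumed data, not taken from TIMIT)
rng = np.random.default_rng(0)
basis = charlier_basis(16, a=4.0)
frame = rng.standard_normal(16)
coeffs = dcht(frame, basis)
recon = idcht(coeffs, basis)
```

In a pipeline of the kind the abstract describes, coefficients such as `coeffs` computed from noisy frames would presumably serve as the network input, and `a` would play the role of the DCHT parameter that is varied during tuning.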