Real-Time Subtitle Generator for Sinhala Speech

Authors

  • R.V.P.S. Akesh
  • R.G.N. Meegama

DOI:

https://doi.org/10.31357/vjs.v26i02.6806

Abstract

Speech recognition technology plays a central role in human-computer interaction and underpins a wide range of applications. This paper presents the development of a real-time subtitle generator for Sinhala speech. The application is implemented with CMUSphinx, an open-source speech recognition toolkit based on Hidden Markov Models (HMMs), and uses Mel-frequency cepstral coefficients (MFCCs) as the features extracted from the input recordings in WAV format. The paper motivates the need for a real-time subtitle generator for Sinhala speech, reviews the existing literature in the field, sets out the objectives of the research, and discusses the outcomes achieved. By fine-tuning hyperparameters to improve recognition accuracy, the system attains a training accuracy of 88.28% and a Word Error Rate (WER) of 11.72%. The significance of this research lies in its methodological contributions, its measured performance, and its potential to support seamless interaction and applications in the Sinhala speech domain.
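The abstract names MFCCs as the features extracted from the WAV recordings. CMUSphinx computes these inside its own front end, so the following is not the authors' code. It is a minimal pure-Python sketch of the textbook MFCC pipeline (pre-emphasis, overlapping frames, Hamming window, power spectrum, mel filterbank, log, DCT-II), assuming 16-bit mono PCM WAV input. All function names and default values (25 ms frames with a 10 ms hop at 16 kHz, a 512-point transform, 40 filters between 133.33 Hz and 6855.5 Hz, 13 coefficients, pre-emphasis 0.97) are illustrative assumptions, not settings reported in the paper. A naive DFT keeps the sketch dependency-free; a practical system would use an FFT.

```python
import cmath
import math
import struct
import wave


def read_wav_mono16(path):
    """Read 16-bit PCM mono WAV audio (path or file object); returns (samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM WAV")
        raw = wf.readframes(wf.getnframes())
        rate = wf.getframerate()
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw)), rate


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_low, f_high):
    """Triangular filters whose edges are spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        bank.append([
            (f - left) / (centre - left) if left < f <= centre
            else (right - f) / (right - centre) if centre < f < right
            else 0.0
            for f in bin_freqs
        ])
    return bank


def mfcc_frame(frame, bank, n_fft, n_ceps):
    """MFCCs for one frame: Hamming window -> power spectrum -> log mel energies -> DCT-II."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Naive DFT of the zero-padded frame; the zero padding adds nothing,
    # so the sum only runs over the n real samples.
    power = []
    for k in range(n_fft // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n_fft) for t, x in enumerate(windowed))
        power.append(abs(s) ** 2 / n_fft)
    # Log filterbank energies, floored to avoid log(0).
    log_e = [math.log(max(sum(w * p for w, p in zip(weights, power)), 1e-10))
             for weights in bank]
    n_filt = len(log_e)
    # Unnormalised DCT-II; keep the first n_ceps coefficients.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filt) for m, e in enumerate(log_e))
            for c in range(n_ceps)]


def mfcc(samples, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=40, n_ceps=13, f_low=133.33, f_high=6855.5, pre_emphasis=0.97):
    """Pre-emphasise, split into overlapping frames, and return n_ceps MFCCs per frame."""
    # Defaults are illustrative assumptions for 16 kHz audio, not values from the paper.
    if frame_len > n_fft or f_high > sample_rate / 2:
        raise ValueError("frame_len must fit in n_fft and f_high must not exceed Nyquist")
    if len(samples) < frame_len:
        return []
    emph = [float(samples[0])] + [samples[i] - pre_emphasis * samples[i - 1]
                                  for i in range(1, len(samples))]
    bank = mel_filterbank(n_filters, n_fft, sample_rate, f_low, f_high)
    n_frames = 1 + (len(emph) - frame_len) // hop
    return [mfcc_frame(emph[i * hop:i * hop + frame_len], bank, n_fft, n_ceps)
            for i in range(n_frames)]
```

Usage, with a hypothetical file name: `features = mfcc(*read_wav_mono16("utterance.wav"))` yields one list of 13 coefficients per 10 ms step. Computing the filterbank once and reusing it for every frame mirrors how a streaming, real-time front end would avoid redundant work.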
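The abstract reports recognition quality as a Word Error Rate. WER is conventionally computed from a word-level Levenshtein alignment between the reference transcript and the recogniser output, WER = (S + D + I) / N, where S, D and I count substituted, deleted and inserted words and N is the number of reference words. The sketch below implements that definition; the function name is hypothetical and this is not the scoring script used in the paper. The reported 88.28% accuracy and 11.72% WER sum to 100%, which is consistent with accuracy being computed as 1 - WER, although the abstract does not state the formula.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev is the previous DP row: prev[j] = minimum edits turning ref[:i-1] into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of ref[i-1]
                          curr[j - 1] + 1,      # insertion of hyp[j-1]
                          prev[j - 1] + cost)   # substitution, or match when cost == 0
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair that differ only in the last word.
    wer = word_error_rate("මම ගෙදර යනවා", "මම ගෙදර එනවා")
    print(wer)      # 0.3333333333333333 (one substitution in three reference words)
    print(1 - wer)  # word accuracy under the 1 - WER convention
```

Splitting on whitespace works for Sinhala script because words are space-delimited; any text normalisation (punctuation, zero-width joiners) would need to match between reference and hypothesis before scoring.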

Keywords: Speech recognition, Real-time, Subtitle, CMUSphinx, Open source, Hidden Markov Model, Mel-frequency cepstral coefficients, WAV, Accuracy, Word Error Rate

Author Biographies

R.V.P.S. Akesh

Department of Computer Science, University of Sri Jayewardenepura, Gangodawila, Nugegoda, Sri Lanka

R.G.N. Meegama

Department of Computer Science, University of Sri Jayewardenepura, Gangodawila, Nugegoda, Sri Lanka

Published

2023-12-31