Deep Integration of Virtual Reality Technology: Innovative Path and Practical Exploration of the Reform of University English Speaking Education
Qing Hua Yang
School of Foreign Languages, Hubei University of Science and Technology, Xianning 437100, China
Abstract:
To improve the real-time performance and accuracy of voice interaction, this paper constructs a human-computer voice interaction system for a virtual reality environment, integrating ASR (Automatic Speech Recognition), NLP (Natural Language Processing), and TTS (Text-to-Speech) core modules to form a complete closed loop of speech processing. At the perception layer, the system uses a ReSpeaker Mic Array v2.0 to collect speech signals, with a signal-to-noise ratio of 65 dB and a sampling rate of 16 kHz. At the data processing layer, the speech recognition module is based on DeepSpeech 3.0, achieving a recognition accuracy of 93.8% and an average response latency below 250 ms; semantic modeling uses the BERT-large model, with a semantic parsing accuracy of 92.5%; speech synthesis uses Tacotron 2, with an output delay below 80 ms and a timbre stability of 98.7%. The system supports multimodal input and the low-latency WebRTC transmission protocol, with audio latency optimized to 147 ms over a 5G network. Empirical tests with 200 participants show that speech fluency (words per minute, WPM) improved by 45.9% on average, the word error rate (WER) decreased by 49.2%, intonation (F0) stability improved by 6.9%, and semantic scores improved by 19.4%. The system performs well in end-to-end speech understanding and feedback response, and can provide technical support for real-time speech interaction and the design of multimodal communication systems.
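The closed loop described above (microphone capture → ASR → semantic parsing → TTS) can be sketched as a simple staged pipeline. The sketch below is illustrative only: the function bodies are placeholders standing in for DeepSpeech 3.0, BERT-large, and Tacotron 2, and the per-stage latency budgets are assumptions drawn from the figures reported in the abstract, not an implementation of the authors' system.

```python
from dataclasses import dataclass

# Assumed per-stage latency budgets (ms), taken from the abstract's figures;
# the NLP budget is a hypothetical split of the end-to-end loop.
ASR_BUDGET_MS = 250  # DeepSpeech 3.0-style recognition latency ceiling
TTS_BUDGET_MS = 80   # Tacotron 2-style synthesis output-delay ceiling

@dataclass
class TurnResult:
    """Output of one pass through the speech-processing closed loop."""
    transcript: str   # text produced by the ASR stage
    intent: str       # semantic parse produced by the NLP stage
    reply_ms: int     # duration of the synthesized reply (placeholder)

def recognize(pcm_frames: bytes) -> str:
    """Stand-in for the ASR stage (DeepSpeech 3.0 in the paper)."""
    return "hello world"  # placeholder transcript

def parse(transcript: str) -> str:
    """Stand-in for the NLP stage (BERT-large in the paper)."""
    return f"greet({transcript.split()[0]})"  # toy intent representation

def synthesize(intent: str) -> int:
    """Stand-in for the TTS stage (Tacotron 2 in the paper)."""
    return TTS_BUDGET_MS  # pretend the reply is exactly one budget long

def run_turn(pcm_frames: bytes) -> TurnResult:
    """One ASR -> NLP -> TTS turn of the closed loop."""
    transcript = recognize(pcm_frames)
    intent = parse(transcript)
    reply_ms = synthesize(intent)
    return TurnResult(transcript, intent, reply_ms)

# One 10 ms frame of silence at 16 kHz, 16-bit mono (320 bytes), matching
# the sampling rate stated for the perception layer.
result = run_turn(b"\x00" * 320)
```

In a real deployment, `recognize` would stream frames to the recognizer incrementally and the stages would overlap, rather than run strictly in sequence as shown here.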