Development of an Automated System for Integrating Biotechnological Knowledge into English Translation Corpora Using Data Mining Algorithms


Yanhua Ma, School of English Language, Zhejiang Yuexiu University, Shaoxing, Zhejiang, China, 312000



Machine translation (MT), a core subfield of natural language processing, has advanced substantially with the adoption of end-to-end neural machine translation (NMT) models. These models are now the standard approach in contemporary translation systems and are well suited to translating complex biotechnological texts. This paper reviews NMT with an emphasis on architectures, translation strategies, and data augmentation techniques tailored to biotechnological contexts, and compiles a toolkit of resources for researchers applying these methods in biotechnology. Integrating diverse and overlapping sources of prior knowledge remains a significant challenge in NMT, and it is especially relevant to the precise translation of biotechnological information. We propose back regularization as a framework for incorporating prior knowledge into NMT: the prior knowledge is represented as features in a log-linear model, which then guides the training of the neural translation model. Experiments on Chinese-to-English translation of biotechnological texts show substantial improvements, suggesting that the approach can improve the accuracy and reliability of translation in the biotechnology sector.
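The abstract does not spell out the training objective, so the following is only an illustrative sketch of how prior-knowledge features in a log-linear model could regularize a translation model. It assumes that "back regularization" works along the lines of what is usually called posterior regularization: a log-linear distribution over a small set of candidate translations is compared with the model's own distribution through a KL-divergence penalty. The feature definitions, weights, and probabilities are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math


def log_linear_distribution(feature_vectors, weights):
    """Return Q(y) proportional to exp(weights . features(y)) over a candidate set."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in feature_vectors]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(q, p):
    """KL(q || p) for two distributions over the same candidates (p must be positive)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


# Hypothetical prior-knowledge features for three candidate translations:
# [uses the term found in a bilingual biotechnology glossary, plausible length ratio]
features = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
weights = [1.5, 0.5]            # assumed feature weights
model_probs = [0.5, 0.3, 0.2]   # assumed NMT probabilities over the same candidates

q = log_linear_distribution(features, weights)
penalty = kl_divergence(q, model_probs)

# In training, a term like this would be added to the likelihood loss with an
# assumed trade-off coefficient, e.g. loss = nll + 0.1 * penalty.
print(q, penalty)
```

The penalty is zero only when the model's distribution over the candidates matches the one implied by the prior-knowledge features, which is one way such features could steer learning without hard constraints.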

