Human-Computer Interactive Machine Translation Technology


2020-05-15 10:47

1. Technical background

(1) Machine translation research history

Research on machine translation began in the 1950s. Early work was mainly rule-based, and progress was relatively slow. Later, the U.S. Automatic Language Processing Advisory Committee (ALPAC) issued a report questioning the feasibility of machine translation, which set back research in the field. In the 1990s, IBM proposed its well-known word-based translation models and opened the era of statistical machine translation. Phrase-based and syntax-based models followed, and translation quality improved markedly. In the past two years, neural machine translation has begun to emerge. It removes many of the limitations of statistical machine translation and has become the current research focus.

(2) Statistical machine translation

The basic idea of statistical machine translation is to use machine learning to automatically acquire translation rules and their probability parameters from large-scale bilingual parallel corpora, and then to decode source-language sentences with those rules. For a given source-language sentence, statistical machine translation assumes that its translation could be any target-language sentence; different target-language sentences simply have different probabilities. The task of statistical machine translation is to find, among all target-language sentences, the translation with the highest probability.

[...] approach for training, and can also optimize all the parameters in the model. Traditional machine translation is built around discrete symbolic conversion rules and requires a series of steps such as word alignment, rule extraction, probability estimation and parameter tuning, which makes it prone to error propagation. Neural machine translation instead uses continuous vector representations to model the translation process.
Thus, it can fundamentally overcome the poor generalization and the overly strong independence assumptions of traditional machine translation.

2. Post-editing / interactive machine translation

(1) Post-editing

[...] interaction. SDL Trados and other computer-aided translation tools usually support APIs such as Google Translate, so that machine translation output can be obtained directly. Post-editing is therefore the most common form of assistance. When the quality of the machine translation output is high, relatively little manual modification is needed, which effectively improves the translator's productivity. In practice, however, post-editing faces many challenges, and is sometimes merely better than nothing. The main reason is that the output quality of current machine translation systems is still far below what users expect in real translation scenarios. When the output is poor, translators save a few keystrokes at the cost of analyzing and repairing error-riddled sentences, which often takes far more effort than translating directly. Rigid term translations and fluent-looking but wrong output dampen translators' enthusiasm for machine translation. Correcting the same errors again and again is tedious, and output that remains unsatisfactory after repeated revision leaves users frustrated.

In the past two years, neural machine translation has developed rapidly and translation quality has improved significantly. At the same time, it has brought new challenges, such as output that is fluent but unfaithful ("顺而不信") and translation results that are hard to intervene in.
Therefore, neural machine translation will still need considerable time before it can significantly improve the human-computer interaction experience of post-editing in practice.

(2) Interactive machine translation

In interactive machine translation, the system dynamically generates candidates for the rest of the translation, based on the partial translation the user has already produced, and offers them for reference. The translator translates from scratch, so there is no automatic translation to revise; the translator only selects the acceptable parts during translation. The technique aims, through interaction between the translator and the machine translation engine, to combine the accuracy of human translators with the efficiency of machine translation engines.

Compared with post-editing, interactive machine translation places higher demands on the implementation: left-to-right forced decoding and smooth real-time response. In addition, because the translator must repeatedly read and understand the latest suggested translation, this mode places an extra burden on the user. For these reasons, currently popular online translation systems and computer-aided translation tools do not support an interactive machine translation mode, and existing interactive machine translation systems are still at the prototype stage. Encouragingly, recent developments in machine translation, especially progress in interactive machine translation based on neural machine translation, suggest that interactive machine translation may become one of the options for human translation in the future.

[...] To achieve this goal, we propose a Chinese input method that incorporates statistical machine translation technology. In a human translation scenario, each time the user presses a key, the input method [...] the statistical translation rules, translation hypotheses and N [...] interaction experience. In addition, to guide the machine translation system toward producing translations better suited to the input method, we propose an automatic evaluation metric for machine translation output that is tailored to input methods. This lets the input method use more suitable statistical translation results and further improves the efficiency of human translation.

4. Terminology translation method

(1) Mining term translations from bilingual parenthetical sentences

From the standpoint of improving the quality of the final machine translation output, we believe the quality of term translation knowledge matters more than its scale. We therefore turn to the bilingual parenthetical sentences that occur in large numbers on monolingual web pages. A bilingual parenthetical sentence must satisfy three conditions at once: it contains one or more pairs of parentheses; a term appears immediately to the left of the parentheses; and the translation of that term appears inside the parentheses. Such sentences carry rich term translation knowledge, such as the context of the target-language term. Compared with parallel or comparable corpora, they are less restricted, are updated more promptly, and make term translation knowledge relatively easy to extract. We therefore consider bilingual parenthetical sentences an ideal corpus for mining term translation knowledge. The main task in mining is to determine the left boundary of the target-language term, because its right boundary is already given by the parenthesis and the boundaries of the source-language term are known.

[...] methods have largely left the stage of history. Although statistical methods are not restricted by domain, their recognition of multi-word terms is not ideal, so the extracted terms contain considerable noise. If the term recognition results are used directly as constraints on word alignment, term recognition errors are passed on to later stages, and the quality of the final translation is hard to improve. How to improve term recognition and word alignment performance, and thereby the quality of the final machine translation output, is therefore a pressing problem.

To minimize the effect of error propagation in the training pipeline and to improve the extraction of term translation knowledge, we propose a joint word alignment method that incorporates bilingual term recognition. First, to reduce the dependence on training data, the joint method starts from weak monolingual term recognition classifiers, which are trained on naturally annotated data such as Wikipedia.
Second, to reduce the negative impact of error propagation between term recognition and word alignment, the method exploits the mutual constraints between bilingual terms and word alignment. It performs monolingual term recognition, bilingual term alignment and word alignment jointly, and finally obtains better bilingual term recognition and word alignment results.

(3) Statistical translation term decoding method incorporating term recognition boundary information

Named entities such as person names, place names and organization names have clear boundary features and are relatively easy to recognize and align. In general, applying a direct named-entity translation method in a statistical translation decoder yields fairly good results. However, "directly translating" terms in the same way as named entities does not noticeably improve the quality of machine translation output. The main reason is that current term recognition models are not yet good enough; their accuracy is far below that of named entity recognition. In addition, because terms are highly domain-specific, training a high-performance term recognition classifier for a target domain requires a large amount of high-quality, in-domain, manually annotated training data, which makes term recognition harder still. In this situation, if the term recognition results are used directly as constraints on word alignment, term recognition errors are passed on to later stages, and the quality of the final translation is hard to improve.
How to improve term recognition and word alignment performance, and thereby the quality of the final machine translation output, is therefore a pressing problem.

To minimize the effect of error propagation in the training pipeline and to improve the extraction of term translation knowledge, we propose a joint word alignment method that incorporates bilingual term recognition. First, to reduce the dependence on training data, the joint method starts from weak monolingual term recognition classifiers, which are trained on naturally annotated data such as Wikipedia. Second, to reduce the negative impact of error propagation between term recognition and word alignment, the method exploits the mutual constraints between bilingual terms and word alignment. It performs monolingual term recognition, bilingual term alignment and word alignment jointly, and finally obtains better bilingual term recognition and word alignment results.
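
The article does not spell out how bilingual terms and word alignment constrain each other. As a rough illustration only, and not the joint model proposed here, the sketch below applies the alignment-consistency check familiar from phrase extraction: a candidate term pair is kept only if no alignment link crosses its boundary. The names `is_consistent` and `pair_terms`, the half-open span convention, and the toy data are all assumptions made for this example.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]   # half-open token span [start, end)
Link = Tuple[int, int]   # (source index, target index) alignment link


def is_consistent(src: Span, tgt: Span, links: Set[Link]) -> bool:
    """True if no link crosses the boundary of the (src, tgt) span pair
    and at least one link lies inside it."""
    inside = 0
    for i, j in links:
        in_src = src[0] <= i < src[1]
        in_tgt = tgt[0] <= j < tgt[1]
        if in_src != in_tgt:   # link has one end inside and one end outside
            return False
        if in_src:             # both ends are inside the span pair
            inside += 1
    return inside > 0


def pair_terms(src_terms: List[Span], tgt_terms: List[Span],
               links: Set[Link]) -> List[Tuple[Span, Span]]:
    """Keep only candidate term pairs that the word alignment supports."""
    return [(s, t) for s in src_terms for t in tgt_terms
            if is_consistent(s, t, links)]


# Toy data (invented): source "机器 翻译 技术", target "machine translation technology"
links = {(0, 0), (1, 1), (2, 2)}
src_terms = [(0, 2), (1, 3)]   # noisy candidates from a weak source-side recognizer
tgt_terms = [(0, 2)]           # candidate from a weak target-side recognizer
pairs = pair_terms(src_terms, tgt_terms, links)
print(pairs)                   # [((0, 2), (0, 2))]
```

In this toy case the alignment filters out the noisy source candidate (1, 3). A joint method would also let confident term pairs revise the alignment links, which this one-directional check does not attempt.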
