
Overview of Translation Tools - Benefits of Translation Memory Management Software for an International Company

Diploma Thesis, 2007, 88 pages

Computer Science - Software


Table of Contents

1 Purpose and Course of Action

2 Analysis of the Translation Market
2.1 Impact of the Internet
2.2 Increase in Demand for Translations
2.3 Translation Market
2.3.1 Definition of Language Translation
2.3.2 Development of the Translation Market The Market of Machine Translation Human Translation

3 Translation and Translation Management Tools
3.1 Machine Translation
3.1.1 Brief History of the Machine Translation
3.1.2 Basic Features and Terminology Direct Translation Systems Rule-based Translation Interlingual Systems Transfer Systems Corpus-based Methods Statistical Machine Translation Example-based Machine Translation Hybrid Approaches Controlled Language, Domain-specific and User-specific Systems
3.1.3 Evaluation of Machine Translation IR-style Techniques BLEU NI F-measure METEOR String Matching Techniques
3.1.4 Reasons to use Machine Translation Open Source and Commercial Software
3.1.5 Return on Investment
3.1.6 Summary
3.2 Translation Memory
3.2.1 The Concept of Translation Memory
3.2.2 Translation Process, and Effects of TM on Translation Process Internal Attributes Terminology Databases Analysis
3.2.3 Common Standards and Products
3.2.4 Adequate Texts for TM Usage Consideration of the documents Updates. Revisions.
3.2.5 Advantages and Drawbacks of TM
3.2.6 Overview of currently available TM Products Classical TM Tools TM/MT Hybrids Localization Software with TM
3.2.7 Cost-effectiveness of TM
3.2.8 Summary
3.3 Globalization and Localization Software
3.3.1 Introduction to Globalization and Localization
3.3.2 Differentiation of Terminology
3.3.3 Organizations for the Globalization, Internationalization, and Localization LI W3C I
3.3.4 Benefits of Internationalized Software Application
3.3.5 Conclusion

4 Proposal for XY Company for Use of Translation Memory Tool
4.1 Company Profile
4.2 Operating Manuals (OPM)
4.3 Current Translation Process of Operating Manuals
4.3.1 Disadvantages of a Conventional Translation Process
4.4 How Translation Memory Tool Can Benefit the Company
4.4.1 How Documentation from the Company is Suitable for TM
4.4.2 Benefits from TM
4.5 Proposal to Purchase Transit TM from STAR AG
4.5.1 STAR AG Brief Company Description
4.5.2 STAR AG Transit® Translation Memory
4.5.3 Key Benefits for XY of Transit TM
4.5.4 System Requirements
4.5.5 Installation of Transit TM
4.5.6 Additional Software Required to Work with Transit TM
4.5.7 Cost Transit TM
4.6 Conclusion


List of Electronic References



List of Figures

Figure 1: Building Blocks of a Direct MT System

Figure 2: Building Blocks of an Interlingual MT System

Figure 3: Building Blocks of a Transfer-based MT System

Figure 4: Direct and Indirect MT

Figure 5: Verification of Translation Equivalents for Authentication

Figure 6: EBMT Architecture

Figure 7: Screenshot from Linguaphile Translator

Figure 8: Linguatec PT 2006

Figure 9: Babel Fish Translation

Figure 10: Systran MT Engine Systran Premium 4

Figure 11: Basic Translation Memory Steps

Figure 12: Basic Translation Process

Figure 13: Translation Process with TM

Figure 14: Lexicological Entry

Figure 15: Terminological Entry

Figure 16: Mailbox Representation in two Different Countries

Figure 17: Translation Process with Transit

List of Tables

Table 1: Number of People Online in Each Language Zone (Native Speakers)

Table 2: Internet Users by Language

Table 3: Overview of MT Approaches

Table 4: Effectiveness of TM

Table 5: Worldwide Translation/Globalization Support Software Revenue by Region, 2005 and 2009 in Million US Dollars ($M)

Table 6: Translation Memory Tools

Table 7: Competitive Analysis of CAT Tools Vendors

List of Abbreviations

illustration not visible in this excerpt

1 Purpose and Course of Action

“TRANSLATION is very much like copying paintings.” These are the words of Boris Pasternak (1890-1960), a Russian poet, novelist, and translator. The key to translation is not only fluency in more than one language and an understanding of language and culture, but also the ability to convey the meaning of a text from one language into another. However, knowledge of different languages alone is not enough to succeed in the competitive world of translation. Through the immense development of the Internet, information technology, and ongoing globalization, “translation complexity takes a quantum leap”[1] and forces today’s translators to use computer technology, advanced software applications, and computer-aided translation tools to meet enhanced translation requirements in a timely manner. To be a successful translator given today’s fast turnaround times, knowledge of and skills in different translation-facilitating programs are essential. Translation is a difficult process that requires computer and software skills in addition to language proficiency. The translation business is quickly becoming one of the fastest-growing markets in the world, and as a result, translators must make their processes more efficient to meet the increasing demand and to offer competitively priced services.

The purpose of this thesis is to analyze the major translation tools available in the marketplace and to illustrate how they benefit the translator in multiple ways. This thesis will combine the essential knowledge of these different software tools and provide the important criteria required to choose the appropriate foreign language translation software.

Organization of the Thesis

Chapter 1, Purpose and Course of Action, describes the purpose of this thesis, and provides an overview.

Chapter 2, Analysis of the Translation Market, explains the meaning of translation and outlines the current state of the translation market. This chapter demonstrates how strongly the Internet has influenced the demand for instantaneous translation services and what bearing it has on the translation market.

Chapter 3, Translation and Translation Management Tools, emphasizes the importance of competency in the software applications and explains how to determine whether a given tool is appropriate for a particular translator or company. This chapter introduces the main translation-facilitating tools. It is divided into three main sections: Machine Translation, Translation Memory Management Tools, and Localization Software. The first two sections define each tool and describe its purpose, advantages and disadvantages, costs, and return on investment. Furthermore, these two sections describe the evaluation process for each type of tool and give a brief overview of the most widely used tools on the market. The third section briefly describes localization software. The most important part is the section on Translation Memory, because it is the subject of the proposal for an international company to purchase this tool.

Chapter 4, Proposal for XY Company for Use of Translation Memory Tool, represents the main purpose of this work: a proposal for an international organization to purchase a translation memory tool. The proposal must persuade the company that a translation memory tool will increase the efficiency of the translation process, bring better consistency to the translated documentation, and thereby lower the cost of producing operating manuals in different languages. This chapter also introduces the translation memory tool appropriate for this particular company.

Moreover, this thesis highlights the importance of proper research and evaluation of translation tools before purchasing and implementing the software in the company, as with any other sound investment.

2 Analysis of the Translation Market

Progress in the integration of advanced technology and communication is bringing large-scale changes to the economy. Companies use their computer networks, applications, and the Internet to identify potential customers and markets around the world, evaluate services, and compare prices beyond geographical and economic borders. Today’s business world is unthinkable without the Internet. The Internet and World Wide Web transcend national and geographical barriers and empower companies and individuals to share knowledge and draw on resources across geographical and linguistic boundaries. They steer the economy towards multinational participation and international transactions.

2.1 Impact of the Internet

The global Internet infrastructure is developing very rapidly, and approximately 1.1 billion users will have access to the Web by 2007[2]. With the expansion of the Internet, the online population in non-English-speaking regions of the world is growing at increasing rates. For various reasons, such as competitive advantages and increased market requirements, companies are now forced to translate into multiple languages.

The statistics in Table 1 clearly show that the English-only world has given ground to a multilingual world.

Business partners now need to communicate even more in different languages and to deal with different cultures. Effective and precise communication clearly lies at the core of any successful business. This is exceptionally difficult if the parties involved do not speak the same language. For this reason, translations tip the scale in today’s global electronic business and show a strong upward trend.

illustration not visible in this excerpt

Table 1: Number of People Online in Each Language Zone (Native Speakers)[3]

2.2 Increase in Demand for Translations

With the development of information technology, the volume of information has become vast. Companies are now putting structured processes in place to deal with this load of information. Translators must keep up not only with the basics and nuances of languages and cultures, but also with business and technical terminology. To be successful and to offer competitively priced services, a translator must work quickly and accurately. To help achieve these goals, various companies offer translation and translation management tools. A wide range of software is available to help translators increase their productivity and handle multilingual information, from document and content management systems, machine translation, and translation memory management tools to globalization and localization software. According to Forrester Research, an independent technology and market research company, the volume of multilingual web sites grew by a factor of ten during 2003, and human web content translation was growing at 50% a year. According to a report by Allied Business Intelligence, the language translation market was projected to reach US$22.7 billion by the end of 2005. Table 2 shows Internet growth by language as of March 2006. It is clear that the increased volume of information stimulates the growing demand for translations, and that the World Wide Web is becoming truly global.

illustration not visible in this excerpt

Table 2: Internet Users by Language[4]

2.3 Translation Market

Language translation is not a new phenomenon. Ever since people began to talk, they have searched for ways to express themselves and communicate their ideas, thoughts, and opinions to others. Throughout history, the work of translators has been of extraordinary importance in the development and transmission of the cultural heritage of humankind. Companies are able to do business with companies in other countries thanks to the efforts of translators. The translation of natural languages has a long history. Today, it is experiencing a period of great upheaval.

2.3.1 Definition of Language Translation

The word "translation" is, etymologically[5], a "carrying across" or "bringing across": the Latin translatio derives from transferre (trans, "across" + ferre, "to carry" or "to bring"). Language translation, at its most basic, “is an activity comprising the interpretation of the meaning of a text in one language and production in another language.”[6] Translation takes two main forms: human translation and machine translation. Human translation is a direct translation of text done by a person who is fluent in both the source and target languages. Even when the translator is well versed in both languages, it is a very time-consuming and in many cases costly process. Efforts to computerize the translation process can be divided into machine translation and computer-assisted translation. Machine translation is based on a set of codes and protocols and involves fully automated, computerized translation of natural-language text without any human intervention. In contrast, computer-assisted or computer-aided translation requires that a human translate the text using computer software designed to support and facilitate the translation process[7].

2.3.2 Development of the Translation Market

The market for translation technologies on the one side and human translation services on the other is “exploding.” The translation industry is one of the most fragmented service sectors in the world. There are 100,000 freelance translators in Europe and over 200,000 in the world. There are an estimated 1,500 translation companies in Europe (EUATC base = 400 companies) and 3,000 worldwide[8]. Their average turnover is in the region of 300,000 Euros per company, i.e. a worldwide turnover of close to 1,000 million Euros, plus a further 1 billion Euros for the 15 biggest translation companies in the world[9].

The Market of Machine Translation

MT gateways on the Lotus Domino server, AltaVista’s Babel Fish Translation server by Systran, Google Language Tools, and many other free translation tools on the Internet are examples of a booming translation industry. Since the idea of using computers to translate natural language was first proposed in the 1940s, translation technology has improved considerably. Still, even today software can only translate individual words or rather simple sentences correctly. The results of pure machine translation contain many errors and are often more amusing than useful. A pure machine translation can usually be used or understood only after correction by a professional translator. The reason is simple: software can only produce perfect output when it receives perfect input, and humans almost never communicate with perfect usage of the rules of grammar and terminology upon which software relies to translate one language into another[10]. Language translation is a difficult process. Even in one’s own native language, things can be misunderstood very quickly. This happens even more readily in a foreign language, because languages are influenced by the culture, history, and mentality of a particular country. Due to differences in dialects, a single word can have several distinct meanings, for example the word “tostón.” In Salamanca, Spain, this word refers to a piglet roasted in a special way, a typical local dish. The same word means toasted bread in Andalusia and a fried plantain in Puerto Rico. An example from English-to-German translation reinforces the difficulty, and sometimes even the impossibility, for computer software of translating a certain word in a particular sentence. The English word “safe” appears in the two sentences “I have a safe home” and “I keep the money in a safe” with two completely different meanings. In another example, the German word “Messer” can be translated into English as “knife” or as “gauge.” These examples show the limits of “pure” machine translation.

Nevertheless, machine translation can be useful for catching the gist of a text or getting a general idea of the overall context. On its own it can be used only for easy sentences; otherwise it needs correction by a translator. Even so, machine translation can be very helpful in many cases. For instance, machine translation in the form of handheld electronic dictionaries and translators has recently emerged as a modern language and communication solution in many different areas such as traveling, studying languages, and business. With the expansion of global telecommunications, many MT software vendors now offer talking and non-talking electronic dictionaries and interpreters, as well as electronic dictionaries for PDAs and mobile phones.

MT services offering online, real-time translation of electronic mail messages and chat forums have grown considerably. In this area, it is obvious that only fully automatic systems could possibly operate in real time with user acceptance. Another major use of real-time machine translation is the immediate production of television captions or subtitles in multiple languages. This service would be impossible without real-time, fully automatic translation[11].

Human Translation

The profession of translator dates back many centuries. However, translators today have a completely new set of skills and tasks and face completely different challenges than the translators of earlier days. Translators today need proficiency not only in the languages themselves, but also in the new technologies designed to manage and facilitate the translation process. Today, it is difficult to keep up with the ongoing development of the translation market. A technical background and understanding are now a requirement for the professional translator.

MT is a system that belongs to artificial intelligence. “Artificial intelligence is a branch of computer sciences that deals with using computers to simulate human thinking.” However, a machine will never be able to operate like the human brain. In recent years, MT systems designed to translate from one language into another have ranged from a basic word finder for the tourist to complex programs for translating technical and scientific texts. Some software costs as little as 250 US Dollars, while other systems run as high as 250,000 US Dollars; they are used by freelancers, companies, and governments. These MT systems have shown that MT cannot replace human translation.[12]

The following chapters will address and discuss the issue of translation tools in detail.

3 Translation and Translation Management Tools

The three main technologies in the translation industry are Machine Translation (MT), Translation Memory (TM), and Globalization and Localization tools. Machine translation is a subfield of computational linguistics. It is a form of translation without human intervention at any stage. MT output generally cannot be used without extensive human post-editing. Translation memory, in contrast, is a type of database that is used in software programs designed to aid human translators[13]. Another important group of translation technologies is the globalization and localization tools. These tools help to adapt products such as publications or software for non-native environments, especially other nations and cultures.

Chapter 3.1 addresses machine translation. It gives a brief history of the development of MT and covers its advantages and disadvantages, its evaluation, and its return on investment.

3.1 Machine Translation

Automatic translation of human languages is a long-standing scientific dream of immense social, political, and scientific importance. However, turning the vision of computerized systems for producing translations into reality proved a much harder task than it first appeared. The first ideas about mechanizing translation can be traced back to the seventeenth century. However, realistic possibilities emerged only in the twentieth century.

Today Machine Translation is again becoming an important field of research. The need for translations of technical and commercial documentation is growing well beyond the capacity of the translation profession.

Machine Translation is the oldest application of natural language processing. During its fifty years of history, several major approaches have been developed.

3.1.1 Brief History of the Machine Translation

This chapter gives a brief history and mentions the most significant developments of machine translation. This chapter attaches great importance to the methods established at the beginning of the 1990s.

The impression that MT is something relatively new is not correct. MT has a long history. It started almost before electronic digital computers existed.

In 1933, the French-Armenian Georges Artsrouni and the Russian Petr Smirnov-Troyanskii applied for patents for “translating machines.” Of the two, Troyanskii’s was the more significant. He proposed not only a method for an automatic bilingual dictionary, but also a scheme for coding interlingual grammatical roles, based on Esperanto[14], and an outline of how the analysis should work. He divided mechanical translation into three stages. In the first stage, an editor who knows only the source language (SL) analyzes the words into their base forms and syntactic functions. In the second stage, the machine converts these sequences of base forms and functions into equivalent sequences in the target language (TL). Finally, another editor who knows only the target language converts the output into the normal forms of that language. However, Troyanskii’s idea remained unknown until the end of the 1950s[15].

Soon after Konrad Zuse, a German engineer, completed the first general-purpose programmable calculator in 1941 and the first “electronic calculators” came onto the market, research began into using computers as aids for translating natural languages. Within a few years, research on MT had begun at many US universities. In 1954, a collaboration between IBM and Georgetown University presented the first public demonstration of the feasibility of MT.

The first decade of research is often called “a decade of high expectations and optimism.” Many universities and companies began extensive research, hoping to be able to translate natural languages without human intervention. However, disillusionment grew as researchers encountered “semantic barriers” for which there were no straightforward solutions, and real progress was much slower than expected. The quality of the output was disappointing.

The ALPAC report in 1966 found that ten years of research had failed to fulfill the expectations. The funding for MT was dramatically reduced. This caused a virtual end of MT research in the United States for over a decade. However, research continued in Canada, France, and Germany[16].

In 1980, as computing power increased and became less expensive, research began afresh. At the end of the 1980s, MT entered a period of innovation in methodology, which changed the framework of research. The most significant commercially available system during the 1980s was METAL, a German-English system. One of the best-known projects during this time was the Eurotra project of the European Communities (EC). This project worked on the construction of an advanced multilingual transfer system for translating the languages of the EC. It was a linguistics-based modular transfer system, which produced good, but not perfect, output. The system assumed batch processing and human post-editing. Another successful operational system had appeared in Canada in 1976: the Meteo system, developed at Montreal University, was used for translating weather reports. In any operational machine translation system, a variety of texts will be encountered even if its domain of usefulness is restricted to a specific field. This diversity of texts poses a problem for handling word sense ambiguity and customized translation.[17]

Until the end of the 1980s, MT research was based essentially on linguistic rules, such as rules for syntactic analysis, lexical rules, and rules for lexical transfer and morphology[18]. Accordingly, during the 1980s the dominant framework of MT was rule-based, e.g. the linguistics-based approaches of Ariane, METAL, and Eurotra. From the early 1990s, research diversified in many directions. The main directions of the 1990s are based on large text corpora: the alignment of bilingual texts, the use of statistical methods, and the use of parallel corpora for example-based translation.[19]

In summary, the development of machine translation was dominated by three generations. In the 1960s, direct MT systems came into existence. These systems were typical of the first generation of MT. Among the more famous first-generation direct MT systems are the Georgetown system and SYSTRAN. The major problem of the first generation of MT was the lack of linguistic information about the source text. The effort to find ways to capture this information gave rise to the development of indirect MT systems. The indirect approach of interlingual and transfer-based systems, such as Eurotra and Ariane, characterizes the second generation of MT, which began in the mid-1980s. This approach was based essentially on linguistic rules, the so-called rule-based approach. Rule-based systems, e.g. Ariane, METAL, SUSY, and Eurotra, dominated the first half of the second generation of MT. In the later 1980s, interlingual systems appeared. Some were still linguistics-oriented, e.g. DLT and Rosetta, and others adopted knowledge-based approaches. Nevertheless, these knowledge-based systems continued to be essentially rule-based until almost the end of the decade, when the rule-based framework was broken by new corpus-based methods. The third generation began in the 1990s, when hybrid systems emerged.

Today, MT systems provide “real-time MT,” or translation “on the fly.” Within seconds of receiving the text, the computer begins providing the translation[20].

3.1.2 Basic Features and Terminology

The design of translation systems can be divided into two main forms:

- bilingual systems: designed for two particular languages, and
- multilingual systems: designed for more than a single pair of languages.

Further, bilingual systems can be:

- uni-directional: designed to operate in only one direction (e.g. from Russian to German), or
- bi-directional: designed to operate in both directions (e.g. from Russian to German and vice versa).

There are two basic types of machine translation: direct and indirect. The indirect approach divides into two main frameworks: rule-based MT and corpus-based MT. Rule-based MT is further divided into the interlingual and transfer approaches, and corpus-based MT can be based on statistical methods or on corpora of translation examples, so-called example-based translation.

Direct Translation Systems

The historically oldest type of system design for MT is the “direct translation.” This MT system is designed for one particular pair of languages, for example German as the language of the original text (source language) and English as the language of the translated text (target language). Direct translation is based on large dictionaries and word-by-word translation with some simple grammatical adjustments. These systems consisted primarily of large bilingual dictionaries where entries for words of the source language gave one or more equivalents in the target language and some rules for producing the correct word order in the output.

All of the first working MT systems were based on direct translation approach. The operation of the first generation MT systems was very primitive, and therefore the quality of their output was quite poor. The reasons for this are obvious. In the late 1950s and early 1960s, the second generation of computers was utilized. There were no high-level programming languages, and most programming was done in assembly code.

The first step of translation was a morphological analysis of the source text, in which word endings and compounds were identified. The results of the morphological analysis were then fed into a large bilingual dictionary look-up program. There was no analysis of syntactic structure or of semantic relationships, which meant that lexical identification depended on morphological analysis and led directly to a bilingual dictionary look-up providing target language word equivalents. After the look-up program, some local reordering rules followed (e.g. moving some adjectives or verb particles). Words and phrases in the target text needed to be rearranged in order to give more acceptable target language output[21]. The direct MT approach is diagrammed in Figure 1.
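
The three steps just described (morphological analysis, bilingual dictionary look-up, local reordering) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the word tables, the category labels, and the single reordering rule are invented here and are not taken from any historical system.

```python
# Toy direct MT pipeline (German -> English). All lexicon entries and the
# single reordering rule are invented for illustration.

# Morphological analysis table: surface form -> (base form, word category)
MORPHOLOGY = {
    "ich": ("ich", "PRON"),
    "habe": ("haben", "AUX"),
    "das": ("das", "DET"),
    "buch": ("buch", "NOUN"),
    "gelesen": ("lesen", "PART"),
}

# Bilingual dictionary: (base form, category) -> one target-language equivalent
DICTIONARY = {
    ("ich", "PRON"): "I",
    ("haben", "AUX"): "have",
    ("das", "DET"): "the",
    ("buch", "NOUN"): "book",
    ("lesen", "PART"): "read",
}


def analyse(word):
    """Step 1: reduce a surface form to its base form and category."""
    key = word.lower()
    return MORPHOLOGY.get(key, (key, "UNK"))


def look_up(base, category):
    """Step 2: dictionary look-up; unknown words pass through untranslated."""
    return DICTIONARY.get((base, category), base)


def reorder(tagged):
    """Step 3: local reordering; move a sentence-final participle
    directly behind the auxiliary verb."""
    categories = [category for _, category in tagged]
    if "AUX" in categories and tagged[-1][1] == "PART":
        aux = categories.index("AUX")
        return tagged[:aux + 1] + [tagged[-1]] + tagged[aux + 1:-1]
    return tagged


def translate_direct(sentence):
    analysed = [analyse(word) for word in sentence.split()]
    target = [(look_up(base, cat), cat) for base, cat in analysed]
    return " ".join(word for word, _ in reorder(target))


print(translate_direct("Ich habe das Buch gelesen"))  # I have read the book
```

Because each dictionary entry here offers exactly one equivalent and no syntactic or semantic analysis takes place, a sketch like this has no way of choosing between competing senses of a word, which is the weakness of the direct approach noted in the text.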

illustration not visible in this excerpt

Figure 1: Building Blocks of a Direct MT System[22]

Rule-based Translation

The rule-based translation process requires the analysis and representation of the meaning of the SL text and the generation of an equivalent TL text. These systems are essentially syntax-oriented; typical features are batch processing with post-editing and no interactive components. There have been two major rule-based approaches:

- two-stage interlingual approach: analysis leads to language-neutral representations and from this interlingual representation starts the generation of the TL texts, and
- three-stage transfer approach: analysis into abstract SL representations, transfer into abstract TL representations, and generation or synthesis into TL texts[23].

Interlingual Systems

The interlingual systems are the second basic design strategy. The interlingual approach involves the transformation of the source language into an interlingual, language-independent representation. The target language is then generated from the Interlingua. The Israeli philosopher Yehoshua Bar-Hillel first discussed the idea extensively in 1969. His arguments still have relevance for a spectrum of methods for translating content into something abstract, not only machine translation but also the translation-to-logic methods at the heart of Artificial Intelligence[24]. A schematic view of the interlingual approach can be summarized as follows (adapted from Hutchins & Somers, 1992):

illustration not visible in this excerpt

Figure 2: Building Blocks of an Interlingual MT System[25]

There are two principal advantages of the interlingual approach. The first is its robustness, and the second is the overall economy of effort in a multilingual environment. Translation between all pairs of a set of languages requires only translation to and from the Interlingua for each member of the set. Translation from and into n languages therefore requires 2n interlingual components, in comparison to direct translation, where n(n-1) direct translation programs are necessary. The disadvantage of interlingual systems is their greatly increased complexity.
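
The economy-of-effort argument can be checked with a short calculation. The sketch below simply evaluates the two formulas from the paragraph above for a few values of n; the function names are chosen here for illustration.

```python
def direct_programs(n: int) -> int:
    """Direct MT: one translation program per ordered language pair."""
    return n * (n - 1)


def interlingual_components(n: int) -> int:
    """Interlingual MT: one analysis component into the Interlingua and
    one generation component out of it for each language."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, direct_programs(n), interlingual_components(n))
```

For two languages the direct design needs fewer components (2 versus 4), the two designs break even at three languages (6 each), and for ten languages the interlingual design needs 20 components where the direct design needs 90.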

The interlingual approach requires a full analysis leading to an abstract representation that is independent of the source language. This way, the translation of the target text can be made without any knowledge of what the source language was. For this step, a considerable amount of information is necessary. Semantic knowledge is generally stored in a knowledge base.

Transfer Systems

The third basic design strategy is transfer systems. Instead of operating in two stages through interlingual representation, transfer systems operate in three stages, involving underlying representations for both source and target languages.

- Analysis is the first stage. It converts source language text into abstract SL-oriented representations. This step involves the parser and the grammar of the source language. SL-Grammars parse and analyze the input to produce a source language interface structure.
- The second stage is Transfer. This stage converts SL-oriented representations into equivalent TL-oriented representations. Bilingual rules relate source structures to target structures.
- The third and final stage is called Synthesis (or generation). This stage generates the final TL text. A generator and TL-Grammar generate TL output from the TL interface structure.[26] There are many different approaches to sentence generation, including systemic grammar, unification frameworks, and classification[27].
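
The three stages listed above can be sketched as three small functions chained together. This is a minimal illustration under stated assumptions, not a description of any real transfer system: the English-to-Spanish word table, the gender list, and the restriction to three-word noun phrases are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    """Abstract representation of a 'determiner adjective noun' phrase."""
    det: str
    adj: str
    noun: str


# Hypothetical bilingual transfer rules (English -> Spanish base forms)
LEXICAL_TRANSFER = {"the": "el", "red": "rojo", "house": "casa", "book": "libro"}
# Hypothetical TL grammar knowledge: nouns with feminine gender
FEMININE_NOUNS = {"casa"}


def analyse(text: str) -> NounPhrase:
    """Stage 1, Analysis: parse SL text into an SL-oriented structure."""
    det, adj, noun = text.lower().split()
    return NounPhrase(det, adj, noun)


def transfer(sl: NounPhrase) -> NounPhrase:
    """Stage 2, Transfer: map the SL structure to a TL-oriented structure."""
    return NounPhrase(
        LEXICAL_TRANSFER[sl.det],
        LEXICAL_TRANSFER[sl.adj],
        LEXICAL_TRANSFER[sl.noun],
    )


def synthesise(tl: NounPhrase) -> str:
    """Stage 3, Synthesis: apply TL grammar (gender agreement and
    noun-before-adjective word order) to produce TL text."""
    det, adj = tl.det, tl.adj
    if tl.noun in FEMININE_NOUNS:
        det = "la"
        adj = adj[:-1] + "a"
    return f"{det} {tl.noun} {adj}"


def translate(text: str) -> str:
    return synthesise(transfer(analyse(text)))


print(translate("the red house"))  # la casa roja
print(translate("the red book"))   # el libro rojo
```

Keeping the word-order and agreement rules in the synthesis stage means that the analysis stage knows nothing about Spanish and the synthesis stage knows nothing about English; only the transfer table is specific to the language pair.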

illustration not visible in this excerpt

Figure 3: Building Blocks of a Transfer-based MT System[28]

The transfer approach involves a comparison between two languages. The transfer phase compares lexical units and syntactic structure exactly across the language gap and uses mapping rules to convert the source to the target representation. These rules, and any additional semantics or other information, are stored in dictionaries or knowledge bases.

Relationship between direct and indirect MT

Both direct and indirect translation are essentially based on the specification of rules for morphology, syntax, lexical selection, semantic analysis, and generation.

Direct MT systems were mainly based on word-to-word and/or phrase-to-phrase translations. A simple word-to-word translation cannot resolve the ambiguities arising in MT.

Relationship between interlingual and Transfer-MT

The Interlingua is a language-neutral analysis of the text. The theoretical advantage of the interlingual approach is that one can add new languages at a relatively low cost by creating only the rules that map from the new language into the Interlingua and back again. In contrast, the transfer approach requires one to build mapping rules from the new language to and from each and every other language in the system.[29]

To ensure good-quality output, interlingual systems need additional semantic and extralinguistic information, which can be very complex and time-consuming to provide.

The transfer systems are less ambitious than interlingual systems because they accept the need for mapping rules between the most abstract representations of source and target text.

The transfer approach is more flexible and adaptable in meeting the needs of different levels or depths of syntactic and semantic analysis. The depth of analysis in transfer systems is not defined a priori. It can depend, for example, on the closeness of the languages involved: the closer the languages, the shallower the analysis[30].

Whereas the interlingual approach necessarily requires complete resolution of all ambiguities and anomalies of the SL text so that translation is possible into any other language, in the transfer approach only those ambiguities inherent in the language in question are tackled[31].

In practice, transfer systems are often chosen because they are simpler, whereas interlingual systems are chosen because adding new languages requires less work.

The relationship between these three systems is shown in Figure 4.

illustration not visible in this excerpt

Figure 4: Direct and Indirect MT[32]

Corpus-based Methods

Since 1989, the dominance of the rule-based framework has been broken by a new corpus-based approach. Most research in machine translation is currently done using corpus-based approaches. The use of corpus data in languages other than English has become increasingly important in recent years and has given rise to a growing body of research and applications in multilingual corpus linguistics.

It has often been argued that translations of natural language texts are valid if and only if the source language text and the target language text have the same meaning[33]. Therefore, an MT system, which produces translations from an input text, needs to be aware of its meaning.

Traditionally, the focus in Corpus-Based Machine Translation (CBMT) has been on semantic[34] aspects. In this method, it is assumed that knowing the propositional structure of a text means to understand it. Under the same premise, research in MT has focused on semantic aspects assuming that texts have the same meaning if they are semantically equivalent.

Recent CBMT research starts from different premises. CBMT systems base the translation of a new text on a set of reference translations. These systems assume that meaning equivalence holds for the reference translations given to the system in a training phase. Depending on their sophistication, they try to work out what the meaning invariance in the reference texts consists of and learn an appropriate SL-to-TL mapping mechanism[35].

A corpus, in linguistics, is a large and structured set of texts, electronically stored and processed[36]. It is intended to support the study of linguistic phenomena. This data may be compiled on a principled or systematic basis[37].

- Monolingual corpus contains texts in a single language.
- Multilingual corpus contains text data in multiple languages. Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora.

Aligning translation corpora makes each translation unit of the source text correspond to an equivalent unit of the target text. Alignment covers not only shorter sequences such as words, phrases, and sentences, but also larger sequences such as paragraphs and chapters.
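As a data structure, an aligned parallel corpus can be pictured as a list of paired units. The sketch below assumes the simplest possible case, a strict one-to-one correspondence by position; the class name, the function name, and the sample sentences are hypothetical, and real alignment tools also have to handle merged, split, or omitted units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedUnit:
    source: str
    target: str
    level: str  # granularity, e.g. "sentence" or "paragraph"


def align_one_to_one(source_units, target_units, level="sentence"):
    """Pair units by position; only valid if the translation kept a 1:1 order."""
    if len(source_units) != len(target_units):
        raise ValueError("1:1 alignment needs the same number of units on both sides")
    return [AlignedUnit(s, t, level) for s, t in zip(source_units, target_units)]


english = ["The printer is ready.", "Press the green button."]
german = ["Der Drucker ist bereit.", "Drücken Sie die grüne Taste."]
corpus = align_one_to_one(english, german)
for unit in corpus:
    print(unit.source, "<->", unit.target)
```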

Extraction of translation equivalents

The process of extracting equivalent units from translation corpora and subsequently verifying them against monolingual corpora is presented graphically in Figure 5. Extracting translation equivalents from translation corpora enables MT designers and others to build on these resources, since the corpora help them find suitable translation equivalents. Thus, translation corpora will enable them to:

- retrieve translation equivalent units including words, idioms, compounds, collocations, and phrases,
- learn how corpora help to produce translated texts that display “naturalness” of the target language,
- create new translation databases, which enable users to translate correctly into a foreign language of which they have only limited command, and
- generate new terminology databanks, since a large proportion of terminological material in new texts is neither standardized nor recorded in “term banks.”

To find equivalent units, various search methods are used to trace comparable units of meaning in texts, which are often larger and more complex than single words. Implemented in a translation platform, these methods support translation beyond what customary translation memories offer[38].
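One simple way to propose candidate equivalents from an aligned corpus is to score word pairs by how often they occur in the same aligned sentence pair. The sketch below uses the Dice coefficient for this. The coefficient is chosen here purely for illustration and is not a method named in the sources cited above; the toy corpus and the threshold are likewise assumptions, and the later verification against monolingual corpora is not covered.

```python
from collections import Counter
from itertools import product


def dice_candidates(aligned_pairs, min_score=0.8):
    """Score (source word, target word) pairs by co-occurrence in aligned sentences."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    # Dice coefficient: 2 * joint count / (source count + target count).
    scores = {
        pair: 2 * joint / (src_count[pair[0]] + tgt_count[pair[1]])
        for pair, joint in pair_count.items()
    }
    return {pair: score for pair, score in scores.items() if score >= min_score}


pairs = [
    ("the house", "das haus"),
    ("the garden", "der garten"),
    ("a house", "ein haus"),
    ("a garden", "ein garten"),
]
candidates = dice_candidates(pairs)
for (src, tgt), score in sorted(candidates.items()):
    print(src, tgt, round(score, 2))
```

On this toy corpus only "house/haus", "garden/garten", and "a/ein" pass the threshold. The pair "the/das" does not, because "the" is also translated as "der", which illustrates why extracted candidates still need verification.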

illustration not visible in this excerpt

Figure 5: Verification of Translation Equivalents for Authentication[39]

Corpus-based approaches are divided into statistical MT and example-based MT.

Statistical Machine Translation

The essence of the method is the alignment of sentences, phrases, word groups, and individual words in two languages, and the calculation of the probabilities that any one word in a sentence of one language corresponds to two, one, or zero words in the translated sentence in the other language. Alignment is established by a technique widely used in speech recognition. The probabilities are estimated by matching bigrams (two consecutive words) in each SL sentence against bigrams in equivalent TL sentences[40]. An essential requirement for statistical MT is the availability of a suitably large bilingual corpus of reliable translations[41].

Example-based Machine Translation

The example-based (or memory-based) method was first proposed in 1984 by Nagao, but it was not implemented until the end of the decade; the main development and research in this area began around 1990. Nagao claimed that linguistic data are more durable than linguistic theories. EBMT is essentially translation by analogy. The focus of this method is to find or recall an analogous example, that is, to discover or remember how a particular SL expression or something similar has been translated before. The computational implementation of this idea became feasible with the development of large databases with fast access, because the method relies on bilingual databases of example phrases derived from a large corpus of texts and their translations.

illustration not visible in this excerpt

Figure 6: EBMT Architecture[42]

EBMT likewise benefits from improved rapid access to large databanks of text corpora.
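The retrieval step of translation by analogy can be sketched as a similarity search over a bilingual example base. The example base, the function name, and the word-level similarity measure (Python's standard `difflib.SequenceMatcher`) are assumptions chosen for illustration; a full EBMT system would also adapt and recombine the retrieved fragments.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (SL phrase, TL translation) pairs.
EXAMPLES = [
    ("open the file", "Datei öffnen"),
    ("close the file", "Datei schließen"),
    ("print the document", "Dokument drucken"),
]


def closest_example(query, examples=EXAMPLES):
    """Return (similarity, source, target) of the most similar stored example."""
    query_words = query.lower().split()

    def similarity(example):
        # Compare word sequences rather than characters.
        return SequenceMatcher(None, query_words, example[0].split()).ratio()

    best = max(examples, key=similarity)
    return similarity(best), best[0], best[1]


score, source, target = closest_example("open this file")
print(f"{score:.2f}", source, "->", target)
```

For the query "open this file" the closest stored example is "open the file", so its translation would serve as the starting point for the new sentence.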

Corpora as Sources of Information

Research on example-based and statistics-based approaches has emphasized the importance of the text corpora in MT research. However, it had already become important for the rule-based systems to have access to reliable data. Gathering of large reusable corpora of machine-readable texts, dictionaries, and lexical databases gained importance in the field of computational linguistics.

Recent major government-supported initiatives are:

- Linguistic Data Consortium (LDC) in the United States. LDC is an open consortium of universities, companies, and government research laboratories, which creates, collects, and distributes speech and text databases, lexicons and other sources for research and development in computational linguistics[43].
- Expert Advisory Group on Language Engineering Standards (EAGLES) in the European Community.

Within MT, it is clear that corpus information is essential for a number of purposes.

- When building sublanguage or domain-specific systems, it is necessary to have detailed knowledge of the vocabulary and grammatical features of the types of texts that the system is intended to translate.
- Lexical and knowledge databases are also important for interlingual systems with conceptual representations, mainly for their sharing and reusability.
- Bilingual databases are required for example-based systems as well as for translator’s workstations.

Table 3 summarizes the approaches to MT systems described above[44].

illustration not visible in this excerpt

Table 3: Overview of MT Approaches[45]

Hybrid Approaches

It is difficult for purely statistics-based machine translation systems to process long sentences. In addition, domain dependence is a key issue under such a framework. Purely rule-based machine translation systems incur high human costs in formulating rules and introduce inconsistencies as the number of rules increases. Integrating the two approaches reduces the difficulties associated with each. For this reason, a number of hybrid systems have been developed. “Linguistic engineers integrates the results of various theories and paradigms into an essentially hybrid approach such as the use of large-coverage symbolic parsers, integrated semantic knowledge that is partially symbolic and partially based on statistical models and, finally, the user’s interactive participation.”[46]

Since some specific problems are particularly suited to an example-based approach, some systems include an example-based component to deal with the kinds of problems that are difficult to capture in a rule-based approach. Other systems combine rule-based analysis and generation with example-based transfer. Another kind of hybrid is the multiple-engine system. In this case, the source text is passed through a number of different MT systems, each using different techniques. One may be essentially lexicon-based, another may use rule-based analysis and generation, and a third may be example-based or based on purely statistical methods. In each case, a scoring mechanism is built into the system, by which each engine evaluates its own output[47].
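The multiple-engine idea can be sketched as follows: every engine returns its output together with a self-assessed confidence, and the system keeps the highest-scoring result. The two toy engines, their tiny dictionaries, and the confidence definitions are hypothetical and serve only to show the control flow.

```python
from typing import Callable, List, Tuple

# Each engine returns (translation, self-assessed confidence in [0, 1]).
Engine = Callable[[str], Tuple[str, float]]


def lexicon_engine(text: str) -> Tuple[str, float]:
    """Word-by-word lookup; confidence is the share of words found."""
    lexicon = {"hello": "hallo"}
    words = text.lower().split()
    if not words:
        return text, 0.0
    known = sum(1 for word in words if word in lexicon)
    output = " ".join(lexicon.get(word, word) for word in words)
    return output, known / len(words)


def example_engine(text: str) -> Tuple[str, float]:
    """Exact match against stored examples; confident only on a hit."""
    memory = {"hello world": "hallo Welt"}
    hit = memory.get(text.lower())
    return (hit, 1.0) if hit else (text, 0.0)


def best_translation(text: str, engines: List[Engine]) -> Tuple[str, float]:
    """Run every engine and keep the output with the highest confidence."""
    return max((engine(text) for engine in engines), key=lambda result: result[1])


engines = [lexicon_engine, example_engine]
print(best_translation("hello world", engines))
print(best_translation("hello there", engines))
```

In this sketch the example-based engine wins whenever it has an exact match, and the lexicon-based engine serves as a fallback. How comparable the engines' scores are is exactly the balancing problem that makes such architectures difficult to tune.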

Today, there are several approaches to build hybrid systems, which utilize some of the features from each approach or even from other fields of information retrieval, such as latent semantic analysis, aligned bitexts, probabilistic parsers, vector space distance, etc. Most of these efforts are focused on improving specific tasks without modifying the overall approach.

The power of hybrid systems lies in the fact that different approaches can complement each other. However, it is extremely difficult to find the fine balance between different architectures.


[1] Poul Andersen, Translation Service & MLIS Programme, 1997.

[2] (August 2005).

[3] (March 2004)

[4]; NOTES: (1) Internet Top Ten Languages Usage Stats were updated on March 31, 2006. (2) Internet Penetration is the ratio between the sum of Internet users speaking a language and the total population estimate that speaks that specific language. (3) The most recent Internet usage information comes from data published by Nielsen//NetRatings, International Telecommunications Union, Computer Industry Almanac, and other reliable sources. (4) World population information comes from the world gazetteer web site.

[5] See Glossary.


[7] Cf.


[9] Source: Common Sense Advisory, Inc.


[11] International Journal of Translation 13, 1-2 Jan-Dec 2001, pp.5-20. Special theme issue on machine translation, edited by Michael S. Blekhman.

[12] M. Sofer; 2004: The translator’s Handbook, Rockville Schreiber Publishing, Inc.; Page 88.


[14] See Glossary.

[15] E.F.K Koerner, R.E. Asher; 1995: Concise history of the language sciences: from the Sumerians to the cognitivists, Oxford: Pergamon Press; Pages 431-445.



[18] See Glossary.



[21] Adapted from W.J. Hutchins, H.L. Somers; 1992: An introduction to machine translation, London: Academic Press; Page 71-73.

[22] Adapted from W.J. Hutchins, H.L. Somers; 1992: An introduction to machine translation, London: Academic Press; Page 72.

[23] Adapted from W.J. Hutchins; 1993: Latest developments in machine translation technology: beginning a new era in MT research. In: The Fourth Machine Translation Summit: MT Summit IV. Proceedings: International cooperation for global communication, Kobe, Japan; Page 11-34.



[26] Adapted from D.J. Arnold, L. Balkan, S. Meijer, R. Lee Humphreys, L. Sadler; 1994: Machine Translation: an Introductory Guide, London: Blackwell Publishers; Page 68-69.

[27] Adapted from N. Nicolov, R. Mitkov; 1997: Recent Advances in Natural Language Processing II: Amsterdam/Philadelphia: John Benjamins Publishing Company; Page 221-240.


[29] Adapted from R. Cole, J. Mariani, H. Uszkoreit, G. Battista, V.A. Zaenen, A. Zampolli; 1998: Survey of the State of the Art in Human Language Technology, Cambridge: Cambridge University Press; Page 68-69.

[30] Adapted from:

[31] From W. J Hutchins; 1986: Machine Translation: Past, Present, Future, Chichester: Ellis Horwood Limited Publishers; Page 56.

[32] Adapted from:

[33] Cf. M. Nagao; 1989: Machine Translation How Far Can It Go, Oxford: Oxford University Press.

[34] See Glossary.

[35] Adapted from:


[37] J. Lawler, H. A. Dry; 1998: Using Computers in Linguistics. A Practical Guide, London: Routledge, Page 256.

[38] From S. P. Botley, A. M. McEnery, A. Wilson; 2000: Multilingual Corpora In Teaching And Research, Amsterdam: Rodopi Bv Editions.


[40] W.J. Hutchins; 1993: Latest developments in machine translation technology: beginning a new era in MT research. In: The Fourth Machine Translation Summit: MT Summit IV. Proceedings: Kobe, Japan, International cooperation for global communication.

[41] R. Mitkov; 2003: The Oxford Handbook of Computational Linguistics, New York: Oxford University Press Inc, Page 516.


[43] J. Lawler, H. A. Dry; 1998: Using Computers in Linguistics. A Practical Guide, London: Routledge, Page 254.

[44] Computational Linguistics and Chinese Language Processing, vol. 1, no. © Computational Linguistics Society of R.O.C.; August 1996, pp. 159-182.

[45] Cf. Schwarze: Electronic Commerce, Berlin 2002, p. 37 ff.

[46] Claude Coulombe: Hybrid Approaches in Machine Translation: From Craft to Linguistic Engineering, Machine Sapiens Inc.

[47] R. Mitkov; 2003: The Oxford Handbook of Computational Linguistics, New York: Oxford University Press Inc, Page 518.


Institution: Hochschule Ansbach - Hochschule für angewandte Wissenschaften Fachhochschule Ansbach