Influence of spellchecking on translation

Corpora, Translation, Technology

by Paula W. (Author)

Essay, 2013, 9 pages

Interpreting / Translation


Table of Contents

1. Introduction

2. History and evolution of spellchecking:

3. Relevance and complexity of spellchecking due to various processes:

4. Improvements to spellchecking through research projects

5. Conclusion

6. Final Remarks:

7. Learned contents:

1. Introduction

A computer program that identifies possible misspellings in a block of text by comparing the text against a database of accepted spellings is called a spellchecker. A spellchecker is also commonly defined as a computer program that processes a document in order to check its spelling. Its purposes are simple: to verify the spelling of an electronic document and to correct misspellings and other errors. The Oxford dictionary (2013) gives the example: “the text shows signs of having been spellchecked but not proof-read”. This example hints at the difficulties involved in relying on spellchecking.

In order to understand the intricacy of spellchecking, we have to explore its context. After presenting the history of spellchecking, we will analyze the various spellchecking methods to provide a broad perspective on this domain. Finally, we will consider the different systems developed by researchers and discuss the influence of spellchecking on translation.

2. History and evolution of spellchecking:

Spellchecking has a long history. It began in the late fifties (Blair, 1960). At that time, it was valued mostly by secretaries and other professional users. Nowadays, everyone can use a spellchecker. This widening of the user base points to the relevance of the activity. It also confirms how crucial correct spelling is. Errors are a barrier to reading because they distract the reader. Correct spelling, by contrast, helps to hold the reader’s attention. We therefore need the reliable support that spellcheckers offer to make reading easier. By improving a text, we also make its content easier to understand. The importance of spellchecking becomes clear when we examine the peculiarities of this process and the different methods involved in it.

3. Relevance and complexity of spellchecking due to various processes:

Users make four classical kinds of spelling error: “insertion, transposition, substitution and omission”. These mistakes show why spellchecking is needed to replace faulty forms with accepted words. The term “spellchecking” is itself ambiguous: it covers two concepts, which call for two methods. To avoid confusion, we need to distinguish between them. Spellchecking can be dictionary-free, or it can require the support of a “dictionary” (meaning a list of correct spellings). The classical procedure, which is the most popular, requires a dictionary. Any word absent from this list is not recognized by the spellchecker and is therefore flagged as a possible error.

However, the first process, described by Riseman and Hanson (1974), uses a dictionary only indirectly. This case is interesting because it exposes the main goals of spellchecking. With the help of a dictionary, the text is divided into trigrams (three-letter sequences). The program then notes their frequency and finally detects errors by finding rare trigrams. This method is therefore very useful for detecting typing errors made on a computer, for example.
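The trigram idea can be sketched in a few lines of Python. This is a minimal illustration and not Riseman and Hanson's actual program: the word list, the function names and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter

def trigrams(word):
    """Return the three-letter sequences contained in a word."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]

def build_trigram_table(word_list):
    """Count how often each trigram occurs in a list of accepted words."""
    counts = Counter()
    for word in word_list:
        counts.update(trigrams(word))
    return counts

def suspicious_words(text, table, min_count=1):
    """Flag words containing a trigram seen fewer than min_count times."""
    flagged = []
    for token in text.split():
        word = token.strip('.,;:!?"\'')
        # A Counter returns 0 for trigrams it has never seen.
        if any(table[t] < min_count for t in trigrams(word)):
            flagged.append(word)
    return flagged

# Hypothetical word list standing in for the dictionary.
WORD_LIST = ["the", "translator", "checks", "spelling", "carefully"]
TABLE = build_trigram_table(WORD_LIST)
```

With this toy table, `suspicious_words("the translator chekcs spelling", TABLE)` flags only "chekcs", because the transposition produces trigrams such as "hek" that never occur in the word list. Words shorter than three letters contain no trigrams and are never flagged by this sketch.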

The second method (Morris and Cherry, 1975) does not involve a dictionary at all, but follows a similar scheme. Instead of noting the frequency of a trigram, it measures the peculiarity of a trigram. In this respect it is the opposite of the first method. This second approach is very effective for typing errors and is multilingual. Being multilingual, it illustrates the value of spellchecking for translation. Even though these systems are successful, spelling checkers and correctors face serious problems. They have to deal with storage issues. For Peterson (1986), the real difficulty is not storage capacity but “real-word errors” caused by short words.

To address space problems, “some economize on storage place by holding only the stems of words” (McIlroy, 1982), storing rules for “affix-stripping”. This technique has several advantages. It saves space (by stripping prefixes and suffixes) and works by derivation. For example, by saving only “compute”, every related word can be derived. It is convenient because it requires less memory. At the same time, adding suffixes increases the risk of accepting misspellings. An alternative to this method was proposed by McIlroy (1982) and Nix (1981), who used a bit map built by “hashing words”.

Spellchecking raises many questions. On the one hand, there is not enough storage space to keep all words. On the other hand, we cannot rely completely on spellcheckers. These issues point to the complexity of the exercise. Now that we have observed several methods, we will investigate the different technologies and the linguistic knowledge involved in their application.
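The affix-stripping idea mentioned above can be sketched as follows. This is a simplified illustration, not McIlroy's actual rule set: the stem list, the prefix and suffix lists and the "restore a final e" rule are assumptions chosen for the example.

```python
# Hypothetical stem list: only base forms are stored.
STEMS = {"compute", "translate", "check"}
# Hypothetical affix lists; a real system would use far more detailed rules.
PREFIXES = ("re", "un")
SUFFIXES = ("ing", "ed", "er", "s")

def candidate_stems(word):
    """Return the forms obtained by stripping at most one prefix and one suffix."""
    word = word.lower()
    forms = {word}
    for prefix in PREFIXES:
        if word.startswith(prefix):
            forms.add(word[len(prefix):])
    stems = set(forms)
    for form in forms:
        for suffix in SUFFIXES:
            if form.endswith(suffix):
                base = form[:-len(suffix)]
                stems.add(base)
                # "computing" -> "comput" -> "compute": restore a dropped final e.
                stems.add(base + "e")
    return stems

def is_accepted(word):
    """Accept a word if any stripped form is a stored stem."""
    return any(stem in STEMS for stem in candidate_stems(word))
```

Storing only "compute" is enough for `is_accepted("computing")` and `is_accepted("recomputed")` to succeed. The same looseness also shows the risk noted above: `is_accepted("computs")` returns `True`, because the sketch strips the "s" and restores an "e" without checking whether that derivation is legitimate.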

To deal with spellchecking, we need to understand how the methods work and to get a closer idea of what they do. We mentioned trigram peculiarity above but did not discuss its limits. Trigram methods have a narrow function, since they look only at trigrams and not at spelling as such. Moreover, because they are adapted to a given language, they are unable to recognize foreign words. This is a weakness of the system: it cannot be applied to translation and could not be used to support a foreign-language text. By studying these two modes of spellchecking, we have already exposed some of its defects. From here on, we will try to bring its potential to light by presenting three additional methods.

A “confusion set” (e.g. Golding, 1995; Golding and Roth, 1999) can also play a role in spellchecking. It is a small group of words that look or sound alike, for example “wear/where”. On the basis of a set of rules, the program decides which word is the better match. This system is very useful for selecting the right word according to syntax and semantics. If I write “I where a dress”, the program notices the confusion. Based on the syntax, it identifies “wear” as the correct word in this case.
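A toy version of the confusion-set check might look like the sketch below. It uses a single hand-written context rule rather than the statistical methods of the cited authors, and the rule, the word sets and the function name are assumptions made for the illustration.

```python
# One confusion set from the text; more sets could be added to the list.
CONFUSION_SETS = [{"wear", "where"}]
# Hypothetical rule: directly after a subject pronoun, a verb is expected.
SUBJECT_PRONOUNS = {"i", "you", "we", "they"}
VERBS = {"wear"}

def check_confusions(sentence):
    """Return (position, written word, suggested word) for likely confusions."""
    tokens = sentence.lower().split()
    suggestions = []
    for i, token in enumerate(tokens):
        for confusion_set in CONFUSION_SETS:
            if token not in confusion_set or i == 0:
                continue
            if tokens[i - 1] not in SUBJECT_PRONOUNS:
                continue
            # Prefer the member of the set that fits the expected verb slot.
            for alternative in confusion_set:
                if alternative in VERBS and alternative != token:
                    suggestions.append((i, token, alternative))
    return suggestions
```

For the sentence from the text, `check_confusions("I where a dress")` returns `[(1, "where", "wear")]`, while the correct sentence "I wear a dress" yields no suggestion.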

Another method, the “string-matching algorithm”, is used to generate a short list of candidate words that closely resemble the misspelling. A candidate can resemble the misspelled word in length, in its vowels or even in its initial letter. This system is precise and well suited to finding the correct word: the greater the similarity with the error, the better the chance that the suggested word is the intended one. In the case of an omission such as “proessor”, the program will suggest “professor”. This method can be even more reliable than a confusion set because of the matching probability. It relies on “edit-distance” (Levenshtein, 1966; Wagner and Fischer, 1974), which helps to find the correct word by counting the number of errors (professor: edit-distance 1).
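The edit-distance computation can be sketched with the standard dynamic-programming scheme. The `suggest` helper, its `max_distance` parameter and the small lexicon in the usage note are assumptions added for illustration.

```python
def edit_distance(source, target):
    """Count the insertions, deletions and substitutions needed to turn source into target."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]

def suggest(misspelling, lexicon, max_distance=1):
    """Return lexicon words within max_distance, closest first, then alphabetical."""
    scored = sorted((edit_distance(misspelling, word), word) for word in lexicon)
    return [word for distance, word in scored if distance <= max_distance]
```

Here `edit_distance("proessor", "professor")` is 1, matching the example in the text. With a hypothetical lexicon of "professor", "processor" and "protector", `suggest("proessor", ...)` returns both "processor" and "professor", since each is one insertion away. Edit distance alone therefore narrows the candidates but does not always single out the intended word.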

Finally, the “hashing method” illustrates the variety of spellchecking techniques. This process has many advantages in terms of storage space. Letters are converted into numbers, which saves memory on the computer. By an arithmetic process, each letter receives a number of points according to its position in the alphabet (5 points for e). However, like every system, it has drawbacks. It is considered too rudimentary and it is time-consuming. It is also criticized because it accepts non-existent combinations instead of flagging them as errors.
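A deliberately rudimentary hashing scheme shows both the storage saving and the weakness described above. This sketch is not the scheme of any cited author: summing the letter values, the table size and the three-word lexicon are assumptions chosen to keep the example small.

```python
TABLE_SIZE = 1024  # assumed size of the bit map

def letter_value(letter):
    """Position of a lowercase letter in the alphabet (a = 1, e = 5)."""
    return ord(letter) - ord("a") + 1

def word_hash(word):
    """Hash a word by summing its letter values (a deliberately crude function)."""
    total = sum(letter_value(ch) for ch in word.lower() if "a" <= ch <= "z")
    return total % TABLE_SIZE

def build_bitmap(words):
    """Store one flag per hash value instead of storing the words themselves."""
    bitmap = [False] * TABLE_SIZE
    for word in words:
        bitmap[word_hash(word)] = True
    return bitmap

def accepts(word, bitmap):
    """A word is accepted if the flag for its hash value is set."""
    return bitmap[word_hash(word)]

# Hypothetical lexicon.
BITMAP = build_bitmap(["from", "dress", "wear"])
```

The bit map needs only one flag per hash value, whatever the length of the stored words. The price is visible immediately: `accepts("fomr", BITMAP)` is `True`, because "fomr" has the same letter sum as "from", whereas `accepts("frim", BITMAP)` is `False`. Practical systems use much stronger hash functions to make such collisions rare, but they cannot rule them out.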

Having presented different methods to explain the complexity of spellchecking, we observed that failures in these systems let non-existent words through. To improve spellchecking, researchers try to detect real-word errors through diverse research projects. This part is significant because it shows that research attempts to solve open problems and to improve the reliability of spellchecking.

4. Improvements to spellchecking through research projects

The first system, called Critique and also known as Epistle, was developed by IBM (Heidorn et al., 1982). It checks the spelling, grammar and style of business correspondence by referring to a set of complicated grammar rules. It provides a very specific grammatical analysis of a sentence. If the sentence is grammatically incorrect, it tries again after “relaxing” some rules. This is helpful because, by repeated relaxing, the system eventually reaches a successful parse. Crucially for the goal of the project, it recognizes real-word errors by detecting grammatical errors. The system thus finds a fault, for example the omission of a verb in a sentence, and suggests a correction.

The second system was created at Lancaster University and modified to improve the spotting of real-word errors (Marshall, 1983; Garside et al., 1987). This system also relies on the syntax of sentences. After consulting a dictionary, it would tag “fly”, for example, as a verb or a noun. The system then works with the probability that a word is a noun rather than a verb, depending on the syntax. The program alerts users when it finds an unlikely sentence or a grammatically incorrect word order. In other words, the system uses the co-occurrence of tags to detect real-word errors. Having analyzed how these systems function, we need to look more closely at the term “co-occurrence”.

This term illustrates one fundamental aspect of spellchecking. By “co-occurrence”, we mean the interdependency of two terms. The program flags linguistic elements that never occur together. So, if we produce an unknown co-occurrence, there is probably a real-word error. Both systems have some success in spotting real-word errors. However, they generate false alarms, and syntactically normal real-word errors cannot be found. This second issue is even more problematic for users: if the sentence is grammatically correct but a word is wrong, the system will not flag the error. “We had a fite yesterday” instead of “fight” will not be detected. This shows that even though every system tries to correct misspelled words, it cannot always succeed, and non-existent or wrong words remain. Spellcheckers are thus responsible for accepting such words, which also affects the substance of the text.
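The tag co-occurrence check can be illustrated with a small sketch. It is a strong simplification of the Lancaster approach: instead of probabilities it uses a yes/no table of permitted tag pairs, and the tag names, the lexicon and the table are all assumptions invented for the example.

```python
# Hypothetical lexicon mapping each word to its possible tags.
LEXICON_TAGS = {
    "we": {"PRON"},
    "had": {"VERB"},
    "hat": {"NOUN"},
    "a": {"DET"},
    "fight": {"NOUN", "VERB"},
    "fly": {"NOUN", "VERB"},
    "yesterday": {"ADV"},
}
# Hypothetical table of tag pairs that may occur next to each other.
ALLOWED_PAIRS = {
    ("PRON", "VERB"), ("VERB", "DET"), ("DET", "NOUN"),
    ("NOUN", "ADV"), ("VERB", "ADV"),
}

def flag_unlikely_pairs(sentence):
    """Return adjacent word pairs for which no combination of tags is allowed."""
    tokens = sentence.lower().split()
    flagged = []
    for left, right in zip(tokens, tokens[1:]):
        left_tags = LEXICON_TAGS.get(left, set())
        right_tags = LEXICON_TAGS.get(right, set())
        if not left_tags or not right_tags:
            continue  # unknown words are left to the ordinary dictionary check
        if not any((a, b) in ALLOWED_PAIRS for a in left_tags for b in right_tags):
            flagged.append((left, right))
    return flagged
```

In "we hat a fight yesterday", "hat" is a real word, so a dictionary lookup raises no alarm, but the sketch flags the pairs ("we", "hat") and ("hat", "a") because their tags do not co-occur in the table. The limitation discussed above also appears: "we had a fly yesterday" is not flagged, since replacing "fight" with "fly" leaves the tag sequence perfectly normal.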



Institution / University
Universität zu Köln – Romanisches Seminar

