5. Discussion / Conclusion
Modern English is known to be a language that draws mainly on two different roots: the Germanic language that was spoken by many inhabitants of the British Isles before the Norman Conquest in 1066, and the Romanic language that the Norman invaders brought with them. These two origins, however, are not distributed equally across the English vocabulary: very generally speaking, Germanic words more often denote basic concepts, while Romanic words more often denote abstract concepts. This is illustrated by the fact that 50.98 percent of the words in the General Service List (GSL), which lists the 2000 most frequent (and therefore most basic) English words, are of Germanic origin, whereas in the Computer Dictionary (CD), which consists of 80 096 words, only 26.28 percent of the entries have Germanic roots, while a majority of 58.52 percent have Latin or Romanic ones (Scheler 1978: 72).
It therefore seems plausible to expect that a higher percentage of swear-words should have Germanic roots, because the concepts they denote are mostly ‘basic’, which is the domain in which Germanic words are better represented than Latin or Romanic words.
Moreover, bearing in mind that words of Latin or Romanic origin are more likely to denote abstract concepts and that they often seem to carry a certain flavour of ‘culture’ and ‘good education’, one could suppose that there is a higher percentage of Latin or Romanic words among euphemisms.
These considerations led to the following hypothesis:
The share of words of Germanic origin is higher among swear-words than among euphemisms, while the share of words of Latin/Romanic origin is higher among euphemisms than among swear-words. Compared with innocuous words of the normal vocabulary, there are more words of Germanic origin among swear-words, and more words of Latin/Romanic origin among euphemisms.
It was necessary to create two corpora in order to analyze the origins of euphemisms and swear-words in English. Each corpus was to consist of about 100 tokens, so as to ensure a certain degree of representativeness without exceeding the scope of this paper.
The Dictionary of Euphemisms (Holder 1995) was chosen as the basis for the corpus of euphemisms. As the dictionary consists of about 400 pages, the first entry of every fourth page was included in the corpus. Unfortunately, no dictionary of swear-words could be found, so I gathered the words that appeared in Bad Language (Andersson/Trudgill 1990), Swearing (Hughes 1991) and Cursing in America (Jay 1992) and thus obtained a swear-word corpus of only 55 tokens.
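The page-based selection described above amounts to systematic sampling. The following is a minimal sketch of that step; the function name `sample_pages` is hypothetical, and the starting page is an assumption, since the text only says that every fourth page was used.

```python
def sample_pages(total_pages, step=4, start=1):
    """Return the page numbers whose first entry would enter the corpus.

    Assumption: sampling starts on page `start`; the paper does not
    say which page was taken first.
    """
    return list(range(start, total_pages + 1, step))


# A dictionary of about 400 pages yields about 100 tokens.
pages = sample_pages(400)
print(len(pages))   # 100
print(pages[:3])    # [1, 5, 9]
```

With roughly 400 pages and a step of four, this gives the target size of about 100 tokens without any further selection by hand.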
It may be astonishing that three books dealing with swear-words yield no more than 55 distinct expressions, while, on the other hand, there are obviously enough euphemisms to fill a dictionary of 400 pages. Is the swear-word vocabulary of English really so limited? Indeed, Andersson and Trudgill state that English, like other Germanic languages, uses “only a few taboo concepts or words”. These, however, “can be combined with other words and used in fixed expressions to make up a fairly elaborate system of swearing” (Andersson/Trudgill 1990: 58). Moreover, once it has become a swear-word, a word tends to acquire “greater grammatical flexibility” (Hughes 1991: 30); its original semantics tend to take a back seat and the focus shifts to the emotive force of the word, so that one can, for example, talk of a fucking motorcycle without actually being able to imagine such a thing.
After creating the corpora, I looked up the etymology of each token in the Shorter Oxford English Dictionary. The following four etymological categories were established:
1. Germanic: Old Norse, Anglo-Saxon, Old Saxon, Old Frisian, Old High German, Middle Low German, Modern Low German, Middle Dutch, Gothic
2. Latin/Roman: Latin, Anglo-Latin, Christian Latin, Old French, French, Spanish, Italian
3. Other: Greek, derivations from forenames, the first letter of a word
4. Unknown: Origin unknown or unclear.
Where, as in the case of shop and souse, the dictionary did not decide between Germanic and Latin/Roman origin, I added half a token to each of the two etymological groups, even though the expression consists of a single word (this is why the figures in the results are not always integers). If a euphemistic or swear-word expression was made up of more than one word, I proceeded as follows in deciding on the origin of the expression:
(a) The origin of pronouns, prepositions and determiners that were part of the expression was not taken into consideration. This decision was motivated by the fact that English has only a limited number of these words and that they are fixed, so that speakers usually do not have a choice between two words with the same meaning but a different origin. Considering these words as well would therefore most probably have distorted the results.
(b) If all other components belonged to the same etymological group (as in son of a bitch; son → Germanic, bitch → Germanic), the expression was regarded as one single token for this group.
(c) If the components belonged to different etymological groups (as in act like a husband; act → Latin/Roman, husband → Germanic), the expression was split up and the components added as half a token to their respective groups.
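The counting rules (a) to (c), together with the half-token rule for undecided etymologies, can be sketched as a small tallying routine. This is an illustration under stated assumptions, not the procedure as it was actually carried out (the counting was done by hand): the names `token_shares` and `tally` and the set `FUNCTION_WORDS` are hypothetical, and the equal split per counted component is an assumption for cases the rules do not spell out, such as expressions with three or more counted components.

```python
from collections import Counter
from fractions import Fraction

GERMANIC, LATIN = "Germanic", "Latin/Roman"

# Illustrative stand-in for the pronouns, prepositions and determiners
# skipped under rule (a); not an exhaustive list.
FUNCTION_WORDS = {"a", "an", "the", "of", "like", "my", "over"}


def token_shares(components):
    """Distribute one token over the etymological groups of an expression.

    `components` is a list of (word, origins) pairs, where `origins` is a
    tuple of groups (two groups if the dictionary leaves the origin open).
    Assumption: every counted component carries an equal share.
    """
    content = [origins for word, origins in components
               if word.lower() not in FUNCTION_WORDS]      # rule (a)
    if not content:
        raise ValueError("expression has no countable components")
    shares = Counter()
    for origins in content:
        for group in origins:
            # rules (b) and (c), plus the half-token rule for open cases
            shares[group] += Fraction(1, len(content) * len(origins))
    return dict(shares)


def tally(expressions):
    """Sum the token shares of all expressions in a corpus."""
    totals = Counter()
    for components in expressions:
        totals.update(token_shares(components))
    return dict(totals)


son_of_a_bitch = [("son", (GERMANIC,)), ("of", (GERMANIC,)),
                  ("a", (GERMANIC,)), ("bitch", (GERMANIC,))]
act_like_a_husband = [("act", (LATIN,)), ("like", (GERMANIC,)),
                      ("a", (GERMANIC,)), ("husband", (GERMANIC,))]
shop = [("shop", (GERMANIC, LATIN))]

print(token_shares(son_of_a_bitch))      # one whole Germanic token
print(token_shares(act_like_a_husband))  # half a token per group
print(tally([son_of_a_bitch, act_like_a_husband, shop]))
```

Exact fractions are used instead of floating-point numbers so that the half tokens add up without rounding error; for the three examples above, the totals come to two Germanic tokens and one Latin/Roman token.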
 The CD contains all entries of the CED (Chronological Dictionary), which lists chronologically all 80096 main entries of the SOED (Shorter Oxford English Dictionary).
 There are, for example, frames like “Who gives a X”/ “I don’t give a X”, where X can be fuck, damn, shit…
 Like Dolly; Richard; suck my dick.
 For example D: every taboo-word beginning with the letter D; J: a marijuana cigarette (from Joint).
 Pine overcoat was a difficult example: pine is of Latin/Roman origin; the compound overcoat is made up of over (Germanic origin) and coat (French origin). But as I decided not to count prepositions (over), the whole expression was regarded as of Latin/Roman origin.