Can Computers Translate?

We review four Japanese-to-English translation programs

(This article is published here with the permission of Computing Japan magazine.)

Is using translation software a viable alternative to hiring a human translator? This article serves as an introduction to machine translation and compares the output of four commercial Windows-based Japanese-to-English translation products. In a future issue, we will present a follow-up article that looks at English-to-Japanese translation programs.

by Steven Myers

Professional Japanese-to-English translators have grown used to hearing from non-translators that their work will one day be done entirely by computers. Since the early days of artificial intelligence (AI) research in the 1950s, large groups of scientists and corporate researchers have been working toward the automation of language translation. It is only since about 1990, however, that commercial translation products have achieved any degree of market acceptance in Japan.

There seems to be much confusion concerning the level of capability possessed by these products. Businesses of every size are all too eager for a magical "black box" that will instantly convert their Japanese to English, and vice-versa -- and without the per-page costs of human translation. In reality, though, do any of the current crop of commercial Japanese-to-English (J/E) translation programs come close to emulating what even a mediocre human translator can do?

Machine translation (MT) products are now receiving a considerable amount of hype. At one end of the scale are Japanese mass media reports on one product after another that use the latest magical technique to produce near-perfect English translations. Unfortunately, these reports are generally based entirely on the manufacturers' promotional press releases, and make it into print without any attempt at verification or review.

At the other end of the spectrum are the detractors of machine translation, those who steadfastly assert that all translation programs are useless, and the whole effort is a meaningless waste of time. Language translation is inherently too complex, they charge, and too dependent on human culture to ever be automated. This group, not surprisingly, includes a lot of working translators.

In the middle, however, is a much larger group of people who hold that machine translation technology, while not perfect, has progressed rapidly. Some of today's systems, they believe, can render a Japanese source document into a very rough, but understandable, English translation -- one that can then be "cleaned up" (heavily edited) by a native speaker.

But just how realistic are such expectations? In order to adequately answer the question of whether machine translation of documents is practical, it is necessary not only to rigorously test the current commercial systems, but also to look a little bit more closely at the translation process in general.

No magic box: the art of translation

In his classic collection of essays on software engineering entitled The Mythical Man-Month, Fred Brooks asserts that designing and building software is an inherently complex task, and that no "silver bullet" technique (such as object-oriented methods, CASE tools, or visual programming) will ever be able to significantly reduce its complexity. He implores corporate managers to stop waiting for magical solutions that will eliminate their dependence on eccentric programmers and, instead, concentrate on making better use of the tools that exist.

Like software design, language translation is an art rather than a science, one that is inherently complex and widely misunderstood. An orchestral musician brings to life the notes on a printed page by calling on a finely honed musical vocabulary and a strong sense of phrasing and structure. So, too, the translator, who must construct his own precise interpretation of a document via the "score" of the source language. Like a skilled programmer or musician, a translator relies on an intuitive grasp of the interrelationships among complex components, a sense of how they fit together to form the finished product, and a knowledge of how to build new structures on top of existing ones.

In short, translation is a highly creative process. For each semantic idea in one language, there are several possibilities for expressing the same idea in another language. It is the job of the translator to find the best match possible, without sacrificing the overall flow of the finished document.

For translators of languages as different as Japanese and English, there is a special degree of tension between precise depiction of the Japanese concepts on one hand, and rendering the whole into natural and cohesive English on the other. The translator faces tradeoffs and "design decisions" at every turn. The complexity of J/E translation arises not only from differences in linguistic structure, but all too often from extreme variations in the way each culture views the world.

In Japanese, for example, politeness dictates that certain feelings/ideas be expressed explicitly at specific times using set phrases. In English, those same feelings might be implicit, but not stated directly, or stated in different ways depending on the situational context. To cite a simple example, most Japanese will always state their gratitude for forthcoming favors with yoroshiku onegaishimasu or osewa ni narimasu. These set expressions are used in a wide variety of situations in Japanese, but the proper English translation will depend on the context (or might even be ignored altogether). And something that is hazukashii for a majority of Japanese might not be the least bit embarrassing or shameful to most Americans -- and vice-versa. A word-for-word translation in such cases would be nonsensical.

Skillful translation also requires careful attention to tone and politeness levels. The expression of certain sentiments that are common in one culture may be harsh or offensive in another. The Japanese, especially, often say things in a vague and indirect manner, or express doubt or reservation before making a statement (even if no such doubt actually exists in the mind of the speaker). An intimate knowledge of both languages and cultures involved is required for effective translation.

Given the high degree of creativity and complexity involved, it is naive to view the translation process as a "black box" mechanism whereby Japanese goes in one end and English automatically comes out of the other. When applied to the development of machine translation, there is a widespread misconception that, although initial output might be of poor quality, continual refinement of the software will eventually arrive at a well-honed translation system. Unfortunately, there is no "black box" system that can remove all the difficulties inherent in language translation.

State of the art: four Windows-based translation packages

So, where does this leave modern language translation software? Even if machine translation systems can never duplicate human translations, can't they at least generate output that is understandable and useful for in-house memos and the like?

The answer is a definite "maybe" -- assuming that users proceed with caution. The danger with machine translation is not that the output is often incomprehensible, but rather that it may be ambiguous or even erroneous, leading the reader to misunderstand the intent of the original writer.

That said, though, when used correctly, and for appropriate purposes, MT systems can be extremely useful. The key to making effective use of translation software is to thoroughly understand the capabilities and limitations, so as not to be misled by the often-unrealistic claims made by the manufacturers.

In the remainder of this article, we present a comparison of four popular commercial Japanese-to-English translation software packages currently on the market. Each of these has been tested extensively by the Computing Japan editors. We begin by providing a brief description of each product, and conclude by comparing their output for some sample sentences.

PC-Transer/je

Nova Corporation's PC-Transer/je was one of the first commercial J/E translation products, and remains one of the best known and most widely used. The software, which utilizes a 120,000-term general dictionary, can be used to produce translations from within MS-Word or Ichitaro, in multiple open windows.

The package includes a function that allows users to construct templates of frequently used phrases and formats for dates, addresses, etc. The program allows the user to make multiple new entries to the dictionary simply by including the information in an appropriate text file. Nova also offers numerous technical dictionaries that can be added, for ¥30,000 to ¥80,000 each. The base PC-Transer/je package sells for ¥198,000.

JLondon/JE

JLondon/JE, developed by Osaka-based Kodensha Corporation, features a basic dictionary of 90,000 terms. Kodensha also makes 34 specialized dictionaries, including one of the most extensive (204,400-term) medical/pharmaceutical dictionaries available. Like PC-Transer, JLondon supports a template function, but it cannot be used from within a word-processing program. JLondon/JE retails for ¥98,000.

ASTRANSAC

Toshiba bills ASTRANSAC as a "translation accelerator," which is probably a more accurate description for all of these products than the grandiose term "machine translation software." ASTRANSAC requires more memory than the other systems tested. We managed to squeeze the main test sentences through on a 12MB machine, but ran into numerous problems on other sentences until we upgraded to a computer with more RAM. For users with at least 16MB, though, ASTRANSAC produces probably the best output of the products featured here.

Like PC-Transer/je, ASTRANSAC can be run from within MS-Word, but Ichitaro support is not included. The J/E product sells for ¥63,000, and there are several specialized dictionaries priced at ¥20,000 each.

J-E Bank for Windows

J-E Bank is considerably smaller and less expensive than the other products reviewed here, but it nonetheless produces output that can be useful for a variety of applications where a large general dictionary is not required. J-E Bank for Windows was developed by Kamejima Artificial Intelligence Laboratory, and sells for ¥49,000. Kamejima claims that their innovative parsing and analysis techniques allow J-E Bank to achieve greater accuracy and speed than other systems of similar size.

Getting ready to translate

Before discussing the output of these four machine translation programs, let's clear up some all-too-common misconceptions about the capabilities of machine translation. First, users who expect to be able to feed a Japanese document in its original form to any of these systems and get back English that can be easily cleaned up for use will be severely disappointed. It is the exception rather than the rule that an original Japanese document can be machine-translated into anything even remotely comprehensible -- and those bits that are comprehensible are often ambiguous or totally incorrect. It is almost always necessary to first do a considerable amount of pre-editing on the Japanese original.

Second, even after the original Japanese sentences have been shortened, simplified, and clarified, users must still make sure that they are using an appropriate dictionary for the subject matter of the text. A computer software manual, for example, will require either a special technical dictionary, or a user-compiled dictionary in which entries have been registered for the katakana renderings of terms such as mouse, window, and font. Machine translation systems are currently capable of working only within the narrowest of parameters, and measures must be taken at every step to reduce ambiguity.
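As a rough illustration of what such a user dictionary does, the following Python sketch replaces registered katakana terms with their English equivalents before any further processing. The function name, the dictionary contents, and the longest-term-first strategy are assumptions made for this example; it does not describe how any of the products reviewed here actually works.

```python
# Hypothetical user dictionary: katakana renderings -> English terms.
# The three entries mirror the examples named in the text (mouse, window, font).
USER_DICTIONARY = {
    "マウス": "mouse",
    "ウィンドウ": "window",
    "フォント": "font",
}


def apply_user_dictionary(text, dictionary=USER_DICTIONARY):
    """Replace every registered katakana term in text with its English entry.

    Longer terms are substituted first so that a short entry never
    breaks up a longer entry that happens to contain it.
    """
    for term in sorted(dictionary, key=len, reverse=True):
        text = text.replace(term, dictionary[term])
    return text


if __name__ == "__main__":
    # Registered terms are replaced; everything else is left untouched.
    print(apply_user_dictionary("ウィンドウのフォントを変更する"))
```

A term that has not been registered passes through unchanged, which is roughly the situation in which a translation program falls back on guessing from the individual characters.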

First-time users of J/E translation software are usually surprised at the amount of pre-editing required on the original Japanese text. The point of the pre-editing process is to eliminate ambiguity in the sentences, and to make the sentence structure more closely match its English equivalent. (Cheaper than hiring a translator, perhaps, but still a time- and labor-intensive task.)

The Nova PC-Transer user's manual includes an entire chapter on pre-editing, with numerous examples of acceptable and unacceptable sentences. Here are just a few of the main guidelines given by Nova for pre-editing:

* Break up long sentences into several shorter ones.

* Leave out words and phrases whose only function is to "soften" the impact of the sentence (make it less assertive). For example, a sentence such as Kare no setsumei wa wakarinikui mono ga aru would be changed to the more blunt Kare no setsumei wa wakarinikui.

* Avoid vague and ambiguous expressions; explicitly include information normally implied from the context. In normal Japanese, for example, the phrase Tokyo no hito might mean a person born in Tokyo, a person who lives in Tokyo, or a person who came from Tokyo.

* Explicitly state cause-and-effect relationships. In Japanese, it would be common to say something like Keiki ga waruku, gakusei ni totte shuushoku wa konnan da, where the relationship between poor economic conditions and a difficult job search is implied. An English speaker, however, would use words such as "because" or "since" to explicitly state the reason for something, so this sentence would be better changed to something like Keiki ga warui no de, gakusei ni totte shuushoku wa konnan da.

The list goes on, with numerous rules for questions and imperative sentences, but I think you get the idea. A substantial amount of work often must go into the pre-editing process before a sentence is a good candidate for machine translation.
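Some of the guidelines above can be checked mechanically before a text is handed to a translation program. The Python sketch below flags sentences that exceed a length threshold or that contain a registered softening phrase. The 40-character threshold, the phrase list (with mono ga aru written in kana as an assumed spelling), and the function names are illustrative assumptions; none of them comes from the Nova manual.

```python
import re

MAX_SENTENCE_LENGTH = 40    # assumed threshold, in characters
SOFTENERS = ["ものがある"]  # "mono ga aru"; the kana spelling is an assumption


def split_sentences(text):
    """Split Japanese text into sentences, keeping each closing full stop."""
    return [s for s in re.findall(r"[^。]+。?", text) if s.strip()]


def pre_edit_warnings(text):
    """Return (sentence, warning) pairs for sentences that may need pre-editing."""
    warnings = []
    for sentence in split_sentences(text):
        if len(sentence) > MAX_SENTENCE_LENGTH:
            warnings.append((sentence, "long sentence: consider splitting"))
        for phrase in SOFTENERS:
            if phrase in sentence:
                warnings.append((sentence, "softening phrase: " + phrase))
    return warnings
```

A check like this only points at candidates. Deciding how to split a sentence, or which implied information to spell out, still requires a person who reads Japanese.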

Comparing the systems

In testing the four translation software packages, we used a wide range of Japanese source texts, both with and without pre-editing and dictionary building. The "Sample translation comparisons" box shows, following the original Japanese source text for four representative test sentences, the output produced by each system, plus a set of translations provided by a professional translator (Japanese Language Services) for comparison. All of the test translations were done on a 486DX computer with 12MB of RAM, running Windows 95J.

The first sentence (see the "Sample translation comparisons" box) uses a Japanese giongo expression, わいわい (waiwai), which normally would be excluded during the pre-editing process. We left it in, however, to see how the systems would handle it. The only program that really stumbled was PC-Transer, which mistook the term for a celebration (iwai) of wa (whatever that is). It is also interesting to note how each system handled the term おばあさん (obaasan). ASTRANSAC came closest to getting it right with "old woman," while J-E Bank and JLondon went for the more literal "grandmother." Either choice could be correct, depending on the context. PC-Transer again had problems, appearing to extract the san suffix in order to form "Mr. Baa."

With sentence number two, we wanted to see how each product would handle a typical "evening news" type of sentence. ASTRANSAC turned in the best rendering, with PC-Transer also doing reasonably well. JLondon choked on the original sentence (with an "out-of-memory" error), so we gave it an abbreviated version; it still got stuck on "Haneda," translating it literally as "a feather rice paddy." Obviously, this sentence was also done "as is"; a pre-edited version would make it clear that Haneda is the name of an airport. Interestingly, the only program to successfully translate the term 特別機 (tokubetsuki) was J-E Bank.

The third sentence is quite appropriate for machine translation. It makes explicitly clear what the subject of the sentence is (this was added in the pre-editing process; it would be more natural for a Japanese speaker to leave the subject out in actual usage) and what happens to that subject. Nevertheless, all four of the programs made the mistake of saying that "(the product) did not hear a name" instead of "(the product's) name was not heard." It is easy to see how, in other contexts, this kind of mistake could lead to sentences that make perfect sense yet distort the original meaning beyond recognition.

Finally, we gave the programs a simple sentence that we thought should be relatively easy for a machine translation program. (Perhaps we were wrong, though; the human translator pointed out that four different translations are possible depending on the context.) Of the four products, only JLondon got the basic meaning of sentence four right. PC-Transer was close, but the man does not sit by the chair, he sits in it.

ASTRANSAC introduces ambiguity into the translated sentence by saying that the man removes the coat, rather than that he takes it off. The resulting sentence could be taken to mean, for example, that the man removed the coat from the chair rather than from his body. While this is a trivial example, the point is that this kind of confusion with detail can be dangerous; these kinds of errors can be hard to detect, even for persons with a strong command of Japanese. J-E Bank, meanwhile, makes a mess of the entire sentence, saying that the man took off everything (instead of just his jacket) and "sat for a chair for a jacket." Hmm... whatever.

Know your needs

If you've been wondering about the current state of commercial translation software technology, the results presented in this article pretty much speak for themselves. It is simply unrealistic at this point to expect to get output from translation programs that is anywhere close to being accurate.

Does this mean that translation programs are useless? Absolutely not. For professional translators and corporations that are heavily involved in translation, the products introduced here are excellent tools that greatly reduce the time required for translation.

The important point to remember is that, for now and for the foreseeable future, these products are strictly tools and not black boxes. Computers can aid the process by taking over the parts of translation that are tedious and repetitious, leaving the artistic and creative aspects intact. They can work wonders in the hands of an experienced translator (who will have built up his own set of dictionaries and templates to facilitate reuse), but they are highly unlikely to ever take the place of that translator.

What does the future hold for machine translation systems? MT has become an extremely active research area for computer scientists and computational linguists, and the initial results of experiments using statistical techniques such as Hidden Markov Models (see the "Structural overview" sidebar) for ambiguity resolution look highly promising. Many scientists are starting to rethink the traditional approaches for language modeling, and we are likely to see several new ideas for MT systems introduced in the next couple of years.

For now, however, the bottom line is this: If you have one or more skilled translators in your organization, these packages are worth a look and could prove helpful. If, on the other hand, you just need a "quick and dirty" job that will give you the essence of what's being said, you'll probably just be wasting your time and money with machine translation. To paraphrase the output of today's typical MT programs, machine translation just "won't be the suitable answer for a problem for a time which is the present" -- if you know what we mean.

Prior to joining the Computing Japan staff in December 1994, Steve Myers worked as a professional Japanese-to-English translator. He has won awards in several major competitions, including 2nd runner-up in the 1993 Babel International Translation Contest (co-sponsored by Mangajin) and 1st runner-up in the 1994 and 1995 Aruku Translation Competitions.

The human translations used for the "sample translation comparison" sidebar of this article were provided by Japanese Language Services, a Massachusetts-based firm specializing in translation, software localization, and Japanese Web site development; http://www.Japanese.com.


Sample translation comparisons

Original Japanese text:

1. 子供たちがわいわい騒いでうるさかったので、おばあさんは休む暇もなかった。

2. 橋本総理大臣は、通商問題についてアメリカのクリントン大統領と話し合うため、明日の午後羽田を特別機で発つ予定です。

3. この製品は、去年流行したが、今年は名前も聞かない。

4. 彼は上着を脱いで、椅子に座った。

PC-Transer/je translation:

1. Because I celebrated わ, and children made noise and was persistent, as for Mr. ばー, there was not the time too when stopped.

2. Prime Minister Hashimoto is the plan that Haneda is left with a machine especially at evening of tomorrow to talk with American Clinton president about commerce problem.

3. This product went around last year, but doesn't hear even a name this year.

4. He took off a coat, and sat down by a chair.

JLondon translation:

1. There was not a grandmother in time that he rests, too, because the children made noise in わいわい and were pestiferous.

2. (The program choked on the original sentence, reporting an "out of memory" error; it eventually translated an abbreviated version, Souri-daijin wa, tsuushou mondai ni tsuite, America no daitouryou to hanashiau tame, Haneda o shuppatsu suru yotei desu as:) The prime minister plans to leave a feather rice paddy to discuss with the president in America about the commerce problem.

3. This product was popular last year but this year, it doesn't hear even a name.

4. He took off a coat and sat on the chair

ASTRANSAC translation:

1. Children clamor noisily, and since it was noisy, the old woman did not have the spare time from which it is absent, either.

2. The Hashimoto Prime Minister is going to leave 羽田 on tomorrow afternoon by the opportunity specially, in order to discuss a commerce problem with the American Clinton President.

3. Although this product was in fashion last year, it does not hear even a name this year.

4. He removed the coat and sat on the chair.

J-E Bank translation:

1. Child pass as it is noisy as making a noise noisily, a grandmother had no time to rest.

2. The Hashimoto Prime Minister intends to 発 with a special airplane for Hata tomorrow in the afternoon, in order to discuss the American Clinton President about a commerce question.

3. Though this product came into fashion last year, an even name doesn't hear it this year.

4. He undressed and sat for a chair for a jacket.

Human translation (provided by Japanese Language Services):

1. The children were making such a racket that Grandmother didn't have any time to rest.

2. Prime Minister Hashimoto is scheduled to depart from Haneda on an official jet tomorrow afternoon to discuss trade issues with US President Clinton.

3. This product was very popular last year, but this year you don't even hear the name.

4. He took off (his/the) jacket and sat in (a/the) chair.


Using software to translate technical manuals

During testing of these four translation programs, we tried taking paragraphs from documents and manuals in various fields to see how each product performed without any special dictionary entries. Shown here are translations from a software manual. In considering the output, it is important to remember that many of the hiragana/katakana renderings, as well as mistakes such as "win dough" instead of "window," can be solved by registering these terms in the user dictionary.

From Apple Computer's Macintosh Japanese Input Guide:

変換ｳｨﾝﾄﾞｳの表示フォントの選択

変換ｳｨﾝﾄﾞｳの表示に利用するﾌｫﾝﾄを変更することができます。一部の記号は、特定のﾌｫﾝﾄだけでしか表示されないものがあります。ﾌｫﾝﾄを変えて確認してください。

ただし、書類へ入力した文字は、利用しているｿﾌﾄｳｪｱで設定されているﾌｫﾝﾄで表示されます。変換ｳｨﾝﾄﾞｳと、ｿﾌﾄｳｪｱのﾌｫﾝﾄの設定が異なると、別の記号が表示されることがあります。

Apple's translation:

Select a font for the input window

You can select the font to be used for the input window display. Be aware that some signs and symbols may not be available in certain fonts. Try other fonts if this happens.

The font used for placing text in the document is usually determined by the application you are using. If the application font differs from the font used in the input window display, different signs and symbols may appear in your document than were in the input window.

The PC-Transer translation:

A selection of indication font of conversion

I can change �� utilized in indication of conversion. Person indicated only with thing of specification has a part sign. I change ��, and confirm it.

But I utilize it, and the character which input it into documents is established, and it is indicated.

The JLondon translation:

The choice of the display font of the change win dough.

The font to use for the display of the change win dough can be changed. The partial symbol includes the one which is displayed only with the specific font, only. Change and confirm a font.

But, the letter to have inputted to the paper is displayed with the font which is set by the software to be using. When the setting of the font of the change win dough and the software is different, another symbol is sometimes displayed.

The J-E Bank translation:

Selection of a display フォント of a conversion wind cormorant

I can change フォント that I make use of for display of a conversion wind cormorant. Or a partial symbol is only a specific フォント there is an object not displayed. Please confirm it as changing フォント.

However, the character that I input to documents, is displayed with フォント with a software that it makes use of set up. With a conversion wind cormorant, as a set up of フォント of a software differs, an other symbol is displayed.

The ASTRANSAC translation:

Selection of display フォント of a conversion window

フォント used for a display of a conversion window can be changed. A part of sign has what is displayed only by specific フォント. Please change and check フォント.

However, the character inputted into documents is displayed by フォント currently set up by the software used. Another sign may be displayed when a setup of フォント of software differs from a conversion window.


Contact information and system requirements

All of these packages will run on Windows 3.1J, but the memory requirements given here assume that Windows 95J is used. Note that all packages come with only Japanese-language manuals and help files.

PC-Transer/je

From Nova Corporation, phone 03-3351-3356, fax 03-3351-5766; requires 8MB of RAM (12MB recommended) and 30MB of disk space (plus 8MB for each additional technical dictionary); ¥198,000.

JLondon/JE

From Kodensha Corporation, phone 06-628-8880, fax 06-629-3841; requires 12MB of RAM (16MB recommended) and 46MB of disk space (base system only); ¥98,000.

J-E Bank for Windows

From Kamejima Artificial Intelligence Laboratory, phone 03-3798-4838, fax 03-3798-4839; requires 4MB of RAM (8MB recommended) and 7MB of disk space; ¥49,000.

ASTRANSAC

From Toshiba Corporation, Software Products Division, phone 0423-40-6244, fax 0423-40-6010; requires 16MB of RAM and 20MB of disk space (base system only); ¥63,000.


Copyright Computing Japan Magazine


(C) Japanese Language Services Inc. All rights reserved.
