Machine Translation Project
Currently available commercial machine translation (MT) systems do not
meet the requirements of a group of potential MT users whose MT needs
are not fixed. The military is a typical example, but various types
of civilian users also fall into this group of "ad-hoc MT users"
(international relief work, medical work, development work, financial
sector, economic advisory work, and so on). The requirements of this
group of potential MT users can be characterized as follows in
operational terms:
-
Much of the translation needed by ad-hoc MT users is domain-specific,
e.g., battle scenario message traffic, training manuals, medical
diagnosis routines, intelligence reports and briefing slides,
domain-specific newswire, and so on.
-
Often, languages for which ad-hoc MT users need high-quality
translation (such as Korean, Arabic, Serbo-Croatian, Ukrainian, Somali,
Haitian Creole, or Indonesian) are not viable languages for
large-scale commercial development of MT.
-
Moreover, ad-hoc MT users can need MT for a specific domain and/or
language at very short notice, in response to crises or opportunities
that can arise without warning anywhere in the world.
-
Finally, unlike in many corporate settings, many ad-hoc MT users need
MT to be available on laptop PCs under rugged conditions.
Thus, what is needed for ad-hoc MT users is not simply a particular MT
system for a particular language and a particular domain, or even a
suite of such tools. Instead, what is needed is an integrated
approach to machine translation that includes a collection of
components, tools and resources, which together meet the requirements
outlined above. More specifically, what is needed is a combination of
the following:
-
Available broad-coverage, acceptable-quality, cross-platform MT
systems for certain key languages.
-
Available domain-specific, high-quality, cross-platform MT systems
for certain key domains and languages.
-
The ability to quickly assemble cross-platform MT systems for new
languages and/or new limited domains, exploiting existing (legacy)
resources as much as possible and using advanced tools to create new
resources where needed.
In this project, CoGenTex, Inc. and its subcontractors, the University
of Pennsylvania and Systran Software, Inc., propose to develop the above
components, tools, and resources, together with a methodology for using
them.
In Phase I, we developed a modular framework with a "plug-and-play"
architecture for assembling MT systems from off-the-shelf components.
The core of the system is a lexicalized, syntax-oriented transfer
component. The representation on which transfer operates also defines
the interface for the other software components, such as parsers and
generators. We used two different parsers from the University of
Pennsylvania and the RealPro generator from CoGenTex.
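The plug-and-play idea can be illustrated with a minimal Python sketch, assuming a simple dependency tree as the shared transfer-level representation. All names here (DepNode, Parser, Generator, LexicalTransfer, the toy parser and generator, and the relation labels "subj" and "obj") are hypothetical; they are not the project's actual interfaces or RealPro's input format. The point is only that a parser, a transfer component, and a generator that agree on one tree type can be combined or replaced independently. The transfer here only swaps lexemes; the word-order difference between the Korean-like input and the English-like output is handled entirely by the generator.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class DepNode:
    """Hypothetical shared representation: a lexeme plus (relation, child) dependents."""
    lexeme: str
    deps: list[tuple[str, DepNode]] = field(default_factory=list)


class Parser(Protocol):
    def parse(self, sentence: str) -> DepNode: ...


class Generator(Protocol):
    def generate(self, tree: DepNode) -> str: ...


class LexicalTransfer:
    """Toy lexicalized transfer: replaces each source lexeme with its target
    equivalent and keeps the tree shape. Real lexico-structural transfer can
    also restructure subtrees; this sketch does not."""

    def __init__(self, lexicon: dict[str, str]) -> None:
        self.lexicon = lexicon

    def transfer(self, tree: DepNode) -> DepNode:
        target = self.lexicon.get(tree.lexeme, tree.lexeme)  # unknown lexemes pass through
        return DepNode(target, [(rel, self.transfer(child)) for rel, child in tree.deps])


class ToySOVParser:
    """Hypothetical stand-in parser for 3-word subject-object-verb input."""

    def parse(self, sentence: str) -> DepNode:
        subj, obj, verb = sentence.split()
        return DepNode(verb, [("subj", DepNode(subj)), ("obj", DepNode(obj))])


class ToySVOGenerator:
    """Hypothetical stand-in generator that linearizes subject-verb-object order."""

    def generate(self, tree: DepNode) -> str:
        args = dict(tree.deps)
        return f"{args['subj'].lexeme} {tree.lexeme} {args['obj'].lexeme}"


def translate(sentence: str, parser: Parser, transfer: LexicalTransfer,
              generator: Generator) -> str:
    """The pipeline depends only on the tree interface, so each stage is swappable."""
    return generator.generate(transfer.transfer(parser.parse(sentence)))


# Yale-romanized Korean glossed directly to English surface forms (no morphology).
lexicon = {"chelswu": "Chelswu", "chayk": "books", "ilkta": "reads"}
print(translate("chelswu chayk ilkta", ToySOVParser(), LexicalTransfer(lexicon), ToySVOGenerator()))
# prints: Chelswu reads books
```

Replacing ToySOVParser with a real parser that produces DepNode trees would leave the transfer and generation stages untouched.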
In Phase II, we will extend this framework with resources and
functionality not yet present, and improve the resources and
functionality already available.
The Phase II effort will produce two types of results.
-
We will produce an extensible plug-and-play translation framework
that comes with trainable tools for transfer lexicon extraction, a
choice of parsers, and a modifiable generation shell.
-
We will produce an operational prototype MT system for high-quality
translation in the battlefield message domain, as well as an
operational prototype broad-coverage MT system of acceptable
quality. Both systems will cover the language pairs
Korean-to-English and English-to-Korean.
-
Subcomponents of these two systems will include stand-alone English
and Korean parsers and generators, as well as associated lexicons and
the bilingual transfer lexicon.
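To make "trainable tools for transfer lexicon extraction" concrete, here is a deliberately simple Python sketch of frequency-based extraction from a word-aligned parallel corpus. It is not the project's method: the publications listed on this page describe inducing lexico-structural transfer rules from parsed bi-texts, whereas this sketch only extracts word-to-word entries. The function name, the corpus format (source tokens, target tokens, alignment index pairs), the min_count threshold, and the example data are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# One training example: (source tokens, target tokens, word-alignment links as (i, j) index pairs).
Example = tuple[list[str], list[str], list[tuple[int, int]]]


def extract_transfer_lexicon(bitext: list[Example], min_count: int = 2) -> dict[str, str]:
    """Count aligned (source, target) lexeme pairs over the corpus and keep, for each
    source lexeme, its most frequent aligned target if it occurs at least min_count times."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for src_tokens, tgt_tokens, links in bitext:
        for i, j in links:
            counts[src_tokens[i]][tgt_tokens[j]] += 1

    lexicon: dict[str, str] = {}
    for src, tgt_counts in counts.items():
        tgt, n = tgt_counts.most_common(1)[0]
        if n >= min_count:  # drop rare, possibly noisy alignments
            lexicon[src] = tgt
    return lexicon


# Hypothetical Yale-romanized Korean-English sentence pairs with hand-made alignments.
corpus = [
    (["chelswu", "chayk", "ilkta"], ["Chelswu", "reads", "books"], [(0, 0), (1, 2), (2, 1)]),
    (["yenghi", "chayk", "ilkta"], ["Yenghi", "reads", "books"], [(0, 0), (1, 2), (2, 1)]),
    (["yenghi", "chayk", "sata"], ["Yenghi", "buys", "a", "book"], [(0, 0), (1, 3), (2, 1)]),
]
print(extract_transfer_lexicon(corpus))
# prints: {'chayk': 'books', 'ilkta': 'reads', 'yenghi': 'Yenghi'}
```

Raising or lowering min_count trades coverage against reliability. Here "chelswu" and "sata" are dropped because each is aligned only once.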
Because of the modular "plug-and-play" architecture of our framework
as developed in Phase I, each of the tasks can be worked on
independently, and the results can be easily integrated into the
framework. The system can easily be upgraded when new and
higher-quality components become available. All implementation work
will be done in C, C++, and/or Java, ensuring cross-platform
compatibility. The principal target platform will be the PC.
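One common way to support upgrading a component without touching the rest of a modular system is a registry that maps a role and a name to a component factory, with a configuration choosing which implementation fills each role. The Python sketch below assumes such a registry; the names REGISTRY, register, and build are hypothetical and do not describe the project's actual mechanism. The two "parsers" are placeholders that merely tokenize.

```python
from typing import Callable

# Hypothetical registry: (role, name) -> factory that builds a component.
REGISTRY: dict[tuple[str, str], Callable[[], object]] = {}


def register(role: str, name: str):
    """Decorator that makes a component class available under a role and a name."""
    def wrap(factory):
        REGISTRY[(role, name)] = factory
        return factory
    return wrap


def build(config: dict[str, str]) -> dict[str, object]:
    """Instantiate one component per role, as named in the configuration."""
    return {role: REGISTRY[(role, name)]() for role, name in config.items()}


@register("parser", "baseline")
class BaselineParser:
    """Placeholder parser: whitespace tokenization only."""

    def parse(self, sentence: str) -> list[str]:
        return sentence.split()


@register("parser", "improved")
class ImprovedParser:
    """Placeholder 'upgraded' parser: also lowercases its input."""

    def parse(self, sentence: str) -> list[str]:
        return sentence.lower().split()


# Upgrading means registering a new component and changing one configuration entry.
components = build({"parser": "improved"})
print(components["parser"].parse("Hello World"))
# prints: ['hello', 'world']
```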
This approach will enable us to achieve:
-
Good-quality, broad-coverage MT, by exploiting the newest natural
language processing technology.
-
High-quality domain-specific MT.
-
Rapid development of new MT systems for new languages and/or
domains.
For more information
Contact .
Papers
Nasr, Alexis; Rambow, Owen; Palmer, Martha; and Rosenzweig, Joseph (1997).
Enriching Lexical Transfer with Cross-Linguistic Semantic Features, or
How to Do Interlingua Without Interlingua.
In Proceedings of the Interlingua Workshop at the MT Summit,
San Diego, CA.
Palmer, Martha; Rambow, Owen; and Nasr, Alexis (1998).
Rapid Prototyping of Domain-Specific Machine Translation Systems.
In Machine Translation and the Information Soup -
Proceedings of the Third Conference of the Association for
Machine Translation in the Americas (AMTA '98), Springer Verlag
(Lecture Notes in Artificial Intelligence No. 1529), Berlin.
Han, Chung-hye; Lavoie, Benoit; Palmer, Martha; Rambow, Owen;
Kittredge, Richard; Korelsky, Tanya; Kim, Nari; and Kim, Myunghee (2000).
Handling Structural Divergences and Recovering Dropped
Arguments in a Korean-English Machine Translation System.
In Proceedings of the Fourth Conference of the Association for
Machine Translation in the Americas (AMTA 2000), Misión Del Sol, Mexico.
Lavoie, Benoit; Kittredge, Richard; Korelsky, Tanya; and Rambow, Owen (2000).
A Framework for MT and Multilingual NLG Systems Based on
Uniform Lexico-Structural Processing.
In Proceedings of ANLP/NAACL 2000, Seattle, Washington.
Lavoie, Benoit; White, Michael; and Korelsky, Tanya (2001).
Inducing Lexico-Structural Transfer Rules from Parsed Bi-texts.
In Proceedings of the ACL 2001 Workshop on Data-driven Machine
Translation, Toulouse, France, pp. 17-24.
Lavoie, Benoit; White, Michael; and Korelsky, Tanya (2002).
Learning Domain-Specific Transfer Rules: An Experiment with
Korean to English Translation.
In Proceedings of the COLING 2002 Workshop on Machine
Translation in Asia, Taipei, Taiwan, pp. 60-66.
(c) 2010 CoGenTex, Inc. All Rights Reserved.