Languages can be broken down into words, syntax, grammar, and idioms. The big name in linguistics and cognitive theory is Noam Chomsky, and it is worth reading a little of his work on the subject. He also writes a fair bit these days on politics, so check the subject matter of an article or book before you lug it home or bookmark it for reading later.
Generally, translating something would involve the following steps:
1. Lexical analysis - also sometimes referred to as "scanning". This is where words, punctuation, sentences, and paragraphs are picked out - if it is a human language. If it is a computer language, then you have statements instead of sentences, and methods or functions instead of paragraphs. You are also concerned with whether a word is a number or an identifier/variable in a computer language. The output of a scanner is generally a stream (or list) of tokens.
2. Parsing. A parser takes in the token stream and outputs a parse tree. If it is for a computer language, one name for this sort of parse tree is an Abstract Syntax Tree (AST, for short). For a human language, you might try to generate an Augmented Transition Network (ATN). The parser, or a following step, will need to perform some "semantic analysis". This is where one determines the meaning, or at least the "type", of a word in context, as opposed to in general. For example, in English the word "swim" can be a noun, as in "I am going out for a swim", or a verb, as in "I like to swim on really hot days" or "See how fast she can swim!".
3. Understanding. If it is a human language, you have to figure out what things really mean. Chomsky-style research on cognitive theory might be helpful here, as well as having a good data structure like an ATN. By this point you should have picked out any idioms and substituted in their intended, as opposed to literal, meaning. Also, the context of the topic and subjects of conversation should be known. Figuring out things like pronouns (he, she, it) and demonstratives (that, those) is difficult without some framework to keep track of them.
4. Translation is the last step. In a computer language, the compiler uses something called a code generator to do this: the code generator walks the AST and emits code, and that code ultimately becomes the new, translated, runnable program. In a human language, you would probably do some word substitution - taking care to get adjectives in the right place (before the noun in English, and typically after the noun in French, for example). Idioms in the target language can also be inserted here - and hopefully the source language's idioms were already removed/substituted out in the understanding step.
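The computer-language side of the four steps above can be sketched in a few lines. This is a toy example for a tiny arithmetic language, not any real compiler - the function names and the tuple-based AST shape are just things I made up for illustration:

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def scan(source):
    """Step 1: lexical analysis - turn text into a list of tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUM", int(number)))
        else:
            tokens.append(("OP", op))
    return tokens

def parse(tokens):
    """Step 2: parsing - build an AST for the grammar 'term (+ term)*'."""
    def term(i):
        kind, value = tokens[i]
        assert kind == "NUM", "expected a number"
        return value, i + 1
    node, i = term(0)
    while i < len(tokens) and tokens[i] == ("OP", "+"):
        right, i = term(i + 1)
        node = ("+", node, right)   # AST node: (operator, left, right)
    return node

def generate(ast):
    """Step 4: translation - walk the AST and emit target 'code'."""
    if isinstance(ast, int):
        return str(ast)
    op, left, right = ast
    return f"({generate(left)} {op} {generate(right)})"

print(scan("1 + 2 + 3"))
print(generate(parse(scan("1 + 2 + 3"))))  # prints ((1 + 2) + 3)
```

Step 3 (understanding) barely exists here - for arithmetic, the "meaning" is all in the tree shape - which is exactly why computer languages are so much easier to translate than human ones.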
I think that is a decent informal description of how translation works - for both computer and human languages. It should be enough to give you a good idea of the series of things that need to be done, the order to do them in, and the terminology involved. From there you can do web searches with Yahoo, Google, or your favorite search engine. If your interest is computer-related, I also recommend searching FOLDOC for the computer terms. Dict.org automatically includes FOLDOC in its searches, so you could simply send your searches there.
For understanding human languages, you need a good computerized dictionary. An excellent one for this is WordNet. Many human language translation/analysis programs that involve the English language use WordNet.
If you are starting to do a lot of research, do yourself a favor and download Firefox 2, then install some useful search add-ons for it. There are ones for FOLDOC, Dict.org, Wikipedia, etc. that are very handy - and, of course, free.
While designing - and debugging - your translator, you might find a diagramming tool very handy. It can help with both organizing/reviewing what you are learning/building - and communicating it to others. There are a lot of free tools for this - such as add-ons for LaTeX, as well as stand-alone tools like Graphviz.
In defining your grammar(s), you might want to look at semantic web tools and document formats - for example Stanford's popular, free semantic web GUI application Protégé, and the W3C's XML-based OWL language for semantic web documents.
These days, now that XML is such a popular, well-supported, and pretty simple format, you might wish to consider modeling your AST (or whatever) as an XML infoset and/or document. Then you could use translation/transformation tools like XSLT on it.
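To make the XML idea concrete, here is a hedged sketch (using Python's standard-library ElementTree) of turning a small tuple-style AST into an XML document that XSLT or XPath tools could then transform. The element names ("add", "num") are invented for this example:

```python
import xml.etree.ElementTree as ET

def ast_to_xml(ast):
    """Convert a tuple AST like ('+', left, right) into XML elements."""
    if isinstance(ast, int):
        elem = ET.Element("num")
        elem.set("value", str(ast))
        return elem
    op, left, right = ast
    elem = ET.Element("add")    # assuming '+' is the only operator here
    elem.append(ast_to_xml(left))
    elem.append(ast_to_xml(right))
    return elem

tree = ast_to_xml(("+", ("+", 1, 2), 3))
print(ET.tostring(tree, encoding="unicode"))  # the whole AST as one XML string
```

Once the tree is in XML form, a stock XSLT stylesheet could rewrite it into a different target notation without you writing any more tree-walking code yourself.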
I would grab some free, open source database and XML tools, if I were you, to hold all your data/information during the design phase - and perhaps for use by your tool itself while it is running.
Human-language translation has been a subject of intense interest for the better part of a century, and a lot of AI research and programming effort has gone into it.
A couple of popular languages for automated human-language translation are SNOBOL and Prolog: SNOBOL is good at pattern matching/substitution, and Prolog is good for defining grammars declaratively, using Definite Clause Grammars (DCGs).
2006-10-29 03:11:55 · answer #1 · answered by John C 5
Hi! I'm fine, thanks. Spanish sounds like a great course - if only I could speak Spanish, lol. I'm stuck with speaking some French and the usual desi languages. Lol. Are you graduating soon? I'm finishing a Master's in Science, the biggest mistake of my life. Lol! I have to write a thesis. And you, how far along are you? Don't answer in Spanish, lol - I won't understand a word! I hope you could understand everything I wrote to you. Lol! Hoping this helps you; I'd be happy to answer if you want help with Spanish ;)
2016-10-16 12:41:48 · answer #2 · answered by lurette 4