MARIE is an N-gram-based statistical machine translation decoder intended to be helpful to the research community in the field of Statistical Machine Translation. It was developed at the TALP Research Center of the Universitat Politècnica de Catalunya (UPC) by Josep M. Crego as part of his PhD thesis, with the aid of Adrià de Gispert and under the advice of Professor José B. Mariño.
To produce better translations, the decoder can make use of a target language model, a reordering model, a word penalty and any additional translation models, all of which are introduced into the search through a log-linear combination of models.
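As an illustration of what a log-linear combination means in practice, the following minimal Python sketch scores competing hypotheses as a weighted sum of log-domain model scores and keeps the best one. It is not MARIE's actual code: the function names (`log_linear_score`, `best_hypothesis`), the feature names, the weights and the probabilities are all hypothetical and chosen only for the example.

```python
import math


def log_linear_score(features, weights):
    """Weighted sum of log-domain feature values (one value per model)."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(hypotheses, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: log_linear_score(h["features"], weights))


# Hypothetical model weights; in a real system these are tuned on held-out data.
weights = {"tm": 1.0, "lm": 0.8, "reordering": 0.5, "word_penalty": -0.2}

# Two made-up hypotheses with made-up model probabilities.
hypotheses = [
    {
        "text": "the house is small",
        "features": {
            "tm": math.log(0.02),          # translation model
            "lm": math.log(0.01),          # target language model
            "reordering": math.log(0.9),   # reordering model
            "word_penalty": 4,             # number of target words
        },
    },
    {
        "text": "small is the house",
        "features": {
            "tm": math.log(0.02),
            "lm": math.log(0.0005),
            "reordering": math.log(0.3),
            "word_penalty": 4,
        },
    },
]

best = best_hypothesis(hypotheses, weights)
print(best["text"])
```

Because the models are combined in the log domain, adding a further model to this sketch only requires one more feature value per hypothesis and one more weight.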
Tools for building language models are freely available (we recommend the SRI Language Modeling Toolkit). Methods for learning translation models are described in the current research literature on SMT.
The decoder is released with a manual that describes its usage and inner workings. Details of the decoder have also been presented at the following international conference (please cite this paper when referring to the MARIE decoder):
.- fixed a bug appearing when sorting lists within the same group.
.- Implemented the output word graph format.
.- burst of 1gram NULLs penalty model (-lNN)
.- high (>=3) Ngram BM bonus model (-l3gr)
.- fixed a bug appearing when sorting lists within the same group.
.- format/units output using (-format -units) in STDOUT and outfile.UNITS
.- more efficient reading of input files
.- target Tags using (-fTTM, -ltt, -ftags)
.- verbose output file in outfile.VERBOSE
.- added (-ln) units bonus model
.- input reordering graph (-ingraph)
.- input can be read from STDIN
.- to optimize decoding, ngrams are cached (-cache)
.- output model costs in outfile.UNITS for each tuple (-unitscost)
.- models are read before the input files (useful in client/server mode).
.- more detailed help (-h) option.
To unpack the distribution, just type (under Linux): tar xvzf marie-vX.Y.Z.tgz. 21 files will appear:
We would also like to thank the other members of the SMT group in the Signal Theory and Communications Department of the UPC for their comments, suggestions and contributions to the development and testing work: Patrick Lambert, Rafael Banchs, Marta Ruiz and José A. R. Fonollosa.
Send your comments and suggestions to Josep M. Crego.