rparse is a data-driven parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRS). It was developed in the Emmy Noether group of Prof. Dr. Laura Kallmeyer at the University of Tübingen and is currently under active development in the project "Beyond CFG" at the University of Düsseldorf. The development of rparse has been funded by the German Research Foundation (DFG).

See the rparse homepage for more information.

Hierarchical Aligner

The Bottom-up Hierarchical Aligner verifies whether a given word alignment can be generated with a Synchronous Linear Context-Free Rewriting System (SLCFRS) of specified fan-out, including ITG/SCFG. Please refer to the following publication for details:

Miriam Kaeshammer (2013): Synchronous Linear Context-Free Rewriting Systems for Machine Translation. Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-7), NAACL-HLT 2013 Workshop. Atlanta, Georgia, USA.
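The simplest instance of this check is the binary ITG case, in which adjacent blocks are combined in straight or inverted order. For a one-to-one alignment, written as a permutation, this case can be tested with a standard shift-reduce procedure. The sketch below is not the Hierarchical Aligner's own algorithm, which also covers unaligned and multiply aligned words and higher fan-out. The function name `is_itg_permutation` and the input format are assumptions made for illustration.

```python
def is_itg_permutation(perm: list[int]) -> bool:
    """Return True if the one-to-one alignment `perm` can be built by binary
    straight/inverted merges of adjacent blocks (the binary ITG case).

    Hypothetical input format: perm[i] is the target position aligned to
    source position i, and perm is a permutation of 0..n-1.
    """
    # Each stack entry is the (min, max) target span covered by a
    # contiguous source block built so far.
    stack: list[tuple[int, int]] = []
    for target in perm:
        # Shift: a single aligned word pair forms a block on its own.
        stack.append((target, target))
        # Reduce: merge the two topmost blocks while their target spans
        # are adjacent, i.e. together they cover a contiguous target span.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight or inverted
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # The alignment is covered iff everything was merged into one block.
    return len(stack) <= 1
```

For example, `is_itg_permutation([1, 0, 3, 2])` returns `True`. The "inside-out" alignments `[1, 3, 0, 2]` and `[2, 0, 3, 1]` return `False` because no two adjacent blocks in them ever cover a contiguous target span. Each reduce step removes one stack entry, so the test runs in linear time.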

RegAligner - A tool for regularized word alignment

RegAligner is a tool for word alignment, meant as a replacement for GIZA++. It implements the models IBM1-4 and HMM, which can optionally be combined with regularity terms. It is free to use and modify for all kinds of research purposes. A recent version can be found in the Git repository.
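The sketch below shows textbook EM training for IBM Model 1, the simplest of these models. It is not RegAligner's code: it omits the regularity terms and the higher models. The function name `train_ibm1`, the `NULL` token spelling and the corpus format are assumptions made for illustration.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical empty-word token, lets f-words align to nothing


def train_ibm1(corpus, iterations=10):
    """Estimate IBM Model 1 lexical probabilities t[(f, e)] = p(f | e) by EM.

    corpus: list of (f_words, e_words) sentence pairs (lists of tokens).
    """
    f_vocab = {f for f_words, _ in corpus for f in f_words}
    # Start from a uniform distribution over the f vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for f_words, e_words in corpus:
            e_words = [NULL] + list(e_words)
            for f in f_words:
                # E-step: share f's unit count among all e-words in proportion to t
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts into conditional probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

Consider the toy corpus `[(["das", "Haus"], ["the", "house"]), (["das", "Buch"], ["the", "book"]), (["ein", "Buch"], ["a", "book"])]`. On this corpus, a few iterations should make `t[("Haus", "house")]` exceed `t[("das", "house")]`, because EM increasingly attributes `das` to `the`.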

RegAligner was written by Thomas Schoenemann in roughly equal parts in his free time and at Lund University, Sweden. A few refinements were made at the University of Pisa, and future versions will be developed at the University of Düsseldorf.

Discosuite – A parser test suite for German discontinuous structures

Discosuite is a test suite for German discontinuous structures. It is described in

Wolfgang Maier, Miriam Kaeshammer, Peter Baumann, and Sandra Kübler (2014): Discosuite - A parser test suite for German discontinuous structures. Proceedings of the 9th Edition of the Language Resources and Evaluation Conference (LREC'14). Reykjavik, Iceland.

Discosuite is available for download.

Penn Treebank Coordination Annotation

This is an annotation layer for the Penn Treebank that distinguishes coordinating from non-coordinating punctuation tokens. The layer is described in

Wolfgang Maier, Sandra Kübler, Erhard Hinrichs, and Julia Krivanek (2012): Annotating Coordination in the Penn Treebank. Proceedings of The 6th Linguistic Annotation Workshop (The LAW VI). July 2012, Jeju, Korea.

Available through the LDC catalog.