Sunday, July 6, 2014

Marpa or ParZu?

I spent most of May working through my old natural-language tokenizer and adding a vocabulary-driven lexer/lexicon for German, all in preparation for building a Marpa-based German parser. That's looking halfway decent at this point (except that my stemming needs to get much better). Then I decided to do a general search for German parsers, and I found ParZu.

The unusual thing about ParZu, especially among parsers, is that it's fully open source. That is, it has a genuinely free license, not a free-for-academics-only license, and it's hosted on GitHub. I can also try it online. So I fed it some more-or-less hairy sentences from my current translation in progress, and it parsed them perfectly.

So here's the thing. On the one hand, I kind of want to do my own work and come to terms with the hairiness of things myself. On the other hand, being able to parse German by any means would let me jump ahead and maybe start on translation-related tasks directly....

It's a dilemma.

Update 2014-09-26: Maybe not such a dilemma. ParZu is written in Prolog, and I'm just not sure I'm up for that. Honestly, it seems as though it would be easier to write my own parser in Marpa...

This is probably incorrect. But I think I'm going to start finding out this week.