PHP Romanian stemmer class

Read this post in Romanian


At one point I needed a Romanian stemmer for Zend Search Lucene, and since there did not seem to be one written in PHP, I made one.

The page is here. The class is based on the Romanian stemming algorithm developed in Snowball, and I compared its output with the dictionary for that algorithm. Because I tried to make the class work both with and without diacritics, the overall error increased by about 3%, but it stays below 5% for the whole dictionary of 22,570 words.
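To make the comparison above more concrete, here is a minimal sketch of how such an error rate could be measured. It is not the code of the class itself: the function names foldRomanianDiacritics and stemErrorRate are hypothetical, the stemmer is passed in as any callable, and the word/stem pairs in the usage part are made up for illustration rather than taken from the Snowball dictionary.

```php
<?php
// Hypothetical helper: replace Romanian diacritics with plain letters,
// covering both the comma-below forms and the older cedilla forms.
function foldRomanianDiacritics(string $word): string
{
    return strtr($word, [
        'ă' => 'a', 'â' => 'a', 'î' => 'i',
        'ș' => 's', 'ş' => 's', 'ț' => 't', 'ţ' => 't',
        'Ă' => 'A', 'Â' => 'A', 'Î' => 'I',
        'Ș' => 'S', 'Ş' => 'S', 'Ț' => 'T', 'Ţ' => 'T',
    ]);
}

// Hypothetical helper: percentage of words whose stem differs from the
// reference stem. $expected maps each word to its reference stem.
function stemErrorRate(callable $stem, array $expected): float
{
    if (count($expected) === 0) {
        return 0.0;
    }
    $errors = 0;
    foreach ($expected as $word => $referenceStem) {
        // Array keys that look like numbers become integers, so cast back.
        if ($stem((string) $word) !== $referenceStem) {
            $errors++;
        }
    }
    return 100.0 * $errors / count($expected);
}

// Usage with a stand-in "stemmer" that only folds diacritics.
// The pairs below are invented, not real Snowball output.
$pairs = [
    'mâna'  => 'mana',
    'țară'  => 'tara',
    'șapte' => 'sapte',
    'copii' => 'copi',
];
printf("%.1f%%\n", stemErrorRate('foldRomanianDiacritics', $pairs));
```

With a real stemmer, the callable would wrap the class's stemming method, and the pairs would be read from the word list and expected-stem list of the reference dictionary.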

Note that the class file should be opened and saved with an editor set to UTF-8; otherwise the diacritics in the file will be lost.

Enjoy it!