CIS
Centrum für Informations- und Sprachverarbeitung



End-of-sentence tokenizer (EOS version 2)


This software has been developed in C++ over several programming workshops, starting in the summer semester of 2008.

More information is available in the readme file.

Author: Dr. Max Hadersbeck
Students:

Previous version (SS 2008 and WS 2008/2009):
Susanne Peters, Jonathan Cummins, Daniel Bruder, Michael Mandl

Subsequent version (SS 2009, WS 2010/2011):
Estelle Perez, Susanne Peters; Dino Azzano, Daniel Bruder, Florian Fink, David Kaumanns, Simon Thum

Version eos**2, currently in development (SS 2011):
Estelle Perez; Dino Azzano, Daniel Bruder, Florian Fink, David Kaumanns

If you want to receive the result via e-mail:
E-mail address:

If you just want to see the possible sentence boundaries:

Text language:
German
English
French (beta)
Italian (beta)
Norwegian (beta)
Croatian (beta)

Note: the sentence tokenizer accepts UTF-8 encoded text of up to 5 MB.
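The two input constraints (UTF-8 encoding, at most 5 MB) can be checked before a text is submitted. The following C++ sketch is only an illustration and is not taken from the EOS sources: the names `kMaxBytes`, `is_valid_utf8` and `is_acceptable_input` are hypothetical, and "5 MB" is assumed here to mean 5 * 1024 * 1024 bytes.

```cpp
#include <cstddef>
#include <string>

// Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes.
constexpr std::size_t kMaxBytes = 5u * 1024u * 1024u;

// Returns true if `s` is well-formed UTF-8. Rejects truncated sequences,
// overlong encodings, surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80) { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence runs past the end of the text
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;         // overlong 2-byte form
        if (len == 3 && cp < 0x800) return false;        // overlong 3-byte form
        if (len == 4 && cp < 0x10000) return false;      // overlong 4-byte form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// Combines both constraints stated on this page.
bool is_acceptable_input(const std::string& text) {
    return text.size() <= kMaxBytes && is_valid_utf8(text);
}
```

A caller would read the .txt file into a `std::string` and pass it to `is_acceptable_input` before uploading it.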

Upload the text as a .txt file (up to 5 MB):
File name:

or paste the text into the text field below (up to 5 MB):

Protection against unauthorized use: enter the Init-Number 1357 in the following text field:
Init-Number:

Once you have specified the file name or entered the text, press "Start".
The text will then be processed by the application.
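To give a rough idea of what "possible sentence boundaries" means, here is a deliberately naive C++ sketch. It is not the EOS algorithm, which this page does not describe. It only marks '.', '!' and '?' when followed by whitespace or the end of the text, and skips periods after a token found in an abbreviation list supplied by the caller. The function name `candidate_boundaries` and the abbreviation list are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns the byte offsets just past each candidate sentence-final mark.
// Naive heuristic for illustration only; not the EOS tokenizer's method.
std::vector<std::size_t> candidate_boundaries(
        const std::string& text,
        const std::set<std::string>& abbreviations) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c != '.' && c != '!' && c != '?') continue;

        // The mark must be followed by whitespace or the end of the text.
        const bool at_end = (i + 1 == text.size());
        if (!at_end &&
            !std::isspace(static_cast<unsigned char>(text[i + 1]))) {
            continue;
        }

        if (c == '.') {
            // Find the token directly before the period.
            std::size_t start = i;
            while (start > 0 &&
                   !std::isspace(static_cast<unsigned char>(text[start - 1]))) {
                --start;
            }
            const std::string token = text.substr(start, i - start);
            // A period after a known abbreviation (e.g. "Dr") is skipped.
            if (abbreviations.count(token) > 0) continue;
        }
        result.push_back(i + 1);
    }
    return result;
}
```

Called with the abbreviation set `{"Dr"}`, the function would not report a boundary after "Dr." but would report one after each ordinary sentence-final mark. A real tokenizer needs language-specific abbreviation lists and further rules (ordinals, quotations, ellipses), which is presumably why the page asks for the text language.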