List

Software

Apache Tika (content analysis toolkit) - The Apache Tika™ toolkit detects and extracts metadata and structured text content from various document formats (PDF, OpenOffice, Word, …) using existing parser libraries.

Text mining

Text mining

Crawler

Natural Language - Crawler