Data Mining - Content Analysis and Acquisition

1 - List

1.1 - Software

Apache Tika (content analysis toolkit) - The Apache Tika™ toolkit detects the type of a document and extracts its metadata and structured text content from various formats (PDF, OpenOffice, Word, …) using existing parser libraries.
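
To make the "detects" step concrete, here is a minimal sketch of type detection by leading byte signature, using only the Python standard library. It is not Tika's API or implementation: the `MAGIC` table, `detect_type`, and `_refine_zip` are hypothetical names chosen for illustration, and the table covers only the formats named above. Once a type is known, a toolkit like Tika would hand the document to a matching parser library.

```python
import io
import zipfile

# Hypothetical, deliberately tiny signature table (not Tika's own database).
MAGIC = [
    (b"%PDF-", "application/pdf"),
    # OLE2 compound file, the container used by legacy Word .doc files.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
    # ZIP container, used by OpenDocument and by Word .docx files.
    (b"PK\x03\x04", "application/zip"),
]


def _refine_zip(data: bytes) -> str:
    """Look inside a ZIP container to tell document formats apart."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
            # OpenDocument files store their media type in a 'mimetype' entry.
            if "mimetype" in names:
                return archive.read("mimetype").decode("ascii", "replace").strip()
            # Word .docx files keep the main body in word/document.xml.
            if "word/document.xml" in names:
                return ("application/vnd.openxmlformats-officedocument"
                        ".wordprocessingml.document")
    except zipfile.BadZipFile:
        pass  # Signature matched but the archive is unreadable.
    return "application/zip"


def detect_type(data: bytes) -> str:
    """Return a media type guessed from the first bytes of the content."""
    for signature, mime in MAGIC:
        if data.startswith(signature):
            if mime == "application/zip":
                return _refine_zip(data)
            return mime
    return "application/octet-stream"  # Unknown binary content.
```

For example, `detect_type(open(path, "rb").read())` returns a media type string that a caller could use to choose a parser; `path` here stands for any local file.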

1.2 - Text mining

1.3 - Crawler

data_mining/content.txt · Last modified: 2017/10/26 15:10 by gerardnico