(Natural|Human) Language - Text (Mining|Analytics)

About

Text is a form of unstructured data, also known as structure-later, schema-later or schema-on-read data (see What is Unstructured data?). Typical sources are:

Tweets
Web site comments
Weblogs
Forum comments
…

The source determines the analysis: a tweet is analyzed differently from a long blog post, and a blog comment is analyzed differently from a tweet.

If you want to apply a machine learning method to natural language, you have to pre-process your data first. Depending on your problem, this could mean, for example:

stemming,
lemmatization,
computing n-gram statistics,
computing tf-idf weights.

Training a classifier is the last step.
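The pre-processing steps can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: `toy_stem` is a hypothetical suffix-stripping rule invented for the example (it is not the Porter algorithm), and the sample sentence is made up.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def toy_stem(token):
    """Strip a few common English suffixes (toy rule, NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "The cats are chasing the cat that chased the mice"
stems = [toy_stem(t) for t in tokenize(text)]

# n-gram statistics: how often each pair of adjacent stems occurs.
bigram_counts = Counter(ngrams(stems, 2))
print(stems)
print(bigram_counts.most_common(2))
```

Note that the toy rule leaves "mice" untouched: mapping it to "mouse" is the job of lemmatization, which needs a dictionary or a morphological model rather than suffix stripping.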

Natural language processing is a field of artificial intelligence concerned with the interactions between computers and human languages. Computers can be trained to model a language.

Analysis Types

Categories to extract or analyze, in order of importance:

Topics and Theme

document classification

topic categorization.
Topic Classification (Bayesian Classifier)
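As an illustration of topic classification with a Bayesian classifier, here is a small multinomial naive Bayes sketch with add-one smoothing. The class name `NaiveBayes`, the topics and the training sentences are all made up for the example; a real system would train on a much larger labeled corpus.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # number of documents per topic
        self.word_counts = defaultdict(Counter)  # word counts per topic
        self.vocabulary = set()

    def train(self, tokens, topic):
        self.doc_counts[topic] += 1
        self.word_counts[topic].update(tokens)
        self.vocabulary.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[topic].values())
            # Log prior of the topic plus the log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                count = self.word_counts[topic][token]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic

model = NaiveBayes()
model.train("the match ended with a late goal".split(), "sport")
model.train("the team won the cup final".split(), "sport")
model.train("the bank raised interest rates".split(), "finance")
model.train("shares fell as the market closed".split(), "finance")

print(model.predict("goal in the final match".split()))
print(model.predict("the market and interest rates".split()))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps an unseen word from driving a topic's probability to zero.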

Sentiment

sentiment analysis,
Opinion
Attitudes
Emoticons
Perceptions
Intent
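One simple way to approach sentiment is a lexicon lookup, sketched below. The `LEXICON` dictionary and its scores are hypothetical and far too small for real use; the point is only to show that words and emoticons can be scored the same way.

```python
# Hypothetical toy lexicon: a real one would hold thousands of scored entries.
LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "awful": -2, "hate": -2,
    ":)": 1, ":(": -1,
}

def sentiment_score(text):
    """Sum the lexicon scores of the whitespace-separated tokens."""
    tokens = text.lower().split()
    # Strip sentence punctuation from the token edges; emoticons are unaffected.
    return sum(LEXICON.get(token.strip(".,!?"), 0) for token in tokens)

def sentiment_label(text):
    """Map the summed score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for sample in ("I love this phone, great battery :)",
               "Awful service :(",
               "The parcel arrived today"):
    print(sample, "->", sentiment_label(sample))
```

A lookup like this ignores negation ("not good") and context, which is why trained classifiers are usually preferred for anything beyond a rough signal.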

Named Entity

Named entity (extraction and disambiguation),

people extraction
company extraction
geographic location (Town, …)
author extraction

Concept

Abstract groups of entities. Concept tagging.

Metadata

author extraction
publication date
language detection,
title
headers

Other entities

phone number
part/product
e-mail
street/address
keyword extraction,
quotations extraction,
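Entities with a regular surface form, such as e-mail addresses and phone numbers, are often extracted with regular expressions. The patterns below are deliberately simplified assumptions for illustration; they do not implement full address or numbering-plan validation, and the sample text is made up.

```python
import re

# Simplified patterns (assumptions): they accept common forms only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# An optional "+", then at least eight digit/separator characters,
# starting and ending with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def extract_entities(text):
    """Return the e-mail addresses and phone numbers found in the text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }

sample = "Contact jane.doe@example.com or call +1 555-010-9999 before Friday."
print(extract_entities(sample))
```

Street addresses and part numbers are harder because their format varies by country or vendor, so they usually need dictionaries or trained models in addition to patterns.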

Information Extraction (IE)

Relationships and facts

Relations extraction,
Collapsed dependencies - Online Example
Basic dependencies - Online Example

Others

What is a Full Text Search Engine?
Document Structure
- Part-of-Speech - Online Example
- Part-of-Speech Tagging
- Sentence Detection
Coreference - Online Example

Collapsed CC-processed dependencies - Online Example

web page cleaning,
intent mining,
Clustering
…

Statistics

Text Mining - term frequency – inverse document frequency (tf-idf)
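A tf-idf weight can be computed in a few lines. The sketch below assumes one common variant, where tf is the term count divided by the document length and idf is the natural logarithm of (number of documents / number of documents containing the term). Other variants add smoothing or use a different logarithm base. The three documents are made up.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(docs)

# Print the terms of the first document, highest weight first.
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

A term that appears in every document, such as "the" here, gets a weight of zero, while a term confined to one document gets the highest idf.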

Unstructured Information Management applications

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at.
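The kind of application described above can be sketched as a toy analyzer that turns plain text into entities and relations. Everything here is hypothetical: the `GAZETTEER` entries, the two relation patterns and the `analyze` function are invented for the example and do not correspond to the API of any UIM framework.

```python
import re

# Hypothetical gazetteer: a real application would use large dictionaries
# or trained models instead of a hand-written lookup table.
GAZETTEER = {
    "Alice": "person", "Bob": "person",
    "Acme": "organization", "Globex": "organization",
    "Paris": "place", "Berlin": "place",
}

# Hypothetical surface patterns for the two relations named in the text.
RELATION_PATTERNS = {
    "works-for": re.compile(r"(\w+) works for (\w+)"),
    "located-at": re.compile(r"(\w+) is located in (\w+)"),
}

def analyze(text):
    """Return (entity mentions, relations) found in plain text."""
    entities = [
        (token, GAZETTEER[token])
        for token in re.findall(r"\w+", text)
        if token in GAZETTEER
    ]
    relations = [
        (name, match.group(1), match.group(2))
        for name, pattern in RELATION_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return entities, relations

entities, relations = analyze("Alice works for Acme. Acme is located in Paris.")
print(entities)
print(relations)
```

The output is structured: typed entity mentions and (relation, subject, object) triples that can be loaded into a table or a graph.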

Library

See Natural language processing

Application

Lucene
Apache Unstructured Information Management Architecture (UIMA). The major goal of UIMA is to transform unstructured information into structured information by orchestrating analysis engines that detect entities or relations, and thus to bridge the unstructured and the structured world. UIMA is, by itself, an empty framework.