(Natural|Human) Language - Text (Mining|Analytics)

1 - About

See Data - Unstructured data

  • Tweets
  • Web site comments
  • Weblogs
  • Forum comments

Each source calls for its own analysis: a tweet is analyzed differently from a long blog post, and a blog comment differently from a tweet.

If you want to use any machine learning method to work with natural language, you have to pre-process your data first. Depending on your problem, this could mean, for example:

  • stemming
  • lemmatization
  • computing n-gram statistics

Training the classifier is the last step.
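The preprocessing steps listed above can be sketched in plain Python as follows. This is a toy illustration under stated assumptions: the suffix-stripping `naive_stem` function is hypothetical and much cruder than a real stemmer such as Porter's, and lemmatization is left out because it needs a dictionary of word forms. The n-gram part simply counts adjacent token pairs with `collections.Counter`.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical toy stemmer: strips the first matching suffix.
# Real stemmers (e.g. Porter's) apply many ordered, conditional rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(token):
    for suffix in SUFFIXES:
        # Keep at least three characters of stem so short words survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples, in order."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

text = "The cats were chasing mice. The cat chased the mouse."
stems = [naive_stem(t) for t in tokenize(text)]
bigram_counts = Counter(ngrams(stems, 2))
print(stems)
print(bigram_counts[("the", "cat")])  # 2
```

Note that `chasing` and `chased` both reduce to the same non-word stem `chas`: stemming only needs to group related forms, whereas lemmatization would return a dictionary form such as `chase`.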

Natural language processing is a field of artificial intelligence concerned with the interactions between computers and human languages. Computers can be trained to model a language.
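As a minimal sketch of what "modelling a language" can mean, the Python below estimates a bigram language model by maximum likelihood: the probability of a word given the previous word is the count of the pair divided by the count of the previous word. The function names and the `<s>`/`</s>` sentence markers are illustrative choices, not a specific library's API, and there is no smoothing, so unseen word pairs get probability zero.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(word | previous word) by maximum likelihood from raw sentences."""
    context_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Mark sentence boundaries so the first and last words get a context too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])
        pair_counts.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, word)] / context_counts[prev]

    return prob

def sentence_probability(prob, sentence):
    """Chain rule with the bigram (Markov) approximation: multiply P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(prev, word)
    return p

prob = train_bigram_model(["the cat sat", "the dog sat", "a cat ran"])
print(prob("the", "cat"))                         # 0.5
print(sentence_probability(prob, "the cat sat"))  # about 0.1667 (2/3 * 1/2 * 1/2 * 1)
```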

3 - Analysis Types

Categories of information to extract or analyze, by order of importance:

3.1 - Topics and Themes

Document classification
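Document classification can be illustrated with a multinomial Naive Bayes classifier, written from scratch below as a minimal sketch. The training documents, the `sport`/`economy` labels and the class name are made up for illustration. Add-one (Laplace) smoothing and log probabilities are standard choices to avoid zero probabilities and numeric underflow; words never seen in training are simply ignored.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # class -> word -> count
        for document, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(document))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Work in log space: log prior + sum of log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(document):
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

docs = [
    "the team won the football match",
    "a great goal in the second half",
    "the central bank raised interest rates",
    "stocks fell as inflation rose",
]
labels = ["sport", "sport", "economy", "economy"]
model = NaiveBayesClassifier().fit(docs, labels)
print(model.predict("the match ended with a late goal"))  # sport
print(model.predict("interest rates and inflation"))      # economy
```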

3.2 - Sentiment

3.3 - Named Entity

Named entity extraction and disambiguation:

  • people extraction
  • company extraction
  • geographic location (Town, …)
  • author extraction

3.4 - Concept

Concepts are abstract groups of entities (concept tagging).

3.5 - Metadata

  • author extraction
  • publication date
  • language detection
  • title
  • headers
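Language detection can be roughly sketched by counting how many tokens of a text appear in per-language stop-word lists. The lists below are tiny hand-picked assumptions for illustration; many practical detectors instead compare character n-gram profiles learned from large corpora, which also work on short texts without stop words.

```python
import re

# Tiny hand-picked stop-word lists: an assumption for illustration only.
STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "it", "that"},
    "french": {"le", "la", "les", "et", "est", "de", "un", "une"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine"},
}

def detect_language(text):
    """Guess the language whose stop words occur most often; None if none match."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(detect_language("The cat is in the garden"))       # english
print(detect_language("Le chat est dans le jardin"))     # french
print(detect_language("Die Katze ist nicht im Garten"))  # german
```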

3.6 - Other entities

  • phone number
  • part/product
  • e-mail
  • street/address
  • keyword extraction
  • quotations extraction

Extracting such entities from text is called Information Extraction (IE).
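Some of these entities, such as e-mail addresses and phone numbers, have a regular surface form, so a first extraction pass can use regular expressions. The patterns below are deliberately simplified assumptions: they do not cover every valid address or international phone format, and the phone pattern can also match other digit runs such as ISO dates (`2017-10-26`).

```python
import re

# Simplified patterns (assumptions): common forms only, not full RFC 5322
# e-mail syntax or every international phone format.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional "+", a digit, at least 7 digits/spaces/brackets/dots/dashes, a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_contacts(text):
    """Return e-mail addresses and phone-like strings found in the text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }

text = "Contact jane.doe@example.com or call +1 (555) 123-4567 before Friday."
print(extract_contacts(text))
# {'email': ['jane.doe@example.com'], 'phone': ['+1 (555) 123-4567']}
```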

3.7 - Relationships and fact

3.8 - Others

  • web page cleaning
  • intent mining
  • clustering
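Web page cleaning, that is keeping only the readable text of a page, can be sketched with the standard library's `html.parser.HTMLParser`. The set of tags treated as block boundaries is an assumption kept short for the example; `<script>` and `<style>` contents are dropped and whitespace is collapsed at the end.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text of a page, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}
    # Tags that separate blocks of text (a short, assumed list).
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append(" ")  # keep adjacent blocks from gluing together

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

def clean_html(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())

page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>!</p>"
    "<script>var x = 1;</script></body></html>"
)
print(clean_html(page))  # Title Hello world!
```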

4 - Statistics

5 - Unstructured Information Management applications

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, and organizations, or relations, such as works-for or located-at.
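To make the works-for and located-at relations concrete, here is a minimal pattern-based sketch. The regular expressions are hypothetical and brittle: a name is just a run of capitalized words, and only the exact phrases "works for" and "is located in" are recognized. More robust systems usually rely on syntactic parsing or trained models.

```python
import re

# Hypothetical patterns: a "name" is a run of capitalized words.
NAME = r"[A-Z]\w*(?:\s+[A-Z]\w*)*"
RELATION_PATTERNS = {
    "works-for": re.compile(rf"(?P<subject>{NAME}) works for (?P<object>{NAME})"),
    "located-at": re.compile(rf"(?P<subject>{NAME}) is located in (?P<object>{NAME})"),
}

def extract_relations(text):
    """Return (subject, relation, object) triples matched by the patterns."""
    triples = []
    for relation, pattern in RELATION_PATTERNS.items():
        for match in pattern.finditer(text):
            triples.append((match.group("subject"), relation, match.group("object")))
    return triples

text = "Alice Smith works for Acme Corp. Acme Corp is located in Boston."
print(extract_relations(text))
# [('Alice Smith', 'works-for', 'Acme Corp'), ('Acme Corp', 'located-at', 'Boston')]
```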

6 - Library

7 - Application

  • Apache Unstructured Information Management Architecture (UIMA). The major goal of UIMA is to transform unstructured information into structured information by orchestrating analysis engines that detect entities or relations, thereby building the bridge between the unstructured and the structured world. UIMA is, by itself, an empty framework.
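UIMA is primarily a Java framework in which analysis engines read and write annotations in a shared structure called the CAS (Common Analysis Structure). The Python below is only an analogy of that orchestration idea, not the UIMA API: the class names and both annotators are hypothetical. The pipeline runs engines in order over one shared document, so the second annotator builds on the tokens produced by the first, while the orchestrator itself performs no analysis, which mirrors the "empty framework" remark.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Token" or "Name"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)

@dataclass
class Document:
    """Shared structure every engine reads and extends (a CAS analogy)."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, annotation):
        return self.text[annotation.begin:annotation.end]

    def select(self, kind):
        # Returns a new list, so engines may append while iterating over it.
        return [a for a in self.annotations if a.kind == kind]

def token_annotator(doc):
    """Hypothetical engine: mark every run of word characters as a Token."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))

def capitalized_name_annotator(doc):
    """Hypothetical engine: tag capitalized Tokens as candidate Names."""
    for token in doc.select("Token"):
        if doc.text[token.begin].isupper():
            doc.annotations.append(Annotation("Name", token.begin, token.end))

def run_pipeline(text, engines):
    """Orchestrate the engines: each one sees the annotations of those before it."""
    doc = Document(text)
    for engine in engines:
        engine(doc)
    return doc

doc = run_pipeline("Alice met Bob in Paris.", [token_annotator, capitalized_name_annotator])
print([doc.covered_text(a) for a in doc.select("Name")])  # ['Alice', 'Bob', 'Paris']
```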

8 - Natural Language

9 - Documentation

natural_language/natural_language.txt · Last modified: 2017/10/26 15:15 by gerardnico