Text Mining - (Corpus|Corpora) - Structured Set of Text Documents


About

In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts.

See also: What is a bag of words model? (also known as a bag of tokens in NLP)
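As a minimal sketch of how a corpus relates to the bag of words model, the example below stores a toy corpus as a mapping from document id to text and counts the tokens of each document. The corpus contents, the `bag_of_words` helper and the whitespace tokenization are assumptions made for illustration; real corpora are far larger and usually need a proper tokenizer.

```python
from collections import Counter

# Hypothetical toy corpus: document id -> raw text
corpus = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog sat on the log",
}


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens, ignoring word order."""
    return Counter(text.lower().split())


# One bag of words per document in the corpus
bags = {doc_id: bag_of_words(text) for doc_id, text in corpus.items()}

# The vocabulary is the set of distinct tokens across the whole corpus
vocabulary = sorted(set().union(*bags.values()))

print(bags["doc1"])
print(vocabulary)
```

Keeping the document id next to each text is what makes the set "structured": counts can be computed per document or over the whole corpus.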

Documentation / Reference

Dictionary

Wiktionary

MediaWiki

The XML schema for each dump is defined at the top of the file and is also described in the MediaWiki export help page.
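A dump can be turned into a small corpus by reading the page titles and texts from the XML. The sketch below is an assumption-laden illustration: the `SAMPLE` document is a hypothetical, heavily simplified export (real dumps carry many more elements and attributes), the namespace URI shown is only an example value, and `extract_pages` is a helper name invented here. The namespace is read from the root element at the top of the file rather than hard-coded, since it changes with the schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified export used only for illustration
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10" xml:lang="en">
  <page>
    <title>SQL</title>
    <revision>
      <text>SQL is a query language.</text>
    </revision>
  </page>
</mediawiki>"""


def extract_pages(xml_text):
    """Return a dict mapping each page title to the text of its revision."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes the root tag as "{namespace}mediawiki";
    # reuse whatever namespace the file itself declares.
    namespace = ""
    if root.tag.startswith("{"):
        namespace = root.tag[: root.tag.index("}") + 1]

    pages = {}
    for page in root.iter(f"{namespace}page"):
        title = page.findtext(f"{namespace}title")
        text = page.findtext(f"{namespace}revision/{namespace}text")
        pages[title] = text or ""
    return pages


pages = extract_pages(SAMPLE)
print(pages)
```

For full-size dumps, loading the whole file with `ET.fromstring` is unlikely to be practical; a streaming parser such as `ET.iterparse` would be the more realistic choice.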

MediaWiki API (for wiki bots)

https://en.wikipedia.org/w/api.php?action=query
    &titles=SQL   # the page titles, separated by |
    &format=xml   # the output format
    &prop=description|categories   # the properties to return, separated by |
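The same query URL can be assembled programmatically. The sketch below only builds the URL and sends no request; `build_query_url` and its parameter names are hypothetical helpers, while the endpoint and the `action`, `titles`, `format` and `prop` parameters are taken from the example above.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles, props, fmt="xml"):
    """Build a MediaWiki API query URL for the given titles and properties."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # multiple titles are separated by |
        "format": fmt,
        "prop": "|".join(props),     # multiple properties are separated by |
    }
    # safe="|" keeps the separator readable instead of encoding it as %7C
    return f"{API_ENDPOINT}?{urlencode(params, safe='|')}"


url = build_query_url(["SQL"], ["description", "categories"])
print(url)
```

The resulting string can be passed to any HTTP client; a bot would typically also set a descriptive User-Agent header.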




