About

Notes on shallow-yet-wide and nested data problems.

Nested transactional data: for example, all the claims filed for one person.
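As a rough illustration of what nested transactional data might look like, the sketch below stores each person with a list of their claims, then derives a flat one-row-per-claim view and a one-row-per-person summary. All field names (person_id, claims, date, amount) and values are made up for the example; they do not come from any real dataset.

```python
# Hypothetical nested records: one entry per person, each holding
# a variable-length list of claims (field names are illustrative).
people = [
    {
        "person_id": 1,
        "claims": [
            {"date": "2023-01-05", "amount": 120.0},
            {"date": "2023-03-11", "amount": 80.0},
        ],
    },
    {
        "person_id": 2,
        "claims": [
            {"date": "2023-02-20", "amount": 45.5},
        ],
    },
]


def flatten(records):
    """Return one flat row per claim, repeating the person_id on each row."""
    return [
        {"person_id": person["person_id"], **claim}
        for person in records
        for claim in person["claims"]
    ]


def summarize(records):
    """Return one row per person with simple aggregates over their claims."""
    return [
        {
            "person_id": person["person_id"],
            "n_claims": len(person["claims"]),
            "total_amount": sum(c["amount"] for c in person["claims"]),
        }
        for person in records
    ]


flat_rows = flatten(people)
summary = summarize(people)
```

The flat view keeps every transaction but repeats the person on each row; the summary keeps one row per person but discards the individual claims. Which shape is appropriate depends on the question being asked.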

Time

Consider a database of retail purchases that records the item bought, the purchaser, and the date and time of purchase. It is easy to construct a model that fits the training set perfectly by using the date and time of purchase to predict the other attributes. Such a model will not generalize at all to new data, because those past times will never occur again.
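The retail example can be sketched in a few lines. The toy "model" below simply memorizes which item was bought at each exact timestamp, so it scores perfectly on the training rows and fails on later purchases. The purchase log is invented for illustration, and a dictionary lookup stands in for any model flexible enough to memorize timestamps.

```python
from datetime import datetime

# Hypothetical purchase log: (timestamp, item bought).
train = [
    (datetime(2023, 1, 5, 9, 30), "milk"),
    (datetime(2023, 1, 5, 17, 45), "bread"),
    (datetime(2023, 1, 6, 12, 10), "milk"),
    (datetime(2023, 1, 7, 8, 5), "eggs"),
]

# Later purchases: same kinds of items, but timestamps never seen before.
test = [
    (datetime(2023, 2, 1, 9, 30), "milk"),
    (datetime(2023, 2, 2, 17, 45), "bread"),
]

# The "model": memorize the item bought at each exact timestamp.
lookup = {ts: item for ts, item in train}


def predict(ts):
    """Return the memorized item, or None for a timestamp not seen in training."""
    return lookup.get(ts)


def accuracy(rows):
    """Fraction of rows whose item is predicted correctly."""
    hits = sum(predict(ts) == item for ts, item in rows)
    return hits / len(rows)


train_accuracy = accuracy(train)  # every training timestamp was memorized
test_accuracy = accuracy(test)    # no test timestamp appears in the lookup

print(f"train accuracy: {train_accuracy:.2f}")
print(f"test accuracy:  {test_accuracy:.2f}")
```

The exact timestamp identifies each training row uniquely, which is why it predicts so well in training and carries nothing forward to new purchases.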

Information from all past experience can be divided into two groups:

  • information that is relevant for the future
  • information that is irrelevant (noise).