# Hive - Sample Clause

The sampling clause lets users query a sample of the data instead of the whole table. Currently, sampling is done on the clustered columns (i.e., the columns specified in the CLUSTERED BY clause of the CREATE TABLE statement).
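To make the link between bucketing and sampling concrete, here is a minimal sketch of how a bucketed table might be declared. The column list is hypothetical (this page does not give the schema of pv_gender_sum); only the table name, the userid column, and the bucket count of 32 come from the examples on this page.

```sql
-- Hypothetical schema: the columns are assumptions for illustration only.
CREATE TABLE pv_gender_sum (
  userid BIGINT,
  gender STRING
)
-- Rows are distributed into 32 buckets by hashing userid;
-- TABLESAMPLE can then read whole buckets instead of scanning the table.
CLUSTERED BY (userid) INTO 32 BUCKETS;
```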

## 3 - Syntax

The buckets are numbered starting from 0.

In general the TABLESAMPLE syntax looks like:

TABLESAMPLE(BUCKET x OUT OF y)

where:

• y must be a multiple or a divisor of the number of buckets specified when the table was created.
• A bucket is chosen when its number satisfies the following condition:

$$\text{BucketNumber} \bmod y = x$$

For example, the table pv_gender_sum has 32 buckets:

SELECT pv_gender_sum.* FROM pv_gender_sum TABLESAMPLE(BUCKET 3 OUT OF 32);

SELECT pv_gender_sum.* FROM pv_gender_sum TABLESAMPLE(BUCKET 3 OUT OF 16);

SELECT pv_gender_sum.* FROM pv_gender_sum TABLESAMPLE(BUCKET 3 OUT OF 64 ON userid);
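
As a worked check of the selection rule against the first two queries above, assuming the 32 buckets are numbered 0 to 31 as stated in the Syntax section (the symbol b for a bucket number is introduced here for illustration):

$$
\begin{aligned}
y = 32: &\quad \{\, b \in \{0, \dots, 31\} : b \bmod 32 = 3 \,\} = \{3\} \\
y = 16: &\quad \{\, b \in \{0, \dots, 31\} : b \bmod 16 = 3 \,\} = \{3, 19\}
\end{aligned}
$$

So OUT OF 32 reads a single bucket (about 1/32 of the data), while OUT OF 16 reads two buckets (about 1/16). For the third query, y = 64 is a multiple of the bucket count, so the sample cannot be a set of whole buckets. Assuming rows are assigned to buckets by a hash of userid modulo the bucket count, the rows matching the condition with y = 64 all lie inside bucket 3 and make up roughly half of it.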