Data bucketing, also known as data clustering or bucket-based partitioning, divides data into a fixed number of units called buckets, which are roughly equal in size when the keys are well distributed. Unlike partitioning, which creates a separate partition for each distinct value of a column, bucketing applies a hash function to one or more columns and uses the result to assign each row to a bucket. Because rows with the same value in the bucketing columns always land in the same bucket, bucketing can improve query performance by reducing the number of files that must be scanned for filters and joins on those columns.
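
The hash-based assignment described above can be illustrated with a minimal sketch. It is not tied to any particular engine: the function names `bucket_for` and `bucketize`, the `user_id` column, and the choice of CRC32 as the hash are all assumptions made for illustration, and real systems use their own hash functions and file layouts.

```python
import zlib
from collections import defaultdict


def bucket_for(key, num_buckets):
    """Return the bucket index for a key (hypothetical helper).

    CRC32 is used here only because it gives the same result on every run,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, column, num_buckets):
    """Group rows into buckets by hashing the value of one column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


# Example data: 100 rows spread over 10 distinct user_id values.
rows = [{"user_id": i % 10, "amount": i} for i in range(100)]
buckets = bucketize(rows, "user_id", num_buckets=4)

# A lookup on the bucketing column only needs to scan one bucket.
target = 7
candidates = buckets[bucket_for(target, 4)]
matches = [r for r in candidates if r["user_id"] == target]
```

Because the bucket index depends only on the key, a lookup for one `user_id` reads a single bucket instead of the whole dataset, which is the scan reduction the paragraph refers to.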

Also referred to as Data Binning.


References