2 - Range

You can read a file in several parts by using HTTP range requests. A range request asks the server for a portion of a file instead of the entire file (for example, GET FILE X with the header Range: bytes=0-10000).
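As a minimal sketch of what such a request header looks like, the helper below builds the Range header for a given byte interval. The function names are hypothetical and only illustrate the header format. Both ends of an HTTP byte range are inclusive, so a range of 0-10000 covers 10001 bytes.

```python
def range_header(first_byte: int, last_byte: int) -> dict[str, str]:
    """Build a Range header for the inclusive byte interval [first_byte, last_byte]."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("expected 0 <= first_byte <= last_byte")
    # The range unit is "bytes" (plural), followed by first-last.
    return {"Range": f"bytes={first_byte}-{last_byte}"}


def bytes_requested(first_byte: int, last_byte: int) -> int:
    """Number of bytes covered; both ends of the range are inclusive."""
    return last_byte - first_byte + 1


print(range_header(0, 10000))     # {'Range': 'bytes=0-10000'}
print(bytes_requested(0, 10000))  # 10001
```

The resulting dictionary can be passed as the headers argument to most HTTP clients. A server that honors the range normally answers with status 206 (Partial Content).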

EMR uses this technique to read data from Amazon S3. For example, if a single data file on Amazon S3 is about 1 GB and the Amazon S3 split size is 64 MB, Hadoop reads the file by issuing about 16 HTTP range requests in parallel (1 GB / 64 MB = 16).
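The split arithmetic can be sketched as follows. This is an illustration only, not how Hadoop or EMR actually computes splits; the function name is hypothetical, and it assumes binary units (1 GB = 1024 MB), under which a 1 GB file with a 64 MB split size yields 16 byte ranges.

```python
def split_ranges(file_size: int, split_size: int) -> list[tuple[int, int]]:
    """Divide a file into inclusive (first_byte, last_byte) ranges of at most split_size bytes."""
    if file_size < 0 or split_size <= 0:
        raise ValueError("file_size must be >= 0 and split_size must be > 0")
    return [
        (start, min(start + split_size, file_size) - 1)
        for start in range(0, file_size, split_size)
    ]


MB = 1024 * 1024          # assumption: binary megabytes
FILE_SIZE = 1024 * MB     # about 1 GB
SPLIT_SIZE = 64 * MB      # split size from the example above

ranges = split_ranges(FILE_SIZE, SPLIT_SIZE)
print(len(ranges))        # 16
print(ranges[0])          # (0, 67108863)
```

Each tuple maps directly to one range request of the form Range: bytes=first-last, and the requests can be issued independently of one another.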