Cryptography - Hash

> Software Security > Cryptography - Key

1 - About

A hash function is a mathematical function which converts/encrypt a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array.

The hash value is shorter and of fixed-length.

The input data is often called the message, and the output (the hash value or hash) is often called the message digest or simply the digest.

The values returned by a hash function are called :

  • hash values,
  • hash codes,
  • hash sums,
  • hash key,
  • or simply hashes.

It's a one way function (you can't retrieve the input) but the output wil change for a slight change in the input.

Hash functions are used to:

In database application, Hash functions are mostly used to speed up:

such as :


3 - Algo

4 - Library

5 - Property

5.1 - Character Position

Position Independent: CRC16 and CRC32 (cyclical redundancy checksum, 16 bit and 32 bit). The hash of Nico or ociN will be the same. The position of the letter doesn't matter.

5.2 - Cryptographic strength

6 - Collision

An hash collision happens when two different inputs produce the same result. Hash collision are very similar to the Birthday_problem.

6.1 - Rate

If you look at two arbitrary values, the collision probability is only

<MATH> 2^{\frac{n}{2}} = 2^{-128} \text{(for 2 arbitrary value)} </MATH>

What's the probability for the clash for the md5 algorithm

Odds of a hash collision when you know the number of value to hash (Source)


7 - Distributed System

In distributed system, you hash because you don't want to request always the same server. If you hash by time for instance, you can have only one server that serves the most request (ie for the current time)

Distributed hash table (DHT)

7.1 - Regular

Regular Hashing = Round Robin = Modular function

Assign M data keys to N servers assign each key to server = k mod N where:

  • k is the key
  • and N is the number of server

For instance:

  • key 0 → server 0
  • key 1 → server 1
  • key 2 → server 2
  • key 3 → server 0
  • key 4 → server 1
  • ….

If the number of servers increase from N to 2N, every existing key needs to be remapped, relocated.


7.2 - Consistent

In consistent hashing, only a portion of keys need to be transferred to new storage space (a server) when more storage allocated. All data keys do not need to be rehashed like in regular hashing.

When the node D is added, only the data on the orange segment need to be reallocate from the Node A to the Node D.

Each server memorize the locations of other servers in the ring and can forward any incoming request.

8 - Reference / Documentation