Hashing

Section 33.2 Hashing

Hashing is the process of transforming input data into a fixed-size value using a hash function. The output of a hash function is called a hash of the input.

🔗

Arbitrary data is transformed by a hash function into a fixed-size set of bits.🔗 — Figure 33.2.1. A hash function transforms input data into a fixed-size value called a hash.
🔗

We can use such a hash function to calculate indexes for items that we want to store in an array to implement what is known as a hash table. A hash function that took strings and produced an integer value between 0 and 6 could be used to store strings in the array shown below. Given a string like "Sandra Jackson", we compute its hash to decide which cell to store it in (2 in the figure). If later on we need to know if that string is in the table, we can compute its hash again to know which cell to look in.

🔗

The three names "Sandra Jackson", "John Smith", and "Jesus Lopez" are hashed to the values 2, 4, and 1 respectively. Those hash values are used as indexes to store the names in an array.🔗 — Figure 33.2.2. A hash function used to decide on indexes we will use to store different strings.
🔗

Before we look at how to implement a hash function that will work for this purpose, it is important to point out that hashing is used in many different applications in programming and computer science. In addition to being useful for building data structures, hash functions are also extensively used in cryptography. These different applications require different properties from the hash function.

🔗

In cryptography hash functions are used to securely store passwords. If there is a data breach and someone gains access to the stored data, we don’t want them to have access to the actual passwords. So instead of storing a user’s password in plain text, we can store a hash of the password. When the user tries to log in, we hash the password they enter and compare it to the stored hash. Knowing the hash of the password won’t allow someone to actually log in because to do so you need to enter a string that produces the hash, not the hash itself.

🔗

In that context we want a hash function that is impossible (or at least infeasible) to reverse. We don’t want anyone to be able to take a hashed password and figure out what the original password was. We also want a hash function that is resistant to collisions, meaning that it should be very difficult to find two different inputs that produce the same hash. And we likely want a hash function that is computationally expensive to compute, so that it would take a long time for an attacker to try many different inputs to find one that produces a known hash.

🔗

In contrast, when we are using hashing for building a data structure, we do not care about whether anyone can reverse the hash function. We are not trying to securely store data, we are just trying to efficiently store and retrieve data. For this application, we want a hash function that is very fast to compute, and ideally, that distributes values uniformly across the range of possible hash values to minimize the likelihood that two different inputs will produce the same hash (a collision).

🔗

In the next section we will look at how to implement hash functions for building a hash table. Although some of the principles also apply to cryptographic hash functions, they would require additional considerations.

🔗

Checkpoint 33.2.1.

Which of the following are properties we want in a hash function used for cryptographic purposes? Select all that apply.

🔗

Hard to reverse from the hash value to the original input.
Expensive to compute.
Avoids collisions (different inputs producing the same hash value).

🔗

Checkpoint 33.2.2.

Which of the following are properties we want in a hash function used for building a hash table? Select all that apply.

🔗

Hard to reverse from the hash value to the original input.
Expensive to compute.
Avoids collisions (different inputs producing the same hash value).

🔗

You have attempted of activities on this page.

🔗

Prev Top Next