A vector is great if you want to access elements by their position (index) in the collection. But what if you want to look up elements in a different way? For example, we might want to just ask a collection βdo you have this value?β. Given a list of words, we might want to write something like words.contains("hello"). There is no simple and efficient way to do this in a vector because a vector is not optimized for searching by value.
Or, perhaps you want to store a mapping from one value to another, such as mapping from a personβs name to their phone number so that it is easy to look up a phone number by providing the personβs name using syntax like phoneNumbers.at("Zoe"). You could make a vector that stored structs with a name and phone number, but then you would have to search through the vector to find the right person.
The C++ Standard Library provides several categories of containers (or data structures) for storing collections of data, each optimized for different kinds of access and modification patterns. vectors and lists are part of the sequence containers category, which items are stored in a specific order based on how/when they were inserted. Another important category is the associative containers, which are optimized for storing collections of data that can be quickly looked up by a value or key .
The simplest associative container is the set, which stores a collection of unique values and allows for fast lookup to see if a value is present in the collection. For example, say we want to keep track of a list of the unique words that appear in a document. By creating a set, and then inserting each word into the set as we read the document, we can easily build a collection of all the unique words.
We can then use .insert(value) to add a value to the set, .erase(value) to remove a value, .contains(value) to check if a value is in the set, and ask for the number of elements in the set using .size().
A set stores only unique values. If you try to insert a value that is already present in the set, the set will ignore the insertion. (There is a multiset container that allows duplicates if you need that behavior. We will not cover it here.)
Here is a simple program to build all the unique words in a document using a set. (Rather than actually read a file, we will read from a stringstream and pretend that is a file.)
To keep things simple, we will are not worrying about punctuation or case sensitivity. A version of this program designed to read a real text file would likely want to turn all words into lowercase so that Hello and hello were treated as the same word. It would also need to take something like world! and remove the exclamation mark so it matched world.
The program finishes by printing all the unique words found in the document. Note that the words are printed in alphabetical order. This is because the set container automatically keeps its elements in sorted order. This is what allows it to rapidly check for the presence of a value.