Associative Containers and Set

Section 21.4 Associative Containers and Set

A vector is great if you want to access elements by their position (index) in the collection. But what if you want to look up elements in a different way? For example, we might want to just ask a collection “do you have this value?”. Given a list of words, we might want to write something like words.contains("hello"). There is no simple and efficient way to do this in a vector because a vector is not optimized for searching by value.

🔗

Or, perhaps you want to store a mapping from one value to another, such as mapping from a person’s name to their phone number so that it is easy to look up a phone number by providing the person’s name using syntax like phoneNumbers.at("Zoe"). You could make a vector that stored structs with a name and phone number, but then you would have to search through the vector to find the right person.

🔗

The C++ Standard Library provides several categories of containers (or data structures) for storing collections of data, each optimized for different kinds of access and modification patterns. vectors and lists are part of the sequence containers category, which items are stored in a specific order based on how/when they were inserted. Another important category is the associative containers, which are optimized for storing collections of data that can be quickly looked up by a value or key .

🔗

The simplest associative container is the set, which stores a collection of unique values and allows for fast lookup to see if a value is present in the collection. For example, say we want to keep track of a list of the unique words that appear in a document. By creating a set, and then inserting each word into the set as we read the document, we can easily build a collection of all the unique words.

🔗

A set is declared like a vector or queue with the type of data it will store. To make a set of strings we would declare: set<string> words;.

🔗

We can then use .insert(value) to add a value to the set, .erase(value) to remove a value, .contains(value) to check if a value is in the set, and ask for the number of elements in the set using .size().

🔗

A set stores only unique values. If you try to insert a value that is already present in the set, the set will ignore the insertion. (There is a multiset container that allows duplicates if you need that behavior. We will not cover it here.)

🔗

Here is a simple program to build all the unique words in a document using a set. (Rather than actually read a file, we will read from a stringstream and pretend that is a file.)

🔗

Listing 21.4.1.

🔗

#include <iostream>
#include <set>
#include <sstream>
#include <string>
using namespace std;

int main() {
  // Sample text to read words from
  string sampleText = R"(
  apple banana orange apple grape banana pear orange kiwi mango pear peach
  strawberry blueberry raspberry blackberry apple mango
  )";

// Use a stringstream to simulate reading from a file
  stringstream textStream(sampleText);

// Create a set to hold unique words
  set<string> uniqueWords;

// Read words from the stream
  string word;
  while (textStream >> word) {
    // Insert the word into the set
    uniqueWords.insert(word);
  }

cout << "Total unique words: " << uniqueWords.size() << endl;

cout << "Contains the word 'apple': "
       << (uniqueWords.contains("apple") ? "yes" : "no") << endl;
  cout << "Contains the word 'missing': "
       << (uniqueWords.contains("missing") ? "yes" : "no") << endl;

// Display the unique words
  cout << "Unique words found:" << endl;
  for (const auto& w : uniqueWords) {
    cout << w << endl;
  }
}

Note 21.4.1.

To keep things simple, we will are not worrying about punctuation or case sensitivity. A version of this program designed to read a real text file would likely want to turn all words into lowercase so that Hello and hello were treated as the same word. It would also need to take something like world! and remove the exclamation mark so it matched world.

🔗

The program finishes by printing all the unique words found in the document. Note that the words are printed in alphabetical order. This is because the set container automatically keeps its elements in sorted order. This is what allows it to rapidly check for the presence of a value.

🔗

Checkpoint 21.4.1.

🔗

Checkpoint 21.4.2.

Which statements about sets are true?

🔗

If duplicates are inserted, only one copy is stored.
Sets allow us to look up items by index.
Sets store elements in the order they were added.
Sets can tell us their size.

🔗

Checkpoint 21.4.3.

🔗

You have attempted of activities on this page.

🔗

Prev Top Next