
Section 11.5 Test Your Knowledge

Checkpoint 11.5.1.

The chapter states that in the famous "diapers and beer" example, a supermarket might find that "if diapers are purchased, then beer is also purchased." The measure of how frequently beer is purchased specifically in those transactions that contain diapers is called:
  • Confidence
  • Correct! Confidence is the measure of how often the rule is true when the first item is present. It answers the question, "Among all the times diapers were bought, what percentage of the time was beer also bought?"
  • Support
  • Incorrect. Support measures how frequently the combination (diapers AND beer) appears across ALL transactions in the entire dataset, not just the ones containing diapers.
  • Lift
  • Not quite. Lift is a more advanced metric that measures how much more likely the items are to be purchased together than they would be if the purchases were independent. It indicates how "interesting" or unexpected the rule is.
  • Data preparation
  • No. Data preparation, which involves cleaning and organizing data, is the first step in the data mining process, not a metric for evaluating a rule.
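The three rule metrics in this checkpoint can be written as ratios of transaction counts. The notation here is ours, not the chapter's: D and B are the sets of transactions that contain diapers and beer, N is the total number of transactions, and |X| is the number of transactions in set X. The worked numbers (1,000 transactions, 100 with diapers, 200 with beer, 40 with both) are hypothetical, chosen only to make the arithmetic easy to follow.

```latex
\begin{aligned}
\text{support}(D \Rightarrow B)    &= \frac{|D \cap B|}{N} = \frac{40}{1000} = 0.04 \\
\text{confidence}(D \Rightarrow B) &= \frac{|D \cap B|}{|D|} = \frac{40}{100} = 0.40 \\
\text{lift}(D \Rightarrow B)       &= \frac{\text{confidence}(D \Rightarrow B)}{|B| / N}
                                    = \frac{0.40}{200/1000} = \frac{0.40}{0.20} = 2.0
\end{aligned}
```

A lift of 1 would mean diapers and beer are bought together exactly as often as independence predicts; the hypothetical lift of 2.0 means diaper buyers are twice as likely to buy beer as shoppers in general.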

Checkpoint 11.5.2.

The chapter outlines a four-step process for data mining. Which of these steps does the chapter describe as typically taking the most time?
  • Data preparation
  • That’s right! The chapter explicitly states, "...Step 1 [data preparation] usually takes the most amount of time." This involves organizing, cleaning, and recoding data.
  • Model development
  • No, while model development is described as the most complex and interesting step, the chapter notes that data preparation is usually the most time-consuming.
  • Interpretation of results
  • Incorrect. The chapter describes interpretation as the most important step for the data user, but not the one that takes the most time for the data miner.
  • Exploratory data analysis
  • No, while exploratory analysis is a key step, the chapter identifies data preparation as the one that typically requires the most time.
Chapter Challenge
The arules package contains other data sets, such as Epub, which holds 3975 transactions from the electronic publication platform of the Vienna University of Economics. Load that data set, generate some rules, visualize them, and choose a few interesting ones for further discussion.
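One possible starting point for the challenge is sketched below. It assumes the arules and arulesViz packages are installed; arulesViz is our choice for the visualization step and is not named in this section. The support and confidence thresholds are illustrative guesses that you will probably need to adjust, because in a sparse data set like Epub each individual document appears in only a small fraction of transactions.

```r
# Starting sketch for the Chapter Challenge (assumes arules and arulesViz
# are installed; the thresholds below are illustrative, not recommended).
library(arules)     # transactions, apriori(), inspect(), itemFrequencyPlot()
library(arulesViz)  # plot() methods for association rules

data("Epub")        # load the Epub transactions shipped with arules
summary(Epub)       # transaction count, item count, most frequent items

# Bar chart of the 10 most frequently downloaded documents
itemFrequencyPlot(Epub, topN = 10)

# Mine rules; support is set low because individual items are rare in Epub
rules <- apriori(Epub, parameter = list(support = 0.001, confidence = 0.2))
summary(rules)

# Scatter plot of all rules: support vs. confidence, shaded by lift
plot(rules)

# Keep the five rules with the highest lift as candidates for discussion
top_rules <- head(sort(rules, by = "lift"), 5)
inspect(top_rules)

# Network view of the selected rules
plot(top_rules, method = "graph")
```

Sorting by lift rather than by confidence favors rules that are unexpected rather than merely common, which matches the description of lift in Checkpoint 11.5.1. If apriori() returns too many or too few rules, raise or lower the support threshold first.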
Data Mining with Rattle
A company called Togaware has created a graphical user interface (GUI) for R called Rattle. At this writing (working with R version 3.0.0), one of Rattle’s components is out of date and will not work with the latest version of R, particularly on the Mac. It is likely, however, that those involved with the Rattle project will soon update it to be compatible again. Rattle simplifies many of the processes described earlier in the chapter. Try visiting the Togaware site and following its instructions for installing Rattle on your operating system.