Section 11.6 Input Tokens and Errors

The term token means “the smallest meaningful unit”. What exactly is meaningful depends on the context. If we were looking at 12 34 and thinking of integers, it would make sense to call the tokens 12 and 34. But if we were looking at that same piece of text as individual characters, we would think of it as the tokens 1, 2, a space, 3, and 4.
In C++, the >> operator says “skip past any whitespace, read one token, then stop”. If it is trying to read into an integer and encounters 10 20, it will read the number as 10 and then stop. The 20 is left for the next >>. If we use >> again at that point, it will read over the space at the start and then read the 20.
If we start with the same data - 10 20 - but read into a char with >>, we would only get 1. The 0 20 would be left for future reads.
This means that if our data is separated by whitespace (spaces, tabs, newlines) and we read it using the right data type, we do not have to think about the spacing or how to break up the input. Here is a datafile Numbers2.txt:
Data: Numbers2.txt
Right now it is the same as Numbers.txt, but we will edit it in just a minute. Because it has six integers separated by whitespace, we can simply >> into an integer variable 6 times to consume all the numbers. All of the spaces and even the newlines are automatically skipped over:
Listing 11.6.1.
But what if we try to read a word into an int variable?

Insight 11.6.1.

Once a stream has a failure, it goes into a failure state and refuses to read (or write) any more data. Calling >> on a stream that is in a failure state will not do anything - the variable being read into will get no new data and there will be no visible errors.
To safeguard against that behavior, we can check after each use of >> to see if there was an error and if so stop before trying to use that data:
Listing 11.6.2.
It is up to us to detect failures and decide what to do about them. Here we just print an error message and let the user decide what to do. That will be our usual approach. We will often even omit the check after each input for the sake of readability as new ideas are explained.
Having the program recover from errors while parsing a file can be quite challenging - it is not always clear what should be done. Should you skip that token and try to read the rest? Is a whole line or section of the file now invalid? Do we need to discard other data we have read? What if we were trying to read an int and the next chunk of the file was x12? Do we read the 12? Skip it? There is an almost endless list of things that could be wrong that we could try to worry about and fix once we started down that path.

Checkpoint 11.6.2.

We are reading a file with the data abc 123. We try to read in an integer. What do we get?
  • abc
    That is not a valid integer
  • Nothing
  • An exception is generated
    Input errors are silent! We have to check for them.
  • 123
    There is other data first

Checkpoint 11.6.3.

We are reading a file with the data abc 123. We try to read in a string. What do we get?
  • abc 123
    We stop at whitespaces
  • An exception is generated
    No error here
  • a
    The token continues until we reach whitespace

Checkpoint 11.6.4.

We are reading a file with the data abc 123. We try to read in a char. What do we get?
  • abc
    That is more than one char
  • An exception is generated
    No error here
  • abc 123
    That is more than one char