Section 11.6 Input Tokens and Errors
The term token means "the smallest meaningful unit". What exactly is meaningful depends on the context. If we were looking at 12 34 and thinking of integers, it would make sense to call the tokens 12 and 34. But if we were looking at that same piece of text as individual characters, we would think of it as the tokens 1, 2, a space, 3, and 4.
In C++, the >> operator says "skip past any whitespace, read one token, then stop". If it is trying to read into an integer and encounters 10 20, it will read the number as 10 and then stop. The 20 is left for the next >>. If we use >> again at that point, it will skip over the space at the start and then read the 20.
If we start with the same data, 10 20, but read into a char with >>, we would only get 1. The 0 20 would be left for future reads.
This means that if our data is separated by whitespace (spaces, tabs, newlines) and we read it using the right data type, we do not have to think about the spacing or how to break up the input. Here is a data file, Numbers2.txt:
Data: Numbers2.txt
Right now it is the same as Numbers.txt, but we will edit it in just a minute. Because it has six integers separated by whitespace, we can simply use >> to read into an integer variable 6 times to consume all the numbers. All of the spaces and even the newlines are automatically skipped over:
But what if we try to read a word into an int variable?
Insight 11.6.1.
Once a stream has a failure, it goes into a failure state and refuses to read (or write) any more data. Calling >> on a stream that is in a failure state will not do anything: the variable being read into will get no new data and there will be no visible errors.
To safeguard against that behavior, we can check after each use of >> to see if there was an error and, if so, stop before trying to use that data:
It is up to us to detect failures and decide what to do about them. Here we just print an error message and let the user decide what to do. That will be our usual approach. We will often even omit the check after each input for the sake of readability as new ideas are explained.
Having the program recover from errors while parsing a file can be quite challenging, because it is not always clear what should be done. Should you skip that token and try to read the rest? Is a whole line or section of the file now invalid? Do we need to discard other data we have read? What if we were trying to read an int and the next chunk of the file was x12? Do we read the 12? Skip it? There is an almost infinite number of things that could be wrong that we could try to worry about and fix once we started down that path.