Characters

Section 9.1 Characters

Subsection 9.1.1 `char` data type

C++ has two different data types that are used to store text. The first is the char data type, which stores a single character. The second is the string data type, which stores a sequence of characters. In this chapter we will start with the char data type before learning about strings.

A character value is a single letter or digit or other symbol enclosed in single quotes, like 'a' or '5' or '%'. To store a character value, we need to use the data type char:

Listing 9.1.1.

Exactly one character must be written between the single quotes of a char literal. An empty char '' is illegal, as is trying to place multiple characters in the char, like 'hello'. (We need to use strings for 0 or 2+ characters.)

Subsection 9.1.2 Chars as numbers

The processor in your computer does not really work with characters. At a hardware level, everything is just a number (or to be more precise, a sequence of 1’s and 0’s). So to store and work with something like a character, it needs to be converted to a number.

Each character in C++ has a corresponding number, which is called its ASCII value. For example, the ASCII value for the letter 'A' is 65, and the ASCII value for the letter 'a' is 97. You can see the ASCII values for all characters at the ASCII Table. The standard ASCII table has 128 entries (\(2^7\)), numbered 0 to 127, so 7 bits are enough to represent any ASCII value; extended versions of the table use all 256 values (\(2^8\)) that fit in 8 bits. In C++, a char occupies 1 byte (8 bits on virtually all modern systems) in memory. On most platforms char is a signed type, which means it can store the values -128 to 127.

Note 9.1.1.

For more on ASCII values and representing data in binary, you can refer to the Data Representation chapter from the book Welcome to CS.

Because chars are stored as numbers, it is possible to do math with them. The ASCII value for 'A' is 65. If you add one to that you get 66, which is the ASCII value for 'B'. It is even possible to assign char variables numeric values:

Listing 9.1.2.

Warning 9.1.2.

Just because you can do something does not mean you should. Using numeric values like 67 instead of char literals like 'C' is bad practice. No one reading your code should have to remember what character has the ASCII code 94.

The numeric aspect of chars explains one confusing aspect of working with them. Although you can compare chars using relational operators, the results are not always what you might expect. Examine the following. (Remember 1 is true and 0 is false.)

Listing 9.1.3.

The first two make sense. 'A' is the same as 'A' and it is not the same as 'C'.
The next one also makes sense. Capital 'A' is not the same as lower-case 'a'.
'B' < 'A' being false makes sense. B is not less than A in alphabetical order; it is greater.
The last one is the tricky one. Why is 'B' < 'a' true?

When you compare chars, you are really comparing their ASCII values. So 'B' is less than 'a' because 66 (ASCII for B) is less than 97 (ASCII for a).

As long as you are comparing two upper-case letters or two lower-case letters, it is safe to assume that < or > will do a logical alphabetical order comparison. But when the two characters are of different kinds (say, an upper-case letter and a lower-case letter, or a letter and a digit), the result simply reflects their positions in the ASCII table, which may not match what you expect.

Note 9.1.3.

The ASCII character set covers the letters, digits, and punctuation used in English. But to represent characters from other languages, symbols, and things like emojis, we need a bigger table of characters. Unicode is a standard for representing characters from alphabets like Cyrillic and Greek, from non-alphabetic writing systems like Chinese, and a wide variety of symbols. You can read more about it at the Unicode website (https://unicode.org/).

C++ provides a data type wchar_t (wide character type) that can store characters from larger character sets such as Unicode. We will not cover it in this book, but pretty much anything you can do with a char you can do with a wchar_t. Do a search for “C++ wide character” to learn more if you are interested.

Checkpoint 9.1.1.

Which statements are correct about the char data type?

They can store any value an int can
A char is stored as an integer, but can typically only store values from -128 to 127.
The ASCII codes for upper-case letters are “larger” than those for lower-case letters
No, the ASCII codes for upper-case letters are smaller than those for lower-case letters.
You can write a char literal using either ' or "
You must use ' for a char literal.
A char must always store one character (not 0 or 2+)
Correct!
