Section 1.6 Formal and Natural Languages
Natural languages are the languages that people speak, such as English, Spanish, Korean, and Mandarin Chinese. They were not purposely designed by people (although people have tried to impose some order on them); they evolved naturally.
Formal languages are languages that are designed by people for specific applications. For example, the notation that mathematicians use is a formal language that is particularly good at denoting relationships among numbers and symbols. Chemists use a formal language to represent the chemical structure of molecules. And most importantly:
Programming languages are formal languages that have been designed to express computations.
Formal languages tend to have strict rules about syntax. For example, 3 + 3 = 6 is a syntactically correct mathematical statement, but 3 = + 6 $ is not. H₂O is a syntactically correct chemical name, but ₂Zz is not.
Syntax rules come in two flavors, pertaining to tokens and structure. Tokens are the basic elements of the language, such as words, numbers, and symbols. One of the problems with 3 = + 6 $ is that the symbol $ is not a legal token in mathematics (at least as far as we know). Similarly, ₂Zz is not legal in chemistry because there is no element with the abbreviation Zz.
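The token check described above can be sketched in Python. This is a minimal illustration, not a real mathematical parser: the set of legal tokens (whole numbers and the symbols + - * / =) and the function name find_illegal_tokens are assumptions made for this example, and tokens are assumed to be separated by spaces.

```python
import re

# Assumed token rules for a tiny arithmetic notation (hypothetical, for illustration):
# a legal token is a whole number or one of the symbols + - * / =
LEGAL_TOKEN = re.compile(r"\d+|[+\-*/=]")

def find_illegal_tokens(statement):
    """Return the space-separated pieces of statement that are not legal tokens."""
    return [piece for piece in statement.split()
            if not LEGAL_TOKEN.fullmatch(piece)]

print(find_illegal_tokens("3 + 3 = 6"))   # every piece is a legal token
print(find_illegal_tokens("3 = + 6 $"))   # the $ is reported as illegal
```

This sketch looks only at individual tokens. It says nothing about whether the tokens are arranged in a legal order.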
When you read a word in English, you have to make sure the tokens are correct and appropriate. Humans are quite good at figuring out the tokens despite their many variations, as shown by our ability to read various fonts, bumper stickers, and Internet spellings (for example, α, ∀, and a are all recognizable as the letter a, and the text ’e5c4p3’ can be read as ’escape’).
The second type of syntax rule pertains to the structure of a statement, that is, the way the tokens are arranged. The statement 3 = + 6 $ is structurally illegal because you can’t place a plus sign immediately after an equal sign. Similarly, molecular formulas have to have subscripts after the element symbol, not before.
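Python applies structure rules of its own, and its standard ast module can be used to ask whether a piece of text is arranged legally. The sketch below is only an illustration: the helper name has_legal_structure is an assumption, and Python's rules differ from mathematical notation. Equality is written == in Python, and Python reads + 6 as "positive six", so the structurally illegal example used here is 3 + == 6 instead.

```python
import ast

def has_legal_structure(source):
    """Return True if Python can parse source as a single expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True

print(has_legal_structure("3 + 3 == 6"))   # tokens arranged legally
print(has_legal_structure("3 + == 6"))     # same kinds of tokens, illegal arrangement
```

Both strings are built from tokens that Python recognizes. Only the arrangement differs.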
When you read a sentence in English or a statement in a formal language, you have to figure out what the structure of the sentence is (although in a natural language you do this subconsciously). This process is called parsing. For example, when you hear the sentence, “The other shoe fell”, you understand that the other shoe is the subject and fell is the verb. Once you have parsed a sentence, you can figure out what it means — the semantics of the sentence. Assuming that you know what a shoe is and what it means to fall, you will understand the general implication of this sentence.
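Parsing a formal language can be observed directly in Python, which parses every program before running it. The following sketch uses the standard ast module to name the structural parts of one comparison and then evaluates it to get its meaning. The function name describe_comparison and the example text are assumptions chosen for illustration, and the function only handles a simple comparison.

```python
import ast

def describe_comparison(source):
    """Parse a simple comparison and return the names of its structural parts."""
    tree = ast.parse(source, mode="eval")
    node = tree.body  # the top-level expression
    return {
        "statement": type(node).__name__,
        "left": type(node.left).__name__,
        "operator": type(node.ops[0]).__name__,
        "right": type(node.comparators[0]).__name__,
    }

source = "3 + 3 == 6"
parts = describe_comparison(source)   # structure: what the pieces are
meaning = eval(compile(ast.parse(source, mode="eval"), "<example>", "eval"))  # semantics

print(parts)
print(meaning)
```

Much as "the other shoe" is the subject and "fell" is the verb, the parser identifies 3 + 3 as the left side of the comparison, == as the operator, and 6 as the right side. Only after that structure is known can the statement be evaluated.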
People who grow up speaking a natural language (that is, everyone) often have a hard time adjusting to formal languages like computer programming languages. Although formal and natural languages have many features in common (tokens, structure, syntax, and semantics), there are many differences:
- ambiguity
Natural languages are full of ambiguity, which people deal with by using contextual clues and other information. Formal languages are designed to be nearly or completely unambiguous, which means that any statement has exactly one meaning, regardless of context. For example, in a natural language, someone may be ’tall’ for their age. However, in a formal language like Python, tall = True has one meaning.
- literalness
Formal languages mean exactly what they say; in Python we code height = 192.6. On the other hand, natural languages are full of idiom and metaphor. If someone says, “How’s the weather up there?” we can assume they are implying that the other person is so tall that they must experience different weather at head height.
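The two assignments mentioned above can be put side by side in a short Python sketch. The unit (centimeters) and the cutoff value TALL_CUTOFF_CM are assumptions made for this example. Python has no built-in idea of what "tall" means, so the program has to spell it out.

```python
height = 192.6            # a number and nothing more; centimeters is assumed here
TALL_CUTOFF_CM = 180.0    # hypothetical cutoff: "tall" must be defined exactly

# Unlike "tall for their age", this comparison has exactly one meaning.
tall = height > TALL_CUTOFF_CM

print(tall)
```

Changing the cutoff changes the result, but at any moment the statement means one thing only, regardless of who is reading it.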
Here are some suggestions for reading programs (and other formal languages):
Remember that formal languages are much more dense than natural languages. It takes longer to read them, and little inconsistencies in spelling and punctuation, which you can get away with in natural languages, will make a big difference in a formal language. Practice paying extra attention to all the tokens.
Recognize that structure is very important, and is usually quite consistent, in formal languages. Always start reading a formal language from top to bottom, but make note of indentation (it really matters) and other ways the program’s flow will be modified. Recognize that some instructions will be executed but not others (conditional execution) or that some actions will happen more than once (repetition). Learn to identify these kinds of structures, the algorithm’s parts, and the program’s overall flow. Learn to see, identify, and use common programming structures.
It bears repeating: computers will do only what you tell them to do and nothing more. They do not ’understand’ your intentions. In later chapters we will discuss how to make ’self-documenting’ code that makes your programming intentions plainer to others reading your code, but always be ready to parse the program to understand what the code is actually doing and not what you assume it ought to do.
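Two of the suggestions above, paying attention to spelling and to indentation, can be seen in a small Python sketch. The variable names are assumptions chosen for illustration.

```python
height = 192.6

# Spelling matters: Height (capital H) is a different name from height.
try:
    print(Height)
except NameError as error:
    message = str(error)
    print(message)

# Indentation matters: it decides which lines are repeated by the loop.
inside_count = 0
for _ in range(3):
    inside_count += 1    # indented, so it runs on every pass through the loop

outside_count = 0
for _ in range(3):
    pass                 # the loop body does nothing
outside_count += 1       # not indented, so it runs once, after the loop ends

print(inside_count, outside_count)
```

A human reader would probably guess that Height was meant to be height. Python does not guess, and it treats the two counting lines differently purely because of where they are indented.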
Check your understanding
Checkpoint 1.6.1.
The differences between natural and formal languages include:
natural languages can be parsed while formal languages cannot.
Actually, both kinds of language can be parsed (determining the structure of the sentence), but formal languages can be parsed more easily in software.
ambiguity and literalness.
Both of these can be present in natural languages, but cannot exist in formal languages.
there are no differences between natural and formal languages.
There are several differences between the two but they are also similar.
tokens, structure, syntax, and semantics.
These are the similarities between the two.
Checkpoint 1.6.2.
True or False: Reading a program is like reading other kinds of text.
True
It usually takes longer to read a program because the structure is as important as the content and must be interpreted in smaller pieces for understanding.
False
It usually takes longer to read a program because the structure is as important as the content and must be interpreted in smaller pieces for understanding.