2.7. Case Study 2: Where Should We Start Our New Business?¶
Before we begin, make sure that the Google Sheets file you are working on is exactly like the one given at the beginning of this case study. Any changes you made to the Google Sheets while following along in the previous section should be deleted.
2.7.1. Business Score by Region¶
Let’s show the average business score in all countries. We will create a table for all countries in the region. Before we get started, consider the following questions.
What are the unique region names?
How can we compute an average for the countries that are in the same region?
1. Let’s start by first sorting the data by region. Select Column A and then from the menu select Data -> Sort sheet by column A, A -> Z.
2. Use the UNIQUE
function to create a table of the unique region names. In cell B193
use the UNIQUE
function
and find the different names for the regions. The UNIQUE
function returns a list of unique values in a range.
In this case, the values are unique region names.
3. Now that we have the table of regions, use SUMIF
and COUNTIF
to compute the average business score. The
COUNTIF
function counts the number of cells in a range that meets a specific condition. In our case, the range for all
regions is B2:B192
. The condition is the name of the regions in the newly created table of region names.
To get started, create a column next to the region names that contains the number of countries in the region.
4. Now, let’s use the SUMIF
function to create a column that sums the business score for each region.
SUMIF
function sums up values in a range that meet specific criteria. The first parameter will be
the range containing the regions; the second parameter will be the name of a region to match. In this case,
a third parameter is the range of cells containing the business scores. When a row in the region
column matches the given region, the function includes the value from the business score column in the sum.
Q-1: What is the sum of all business scores in Asia?
Lastly, we can sort the business scores to see the regions from the highest score to the lowest. First,
select Column A, and then from the menu select Data -> Sort sheet by column A, A -> Z. Then, add another column to
our table that tells us how many countries are in each region (COUNTIF
). Finally, using MAXIFS
, MINIFS
, MATCH
,
and INDEX
, find the easiest and hardest country to start a business for each region. MAXIFS
and MINIFS
work
like AVERAGEIF
and COUNTIF
, but they allow for more conditions. In our case, we still need only one condition.
2.7.2. Joining Data from Other Sources When Considering Starting a New Business¶
So far, we have only used the original business start-up score spreadsheet for our analysis. Often, one file does not contain all the data we need. So, if we need to look at other factors that affect starting a business in different countries, we can import another spreadsheet that has a lot more data. In this lesson, we will learn how to join data from various sources.
In our business spreadsheet, we have a column that contains the name of each country. It has a bunch of business-related data about each country in other columns. In our countries of the world sheet, we have a column of country names and other information about countries in other columns. The two sheets do not have the countries in the same order. Also, they have most of the same countries, but not all.
The first thing we need to do is import the countries of the world spreadsheet. This has a huge amount of data about each country. For now, we will use the information on this new spreadsheet to give us the country code of each country
When we use VLOOKUP
, our goal is to join these two sheets together, adding
columns to the business start-up spreadsheet using values from the population row in the countries of the
world sheet in the rows where the country names match. For example, in our
business start-up spreadsheet, Ethiopia is on row 170, but in the countries of the world spreadsheet,
it is on row 67. We want to take column A row 67 from
the countries’ spreadsheet and add it to the business start-up spreadsheet on row 170 column Q.
We do this with VLOOKUP
by allowing the function to search for the value in
one cell in another column, and then return the value from a different cell in
the same row but in some other column.
To find the country code of Ethiopia, we would use =VLOOKUP(A170, world_countries!$A$2:$BK$265, 2, FALSE)
.
A170 is the cell containing Ethiopia
world_countries!$A$2:$BK$265
is the range of cells we can search in as well as get values from2 tells Sheets that when we find a match for Ethiopia, we want the value from the same row but in column 2 of our range
To fill in a new column with the country codes, paste the following into N2:
=VLOOKUP(A170, world_countries!$A$2:$BK$265, 2, FALSE)
. Have N2 selected
then double click blue square in the lower right corner of the cell. Google Sheets will automatically
paste the values into the remaining cells until it reaches a black cell in the left column.
Double-check the entire column to make sure that all the data is filled.
As in our previous study case, we will have some rows with a value of #N/A. In this case, In one spreadsheet there is a name “Venezuela, RB” and in the other spreadsheet it has “Venezuela, the Bolivarian Republic of.” You will have to clean up this data and make the names manually match where they don’t already.
Any time you are introducing data from another source, you are likely to run into inconsistencies and missing data.
Now that you have learned how to use VLOOKUP
. Add another column to business start-up spreadsheet.
You can start by either copy/pasting the whole spreadsheet into a new tab in the same spreadsheet or importing the CSV file into a new tab in the same spreadsheet.
Next, we will want to add a column to the business start-up spreadsheet that contains the population for each country.
2.7.3. Summarizing Key Business Data Using Pivot Table¶
A great way to summarize and analyze data is by using a pivot table. A pivot table takes the unique values from some column and make them the titles of a bunch of columns, while summarizing the data for those columns from a number of rows.
For this practice, you should redo the exercise of finding the average population for each region that you did above. You should create a pivot table that uses the unique values for each country code as the row and calculates a number of summary statistics for each.
Pivot tables can be treated like any other part of your spreadsheet. You can use a pivot table to find the country with
the lowest business start-up score. Then using MATCH
and INDEX
, add an additional column
that contains the name of that country.
Lesson Feedback
-
During this lesson I was primarily in my...
- 1. Comfort Zone
- 2. Learning Zone
- 3. Panic Zone
-
Completing this lesson took...
- 1. Very little time
- 2. A reasonable amount of time
- 3. More time than is reasonable
-
Based on my own interests and needs, the things taught in this lesson...
- 1. Don't seem worth learning
- 2. May be worth learning
- 3. Are definitely worth learning
-
For me to master the things taught in this lesson feels...
- 1. Definitely within reach
- 2. Within reach if I try my hardest
- 3. Out of reach no matter how hard I try