5.7. Glossary¶
5.7.1. Definitions¶
Cell: Rectangular boxes containing text or code in a notebook.
Code Cell: A cell in Juypter lab that you can program in. It uses Python3 as its programming language.
Data Frame: Data frames are multidimensional arrays taken from a larger dataset.
They are used to implement specific data operations that may not need the entire dataset.
(In pandas it is called DataFrame)
Explicit Index: Uses the values (numeric or non-numeric) set as the index. For example, if we set a column or row as the index then we can use values in the row or column as indices in different panda methods.
Implicit Index: Uses the location (numeric) of the indices, similar to the python style of indexing.
Index: An Index is a value that represents a position (address) in the DataFrame or series.
Markdown: Markdown is a lightweight markup language that uses a plain text format which is used in programming to edit and present HTML, XHTML, pdf and other file types. Refer to the relevant appendix for more about Markdown.
Series: A series is an array of related data values that share a connecting factor or property.
Text Cell: A cell in Juypter lab that you can write text in. The text is written in a language called Markdown.
5.7.2. Keywords¶
import: Import lets programmers use packages, libraries or modules that have already been programmed.
<DataFrame>[<string>]: return the series corresponding to the given column (<string>).
<DataFrame>[<list of strings>]: returns a given set of columns as a DataFrame.
<DataFrame>[<series/list of Boolean>]: If the index in the given list is True then it returns the row from that same index in the DataFrame.
<DataFrame>.loc[ ]: Uses explicit indexing to return a DataFrame containing those indices and the values associated with them.
<DataFrame>.loc[<string1>:<string2>]: This takes in a range of explicit indices and returns a DataFrame containing those indices and the values associated with them.
<DataFrame>.loc[<string>]: Uses an explicit index and return the row(s) for that index value.
<DataFrame>.loc[<list/series of strings>]: Returns a new DataFrame containing the labels given in the list of strings.
<DataFrame>.iloc[ ]: Uses implicit indexing to return a DataFrame containing those indices and the values associated with them.
<DataFrame>.iloc[<index, range of indices>]: This takes in an implicit index (or a range of implicit indices) and returns a DataFrame containing those
indices and the values associated with them.
<DataFrame>.set_index [<string)>]: Sets an existing column(s) with the <string> name as the index of the DataFrame.
<DataFrame>.head(<numeric>): Returns the first <numeric> element(s). If no parameter (<numeric>) is set then it will return the first five elements.
<pandas>.DataFrame(<data>): Used to create a DataFrame with the given data.
<pandas>.read_csv(): Used to read a csv file into a DataFrame.
<DataFrame>.set_index(<column>): Gets the values of the given column and sets them as indices. The output will be sorted in accending order based on the new indices.
<pandas>.to_numeric(): Converts what is inside the parenthesis into neumeric values.
<series>.str.startswith(<string>): .str.startswith() (in pandas) checks if a series contains a string(s) that starts with the given prarameter (<string>),
and returns a boolean value (True or False).
<data frame>.sort_index(): Sorts the different objects in the DataFrame. By default, the DataFrame is sorted based on the first column in accending order.