{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tabular Data in Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tabular data is one of the most common forms of data you'll encounter in data analysis. It consists of rows and columns, much like a table in a spreadsheet or database. This structured format allows data to be easily organized and manipulated. Some key characteristics of tabular data include:\n", "\n", "* Rows: Represent individual records or observations.\n", "* Columns: Represent variables or features that describe the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most common way to represent tabular data in Python, comes from files called Comma Separated Values (CSV). CSVs are of plain text format where each row corresponds to a record, and the columns are separated by commas." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gotta Catch 'Em All!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's consider some examples with a [Pokémon](https://www.pokemon.com/us) dataset. \n", "\n", "The Pokémon franchise (short for \"Pocket Monsters\") was created by Satoshi Tajiri and Ken Sugimori, and first introduced by Nintendo, Game Freak, and Creatures in 1996. It has since become one of the most popular and enduring multimedia franchises in the world, centering around fictional creatures that humans, known as Pokémon Trainers, catch and train to battle each other.\n", "\n", "The core of the franchise revolves around video games, where players explore various regions, capture wild Pokémon, and aim to complete a comprehensive database known as the [Pokédex](https://www.pokemon.com/us/pokedex).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's consider the following dataset, which we can load using the `pd.read_csv()` function: \n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
0BulbasaurGrassPoison318454949656545
1IvysaurGrassPoison405606263808060
2VenusaurGrassPoison52580828310010080
3Venusaur Mega VenusaurGrassPoison6258010012312212080
4CharmanderFireNaN309395243605065
.................................
1210Iron CrownSteelPsychic590907210012210898
1211Terapagos Normal FormNormalNaN450906585658560
1212Terapagos Terastal FormNormalNaN600959511010511085
1213Terapagos Stellar FormNormalNaN70016010511013011085
1214PecharuntPoisonGhost6008888160888888
\n", "

1215 rows × 10 columns

\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "0 Bulbasaur Grass Poison 318 45 49 49 \n", "1 Ivysaur Grass Poison 405 60 62 63 \n", "2 Venusaur Grass Poison 525 80 82 83 \n", "3 Venusaur Mega Venusaur Grass Poison 625 80 100 123 \n", "4 Charmander Fire NaN 309 39 52 43 \n", "... ... ... ... ... ... ... ... \n", "1210 Iron Crown Steel Psychic 590 90 72 100 \n", "1211 Terapagos Normal Form Normal NaN 450 90 65 85 \n", "1212 Terapagos Terastal Form Normal NaN 600 95 95 110 \n", "1213 Terapagos Stellar Form Normal NaN 700 160 105 110 \n", "1214 Pecharunt Poison Ghost 600 88 88 160 \n", "\n", " SP. Atk. SP. Def Speed \n", "0 65 65 45 \n", "1 80 80 60 \n", "2 100 100 80 \n", "3 122 120 80 \n", "4 60 50 65 \n", "... ... ... ... \n", "1210 122 108 98 \n", "1211 65 85 60 \n", "1212 105 110 85 \n", "1213 130 110 85 \n", "1214 88 88 88 \n", "\n", "[1215 rows x 10 columns]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "pokedex = pd.read_csv(\"pokedex.csv\")\n", "pokedex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A dataset in this form is referred to as a DataFrame. This dataset includes information about **each** Pokémon, from the first generation to the ninth generation\n", "\n", "Each row represents one Pokémon. The columns are:\n", "\n", "| **Column Label** | Description |\n", "|--------------------|-----------------------------------------------------|\n", "| `Name` | Pokémon's name |\n", "| `Type 1` | The primary type of the Pokémon, which defines its elemental characteristic. |\n", "| `Type 2` | The secondary type of the Pokémon, if applicable. Some Pokémon have two types |\n", "|`Total` | The sum of all base stats, indicating overall power.|\n", "|`HP` | Hit Points, representing how much damage a Pokémon can take.|\n", "|`Attack` | Physical attack power, affecting damage from physical moves.|\n", "|`Defense` | Ability to resist physical attacks.|\n", "|`SP. Atk.` | Special Attack power, for non-physical moves (e.g., energy-based)|\n", "|`SP. Def.` | Special Defense, resistance to special (non-physical) moves.|\n", "|`Speed` | Determines attack order in battle; higher Speed means attacking first.|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the first Pokémon in the dataset is Bulbasaur, who is a Grass/Poison-type Pokémon with a total stat of 318, including 45 HP, 49 Attack, 49 Defense, 65 Special Attack, 65 Special Defense, and 45 Speed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The first n rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that our `pokedex` DataFrame has 1215 rows, and 10 unique columns. If we only want to see a specific amount of rows, we can do so using `df.head()`, where `df` corresponds to a generic dataframe." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
0BulbasaurGrassPoison318454949656545
1IvysaurGrassPoison405606263808060
2VenusaurGrassPoison52580828310010080
3Venusaur Mega VenusaurGrassPoison6258010012312212080
4CharmanderFireNaN309395243605065
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "0 Bulbasaur Grass Poison 318 45 49 49 \n", "1 Ivysaur Grass Poison 405 60 62 63 \n", "2 Venusaur Grass Poison 525 80 82 83 \n", "3 Venusaur Mega Venusaur Grass Poison 625 80 100 123 \n", "4 Charmander Fire NaN 309 39 52 43 \n", "\n", " SP. Atk. SP. Def Speed \n", "0 65 65 45 \n", "1 80 80 60 \n", "2 100 100 80 \n", "3 122 120 80 \n", "4 60 50 65 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex.head() #Show the first 5 rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, if we don't specify a specific value in `.head()`, the default is 5. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try looking at the first 8 rows, corresponding to the first 8 Pokémon: " ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
0BulbasaurGrassPoison318454949656545
1IvysaurGrassPoison405606263808060
2VenusaurGrassPoison52580828310010080
3Venusaur Mega VenusaurGrassPoison6258010012312212080
4CharmanderFireNaN309395243605065
5CharmeleonFireNaN405586458806580
6CharizardFireFlying53478847810985100
7Charizard Mega Charizard XFireDragon6347813011113085100
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "0 Bulbasaur Grass Poison 318 45 49 49 \n", "1 Ivysaur Grass Poison 405 60 62 63 \n", "2 Venusaur Grass Poison 525 80 82 83 \n", "3 Venusaur Mega Venusaur Grass Poison 625 80 100 123 \n", "4 Charmander Fire NaN 309 39 52 43 \n", "5 Charmeleon Fire NaN 405 58 64 58 \n", "6 Charizard Fire Flying 534 78 84 78 \n", "7 Charizard Mega Charizard X Fire Dragon 634 78 130 111 \n", "\n", " SP. Atk. SP. Def Speed \n", "0 65 65 45 \n", "1 80 80 60 \n", "2 100 100 80 \n", "3 122 120 80 \n", "4 60 50 65 \n", "5 80 65 80 \n", "6 109 85 100 \n", "7 130 85 100 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex.head(8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice as well, that dataframe rows are considered **zero-indexed**, where the first row is at index 0, the second at index 1, and so on and so forth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The last n rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can view the last few rows using `df.tail()`. Again, if we don't specify an argument, the default number of rows is 5" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
1210Iron CrownSteelPsychic590907210012210898
1211Terapagos Normal FormNormalNaN450906585658560
1212Terapagos Terastal FormNormalNaN600959511010511085
1213Terapagos Stellar FormNormalNaN70016010511013011085
1214PecharuntPoisonGhost6008888160888888
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "1210 Iron Crown Steel Psychic 590 90 72 100 \n", "1211 Terapagos Normal Form Normal NaN 450 90 65 85 \n", "1212 Terapagos Terastal Form Normal NaN 600 95 95 110 \n", "1213 Terapagos Stellar Form Normal NaN 700 160 105 110 \n", "1214 Pecharunt Poison Ghost 600 88 88 160 \n", "\n", " SP. Atk. SP. Def Speed \n", "1210 122 108 98 \n", "1211 65 85 60 \n", "1212 105 110 85 \n", "1213 130 110 85 \n", "1214 88 88 88 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex.tail() #last 5 rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to see the last 11 rows, perhaps, we can do: " ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
1204Ogerpon Cornerstone MaskGrassRock55080120846096110
1205ArchaludonSteelDragon600901051301256585
1206HydrappleGrassDragon540106801101208044
1207Gouging FireFireDragon590105115121659391
1208Raging BoltElectricDragon59012573911378975
1209Iron BoulderRockPsychic590901208068108124
1210Iron CrownSteelPsychic590907210012210898
1211Terapagos Normal FormNormalNaN450906585658560
1212Terapagos Terastal FormNormalNaN600959511010511085
1213Terapagos Stellar FormNormalNaN70016010511013011085
1214PecharuntPoisonGhost6008888160888888
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack \\\n", "1204 Ogerpon Cornerstone Mask Grass Rock 550 80 120 \n", "1205 Archaludon Steel Dragon 600 90 105 \n", "1206 Hydrapple Grass Dragon 540 106 80 \n", "1207 Gouging Fire Fire Dragon 590 105 115 \n", "1208 Raging Bolt Electric Dragon 590 125 73 \n", "1209 Iron Boulder Rock Psychic 590 90 120 \n", "1210 Iron Crown Steel Psychic 590 90 72 \n", "1211 Terapagos Normal Form Normal NaN 450 90 65 \n", "1212 Terapagos Terastal Form Normal NaN 600 95 95 \n", "1213 Terapagos Stellar Form Normal NaN 700 160 105 \n", "1214 Pecharunt Poison Ghost 600 88 88 \n", "\n", " Defense SP. Atk. SP. Def Speed \n", "1204 84 60 96 110 \n", "1205 130 125 65 85 \n", "1206 110 120 80 44 \n", "1207 121 65 93 91 \n", "1208 91 137 89 75 \n", "1209 80 68 108 124 \n", "1210 100 122 108 98 \n", "1211 85 65 85 60 \n", "1212 110 105 110 85 \n", "1213 110 130 110 85 \n", "1214 160 88 88 88 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex.tail(11)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Structure\n", "\n", "We can also use `df.info()` to give an overview of the dataset, including the number of entries, column names, data types, and non-null counts." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 1215 entries, 0 to 1214\n", "Data columns (total 10 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Name 1215 non-null object\n", " 1 Type 1 1215 non-null object\n", " 2 Type 2 669 non-null object\n", " 3 Total 1215 non-null int64 \n", " 4 HP 1215 non-null int64 \n", " 5 Attack 1215 non-null int64 \n", " 6 Defense 1215 non-null int64 \n", " 7 SP. Atk. 1215 non-null int64 \n", " 8 SP. Def 1215 non-null int64 \n", " 9 Speed 1215 non-null int64 \n", "dtypes: int64(7), object(3)\n", "memory usage: 95.1+ KB\n" ] } ], "source": [ "pokedex.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that each of the columns, except the `Type 2` column, have 0 null (empty) entries. We do see though, that only 669 of the Pokémon have an entry in the `Type 2` column. This seems to indicate that around half of the Pokémon only have a single type associated with them" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary Statistics\n", "\n", "We can use `df.describe()` to get some summary statistics for each of our columns in our dataset" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TotalHPAttackDefenseSP. Atk.SP. DefSpeed
count1215.0000001215.0000001215.0000001215.0000001215.0000001215.0000001215.000000
mean443.10452771.24444481.15226375.00740773.22469172.44115270.034568
std121.19340626.92781932.03713430.74099932.75715227.57818830.161298
min175.0000001.0000005.0000005.00000010.00000020.0000005.000000
25%332.00000052.00000057.00000052.00000050.00000051.00000045.000000
50%465.00000070.00000080.00000070.00000065.00000070.00000068.000000
75%521.00000085.000000100.00000091.00000095.00000090.00000091.000000
max1125.000000255.000000190.000000250.000000194.000000250.000000200.000000
\n", "
" ], "text/plain": [ " Total HP Attack Defense SP. Atk. \\\n", "count 1215.000000 1215.000000 1215.000000 1215.000000 1215.000000 \n", "mean 443.104527 71.244444 81.152263 75.007407 73.224691 \n", "std 121.193406 26.927819 32.037134 30.740999 32.757152 \n", "min 175.000000 1.000000 5.000000 5.000000 10.000000 \n", "25% 332.000000 52.000000 57.000000 52.000000 50.000000 \n", "50% 465.000000 70.000000 80.000000 70.000000 65.000000 \n", "75% 521.000000 85.000000 100.000000 91.000000 95.000000 \n", "max 1125.000000 255.000000 190.000000 250.000000 194.000000 \n", "\n", " SP. Def Speed \n", "count 1215.000000 1215.000000 \n", "mean 72.441152 70.034568 \n", "std 27.578188 30.161298 \n", "min 20.000000 5.000000 \n", "25% 51.000000 45.000000 \n", "50% 70.000000 68.000000 \n", "75% 90.000000 91.000000 \n", "max 250.000000 200.000000 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that these summary statistics are only calculate for columns which have **numeric** values, where a categorical column like `\"Name\"` is not present" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Listing the columns\n", "\n", "Suppose we had a dataset which had many many columns. If we wanted to see the different column names, we could do so using `df.columns`" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',\n", " 'SP. Atk.', 'SP. Def', 'Speed'],\n", " dtype='object')" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sorting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting is a fundamental operation in data analysis that allows you to organize your data based on specific criteria. In Pandas, the `.sort_values()` function is used to sort a DataFrame by one or more columns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, let's sort the `pokedex` dataset by values in the `Total` column. In order to ensure that the row with the highest `\"Total\"` is at the top, we have to specify the `ascending = False` argument. Otherwise, we would have the Pokémon with the lowest `\"Total\"` at the top" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
1058Eternatus EternamaxPoisonDragon1125255115250125250130
475Rayquaza Mega RayquazaDragonFlying780105180100180100115
201Mewtwo Mega Mewtwo XPsychicFighting780106190100154100130
202Mewtwo Mega Mewtwo YPsychicNaN78010615070194120140
473Groudon Primal GroudonGroundFire7701001801601509090
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "1058 Eternatus Eternamax Poison Dragon 1125 255 115 250 \n", "475 Rayquaza Mega Rayquaza Dragon Flying 780 105 180 100 \n", "201 Mewtwo Mega Mewtwo X Psychic Fighting 780 106 190 100 \n", "202 Mewtwo Mega Mewtwo Y Psychic NaN 780 106 150 70 \n", "473 Groudon Primal Groudon Ground Fire 770 100 180 160 \n", "\n", " SP. Atk. SP. Def Speed \n", "1058 125 250 130 \n", "475 180 100 115 \n", "201 154 100 130 \n", "202 194 120 140 \n", "473 150 90 90 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex_sorted = pokedex.sort_values(by='Total', ascending=False) #highest total at the top\n", "pokedex_sorted.head()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
902Wishiwashi Solo FormWaterNaN175452020252540
245SunkernGrassNaN180303030303030
985BlipbugBugNaN180252020254545
1034SnomIceBug185302535453020
368AzurillNormalFairy190502040204020
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "902 Wishiwashi Solo Form Water NaN 175 45 20 20 \n", "245 Sunkern Grass NaN 180 30 30 30 \n", "985 Blipbug Bug NaN 180 25 20 20 \n", "1034 Snom Ice Bug 185 30 25 35 \n", "368 Azurill Normal Fairy 190 50 20 40 \n", "\n", " SP. Atk. SP. Def Speed \n", "902 25 25 40 \n", "245 30 30 30 \n", "985 25 45 45 \n", "1034 45 30 20 \n", "368 20 40 20 " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex_sorted = pokedex.sort_values(by='Total') #lowest totals at the top\n", "pokedex_sorted.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to sort the Pokémon alphabetically by their names, simply sort by the `\"Name\"` column:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
561AbomasnowGrassIce494909275928560
562Abomasnow Mega AbomasnowGrassIce5949013210513210530
83AbraPsychicNaN3102520151055590
441AbsolDarkNaN4656513060756075
442Absol Mega AbsolDarkNaN565651506011560115
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "561 Abomasnow Grass Ice 494 90 92 75 \n", "562 Abomasnow Mega Abomasnow Grass Ice 594 90 132 105 \n", "83 Abra Psychic NaN 310 25 20 15 \n", "441 Absol Dark NaN 465 65 130 60 \n", "442 Absol Mega Absol Dark NaN 565 65 150 60 \n", "\n", " SP. Atk. SP. Def Speed \n", "561 92 85 60 \n", "562 132 105 30 \n", "83 105 55 90 \n", "441 75 60 75 \n", "442 115 60 115 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex_sorted = pokedex.sort_values(by='Name')\n", "pokedex_sorted.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, notice that when we use `.sort_values` without the `ascending = False` argument, our `\"Name\"` column is sorted by ascending order, where pokemon names that come earliest in the alphabet are at the top of the Dataframe" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameType 1Type 2TotalHPAttackDefenseSP. Atk.SP. DefSpeed
865Zygarde Complete FormeDragonGround708216100121919585
863Zygarde 50% FormeDragonGround600108100121819595
864Zygarde 10% FormeDragonGround48654100716185115
760ZweilousDarkDragon420728570657058
54ZubatPoisonFlying245404535304055
\n", "
" ], "text/plain": [ " Name Type 1 Type 2 Total HP Attack Defense \\\n", "865 Zygarde Complete Forme Dragon Ground 708 216 100 121 \n", "863 Zygarde 50% Forme Dragon Ground 600 108 100 121 \n", "864 Zygarde 10% Forme Dragon Ground 486 54 100 71 \n", "760 Zweilous Dark Dragon 420 72 85 70 \n", "54 Zubat Poison Flying 245 40 45 35 \n", "\n", " SP. Atk. SP. Def Speed \n", "865 91 95 85 \n", "863 81 95 95 \n", "864 61 85 115 \n", "760 65 70 58 \n", "54 30 40 55 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pokedex_sorted = pokedex.sort_values(by='Name', ascending=False)\n", "pokedex_sorted.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can actually see the Pokémon which have names that come latest in the alphabet" ] } ], "metadata": { "kernelspec": { "display_name": "UDUB", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 2 }