13 `Pandas` module:

Let’s break down the pandas library in simple terms with analogies.

Pandas as a Data Organizer:

Imagine you have a big box of LEGO bricks. Each LEGO brick is like a piece of data. Now, you want to build something cool with these bricks, but managing them in the box can be messy. Here’s where pandas comes in.

Series - The Single LEGO Stack:
- A Series is like a single stack of LEGO bricks. It’s organized and labeled. Each brick (data point) has its place, and you can easily refer to them by their position or label.
```
import pandas as pd

# Creating a Series
temperatures = pd.Series([25, 28, 24, 30, 22], name='Temperature')
```
Just like a stack of LEGO bricks neatly arranged, a Series keeps your data in order.
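
To make "position or label" concrete, here is a minimal sketch of both kinds of access. The day labels passed as `index` are an assumption added for illustration; the Series above uses the default 0..4 index.

```python
import pandas as pd

# Hypothetical day labels, added only to illustrate label-based access
temperatures = pd.Series(
    [25, 28, 24, 30, 22],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    name='Temperature',
)

first_reading = temperatures.iloc[0]   # access by position (first brick in the stack)
wednesday = temperatures.loc['Wed']    # access by label
print(first_reading, wednesday)
```

`iloc` always counts positions from 0, while `loc` looks up whatever labels the index holds.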

DataFrame - The LEGO Structure:
- Now, imagine you want to build something more complex, like a spaceship. A DataFrame is like a structured LEGO creation. It consists of multiple stacks (Series), each representing a specific aspect of your project.
```
import pandas as pd

# Creating a DataFrame
data = {'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
        'Temperature': [25, 28, 24, 30, 22]}
weather_df = pd.DataFrame(data)
```
In this analogy, your spaceship (DataFrame) has different sections (columns) for the day and temperature, and each section is like a well-organized stack of LEGO bricks (Series).
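
As a small sketch of the "each column is a Series" idea, the code below pulls one column out of the same DataFrame and works with it on its own. The variable names are illustrative only.

```python
import pandas as pd

data = {'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
        'Temperature': [25, 28, 24, 30, 22]}
weather_df = pd.DataFrame(data)

# Selecting a single column returns a Series
temperature_column = weather_df['Temperature']
is_series = isinstance(temperature_column, pd.Series)

# Series methods then work on that one column
average_temperature = temperature_column.mean()

# idxmax gives the row label of the largest value; use it to look up the day
hottest_day = weather_df.loc[temperature_column.idxmax(), 'Day']
print(is_series, average_temperature, hottest_day)
```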

Let’s look at another simple example of creating a DataFrame:

```
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame([[1000, 'steve', 86],
                   [1001, 'mathew', 91],
                   [1002, 'jose', 72],
                   [1003, 'patty', 69],
                   [1004, 'vin', 88]],
                  columns=['Regd.No', 'Name', 'Marks%'],
                  index=['ID1', 'ID2', 'ID3', 'ID4', 'ID5'])

# Displaying the DataFrame
print(df)
```

Explanation:

Importing Pandas:
- import pandas as pd: This line imports the pandas library and gives it the alias pd. This alias is commonly used for brevity.
Creating a DataFrame:
- pd.DataFrame(...): This creates a DataFrame. The data is provided as a list of lists, where each inner list represents a row of data.
Data Values:
- The inner lists contain values for ‘Regd.No’ (Registration Number), ‘Name’, and ‘Marks%’ respectively for each student.
Columns:
- columns=['Regd.No', 'Name', 'Marks%']: This specifies the column names for the DataFrame.
Index:
- index=['ID1', 'ID2', 'ID3', 'ID4', 'ID5']: This sets custom index values for the DataFrame. Each index corresponds to a row.
Displaying the DataFrame:
- print(df): This prints the DataFrame to the console.

Resulting DataFrame:

```
     Regd.No    Name  Marks%
ID1     1000   steve      86
ID2     1001  mathew      91
ID3     1002    jose      72
ID4     1003   patty      69
ID5     1004     vin      88
```

So, the DataFrame df is a table with columns ‘Regd.No’, ‘Name’, and ‘Marks%’, and custom row indices ‘ID1’ through ‘ID5’, representing information about students, their registration numbers, names, and marks percentage. Each row corresponds to a different student.
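
One reason to set a custom index is that rows can then be looked up by label. The following is a minimal sketch using the same student table; the 85% cut-off is an arbitrary value chosen for the example.

```python
import pandas as pd

df = pd.DataFrame([[1000, 'steve', 86],
                   [1001, 'mathew', 91],
                   [1002, 'jose', 72],
                   [1003, 'patty', 69],
                   [1004, 'vin', 88]],
                  columns=['Regd.No', 'Name', 'Marks%'],
                  index=['ID1', 'ID2', 'ID3', 'ID4', 'ID5'])

# One whole row, selected by its index label (returned as a Series)
jose_row = df.loc['ID3']

# A single cell: row label first, then column name
jose_marks = df.loc['ID3', 'Marks%']

# Names of students scoring above an arbitrary 85% threshold
top_scorers = df.loc[df['Marks%'] > 85, 'Name'].tolist()
print(jose_marks, top_scorers)
```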

Note: From now on I will use Jupyter Notebook for data analysis, because it is designed for that purpose. To install it, see the Jupyter Notebook website.

Creating a DataFrame and adding some data to it:

```
# Give the pandas module the alias pd, much like giving a person a nickname
import pandas as pd

df = pd.DataFrame([[1000, 'steve', 86],
                   [1001, 'mathew', 91],
                   [1002, 'jose', 72],
                   [1003, 'patty', 69],
                   [1004, 'vin', 88]],
                  columns=['Regd.No', 'Name', 'Marks%'],
                  index=['ID1', 'ID2', 'ID3', 'ID4', 'ID5'])

# In a notebook, a bare name on the last line of a cell displays its value
df
```

Reading the data from a webpage

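```
import pandas as pd

url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'

# read_html returns a list with one DataFrame per table found on the page
# (it needs an HTML parser such as lxml to be installed)
d = pd.read_html(url)

# The second table on the page; its position may shift if the page is edited
d[1]
```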

Now we will use the data above to perform various operations on it.

Regular expressions work on strings, so the DataFrame must first be converted to a string.

```
content = d[1].to_string()
content[:1000]
```
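
To see why the patterns in the scenarios start with `\n\d`, it may help to look at what `to_string()` produces. The sketch below uses a small made-up table so that it runs offline; the `sample` DataFrame and its values are assumptions, not the Wikipedia data.

```python
import re
import pandas as pd

# Hypothetical stand-in for the scraped table, so the example runs offline
sample = pd.DataFrame({
    'Type': ['bool', 'list', 'dict'],
    'Mutability': ['immutable', 'mutable', 'mutable'],
    'Description': ['Boolean value', 'Ordered sequence', 'Key-value pairs'],
})

# One header line, then one line per row: row index, then padded column values
content = sample.to_string()
print(content)

# Every data row therefore begins with a newline, the row index, and whitespace
type_names = re.findall(r'\n\d+\s+(\w+)', content)
print(type_names)
```

Because the columns are padded with a varying number of spaces, the patterns use `\s{1,}` (one or more whitespace characters) rather than a fixed number of spaces.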

Scenario 1: Return the data types that are mutable

```
import re

content = d[1].to_string()
# Capture the text between the row index and the word "mutable"
pattern = r'\n\d{1,}\s{1,}(.+?)\s{1,}mutable.+'

result = re.findall(pattern, content)
result
```
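
A quick way to sanity-check a pattern like this is to run it on a tiny table where the answer is known. The sketch below applies the same pattern to a hypothetical `sample` table (made-up values, not the scraped data) and compares the result with a plain pandas filter, which is a different technique used here only as a cross-check.

```python
import re
import pandas as pd

# Hypothetical table with a known answer: list and dict are the mutable rows
sample = pd.DataFrame({
    'Type': ['bool', 'list', 'dict'],
    'Mutability': ['immutable', 'mutable', 'mutable'],
    'Description': ['Boolean value', 'Ordered sequence', 'Key-value pairs'],
})
content = sample.to_string()

# Same pattern as Scenario 1; "immutable" is skipped because the pattern
# requires whitespace directly before the word "mutable"
pattern = r'\n\d{1,}\s{1,}(.+?)\s{1,}mutable.+'
regex_result = re.findall(pattern, content)

# Cross-check with a boolean mask on the original DataFrame
mask_result = sample.loc[sample['Mutability'] == 'mutable', 'Type'].tolist()
print(regex_result, mask_result)
```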

Scenario 2: Return the data types whose syntax uses the curly-brace format

```
# Capture the type name on rows that contain a {...} pair
pattern2 = r'\d{1,}\s{1,}(\w+?)\s{1,}.+\{.*\}'
result = re.findall(pattern2, content)
result
```
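
The brace test can be tried on a small made-up table as well. This is only a sketch: the `sample` values are assumptions, and the pattern is a slightly modified variant that adds a leading `\n` so the digits can only match the row index at the start of a line.

```python
import re
import pandas as pd

# Hypothetical rows: only dict and set use curly braces in their syntax
sample = pd.DataFrame({
    'Type': ['list', 'dict', 'set'],
    'Mutability': ['mutable', 'mutable', 'mutable'],
    'Syntax': ['[1, 2]', "{'a': 1}", '{1, 2}'],
})
content = sample.to_string()

# \{ and \} are escaped because bare braces are quantifier syntax in a regex
pattern = r'\n\d+\s+(\w+)\s+.+\{.*\}'
brace_types = re.findall(pattern, content)
print(brace_types)
```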

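Scenario 3: Extract the data type names that are no longer than 4 characters (the pattern matches names of 3 or 4 characters), e.g. int, set, dict, list and so on.

```
# \w{3,4} followed by whitespace only matches names of exactly 3 or 4 characters
pattern3 = r'\n\d{1,}\s{1,}(\w{3,4})\s{1,}.+'
result = re.findall(pattern3, content)
result
```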
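
The length filter works because `\w{3,4}` must be followed immediately by whitespace, so a longer name cannot be matched in part. A minimal sketch on a hypothetical table (made-up rows, not the scraped data):

```python
import re
import pandas as pd

# Hypothetical rows mixing short and long type names
sample = pd.DataFrame({
    'Type': ['int', 'bytearray', 'list', 'frozenset', 'set'],
    'Mutability': ['immutable', 'mutable', 'mutable', 'immutable', 'mutable'],
})
content = sample.to_string()

# Same pattern as Scenario 3
pattern3 = r'\n\d{1,}\s{1,}(\w{3,4})\s{1,}.+'
short_names = re.findall(pattern3, content)
print(short_names)
```

For `bytearray`, the engine can take `byte` or `byt`, but the next character is a letter rather than whitespace, so the row is skipped.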

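Scenario 4: Return the description (up to its first 10 characters) of the rows whose ID is odd

```
# Group 1: a row ID ending in an odd digit; group 2: up to 10 characters of the description
pattern4 = r'\n(\d*[13579])\s{1,}.+?\s{1,}\w+\s{1,}(.{,10}).+'
result = re.findall(pattern4, content)
result
```

Scenario 5: Return all the data types whose syntax examples contain at least one decimal number

```
# Group 1: the type name; group 2: a decimal number such as 3.14 later on the same row
pattern5 = r'\n\d{1,}\s{1,}(\w+).+\s{1,}.+(\d+\.\d+)'
result = re.findall(pattern5, content)
result
```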
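
When a pattern has two capture groups, `re.findall` returns a list of tuples, one tuple per matching row. The sketch below shows this with a simplified odd-ID pattern on a hypothetical table; both the pattern and the `sample` data are assumptions made for illustration, not the ones used on the scraped content.

```python
import re
import pandas as pd

# Hypothetical table; the default index 0..3 plays the role of the row ID
sample = pd.DataFrame({
    'Type': ['bool', 'list', 'dict', 'set'],
    'Mutability': ['immutable', 'mutable', 'mutable', 'mutable'],
})
content = sample.to_string()

# Group 1: a row ID whose last digit is odd; group 2: the type name
pattern = r'\n(\d*[13579])\s+(\w+)'
odd_rows = re.findall(pattern, content)
print(odd_rows)

# The captured IDs are strings, so convert them before doing arithmetic
odd_ids = [int(row_id) for row_id, _ in odd_rows]
```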