Pandas is a Python library for analyzing and manipulating tabular data. It brings SQL-like operations such as filtering, grouping, and joining to Python, which makes data analysis more efficient than processing the data with plain Python loops.

Pandas is built on top of NumPy and is written mostly in Python, with performance-critical parts implemented in Cython and C. It is a high-performance, high-level data analysis library whose central data structure, the DataFrame, lets us work with large tabular data sets.
Purpose of Pandas
- Calculating statistics that answer questions about the data, such as the average, median, max, and min of each column
- Finding correlations between columns
- Tracking the distribution of one or more columns
- Visualizing the data with the help of matplotlib, using bar charts, histograms, etc.
- Cleaning and filtering data, including missing or incomplete values, by applying built-in methods or user-defined functions (UDFs)
- Loading tabular data from files or databases into Python so you can work with it
- Exporting the data to a CSV file, another file format, or a database
- Engineering new feature columns for your analysis
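Several of the purposes listed above take only a few lines each. The following is a minimal sketch on a small, made-up dataset (the columns height_cm, weight_kg and team are hypothetical). It computes per-column statistics, a correlation, a value distribution, fills missing values, derives a new column, and exports the result as CSV. Plotting is left out because it needs matplotlib, and the CSV is written to an in-memory buffer instead of a real file path.

```python
import io

import pandas as pd

# Hypothetical sample data; None marks a missing value (stored as NaN)
df = pd.DataFrame({
    "height_cm": [160.0, 172.0, None, 181.0, 168.0],
    "weight_kg": [55.0, 70.0, 64.0, 85.0, None],
    "team": ["red", "blue", "red", "blue", "red"],
})

# Statistics: mean, median, max, and min of each numeric column (NaN is skipped)
stats = df[["height_cm", "weight_kg"]].agg(["mean", "median", "max", "min"])
print(stats)

# Correlation between two columns, using only rows where both values are present
print(df["height_cm"].corr(df["weight_kg"]))

# Distribution: how often each value occurs in a column
print(df["team"].value_counts())

# Cleaning: fill each column's missing values with that column's median
clean = df.fillna({
    "height_cm": df["height_cm"].median(),
    "weight_kg": df["weight_kg"].median(),
})

# Feature engineering: derive a new column from existing ones
clean["bmi"] = clean["weight_kg"] / (clean["height_cm"] / 100) ** 2

# Exporting: write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Swapping io.StringIO() for a file name such as "people.csv" would write the same CSV to disk.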
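Pandas data structures and terms
- Series ➤ One-dimensional labeled array capable of holding data of any type
- DataFrame ➤ Two-dimensional labeled table of rows and columns, similar to a spreadsheet
- Axis ➤ Direction along which an operation is applied: axis = 0 refers to the rows (the index), axis = 1 refers to the columns
- Record ➤ A single row
- dtype ➤ Data type of a Series or of each column in a DataFrame (e.g., int64, float64)
- Time Series ➤ Series indexed by dates or times, like weather readings taken every hour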
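To make these terms concrete, here is a small sketch with illustrative values (the city names, temperatures and the date 2024-01-01 are made up). It builds a Series and a DataFrame, reads the column dtypes, selects one record, shows how the axis argument changes an aggregation, and builds a time-indexed Series.

```python
import pandas as pd

# Series: one-dimensional labeled array (labels here are weekday names)
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temp_c")

# DataFrame: two-dimensional labeled table of rows and columns
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.0, 19.5, 31.2]})

# dtype: every column carries its own data type (temp_c is float64)
print(df.dtypes)

# Record: a single row, selected here by position
first_row = df.iloc[0]
print(first_row["city"])  # Oslo

# Axis: axis=0 aggregates down the rows (one result per column),
# axis=1 aggregates across the columns (one result per row)
nums = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
col_sums = nums.sum(axis=0)  # a: 3, b: 30
row_sums = nums.sum(axis=1)  # row 0: 11, row 1: 22

# Time series: a Series indexed by timestamps, here three hourly readings
hours = pd.date_range("2024-01-01 00:00", periods=3, freq="60min")
hourly = pd.Series([2.1, 1.8, 1.5], index=hours, name="temp_c")
print(hourly)
```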
How to create a DataFrame in Pandas
The example below creates a DataFrame from a dictionary. What is the purpose of the seed() method? It sets the starting state of the random number generator, so the same sequence of random numbers is produced on every run.
import random
import pandas as pd
random.seed(3)  # fix the seed so the same "random" ages are generated on every run; any integer works
names = ["Jess", "Jordan", "Sandy", "Ted", "Barney", "Tyler", "Rebecca"]
ages = [random.randint(18, 35) for _ in names]  # one random age (18 to 35 inclusive) per name
people = {"names": names, "ages": ages}  # dict of column name -> list of values
df = pd.DataFrame.from_dict(people)  # each dict key becomes a column
print(df)
The output:
     names  ages
0     Jess    25
1   Jordan    35
2    Sandy    22
3      Ted    29
4   Barney    33
5    Tyler    20
6  Rebecca    18
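One way to check what seed() provides is to reseed with the same value and compare the results. The sketch below does that with a hypothetical helper, random_ages(), and also shows that for a dict of equal-length lists the plain pd.DataFrame() constructor builds the same frame as DataFrame.from_dict(). It does not reproduce the exact ages printed above, since those depend on the generator.

```python
import random

import pandas as pd

names = ["Jess", "Jordan", "Sandy", "Ted", "Barney", "Tyler", "Rebecca"]


def random_ages(seed):
    # Hypothetical helper: reseeding resets the generator,
    # so the same seed always yields the same sequence of ages
    random.seed(seed)
    return [random.randint(18, 35) for _ in names]


first = random_ages(3)
second = random_ages(3)
print(first == second)  # True

people = {"names": names, "ages": first}
df_from_dict = pd.DataFrame.from_dict(people)
df_ctor = pd.DataFrame(people)  # the constructor accepts the same dict of columns
print(df_from_dict.equals(df_ctor))  # True
```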