Pandas is a widely used Python library for data analysis. This post covers 12 common Pandas interview questions, with short answers and examples, to help you prepare for your interview.

Pandas Interview Questions

01. What is Pandas?

Pandas is a free, open-source Python library for data manipulation and analysis. It provides two core structures for working with structured data, the Series and the data frame, along with functions for reading, cleaning, transforming, and summarizing that data.

02. What is a data frame in Pandas?

A data frame is a two-dimensional table-like data structure in Pandas. It is similar to a spreadsheet or a SQL table and consists of rows and columns. Each column can contain data of a different data type.
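As a small illustration of the description above, the sketch below builds a data frame from a dictionary and inspects its shape and column types. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [31, 45, 28],
    "active": [True, False, True],
})

# (rows, columns)
shape = df.shape

# Each column carries its own data type
print(df.dtypes)
```

Here the age column holds integers and the active column holds booleans, which shows that different columns can have different types.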

03. What is a Series in Pandas?

A Series is a one-dimensional labeled array in Pandas. It can hold data of any type, such as numbers, strings, and booleans. Each element has a label (together, the labels form the index), which can be used to access the data.
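The following minimal sketch shows label-based and position-based access on a Series. The labels and values are illustrative only.

```python
import pandas as pd

# A Series with custom string labels as its index
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# Access by label
by_label = prices["banana"]

# Access by integer position
by_position = prices.iloc[0]

print(by_label, by_position)
```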

04. How can you read a CSV file using Pandas?

You can use the read_csv() function in Pandas to read a CSV file. For example:

import pandas as pd

df = pd.read_csv('filename.csv')
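Interviewers often follow up by asking about read_csv() options. The sketch below reads CSV text from an in-memory buffer (so it runs without a file on disk) and uses the usecols and dtype parameters to load only some columns with a chosen type. The data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory instead of a file
csv_text = "id,name,score\n1,Ann,9.5\n2,Bob,7.0\n"

# usecols limits the columns that are loaded; dtype sets a column type
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "score"],
    dtype={"id": "int64"},
)

print(df)
```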

05. How can you select a subset of columns in a Pandas data frame?

The simplest way to select a subset of columns is to pass a list of column names in square brackets, as in df[['col1', 'col3']]. You can also use the .loc or .iloc indexer, which additionally let you select rows at the same time. The .loc indexer selects by label, while the .iloc indexer selects by zero-based integer position.

Here’s an example of using the .loc indexer to select specific columns:

import pandas as pd

# Create a data frame
data = {'col1': [1, 2, 3, 4, 5],
        'col2': ['a', 'b', 'c', 'd', 'e'],
        'col3': ['foo', 'bar', 'baz', 'qux', 'quux']}
df = pd.DataFrame(data)

# Select specific columns using .loc
subset_df = df.loc[:, ['col1', 'col3']]
print(subset_df)

Output:

   col1  col3
0     1   foo
1     2   bar
2     3   baz
3     4   qux
4     5  quux

In the above example, we used the .loc indexer with [:, ['col1', 'col3']] to select all rows (:) and the columns with labels 'col1' and 'col3'.

You can also use the .iloc indexer to select columns by integer position. Here’s an example:

import pandas as pd

# Create a data frame
data = {'col1': [1, 2, 3, 4, 5],
        'col2': ['a', 'b', 'c', 'd', 'e'],
        'col3': ['foo', 'bar', 'baz', 'qux', 'quux']}
df = pd.DataFrame(data)

# Select specific columns using .iloc
subset_df = df.iloc[:, [0, 2]]
print(subset_df)

Output:

   col1  col3
0     1   foo
1     2   bar
2     3   baz
3     4   qux
4     5  quux

In the above example, we used the .iloc indexer with [:, [0, 2]] to select all rows (:) and the columns at index positions 0 and 2.
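For completeness, here is a short sketch of plain bracket selection on the same sample data. Passing a list of names returns a data frame, while passing a single name returns a Series.

```python
import pandas as pd

data = {'col1': [1, 2, 3, 4, 5],
        'col2': ['a', 'b', 'c', 'd', 'e'],
        'col3': ['foo', 'bar', 'baz', 'qux', 'quux']}
df = pd.DataFrame(data)

# A list of names returns a data frame with those columns
subset_df = df[['col1', 'col3']]

# A single name returns a Series
single_col = df['col1']

print(type(subset_df).__name__, type(single_col).__name__)
```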

06. How can you filter rows in a Pandas data frame based on a condition?

You can use boolean indexing to filter rows in a Pandas data frame based on a condition. For example:

import pandas as pd

df = pd.read_csv('filename.csv')

# Keep only the rows where the value in 'column' is greater than 10
subset = df[df['column'] > 10]
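A self-contained sketch of boolean indexing follows, including how to combine two conditions. Each condition needs its own parentheses, and the operators are & (and) and | (or) rather than the Python keywords. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "London", "Paris", "Tokyo"],
    "sales": [5, 12, 20, 15],
})

# Single condition: rows with sales above 10
high = df[df["sales"] > 10]

# Two conditions combined with & (both must be true)
paris_high = df[(df["city"] == "Paris") & (df["sales"] > 10)]

print(paris_high)
```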

07. How can you merge two data frames in Pandas?

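You can use the merge() function in Pandas to merge two data frames based on a common column. By default it performs an inner join, keeping only the rows whose key appears in both data frames; the how parameter selects a left, right, or outer join instead. For example: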

import pandas as pd

df1 = pd.read_csv('filename1.csv')
df2 = pd.read_csv('filename2.csv')

# Join the two data frames on the shared 'column' (inner join by default)
merged = pd.merge(df1, df2, on='column')
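To make the join behavior concrete, the sketch below merges two small, hypothetical data frames with the default inner join and with how="left".

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cid"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3, 4],
    "amount": [10, 20, 5, 7],
})

# Inner join (default): only customer_ids present in both frames
inner = pd.merge(customers, orders, on="customer_id")

# Left join: every customer is kept; missing orders become NaN
left = pd.merge(customers, orders, on="customer_id", how="left")

print(inner)
print(left)
```

In this data, customer 2 has no orders, so that row appears only in the left join, with a missing amount.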

08. How can you group data in a Pandas data frame based on a column?

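You can use the groupby() method to group the rows of a data frame by the values in a column. The result is a GroupBy object; you normally follow it with an aggregation such as sum(), mean(), or count() to get one result per group. For example:

import pandas as pd

df = pd.read_csv('filename.csv')

# Group rows by the values in 'column'; nothing is computed until you aggregate
grouped = df.groupby('column')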
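Here is a minimal sketch of grouping followed by aggregation, using made-up team and points columns.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "points": [1, 2, 3, 4, 5],
})

# One total per team
totals = df.groupby("team")["points"].sum()

# Several aggregations at once
summary = df.groupby("team")["points"].agg(["mean", "count"])

print(totals)
print(summary)
```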

09. How can you pivot a Pandas data frame?

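You can use the pivot() method to reshape a data frame from long to wide format: the unique values of one column become the new index, those of another become the new columns, and a third column supplies the cell values. Each index and column pair must be unique; if there are duplicates, use pivot_table() with an aggregation function instead. For example: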

import pandas as pd

df = pd.read_csv('filename.csv')

# Rows come from 'column1', columns from 'column2', cell values from 'value'
pivoted = df.pivot(index='column1', columns='column2', values='value')
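The sketch below pivots a small, hypothetical table of temperatures, then shows pivot_table() averaging a duplicated date and city pair, a case that pivot() would reject with an error.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["Paris", "Tokyo", "Paris", "Tokyo"],
    "temp": [10, 20, 12, 22],
})

# Long to wide: one row per date, one column per city
wide = df.pivot(index="date", columns="city", values="temp")

# Add a second reading for (d1, Paris) so the pair is no longer unique
extra = pd.DataFrame({"date": ["d1"], "city": ["Paris"], "temp": [14]})
dupes = pd.concat([df, extra], ignore_index=True)

# pivot_table aggregates duplicates instead of raising an error
averaged = dupes.pivot_table(
    index="date", columns="city", values="temp", aggfunc="mean"
)

print(wide)
print(averaged)
```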

10. How can you handle missing values in a Pandas data frame?

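You can use the fillna() method to replace missing values in a data frame with a specified value. Related methods are isna(), which flags missing values, and dropna(), which removes the rows or columns that contain them. For example:

import pandas as pd

df = pd.read_csv('filename.csv')

# Replace every missing value with 0; fillna() returns a new data frame
df = df.fillna(0)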
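As a fuller sketch, the example below counts missing values, fills them with a different value per column, and drops incomplete rows. The data and the fill values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, "z"],
})

# Count missing values in each column
missing_per_column = df.isna().sum()

# Fill each column with its own replacement value
filled = df.fillna({"a": 0.0, "b": "unknown"})

# Alternatively, drop any row that has a missing value
dropped = df.dropna()

print(missing_per_column)
print(filled)
```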

11. How can you plot data in a Pandas data frame?

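You can use the plot() method of a data frame to plot its data. Pandas draws the chart with Matplotlib, so Matplotlib must be installed. For example:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('filename.csv')

# Draw a line chart of 'column2' against 'column1'
df.plot(x='column1', y='column2', kind='line')

# Display the figure
plt.show()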

12. How can you export a Pandas data frame to a CSV file?

To export a Pandas DataFrame to a CSV file, you can use the to_csv() function in Pandas. Here’s an example:

import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Jane', 'Mark', 'Emma'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)

# Export DataFrame to CSV file
df.to_csv('filename.csv', index=False)

In the above example, we first created a DataFrame with some sample data. Then, we used the to_csv() function to export the DataFrame to a CSV file named “filename.csv”. The index=False argument ensures that the row index is not included in the exported file.

Make sure to replace 'filename.csv' with the desired name and path of the CSV file you want to export to.
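If you call to_csv() without a file path, it returns the CSV content as a string, which is handy for checking the output. The sketch below uses that behavior to do a round trip through read_csv(); the sample data is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["John", "Jane"], "Age": [25, 30]})

# With no path argument, to_csv() returns the CSV text
csv_text = df.to_csv(index=False)

# Read the text back to confirm nothing was lost
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
```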
