These questions were asked in a CitiusTech Data Engineer interview. Below are the questions along with their solutions.


SQL and Pandas Interview Questions

01. How to read CSV files in Pandas?

import pandas as pd
# Assuming the CSV file is named "data.csv" and is in the current directory
df = pd.read_csv("data.csv")
# Displaying the DataFrame
print(df)
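In practice `read_csv` is often called with a few extra parameters. The sketch below is a minimal illustration: it uses an in-memory string via `io.StringIO` in place of the hypothetical `data.csv`, so it runs without a file on disk, and the column names and rows are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "data.csv"
csv_text = """id,name,age,country
1,Asha,34,India
2,Ben,28,UK
3,Chen,41,China
"""

# sep: field delimiter; usecols: load only the listed columns;
# dtype: force a column type instead of letting pandas infer it
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    usecols=["name", "age", "country"],
    dtype={"age": "int64"},
)
print(df.dtypes)
print(df.head())
```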

02. How to create a DataFrame in Pandas?

import pandas as pd
# Build a DataFrame from a dict: keys are column names, values are the column data
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
print(df)
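A DataFrame can also be built row by row from a list of dictionaries, where each dictionary is one row. This small sketch builds the same frame both ways and checks that they match:

```python
import pandas as pd

# Row-oriented: each dict is one row, keys become column names
rows = [
    {"a": "A", "b": 1},
    {"a": "A", "b": 2},
    {"a": "B", "b": 1},
    {"a": "C", "b": 3},
]
df_rows = pd.DataFrame(rows)

# Column-oriented: one list per column
df_cols = pd.DataFrame({"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]})

print(df_rows.equals(df_cols))  # True
```

The row-oriented form is convenient when records arrive one at a time (for example, parsed JSON objects); the column-oriented form is more natural when each column is already a list.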

03. Using Pandas, how do you produce the output below, where Newcol is the sum of b for each value of a?

## input
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}

## expected output
a Newcol
A 3
B 1
C 3

Solution

import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)

# Sum b within each group of a, then turn the group keys back into column "a"
df = df.groupby("a")["b"].sum().reset_index(name="Newcol")
print(df)

Output

   a  Newcol
0  A       3
1  B       1
2  C       3
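The same result can also be obtained with named aggregation, which names the output column inside the `agg` call, while `as_index=False` keeps `a` as a regular column instead of the index. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]})

# Named aggregation: output_column=(input_column, aggregation_function)
result = df.groupby("a", as_index=False).agg(Newcol=("b", "sum"))
print(result)
```

Named aggregation scales better when several summaries are needed at once, e.g. `agg(Newcol=("b", "sum"), Count=("b", "size"))`.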

04. How to add a new column in Pandas with a fixed value or a range of values?

## Adding a fixed value in Newcol
import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
# A scalar is broadcast to every row
df["Newcol"] = 5
print(df)

## Output
   a  b  Newcol
0  A  1       5
1  A  2       5
2  B  1       5
3  C  3       5
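`df["Newcol"] = 5` modifies `df` in place. If you prefer to leave the original untouched, `DataFrame.assign` returns a new DataFrame with the extra column. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]})

# assign returns a new DataFrame with the added column; df itself is unchanged
df_new = df.assign(Newcol=5)
print(df_new)
print("Newcol" in df.columns)  # False
```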

Adding a range of values

import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
start_value = 1
end_value = 4
# range() excludes its stop value, hence end_value + 1; it must yield one value per row
df["Newcol"] = range(start_value, end_value + 1)
print(df)

## Output
   a  b  Newcol
0  A  1       1
1  A  2       2
2  B  1       3
3  C  3       4
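The range must produce exactly one value per row; otherwise pandas raises a `ValueError` about mismatched lengths. A sketch that derives the end of the range from the row count instead of hard-coding it:

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]})

start_value = 1
# Yields start_value, start_value + 1, ..., start_value + len(df) - 1: one value per row
df["Newcol"] = range(start_value, start_value + len(df))
print(df)
```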

05. Write an SQL query to get the average age for each country.

SELECT country, AVG(age) AS avg_age
FROM Customers
GROUP BY country;
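To try the query without a database server, one option is Python's built-in `sqlite3` module with an in-memory database. Only the `Customers` table and its `country` and `age` columns come from the question; the `name` column and all rows below are hypothetical sample data. The equivalent Pandas `groupby` is shown for comparison.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical Customers table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Asha", 30, "India"), ("Ravi", 40, "India"), ("Ben", 25, "UK")],
)

# ORDER BY is added only to make the printed result deterministic
query = """
SELECT country, AVG(age) AS avg_age
FROM Customers
GROUP BY country
ORDER BY country;
"""
sql_result = conn.execute(query).fetchall()
print(sql_result)  # [('India', 35.0), ('UK', 25.0)]

# Same aggregation in Pandas: groupby + mean
customers = pd.read_sql_query("SELECT * FROM Customers", conn)
pandas_result = customers.groupby("country", as_index=False)["age"].mean()
print(pandas_result)
conn.close()
```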

06. In Pandas, how to collect the values of b into a list for each value of a?

## input
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}

## Output
a Newcol
A [1,2]
B [1]
C [3]

Solution

import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
# agg(list) gathers the b values of each group into a Python list
df = df.groupby("a")["b"].agg(list).reset_index(name="Newcol")
print(df)

## Output
   a  Newcol
0  A  [1, 2]
1  B     [1]
2  C     [3]
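The reverse operation, turning each list back into one row per element, can be done with `DataFrame.explode`. A short sketch starting from the grouped result:

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]})
grouped = df.groupby("a")["b"].agg(list).reset_index(name="Newcol")

# explode emits one row per list element; ignore_index=True renumbers the index 0..n-1
flat = grouped.explode("Newcol", ignore_index=True)
print(flat)
```

Note that the exploded column has `object` dtype; call `flat["Newcol"].astype(int)` if integer arithmetic is needed afterwards.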