These questions were asked in an interview for the Data Engineer role at CitiusTech. Below are the questions along with their solutions.

SQL and Pandas Interview Questions
01. How do you read a CSV file in Pandas?
import pandas as pd
# Assuming the CSV file is named "data.csv" and is in the current directory
df = pd.read_csv("data.csv")
# Displaying the DataFrame
print(df)
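A common follow-up is which options read_csv accepts. The sketch below is only an illustration: the column names (name, age, joined) and the sample rows are made up, and io.StringIO stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a real file path
csv_text = "name,age,joined\nAsha,31,2021-04-01\nRavi,28,2022-11-15\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["name", "age", "joined"],  # load only the columns you need
    dtype={"age": "int64"},             # set column types explicitly
    parse_dates=["joined"],             # parse this column as datetime
)
print(df.dtypes)
```

Declaring dtype and parse_dates up front avoids a separate conversion step after loading.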
02. How do you create a DataFrame in Pandas?
import pandas as pd
# Build the DataFrame from a dictionary that maps column names to lists of values
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
print(df)
03. How do you sum column b for each group in column a, as in the format below, using Pandas?
## Input
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
## Expected output
a  Newcol
A  3
B  1
C  3
Solution
import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
# Sum column b within each group of a, then name the result column Newcol
df = df.groupby("a")["b"].sum().reset_index(name="Newcol")
print(df)
## Output
   a  Newcol
0  A       3
1  B       1
2  C       3
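The same result can also be written with named aggregation, which names the output column inside the agg call. This is an alternative sketch, not the only accepted answer; the variable name result is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]})

# Named aggregation: output column name = (input column, aggregation function)
# as_index=False keeps "a" as a regular column instead of the index
result = df.groupby("a", as_index=False).agg(Newcol=("b", "sum"))
print(result)
```

This form scales better when several aggregated columns are needed, since each one gets its own keyword argument.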
04. How do you add a new column in Pandas with a fixed value, and with a range of values?
## Adding a fixed value in Newcol
import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
# A scalar value is broadcast to every row
df["Newcol"] = 5
print(df)
## Output
   a  b  Newcol
0  A  1       5
1  A  2       5
2  B  1       5
3  C  3       5
## Adding a range of values
import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
start_value = 1
end_value = 4
# The range must contain exactly as many values as the DataFrame has rows
df["Newcol"] = range(start_value, end_value + 1)
print(df)
## Output
   a  b  Newcol
0  A  1       1
1  A  2       2
2  B  1       3
3  C  3       4
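The hard-coded end value above works only because the DataFrame has exactly four rows. A minimal sketch of a safer variant, assuming the range should simply number the rows from 1, derives the end from len(df). The Bad column and the error_message variable are illustrative names, used only to show what happens on a length mismatch.

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]})

# Derive the range from the row count so the lengths always match
df["Newcol"] = range(1, len(df) + 1)

# A mismatched length raises ValueError instead of padding or truncating
error_message = None
try:
    df["Bad"] = range(1, 4)  # only 3 values for 4 rows
except ValueError as exc:
    error_message = str(exc)

print(df)
print(error_message)
```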
05. Write an SQL query to get the average age for each country.
SELECT country, AVG(age) AS avg_age
FROM Customers
GROUP BY country;
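To try the query without a database server, it can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the Customers table layout (name, country, age) and the sample rows are invented for illustration, and ORDER BY is added only to make the row order predictable.

```python
import sqlite3

# In-memory database; nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical table layout and sample rows
conn.execute("CREATE TABLE Customers (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Asha", "India", 30), ("Ravi", "India", 40), ("John", "USA", 25)],
)

# Average age per country; ORDER BY makes the row order deterministic
rows = conn.execute(
    "SELECT country, AVG(age) AS avg_age "
    "FROM Customers GROUP BY country ORDER BY country"
).fetchall()
print(rows)
conn.close()
```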
06. In Pandas, how do you collect the values of column b into a list for each group in column a?
## Input
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
## Expected output
a  Newcol
A  [1, 2]
B  [1]
C  [3]
Solution
import pandas as pd
data = {"a": ["A", "A", "B", "C"], "b": [1, 2, 1, 3]}
df = pd.DataFrame(data)
# Collect the values of b in each group of a into a Python list
df = df.groupby("a")["b"].agg(list).reset_index(name="Newcol")
print(df)
## Output
   a  Newcol
0  A  [1, 2]
1  B     [1]
2  C     [3]
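The reverse operation, going from one list per group back to one row per value, can be sketched with explode. The names grouped and flat are chosen here for illustration.

```python
import pandas as pd

# The grouped result from the solution above, built directly for this example
grouped = pd.DataFrame({"a": ["A", "B", "C"], "Newcol": [[1, 2], [1], [3]]})

# explode() emits one row per list element and repeats the other columns;
# ignore_index=True renumbers the rows from 0
flat = grouped.explode("Newcol", ignore_index=True)
print(flat)
```

Note that the exploded column keeps the object dtype, so a cast such as astype(int) may be needed before doing arithmetic on it.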






