These questions were asked in a Data Engineer interview at Smarterp. They are useful for preparing for similar interviews.

Python and SQL Interview Questions

01. How do you sum all the digits of an integer?

# Sum the digits of an integer by iterating over its string form
a = 12345

d = 0
for i in str(a):
    d += int(i)  # convert each character back to an int and accumulate
print(d)

Output

15
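
The same digit sum can also be computed arithmetically, without converting the number to a string. The following is a minimal sketch; the helper name `digit_sum` is hypothetical, and ignoring the sign of negative inputs is an assumption rather than something the original question specifies.

```python
def digit_sum(n: int) -> int:
    """Return the sum of the decimal digits of n (sign ignored)."""
    n = abs(n)  # assumption: treat negative inputs by their absolute value
    total = 0
    while n > 0:
        n, digit = divmod(n, 10)  # split off the last decimal digit
        total += digit
    return total


print(digit_sum(12345))  # 15
```

The string-based loop is usually easier to read; the arithmetic version avoids building an intermediate string.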


02. How do you reverse a string?

s = "srinivas"
k = s[::-1]  # a slice with step -1 walks the string from end to start
print(k)

Output

savinirs
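
Interviewers sometimes ask for a second way to do this. One possible alternative, sketched here with a hypothetical helper named `reverse_string`, uses the built-in `reversed()` together with `str.join()`:

```python
def reverse_string(text: str) -> str:
    """Return text with its characters in reverse order."""
    # reversed() yields the characters from last to first;
    # join() glues them back together into a new string.
    return "".join(reversed(text))


print(reverse_string("srinivas"))  # savinirs
```

Both approaches build a new string, because Python strings cannot be changed in place.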


03. How do you write PySpark code to create a DataFrame from data and a schema, and then extract only the date from a datetime column?

from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, TimestampType
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("Date extract").getOrCreate()

# The column is read as a string first, then cast to a timestamp
schema = StructType(
    [StructField("date_column", StringType(), True)]
)
data = [("2024-02-28 08:30:00",),  # Sample data
        ("2024-02-28 12:45:00",),
        ("2024-02-29 10:15:00",)]

df = spark.createDataFrame(data, schema)
df = df.withColumn("date_column", col("date_column").cast(TimestampType()))
df = df.withColumn("Date_only", to_date("date_column"))  # keep only the date part
df.show()

Output

+-------------------+----------+
|        date_column| Date_only|
+-------------------+----------+
|2024-02-28 08:30:00|2024-02-28|
|2024-02-28 12:45:00|2024-02-28|
|2024-02-29 10:15:00|2024-02-29|
+-------------------+----------+

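04. What does “get root directory” refer to in PySpark?

In PySpark, “get root directory” most likely refers to SparkFiles.getRootDirectory(), which returns the root directory that holds the files added to the job through SparkContext.addFile(). More loosely, “root directory” is sometimes used for the working directory from which the Spark application was launched, that is, where the SparkContext or SparkSession was created.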

05. Can we modify Tuples and Strings in Python?

No. Tuples and strings are immutable, so we cannot modify them in place. We can, however, build a new tuple or a new string from parts of the old one and assign it to a new name.

my_tuple = (1, 2, 3, 4)
# Build a new tuple in which the element at index 1 is replaced by 7
new_tuple = my_tuple[0:1] + (7,) + my_tuple[2:]
print(new_tuple)

Output

(1, 7, 3, 4)

my_string = "Amazon Operations"
# Concatenation creates a new string; my_string itself is unchanged
new_string = "India " + my_string
print(new_string)

Output

India Amazon Operations
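
To see why a new object is needed, it can help to try an in-place change and watch it fail. The sketch below uses a hypothetical helper, `try_item_assignment`, that reports whether item assignment works on a given object; a list is included only as a mutable contrast.

```python
my_tuple = (1, 2, 3, 4)
my_string = "Amazon Operations"


def try_item_assignment(obj, index, value) -> bool:
    """Return True if obj[index] = value succeeds, False if it raises TypeError."""
    try:
        obj[index] = value
        return True
    except TypeError:
        return False


print(try_item_assignment(my_tuple, 1, 7))       # False: tuples are immutable
print(try_item_assignment(my_string, 0, "a"))    # False: strings are immutable
print(try_item_assignment([1, 2, 3, 4], 1, 7))   # True: lists are mutable

# String methods such as replace() return a new string instead of editing in place.
replaced = my_string.replace("Amazon", "India")
print(replaced)   # India Operations
print(my_string)  # Amazon Operations
```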
