Here are interview questions asked at TechM and Synechron. They cover replacing a substring without using the replace() method, flattening a list of lists into a single list, and splitting a large DataFrame into multiple files using PySpark. An SQL query asked in the interview is included as well.

Python, PySpark and SQL interview questions
Photo by Miguel Á. Padriñán on Pexels.com

How do we replace a substring without using replace()?

a = "s3://ABC/abc.txt"
o = a.split("/")
print(o)

# Modify the third element
o[2] = "abc"
print(o)

# Join the list back to a string
result = "/".join(o)
print(result)

Output

['s3:', '', 'ABC', 'abc.txt']
['s3:', '', 'abc', 'abc.txt']
s3://abc/abc.txt
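
The split-and-join approach above works because the substring sits between "/" separators. As a more general sketch, find() and slicing can replace the first occurrence of any substring. The helper name replace_first is hypothetical and is not part of the original answer.

```python
def replace_first(text, old, new):
    """Replace the first occurrence of old in text without str.replace()."""
    index = text.find(old)  # find() returns -1 when old is absent
    if index == -1:
        return text  # nothing to replace
    # Keep everything before the match, add the new text, then the remainder
    return text[:index] + new + text[index + len(old):]


print(replace_first("s3://ABC/abc.txt", "ABC", "abc"))  # s3://abc/abc.txt
```

Unlike the split version, this does not depend on the position of the element in the path, only on the text being searched for.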



How do we flatten a list of lists into a single list?

list1 = [1, [2, 3], [3, 4]]
o = []
for i in list1:
    if isinstance(i, list):  # Checks if 'i' is a list
        for j in i:  # Iterates through the sublist
            o.append(j)
    else:
        o.append(i)

print(o)

Output

[1, 2, 3, 3, 4]
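
The loop above handles one level of nesting. If the interviewer follows up with deeper nesting, a recursive variant is one possible answer. This is a minimal sketch; the function name flatten and the deeper sample list are assumptions added for illustration.

```python
def flatten(items):
    """Flatten arbitrarily nested lists into a single flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recurse into the sublist and add its flattened elements
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


print(flatten([1, [2, 3], [3, [4, [5]]]]))  # [1, 2, 3, 3, 4, 5]
```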



The isinstance() function in Python checks whether an object is an instance of a given class or of one of its subclasses. It also accepts a tuple of classes, in which case it returns True if the object is an instance of any of them. It is a way to confirm an object's type before performing operations that depend on it.

Syntax:

isinstance(object, classinfo)
  • object: The object you want to check.
  • classinfo: A class, type, or a tuple of classes and types to check against.

Examples:

  1. Check if an object is a specific type:
x = 5
print(isinstance(x, int))  # Output: True, because x is an integer
  2. Check if an object is one of several types:
x = [1, 2, 3]
print(isinstance(x, (list, tuple)))  # Output: True, because x is a list
  3. Check if an object is a custom class type:
class MyClass:
    pass

obj = MyClass()
print(isinstance(obj, MyClass))  # Output: True, because obj is an instance of MyClass
  4. Check against a tuple of different types:
x = "Hello"
print(isinstance(x, (int, float, str)))  # Output: True, because x is a string, which is one of the types in the tuple

Use Cases:

  • Type Checking: Before performing an operation that requires a specific type, use isinstance(). This ensures the object is of the required type. For example, when getting the length of an object or calling a method, confirm the type is correct.
  • Error Prevention: Helps prevent runtime errors by ensuring that operations are only performed on compatible types.
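
The two use cases above can be sketched together: checking the type first means len() is only called on objects that support it. The function name total_length and the sample data are hypothetical.

```python
def total_length(items):
    """Sum the lengths of str, list and tuple items, skipping everything else."""
    total = 0
    for item in items:
        # len() would raise TypeError on an int or None, so check the type first
        if isinstance(item, (str, list, tuple)):
            total += len(item)
    return total


print(total_length(["abc", [1, 2], 5, None, (1,)]))  # 6

# bool is a subclass of int, so this check also passes
print(isinstance(True, int))  # True
```

The last line is worth remembering in interviews: isinstance() follows inheritance, so a check against int also accepts bool values.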

How do you split a large dataframe into multiple files in PySpark?

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("Split DataFrame Example").getOrCreate()

# Create a sample DataFrame
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29), ("David", 37)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Repartition the DataFrame into the desired number of partitions
num_partitions = 2 # Specify the number of partitions (i.e., number of files)
df_repartitioned = df.repartition(num_partitions)

# Write the DataFrame to multiple files
df_repartitioned.write.mode("overwrite").csv("output/path/multiple_files")

Output

The output directory (output/path/multiple_files) will contain two part files, such as:

part-00000-xxxx.csv
part-00001-xxxx.csv
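
Repartitioning fixes the number of files. When the goal is instead to cap the number of rows per file, Spark's maxRecordsPerFile write option is an alternative. The sketch below assumes df is a PySpark DataFrame; the helper name write_with_row_cap and its parameters are hypothetical.

```python
def write_with_row_cap(df, path, max_rows_per_file):
    """Write df as CSV files, each holding at most max_rows_per_file rows.

    df is assumed to be a pyspark.sql.DataFrame (not imported here, since
    only its writer methods are called).
    """
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")
    (
        df.write
        .option("maxRecordsPerFile", max_rows_per_file)  # row cap per file
        .option("header", True)  # write column names in each file
        .mode("overwrite")
        .csv(path)
    )
```

With the sample DataFrame above, write_with_row_cap(df, "output/path/capped_files", 2) would be the call; the output path here is only an example.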

How can we write an SQL query that displays the two phone-number columns as separate rows?

NAME  PHONE1 PHONE2
AMC   1111    2222
CDF   5555    9999

The output should be
NAME  PHONE
AMC   1111
AMC   2222
CDF   5555
CDF   9999

Solution

-- create a table
create table phone_numbers(name varchar(20), phone1 integer, phone2 integer);

--  insert values into table
insert into phone_numbers values ('Srini', 1111, 2222),
('Ravi', 5555, 8888),
('Krishan', 9999, 01010); -- stored as 1010 because the column is an integer

-- show all rows in the table
select * from phone_numbers;

-- Reformat phone numbers
select name, phone1 as phone from phone_numbers
union all
select name, phone2 as phone from phone_numbers;

Output

-- Table data
name	phone1	phone2
Srini	1111	2222
Ravi	5555	8888
Krishan	9999	1010

-- Output
name	phone
Srini	1111
Ravi	5555
Krishan	9999
Srini	2222
Ravi	8888
Krishan	1010
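
To try the query without a database server, it can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the function name unpivot_phones is hypothetical, and an ORDER BY is added so the row order is predictable, which the original query does not guarantee.

```python
import sqlite3


def unpivot_phones(rows):
    """Return (name, phone) pairs, one row per phone number."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table phone_numbers("
            "name varchar(20), phone1 integer, phone2 integer)"
        )
        conn.executemany("insert into phone_numbers values (?, ?, ?)", rows)
        query = """
            select name, phone1 as phone from phone_numbers
            union all
            select name, phone2 as phone from phone_numbers
            order by name, phone
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample = [("Srini", 1111, 2222), ("Ravi", 5555, 8888), ("Krishan", 9999, 1010)]
for name, phone in unpivot_phones(sample):
    print(name, phone)
```

UNION ALL is used rather than UNION so that a person with the same number in both columns still appears twice.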