Introduction
Are you preparing for an interview at Altimetrik or a similar data engineering/analytics company? 🚀
Altimetrik interviews often focus on problem-solving skills in Python, SQL, and PySpark, with a mix of coding challenges and conceptual questions.
In this blog, we’ll cover 10 real-world interview-style questions with detailed solutions—ranging from Python coding (strings, decorators, arrays), SQL queries (joins, aggregation, window functions), and PySpark transformations (pivot, explode, deduplication).
By the end of this guide, you’ll have a solid practice set to boost your confidence for technical interviews.
🔟 Questions Covered
- Python – First non-repetitive character in a string
- Python – Rearrange an array so negatives come first
- Python – Decorator to log function execution time
- Python – Flatten a nested list
- SQL – Latest transaction and total transactions per credit card
- SQL – Departments with more than 5 employees
- SQL – Match results summary (played, won, lost, draw per team)
- PySpark – Remove duplicates by keeping the latest row
- PySpark – Pivot sales data by year
- PySpark – Explode comma-separated values into rows
Altimetrik-Style Interview Practice Questions
🐍 Python (4 Questions)
Q1. First non-repetitive character in a string
Problem: Given a string, return the first character that does not repeat.
Solution:
from collections import Counter
def first_non_repeating(s: str):
freq = Counter(s)
for c in s:
if freq[c] == 1:
return c
return None
print(first_non_repeating("swiss")) # w
Q2. Rearrange an array so that negatives come first
Problem: Input [3, -2, -1, 5, 4, -3] → Output [-2, -1, -3, 3, 5, 4] (order within groups not important).
Solution:
def segregate(arr):
i, j = 0, len(arr)-1
while i < j:
if arr[i] < 0:
i += 1
elif arr[j] >= 0:
j -= 1
else:
arr[i], arr[j] = arr[j], arr[i]
return arr
print(segregate([3, -2, -1, 5, 4, -3]))
Q3. Write a decorator that logs function execution time
Solution:
import time
def log_time(func):
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
print(f"{func.__name__} took {time.time() - start:.4f}s")
return result
return wrapper
@log_time
def add_numbers(a, b):
return a + b
print(add_numbers(5, 7))
Q4. Flatten a nested list
Problem: [1, [2, [3, 4], 5], 6] → [1,2,3,4,5,6]
Solution:
def flatten(lst):
out = []
for x in lst:
if isinstance(x, list):
out.extend(flatten(x))
else:
out.append(x)
return out
print(flatten([1, [2, [3, 4], 5], 6]))
🗄 SQL (3 Questions)
Q5. Latest transaction and total transactions per credit card
Table: transactions(credit_card_id, transaction_date, amount)
Solution:
%sql
CREATE TABLE transactions (
credit_card_id INT,
transaction_date DATE,
amount DECIMAL(10,2)
);
%sql
INSERT INTO transactions (credit_card_id, transaction_date, amount) VALUES
(101, '2025-01-05', 250.00),
(101, '2025-01-10', 300.00),
(101, '2025-02-02', 150.00),
(102, '2025-01-15', 500.00),
(102, '2025-01-20', 200.00),
(103, '2025-02-01', 450.00),
(103, '2025-02-10', 600.00),
(103, '2025-03-05', 350.00);
SELECT
credit_card_id,
COUNT(*) AS total_txns,
MAX(transaction_date) AS latest_txn
FROM transactions
GROUP BY credit_card_id;
Q6. Departments with > 5 employees
Tables: employees(emp_id, dept_id) and departments(dept_id, dept_name)
Solution:
WITH dept_counts AS (
SELECT dept_id, COUNT(*) AS emp_count
FROM employees
GROUP BY dept_id
)
SELECT d.dept_name, dc.emp_count
FROM departments d
JOIN dept_counts dc
ON d.dept_id = dc.dept_id
WHERE dc.emp_count > 5;
Q7. Match results summary
Table: matches(team_a, team_b, score_a, score_b) → find total played, won, lost, draw per team.
Solution:
%sql
CREATE TABLE matches (
team_a VARCHAR(50),
team_b VARCHAR(50),
score_a INT,
score_b INT
);
%sql
INSERT INTO matches (team_a, team_b, score_a, score_b) VALUES
('India', 'Australia', 250, 200), -- India wins
('India', 'England', 180, 180), -- Draw
('Australia','England', 220, 230), -- England wins
('India', 'Australia', 210, 220), -- Australia wins
('England', 'India', 190, 200), -- India wins
('England', 'Australia', 300, 300); -- Draw
%sql
with cte as (
select team_a as team, score_a as score, score_b as opponent
from matches
union all
select team_b as team, score_b as score , score_a as opponent
from matches
)
select team,
count(*) as all_played,
sum(case when score > opponent then 1 else 0 end) as wins,
sum(case when score < opponent then 1 else 0 end ) as lost,
sum(case when score = opponent then 1 else 0 end) as draw
from cte
group by team
order by team
🔥 PySpark (3 Questions)
Q8. Remove duplicates by keeping latest row
Data: (id, value, updated_at) → keep only latest record per id.
Solution:
from pyspark.sql import functions as F, Window
w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())
df_latest = df.withColumn("rn", F.row_number().over(w)) \
.filter("rn = 1") \
.drop("rn")
Q9. Pivot sales data by year
Data: (region, year, sales) → pivot years as columns.
Solution:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
spark = SparkSession.builder.appName("PivotExample").getOrCreate()
#PIVOT
data = (["East", 1000, 2022], ["West", 2000, 2022], ["East", 3000, 2022], ["West", 4000, 2022], ["East", 5000, 2023], ["West", 6000, 2024])
schema = ["Region", "Sales", "Year"]
df = spark.createDataFrame(data, schema)
df = df.groupBy("Region").pivot("Year").sum("Sales")
df.show()
#UNPIVOT
df1 = df.selectExpr("Region",
stack(3, '2022','2022', '2022', '2022','2023','2024')as(Year,Sales))
)
df1.show()
Q10. Explode comma-separated values into rows
Data: (Name, Sports) → "Alice", "Badminton,Tennis" → 2 rows.
Solution:
from pyspark.sql.functions import split, explode, trim
df2 = df.withColumn("Sport", explode(split("Sports", ",")))
df2 = df2.withColumn("Sport", trim(df2["Sport"]))
df2.show()
Extra Questions
Q1. Write a SQL query to find the most recent order date for each customer from the orders1 table, but only include orders placed within the last 30 days.
-- 1. Create table
CREATE TABLE orders1 (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
amount DECIMAL(10,2),
ordered_date DATE
);
-- 2.Insert values
INSERT INTO orders1 (name, amount, ordered_date) VALUES
('Alice', 250.00, '2025-09-10'),
('Bob', 180.50, '2025-09-05'),
('Charlie', 320.75, '2025-08-30'),
('David', 150.00, '2025-08-01'), -- older than 30 days
('Eva', 500.00, '2025-09-13'),
('Frank', 275.00, '2025-09-01'),
('Grace', 420.00, '2025-09-07');
-- 3. Solution
select name,
max(ordered_date) as recent_from_orders
where datediff(CURDATE(), ordered_date) < 30
group by name;
Q2. Write a SQL query to find, for each month, the customer who spent the highest total amount. If a customer has multiple orders in the same month, sum them first before comparing with others.
-- Create table
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
amount DECIMAL(10,2),
ordered_date DATE
);
-- Insert sample data across different months
INSERT INTO orders (name, amount, ordered_date) VALUES
('Alice', 200.00, '2025-01-05'),
('Bob', 450.00, '2025-01-12'),
('Charlie', 300.00, '2025-01-20'),
('Alice', 500.00, '2025-02-03'),
('David', 350.00, '2025-02-15'),
('Eva', 700.00, '2025-02-25'),
-- March
('Frank', 600.00, '2025-03-08'),
('Grace', 750.00, '2025-03-16'),
('Helen', 400.00, '2025-03-30'),
-- April
('Alice', 900.00, '2025-04-02'),
('Bob', 650.00, '2025-04-18'),
('Charlie', 800.00, '2025-04-25'),
-- May
('David', 1000.00, '2025-05-05'),
('Eva', 950.00, '2025-05-10'),
('Frank', 720.00, '2025-05-20');
WITH CTE1 AS (
SELECT NAME,
SUM(AMOUNT) as totamt,
DATE_FORMAT(ordered_date, '%Y-%m') as month
from orders
group by name,DATE_FORMAT(ordered_date, '%Y-%m')
),
CTE2 as(
select name,
totamt,
month,
row_number()over(PARTITION BY month order by totamt desc) as rn
from CTE1
)
select name,
totamt,
month
from CTE2
where rn =1;






