Introduction

Are you preparing for an interview at Altimetrik or a similar data engineering/analytics company? 🚀
Altimetrik interviews often focus on problem-solving skills in Python, SQL, and PySpark, with a mix of coding challenges and conceptual questions.

In this blog, we’ll work through 10 real-world interview-style questions with detailed solutions, spanning Python coding (strings, decorators, arrays), SQL queries (joins, aggregation, window functions), and PySpark transformations (pivot, explode, deduplication).

By the end of this guide, you’ll have a solid practice set to boost your confidence for technical interviews.

🔟 Questions Covered

  1. Python – First non-repetitive character in a string
  2. Python – Rearrange an array so negatives come first
  3. Python – Decorator to log function execution time
  4. Python – Flatten a nested list
  5. SQL – Latest transaction and total transactions per credit card
  6. SQL – Departments with more than 5 employees
  7. SQL – Match results summary (played, won, lost, draw per team)
  8. PySpark – Remove duplicates by keeping the latest row
  9. PySpark – Pivot sales data by year
  10. PySpark – Explode comma-separated values into rows

Altimetrik-Style Interview Practice Questions

🐍 Python (4 Questions)

Q1. First non-repetitive character in a string

Problem: Given a string, return the first character that does not repeat.

Solution:

from collections import Counter

def first_non_repeating(s: str):
    freq = Counter(s)
    for c in s:
        if freq[c] == 1:
            return c
    return None

print(first_non_repeating("swiss"))  # w
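
The same idea can be compressed with next() and a generator expression; note that when every character repeats, both versions return None:

```python
from collections import Counter

def first_non_repeating(s: str):
    # next() pulls the first character whose count is 1, or None if none exists
    freq = Counter(s)
    return next((c for c in s if freq[c] == 1), None)

print(first_non_repeating("swiss"))  # w
print(first_non_repeating("aabb"))   # None
```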

Q2. Rearrange an array so that negatives come first

Problem: Input [3, -2, -1, 5, 4, -3] → Output [-2, -1, -3, 3, 5, 4] (order within groups not important).

Solution:

def segregate(arr):
    i, j = 0, len(arr)-1
    while i < j:
        if arr[i] < 0:
            i += 1
        elif arr[j] >= 0:
            j -= 1
        else:
            arr[i], arr[j] = arr[j], arr[i]
    return arr

print(segregate([3, -2, -1, 5, 4, -3]))  # [-3, -2, -1, 5, 4, 3]
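
The two-pointer version swaps in place but scrambles the relative order. If the interviewer asks for a stable version, two list comprehensions do it at the cost of extra space:

```python
def segregate_stable(arr):
    # Keeps the original relative order inside each group (O(n) extra space)
    return [x for x in arr if x < 0] + [x for x in arr if x >= 0]

print(segregate_stable([3, -2, -1, 5, 4, -3]))  # [-2, -1, -3, 3, 5, 4]
```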

Q3. Write a decorator that logs function execution time

Solution:

import time

def log_time(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.time() - start:.4f}s")
        return result
    return wrapper

@log_time
def add_numbers(a, b):
    return a + b

print(add_numbers(5, 7))
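
A frequent follow-up: time.time() can jump when the system clock adjusts, and a bare wrapper hides the decorated function's name. A variant using time.perf_counter() and functools.wraps addresses both:

```python
import time
from functools import wraps

def log_time(func):
    @wraps(func)  # keeps func.__name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

@log_time
def add_numbers(a, b):
    return a + b

print(add_numbers(5, 7))  # 12, after the timing line
```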

Q4. Flatten a nested list

Problem: [1, [2, [3, 4], 5], 6] → [1,2,3,4,5,6]

Solution:

def flatten(lst):
    out = []
    for x in lst:
        if isinstance(x, list):
            out.extend(flatten(x))
        else:
            out.append(x)
    return out

print(flatten([1, [2, [3, 4], 5], 6]))
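
For very deep or large structures, a recursive generator avoids building intermediate lists at every level; the same flatten as a generator:

```python
def flatten_gen(lst):
    # Yields leaf elements lazily instead of concatenating sub-results
    for x in lst:
        if isinstance(x, list):
            yield from flatten_gen(x)
        else:
            yield x

print(list(flatten_gen([1, [2, [3, 4], 5], 6])))  # [1, 2, 3, 4, 5, 6]
```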

🗄 SQL (3 Questions)

Q5. Latest transaction and total transactions per credit card

Table: transactions(credit_card_id, transaction_date, amount)

Solution:

%sql
CREATE TABLE transactions (
    credit_card_id INT,
    transaction_date DATE,
    amount DECIMAL(10,2)
);

%sql
INSERT INTO transactions (credit_card_id, transaction_date, amount) VALUES
(101, '2025-01-05', 250.00),
(101, '2025-01-10', 300.00),
(101, '2025-02-02', 150.00),
(102, '2025-01-15', 500.00),
(102, '2025-01-20', 200.00),
(103, '2025-02-01', 450.00),
(103, '2025-02-10', 600.00),
(103, '2025-03-05', 350.00);

%sql
SELECT
    credit_card_id,
    COUNT(*) AS total_txns,
    MAX(transaction_date) AS latest_txn
FROM transactions
GROUP BY credit_card_id;
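
To sanity-check the query outside a notebook, the same table and aggregation can be reproduced with Python's built-in sqlite3 (an in-memory stand-in, not the original warehouse):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (credit_card_id INT, transaction_date TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (101, "2025-01-05", 250.00), (101, "2025-01-10", 300.00), (101, "2025-02-02", 150.00),
    (102, "2025-01-15", 500.00), (102, "2025-01-20", 200.00),
    (103, "2025-02-01", 450.00), (103, "2025-02-10", 600.00), (103, "2025-03-05", 350.00),
])
# MAX on ISO-formatted date strings is lexicographic, which matches chronological order
result = conn.execute("""
    SELECT credit_card_id, COUNT(*) AS total_txns, MAX(transaction_date) AS latest_txn
    FROM transactions
    GROUP BY credit_card_id
    ORDER BY credit_card_id
""").fetchall()
print(result)  # [(101, 3, '2025-02-02'), (102, 2, '2025-01-20'), (103, 3, '2025-03-05')]
```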

Q6. Departments with > 5 employees

Tables: employees(emp_id, dept_id) and departments(dept_id, dept_name)

Solution:

WITH dept_counts AS (
    SELECT dept_id, COUNT(*) AS emp_count
    FROM employees
    GROUP BY dept_id
)
SELECT d.dept_name, dc.emp_count
FROM departments d
JOIN dept_counts dc
  ON d.dept_id = dc.dept_id
WHERE dc.emp_count > 5;
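
The CTE can also be collapsed into a single GROUP BY with HAVING; a quick sqlite3 check of that variant, with made-up sample departments:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, dept_id INT)")
conn.execute("CREATE TABLE departments (dept_id INT, dept_name TEXT)")
# Made-up sample data: dept 1 has 6 employees, dept 2 has 3
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(i, 1) for i in range(1, 7)] + [(i, 2) for i in range(7, 10)])
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "Engineering"), (2, "HR")])
result = conn.execute("""
    SELECT d.dept_name, COUNT(*) AS emp_count
    FROM departments d
    JOIN employees e ON e.dept_id = d.dept_id
    GROUP BY d.dept_name
    HAVING COUNT(*) > 5
""").fetchall()
print(result)  # [('Engineering', 6)]
```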

Q7. Match results summary

Table: matches(team_a, team_b, score_a, score_b) → find total played, won, lost, draw per team.

Solution:

%sql
CREATE TABLE matches (
    team_a VARCHAR(50),
    team_b VARCHAR(50),
    score_a INT,
    score_b INT
);

%sql
INSERT INTO matches (team_a, team_b, score_a, score_b) VALUES
('India',   'Australia', 250, 200),  -- India wins
('India',   'England',   180, 180),  -- Draw
('Australia','England',  220, 230),  -- England wins
('India',   'Australia', 210, 220),  -- Australia wins
('England', 'India',     190, 200),  -- India wins
('England', 'Australia', 300, 300);  -- Draw
%sql
WITH cte AS (
    SELECT team_a AS team, score_a AS score, score_b AS opponent_score
    FROM matches
    UNION ALL
    SELECT team_b AS team, score_b AS score, score_a AS opponent_score
    FROM matches
)
SELECT team,
       COUNT(*) AS total_played,
       SUM(CASE WHEN score > opponent_score THEN 1 ELSE 0 END) AS won,
       SUM(CASE WHEN score < opponent_score THEN 1 ELSE 0 END) AS lost,
       SUM(CASE WHEN score = opponent_score THEN 1 ELSE 0 END) AS draw
FROM cte
GROUP BY team
ORDER BY team;
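
The UNION ALL trick (each match counted once from each team's perspective) can be verified with sqlite3, where a boolean comparison inside SUM already evaluates to 0/1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (team_a TEXT, team_b TEXT, score_a INT, score_b INT)")
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)", [
    ("India", "Australia", 250, 200), ("India", "England", 180, 180),
    ("Australia", "England", 220, 230), ("India", "Australia", 210, 220),
    ("England", "India", 190, 200), ("England", "Australia", 300, 300),
])
result = conn.execute("""
    WITH cte AS (
        SELECT team_a AS team, score_a AS score, score_b AS opp_score FROM matches
        UNION ALL
        SELECT team_b, score_b, score_a FROM matches
    )
    SELECT team,
           COUNT(*) AS played,
           SUM(score > opp_score) AS won,   -- SQLite: comparison yields 0/1
           SUM(score < opp_score) AS lost,
           SUM(score = opp_score) AS draw
    FROM cte
    GROUP BY team
    ORDER BY team
""").fetchall()
print(result)
```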

🔥 PySpark (3 Questions)

Q8. Remove duplicates by keeping latest row

Data: (id, value, updated_at) → keep only latest record per id.

Solution:

from pyspark.sql import functions as F, Window

w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())

df_latest = df.withColumn("rn", F.row_number().over(w)) \
              .filter("rn = 1") \
              .drop("rn")
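
What row_number plus the rn = 1 filter does is easiest to see in plain Python: keep one record per id, replacing it whenever a newer updated_at appears (the sample rows here are made up):

```python
# Plain-Python sketch of "keep the latest row per id"
rows = [
    (1, "a", "2025-01-01"),
    (1, "b", "2025-02-01"),
    (2, "c", "2025-01-15"),
]

latest = {}
for id_, value, updated_at in rows:
    # ISO date strings compare chronologically, so > works directly
    if id_ not in latest or updated_at > latest[id_][2]:
        latest[id_] = (id_, value, updated_at)

print(sorted(latest.values()))  # [(1, 'b', '2025-02-01'), (2, 'c', '2025-01-15')]
```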

Q9. Pivot sales data by year

Data: (region, year, sales) → pivot years as columns.

Solution:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PivotExample").getOrCreate()

#PIVOT

data = [("East", 1000, 2022), ("West", 2000, 2022), ("East", 3000, 2022),
        ("West", 4000, 2022), ("East", 5000, 2023), ("West", 6000, 2024)]
schema = ["Region", "Sales", "Year"]

df = spark.createDataFrame(data, schema)

df = df.groupBy("Region").pivot("Year").sum("Sales")
df.show()


# UNPIVOT: stack(n, label1, col1, ...) turns the year columns back into rows.
# Backticks are needed because the pivoted column names are numbers.
df1 = df.selectExpr(
    "Region",
    "stack(3, '2022', `2022`, '2023', `2023`, '2024', `2024`) as (Year, Sales)"
)

df1.show()
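
The pivot itself is just a grouped sum spread into columns; a plain-Python sketch over the same sample data makes the expected numbers easy to verify:

```python
from collections import defaultdict

data = [("East", 1000, 2022), ("West", 2000, 2022), ("East", 3000, 2022),
        ("West", 4000, 2022), ("East", 5000, 2023), ("West", 6000, 2024)]

# pivot[region][year] accumulates Sales, mirroring groupBy + pivot + sum
pivot = defaultdict(dict)
for region, sales, year in data:
    pivot[region][year] = pivot[region].get(year, 0) + sales

print(dict(pivot))  # {'East': {2022: 4000, 2023: 5000}, 'West': {2022: 6000, 2024: 6000}}
```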

Q10. Explode comma-separated values into rows

Data: (Name, Sports), e.g. ("Alice", "Badminton,Tennis") → 2 rows.

Solution:

from pyspark.sql.functions import split, explode, trim

data = [("Alice", "Badminton,Tennis")]
df = spark.createDataFrame(data, ["Name", "Sports"])

df2 = df.withColumn("Sport", explode(split("Sports", ",")))
df2 = df2.withColumn("Sport", trim(df2["Sport"]))
df2.show()
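
split + explode + trim is equivalent to a nested comprehension over (Name, Sports) tuples in plain Python:

```python
rows = [("Alice", "Badminton,Tennis")]

# One output row per comma-separated value, with surrounding spaces stripped
exploded = [(name, sport.strip()) for name, sports in rows for sport in sports.split(",")]
print(exploded)  # [('Alice', 'Badminton'), ('Alice', 'Tennis')]
```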

Extra Questions

Q1. Write a SQL query to find the most recent order date for each customer from the orders1 table, but only include orders placed within the last 30 days.

-- 1. Create table
CREATE TABLE orders1 (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100),
    amount DECIMAL(10,2),
    ordered_date DATE
);


-- 2.Insert values
INSERT INTO orders1 (name, amount, ordered_date) VALUES
('Alice', 250.00, '2025-09-10'),
('Bob', 180.50, '2025-09-05'),
('Charlie', 320.75, '2025-08-30'),
('David', 150.00, '2025-08-01'),   -- older than 30 days
('Eva', 500.00, '2025-09-13'),
('Frank', 275.00, '2025-09-01'),
('Grace', 420.00, '2025-09-07');

-- 3. Solution
SELECT name,
       MAX(ordered_date) AS recent_order_date
FROM orders1
WHERE DATEDIFF(CURDATE(), ordered_date) < 30
GROUP BY name;
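
Because CURDATE() moves every day, the query's result set depends on when it runs. A Python sketch with a pinned "today" (2025-09-15, an assumed date) makes the 30-day cutoff reproducible:

```python
from datetime import date, timedelta

today = date(2025, 9, 15)  # pinned so the example is deterministic
cutoff = today - timedelta(days=30)

orders = [("Alice", date(2025, 9, 10)), ("Bob", date(2025, 9, 5)),
          ("Charlie", date(2025, 8, 30)), ("David", date(2025, 8, 1))]

recent = {}
for name, d in orders:
    # Keep the latest order date per customer, but only within the window
    if d > cutoff and (name not in recent or d > recent[name]):
        recent[name] = d

print(sorted(recent.items()))  # David (2025-08-01) falls outside the window
```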

Q2. Write a SQL query to find, for each month, the customer who spent the highest total amount. If a customer has multiple orders in the same month, sum them first before comparing with others.

-- Create table
CREATE TABLE orders (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100),
    amount DECIMAL(10,2),
    ordered_date DATE
);

-- Insert sample data across different months
INSERT INTO orders (name, amount, ordered_date) VALUES
-- January
('Alice',   200.00, '2025-01-05'),
('Bob',     450.00, '2025-01-12'),
('Charlie', 300.00, '2025-01-20'),

-- February
('Alice',   500.00, '2025-02-03'),
('David',   350.00, '2025-02-15'),
('Eva',     700.00, '2025-02-25'),

-- March
('Frank',   600.00, '2025-03-08'),
('Grace',   750.00, '2025-03-16'),
('Helen',   400.00, '2025-03-30'),

-- April
('Alice',   900.00, '2025-04-02'),
('Bob',     650.00, '2025-04-18'),
('Charlie', 800.00, '2025-04-25'),

-- May
('David',   1000.00, '2025-05-05'),
('Eva',      950.00, '2025-05-10'),
('Frank',    720.00, '2025-05-20');


WITH cte1 AS (
  SELECT name,
         SUM(amount) AS totamt,
         DATE_FORMAT(ordered_date, '%Y-%m') AS month
  FROM orders
  GROUP BY name, DATE_FORMAT(ordered_date, '%Y-%m')
),
cte2 AS (
  SELECT name,
         totamt,
         month,
         ROW_NUMBER() OVER (PARTITION BY month ORDER BY totamt DESC) AS rn
  FROM cte1
)
SELECT name, totamt, month
FROM cte2
WHERE rn = 1;
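
The two-CTE pattern (aggregate, then rank) maps directly onto two dictionary passes in Python; a sketch over a subset of the sample rows:

```python
from collections import defaultdict

# A subset of the sample orders: (name, amount, 'YYYY-MM-DD')
orders = [("Alice", 200.0, "2025-01-05"), ("Bob", 450.0, "2025-01-12"),
          ("Charlie", 300.0, "2025-01-20"), ("Alice", 500.0, "2025-02-03"),
          ("Eva", 700.0, "2025-02-25")]

# Pass 1 (CTE1): total amount per (month, customer)
totals = defaultdict(float)
for name, amount, d in orders:
    totals[(d[:7], name)] += amount  # d[:7] == 'YYYY-MM'

# Pass 2 (CTE2 with rn = 1): highest total per month
top = {}
for (month, name), amt in totals.items():
    if month not in top or amt > top[month][1]:
        top[month] = (name, amt)

print(sorted(top.items()))  # [('2025-01', ('Bob', 450.0)), ('2025-02', ('Eva', 700.0))]
```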