Introduction

Are you preparing for an interview at Altimetrik or a similar data engineering/analytics company? 🚀
Altimetrik interviews often focus on problem-solving skills in Python, SQL, and PySpark, with a mix of coding challenges and conceptual questions.

In this blog, we’ll work through 10 real-world interview-style questions with detailed solutions, spanning Python coding (strings, decorators, arrays), SQL queries (joins, aggregation, window functions), and PySpark transformations (pivot, explode, deduplication).

By the end of this guide, you’ll have a solid practice set to boost your confidence for technical interviews.

🔟 Questions Covered

  1. Python – First non-repetitive character in a string
  2. Python – Rearrange an array so negatives come first
  3. Python – Decorator to log function execution time
  4. Python – Flatten a nested list
  5. SQL – Latest transaction and total transactions per credit card
  6. SQL – Departments with more than 5 employees
  7. SQL – Match results summary (played, won, lost, draw per team)
  8. PySpark – Remove duplicates by keeping the latest row
  9. PySpark – Pivot sales data by year
  10. PySpark – Explode comma-separated values into rows

Altimetrik-Style Interview Practice Questions

🐍 Python (4 Questions)

Q1. First non-repetitive character in a string

Problem: Given a string, return the first character that does not repeat.

Solution:

from collections import Counter

def first_non_repeating(s: str):
    freq = Counter(s)
    for c in s:
        if freq[c] == 1:
            return c
    return None

print(first_non_repeating("swiss"))  # w
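
The same idea can be compressed with next() and a generator expression; note that when every character repeats, both versions return None:

```python
from collections import Counter

def first_non_repeating(s: str):
    # next() pulls the first character whose count is 1, or None if none exists
    freq = Counter(s)
    return next((c for c in s if freq[c] == 1), None)

print(first_non_repeating("swiss"))  # w
print(first_non_repeating("aabb"))   # None
```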

Q2. Rearrange an array so that negatives come first

Problem: Input [3, -2, -1, 5, 4, -3] → Output [-2, -1, -3, 3, 5, 4] (order within groups not important).

Solution:

def segregate(arr):
    i, j = 0, len(arr)-1
    while i < j:
        if arr[i] < 0:
            i += 1
        elif arr[j] >= 0:
            j -= 1
        else:
            arr[i], arr[j] = arr[j], arr[i]
    return arr

print(segregate([3, -2, -1, 5, 4, -3]))  # [-3, -2, -1, 5, 4, 3]
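
The two-pointer version swaps in place but scrambles the relative order. If the interviewer asks for a stable version, two list comprehensions do it at the cost of extra space:

```python
def segregate_stable(arr):
    # Keeps the original relative order inside each group (O(n) extra space)
    return [x for x in arr if x < 0] + [x for x in arr if x >= 0]

print(segregate_stable([3, -2, -1, 5, 4, -3]))  # [-2, -1, -3, 3, 5, 4]
```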

Q3. Write a decorator that logs function execution time

Solution:

import time

def log_time(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.time() - start:.4f}s")
        return result
    return wrapper

@log_time
def add_numbers(a, b):
    return a + b

print(add_numbers(5, 7))
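
A frequent follow-up: time.time() can jump when the system clock adjusts, and a bare wrapper hides the decorated function's name. A variant using time.perf_counter() and functools.wraps addresses both:

```python
import time
from functools import wraps

def log_time(func):
    @wraps(func)  # keeps func.__name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

@log_time
def add_numbers(a, b):
    return a + b

print(add_numbers(5, 7))  # 12, after the timing line
```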

Q4. Flatten a nested list

Problem: [1, [2, [3, 4], 5], 6] → [1,2,3,4,5,6]

Solution:

def flatten(lst):
    out = []
    for x in lst:
        if isinstance(x, list):
            out.extend(flatten(x))
        else:
            out.append(x)
    return out

print(flatten([1, [2, [3, 4], 5], 6]))
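
For very deep or large structures, a recursive generator avoids building intermediate lists at every level; the same flatten as a generator:

```python
def flatten_gen(lst):
    # Yields leaf elements lazily instead of concatenating sub-results
    for x in lst:
        if isinstance(x, list):
            yield from flatten_gen(x)
        else:
            yield x

print(list(flatten_gen([1, [2, [3, 4], 5], 6])))  # [1, 2, 3, 4, 5, 6]
```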

🗄 SQL (3 Questions)

Q5. Latest transaction and total transactions per credit card

Table: transactions(credit_card_id, transaction_date, amount)

Solution:

%sql
CREATE TABLE transactions (
    credit_card_id INT,
    transaction_date DATE,
    amount DECIMAL(10,2)
);

%sql
INSERT INTO transactions (credit_card_id, transaction_date, amount) VALUES
(101, '2025-01-05', 250.00),
(101, '2025-01-10', 300.00),
(101, '2025-02-02', 150.00),
(102, '2025-01-15', 500.00),
(102, '2025-01-20', 200.00),
(103, '2025-02-01', 450.00),
(103, '2025-02-10', 600.00),
(103, '2025-03-05', 350.00);

%sql
SELECT
    credit_card_id,
    COUNT(*) AS total_txns,
    MAX(transaction_date) AS latest_txn
FROM transactions
GROUP BY credit_card_id;
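
To sanity-check the query outside a notebook, the same table and aggregation can be reproduced with Python's built-in sqlite3 (an in-memory stand-in, not the original warehouse):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (credit_card_id INT, transaction_date TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (101, "2025-01-05", 250.00), (101, "2025-01-10", 300.00), (101, "2025-02-02", 150.00),
    (102, "2025-01-15", 500.00), (102, "2025-01-20", 200.00),
    (103, "2025-02-01", 450.00), (103, "2025-02-10", 600.00), (103, "2025-03-05", 350.00),
])
# MAX on ISO-formatted date strings is lexicographic, which matches chronological order
result = conn.execute("""
    SELECT credit_card_id, COUNT(*) AS total_txns, MAX(transaction_date) AS latest_txn
    FROM transactions
    GROUP BY credit_card_id
    ORDER BY credit_card_id
""").fetchall()
print(result)  # [(101, 3, '2025-02-02'), (102, 2, '2025-01-20'), (103, 3, '2025-03-05')]
```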

Q6. Departments with > 5 employees

Tables: employees(emp_id, dept_id) and departments(dept_id, dept_name)

Solution:

WITH dept_counts AS (
    SELECT dept_id, COUNT(*) AS emp_count
    FROM employees
    GROUP BY dept_id
)
SELECT d.dept_name, dc.emp_count
FROM departments d
JOIN dept_counts dc
  ON d.dept_id = dc.dept_id
WHERE dc.emp_count > 5;
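
The CTE can also be collapsed into a single GROUP BY with HAVING; a quick sqlite3 check of that variant, with made-up sample departments:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, dept_id INT)")
conn.execute("CREATE TABLE departments (dept_id INT, dept_name TEXT)")
# Made-up sample data: dept 1 has 6 employees, dept 2 has 3
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(i, 1) for i in range(1, 7)] + [(i, 2) for i in range(7, 10)])
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "Engineering"), (2, "HR")])
result = conn.execute("""
    SELECT d.dept_name, COUNT(*) AS emp_count
    FROM departments d
    JOIN employees e ON e.dept_id = d.dept_id
    GROUP BY d.dept_name
    HAVING COUNT(*) > 5
""").fetchall()
print(result)  # [('Engineering', 6)]
```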

Q7. Match results summary

Table: matches(team_a, team_b, score_a, score_b) → find total played, won, lost, draw per team.

Solution:

%sql
CREATE TABLE matches (
    team_a VARCHAR(50),
    team_b VARCHAR(50),
    score_a INT,
    score_b INT
);

%sql
INSERT INTO matches (team_a, team_b, score_a, score_b) VALUES
('India',   'Australia', 250, 200),  -- India wins
('India',   'England',   180, 180),  -- Draw
('Australia','England',  220, 230),  -- England wins
('India',   'Australia', 210, 220),  -- Australia wins
('England', 'India',     190, 200),  -- India wins
('England', 'Australia', 300, 300);  -- Draw
%sql
WITH cte AS (
    SELECT team_a AS team, score_a AS score, score_b AS opponent_score
    FROM matches
    UNION ALL
    SELECT team_b AS team, score_b AS score, score_a AS opponent_score
    FROM matches
)
SELECT team,
       COUNT(*) AS total_played,
       SUM(CASE WHEN score > opponent_score THEN 1 ELSE 0 END) AS won,
       SUM(CASE WHEN score < opponent_score THEN 1 ELSE 0 END) AS lost,
       SUM(CASE WHEN score = opponent_score THEN 1 ELSE 0 END) AS draw
FROM cte
GROUP BY team
ORDER BY team;
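
The UNION ALL trick (each match counted once from each team's perspective) can be verified with sqlite3, where a boolean comparison inside SUM already evaluates to 0/1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (team_a TEXT, team_b TEXT, score_a INT, score_b INT)")
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)", [
    ("India", "Australia", 250, 200), ("India", "England", 180, 180),
    ("Australia", "England", 220, 230), ("India", "Australia", 210, 220),
    ("England", "India", 190, 200), ("England", "Australia", 300, 300),
])
result = conn.execute("""
    WITH cte AS (
        SELECT team_a AS team, score_a AS score, score_b AS opp_score FROM matches
        UNION ALL
        SELECT team_b, score_b, score_a FROM matches
    )
    SELECT team,
           COUNT(*) AS played,
           SUM(score > opp_score) AS won,   -- SQLite: comparison yields 0/1
           SUM(score < opp_score) AS lost,
           SUM(score = opp_score) AS draw
    FROM cte
    GROUP BY team
    ORDER BY team
""").fetchall()
print(result)
```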

🔥 PySpark (3 Questions)

Q8. Remove duplicates by keeping latest row

Data: (id, value, updated_at) → keep only latest record per id.

Solution:

from pyspark.sql import functions as F, Window

w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())

df_latest = df.withColumn("rn", F.row_number().over(w)) \
              .filter("rn = 1") \
              .drop("rn")
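
What row_number plus the rn = 1 filter does is easiest to see in plain Python: keep one record per id, replacing it whenever a newer updated_at appears (the sample rows here are made up):

```python
# Plain-Python sketch of "keep the latest row per id"
rows = [
    (1, "a", "2025-01-01"),
    (1, "b", "2025-02-01"),
    (2, "c", "2025-01-15"),
]

latest = {}
for id_, value, updated_at in rows:
    # ISO date strings compare chronologically, so > works directly
    if id_ not in latest or updated_at > latest[id_][2]:
        latest[id_] = (id_, value, updated_at)

print(sorted(latest.values()))  # [(1, 'b', '2025-02-01'), (2, 'c', '2025-01-15')]
```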

Q9. Pivot sales data by year

Data: (region, year, sales) → pivot years as columns.

Solution:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PivotExample").getOrCreate()

#PIVOT

data = [("East", 1000, 2022), ("West", 2000, 2022), ("East", 3000, 2022),
        ("West", 4000, 2022), ("East", 5000, 2023), ("West", 6000, 2024)]
schema = ["Region", "Sales", "Year"]

df = spark.createDataFrame(data, schema)

df = df.groupBy("Region").pivot("Year").sum("Sales")
df.show()


# UNPIVOT: stack(n, label1, col1, ...) turns the year columns back into rows.
# Backticks are needed because the pivoted column names are numbers.
df1 = df.selectExpr(
    "Region",
    "stack(3, '2022', `2022`, '2023', `2023`, '2024', `2024`) as (Year, Sales)"
)

df1.show()
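
The pivot itself is just a grouped sum spread into columns; a plain-Python sketch over the same sample data makes the expected numbers easy to verify:

```python
from collections import defaultdict

data = [("East", 1000, 2022), ("West", 2000, 2022), ("East", 3000, 2022),
        ("West", 4000, 2022), ("East", 5000, 2023), ("West", 6000, 2024)]

# pivot[region][year] accumulates Sales, mirroring groupBy + pivot + sum
pivot = defaultdict(dict)
for region, sales, year in data:
    pivot[region][year] = pivot[region].get(year, 0) + sales

print(dict(pivot))  # {'East': {2022: 4000, 2023: 5000}, 'West': {2022: 6000, 2024: 6000}}
```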

Q10. Explode comma-separated values into rows

Data: (Name, Sports), e.g. ("Alice", "Badminton,Tennis") → 2 rows.

Solution:

from pyspark.sql.functions import split, explode, trim

data = [("Alice", "Badminton,Tennis")]
df = spark.createDataFrame(data, ["Name", "Sports"])

df2 = df.withColumn("Sport", explode(split("Sports", ",")))
df2 = df2.withColumn("Sport", trim(df2["Sport"]))
df2.show()
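
split + explode + trim is equivalent to a nested comprehension over (Name, Sports) tuples in plain Python:

```python
rows = [("Alice", "Badminton,Tennis")]

# One output row per comma-separated value, with surrounding spaces stripped
exploded = [(name, sport.strip()) for name, sports in rows for sport in sports.split(",")]
print(exploded)  # [('Alice', 'Badminton'), ('Alice', 'Tennis')]
```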

Extra Questions

Q1. Write a SQL query to find the most recent order date for each customer from the orders1 table, but only include orders placed within the last 30 days.

-- 1. Create table
CREATE TABLE orders1 (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100),
    amount DECIMAL(10,2),
    ordered_date DATE
);


-- 2.Insert values
INSERT INTO orders1 (name, amount, ordered_date) VALUES
('Alice', 250.00, '2025-09-10'),
('Bob', 180.50, '2025-09-05'),
('Charlie', 320.75, '2025-08-30'),
('David', 150.00, '2025-08-01'),   -- older than 30 days
('Eva', 500.00, '2025-09-13'),
('Frank', 275.00, '2025-09-01'),
('Grace', 420.00, '2025-09-07');

-- 3. Solution
SELECT name,
       MAX(ordered_date) AS recent_order_date
FROM orders1
WHERE DATEDIFF(CURDATE(), ordered_date) < 30
GROUP BY name;
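
Because CURDATE() moves every day, the query's result set depends on when it runs. A Python sketch with a pinned "today" (2025-09-15, an assumed date) makes the 30-day cutoff reproducible:

```python
from datetime import date, timedelta

today = date(2025, 9, 15)  # pinned so the example is deterministic
cutoff = today - timedelta(days=30)

orders = [("Alice", date(2025, 9, 10)), ("Bob", date(2025, 9, 5)),
          ("Charlie", date(2025, 8, 30)), ("David", date(2025, 8, 1))]

recent = {}
for name, d in orders:
    # Keep the latest order date per customer, but only within the window
    if d > cutoff and (name not in recent or d > recent[name]):
        recent[name] = d

print(sorted(recent.items()))  # David (2025-08-01) falls outside the window
```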

Q2. Write a SQL query to find, for each month, the customer who spent the highest total amount. If a customer has multiple orders in the same month, sum them first before comparing with others.

-- Create table
CREATE TABLE orders (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100),
    amount DECIMAL(10,2),
    ordered_date DATE
);

-- Insert sample data across different months
INSERT INTO orders (name, amount, ordered_date) VALUES
-- January
('Alice',   200.00, '2025-01-05'),
('Bob',     450.00, '2025-01-12'),
('Charlie', 300.00, '2025-01-20'),

-- February
('Alice',   500.00, '2025-02-03'),
('David',   350.00, '2025-02-15'),
('Eva',     700.00, '2025-02-25'),

-- March
('Frank',   600.00, '2025-03-08'),
('Grace',   750.00, '2025-03-16'),
('Helen',   400.00, '2025-03-30'),

-- April
('Alice',   900.00, '2025-04-02'),
('Bob',     650.00, '2025-04-18'),
('Charlie', 800.00, '2025-04-25'),

-- May
('David',   1000.00, '2025-05-05'),
('Eva',      950.00, '2025-05-10'),
('Frank',    720.00, '2025-05-20');


WITH cte1 AS (
  SELECT name,
         SUM(amount) AS totamt,
         DATE_FORMAT(ordered_date, '%Y-%m') AS month
  FROM orders
  GROUP BY name, DATE_FORMAT(ordered_date, '%Y-%m')
),
cte2 AS (
  SELECT name,
         totamt,
         month,
         ROW_NUMBER() OVER (PARTITION BY month ORDER BY totamt DESC) AS rn
  FROM cte1
)
SELECT name, totamt, month
FROM cte2
WHERE rn = 1;
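
The two-CTE pattern (aggregate, then rank) maps directly onto two dictionary passes in Python; a sketch over a subset of the sample rows:

```python
from collections import defaultdict

# A subset of the sample orders: (name, amount, 'YYYY-MM-DD')
orders = [("Alice", 200.0, "2025-01-05"), ("Bob", 450.0, "2025-01-12"),
          ("Charlie", 300.0, "2025-01-20"), ("Alice", 500.0, "2025-02-03"),
          ("Eva", 700.0, "2025-02-25")]

# Pass 1 (CTE1): total amount per (month, customer)
totals = defaultdict(float)
for name, amount, d in orders:
    totals[(d[:7], name)] += amount  # d[:7] == 'YYYY-MM'

# Pass 2 (CTE2 with rn = 1): highest total per month
top = {}
for (month, name), amt in totals.items():
    if month not in top or amt > top[month][1]:
        top[month] = (name, amt)

print(sorted(top.items()))  # [('2025-01', ('Bob', 450.0)), ('2025-02', ('Eva', 700.0))]
```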