SQL for Data Analytics: 10 Top Queries to Practice

SQL is an essential tool for data analysts, allowing them to effectively manage and analyze large amounts of data. This blog post explores 10 SQL queries that are particularly beneficial for data analysts, providing a valuable resource for improving their data analysis skills.

Table of contents

  1. 01. SELECT statement with filtering and ordering
  2. 02. Joining tables
  3. 03. Aggregating data with GROUP BY
  4. 04. Subqueries
  5. 05. Conditional statements with CASE
  6. 06. Removing duplicates with DISTINCT
  7. 07. Limiting the number of rows
  8. 08. Working with dates and time
  9. 09. Filtering with NULL values
  10. 10. Calculating percentages
SQL for Data Analytics
Photo by Monstera on Pexels.com

01. SELECT statement with filtering and ordering

The SELECT statement is the most fundamental SQL query used to retrieve data from a database. It allows you to specify which columns you want to select, as well as criteria to filter the data and the desired order. For example:

SELECT column1, column2
FROM table
WHERE condition
ORDER BY column ASC/DESC;

You can replace column1 and column2 with the specific column names you want to retrieve, table with the table name, and condition with the filtering criteria. The ORDER BY clause allows you to sort the results in ascending (ASC) or descending (DESC) order.

02. Joining tables

Often, data is stored in multiple tables, and you need to combine them to perform analysis. SQL provides different types of joins, such as inner join, left join, right join, and full outer join, to match records between tables based on specified conditions. For example:

SELECT *
FROM table1
INNER JOIN table2
ON table1.column = table2.column;

In this query, table1 and table2 are the names of the tables you want to join, and column represents the common column between the tables.

03. Aggregating data with GROUP BY

The GROUP BY clause in SQL allows you to group rows based on one or more columns and apply aggregate functions, such as SUM, COUNT, AVG, etc., to calculate summarized values. For example:

SELECT column, SUM(sales)
FROM table
GROUP BY column;

This query groups rows by the specified column and calculates the sum of sales for each group.

04. Subqueries

Subqueries are queries nested within another query. They can be used to retrieve data based on the results of another query, perform calculations, or apply filtering conditions. For example:

SELECT *
FROM table
WHERE column IN (SELECT column FROM another_table);

This query retrieves all rows from table where the value of column exists in the result of the subquery.

05. Conditional statements with CASE

The CASE statement in SQL allows you to perform conditional operations and return different values based on specified conditions. It’s useful for data transformations and creating calculated columns. For example:

SELECT column,
CASE
WHEN condition1 THEN 'Value1'
WHEN condition2 THEN 'Value2'
ELSE 'Value3'
END AS new_column
FROM table;

This query creates a new column new_column based on the specified conditions.

06. Removing duplicates with DISTINCT

The DISTINCT keyword in SQL allows you to retrieve only unique values for a specific column or combination of columns. This can be useful when working with datasets that contain duplicate records. For example:

SELECT DISTINCT column1, column2
FROM table;

This query retrieves unique combinations of column1 and column2 from the table.

07. Limiting the number of rows

In some cases, you may want to limit the number of rows returned by a query, especially when working with large datasets. SQL provides the LIMIT or TOP clause to specify the maximum number of rows to retrieve. For example:

SELECT *
FROM table
LIMIT 100;


This query retrieves the first 100 rows from the table.

08. Working with dates and time

SQL has built-in functions to manipulate and extract information from date and time columns. This can be useful for time series analysis or performing calculations based on specific dates. For example:

SELECT column
FROM table
WHERE DATE(column) = '2022-01-01';


This query retrieves rows from the table where the date in column is equal to ‘2022-01-01’.

09. Filtering with NULL values

Dealing with NULL values is a common challenge in data analysis. SQL provides the IS NULL and IS NOT NULL operators to filter rows based on NULL values. For example:

SELECT column1, column2
FROM table
WHERE column1 IS NOT NULL;

This query retrieves rows from the table where column1 is not NULL.

10. Calculating percentages

Calculating percentages is often required in data analysis. SQL allows you to calculate percentages using arithmetic operations and aggregate functions. For example:

SELECT column, (COUNT() / (SELECT COUNT() FROM table)) * 100 AS percentage
FROM table
GROUP BY column;

This query calculates the percentage of each unique value in column relative to the total count of rows in the table.

Author: Srini

Experienced Data Engineer, having skills in PySpark, Databricks, Python SQL, AWS, Linux, and Mainframe