PySpark

Infogain: 5 Tricky Data Engineer Interview Questions

by Srini
Posted on May 1, 2024

The Infogain interview Q&A covers PySpark topics such as partitioning, bucketing, reading petabyte-scale files, Delta Lake, and schema-less files.

PySpark

PySpark Interview Questions: Skip the First 2 Rows and Count the Nulls of Each Column

by Srini
Posted on April 23, 2024

This PySpark guide covers skipping the first rows after the header and counting the NULLs in each column of a DataFrame.

PySpark

3 PySpark Functions Frequently Asked in Interviews

by Srini
Posted on April 19, 2024

The article covers PySpark's explode and collect_list functions and the anti join, with code examples and their outputs.

Databricks

Databricks Time Travel: Tutorial on Recovering Delta Tables

by Srini
Posted on April 18, 2024, updated April 22, 2024

If you mistakenly alter a Delta Lake table in Databricks, use time travel to retrieve a previous version.

Python

3 Puzzling Python String Programs That Look Easy

by Srini
Posted on April 16, 2024

The examples showcase Python operations, including string manipulation, list sorting, and character counting, providing practical techniques for beginners.
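As a rough illustration of the three kinds of operation named in this summary, here is a minimal sketch. The sample string and word list are made up for illustration and are not taken from the article.

```python
from collections import Counter

# Hypothetical sample inputs, chosen only for illustration.
text = "interview"
words = ["spark", "Delta", "python"]

# String manipulation: reverse a string with slicing.
reversed_text = text[::-1]

# List sorting: case-insensitive sort that leaves the original list untouched.
sorted_words = sorted(words, key=str.lower)

# Character counting: tally how often each character occurs.
char_counts = Counter(text)

print(reversed_text)
print(sorted_words)
print(char_counts["i"], char_counts["e"])
```

Slicing, `sorted` with a `key`, and `collections.Counter` are standard-library tools; the article itself may solve its puzzles differently.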

Row Vs. Range SQL Window Functions: Top Differences

by Srini
Posted on April 11, 2024, updated April 12, 2024

ROWS and RANGE frames in SQL window functions both define the window to operate on, but ROWS counts physical rows while RANGE groups rows by their ordering value.
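The difference is easiest to see when the ordering column contains ties. The sketch below runs a running total both ways using Python's built-in sqlite3 module. The `sales` table and its values are hypothetical, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical in-memory table; ids 2 and 3 tie on amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 20), (4, 30)],
)

query = """
SELECT id,
       amount,
       -- ROWS: the frame ends at the current physical row
       -- (id is added to the ordering so tied rows have a fixed order).
       SUM(amount) OVER (
           ORDER BY amount, id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS rows_sum,
       -- RANGE: the frame ends at the last row sharing the current amount.
       SUM(amount) OVER (
           ORDER BY amount
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS range_sum
FROM sales
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For the tied rows, `rows_sum` grows one row at a time (30, then 50), while `range_sum` is 50 for both because rows with equal `amount` are treated as peers.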

Databricks

External Vs. Managed Tables in Databricks: Top Differences

by Srini
Posted on April 11, 2024

In Databricks, a managed table has both its data and its metadata stored and managed by the platform, while an external table keeps its data in an external location and only its metadata is managed.

PySpark

Writing Dataframes into Delta Tables in PySpark: 6 Top Benefits

by Srini
Posted on April 10, 2024

Writing DataFrames to Delta tables offers data persistence, optimized performance, schema enforcement, transactional consistency, and integration with data systems.

PySpark

PySpark DBUtils: How to Use Each Command Effectively

by Srini
Posted on April 8, 2024

Databricks Utilities (DBUtils) provides functionality such as accessing DBFS files, managing clusters, and working with widgets.

PySpark

SAL>AVG(SAL) SQL Query: How to Write in PySpark and Pandas

by Srini
Posted on April 6, 2024

The SQL query, Pandas, and PySpark code extract employees earning more than the average salary.
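A minimal sketch of the SQL side of this comparison, run through Python's built-in sqlite3 module rather than Pandas or PySpark. The `emp` table, its column names, and the sample rows are assumptions for illustration and are not taken from the article.

```python
import sqlite3

# Hypothetical employee data; the average salary here is 6000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Asha", 3000), ("Ben", 5000), ("Chen", 7000), ("Dina", 9000)],
)

# The scalar subquery computes the average once; the outer query
# keeps only the rows whose salary is above it.
query = """
SELECT name, sal
FROM emp
WHERE sal > (SELECT AVG(sal) FROM emp)
ORDER BY name
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```

The same pattern (compute the average first, then filter on it) carries over to Pandas and PySpark, although the article's own code may be structured differently.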

PySpark

How to Write PySpark Code: Employee Sal > Manager

by Srini
Posted on March 31, 2024

The PySpark code demonstrates two methods for comparing each employee's salary with their manager's and retrieving the results.

Srinimf

Tech. Jobs. Biz. Success