20 ETL Interview Questions for Your Next Job

1) What is meant by ETL?
The overall data acquisition process, called ETL (extraction, transformation,and loading), is generally grouped into three main components :
-Extraction: Involves obtaining the required data from the various sources.

-Transformation: Source data undergoes a number of operations that prepare it for import into the data warehouse (target database). To perform this task, integration and transformation programs are used which can reformat, recalculate, modify structure and data elements, and add lime elements. They can also perform calculations, summarization, de-normal-ization, etc.

-Loading: Involves physically placing extracted and transformed data in the target database. The initial loading involves a massive data import into the data warehouse. Subsequently, an extraction procedure periodically loads fresh data based on business rules and a pre–determined frequency.

2) What are the types of dimensional schema?
There are two types of dimensional schema :
– Star schema
– Snowflake schema

3) What is star schema?
Star schema, there is only one central fact table, and a set of dimension tables, one for each dimension. In star schema, each dimension is represented by only one table, and each table contains a set of attributes.

4) What is snowflake schema?
-A snowflake schema avoids the redundancy of star schemas by normalizing the dimension tables. Therefore, a dimension is represented by several tables related by referential integrity constraints.

-Snowflake schema means a central located fact table is surrounded by denormalized dimension tables again these denormalized dimension table is splited one or more normalized dimension tables.

5) What is meant by starflake schema?
A starflake schema is a combination of the star and the snowflake schemas where some dimensions are normalized while others are not.

6) What is Operational Data Store?
Operational Data Store (ODS) is a hybrid data architecture to cover the requirements for both analytical and operational tasks.

Also Read: Top 20 ultimate ETL Questions really good for interviews

7) What is the difference between star schema and snowflake schema?
The star schema consists of a fact table with a single table for each dimension. The snowflake schema is a variation on the star schema where the dimensional tables from a star schema are organized into a hierarchy by normalizing them. A fact constellation is a set of fact tables that share some dimension tables.

8) What is data staging?
Data staging is the process of transferring the data from the data sources (operational systems) into the target database of the data warehouse.

9) What is a session?
A session is a set of instructions that describes how and when to move data from sources to targets.

You May Like: ETL Testing from beginner to expert

10) What is mapping?
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation.

11) What is Datadriven?
The informatica server follows instructions coded into update strategy transformations with in the session maping determine how to flag records for insert,update,delete or reject. If you do not choose data driven option setting, the informatica server ignores all update strategy transformations in the mapping.

12) What are the three major types of metadata in a data warehouse?
Metadata in a data warehouse fall into three major categories :

  • Operational Metadata
  • Extraction and Transformation Metadata
  • End–User Metadata

13) What is OLAP?

  • Allow users to run complex dimensional queries
  • Enable users to generate canned queries.
  • Two categories of online analytical processing are multidimensional online analytical processing (MOLAP) and relational online analytical processing (ROLAP).

14) What is meant by geographic information system(GIS)?
A software system that allows users to define, create, maintain, and control access to a geographic database.

15) What is dimension table?
A relational table that contains dimension data.

16) What is the diffrence between OLTP and OLAP?
The main differences between OLTP and OLAP are:

  • OLTP systems are for doing clerical/operational processing of data whereas OLAP systems are for carrying out analytical processing of the data.
  • OLTP systems look at data in one dimension; whereas in OLAP systems, data can be viewed in different dimensions and hence interesting business intelligence can be extracted from the data.
  • Operational personnel of an organization use the OLTP systems whereas
    management uses OLAP systems, though operational personnel may also use portions of OLAP system.
  • OLTP systems contain the current data as well as the details of the transactions. OLAP systems contain historical data, and also data in summarized form.
  • OLTP database size is smaller as compared to OLAP systems. If the OLTP database occupies Gigabytes (GB) of storage space, OLAP database occupies Terabytes (TB) of storage space.

More: Automating ETL: complete five projects

17) What is DTM?
DTM transform data received from reader buffer and its moves transformation to transformation on row by row basis and it uses transformation caches when necessary.

18) What is meant by spatial data warehouse?
A data warehouse that manipulates spatial data, thus allowing spatial analysis. This is to be contrasted with conventional and temporal data warehouses.

19) What is a Batch?
Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server.

20) What are the types of batch?
There are two types of batches are :

  • Sequential batch: Runs sessions one after the other.
  • Concurrent batch: Runs sessions at the same time.

Author: Srini

Experienced software developer. Skills in Development, Coding, Testing and Debugging. Good Data analytic skills (Data Warehousing and BI). Also skills in Mainframe.