Here is my earlier post on Informatica transformations. Below you can find a list of interview questions for data warehousing jobs.
1). What is a data warehouse?
A data warehouse is the electronic storage of an organization’s historical data for the purpose of analysis and reporting. According to Bill Inmon, a data warehouse should be subject-oriented, non-volatile, integrated, and time-variant.
Note that non-volatile means that data, once loaded into the warehouse, is not deleted later. Time-variant means that the data is stored with a time reference, so the warehouse records how values change over time.
The above definition of data warehousing is typically considered the “classical” definition. However, if you are interested, you may want to read the article “What is a data warehouse – A 101 guide to modern data warehousing”, which opens up a broader definition of data warehousing.
2). What are the benefits of a data warehouse?
A data warehouse helps to integrate data (see Data integration) and store it historically, so that we can analyze different aspects of the business, including performance, trends, and predictions, over a given time frame and use the results of that analysis to improve the efficiency of business processes.
3). Why is a data warehouse used?
Data warehouses have long been built, and still are, to facilitate reporting on the key business processes of an organization through measures known as KPIs (key performance indicators). Data warehouses also help to integrate data from different sources and present a single point of truth for business measures.
A data warehouse can also be used for data mining, which helps with trend prediction, forecasting, pattern recognition, etc. Check this article to know more about data mining.
4). What is the difference between OLTP and OLAP?
OLTP (online transaction processing) is a transaction system that collects business data, whereas OLAP (online analytical processing) is the reporting and analysis system built on that data.
OLTP systems are optimized for INSERT and UPDATE operations and are therefore highly normalized. OLAP systems, on the other hand, are deliberately denormalized for fast data retrieval through SELECT operations.
In a department store, when we pay at the check-out counter, the salesperson keys all the data into a “Point-Of-Sales” machine. That data is transaction data, and the related system is an OLTP system.
On the other hand, the manager of the store might want to view a report on out-of-stock materials, so that he can place a purchase order for them. Such a report will come from the OLAP system.
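The contrast between the two workloads can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under simplifying assumptions: the table names (sales, stock), the helper functions, and the sample products are hypothetical, and a real deployment would run the report against a separate OLAP system rather than the same database.

```python
import sqlite3

# Hypothetical store database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)")
conn.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, on_hand INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("rice", 2), ("sugar", 5)])


def record_sale(product, qty):
    """OLTP-style work: one small write transaction per check-out."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute("INSERT INTO sales (product, qty) VALUES (?, ?)", (product, qty))
        conn.execute(
            "UPDATE stock SET on_hand = on_hand - ? WHERE product = ?",
            (qty, product),
        )


def out_of_stock_report():
    """OLAP-style work: a read-only reporting query for the store manager."""
    rows = conn.execute("SELECT product FROM stock WHERE on_hand <= 0 ORDER BY product")
    return [row[0] for row in rows]


record_sale("rice", 2)
record_sale("sugar", 1)
print(out_of_stock_report())
```

The write path touches a couple of rows per transaction, while the report only reads; that difference in access pattern is what drives the different designs of the two kinds of system.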
5). What is a data mart?
Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing, etc. stored in a data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.
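As a rough sketch of a data mart built on top of a warehouse, the Python snippet below models the mart as a SQL view restricted to one department. This is a simplification: real data marts are often separate physical schemas or databases. The table name warehouse_expenses, the view name hr_mart, and all sample figures are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_expenses (department TEXT, expense_month TEXT, amount REAL);
INSERT INTO warehouse_expenses VALUES
    ('Finance',   '2024-01', 1200.0),
    ('HR',        '2024-01',  800.0),
    ('Marketing', '2024-01', 1500.0),
    ('HR',        '2024-02',  950.0);
-- Hypothetical HR data mart: a single-subject slice of the warehouse table.
CREATE VIEW hr_mart AS
    SELECT expense_month, amount
    FROM warehouse_expenses
    WHERE department = 'HR';
""")


def hr_mart_rows():
    """Return what the HR department sees through its own mart."""
    query = "SELECT expense_month, amount FROM hr_mart ORDER BY expense_month"
    return conn.execute(query).fetchall()


print(hr_mart_rows())
```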
6). What is an ER model?
The ER (entity-relationship) model describes data as entities and the relationships between them, and it is designed with the goal of normalizing the data.
7). What is Dimensional Modeling?
The dimensional model consists of dimension and fact tables. Fact tables store transactional measurements along with foreign keys to the dimension tables that qualify the data. The goal of the dimensional model is not to achieve a high degree of normalization but to make data retrieval easier and faster.
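A minimal star-schema sketch in Python with sqlite3 may make the fact/dimension split more concrete. The table names (dim_product, dim_customer, fact_sales), the column names, and the sample rows are assumptions chosen for illustration, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive attributes.
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
-- The fact table holds the measure plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    sale_date    TEXT,
    quantity_kg  REAL
);
INSERT INTO dim_product  VALUES (1, 'Rice'), (2, 'Wheat');
INSERT INTO dim_customer VALUES (1, 'Ramesh'), (2, 'Sita');
INSERT INTO fact_sales VALUES
    (1, 1, '2024-04-05', 20.0),
    (1, 2, '2024-04-05',  5.0),
    (2, 1, '2024-04-06', 10.0);
""")


def quantity_by_product():
    """Total quantity sold per product: one join from fact to dimension."""
    query = """
        SELECT p.product_name, SUM(f.quantity_kg)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.product_name
        ORDER BY p.product_name
    """
    return conn.execute(query).fetchall()


print(quantity_by_product())
```

Because every descriptive attribute is one join away from the fact table, reporting queries stay short, which is the retrieval benefit the dimensional model aims for.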
8). What is a dimension?
A dimension is something that qualifies a quantity (measure). If I just say “20kg”, it does not mean anything. But “20kg of Rice (product) is sold to Ramesh (customer) on 5th April (date)” makes sense. These products, customers, and dates are dimensions that qualify the measure. Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
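The idea that a dimension splits a data set into non-overlapping regions can be illustrated with plain Python. The sales records and the total_by helper below are hypothetical; only the first record comes from the rice example above.

```python
from collections import defaultdict

# Each record has one measure (kg) qualified by three dimensions.
sales = [
    {"product": "Rice", "customer": "Ramesh", "date": "5 April", "kg": 20},
    {"product": "Rice", "customer": "Sita", "date": "6 April", "kg": 5},
    {"product": "Wheat", "customer": "Ramesh", "date": "6 April", "kg": 10},
]


def total_by(dimension):
    """Sum the measure within each value of the chosen dimension."""
    totals = defaultdict(int)
    for row in sales:
        # Every row lands in exactly one group, so the groups never overlap.
        totals[row[dimension]] += row["kg"]
    return dict(totals)


print(total_by("product"))
print(total_by("customer"))
```

Whichever dimension is chosen, the group totals add up to the same overall quantity, because each sale is counted in exactly one group.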