DB2 Tutorial For Software Developers (1 of 5)

DB2 tutorials for software developers. Given in the form of questions and answers. You can learn quickly about your project for interviews. Highly useful to mainframe DB2 programmers.

What is Data ?
‘A representation of facts or instruction in a form suitable for communication’ – IBM Dictionary
‘Is a repository for stored data’ – C.J.Date

What is a database system?
An integrated and shared repository for stored data or collection of stored operational data used by application systems of some particular enterprise or ‘Nothing more than a computer-based record keeping system’.

Advantages of DBMS over File Management Systems

Data redundancy
Multiple views
Shared data
Data independence (logical/physical)
Data dictionary
Search versatility
Cost effective
Security & Control
Recovery restart & Backup
Concurrency

TYPES OF DATABASES

Hierarchical Model
Network Model
Relational Model
Object-Oriented Model

HIERARCHICAL

Top down structure resembling an upside-down tree
Parent child relationship
First logical database model
Available on most of the Mainframe computers. Example – IMS

NETWORK

Does not distinguish between parent and child. Any record type can be associated with any number of arbitrary record types
Enhanced to overcome limitations of other models but in reality, there is minimal difference due to frequent enhancements. Example – IDMS

RELATIONAL

Data stored in the form of tables consists of multiple rows and columns.
Examples – DB2, Oracle, Sybase, Ingres etc.

OBJECT -ORIENTED MODEL

Data attributes and methods that operate on those attributes are encapsulated in structures called objects

RELATIONAL DB CONCEPTS
Why Relational ? – Relation is a mathematical term for a table – Hence Relational database ‘is perceived’ by the users as a set of tables.
– All data values are atomic.
– Entries in columns are from the same domain
– Sequence of rows (T-B) is insignificant
– Each row is unique
– Sequence of columns (L-R) is insignificant

Relational Concepts (Terminology)

Relation: A table or File
Tuple: Row contains an entry for each attribute
Attributes: Columns or the characteristics that define the entity
Domain:. A range of values (or Pool)
Entity: Some object about which we wish to store information
Null: Represents an unknown/empty value
Atomic Value: Smallest unit of data; the individual data value
Candidate key: Some attribute (or a set of attributes) that may uniquely identify each row(tuple) in the relation(table)
Primary key: The candidate key that is chosen for primary attributes to uniquely identify each row.
Alternate key: The remaining candidate keys that were not chosen as primary key
Foreign key: An attribute of one relationship that might be a primary key of another relation.

Normalization (1NF – 5NF)
It is done to bring the design of database to a standardized mode or (form)

1NF : All entities must have a unique identifier, or key, that can be composed of one or more attributes. All attributes must be atomic and non repeating.
2NF : Partial functional dependencies removed – all attributes that are not a part of the key must depend on the entire key for that entity.
3NF : Transitive dependencies removed – attributes that are not a part of the key must not depend on any non-key attribute.
4NF : Multi valued dependencies removed
5NF : Remaining anomalies removed

Types of Integrity

Entity Integrity : Is a state Where no column that is part of a primary key can have a null values.
Referential Integrity : Is a state Where every foreign key in the first table must either match a primary key value in the second table or must be wholly null

CODD’s RELATIONAL RULES

All information in a relational database is represented explicitly at the logical level and in exactly one way – by values in tables
Each and every datum(atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of tablename, primary key value, and column name.
Null values are supported for representing missing information in a systematic way irrespective of the datatype.
The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
A relational system may support several languages and various modes of terminal use. However there must be one language whose statements can express all of the following items: (1)data definitions (2)view definitions (3)data manipulation(interactive and by program)(4) integrity constraints (5) authorization(6) transaction boundaries(begin, commit, rollback)
All views are theoretically updatable, are also updatable by the system
The capability of handling a base relation or a derived relation (view) as a single operand applies not only to the retrieval of of data but also to the insertion, updation and deletion of data
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods
Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
The data manipulation sublanguage of a relational DBMS must enable application programs and inquiries to remain logically the same whether and whenever data are physically centralized or distributed.
If a relational system has a low-level(single-record-at-a-time)language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language(multiple-records-at-a-time)

Example of a Relational Structure
CUSTOMER Places ORDERS
ORDERS Has PRODUCTS
The above relations can be interpreted as follows :

Each order relates to only one customer (one-to-one)
Many orders can contain many products (many-to-many)
A CUSTOMER can place any number of orders (one-to-many)

Entity Relationship Model:

In the above example CUSTOMER, Order & Product are called ENTITIES.
An Entity may transform into table(s).
The unique identity for information stored in an ENTITY is called a PRIMARY KEY. E.g… CUSTOMER-No uniquely identifies each customer

A table essentially consists of
– Attributes, which define the characteristics of the table
– Primary key, which uniquely identifies each row of data stored in a table
– Secondary & Foreign Keys/indexes

Table Definition :

Table ‘CUSTOMER’ –
Attributes – CUST_NO, CUST_NAME,
CUST-LOCATION, CUST_ID, ORDER_NO…
Primary Key – CUSTOMER_NO
Secondary Key – CUST_ID
Foreign-Key – ORDER_NO

The Relationships transform into Foreign Keys. For e.g.. CUSTOMER is related to Orders through ‘ORDER_NO’ which is the Foreign-key in CUSTOMER and Primary key in Order. So basically the relationship ‘Places’ is through the ORDER_NO.
As per the relational integrity the Primary-Key, ORDER_NO, for the table ‘Orders’ can never be Null, while it can be so in the table ‘CUSTOMER’.
Tables exist in Tablespaces. A tablespace can contain one or more tables
Apart from the Primary Key, a table can have many secondary keys/indexes, which exist in Indexspaces.
These tablespaces and indexspaces together exist in a Database
To do transformations as described above we need a tool that will provide a way of creating the tables, manipulate the data present in these, create relationships, indexes, tablespace, indexspace and so on. DB2 provides SQL which performs these functions.

Also read

Srini

Data Engineer with deep AI and Generative AI expertise, crafting high-performance data pipelines in PySpark, Databricks, and SQL. Skilled in Python, AWS, and Linux—building scalable, cloud-native solutions for smart applications.