For COBOL Developers
- The SQLCODE must be checked after every SQL statement. The Declare cursor statement is only a declarative, and it gets no return code from DB2. All other SQL calls get some return code. Return code data from the DB2 database system gets automatically loaded in the SQLCA communications area.
- Every program must include the SQLCA and a DCLGEN for each table being coded against. The DCLGEN is predefined with host variables that match the column definitions. They are used to select data into, insert and update from, and serve as the host variables in any Where clause. If DCLGEN fields are not being used, then any program declaring variables in the code must make sure that the variable being declared exactly matches the definition in DB2. If it doesn’t, then there is a possibility that DB2 may not choose an index to process. For example, if Column1 is defined as an Integer, then the host variable in COBOL should be defined as S9(9) comp.
- Every program must have a consistent DB2 abend routine. For batch programs, it is easiest to have a called program that handles the display of the SQLCA fields and calls the DSNTIAR DB2 routine to display further DB2 messages. For online programs, sometimes it is good to write out the SQLCA and DSNTIAR information to a file or table in order to fall back on errors that occur. The SQLCA contains a lot of information specific to a call that is critical to troubleshooting an error. It is important to write out all the information captured. Make sure that at least the SQLSTATE is displayed, along with the SQLCODE.
- Never code Select * in a program. Only code for the columns needed. If a program needs all the columns, then code each one. This will prevent an abend if a new column is ever added to the table. The fewer columns being brought into the program, the more efficient the processing. (See tuning tip #3 and tuning tip #29 in Chapter 1, “SQL Optimization Top 100+.”) More columns can have an effect on performance due to larger sort sizes, possible index-only processing, and join types. When DB2 looks at which join type is best, part of its analysis is the number of columns from each table being selected.
- Make sure any columns defined as Nullable contain a null indicator host variable as part of the Select, Insert, or Update statements. This is most important in Select statements because DB2 will return an invalid -305 SQLCODE when it returns a column of null to the program and there is no null indicator specified.
These null indicators must be defined in working storage as Pic S9(4) Comp. It is preferable to code the VALUE, COALESCE, or IFNULL SQL scalar function for any nullable columns because the program will not receive null indicators from DB2. This will alleviate -305 SQL errors where a program is not set up to handle the null indicator. It will also spare the program from having to define the null indicators in working storage. For example, Select COALESCE(PK_ID, 0) will return the PK_ID value if there is one, or it will return a zero if it is null. This could also be coded with the VALUE and COALESCE functions. All three would return the same result. The default specified must match the column definition. For example, since PK_ID is numeric, then the default must be a numeric—in this case, zero.
- Any SQL statement that contains one of the following aggregate functions should have a Null-Indicator host variable as part of the select (MIN MAX, AVG, SUM). DB2 will return a null indicator to the program if it finds no data to process these functions, and the COBOL program will have to define a null indicator. If the program is not set up with a null indicator, an invalid -305 SQLCODE is returned. It is preferable to code the VALUE, COALESCE, or IFNULL function to alleviate any null indicator logic. For example:
SELECT IFNULL(AVG(SALARY), 0)
WHERE WORKDEPT = ‘XYZ’
This will either return the average if rows are found or a zero if no rows were met in order to calculate an average.
- Minimize the number of times cursors are opened and closed during execution. If most of the time the open cursor and fetch retrieves only one row, then code a simple Select statement and execute the cursor processing only when a -811(duplicate rows) SQLCODE is returned.
Do not break up processing into multiple cursors unless performance seems to be an issue. If it takes a seven-table join, then code all seven tables in one cursor and let DB2 do the work. When you break it up, the process usually takes longer due to the extra times DB2 is sent SQL statements to process. So break up the join only when all other tuning efforts have been applied. Typically it would be more efficient to execute a seven-table join.
- CASE expressions should always contain an ELSE clause. If none of the conditions in the CASE are met, then DB2 will return a null (via a null indicator) to the program. If the program is not set up to handle a null being returned from the CASE expression, then a -305 SQLCODE is returned, which usually causes the program to abend.
- Always display counts for the number of Selects, Inserts, Updates, Deletes, and Open cursors that have been executed in the program. The overhead in COBOL to define the counters and increment them through the processing is minimal to the overall runtime of the program. Displaying these counts provides invaluable information when problems occur, helping a developer figure out which program to look into. Make sure the counts are displayed on every abend and at the end of processing.
- Always display the values in host variables for a SQL statement that has an invalid SQL return code and the program goes into its abend error routine. Every developer knows how frustrating it is to have a program error out or even abend and not know what values were being processed.
- Watch out for any SQL warnings that may occur in an SQL statement. Most programs seem to ignore warnings that many times help to detect potential problems. There are two indications of a warning message in the SQLCA: One is a positive SQLCODE other than +100; the other is a W in the SQLCA’s SQLWARN0 field. When either of these exists, DB2 is issuing a warning that something worrisome happened on the prior call and that while you may have received data back, it may not be what you expected. When SQLWARN0 is a W, DB2 also provides helpful information about the problem in one or more of the other SQLWARNn fields. Also check warnings on every SQL statement return. For example:
If SQLWARN0 = ‘W’
Display ‘*** Warning error ***’
Display ‘Sqlstate = ‘ Sqlstate
- Take advantage of the SQLERRD (3) out of the SQLCA. The third occurrence of the SQLERRD array is one of the most useful fields in the SQLCA. This field is populated after a successful insert, update, or delete with a count of the number of rows inserted, updated, or deleted. This is not populated when a mass delete with no Where logic is coded or populated due to deletes affected by delete cascade.
- Take advantage of fetching rowsets in your cursor processing. (See tuning tip #46 in Chapter 1.) This should be strictly enforced for large cursors because of the runtime savings.
- Apply all calculations within the COBOL code and then move the value to a host variable. Then reference the host variable in the SQL statement. Keep calculations out of SQL statements whenever possible.
- Hard code any and all values known within an SQL statement. For example, if a program always processes the terminated rows on a table, then use the SQL statement Where Status_Code = ‘T’. This is extremely helpful especially if frequency value statistics are present for the different values of Status_Code in the catalog tables.
For All SQL Developers
- All SQL join statements should have the columns from each table noted with a Correlation ID when referenced in Select, Where, Group By, or Order By clauses. A Correlation ID should be something other than a letter of the alphabet. Use something descriptive so others can understand from which table each column is coming. This makes the join logic more clear and readable.
- Do not apply any SQL scalar functions against columns coded in the Where clause. This is especially important for columns that make up any index for a table. For example, coding Where Integer(CLM_ID) will automatically eliminate the use of the index for CLM_ID. As another example, the following:
WHERE YEAR(HIREDATE) = 2003
should be coded as:
WHERE HIREDATE BETWEEN ‘2003-01-01’ and ‘2003-12-31’ to make it an indexable predicate.
- Check your queries with the DB2 Explain tool. A Plan_Table under your ID will need to be created from the DBAs, or use the Plan_Table defined for theDB2 subsystem you are operating under. For example:
Delete from Plan_Table
Explain Plan Set Queryno = 11 for
SELECT EMPNO, LASTNAME,
WHERE DEPTNO = ?
Select * from Plan_Table
Order by Queryno, Planno, Qblockno, Mixopseq
- Watch out for Order By and Group By statements in queries. Each of these may cause a sort, which requires resource utilization. Code them only if needed. The fewer the columns and rows in a sort, the faster the sort will run, so make sure only the columns needed are coded.
- When coding UNION statements in SQL, start with UNION ALL. By just coding UNION, a sort gets executed to eliminate duplicates, causing more resource utilization. Many times there are not duplicates, so UNION ALL should be the choice that prevents a sort from taking place. Avoid UNIONs if possible. Sometimes the logic can be rewritten using outer joins, case statements, etc.
- Watch out for DISTINCT. This also causes a sort, which requires more runtime. Only code this when absolutely necessary. Many times a rewrite of the statement that can get the same results without the DISTINCT may run more efficiently.(See tuning tip #4 in Chapter 1.)
- Be careful when using the CASE expression as part of the Select statement. This expression can have some considerable overhead during execution. If there are many rows being returned as part of the query, it may help to move that logic as part of your source code after each row is returned. This is especially true if your source is compiled code.
- Do not use Select Count(*) for existence checking. Use this only when you need a total number of rows. It is best to code a Select using the FETCH FIRST 1 ROW ONLY and then check for SQLCODE = 0 or +100.
- Always check the Performance Monitoring and Tuning guide for V9, and the Managing Performance guide for V10 for how to code (or how not to code) predicates to make them indexable and/or stage 1 versus stage 2. (See tuning tip #14 in Chapter 1.) The IBM Data Studio Visual Explain tool will also note any stage 2 predicates.
- Watch out for <> (not equal) predicates. These predicates are non-indexable, but they are stage 1.
- Make sure there is an understanding of inner vs. outer joins. Many times SQL is written with Table1 outer joined to Table2, and then inner joined to Table3. The inner join being coded last can offset the exceptions that took place in the outer join. Many times the three tables could all be coded with inner joins, which would run more efficiently. Outer joins are not inefficient, but if they bring in extra exception rows, and a subsequent inner join then gets rid of those extra rows, it was processing not needed. Also, make sure that if outer joins are coded, the program is set up to handle nulls being returned from the table where the join is not met. The VALUE, COALESCE, or IFNULL function should be used to keep DB2 from trying to send a null indicator back to the program.
- Try to stay away from NOT logic in general. Try to keep predicates positive as much as possible. For example, the following predicate:
WHERE NOT HIREDATE > :WS-DATE
could be recoded as:
Where HIREDATE <= :WS-DATE
- When coding predicates, keep the logic away from the column to make it an indexable predicate. For example:
WHERE SALARY * 1.10 > 100000.00
is a non-indexable predicate and should be coded as:
WHERE SALARY > 100000.00 / 1.1
- When using date-labeled durations (adding or subtracting years/months/days) to a date, it is logically important in which order they are coded and executed. For example, when adding, the order should be years first, then months, then days:
SELECT CURRENT DATE + 2 YEARS + 3 MONTHS + 1 DAY When subtracting, the order should be just the opposite: days first, then months, then years: SELECT CURRENT DATE – 1 DAYS – 3 MONTHS – 2 YEARS This is important because if they are coded in a different order, the results could be incorrect! Results can be different due to date adjustments on the months. For example, subtracting 1 month from March 31 will result in February 28 or 29. 15. If you need to know the last day of a month, use the Last_Day SQL function to get it. For example: SELECT LAST_DAY(CURRENT DATE) INTO :HV1 -- Where HV1 is some Host Variable FROM SYSIBM.SYSDUMMY1
- A more efficient way to get the same result as in #15 above is to use the Set statement.
SET :HV1 = LAST_DAY(CURRENT DATE)
NOTE Use the Set Host Variable assignment over the SYSIBM.SYSDUMMY1 whenever
possible, especially when the statement may get executed hundreds or thousands of times within its runtime.
- Take advantage of the many date functions in SQL instead of programming code to provide the information needed: Year/Month/Day returns only that portion of the date value.
DAYOFWEEK/DAYOFWEEK_ISO returns a number (1–7), depending on whether the week begins on Sunday or Monday. DAYOFWEEK_ISO states Monday as the first day of the week. DAYOFMONTH/DAYOFYEAR returns the specific day number in a month (1–31) or year (1–366). LAST_DAY returns the last day of the month for a specific date. If the date was 10/15/2005, the date returned would be 10/31/2005. NEXT_DAY returns a timestamp representing the first weekday greater than the specified date. The function needs to have the weekday specified. For example: NEXTDAY('01/31/2005', 'MON') returns the date of the next Monday after the date ‘01/31/2005’. DAYS is used to get the days difference between two dates. For example: SELECT DAYS(HIREDATE) - DAYS(BIRTHDATE) returns the number of days difference. WEEK returns a number (1–54) that represents the week of the year. Week 1 is the first week that contains the first day of the year. WEEK_ISO returns a number (1–53) that represents the week of the year. Week 1 is the first week of the year that contains a Thursday, which is equivalent to the first week that contains January 4. CHAR is used to get a date column back in a specific format (USA, ISO or JIS, EUR). Subtracting two dates from each other returns a decimal number that has the number of years, months, and days difference between the dates: SELECT DATE('2010-01-01') - DATE('2007-10-15') FROM SYSIBM.SYSDUMMY1 returns 20217, which means 2 years, 2 months, 17 days. To get just the years difference, use: SELECT YEAR(DATE('2010-01-01') - DATE('2007-10-15') )
- Not Between is non-indexable. For example, the following predicate:
WHERE SALARY NOT BETWEEN 50000.00 and 100000.00
is a non-indexable predicate and should be cod ed as follows:
WHERE SALARY < 50000.00
OR SALARY > 100000.00
- Watch out for the Like predicate. If the Like statement is a Begins With predicate,then that predicate is indexable. If the Like statement is a Contains or Ends With predicate, then it is non-indexable. For example:
WHERE LASTNAME LIKE ‘A%’ – Begins with logic
WHERE LASTNAME LIKE ‘%A%’ – Contains logic
WHERE LASTNAME LIKE ‘%A’ – End with logic
- Code only the columns needed in the Select. Extra columns can cause the optimizer to choose a different access path that may not be the best choice. Extra columns cause sorts to be more expensive and adds to transmission cost. Even one extra column (at times) can cause the optimizer to choose a different access path. Basically, the wider the result set, the more DB2 has to pull and ship.
- Queries and/or cursors that bring back multiple rows in the result set should have For Fetch Only at the end of the query. This tells DB2 that there is no intention of updating any of the rows being fetched. Because of this, DB2 will try to avoid locking the pages and will possibly block the data rows being returned. For Read Only also does the same.
- Code the most restrictive predicates first. This does not mean that this is the exact order in which DB2 will execute the queries. DB2 will always pick stage 1 indexable predicates first, no matter where they are coded. But within these, it is important to use the correct order.
- Rewrite > Any and > All subqueries. For example, recode this:
SELECT EMPNO, LASTNAME
Where Salary > Any
Where Workdept = ‘C11’)
Select Empno, lastname
Where Salary >
Where Workdept = ‘C11’)