Data Warehouse Maintenance

This section discusses data warehouse maintenance for Maconomy.

Data warehouse maintenance includes:

  • Creating ETL jobs
  • Scheduling loads
  • Processing manual loads

Top-Level Jobs for Maintaining the Data Warehouse

The following are top-level jobs for maintaining the data warehouse. The first two jobs are for loading data. The third is for upgrading. Once a data warehouse has been successfully created, load data regularly, such as once a day or once a week, using Incremental Load. About once a month, run Full Load to correct any missing dependencies. Run Upgrade when you install service packs or new versions.

Use the following jobs for maintaining the data warehouse once it has been created:

Full Load This job resets the fact tables and reloads all data. Over time, the data in incrementally loaded fact tables may drift out of sync with data in their dimension tables. For example, a record in a dimension table may change without the corresponding fact records being adjusted.

Therefore, it is recommended to run a full load periodically, for example, once a month.

Incremental Load This job loads only data that have been changed, inserted, or deleted since the last load. Once the data warehouse has been created, new and modified data are retrieved by running this job.

It is recommended to schedule this job to run regularly.

Upgrade This job upgrades the data warehouse.
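The suggested policy above (Incremental Load regularly, Full Load about once a month) could be sketched in a small helper like the following. This is an illustrative sketch, not part of the product; the function name and the first-of-the-month rule for Full Load are assumptions.

```python
from datetime import date

# Hypothetical helper (not part of the product): pick which top-level
# job to run on a given day, following the suggested policy of running
# Incremental Load regularly and Full Load about once a month. Running
# Full Load on the first day of the month is an illustrative choice.
def job_for(day: date) -> str:
    if day.day == 1:
        return "Full Load"
    return "Incremental Load"
```

For example, job_for(date(2024, 5, 1)) returns "Full Load", while any other day of that month yields "Incremental Load".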

Scheduling Loads

For periodic updates of the BPM DWH, you must use incremental loads. The Kitchen tool, which is part of Pentaho Data Integration, allows you to launch jobs, such as Incremental Load, from the command line.

Note: For more information, refer to the documentation on Pentaho Data Integration.

The call may look as follows:

Kitchen.bat /rep:"Pentaho repository" /job:"Incremental Load" /dir:/ /user:admin /pass:admin /level:Basic
Note: Periodic updates are accomplished by writing a batch script that launches Kitchen, and then scheduling that script to run.

Start a text editor, such as Notepad, paste the following text, and then save it as updateDWH.bat.

@echo off
rem remove '-' from the date
for /F "tokens=1-4 delims=-" %%i in ('date/t') do set filedate=%%i%%j%%k
rem remove ':' from the time
for /F "tokens=1-4 delims=:" %%i in ('time/t') do set filetime=%%i%%j%%k
rem set the log filename with date and time
set logfilename=C:\Kettle-2[1].3.0\MyLogFiles\Log_%filedate%_%filetime%.log
call Kitchen.bat /rep:"Pentaho repository" /job:"Incremental Load" /dir:/ /user:admin /pass:admin /level:Basic > %logfilename%
echo Load finished. See the logfile %logfilename%
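The script writes the Kitchen output to a timestamped log file. A small check like the following could scan the newest log for problems after a scheduled run. This is a sketch, not part of the product: the "Log_*.log" pattern matches the filenames built by the script, but treating lines containing "ERROR" as failures is an assumption about the Kitchen log format.

```python
import glob
import os

# Hypothetical check (not part of the product): find the newest log
# file written by updateDWH.bat and report any lines containing
# "ERROR". The "ERROR" marker is an assumption about the log format.
def errors_in_latest_log(log_dir: str) -> list[str]:
    logs = sorted(glob.glob(os.path.join(log_dir, "Log_*.log")),
                  key=os.path.getmtime)
    if not logs:
        return []  # no loads have run yet
    with open(logs[-1], encoding="utf-8", errors="replace") as f:
        return [line.rstrip() for line in f if "ERROR" in line]
```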

To schedule loading of data through a Scheduled Task, follow these steps:

  1. Create a folder named MyLogFiles in the Kettle folder.
  2. Open a command prompt and make the Kettle folder the current directory.
  3. To test the script, type updateDWH.bat and press Enter. This may take some time, since the script loads the data warehouse.
  4. Open the control panel and then open Scheduled Tasks.
  5. Click Add Scheduled Task, browse to select the script updateDWH.bat, and specify how often to perform the task (for example, daily or weekly).
  6. Go through the rest of the wizard. You must enter the password of the user for whom the task is scheduled.
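On current Windows versions, the same schedule can also be created from the command line with schtasks instead of the wizard. The following sketch builds such a command; the task name, the daily schedule, and the script path are illustrative assumptions.

```python
# Hypothetical alternative to the Scheduled Tasks wizard: build an
# equivalent "schtasks /create" command line. The task name and the
# daily schedule are illustrative assumptions.
def schtasks_command(script_path: str, task_name: str = "UpdateDWH",
                     schedule: str = "DAILY") -> str:
    return (f'schtasks /create /tn "{task_name}" '
            f'/tr "{script_path}" /sc {schedule}')
```

Running the returned command in a command prompt would create the same kind of daily task as the wizard; adjust the path to wherever updateDWH.bat was saved.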

Process Manual Loads

To process manual updates of the data warehouse, follow these steps:

  1. Start Spoon and open the repository.
  2. Log in as admin/admin.
  3. Open the Incremental Load job.
  4. Run the job (F9).
Note: The Log view tab should display Success in the result column. If you cannot open the repository, it may be because a firewall is blocking access. Deactivate the firewall or consult your network administrator.
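If the Spoon GUI is unavailable, the job could instead be launched headlessly with Kitchen, as in the scheduled-load example earlier. The following sketch wraps such a call; the executable name and the admin/admin credentials mirror the examples above, and relying on a non-zero exit code to signal failure is an assumption about Kitchen.

```python
import subprocess

# Hypothetical wrapper (not part of the product): run the Incremental
# Load job headlessly via Kitchen instead of the Spoon GUI, and treat
# a zero exit code as success. Paths and credentials mirror the
# examples in this section and are illustrative.
def run_incremental_load(kitchen: str = "Kitchen.bat") -> bool:
    result = subprocess.run(
        [kitchen, "/rep:Pentaho repository", "/job:Incremental Load",
         "/dir:/", "/user:admin", "/pass:admin", "/level:Basic"])
    return result.returncode == 0
```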