Data Warehouse Maintenance
This section discusses data warehouse maintenance for Maconomy.
Data warehouse maintenance includes:
- Creating ETL jobs
- Scheduling loads
- Processing manual loads
Top-Level Jobs for Maintaining the Data Warehouse
The following are top-level jobs for maintaining the data warehouse. The first two jobs are for loading data. The third is for upgrading. Once a data warehouse is successfully created, you want to load data regularly, such as once a day or once a week, using Incremental Load. Once a month or so, run Full Load to get any missing dependencies corrected. Run Upgrade when you install service packs or new versions.
Use the following jobs for maintaining the data warehouse once it has been created:
Full Load This job resets the fact tables and loads all data in again. Over time the data in incrementally loaded fact tables may become out of sync with some data in their dimension tables for certain scenarios. For example, a record in a dimension table may change without resulting in corresponding fact records to be adjusted.
Therefore, it is recommended to do a full load, for example, once a month.
Incremental Load This job loads data that have changed, been inserted or deleted, such as once the data warehouse has been created, new or modified data are retrieved by running this job.
It is recommended to schedule this job to run regularly.
Upgrade This job upgrades the data warehouse.
Scheduling Loads
For periodic updates of the BPM DWH, you must use incremental loads. The Kitchen tool, which is part of Pentaho Data Integration, allows you to launch jobs, such as Incremental Load, from the command line.
The call may be displayed as follows:
Kitchen.bat /rep: Pentaho repository /job: "Incremental Load" /dir:/ /user:admin /pass:admin /level:Basic |
Start a text editor, such as Notepad, paste the following text, and then save it as updateDWH.bat.
@echo off |
To schedule loading of data through a Scheduled Task , follow these steps:
- Create a folder named MyLogFiles in the Kettle folder.
- Open a command prompt and make the Kettle folder the current directory.
- To test the script, write updateDWH.bat. This may take some time, since the script loads the data warehouse.
- Open the control panel and then open Scheduled Tasks.
- Click Add Scheduled Task, browse to select the script updateDWH.bat, and specify how often to perform the task (for example, daily or weekly).
- Go through the rest of the wizard. You must enter the password of the user for whom the task is scheduled.
Process Manual Loads
To process manual updates of data warehouse, follow these steps:
- On the repository, start Spoon.
- Log in as admin/admin.
- Open the Incremental Load job.
- Run the job (F9).
.