Stelo Technical Documents
Stelo Watchdog for Linux
Last Update: 28 May 2024
Product: StarQuest Data Replicator
Version: v6.3x and later
Article ID: SQV00DR045
Abstract
The SQDR (Stelo Data Replicator) application runs as a service, i.e. a daemon process that is started at boot time and is normally always running.
The Stelo Watchdog is a Linux daemon that automatically restarts the SQDR service if it stops unexpectedly. This is similar to the IBM Db2 fault monitor daemon (db2fmcd) used to restart Db2 LUW if it should fail.
It will also automatically start the SQDR service after the completion of initial setup tasks such as licensing and control database configuration.
Stelo does not provide a watchdog for Windows as this function is performed by the Recovery feature of Windows Service Manager.
Solution
Starting the Watchdog
In the Docker environment, the watchdog is configured to start automatically along with the SQDR service itself. The watchdog command is located in the script sqdrsvc-start. For users of the SQDR Linux Container, no action is required, and this document is primarily informational.
Users who have installed SQDR themselves on a Linux system may need to configure their environment to start the watchdog command at system startup.
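For a self-managed installation, one possible way to start the watchdog at system startup is an @reboot entry in root's crontab. The following is a minimal sketch, not a Stelo-supplied procedure: it assumes a cron implementation that supports @reboot (such as cronie), and the helper name reboot_entry is hypothetical. Because the watchdog is started without parameters, it takes its settings from /var/sqdr/stelo-watchdog.conf.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SQDR): prints a crontab line that
# starts the watchdog at boot. With no parameters, the watchdog reads
# /var/sqdr/stelo-watchdog.conf.
WATCHDOG=/opt/StarQuest/sqdr/bin/stelo-watchdog

reboot_entry() {
    printf '@reboot %s\n' "$WATCHDOG"
}

# Show the entry; nothing is installed by this script.
reboot_entry

# To install it (as root), append it to the existing crontab, for example:
#   ( crontab -l 2>/dev/null | grep -v stelo-watchdog; reboot_entry ) | crontab -
```

Sites that use a different init mechanism (for example a systemd unit) would need an equivalent entry there; the cron form is shown only because it does not depend on whether the watchdog stays in the foreground.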
Usage
$ /opt/StarQuest/sqdr/bin/stelo-watchdog -h
Usage: stelo-watchdog [-c command] [-i interval(in seconds)] [-r restart(on|off)] [-t retry] [-v] [-h | -?]
A typical invocation is:
$ /opt/StarQuest/sqdr/bin/stelo-watchdog -c '/opt/StarQuest/sqdr/sqdrsvcd.sh start' -i 60
If you run stelo-watchdog without any parameters, it uses the contents of the configuration file /var/sqdr/stelo-watchdog.conf. Parameters specified on the command line take precedence over the contents of the configuration file.
- -c: command used to start the SQDR service
- -i: interval, in seconds, between checks of the SQDR service
- -r: restart mode: attempt to restart the service (on) or only log its status (off)
- -t: retry limit: the number of times the watchdog will try to restart the service after a failure
- -v: display the version of the watchdog
- -h | -?: display the usage message
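Because command-line parameters take precedence over the configuration file, it should be possible to override only the settings of interest and leave the rest to the file. The dry-run sketch below prints two such command lines rather than executing them; the helper name watchdog_cmd and the specific values (30, 60, 5) are illustrative assumptions, and only the options listed above are used.

```shell
#!/bin/sh
# Dry-run sketch: prints example command lines instead of running them.
# watchdog_cmd is a hypothetical helper; the numeric values are examples.
WATCHDOG=/opt/StarQuest/sqdr/bin/stelo-watchdog

watchdog_cmd() {
    case "$1" in
        # Monitor only: check every 30 seconds and log the status,
        # without attempting a restart.
        monitor) echo "$WATCHDOG -i 30 -r off" ;;
        # Check every 60 seconds, restart on failure, retry limit of 5.
        restart) echo "$WATCHDOG -i 60 -r on -t 5" ;;
        *) echo "usage: watchdog_cmd monitor|restart" >&2; return 2 ;;
    esac
}

watchdog_cmd monitor
watchdog_cmd restart
```

In both cases the start command (-c) is omitted, so it would be taken from the configuration file.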
If the SQDR service has been intentionally stopped by user action, then the watchdog will not attempt to restart it.
To stop the watchdog service, determine its process ID and kill it:
# ps ax | grep stelo-watchdog
# kill watchdog-PID
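The two manual steps above can also be combined with pgrep, which avoids the grep command matching itself in the ps output. This is a sketch under one stated assumption: that the watchdog's process name is exactly stelo-watchdog. The function name stop_watchdog is hypothetical.

```shell
#!/bin/sh
# Assumption: the watchdog's process name is exactly "stelo-watchdog".
# stop_watchdog is a hypothetical helper, not part of SQDR.
stop_watchdog() {
    name=${1:-stelo-watchdog}
    # pgrep -x matches the process name exactly, so neither a grep
    # command nor the shell running this script is matched.
    pids=$(pgrep -x "$name" 2>/dev/null) || pids=""
    if [ -z "$pids" ]; then
        echo "$name is not running"
        return 0
    fi
    # Send the default SIGTERM to each match. $pids is deliberately
    # unquoted so that every PID becomes a separate argument.
    kill $pids && echo "stopped $name (PID: $(echo $pids))"
}

stop_watchdog
```

Run it as root, or as the user that owns the watchdog process, since kill requires permission to signal the target.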
Configuration File
The configuration file is located at /var/sqdr/stelo-watchdog.conf.
Example content:
{"interval": 60, "command": "/opt/StarQuest/sqdr/sqdrsvcd.sh start", "restart": true, "retry": 10}
If the configuration file is modified, stelo-watchdog will reload the new values automatically.
If stelo-watchdog is invoked with parameters, an existing watchdog service will reload with those options.
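Since the watchdog reloads the configuration file automatically, it may be prudent to replace the file in one step rather than edit it in place. The sketch below writes the documented fields to a temporary file and renames it over the target. The helper name write_conf is hypothetical, and whether the watchdog could ever observe a partially written file is an assumption, not documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: not part of the SQDR distribution.
CONF=${CONF:-/var/sqdr/stelo-watchdog.conf}
START_CMD='/opt/StarQuest/sqdr/sqdrsvcd.sh start'

write_conf() {
    interval=$1   # seconds between checks
    retry=$2      # maximum number of restart attempts
    tmp="$CONF.tmp.$$"
    # Write the complete file first, then rename it into place, to
    # reduce the chance that a half-written file is reloaded.
    printf '{"interval": %d, "command": "%s", "restart": true, "retry": %d}\n' \
        "$interval" "$START_CMD" "$retry" > "$tmp" && mv "$tmp" "$CONF"
}

# Example (as a user allowed to write the file):
#   write_conf 30 5
```

The temporary file is created next to the target so that the rename stays within one file system.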
Log File and Status Codes
Log files are located in the folder /var/sqdr/watchdog/.
The maximum log size is 10 MB. Logs are kept for 7 days.
To monitor the log, run:
$ tail -f /var/sqdr/watchdog/stelo-watchdog.INFO
The status codes in the log file are defined as:
- 1: Running. The SQDR service is running.
- -1: Unexpected. The SQDR service has stopped unexpectedly, i.e. the process is not running but /var/tmp/sqdrsvc.pid exists. The watchdog will restart the service unless it was invoked with -r off (monitor only).
- 0: Stopped. The SQDR service was stopped by user action; the file /var/tmp/sqdrsvc.pid does not exist. The watchdog will not attempt to restart the SQDR service.
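The three status codes can be reproduced by hand when troubleshooting. The sketch below illustrates the rules as described above and is not the watchdog's actual implementation: it assumes that /var/tmp/sqdrsvc.pid contains the process ID of the SQDR service, and the function name sqdr_status is hypothetical.

```shell
#!/bin/sh
# Sketch of the documented status rules; NOT the watchdog's actual code.
# Assumption: the PID file contains the process ID of the SQDR service.
sqdr_status() {
    pidfile=${1:-/var/tmp/sqdrsvc.pid}
    if [ ! -e "$pidfile" ]; then
        printf '%s\n' 0     # Stopped: no PID file (stopped by user action)
        return
    fi
    pid=$(cat "$pidfile" 2>/dev/null) || pid=""
    # kill -0 sends no signal; it only tests whether the process exists
    # and whether the caller is allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        printf '%s\n' 1     # Running
    else
        printf '%s\n' -1    # Unexpected: PID file exists, process does not
    fi
}

sqdr_status
```

Run it as root or as the service owner; otherwise kill -0 may fail on a live process for lack of permission and the function would report -1.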
DISCLAIMER
The information in technical documents comes without any warranty or applicability for a specific purpose. The author(s) or distributor(s) will not accept responsibility for any damage incurred directly or indirectly through use of the information contained in these documents. The instructions may need to be modified to be appropriate for the hardware and software that has been installed and configured within a particular organization. The information in technical documents should be considered only as an example and may include information from various sources, including IBM, Microsoft, and other organizations.