For many embedded software engineers, the watchdog timer is something that gets designed in at the end. The thinking is that you just turn it on prior to shipping and “She’ll be right”. But this is a terrible travesty. For those who do not know what a watchdog timer (WDT) is, it is an internal part of many microprocessors that performs a safety function. If the software gets hung up, the WDT takes over and reboots the processor. It is normally implemented as a hardware countdown timer. Your software reloads the count value, and over time the value decreases. If it reaches zero before your software reloads the counter again (petting the dog), the WDT reboots the processor.
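To make the countdown behaviour concrete, here is a minimal software model of it. This is only a sketch: a real WDT is a hardware peripheral with vendor-specific registers, and the names sim_wdt_t, sim_wdt_start, sim_wdt_pet and sim_wdt_tick are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a watchdog. On a real part these fields would be
 * hardware registers, not a struct in RAM (assumption for illustration). */
typedef struct {
    uint32_t reload;  /* value restored each time the dog is petted */
    uint32_t count;   /* current countdown value */
} sim_wdt_t;

/* Start the watchdog with a timeout expressed in ticks. */
static void sim_wdt_start(sim_wdt_t *wdt, uint32_t timeout_ticks)
{
    wdt->reload = timeout_ticks;
    wdt->count = timeout_ticks;
}

/* "Pet the dog": software restores the full count. */
static void sim_wdt_pet(sim_wdt_t *wdt)
{
    wdt->count = wdt->reload;
}

/* One tick of the watchdog clock. Returns true once the count has
 * reached zero, which is when real hardware would reset the processor. */
static bool sim_wdt_tick(sim_wdt_t *wdt)
{
    if (wdt->count > 0) {
        wdt->count--;
    }
    return wdt->count == 0;
}
```

With a timeout of three ticks, the model survives indefinitely as long as sim_wdt_pet is called at least once every two ticks, and reports a reset on the third tick after the last pet.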
Often the hasty implementation is simply to activate the WDT during initialization and then pet the dog in some section of the code that gets called fairly often. For some simple applications this might be enough, but if the processor runs an RTOS, it often is not.
In an application with several threads, it is not inconceivable that one thread will get hung up and stop performing its duties. But the WDT will not know this, because a different thread is responsible for petting the dog. So what is one to do?
My solution is to run a separate thread devoted to the WDT. I call this module a Guardian Angel. This thread initializes the WDT and then, in a loop, waits a set time. In my application it wakes up each second. It then looks at the loop counters of all the other threads and determines how many iterations each thread has run since the last time it polled. It compares each count against that thread's minimum expected value, and if any one of them falls short, it does not pet the dog.
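The per-poll check described above might look something like the following. This is a sketch under my own assumptions, not the actual module: ga_watch_t and ga_all_threads_alive are hypothetical names, and the sleep and pet calls are left as comments because they depend on the RTOS and the WDT driver in use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per monitored thread (hypothetical layout). */
typedef struct {
    const volatile uint32_t *counter; /* loop counter owned by the thread */
    uint32_t last_seen;               /* counter value at the previous poll */
    uint32_t min_loops;               /* fewest loops expected per poll period */
} ga_watch_t;

/* Returns true only if every watched thread ran at least min_loops
 * iterations since the previous poll. Always refreshes last_seen. */
static bool ga_all_threads_alive(ga_watch_t *watched, size_t n)
{
    bool all_ok = true;
    for (size_t i = 0; i < n; i++) {
        uint32_t now = *watched[i].counter;
        /* Unsigned subtraction stays correct when the counter wraps. */
        uint32_t loops = now - watched[i].last_seen;
        watched[i].last_seen = now;
        if (loops < watched[i].min_loops) {
            all_ok = false;
        }
    }
    return all_ok;
}

/* Guardian thread body, in outline:
 *   initialize the WDT;
 *   forever: sleep one poll period;
 *            if (ga_all_threads_alive(table, count)) pet the dog;
 * If the check fails, the dog goes unpetted and the WDT reboots the system. */
```

The WDT timeout needs to be somewhat longer than the poll period, otherwise the watchdog would expire between two healthy polls.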
Now if any one thread goes out to lunch, the Guardian Angel will sense it and stop petting the dog, and the WDT will reboot the system.
You do need to be careful about which threads you monitor. Threads that run on an irregular basis (like a user command interface) will need to be monitored in other ways, if at all.
My implementation encapsulates much of the mess. When a thread is created, the thread creator can call a registration routine that gives the Guardian Angel the address of the thread's loop counter, a unique ID for the thread, and the expected minimum number of loops in the polling period. After this, the only thing the thread needs to do is increment its loop counter.
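A registration routine along these lines could be sketched as follows. Everything here is an assumption made for illustration: the names ga_register, ga_entry_t and GA_MAX_THREADS, the fixed-size table, and the hypothetical sensor thread. The sketch also assumes registration happens before the Guardian Angel starts polling; otherwise the table would need a lock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GA_MAX_THREADS 8  /* assumed table size for this sketch */

typedef struct {
    const volatile uint32_t *counter; /* address of the thread's loop counter */
    uint32_t thread_id;               /* unique ID chosen by the creator */
    uint32_t min_loops;               /* minimum loops expected per poll period */
    uint32_t last_seen;               /* counter value at the previous poll */
} ga_entry_t;

static ga_entry_t ga_table[GA_MAX_THREADS];
static size_t ga_count = 0;

/* Called by the thread creator. Returns false if the counter pointer is
 * NULL, the table is full, or the ID is already registered. */
static bool ga_register(uint32_t thread_id,
                        const volatile uint32_t *counter,
                        uint32_t min_loops)
{
    if (counter == NULL || ga_count >= GA_MAX_THREADS) {
        return false;
    }
    for (size_t i = 0; i < ga_count; i++) {
        if (ga_table[i].thread_id == thread_id) {
            return false;
        }
    }
    ga_table[ga_count] = (ga_entry_t){ counter, thread_id, min_loops, *counter };
    ga_count++;
    return true;
}

/* A monitored thread's only obligation is the increment. */
static volatile uint32_t sensor_loops;  /* hypothetical thread's counter */

static void sensor_thread_iteration(void)
{
    /* ... the thread's real work would go here ... */
    sensor_loops++;
}
```

The creator of the hypothetical sensor thread would call ga_register(1, &sensor_loops, 10) once, and the thread itself never has to know the Guardian Angel exists.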