jBPM/RHPAM — Cleaning up of orphaned timer instances

Balakrishnan
Nov 4, 2022

From an operations perspective, a large number of process instances created in a jBPM setup for testing purposes eventually have to be cleaned up completely. However, running the abort operation on a large volume of long-running process instances (of the order of tens of thousands) does not always go through cleanly and can leave orphaned timers behind.

Orphaned timers typically do not affect the jBPM engine’s functionality. However, they put unwarranted load on the kie-server and flood the log files with the trace message “ERROR [stderr] (EJB default - 1) java.lang.RuntimeException: No scheduler found for…..”, as shown below.

15:37:40,212 ERROR [stderr] (EJB default - 1) java.lang.RuntimeException: No scheduler found for Sample_2.0.0-SNAPSHOT-timerServiceId
15:37:40,212 ERROR [stderr] (EJB default - 1)  at deployment.kie-server.war//org.jbpm.persistence.timer.GlobalJpaTimerJobInstance.call(GlobalJpaTimerJobInstance.java:74)
15:37:40,213 ERROR [stderr] (EJB default - 1)  at deployment.kie-server.war//org.jbpm.persistence.timer.GlobalJpaTimerJobInstance.call(GlobalJpaTimerJobInstance.java:48)
15:37:40,213 ERROR [stderr] (EJB default - 1)  at deployment.kie-server.war//org.jbpm.services.ejb.timer.EJBTimerScheduler.executeTimerJobInstance(EJBTimerScheduler.java:124)
15:37:40,213 ERROR [stderr] (EJB default - 1)  at deployment.kie-server.war//org.jbpm.services.ejb.timer.EJBTimerScheduler.transaction(EJBTimerScheduler.java:218)
15:37:40,213 ERROR [stderr] (EJB default - 1)  at deployment.kie-server.war//org.jbpm.services.ejb.timer.EJBTimerScheduler.executeTimerJob(EJBTimerScheduler.java:113)
15:37:40,213 ERROR [stderr] (EJB default - 1)  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....
15:37:40,220 WARN [org.jbpm.services.ejb.timer.EJBTimerScheduler] (EJB default - 1) Execution of time failed Interval Trigger failed. Skipping GlobalJpaTimerJobInstance [timerServiceId=Sample_2.0.0-SNAPSHOT-timerServiceId, getJobHandle()=EjbGlobalJobHandle [uuid=10-5-1]]

Given that the default overdue wait time is 20 seconds, this trace is logged continuously for every orphaned timer, and the log files can grow very quickly (to the order of gigabytes) when a large number of orphaned timers are left behind.

Here’s a JIRA, https://issues.redhat.com/browse/JBPM-10087, that covers a similar issue around orphaned timers, but for the fix to be usable the engine has to be updated to a later release. Also, the fix primarily covers the database-backed timer data store, and it is unclear whether it also covers setups with the “file-based” (i.e. the default) timer data store.

So, what’s the way out?

Here’s a hack that can be used to clean up the orphaned timers without making any changes or upgrades to the engine.

Getting the orphaned timer references

First, the list of all such orphaned timers has to be obtained. It can be retrieved easily from the server.log file by looking for the pattern `[uuid=….]`. For example, the log shown above contains the following line, which references a JobHandle uuid.

15:37:40,220 WARN  [org.jbpm.services.ejb.timer.EJBTimerScheduler] (EJB default - 1) Execution of time failed Interval Trigger failed. Skipping GlobalJpaTimerJobInstance [timerServiceId=Sample_2.0.0-SNAPSHOT-timerServiceId, getJobHandle()=EjbGlobalJobHandle [uuid=10-5-1]]

To extract the set of all uuids from the server.log file, the following shell command should help:

$ grep -Eo "(uuid=).*(]])" server.log | sed 's/uuid=//' | sed 's/]]//' > orphaned_uuid.log && sort orphaned_uuid.log | uniq
# example output
366337-386961-2
372129-390378-2
372129-390378-3
372129-390378-4
461361-474848-2
461361-474848-3
461461-474956-2
461461-474956-3
562966-548963-2
562978-548975-2
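
Since the cleanup itself runs in Java, the same extraction could also be done there. The following is only a sketch: `OrphanedTimerUuids` and `extract` are hypothetical names, not part of jBPM, and the pattern assumes the job handle is always logged as `[uuid=...]]`, as in the WARN line above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jBPM): collects the distinct job handle
// uuids from server.log lines, mirroring the grep/sed/sort/uniq pipeline.
final class OrphanedTimerUuids {

    // Assumes the handle is logged as "[uuid=<value>]]", as in the WARN line.
    private static final Pattern UUID = Pattern.compile("\\[uuid=([^\\]]+)\\]\\]");

    static List<String> extract(Iterable<String> logLines) {
        // TreeSet removes duplicates and keeps the uuids in sorted order.
        Set<String> unique = new TreeSet<>();
        for (String line : logLines) {
            Matcher matcher = UUID.matcher(line);
            while (matcher.find()) {
                unique.add(matcher.group(1));
            }
        }
        return new ArrayList<>(unique);
    }
}
```

The method takes any `Iterable<String>`, so the lines of server.log could be passed in directly, for example from `java.nio.file.Files.readAllLines`.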

Removing the orphaned timer instances

The next step would be to get hold of jBPM’s internal EJBTimerScheduler and feed the identified timer references to it to do the clean up. The Java class given below does exactly that and this class could be included in a jBPM project as a DataObject class.

Disclaimer: Use this code at your own risk.

The code does the following:

  1. Gets the singleton EJBTimerScheduler instance from the jBPM runtime
  2. Invokes the getTimerByName method on EJBTimerScheduler, passing the uuid, to get the TimerJobInstance object
  3. Invokes the getJobHandle method on the TimerJobInstance object to get the JobHandle object
  4. Passes the JobHandle object to the removeJob method of the EJBTimerScheduler object
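
The four steps can be sketched roughly as follows. This is a minimal illustration, not the exact class used in this setup: `JobHandle`, `TimerJobInstance` and `Scheduler` are local stand-in interfaces that only mirror the method names listed above, so the snippet compiles without jBPM on the classpath. In a real project they would be replaced by the engine's own types, the scheduler instance would come from the jBPM runtime (step 1), and the exact signatures (for example the `boolean` return of `removeJob` assumed here) should be checked against the engine version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the cleanup loop; class and interface names are illustrative.
final class OrphanedTimerCleaner {

    // Stand-ins for the jBPM types named in the steps above (assumed shapes).
    interface JobHandle { }

    interface TimerJobInstance {
        JobHandle getJobHandle();
    }

    // Stands in for EJBTimerScheduler; only the two methods used here.
    interface Scheduler {
        TimerJobInstance getTimerByName(String uuid);
        boolean removeJob(JobHandle handle);
    }

    // Tries to remove every timer in the list and returns the uuids that
    // could not be found or removed, so they can be checked by hand.
    static List<String> removeAll(Scheduler scheduler, List<String> uuids) {
        List<String> notRemoved = new ArrayList<>();
        for (String uuid : uuids) {
            // Step 2: look up the timer job by its uuid.
            TimerJobInstance job = scheduler.getTimerByName(uuid);
            if (job == null) {
                notRemoved.add(uuid);
                continue;
            }
            // Step 3: get the handle; step 4: ask the scheduler to remove the job.
            JobHandle handle = job.getJobHandle();
            if (!scheduler.removeJob(handle)) {
                notRemoved.add(uuid);
            }
        }
        return notRemoved;
    }
}
```

Returning the uuids that were not removed keeps the run repeatable: the same list can be fed in again, and anything that remains points to timers that need a closer look.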

With this in place, a jBPM process with a script task that invokes the aforementioned implementation could be created (as shown below) and fed with the list of timer references (uuid) to perform the cleanup.

Happy cleaning up of orphaned timers!


Written by Balakrishnan

Principal Architect at Red Hat
