Monday, March 12, 2012

Concept : suspending triggers automatically when target system is down

With this question in mind, we came up with a concept to automatically detect when one of the target systems is down and then suspend the triggers that use its connection.
Once the system detects that the target system is back online, the triggers are enabled again and normal processing can continue.

 Why did we start this concept? 
We’ve seen that we cannot guarantee 24/7 uptime of the target systems, and we wanted a dynamic tool that takes care of these downtimes (scheduled or not), so that we don’t have to worry about failed transactions and resubmitting them. Resubmitting can be a hard task when there are thousands of failed messages.

Concept requirements:

1. Database

We need to identify the relationship between the triggers and the connections. Why? One connection can be used in different packages, and one package can have multiple triggers pointing to different target systems.
We will store this data in a database, which is easy to implement and can be cached as well.
Info required for the database:
  • Full path name of the trigger
  • Package name of the trigger
  • Type of connection (e.g. JDBC, SAP, …)
  • Full path name of the connection
  • An identifier for the target system, to keep entries unique when one trigger has multiple target systems. With it we can find all triggers that belong to one target system.
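
To make the list above concrete, here is a minimal sketch of such a mapping table, using Python's built-in sqlite3 module purely for illustration. The table name, the column names and the helper functions are assumptions made for this example, not part of any product, and the real implementation could use any database.

```python
import sqlite3

# Hypothetical schema: one row per (trigger, target system) pair.
# All table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trigger_connection (
    trigger_path    TEXT NOT NULL,  -- full path name of the trigger
    package_name    TEXT NOT NULL,  -- package name of the trigger
    connection_type TEXT NOT NULL,  -- e.g. JDBC, SAP
    connection_path TEXT NOT NULL,  -- full path name of the connection
    target_id       TEXT NOT NULL,  -- identifier of the target system
    PRIMARY KEY (trigger_path, target_id)
)
"""


def create_mapping_db(path=":memory:"):
    """Open the database and create the mapping table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_mapping(conn, trigger_path, package_name, connection_type,
                connection_path, target_id):
    """Register that a trigger depends on a connection to a target system."""
    conn.execute(
        "INSERT INTO trigger_connection VALUES (?, ?, ?, ?, ?)",
        (trigger_path, package_name, connection_type,
         connection_path, target_id),
    )
    conn.commit()


def triggers_for_target(conn, target_id):
    """Return all trigger paths that point to one target system."""
    rows = conn.execute(
        "SELECT trigger_path FROM trigger_connection "
        "WHERE target_id = ? ORDER BY trigger_path",
        (target_id,),
    )
    return [row[0] for row in rows]
```

The lookup by target identifier is the query the monitoring job needs: given a target system that is down, it returns every trigger to suspend.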
2. Monitoring Job

We also need a job that runs every X time to check whether all connections are still active and responding. We will create a dummy query per type of connection so we can see if the target system is still responding. If not, there is an issue and we should not send any data towards that IZ.
Of course we also need to build in some safety checks: what if a network blip was the reason the dummy query didn’t run successfully? So we will execute the check a second time to make sure there is definitely an issue, but we delay this second request by some time (e.g. one minute) to rule out one-time issues.
Let’s say we’ve detected a target system which is down. We now start another thread which suspends the triggers and the connection for that target system. Why disable the connection? It’s not required, but it makes monitoring easier: in a healthy environment we won’t have any disabled connections, so a disabled one stands out.
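
The check-twice logic described above could look roughly like the sketch below. It is only an outline under stated assumptions: `probe` stands for the dummy query of one connection, `suspend` for whatever call suspends the triggers and connection of a target system, and both are hypothetical callables supplied by the caller.

```python
import time


def _probe_ok(probe):
    """Run one dummy query; any exception counts as 'not responding'."""
    try:
        return bool(probe())
    except Exception:
        return False


def is_target_down(probe, retry_delay_seconds=60, sleep=time.sleep):
    """Report a target as down only if two probes fail, with a delay
    in between, so a single network blip is not treated as an outage."""
    if _probe_ok(probe):
        return False
    sleep(retry_delay_seconds)
    return not _probe_ok(probe)


def monitor_once(probes, suspend, retry_delay_seconds=60, sleep=time.sleep):
    """Check every target once and suspend the ones that are down.

    probes  -- dict mapping a target identifier to its dummy-query callable
    suspend -- hypothetical callable that suspends triggers and connection
    """
    down = []
    for target_id, probe in probes.items():
        if is_target_down(probe, retry_delay_seconds, sleep):
            suspend(target_id)
            down.append(target_id)
    return down
```

A scheduler would call `monitor_once` every X time. In this sketch the targets are checked one after another, so a retry delay on one target holds up the others; running each check in its own thread would avoid that.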

3. Recycle Job

This is the job which checks whether the target system is back online and responding. In this job we try to enable the connection again. If that doesn’t work, the target system is still not responsive, and we re-execute the job later. If the connection gets enabled, we can enable the triggers again and resume normal processing. The purpose of this job is to check every X time whether the connection is back online. Keep in mind that we will not execute this job every second, which would be overkill for the machines. Instead we can let it wait one minute and then try again.
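
A minimal sketch of this retry loop, assuming two hypothetical callables: `enable_connection` tries to enable the connection and returns a truthy value on success, and `resume_triggers` re-enables the triggers of that target system. The `max_attempts` parameter is an addition for this example so the loop can be bounded.

```python
import time


def recycle(enable_connection, resume_triggers, wait_seconds=60,
            max_attempts=None, sleep=time.sleep):
    """Keep trying to enable the connection; resume triggers on success.

    Returns the number of attempts needed, or None if max_attempts was
    reached without success. With max_attempts=None it retries forever.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            enabled = bool(enable_connection())
        except Exception:
            # An error while enabling means the target is still down.
            enabled = False
        if enabled:
            resume_triggers()
            return attempts
        # Wait before the next try instead of hammering the machines.
        sleep(wait_seconds)
    return None
```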

4. Queue Monitor

If we suspend processing of documents they will remain in the broker, so we have to monitor the broker so that it does not grow too large; in the worst case the broker can even crash.
We can use Optimize for that, or develop custom code to check the total size of the broker and alert if it reaches a predefined limit. We could even make it “smart” and release the trigger that is causing the broker to pile up. Once the broker is back to a “safe” size, we can check whether the target is back online, and if not, suspend this trigger again.
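
The “smart” behaviour described above boils down to a small decision rule. The sketch below is one possible reading of it; the function name, the returned action strings and the two thresholds (`limit` and `safe_size`) are assumptions for illustration, and how the queue size is actually measured is left out.

```python
def queue_action(queue_size, suspended, target_online, limit, safe_size):
    """Decide what to do with one trigger based on the broker queue size.

    Returns "release" to let a suspended trigger run again because the
    queue reached the limit, "suspend" to suspend it again once the queue
    is back to a safe size while the target is still offline, and "keep"
    to leave the trigger in its current state.
    """
    if suspended:
        # Queue is piling up: release the trigger to protect the broker.
        if queue_size >= limit:
            return "release"
        return "keep"
    # Trigger is running: suspend again only when the queue is safe
    # and the target system is still not back online.
    if queue_size <= safe_size and not target_online:
        return "suspend"
    return "keep"
```

Using two thresholds instead of one keeps the trigger from flipping between suspended and released on every check while the queue size hovers around the limit.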

Now with this information in mind we can develop the concept. I will keep you posted on the progress and post some code samples as well.

Author : Jeroen W.
