-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
246
Nagios script to monitor Fedora can enter a state where false reports are generated thereafter.
We need some modifications to the Nagios script for monitoring Fedora.
The problem has only happened on dev so far, but once the script fails it continues to give false positives. The script needs back-off logic wrapped around its retries of the Fedora functions.
Fedora runs asynchronously, and if you test too quickly after a purge you can wind up with an item in Fedora's index that is no longer in the low-level store. The current Nagios script hits Fedora right away and then does six quick retries. We need a back-off time multiplier for the six retries (for example, double the delay for each retry). This issue was improved in later versions of Fedora, but since there are very few transactional file stores it was not possible to fix it entirely. We can run an SQL operation to remove the purged item from the index, but it is better if we just don't test too quickly after a purge (or an ingest).
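
As a rough illustration of the proposed change, the sketch below shows one way the six retries could double their delay each time. It is not the existing Nagios script: the names `check_with_backoff` and `backoff_delays`, the one-second starting delay, and the `check` callable (standing in for whatever Fedora call the script makes) are all assumptions made for the example.

```python
import time
from typing import Callable, List


def backoff_delays(initial_delay: float, retries: int,
                   multiplier: float = 2.0) -> List[float]:
    """Return the wait before each retry, growing by `multiplier` each time."""
    return [initial_delay * multiplier ** i for i in range(retries)]


def check_with_backoff(check: Callable[[], bool],
                       retries: int = 6,
                       initial_delay: float = 1.0,
                       multiplier: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `check` once, then retry with a growing delay until it passes.

    `check` is a hypothetical placeholder for the Fedora test
    (for example, fetching an object after an ingest or purge).
    Returns True as soon as a check passes, False if all attempts fail.
    """
    # First attempt happens right away, as the current script does.
    if check():
        return True
    # Each retry waits longer than the last, giving Fedora's
    # asynchronous purge or ingest time to settle.
    for delay in backoff_delays(initial_delay, retries, multiplier):
        sleep(delay)
        if check():
            return True
    return False
```

With these assumed defaults the retries wait 1, 2, 4, 8, 16, and 32 seconds, so a check that never passes takes about a minute to report failure. The starting delay would need tuning against the Nagios check timeout.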