We have a distributed system where multiple services interact with each other, and we often run into inconsistent data states when one or more services fail. What sort of methodologies can we use to identify such inconsistencies (e.g. anomaly detection systems)?
In my experience you don't. Instead, what you do is:<p>1. Make sure your services have a retry mechanism, so that a failed call keeps being retried until it succeeds.<p>2. Have the services write the data more than once: set up a job that re-runs those writes often, so anything that was missed the first time gets written on a later pass.
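To make the two steps above concrete, here is a rough Python sketch, not a definitive implementation. The names `retry`, `reconcile`, `source`, `target` and `write` are all hypothetical, the stores are plain dicts standing in for real services, and it assumes writes are idempotent (writing the same value twice is harmless), which is what makes re-running them safe.

```python
import time


def retry(op, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call op() until it succeeds, backing off exponentially between tries.

    Re-raises the last error once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def reconcile(source, target, write):
    """Re-apply every record that is missing or different in `target`.

    Assumes `write(key, value)` is an idempotent upsert, so running this
    job as often as you like never makes things worse.
    Returns the keys that had to be repaired on this pass.
    """
    repaired = []
    for key, value in source.items():
        if target.get(key) != value:
            # Bind key/value as defaults so the callable is self-contained.
            retry(lambda k=key, v=value: write(k, v))
            repaired.append(key)
    return repaired
```

You would call `reconcile` from whatever scheduler you already have (cron, a worker loop, etc.). If the list it returns is non-empty, that pass found and fixed a missed write, so logging it also gives you a cheap signal of how often things go wrong.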