Hi all,
We have started developing a project using a service-oriented architecture (SOA). We've created a REST service layer that exposes the business operations our clients call. These service methods handle transaction management, so we don't need to worry about whether each unit of work completes properly.
However, we couldn't figure out how to handle network problems during HTTP requests. What happens if a service call executes the whole unit of work, but the response is lost on the network because of a timeout? Is there a pattern for handling these network-layer errors? Do we need to call a control method after the timeout to find out whether the request actually completed on the service layer? If so, do we need to write a control method for every service method we write?
All your ideas are much appreciated; thanks for your time.
I recently had to deal with a similar issue: a mobile app that had to connect to an API, but the phone had no internet access over 80% of the time, and once access became available, the app needed to sync.
I built a queue on top of localStorage. Just like in Kafka, one key is simply a pointer to where you are in the queue, and every other key holds queue data, numbered by its position.
So a queue would look something like this (localStorage stores key-value pairs):
pos=387
q387 => {jsondata}
q388 => {jsondata}
q389 => {jsondata}
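A minimal Python sketch of the layout above, assuming a plain dict stands in for localStorage (which only stores strings, hence the `json.dumps` and the string-valued pointer). The class name `LocalQueue` and the way the next free slot is found (scanning for the highest `qN` key) are illustrative choices, not part of the original design; a real implementation might keep a separate tail pointer instead.

```python
import json
from typing import Dict


class LocalQueue:
    """Hypothetical append-only queue kept in a flat key-value store.

    A dict stands in for localStorage here: "pos" is the head pointer and
    every "qN" key holds one JSON-encoded command at position N.
    """

    def __init__(self, store: Dict[str, str]):
        self.store = store
        self.store.setdefault("pos", "0")  # localStorage values are strings

    def _next_slot(self) -> int:
        # Assumption: find the tail by scanning for the highest qN key.
        slots = [int(k[1:]) for k in self.store
                 if k.startswith("q") and k[1:].isdigit()]
        return max(slots) + 1 if slots else int(self.store["pos"])

    def enqueue(self, command: dict) -> int:
        """Store the command at the next free position and return it."""
        slot = self._next_slot()
        self.store[f"q{slot}"] = json.dumps(command)
        return slot
```

For example, with `{"pos": "387"}` as the store, the first two `enqueue` calls land in `q387` and `q388`, and `pos` stays at `"387"` until the sync step advances it.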
As soon as an internet connection is established, I check pos, which returns 387. I look up q387, sync that command with the server, increment pos to 388 and delete q387. Because each entity created in the UI includes a GUID and a date, I can tell whether a create has already been synced; if it has, I simply skip it and move on to the next one.
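The sync step can be sketched the same way. This is a hypothetical helper, assuming a dict in place of localStorage, that each queued command carries a `guid` field, and that `already_synced` and `send` are callables you supply: one asks the server whether it has seen a GUID, the other pushes a command. If either raises (say, the connection drops again), the pointer has not moved, so the same entry is retried on the next attempt.

```python
import json
from typing import Callable, Dict


def drain_queue(store: Dict[str, str],
                already_synced: Callable[[str], bool],
                send: Callable[[dict], None]) -> int:
    """Replay queued commands in order, starting at "pos".

    Returns the number of commands actually sent. Commands whose GUID the
    server already knows (e.g. the earlier response was lost) are skipped.
    """
    sent = 0
    pos = int(store["pos"])
    while f"q{pos}" in store:
        command = json.loads(store[f"q{pos}"])
        if not already_synced(command["guid"]):
            send(command)  # an exception here leaves pos unchanged
            sent += 1
        store["pos"] = str(pos + 1)  # advance the pointer...
        del store[f"q{pos}"]         # ...then delete the processed entry
        pos += 1
    return sent
```

Because a skipped entry is treated exactly like a successfully sent one, a response lost to a timeout costs one extra "have you seen this GUID?" check rather than a duplicate create.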
If you create an entity while no internet connection is available, keep the create command in the queue. If you then update that entity, you can merge the update (and any subsequent updates) into the queued create until the create can be synced, but this is the more complicated route to go.
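A sketch of that merge step, assuming the pending (not yet synced) commands are held as a list of dicts with hypothetical `type`, `guid` and `data` fields: an update to an entity whose create is still queued is folded into that create; otherwise it is queued as a separate update.

```python
from typing import Dict, List


def enqueue_update(queue: List[Dict], guid: str, changes: Dict) -> None:
    """Fold an update into a pending create for the same entity, if any."""
    for cmd in queue:
        if cmd["type"] == "create" and cmd["guid"] == guid:
            # The server has never seen this entity, so one create carrying
            # the latest field values is enough.
            cmd["data"].update(changes)
            return
    # No unsynced create for this entity: queue a normal update.
    queue.append({"type": "update", "guid": guid, "data": dict(changes)})
```

The trade-off the answer mentions shows up here: the queue is no longer strictly append-only, so entries must be rewritten in place (with localStorage, that means re-serializing the matching `qN` value).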
If the create was synced and the server returned a version number, and somebody else then updates that entity before you update it again, your version will differ from the server's. That is a conflict, and a dialog should pop up letting you merge your changes with the changes someone else made to the entity (similar to git: accept theirs, accept mine, or merge line by line).
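The version check that triggers the conflict dialog is ordinary optimistic concurrency. A minimal server-side sketch, assuming entities are kept in a dict keyed by GUID with an integer `version` field (`apply_update` and `VersionConflict` are hypothetical names): the client sends back the version it last saw, and a mismatch returns the server's current copy so the UI can offer the accept-theirs/accept-mine/merge choice.

```python
from typing import Dict


class VersionConflict(Exception):
    """Raised when the client's base version is stale."""

    def __init__(self, server_copy: Dict):
        super().__init__("entity was changed by someone else")
        self.server_copy = server_copy


def apply_update(entities: Dict[str, Dict], guid: str,
                 base_version: int, changes: Dict) -> int:
    """Apply changes only if the client saw the latest version."""
    current = entities[guid]
    if current["version"] != base_version:
        # Someone else saved first: return their copy for the merge dialog.
        raise VersionConflict(dict(current))
    current.update(changes)
    current["version"] = base_version + 1
    return current["version"]
```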
We have a similar requirement for mobile applications that we deploy in the field. Users are frequently in areas with bad connectivity, so we had to think through this very problem. Here's our solution:
When the user submits data, remove any option to do anything else until the app receives a response - I know this sounds like a no-brainer, but it's one of those things that can get glossed over. When we need high data integrity, we sometimes have to take control away from the user for a bit. This can take the form of a loading message that covers the screen (this is what we do), disabled buttons, etc.
Immediately present the user with an error that blocks further action - This ensures that they are aware of the error and lets you, as the developer, control the flow of the conversation in that event.
On error, run a fast follow-up check on the submission - Whenever our app detects an error, it immediately calls a follow-up process that checks whether the submitted data actually reached the server. The results are communicated to the user. If this check fails, we write the data to a local queue for future processing.
Write to a local queue for processing - When all else fails, we write the data to a local queue in an encrypted store on the device to await future processing. The app periodically checks in with the APIs for availability and resubmits (or performs the checks) as soon as connectivity is good enough. Users can see and manage the queue, and they receive notifications once records have been confirmed as submitted.
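The error path in the steps above can be sketched as follows. This assumes hypothetical `submit` and `check_submitted` callables for the two API calls, that network failures surface as `OSError` (which covers Python's built-in `TimeoutError` and `ConnectionError`), and that each record carries an `id` the server can be asked about. The UI blocking, user messaging and encryption of the local store are left out.

```python
from typing import Callable, Dict, List


def submit_with_fallback(record: Dict,
                         submit: Callable[[Dict], None],
                         check_submitted: Callable[[str], bool],
                         local_queue: List[Dict]) -> str:
    """Submit a record; on a network error, verify, then queue locally."""
    try:
        submit(record)
        return "submitted"
    except OSError:
        pass  # timeout or dropped connection: outcome unknown
    try:
        # The request may have succeeded even though the response was lost.
        if check_submitted(record["id"]):
            return "confirmed"
    except OSError:
        pass  # the check itself could not reach the server
    # Not confirmed either way: keep it for a later retry or check.
    local_queue.append(record)
    return "queued"
```

Queuing also when the check succeeds but reports "not found" is a design choice of this sketch: the periodic retry then resubmits it, and since the check runs first, a duplicate is only possible if the server's answer is itself stale.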
Your process is probably a bit different, but those are some of the steps we've taken to mitigate network errors.