Tuesday, April 18, 2006

Distributed Atomic Transactions and Heuristic Outcomes

I've started working on my master's thesis. For those of you who don't know, it is going to be about "Distributed Transactions in SOA". Pretty challenging and very interesting! There is still not much information about this, although the topic of distributed transactions in general has been around for about 20 years now. So there is substantial research behind it, but what is missing right now is the "glue" that will bring that knowledge to the SOA world.

I am planning to implement two different concurrency protocols - WS-AtomicTransaction and WS-BusinessActivity. WS-Coordination is the coordination protocol used by both of them, so I have to implement it as a prerequisite. These protocols are still a work in progress by the OASIS Web Services Transactions group (link).

Starting with WS-AtomicTransaction, I ran into a problem which turned out to be well known in the field of distributed transactions and concurrency control. It is called "heuristic outcomes" and it concerns the two-phase commit (2PC) protocol. Let's assume that after the first phase (the preparation phase), all participants reply with a PREPARE/COMMIT message. Accordingly, the coordinator decides to commit the whole transaction and starts sending a COMMIT message to each of the participants. But suddenly, after it has sent the COMMIT message to 3 of the participants, an unexpected network outage occurs. So the remaining, let's say, 2 participants are waiting for their COMMIT message, but it doesn't arrive. To summarize - we have 3 participants which have already committed the transaction and another 2 which are wondering what to do (this is where the term "heuristic outcomes" comes from, because the final decision these 2 will make is somewhat heuristic). So far I haven't seen a resolution for this issue. Instead, it is assumed that the logging system will be sophisticated enough for the administrator of your system to notice the inconsistency and manually "compensate" the committed changes of the 3 participants, thus bringing the whole system back to a consistent state.
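To make the scenario concrete, here is a minimal Python sketch of it. This is a toy in-memory simulation, not WS-AtomicTransaction code: the names `Participant`, `Coordinator` and the `fail_after` parameter are all made up for illustration, and the "network outage" is simply modelled as the coordinator no longer being able to deliver messages after a given number of sends.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class Participant:
    name: str
    state: State = State.ACTIVE

    def prepare(self) -> bool:
        # Phase 1: vote to commit and wait for the coordinator's decision.
        self.state = State.PREPARED
        return True

    def commit(self) -> None:
        self.state = State.COMMITTED

    def abort(self) -> None:
        self.state = State.ABORTED


@dataclass
class Coordinator:
    participants: List[Participant]
    # Hypothetical knob: number of COMMIT messages delivered before the
    # simulated network outage. None means no outage.
    fail_after: Optional[int] = None
    log: List[str] = field(default_factory=list)

    def run(self) -> List[Participant]:
        """Run 2PC and return the participants left in doubt."""
        # Phase 1: collect votes from every participant.
        votes = [p.prepare() for p in self.participants]
        if not all(votes):
            for p in self.participants:
                p.abort()
            self.log.append("decision: ABORT")
            return []

        # Phase 2: the decision is logged, then sent out one by one.
        self.log.append("decision: COMMIT")
        in_doubt = []
        for sent, p in enumerate(self.participants):
            if self.fail_after is not None and sent >= self.fail_after:
                self.log.append(f"COMMIT to {p.name} not delivered")
                in_doubt.append(p)  # still PREPARED, waiting for a decision
            else:
                p.commit()
        return in_doubt


participants = [Participant(f"P{i}") for i in range(1, 6)]
coordinator = Coordinator(participants, fail_after=3)
stuck = coordinator.run()
for p in participants:
    print(p.name, p.state.value)
```

With 5 participants and `fail_after=3`, the first three end up committed while the last two stay prepared, which is exactly the inconsistent state the administrator would have to spot in the log.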

3 comments:

Anonymous said...

Possible solution for automatic resolution of the "Heuristic Outcomes" problem: during the COMMIT phase, each participant sends a message to the others confirming that it has successfully received the COMMIT message.
In the provided example, if a participant receives notification messages from the other 4 participants, then the transaction is committed on the current node. If the current node does not receive 4 notifications from the other participants, then the transaction is aborted after some period of time. The disadvantage of this solution is network latency: if one of the participants receives a notification from the other participants too late, only that node will reject the transaction.
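A rough Python sketch of the rule proposed in this comment may help in following the discussion. It is only an illustration under assumptions: the function `peer_ack_outcomes` and its parameters are hypothetical, the timeout is not modelled in real time, and a "late" notification is simply treated as one that did not arrive before the timeout.

```python
def peer_ack_outcomes(participants, got_commit, late_acks=frozenset()):
    """Decide commit/abort per node under the peer-notification rule.

    participants: list of node names.
    got_commit:   set of nodes that received COMMIT from the coordinator.
    late_acks:    set of (sender, receiver) pairs whose notification
                  arrived only after the receiver's timeout expired.
    """
    needed = len(participants) - 1  # one notification from every other node
    outcomes = {}
    for node in participants:
        # Notifications this node received in time from peers that got COMMIT.
        on_time = [
            peer
            for peer in participants
            if peer != node
            and peer in got_commit
            and (peer, node) not in late_acks
        ]
        if node in got_commit and len(on_time) == needed:
            outcomes[node] = "commit"
        else:
            outcomes[node] = "abort"  # gave up after the timeout
    return outcomes


nodes = ["P1", "P2", "P3", "P4", "P5"]

# Outage after 3 COMMIT messages: nobody collects 4 notifications.
print(peer_ack_outcomes(nodes, got_commit={"P1", "P2", "P3"}))

# Everybody got COMMIT, but one notification to P5 arrived too late.
print(peer_ack_outcomes(nodes, got_commit=set(nodes), late_acks={("P1", "P5")}))
```

The second call shows the latency drawback mentioned above: a single late notification makes P5 abort while the other four nodes commit.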

givanov said...

Hi Stephan,

The problem with your solution is that no participant knows about the others - that's why you have a coordinator in the two phase commit protocol (2PC).

However, one possible solution to this issue (but still not a complete solution) is a three-phase commit protocol. As the name suggests, after the second (COMMIT) phase, the coordinator enters a third phase, in which it sends an ISCOMMITTED (or ISABORTED, respectively) notification to all of the participants, so that any of them that have not committed yet are now able to reach the desired result. Now, this is still not a "one shot" solution to the "heuristic outcomes" problem, because what happens if during the third phase some of the transaction participants still don't receive the ISCOMMITTED message? Would we end up with a four-phase commit protocol? :-)
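The extra notification round described in this comment can be sketched in a few lines of Python. This is an assumption-laden toy: `notify_round` and the `reachable` set are hypothetical names, and reachability stands in for whether the ISCOMMITTED message gets through.

```python
def notify_round(states, reachable):
    """Send ISCOMMITTED to every participant in one extra round.

    states:    dict mapping participant name to "prepared" or "committed".
    reachable: set of participants the coordinator can currently reach.
    Returns the names that are still not committed after this round.
    """
    for name in states:
        if name in reachable and states[name] == "prepared":
            states[name] = "committed"  # late participant catches up
    return [name for name, state in states.items() if state != "committed"]


# State after the interrupted second phase: P4 and P5 never got COMMIT.
states = {"P1": "committed", "P2": "committed", "P3": "committed",
          "P4": "prepared", "P5": "prepared"}

# Third phase: the network is partly back, but P5 is still unreachable.
print(notify_round(states, reachable={"P1", "P2", "P3", "P4"}))

# A further round is needed once P5 becomes reachable again.
print(notify_round(states, reachable={"P1", "P2", "P3", "P4", "P5"}))
```

Each extra round only helps the participants that happen to be reachable at that moment, so no fixed number of rounds is guaranteed to be enough.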

However, thanks for posting and sharing your opinion.

Martin Kulov said...

It will be very interesting work. I hope you can work it out.