A typical wait-free/lock-free doubly-linked list (or deque) uses a DCAS or CAS2 (double compare-and-swap) primitive.
In general, just working with atomic collections opens up some major challenges... Take a look at existing implementations, the ABA problem, hazard pointers, memory barriers, and discussion threads covering CAS, CMPXCHG, DCAS, CAS2, etc. Because DCAS/CAS2 is not supported on every embedded architecture, it would be nice to have an "atomic API" that lets the collection implementation fall back to plain CAS.
Is this possible?
Libraries that currently employ atomics layer the architecture-specific assembly so that all of it is implemented behind a few C routines, which can then be swapped out for different architectures. We've been working towards a non-traditional atomic API that lets intrinsics like DCAS fall back to CAS without changing the data structure implementation. For now, though, I'll just focus on why atomic collections are powerful.
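One possible shape for such a layer (the names below are illustrative, not a standard API): C11 `<stdatomic.h>` already hides the per-architecture assembly behind a portable compare-and-swap, so a collection written against a single-word `cas_ptr()` alone can run even where DCAS/CAS2 is unavailable.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Single-word CAS on a pointer slot: atomically swap *slot from 'expected'
 * to 'desired'. Returns true on success; on failure *slot held some other
 * value. The compiler lowers this to the target's native CAS instruction
 * (CMPXCHG, LDREX/STREX, etc.), which is the portability point being made. */
static bool cas_ptr(void *_Atomic *slot, void *expected, void *desired)
{
    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```

A DCAS-capable target could add a second routine beside this one, while CAS-only targets would emulate it or use a CAS-only algorithm variant.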
A general API for a doubly linked-list in our framework would accept an arbitrary "element type". The C API might look something like this:
- void List_Create(list_t* pList);
- void List_AddHead(list_t* pList, elem_t* pElem);
- void List_AddTail(list_t* pList, elem_t* pElem);
- void List_AddAfter(list_t* pList, elem_t* pAfterElem, elem_t* pElem);
- elem_t* List_RemoveHead(list_t* pList);
- elem_t* List_RemoveTail(list_t* pList);
- void List_Remove(list_t* pList, elem_t* pElem);
- void List_Destroy(list_t* pList);
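To make the calling conventions concrete, here is a minimal, single-threaded sketch of part of this interface. The element embeds its own links (an intrusive list), so the list code never allocates; the type layouts and field names are assumptions, not the framework's actual definitions.

```c
#include <stddef.h>

typedef struct elem { struct elem *prev, *next; } elem_t;
typedef struct { elem_t *head, *tail; } list_t;

void List_Create(list_t *pList)
{
    pList->head = pList->tail = NULL;
}

void List_AddTail(list_t *pList, elem_t *pElem)
{
    pElem->next = NULL;
    pElem->prev = pList->tail;
    if (pList->tail) pList->tail->next = pElem;   /* append after old tail */
    else             pList->head = pElem;         /* list was empty */
    pList->tail = pElem;
}
```

Because `elem_t` would be embedded in the caller's own element type, the same routines serve any payload without per-type code.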
So, in a typical implementation, if the main execution context has started List_RemoveHead but can be interrupted by an interrupt handler that calls List_AddTail, then we would want to protect the shared list by either:
1. disabling interrupts around the critical sections of the Add/Remove code
2. using some sort of primitive lock or mutex to similarly protect these sections
...otherwise, the list pointers would get corrupted by arbitrary pre-emption of the main execution context.
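Option 1 looks something like the sketch below (all names are ours). The entire unlink sits inside a mask/restore pair, so even unrelated interrupts stall while the list is edited; here `irq_save`/`irq_restore` merely *simulate* masking with a flag so the pattern can run anywhere, whereas a real port would touch the CPU's interrupt state.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct elem { struct elem *prev, *next; } elem_t;
typedef struct { elem_t *head, *tail; } list_t;

static int g_irq_masked;   /* stand-in for the CPU's interrupt-enable state */

static uint32_t irq_save(void)
{
    uint32_t old = (uint32_t)g_irq_masked;  /* remember prior state */
    g_irq_masked = 1;                       /* "mask" all interrupts */
    return old;
}

static void irq_restore(uint32_t state)
{
    g_irq_masked = (int)state;              /* restore as we found it */
}

elem_t *List_RemoveHead(list_t *pList)
{
    uint32_t state = irq_save();            /* nothing can preempt us now */
    elem_t *e = pList->head;
    if (e) {
        pList->head = e->next;
        if (pList->head) pList->head->prev = NULL;
        else             pList->tail = NULL;
    }
    irq_restore(state);
    return e;
}
```

The cost is global: every interrupt on the system waits for this unlink, not just the ones that touch the list.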
Both of the above solutions are really undesirable from a performance perspective. As old as the "disable-all-interrupts" technique is, we *still* see it used quite frequently in client libraries, kernels, and sample software from various embedded vendors. It threatens the system's ability to service interrupts in a timely manner; for a specific real-time application it can be tuned until it doesn't matter, but it is certainly not a safe, generalizable solution.
The use of mutex locks typically incurs a priority inversion for the interrupt if it attempts List_AddTail while a lower-priority main context already holds the lock. This may be "tunable" and non-critical on a single-microprocessor system, but as the number of cores increases, these design choices impact performance significantly.
One last performance point: the trivial approach of locking the entire list_t structure for access to any element is common (it is how STL containers are typically made thread-safe). This means we cannot exploit concurrency even when two operations touch elements that share no pointers; we cumbersomely lock the entire structure in every case.
When we introduce atomic algorithms, we resolve many of these issues:
- We NEVER disable interrupts
- Locks are NOT held on the structure, so..
- We NEVER priority invert or force a context-switch to modify the data structure
- Although not perfect, we can bound the worst-case (wait-free) time it will take an interrupt to execute its List_AddTail call, thereby enabling hard real-time constraints on the software, thus..
- The highest priority interrupt accessing the common structure always "wins" (gains access to the data structure without a traditional "lock"), and..
- Multiple cores (or threads) can modify different parts of the data structure concurrently when there are no common pointers between the elements involved
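The simplest structure that exhibits these properties is a CAS-based (Treiber-style) stack, shown below as a sketch. It is singly linked and lock-free rather than wait-free, so it is not the doubly-linked list discussed above, but it shows the retry-loop pattern that replaces locks; ABA protection (hazard pointers, etc.) is deliberately omitted, as the surrounding text notes those are real problems in production code.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *next; } node_t;
typedef struct { node_t *_Atomic top; } lf_stack_t;

void Stack_Push(lf_stack_t *s, node_t *n)
{
    node_t *old = atomic_load(&s->top);
    do {
        n->next = old;                  /* link to the head we observed */
    } while (!atomic_compare_exchange_weak(&s->top, &old, n));
    /* On failure, 'old' is refreshed with the current top and we retry;
     * no interrupt is ever disabled and no lock is ever held. */
}

node_t *Stack_Pop(lf_stack_t *s)        /* NOTE: ABA-unsafe if nodes are freed */
{
    node_t *old = atomic_load(&s->top);
    while (old && !atomic_compare_exchange_weak(&s->top, &old, old->next))
        ;                               /* 'old' is refreshed on failure */
    return old;                         /* NULL when the stack was empty */
}
```

A preempting context that calls Stack_Push mid-pop simply causes one extra retry of the CAS loop instead of corruption or blocking.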