Wednesday, May 26, 2010

Atomics and collections in C for embedded systems

I have a lot catching up on many technical topics here at Fuel7...

We have a set of software which provides collections in C.  I found this FOSS software which is kind of similar to what we did: http://code.google.com/p/c-generic-library/

There are a number of notable differences in our implementations:
  • We have implemented some of the more complex data structures like left-leaning RB trees, etc.
  • We have a more generalized concept of "elements" with a collection.  Element comparison, hashing, keying, destruction, etc. is handled through collection callback functions.  In the case of search on a tuple, an element structure contains both the key and the value.
  • Our model does not rely on heap allocation/de-allocation.  Most implementations use a "static memory model."  This is user-specified so the elements can be dynamically or statically allocated.
  • "Elements" can efficiently be moved from one collection to another.
  • We have generalized iterators.
  • The collections support thread concurrency.

I think we've solved the iteration w/ concurrency in a unique way as well.  When using an iterator on a collection, the iterator is thread-safe on the element it's cursor points to.  This means there is a callback for "destruction" or destroying an element.  All other software follows a reference count/release policy.

The goals for this software have already been met.  These collections provide the foundation for a resource constrained, high performance ANSI C framework that we use to bootstrap work for our clients.  I have not yet decided to open source the entire library, but it's always on the back of my mind.

The topic I'm most interested in is thread concurrency.  One of the most critical limiting factors in performance scalability w/ respect to multi-core is the use of thread locking.  Concurrency w/ synchronization kills performance. (interesting & relevant article from Nvidia earlier this month!) The main idea behind using atomics is that they enable non-blocking synchronization.  We avoid all kinds of locking mechanisms at a per-element level meaning:
  1. there is improved performance because priority inversion does not occur
  2. concurrent (other-core) access to a different element in the same collection is uninhibited

http://www.audiomulch.com/~rossb/code/lockfree/ has a pretty good set of open and commercial solutions on the market.  Some of the challenges here are problems that have been solved well by other people,we need to take a breadth-first approach to understand what techniques are currently employed.  There are some higher-level libraries like boost & tbb that put these techniques to good use, but they are tied to certain data structures (Ex: heap) the C++ language semantics with is not as portable as C, or are tied to a specific processor architecture.  We've been working on atomic implementations on the higher level data structures for some time and are closing in on a generalizable solution that can be integrated into our software framework and fall back to a "thread model" when the intrinsics are not available.

This has enabled lock-free scheduling, algorithm partitioning, task dispatching, high-performance network servers, concurrent heap allocation, among other use cases at Fuel7.

No comments: