Pages

Wednesday, November 24, 2010

Polymorphic Data Structures Using C macros

Have you seen kernel source files, especially header files or any device driver implementation? They all make a heavy use of C macros. I always wondered why would anyone use C macros so much. Macros are not debug-friendly; you cannot set a breakpoint inside a multiline macros. I used to be afraid of macros until i realized during my kernel project that how benign macros could be.
For the record, C macros can be used for
  • defining "constants"
  • true inline functions with no type safety
  • enhancing readability of source code
  • writing platform independent code with help of conditional compilation
But there is more to it that can take advantage of the fact that C preprocessor runs before the compiler.
One of the more frustrating aspects of non-Object-Oriented languages like C is the apparent lack of re-usable code. For example, a primitive data manipulation structure (like a tree, queue, or list) that has been implemented in one structure cannot manipulate another structure without some workarounds. There are mainly two frequently used workarounds -
  • Copy &Paste - This means we copy the algorithm for one structure and by changing name create another data manipulation structure. Though it sounds time efficient method, maintaining different copies of same algorithm is pain and moreover if there exists any bug in algorithm, it's propagated to all copies.
  • Generalized data structures - These structures which accept only void * for data are frequently used but they come with a price too. The use of void * weakens the limited type safety of C language and allows for bugs to creep into the code. It also suffers from ppor performance because the system has to do twice as many memory references for both data and structural links.
None of the above solutions are as promising  as true object-oriented programming. But there is a way to achieve both efficiency and time safety by using making use of macros! By leveraging C preprocessor's text manipulation one can construct the C equivalent of a polymorphic data structure. Using this technique we can construct a reliable toolbox of primitive data structures that can be used in any C program, since they take advantage of C syntax and not the specifics of any particular implementation.

C preprocessor only does text substitution and does not care of the meaning of the strings. Let's take an example.
#define QUEUE(Q_TYPE, DATA_TYPE) \
struct Q_TYPE{ \
struct Q_TYPE *left; \
struct Q_TYPE*right; \
DATA_TYPE data; \
}
QUEUE(queue_t, int)  myQueue;
The above code generates a unique structure where only allowed data type is int. Hence, we get full advantage of C type safety and we can create structures for different data types on the fly. Sounds familiar? C++ templates? Then, we are on the same page.

The main power comes from the ability to generate algorithmic code which can work on these polymorphic data structure again using macros. Let's see some code -
I define a macro Q_NEW_HEAD(Q_HEAD_TYPE, Q_ELEM_TYPE) which generates a new structure of type Q_HEAD_TYPE representing the head of a queue of elements of type Q_ELEM_TYPE.

#define Q_NEW_HEAD(Q_HEAD_TYPE, Q_ELEM_TYPE) \
 typedef struct { \
  Q_ELEM_TYPE *front; \
  Q_ELEM_TYPE *tail; \
  int size;  \
 }Q_HEAD_TYPE
But I need a flexibility in choosing the link name for my structure. Hence, I define one more macro

#define Q_NEW_LINK(Q_ELEM_TYPE) struct Q_ELEM_TYPE*

This is how I can use these macros in my C program -

Q_NEW_HEAD(int_list_head_t, int_list_t);

typedef struct int_list_t{
  Q_NEW_LINK(int_list_t) myLink;
  int data;
} int_list_t;

Iterating over a linked list is the most common code, here is a syntactic sugar in the form of macros -

#define Q_FOREACH(CURRENT_ELEM,Q_HEAD,LINK_NAME)   \
 for (CURRENT_ELEM = (Q_HEAD)->front; CURRENT_ELEM; CURRENT_ELEM = CURRENT_ELEM->LINK_NAME)

Similarly we can define macros for
#define Q_INSERT_TAIL(Q_HEAD, Q_ELEM, LINK_NAME)
#define Q_INSERT_FRONT(Q_HEAD, Q_ELEM, LINK_NAME)
#define Q_INSERT_AFTER(Q_HEAD,Q_ELEM_TYPE,Q_INQ,Q_TOINSERT,LINK_NAME)
#define Q_REMOVE(Q_HEAD,Q_ELEM,LINK_NAME)
and so on ...
The program becomes much more readable and manageable. 

But, make sure to test all macro definitions for all corner cases, It's tricky to write and debug such long complicated macros. Here are some common pitfalls - well documented. Looking at the generated code may help sometimes - 

$ gcc -E my_macro_loaded_source.c