Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
CDSA – A library of generic data structures and algorithms in ANSI C (github.com/michaeljwelsh)
95 points by Mike_TriHard_Cx on Jan 14, 2018 | hide | past | favorite | 14 comments


A word has been dropped from the title. The title at the site is: "A library of generic intrusive data structures and algorithms in ANSI C" (emphasis added).

For those, like me, who did not know what makes a data structure "intrusive", it means that the things needed for the data structure are stored together with the data.

For example, suppose our application needs to keep track of people. As far as we are concerned, a person looks like this:

  struct person
  {
    int age;
    const char * name;
  };
If we want to make a linked list of people, and do so by adding a link field to our person struct, like this:

  struct person
  {
    person * next_person;
    int age;
    const char * name;
  };
and then make our list by linking our person structures through the next_person field, that would be an intrusive linked list.

A non-intrusive linked list would be to do something like this:

  struct list_element
  {
    list_element * next;
    void * data;
  };
and then make our linked list by making a linked list of list_element structures, with their data pointers pointing to our person structures. If we were using C++, another way to do a non-intrusive linked list of people would be to use std::list<person>.


Yes, that word is extremely important. With that, intrusive data-structures really are great in a lot of cases, it's unfortunate that most languages can't really handle them. That said, for languages with a GC they aren't really as useful from a programming standpoint (They may still be useful from a speed standpoint). Still, they allow you avoid tons of small allocations, as well as avoid the extra 'hop' necessary when you store the node information separate form the data.

Another great part about intrusive data structures is that you can place an object on multiple structures without having to allocate any separate data for them. So to give an example, `std::list<person>` will actually create an 'intrusive' list - it will store the node information along with `person`. However, that `person` is completely tied to the list. If you want to take a `person` off of that list and place it into another one, you can't, you would have to either copy it into a node on the other list, or make the all the lists `std::list<person *>` and then store a pointer to the same person in both lists. With intrusive structures, you can instead use the same set of node information for every list, so you can detach a node from the first list, and then attach it to the second one without having to do any allocations.

As a note, the Linux Kernel makes fairly extensive use of intrusive data-structures, which is an area they particularly shine in since you really want to avoid all those small allocations if you can.


Boost, too, provides a number of intrusive data-structures, if you're in C++ land.

http://www.boost.org/doc/libs/1_66_0/doc/html/intrusive.html

There's also intrusive_ptr, an intrusive reference-counted smart-pointer class template.

http://www.boost.org/doc/libs/1_66_0/libs/smart_ptr/doc/html...


i highly appreciate you explaining what was meant by intrusive as I was wondering what it meant and i was pretty sure I wasn't going to get good results asking Google considering how intrusive tends to map to things of a more illicit nature c:


Neat trick using “list_for_each” to shorten a verbose for loop that’s nearly always the same. Although it did make me do a double take. And since it’s hiding control flow, it would make ‘break’ and ‘continue’ look out of place. Is this a common convention and you just kind of get used to it?


Hidden control flow is why I decided to learn Zig [1]. I was learning Rust, and I intend to keep at it, but I have chosen Zig to be my C replacment because I am personally finding it easier to be more productive. Andrew Kelley's doing a great job with the language!

  [1] - ziglang.org


The list is very, very, VERY similar to linux's kernel list.h -- which is a little masterpiece by itself...


Wouldn't it be preferable to write only a forward declaration of the struct in the header files, and define the structs themselves at the .c files? In other words, wouldn't it be better to define them as Abstract Data Types (ADT) ?

That way, users of the library would have no direct access to the fields of the struct (e.g. hashtable->num_buckets;), they'd have to use a function declared at the correspondent header file. (e.g. get_num_buckets(hashtable);).

Direct access to member fields could break the data structure because data structures always needs to follow certain rules in insertion and deletion.


That's not possible - the compiler needs to know the size so that it can allocate space for the structure.

You can get away with forward declarations of things if you only ever need a pointer to the struct, because the compiler knows how big a pointer is and only needs to know the size of the structure when you dereference it.

You also need to know the size when you allocate it - whether you're using malloc, or allocating statically. As you want to leave control of allocation up to the caller, then the caller needs to know the size.

Theoretically I suppose you could allocate through a macro that used an explicit size number in the header, but going to those lengths to take the struct out of the header seems a bit extreme.


And, as always, check out Phong Vo's CDT: https://www.usenix.net/legacy/publications/library/proceedin...


This is cool.. I remember last I did something in C I needed a hashtable and had to write my own.

The sad part is that the more fancy header functions and tricks that gets added to a library like this, the closer you are to reinventing C++ :)



i applaud the effort...but it looks like someone posted a couple of their homework assignments to github. Not exactly comprehensive or groundbreaking.


Huge collection of data structure problems Mostly in C++ -

http://www.techiedelight.com/list-of-problems/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: