Thursday, November 22, 2007

What Every Programmer Should Know About Memory (PDF)

I just came across this PDF on reddit.com titled "What every programmer should know about memory". Its written by Ulrich Drepper from RedHat, you should know who he is.

Link to PDF

It's going to take me awhile to get through this (its 114 pages long) - but so far its a decent read. I'm currently cheating and searching through it for things that interest me. I'm currently taking in section 7.3 'Measuring Memory Usage'. This section is particularly interesting to me because I've been toying with a project of mine lately that collects massive amounts of data. Searching and sorting that data efficiently has not been easy.

Ulrich states in the PDF that using libc's malloc to store a linked list you populate for later retrieval and use is probably a bad idea. This is true, because theres no guarantee malloc will return memory that is close or even near to the next member in the linked list. There are alternatives to using the traditional libc malloc library such as obstack and Google's TCMalloc.

There's lots of other good stuff in his paper, take a look for yourself.

2 comments:

Travis said...

looks like Ulrich has lots of good topics on his site, i'll read more when i get a chance. thanks for the link Chris, wouldn't have found it otherwise.

Travis

http://travisaltman.com

PierreG said...

To cache a large number of small files and lists in memory, I tried several scenarios and the most efficient way was also the simplest: one big malloc (with the pre-computed required size for all the items) and then pointers to structs for definitions with pointers on data (making sorting faster).

Just my one-cent contribution...