• 29 Nov 2009 /  Code, Technology

    I’ve had a couple discussions lately discussing the difference in speed between accessing data from memory and disk. In most of these conversations people were surprised to know that there is something faster than Main Memory that the programmer can take advantage of .

    First, let’s consider the memory hierarchy. If you do a google image search for memory hierarchy, you’ll see a few different (and very insightful) interpretations. In it’s simplest, it looks like this.

    A version of the memory hierachy. L0 = Level 0, L1 = Level 2, and so on. It should be noted that some high-end systems have a third SRAM Cache, which becomes L3. This causes L3 (Main Memory) to become L4 and L4 to become L5.

    A version of the memory hierarchy. L0 = Level 0, L1 = Level 2, and so on. It should be noted that some high-end systems have a third SRAM Cache, which becomes L3. This causes L3 (Main Memory) to become L4 and L4 to become L5. Also, please also keep in mine that the higher you move up the memory hierarchy, the more monetarily expensive things become (5 GB of RAM is more expensive than 5 GB of hard disk space).

    Main memory is DRAM, while Level 2 (L2) Cache is SRAM. Now, I hear you asking, “What about my SDRAM? Is that the best of both worlds?” No, it’s still DRAM. The S means something different.

    • DRAM: Dynamic Random Access Memory
    • SRAM: Static Random Access Memory
    • SDRAM: Synchronous Dynamic Random Access Memory
    • DDR SDRAM: Double Data Rate & one hell of an acronym.

    If you want to access something on your hard disk, it takes roughly 100,000 times longer than DRAM, and 1,000,000 times longer than SRAM, Making SRAM ~ 10x faster than DRAM.

    There are actually two, and in some machines even three, levels of SRAM Cache. L1 actually lives on-board the processor, while L2 (and the less common L3) live off-board. Although they’re typically all SRAM, in this post references to “SRAM” specifically mean L2 Cache.

    Now I hear you say “Awesome! I’m building this killer machine, and it’s got all the Main Memory it can handle, but I’ve still got some money to burn. Can I increase it’s SRAM too?” No. SRAM (L2 Cache) lives in a piece of additional hardware on your motherboard, you can’t really do anything to it.

    So what’s it all about then? In general, the majority of your accesses to memory will be very close, even adjacent, to your previous access. It won’t always be the case, but it probably will. Caches take advantage of this probability. If you access memory location [x], there is a high probability that you’re next memory access will be to memory location [x+1] or something similar. If you accessed an int in memory, which has a typical size of 4 bytes, then your next memory access will probably  be [x+4]. If L2 Cache is nearly 10x faster, as we know what you’re probably access next, we can speed things up by automatically moving what you’ll probably move into L2 Cache. When your program makes it’s next memory request, we check to see if it’s already in L2. If it is (cache hit) great! If it’s not, (cache miss), too bad – we goto main memory and swap that and it’s adjacent bytes, into L2 and proceed. A typical Intel Pentium chip has 128 KB – 2 MB of L2 Cache, and typically swaps data in and out of cache from main memory in 32 byte chunks.

    So how can a programmer take advantage of this? If you’re going to be accessing chunks of memory, try to access it as you know it appears spatially in memory.

    Let us consider the loop below, and assume for sake of example,

    • Length = 8
    • Any given array[i], an integer, is 4 bytes
    • L2 Cache swaps data in 16 byte chunks (as opposed to the actual 32, noted above. That’s 4 integers).
    • At the start of the loop, the L2 Cache was empty (cold cache).

    [cc lang="cpp"]
    int sumArray(int array[Length])
    {
    int sum = 0;
    for (int i = 0; i < Length; i++)
    sum += array[i];
    return sum;
    }
    [/cc]
    The below table represents cache hits and misses. White are cache hits, and red are cache misses.

    array[i] i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8
    Access Order 1 2 3 4 5 6 7 8

    This is the best that we can do. Now let us consider a 2 dimensional array, given the assumtions

    • Rows = 8
    • Columns = 4
    • C stores arrays/matrices in row-major order.
    • The rest is the same as the example above.

    [cc lang="c++"]
    int sumMatrix(int matrix[Columns][Rows])
    {
    int sum = 0;
    for (int col = 0; col < Columns; col++)
    for (int row = 0; row < Rows; row++)
    sum += matrix[col][row];
    return sum;
    }
    [/cc]

    Let’s look at those cache hits and misses. The values in the table body are the order in which matrix[col][row] was accessed.

    matrix[col][row] row=0
    row=1
    row=2
    row=3 row=4 row=5 row=6 row=7
    col = 0 1 2
    3 4 5 6 7 8
    col = 1 9 10 11 12 13 14 15 16
    col = 2 17 18 19 20 21 22 23 24
    col = 3 25 26 27 28 29 30 31 32

    Now let us consider the case where we do nothing except swap the order in which we access the matrix elements, and inspect the cache hit/miss table.
    [cc lang="c++"]
    int sumMatrix(int matrix[Columns][Rows])
    {
    int sum = 0;
    for (int row = 0; row < Rows; row++)
    for (int col = 0; col < Columns; row++)
    sum += matrix[col][row];
    return sum;
    }
    [/cc]

    matrix[col][row] row=0 row=1 row=2 row=3 row=4 row=5 row=6 row=7
    col=0 1 2 3 4 5 6 7 8
    col=1 9 10 11 12 13 14 15 16
    col=2 17 18 19 20 21 22 23 24
    col=3 25 26 27 28 29 30 31 32

    Every access is a cache miss! Meaning that we have to goto main memory for every access of the matrix. This will be approximately 9 times slower than if we hadn’t swapped the row/column order. This is because data is not actually stored as a two dimensional matrix in main memory – everything is a linear array, with row and column offsets being calculated. Because some data resides right next to other data in main memory, it will also appear this way when 32 bytes are moved into the cache. If you access them in the same order in which they’re stored in main memory, you can take advantage of the cache. If you don’t, you end up forcing cache two swap stuff in and out from main memory.

    John Carmack was once quoted as saying “Low level programming is good for the programmer’s soul”.

  • 25 Nov 2009 /  Life

    “Oh, I’m bad at math, haha“.

    I can’t count the number of times that I’ve heard this, and I’m sure you can’t either. I hear this all the time. I say I’m bad at math, but that’s because I had my [explative] handed to me on a plate in equations (which, as my calculus II professor put it, are “a whole new world of pain”). But when I hear a lot of people say that they’re bad at math, what they mean, is they can’t figure out 50% of 150. But that’s okay, cause they’re bad at math, haha.

    What I’d like to particularly note is that people usually laugh after saying they’re bad at math. In fact, it’s “cool” to suck at math. “Oh, that math crap is hard and is for nerds and losers. I hate it, haha” I hear too often. I can appreciate a certain aversion to academics (yes, there have been times when I have cursed class, any class). However,  math, especially at the algebra level, is a basic enough skill that everyone should have at least a basic understanding and ability. Furthermore, it should not be “cool” to hate it.

    My discrete math professor said something that I’ll never forget:

    “Why is it that people think it’s cool or funny to suck at math? That’s like saying ‘Oh, I can’t read, I suck at reading, haha.’ People are ashamed to say they can’t read.”

    The catalyst to this post was something that I saw over on Wil Wheaton’s blog today in which he indicated that he’s not as good at math as he used to be. Look at point 5 in his post, which expands upon

    “I mean, I’m a pretty smart guy, and there are times when I have to write down a math problem that I used to be able to do in my head, and some of the spelling and grammar errors I make are just embarrassing – and I’m a writer! So my idea is for an 8 week class that meets once or twice a week for a couple hours, that would be a mental tune up for guys like me.”

    This sounds like a fantastic idea, one which I’d probably attend. I love the idea that he wants to be better at math! In fact, I think that there might be something more here. There are all sorts of interest groups, organizations, and events that exist on a national level: hackerspaces, FabLab, MakerFair, ACM, IEEE, etc… Why can’t there be some sort of organization or group that offers adult oriented refresher course? I’d be more than happy to brainstorm with anyone who’d like to with me.

    Tags: ,