• 29 Nov 2009 /  Code, Technology

    I’ve had a couple discussions lately discussing the difference in speed between accessing data from memory and disk. In most of these conversations people were surprised to know that there is something faster than Main Memory that the programmer can take advantage of .

    First, let’s consider the memory hierarchy. If you do a google image search for memory hierarchy, you’ll see a few different (and very insightful) interpretations. In it’s simplest, it looks like this.

    A version of the memory hierachy. L0 = Level 0, L1 = Level 2, and so on. It should be noted that some high-end systems have a third SRAM Cache, which becomes L3. This causes L3 (Main Memory) to become L4 and L4 to become L5.

    A version of the memory hierarchy. L0 = Level 0, L1 = Level 2, and so on. It should be noted that some high-end systems have a third SRAM Cache, which becomes L3. This causes L3 (Main Memory) to become L4 and L4 to become L5. Also, please also keep in mine that the higher you move up the memory hierarchy, the more monetarily expensive things become (5 GB of RAM is more expensive than 5 GB of hard disk space).

    Main memory is DRAM, while Level 2 (L2) Cache is SRAM. Now, I hear you asking, “What about my SDRAM? Is that the best of both worlds?” No, it’s still DRAM. The S means something different.

    • DRAM: Dynamic Random Access Memory
    • SRAM: Static Random Access Memory
    • SDRAM: Synchronous Dynamic Random Access Memory
    • DDR SDRAM: Double Data Rate & one hell of an acronym.

    If you want to access something on your hard disk, it takes roughly 100,000 times longer than DRAM, and 1,000,000 times longer than SRAM, Making SRAM ~ 10x faster than DRAM.

    There are actually two, and in some machines even three, levels of SRAM Cache. L1 actually lives on-board the processor, while L2 (and the less common L3) live off-board. Although they’re typically all SRAM, in this post references to “SRAM” specifically mean L2 Cache.

    Now I hear you say “Awesome! I’m building this killer machine, and it’s got all the Main Memory it can handle, but I’ve still got some money to burn. Can I increase it’s SRAM too?” No. SRAM (L2 Cache) lives in a piece of additional hardware on your motherboard, you can’t really do anything to it.

    So what’s it all about then? In general, the majority of your accesses to memory will be very close, even adjacent, to your previous access. It won’t always be the case, but it probably will. Caches take advantage of this probability. If you access memory location [x], there is a high probability that you’re next memory access will be to memory location [x+1] or something similar. If you accessed an int in memory, which has a typical size of 4 bytes, then your next memory access will probably  be [x+4]. If L2 Cache is nearly 10x faster, as we know what you’re probably access next, we can speed things up by automatically moving what you’ll probably move into L2 Cache. When your program makes it’s next memory request, we check to see if it’s already in L2. If it is (cache hit) great! If it’s not, (cache miss), too bad – we goto main memory and swap that and it’s adjacent bytes, into L2 and proceed. A typical Intel Pentium chip has 128 KB – 2 MB of L2 Cache, and typically swaps data in and out of cache from main memory in 32 byte chunks.

    So how can a programmer take advantage of this? If you’re going to be accessing chunks of memory, try to access it as you know it appears spatially in memory.

    Let us consider the loop below, and assume for sake of example,

    • Length = 8
    • Any given array[i], an integer, is 4 bytes
    • L2 Cache swaps data in 16 byte chunks (as opposed to the actual 32, noted above. That’s 4 integers).
    • At the start of the loop, the L2 Cache was empty (cold cache).

    [cc lang="cpp"]
    int sumArray(int array[Length])
    {
    int sum = 0;
    for (int i = 0; i < Length; i++)
    sum += array[i];
    return sum;
    }
    [/cc]
    The below table represents cache hits and misses. White are cache hits, and red are cache misses.

    array[i] i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8
    Access Order 1 2 3 4 5 6 7 8

    This is the best that we can do. Now let us consider a 2 dimensional array, given the assumtions

    • Rows = 8
    • Columns = 4
    • C stores arrays/matrices in row-major order.
    • The rest is the same as the example above.

    [cc lang="c++"]
    int sumMatrix(int matrix[Columns][Rows])
    {
    int sum = 0;
    for (int col = 0; col < Columns; col++)
    for (int row = 0; row < Rows; row++)
    sum += matrix[col][row];
    return sum;
    }
    [/cc]

    Let’s look at those cache hits and misses. The values in the table body are the order in which matrix[col][row] was accessed.

    matrix[col][row] row=0
    row=1
    row=2
    row=3 row=4 row=5 row=6 row=7
    col = 0 1 2
    3 4 5 6 7 8
    col = 1 9 10 11 12 13 14 15 16
    col = 2 17 18 19 20 21 22 23 24
    col = 3 25 26 27 28 29 30 31 32

    Now let us consider the case where we do nothing except swap the order in which we access the matrix elements, and inspect the cache hit/miss table.
    [cc lang="c++"]
    int sumMatrix(int matrix[Columns][Rows])
    {
    int sum = 0;
    for (int row = 0; row < Rows; row++)
    for (int col = 0; col < Columns; row++)
    sum += matrix[col][row];
    return sum;
    }
    [/cc]

    matrix[col][row] row=0 row=1 row=2 row=3 row=4 row=5 row=6 row=7
    col=0 1 2 3 4 5 6 7 8
    col=1 9 10 11 12 13 14 15 16
    col=2 17 18 19 20 21 22 23 24
    col=3 25 26 27 28 29 30 31 32

    Every access is a cache miss! Meaning that we have to goto main memory for every access of the matrix. This will be approximately 9 times slower than if we hadn’t swapped the row/column order. This is because data is not actually stored as a two dimensional matrix in main memory – everything is a linear array, with row and column offsets being calculated. Because some data resides right next to other data in main memory, it will also appear this way when 32 bytes are moved into the cache. If you access them in the same order in which they’re stored in main memory, you can take advantage of the cache. If you don’t, you end up forcing cache two swap stuff in and out from main memory.

    John Carmack was once quoted as saying “Low level programming is good for the programmer’s soul”.

  • 26 Oct 2009 /  Technology

    Another late night up coding and playing with tech. So RB mentioned that he wanted to start a blog where he’d talk about stuff that “every programmer should know”, but that he wanted partner’s in crime.

    We got to talking, and messing around, and by early hours in the morning, a new blog was born; 16Bit: Stuff every programmer should learn.

    We’re both pretty excited about this blog, to see where is takes us and other people.

  • 23 Oct 2009 /  Technology

    RB im’ed me this today. This is just, awesome. One thing that I really like about this is that the term ‘hacker’ is well received. It’s not this shady, misunderstood, scary word.

    via campellbrown.blogs.cnn

    Tags: ,

  • 23 Oct 2009 /  Technology

    I stayed up waay too late last night, working on a personal project – one in which I’m trying to learn PHP & MySQL

    MySqlFront

    MySqlFront is a nifty sql client that you can use to connect to your database. Hosmonter, my host, supports this. It’s a lot easier than going through phpMyAdmin. MySqlFront has a 30 free trial period, after which it’s $35.00. I’m still looking for an open source client. Oh, and Microsoft Sql Server Management Studio Express does not connect to MySql Databases. (I didn’t think it would, but hey, though I’d give it a go – and MS SSME is legally free – no trial period).

    Stored Procedures on Hostmonster

    Okay, this was a hangup last night. I think I must have spent at least two frustrating hours on this before stumbling upon a forum post somewhere.

    A Stored Procedure (Sproc) is one or more frequently executed sql commands. For more information on sprocs, please see the excellent post Ten Common Database Design Mistakes section on sprocs. Louis explains maintainability, encapsulation, security, and performance benefits.

    Anyway, so, my situation was that I could create and execute the sproc in MySqlFront, but could not execute it from my PHP page. After spending much time reading both MySql and PHP documentation, I stumbled upon a forum post that basically said “Hostmonsters’ response to my ticket: If you need to use stored procedures in MySQL, you’ll have to find hosting elsewhere.”

    Blast, there goes two hours of my time.

    FaceBox

    FaceBox is a nice looking modal javascript window framework. It’s intended to look very similar to FaceBook’s lightboxes. It uses the jQuery library, and is all available for download. The framework’s super easy to use; just out thie really brief tutorial.

    If you’re thinking “jQuery, I don’t know what that is – another technology I have to learn?”, don’t. It’s just a javascript library. No need to fret. Checkout jQuery’s Wikipedia page.

    Summary

    Yeah, so that was pretty much my last night. Not the most productive night, but progress is progress. :-)

    Tags: , , , , ,

  • 22 Oct 2009 /  Life, Order vs Chaos, Technology

    Yeah, we all pretty much agree, aviation checklists are good. It’s reasonable, it’s smart, it just plain makes sense. Conversely, most of us have seen Office Space (or, we got that memo) and agree that, never mind TPS reports themselves, TPS Cover Sheets are stupid mindless paperwork keeping us from just gettin’ on with our jobs! Most people probably have not actually dealt with a TPS Report, per se, but we’re familiar with the metaphor – stupid worthless paperwork.

    So what am I getting at? Aviation Checklists and TPS Report Cover Sheets are really the same thing.

    tpscover
    Aviation Checklist (Yup, there’s an App for that!) TPR Report Cover Sheet (did you get that memo?)

    The purpose of the aviation check list is pretty straight forward. To keep you un-deadified.The purpose of the TPS Cover is a little more convoluted, but it’s essentially a management checklist.

    So how did the aviation checklist come into being? Did someone, one day, say for no reason “Let’s have a checklist. Yeah!”. No. First, somebody messed up. Bad. Then, people got together and had a meeting to discuss what went wrong and how to solve the problem.

    A similar history lies behind much “mindless” paperwork similar to TPS Covers. A young or growing company, without the proper process or individual accountability, can turn a team of young, energetic, and enthusiastic recent college grads into a collection of sleep-deprived jaded suicidal alcoholics in mere days. After one, or more, projects have missed deadlines horribly, a meeting is called. In that meeting it’s asked “Why was the project so late? The original time line still looks reasonable – something went wrong. How can we fix this?”. The answer is often more process, more checklists, more paperwork.

    So, TPS Cover Sheets are the end result of a project (or three) gone wrong. As much as a pain as they may be, they may actually be saving you from being driven into a jaded state of suicidal alcoholism. Conversely, there’s definitely the possibility that too much process and paperwork comes out of the meeting – perhaps an email to an email list for a memo for the coversheet for the tps report, which is ultimately for the test process for the actual product. There’s obviously a balance, but TPS Report Cover Sheets are probably not as bad as Peter Gibbons would have you believe. In the end – Damn, it feels good to be a gansta.

    Tags: , , , ,

  • 20 Oct 2009 /  Order vs Chaos, Technology

    databasesymbolSo I’ve spent the past couple nights drinking herbal tea and brushing up on my database design principles. I’m slightly ashamed to say that I never even touched databases in college. The entirety of my SQL/Database skillz come from “the field”. However, now that I’m once again on the job market, I figure it’s probably a good idea to brush up on things like design theory terminology, such as “first normal form”.

    Something that I’m noticing is that I’m completely ignorant of some of these terms, but the concepts already are very close to my heart. Take, for example, the above mentioned first normal form.

    A table of the first normal form 1) has columns that contain only atomic values and 2) has no repeating groups. What? Okay, atomic forms – another term I’m unfamiliar with. Repeating groups? What’s a group – do you mean no repeating rows? I don’t know… I’m confused. After digging in (i.e. reading down the page of my book where the example was illustrated with a picture), it all made sense.

    Atomicity: Atomic values are values that are perceived to be broken down as much as reasonably possible. For example, names in the database in the a single columns “Firstname Lastname” are not atomic, while having a column “Firstname” and second column “Lastname” is atomic. Obviously, you can break it down more (by letter), but that’s unreasonable.

    Repeating Groups:  An example of a repeating group would be the case where you have a table, listing Book-Id, Book-Title, and Authors, where you have the columns [BookId], [BookTitle], [Author1], [Author2], [Author3]. The repeating groups are the 3 author columns.

    After I came to understand what was being said in the definition of the first normal form, I thought “What a waste of my time. This is so painfully obvious!” I mean, from my “field” learnings, I can testify to how painful it is to deal with non-atomic columns and repeating groups. I felt like this definition was analogous to having a section in the auto-manual that stated “It it good practice, when temperatures fall below freezing, to utilize your vehicle’s atmospheric-entropy-elevator, or aee device, to maintain the homeostatic environment of the vehicle“. I mean, have you ever actually tried to deal with a database with a single name column where some names have been entered as “FirstName, LastName”, others as “LastName, FirstName”, others as “FirstName MiddleName LastName”, and even “LastName, MiddleName1, MiddleName2, MiddleName3, FirstName, Junior/Senior”? [..oh God, bad memories flashing back...]

    Now, this is a cynical outlook; it’s actually quite good for me to revisit these concepts in formal and academic style. I’m curious how many other people fall into this boat with me, and how our applications will vary from those of people with more formal database backgrounds.

    In my random googlings, I stumbled upon this: Ten Common Database Design Mistakes.  Most of these seemed pretty “common sense” to me, but I thought it was excellent to see them laid out in text.

    Database From Hell

    Database From Hell, via teejayhanton on Flickr. "this was at my old job. it was a database brought to me from a government agency (unclassified). my task was to a) sort it out and b) create a web application that would plug into it."

    Tags: , , ,

  • 02 Oct 2009 /  Code, Technology

    A lot of people don’t like to put their e-mail address directly on a web-page. You may have seen e-mail addresses that look like “jondoeATgmailDOTcom”, or “j o h n do e @ gma il . c o m!”. Strangeness such as spelling out words like “AT”, odd spaces, or non e-mail address characters such as ! are all tricks employed to avoid having your e-mail address scopped up by someone looking to spam you.

    The way in which I approach the problem is by having javascript write my e-mail address for me. I get both a well formed and clickable e-mail link, as well as avoid getting my e-mail address scooped up by a web crawler. Now, I know that the javascript document.write() function is shunned in most circles, but below is some example code.
    [cc lang="javascript"]
    // File is contactInfo.js
    document.write(”David Kennedy:
    Web
    http://davidwkennedy.com|
    E-Mail dave@orderinchaos.org |
    Phone: 435.770.6865 “);
    [/cc]
    Which is then called in the HTML file as seen below
    [cc lang="html"]

    [/cc]
    I’m sure that it’s probably quite possible to write a web crawler that will be smart enough to pick it up, but I think most web crawlers won’t.

    For example, when I use the web crawler below on my personal site davidwkennedy.com you can see that the results do not contain my e-mail address!

    Below is a web crawler, written in Python & borrowed from IBM’s site on web spiders.
    [cc lang="python"]
    #!/usr/local/bin/python

    import httplib
    import sys
    import re
    from HTMLParser import HTMLParser

    class miniHTMLParser( HTMLParser ):

    viewedQueue = []
    instQueue = []

    def get_next_link( self ):
    if self.instQueue == []:
    return ”
    else:
    return self.instQueue.pop(0)

    def gethtmlfile( self, site, page ):
    try:
    httpconn = httplib.HTTPConnection(site)
    httpconn.request(”GET”, page)
    resp = httpconn.getresponse()
    resppage = resp.read()
    except:
    resppage = “”

    return resppage

    def handle_starttag( self, tag, attrs ):
    if tag == ‘a’:
    newstr = str(attrs[0][1])
    if re.search(’http’, newstr) == None:
    if re.search(’mailto’, newstr) == None:
    if re.search(’htm’, newstr) != None:
    if (newstr in self.viewedQueue) == False:
    print ” adding”, newstr
    self.instQueue.append( newstr )
    self.viewedQueue.append( newstr )
    else:
    print ” ignoring”, newstr
    else:
    print ” ignoring”, newstr
    else:
    print ” ignoring”, newstr

    def main():

    if sys.argv[1] == ”:
    print “usage is ./minispider.py site link”
    sys.exit(2)

    mySpider = miniHTMLParser()

    link = sys.argv[2]

    while link != ”:

    print “\nChecking link “, link

    retfile = mySpider.gethtmlfile( sys.argv[1], link )
    mySpider.feed(retfile)
    link = mySpider.get_next_link()

    mySpider.close()

    print “\ndone\n”

    if __name__ == “__main__”:
    main()
    [/cc]
    Below are the results of the above crawler on davidwkennedy.com
    [cc lang="bash"]
    dave@dave-sparta:~$ python miniCrawler.py davidwkennedy.com /

    Checking link /
    adding index.htm
    adding photos.htm
    adding videos.htm
    adding projects.htm
    ignoring http://orderinchaos.davidwkennedy.com
    adding funstuff.htm
    adding about.htm
    ignoring http://twitter.com/davidwkennedy
    ignoring statcounter

    Checking link index.htm
    Checking link photos.htm
    Checking link videos.htm
    Checking link projects.htm
    Checking link funstuff.htm
    Checking link about.htm
    done

    dave@dave-sparta:~$

    [/cc]

    Tags: , , , ,

  • 01 Oct 2009 /  Technology

    As a follow up to a previous post on Syntax Color Schemes in Vim, it’s only appropriate to talk about code snippets in my blog.

    I use wordpress as my blogging engine. Wordpress is some pretty fantastic (and hugely popular, and free) software that you can either install on your own server or create a free account like any other site.

    If you install it on your own server, there are a ton of plugins (mostly written by third-party bloggers) that do all sorts of fancy stuff. The one that I’ve found to help out with code is codecolorer. It’s powerful simplicity at its best.

    If you have trouble installing it, there’s a good chance that you may need to upgrade the version of wordpress that you’re running. Before I upgraded, I received an error every time I tried to log into my wp-admin page. Now that the issues resolved, I forget exactly what the error was, but it was an error in one of the php files for that plugin.

    Tags: , , ,

  • 27 Sep 2009 /  Technology
    Microsoft's BarrelFish Multi-Kernel Operating System Architecture.

    Microsoft's BarrelFish Multi-Kernel Operating System Architecture.

    Microsoft has a new pet-project: Barrelfish, a multi-kernel operating system (via OS News). As a bit of back story, a kernel is the meat of an operating system. The kernel is largely responsible for enabling other software to make use of hardware.

    In the research team’s article The Multikernel: A new OS architecture for scalable multicore systems they write “We investigate a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message-passing,”.

    This is really good stuff! From a hardware perspective, this is exactly where the future is headed. When I took “Distributed & Network Programming” I remember the class getting Dr. Allan off on a tangent about how a single computer these days is really a distributed system.

    A “distributed system” is generally thought of multiple computers networked together and working together. This idea was popularized when “a computer” was a pretty straight forward idea. A computer consisted of a processor, some main memory, possibly some persistent memory, and some input/output. Today’s computers are much more complex. A video card often has it’s own dedicated main-memory and set of multiple-processors. Hard disk drives are beginning to contain embedded systems to encrypt data saved on the disks. Even main memory is not a straightforward concept these days, not with virtual memory which requires and additional piece of hardware called a memory management unit. A “personal computer” is a series of smaller computers (video cards, hard drives, processors, etc…) networked together. This will be a continuing trend, which will require much more complex operating systems to manage the hardware. This is why I’m so excited over BarrelFish!

    I think the Dinosaur Book will be needing a new chapter! (aaaaawww yeeeaaahhh… the Dinosaur Book – you know it!)

    I hope Microsoft Open-Sources BarrelFish like they did Singularity, which was their research OS
    in which the kernel, device drivers, and applications were all written in managed code.

    Tags: , , , , ,

  • 27 Sep 2009 /  Code, Technology

    Vi and Vi-Improved (Vim) are a couple of text editors that many people use frequently on Linux systems. When I first was introduced to Linux I didn’t have to worry about all the setup stuff, and I learned Vim. Typing “vi” from the command prompt took me to vim. Vim was also configured to color computer code, as in the screenshot below.

    Vim with Syntax Coloring on For Programming Languages

    Vim with Syntax Coloring on For Programming Languages

    After installing Linux on my personal machine, I was aghast that the color schemes are not the default in Vim. This had to be rectified. I’m running Ubuntu 7.10 – Gusty Gibbon. (At the time of this blog post, the current version of Ubuntu is 9.04 – Jaunty Jackalope)

    I found that the Vim command for syntax color is :syntax on . However, with the default installation of Ubuntu, I received the following error:

    [cc]Sorry, the command is not available in this version: syntax on.[/cc]

    This means that you don’t have the full version of Vim installed. I searched for this error and found a variety of commands to try – none of which worked; there were various install errors. The command that worked I found on the Ubuntu Vim Tip of the Week and it was
    [cc lang="bash"]$sudo aptitude install vim-full[/cc]
    Now the command should work. However, it’s a pain to run the command every time I fire up Vim. Instead, you can make this automatic by putting the commant in the file
    [cc lang="bash"]~$ ./vimrc[/cc]
    Many of the sites on which I saw reference to the file didn’t mention it’s location. I searched tirelessly for this file. Turns out, be default, it does not exist. Vim runs just fine without it. To save Vim configuration settings, such as color schemes or tab-sizes, create the .vimrc file in your home directory.

    In addition to simply turning the syntax color on or off, you can even install new themes. A collection of themes are available from Vim.org Scripts

    The theme that I found that I like is the OceanDeep.

    After hours of playing around on my Linux, my finaly .vimrc file looks like (*note: remove comments to make work)
    [cc lang="csharp"]
    syntax on // turn color scheme on
    colorscheme oceandeep // set the color scheme
    filetype indent on // the next three lines auto indent while programming
    set autoindent
    set nu
    set ts=4 // set tab size to 4 spaces
    [/cc]

    Tags: , , , , ,