• 29 Nov 2009 /  Code, Technology

    I’ve had a couple discussions lately discussing the difference in speed between accessing data from memory and disk. In most of these conversations people were surprised to know that there is something faster than Main Memory that the programmer can take advantage of .

    First, let’s consider the memory hierarchy. If you do a google image search for memory hierarchy, you’ll see a few different (and very insightful) interpretations. In it’s simplest, it looks like this.

    A version of the memory hierachy. L0 = Level 0, L1 = Level 2, and so on. It should be noted that some high-end systems have a third SRAM Cache, which becomes L3. This causes L3 (Main Memory) to become L4 and L4 to become L5.

    A version of the memory hierarchy. L0 = Level 0, L1 = Level 2, and so on. It should be noted that some high-end systems have a third SRAM Cache, which becomes L3. This causes L3 (Main Memory) to become L4 and L4 to become L5. Also, please also keep in mine that the higher you move up the memory hierarchy, the more monetarily expensive things become (5 GB of RAM is more expensive than 5 GB of hard disk space).

    Main memory is DRAM, while Level 2 (L2) Cache is SRAM. Now, I hear you asking, “What about my SDRAM? Is that the best of both worlds?” No, it’s still DRAM. The S means something different.

    • DRAM: Dynamic Random Access Memory
    • SRAM: Static Random Access Memory
    • SDRAM: Synchronous Dynamic Random Access Memory
    • DDR SDRAM: Double Data Rate & one hell of an acronym.

    If you want to access something on your hard disk, it takes roughly 100,000 times longer than DRAM, and 1,000,000 times longer than SRAM, Making SRAM ~ 10x faster than DRAM.

    There are actually two, and in some machines even three, levels of SRAM Cache. L1 actually lives on-board the processor, while L2 (and the less common L3) live off-board. Although they’re typically all SRAM, in this post references to “SRAM” specifically mean L2 Cache.

    Now I hear you say “Awesome! I’m building this killer machine, and it’s got all the Main Memory it can handle, but I’ve still got some money to burn. Can I increase it’s SRAM too?” No. SRAM (L2 Cache) lives in a piece of additional hardware on your motherboard, you can’t really do anything to it.

    So what’s it all about then? In general, the majority of your accesses to memory will be very close, even adjacent, to your previous access. It won’t always be the case, but it probably will. Caches take advantage of this probability. If you access memory location [x], there is a high probability that you’re next memory access will be to memory location [x+1] or something similar. If you accessed an int in memory, which has a typical size of 4 bytes, then your next memory access will probably  be [x+4]. If L2 Cache is nearly 10x faster, as we know what you’re probably access next, we can speed things up by automatically moving what you’ll probably move into L2 Cache. When your program makes it’s next memory request, we check to see if it’s already in L2. If it is (cache hit) great! If it’s not, (cache miss), too bad – we goto main memory and swap that and it’s adjacent bytes, into L2 and proceed. A typical Intel Pentium chip has 128 KB – 2 MB of L2 Cache, and typically swaps data in and out of cache from main memory in 32 byte chunks.

    So how can a programmer take advantage of this? If you’re going to be accessing chunks of memory, try to access it as you know it appears spatially in memory.

    Let us consider the loop below, and assume for sake of example,

    • Length = 8
    • Any given array[i], an integer, is 4 bytes
    • L2 Cache swaps data in 16 byte chunks (as opposed to the actual 32, noted above. That’s 4 integers).
    • At the start of the loop, the L2 Cache was empty (cold cache).

    [cc lang="cpp"]
    int sumArray(int array[Length])
    {
    int sum = 0;
    for (int i = 0; i < Length; i++)
    sum += array[i];
    return sum;
    }
    [/cc]
    The below table represents cache hits and misses. White are cache hits, and red are cache misses.

    array[i] i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8
    Access Order 1 2 3 4 5 6 7 8

    This is the best that we can do. Now let us consider a 2 dimensional array, given the assumtions

    • Rows = 8
    • Columns = 4
    • C stores arrays/matrices in row-major order.
    • The rest is the same as the example above.

    [cc lang="c++"]
    int sumMatrix(int matrix[Columns][Rows])
    {
    int sum = 0;
    for (int col = 0; col < Columns; col++)
    for (int row = 0; row < Rows; row++)
    sum += matrix[col][row];
    return sum;
    }
    [/cc]

    Let’s look at those cache hits and misses. The values in the table body are the order in which matrix[col][row] was accessed.

    matrix[col][row] row=0
    row=1
    row=2
    row=3 row=4 row=5 row=6 row=7
    col = 0 1 2
    3 4 5 6 7 8
    col = 1 9 10 11 12 13 14 15 16
    col = 2 17 18 19 20 21 22 23 24
    col = 3 25 26 27 28 29 30 31 32

    Now let us consider the case where we do nothing except swap the order in which we access the matrix elements, and inspect the cache hit/miss table.
    [cc lang="c++"]
    int sumMatrix(int matrix[Columns][Rows])
    {
    int sum = 0;
    for (int row = 0; row < Rows; row++)
    for (int col = 0; col < Columns; row++)
    sum += matrix[col][row];
    return sum;
    }
    [/cc]

    matrix[col][row] row=0 row=1 row=2 row=3 row=4 row=5 row=6 row=7
    col=0 1 2 3 4 5 6 7 8
    col=1 9 10 11 12 13 14 15 16
    col=2 17 18 19 20 21 22 23 24
    col=3 25 26 27 28 29 30 31 32

    Every access is a cache miss! Meaning that we have to goto main memory for every access of the matrix. This will be approximately 9 times slower than if we hadn’t swapped the row/column order. This is because data is not actually stored as a two dimensional matrix in main memory – everything is a linear array, with row and column offsets being calculated. Because some data resides right next to other data in main memory, it will also appear this way when 32 bytes are moved into the cache. If you access them in the same order in which they’re stored in main memory, you can take advantage of the cache. If you don’t, you end up forcing cache two swap stuff in and out from main memory.

    John Carmack was once quoted as saying “Low level programming is good for the programmer’s soul”.

  • 08 Nov 2009 /  Code

    Often times you may want to get the floor of a specific date. For example, you may want to see everything that has happened on your system since the beginning of today, or perhaps between the start of yesterday and the start of today. For simple queries, this is usually done by entering in a datetime manually. For example
    [cc lang="tsql"]
    SELECT * FROM fooTable WHERE [createdate] >= ‘2009-11-06′ AND [createdate] < '2009-11-07'
    [/cc]
    However, it's often the case that punching in manual values like this is just not practical. It'd be nice if you could just give it any date and it would find the floor of the day.

    You can do this fairly easily by taking advantage of T-SQL's DATEADD() and DATEDIFF() functions. To find the floor of a given datetime, do something similar to
    [cc lang="tsql"]
    SELECT DATEADD(dd, DATEDIFF(dd, 0, GETDATE()),0)
    [/cc]
    This will give you the floor of today. Replace GETDATE() with any datetime, including a variable, to get the floor of that day. This will return something of the form '2009-11-07 00:00:00' Additionally, replace dd with hh or mi for the floor of the hour or minute.

    Based on the code above, I use the function below to simplify getting the floors of various time periods.
    [cc lang="tsql"]
    CREATE FUNCTION fn_gettimefloor
    (
    @format VARCHAR(16),
    @datetime DATETIME
    )
    RETURNS DATETIME
    AS
    BEGIN
    IF @format in ('yy','yyyy','year')
    RETURN DATEADD(yy, DATEDIFF(yy, 0, @datetime),0)
    ELSE IF @format in ('qq','q', 'quarter')
    RETURN DATEADD(qq, DATEDIFF(qq, 0, @datetime),0)
    ELSE IF @format in ('mm','m','month')
    RETURN DATEADD(mm, DATEDIFF(mm, 0, @datetime),0)
    ELSE IF @format in ('dd','d','day')
    RETURN DATEADD(dd, DATEDIFF(dd, 0, @datetime),0)
    ELSE IF @format in ('wk','ww','week')
    RETURN DATEADD(wk, DATEDIFF(wk, 0, @datetime),0)
    ELSE IF @format in ('hh','hour')
    RETURN DATEADD(hh, DATEDIFF(hh, 0, @datetime),0)
    ELSE IF @format in ('mi', 'n','minute')
    RETURN DATEADD(mi, DATEDIFF(mi, 0, @datetime),0)
    ELSE IF @format in ('ss','s','second')
    RETURN DATEADD(ss, DATEDIFF(ss, 0, @datetime),0)
    ELSE IF @format in ('ms', 'millisecond')
    RETURN DATEADD(ms, DATEDIFF(ms, 0, @datetime),0)
    ELSE IF @format in ('mcs','microsecond')
    RETURN DATEADD(mcs, DATEDIFF(mcs, 0, @datetime),0)
    ELSE IF @format in ('ns','nanosecond')
    RETURN DATEADD(ns, DATEDIFF(ns, 0, @datetime),0)
    RETURN GETDATE()
    END
    GO
    [/cc]
    The above function can then be used in a way similar to the below example
    [cc lang="tsql"]
    select dbo.fn_gettimefloor('dd',GETDATE()) -- The floor of today
    select dbo.fn_gettimefloor('hh',@someDateVar) -- The floor of a datetime passed into the function
    select dbo.fn_gettimefloor('hh',DATEADD(hh, -5, GETDATE())) -- Get hour-floor of 5 hours ago
    [/cc]
    However, sometimes this isn't even enough. Let's say, for example, we want to break things up into 10 minute intervals. Get can further nest DATEADD()s and DATEDIFF()s. The below example is for the floor of every 10 minutes, but it is extendable to other date-types (days, hours, months, etc) and other time-intervals (10, 20, 3, etc).
    [cc lang="tsql"]
    select DATEADD(mi,DATEDIFF(mi,0,dateadd(mi,datediff(mi,0,getdate())/10*10,0)),0)
    [/cc]
    What's happening in the above code is, from the inside out, we

    1. Get today’s full datetime
    2. Get the difference between step 1 and zero (in minutes), which is an integer.
    3. We divide step 2 by 10 and then multiply that by 10, effectively flooring the value to the nearest number evenly divisible by 10.
    4. We add the integer from step 3 to the start of eternity (dateadd’ing to zero), and we’ve got the floor of the current 10 minute interval.
    5. After step 4, we have a datetime value, and we’re left with something like SELECT DATEADD(dd, DATEDIFF(dd, 0, <step 4 datetime>),0).

    So we’ve got a method to get the floor of a time period, a function that abstracts the mess into something easily used, and then a method to get the floor of a time-increment. I’ll leave it up to the ambitious reader to get the last part, the 10 minute example, into a function as well. :-)

    Tags: , , ,

  • 02 Oct 2009 /  Code, Technology

    A lot of people don’t like to put their e-mail address directly on a web-page. You may have seen e-mail addresses that look like “jondoeATgmailDOTcom”, or “j o h n do e @ gma il . c o m!”. Strangeness such as spelling out words like “AT”, odd spaces, or non e-mail address characters such as ! are all tricks employed to avoid having your e-mail address scopped up by someone looking to spam you.

    The way in which I approach the problem is by having javascript write my e-mail address for me. I get both a well formed and clickable e-mail link, as well as avoid getting my e-mail address scooped up by a web crawler. Now, I know that the javascript document.write() function is shunned in most circles, but below is some example code.
    [cc lang="javascript"]
    // File is contactInfo.js
    document.write(”David Kennedy:
    Web
    http://davidwkennedy.com|
    E-Mail dave@orderinchaos.org |
    Phone: 435.770.6865 “);
    [/cc]
    Which is then called in the HTML file as seen below
    [cc lang="html"]

    [/cc]
    I’m sure that it’s probably quite possible to write a web crawler that will be smart enough to pick it up, but I think most web crawlers won’t.

    For example, when I use the web crawler below on my personal site davidwkennedy.com you can see that the results do not contain my e-mail address!

    Below is a web crawler, written in Python & borrowed from IBM’s site on web spiders.
    [cc lang="python"]
    #!/usr/local/bin/python

    import httplib
    import sys
    import re
    from HTMLParser import HTMLParser

    class miniHTMLParser( HTMLParser ):

    viewedQueue = []
    instQueue = []

    def get_next_link( self ):
    if self.instQueue == []:
    return ”
    else:
    return self.instQueue.pop(0)

    def gethtmlfile( self, site, page ):
    try:
    httpconn = httplib.HTTPConnection(site)
    httpconn.request(”GET”, page)
    resp = httpconn.getresponse()
    resppage = resp.read()
    except:
    resppage = “”

    return resppage

    def handle_starttag( self, tag, attrs ):
    if tag == ‘a’:
    newstr = str(attrs[0][1])
    if re.search(’http’, newstr) == None:
    if re.search(’mailto’, newstr) == None:
    if re.search(’htm’, newstr) != None:
    if (newstr in self.viewedQueue) == False:
    print ” adding”, newstr
    self.instQueue.append( newstr )
    self.viewedQueue.append( newstr )
    else:
    print ” ignoring”, newstr
    else:
    print ” ignoring”, newstr
    else:
    print ” ignoring”, newstr

    def main():

    if sys.argv[1] == ”:
    print “usage is ./minispider.py site link”
    sys.exit(2)

    mySpider = miniHTMLParser()

    link = sys.argv[2]

    while link != ”:

    print “\nChecking link “, link

    retfile = mySpider.gethtmlfile( sys.argv[1], link )
    mySpider.feed(retfile)
    link = mySpider.get_next_link()

    mySpider.close()

    print “\ndone\n”

    if __name__ == “__main__”:
    main()
    [/cc]
    Below are the results of the above crawler on davidwkennedy.com
    [cc lang="bash"]
    dave@dave-sparta:~$ python miniCrawler.py davidwkennedy.com /

    Checking link /
    adding index.htm
    adding photos.htm
    adding videos.htm
    adding projects.htm
    ignoring http://orderinchaos.davidwkennedy.com
    adding funstuff.htm
    adding about.htm
    ignoring http://twitter.com/davidwkennedy
    ignoring statcounter

    Checking link index.htm
    Checking link photos.htm
    Checking link videos.htm
    Checking link projects.htm
    Checking link funstuff.htm
    Checking link about.htm
    done

    dave@dave-sparta:~$

    [/cc]

    Tags: , , , ,

  • 27 Sep 2009 /  Code, Technology

    Vi and Vi-Improved (Vim) are a couple of text editors that many people use frequently on Linux systems. When I first was introduced to Linux I didn’t have to worry about all the setup stuff, and I learned Vim. Typing “vi” from the command prompt took me to vim. Vim was also configured to color computer code, as in the screenshot below.

    Vim with Syntax Coloring on For Programming Languages

    Vim with Syntax Coloring on For Programming Languages

    After installing Linux on my personal machine, I was aghast that the color schemes are not the default in Vim. This had to be rectified. I’m running Ubuntu 7.10 – Gusty Gibbon. (At the time of this blog post, the current version of Ubuntu is 9.04 – Jaunty Jackalope)

    I found that the Vim command for syntax color is :syntax on . However, with the default installation of Ubuntu, I received the following error:

    [cc]Sorry, the command is not available in this version: syntax on.[/cc]

    This means that you don’t have the full version of Vim installed. I searched for this error and found a variety of commands to try – none of which worked; there were various install errors. The command that worked I found on the Ubuntu Vim Tip of the Week and it was
    [cc lang="bash"]$sudo aptitude install vim-full[/cc]
    Now the command should work. However, it’s a pain to run the command every time I fire up Vim. Instead, you can make this automatic by putting the commant in the file
    [cc lang="bash"]~$ ./vimrc[/cc]
    Many of the sites on which I saw reference to the file didn’t mention it’s location. I searched tirelessly for this file. Turns out, be default, it does not exist. Vim runs just fine without it. To save Vim configuration settings, such as color schemes or tab-sizes, create the .vimrc file in your home directory.

    In addition to simply turning the syntax color on or off, you can even install new themes. A collection of themes are available from Vim.org Scripts

    The theme that I found that I like is the OceanDeep.

    After hours of playing around on my Linux, my finaly .vimrc file looks like (*note: remove comments to make work)
    [cc lang="csharp"]
    syntax on // turn color scheme on
    colorscheme oceandeep // set the color scheme
    filetype indent on // the next three lines auto indent while programming
    set autoindent
    set nu
    set ts=4 // set tab size to 4 spaces
    [/cc]

    Tags: , , , , ,

  • 22 Sep 2009 /  Code, Technology

    Here’s yet another old C# code snipped from my antiquated and dead blog. But throwin’ this out there for anyone who might find it helpful.

    [cc lang="csharp" tab_size="2"]
    try
    {
    string newTime;
    System.Net.Sockets.TcpClient t =
    new System.Net.Sockets.TcpClient(”time-a.nist.gov”, 13);
    System.IO.StreamReader rd = new System.IO.StreamReader(t.GetStream());
    newTime = rd.ReadToEnd();

    string[] times = newTime.Split(’ ‘); // Parses The String
    rd.Close();
    t.Close();
    Console.WriteLine(”Todays Date Is: ” + times[1]);
    }
    [/cc]

    I read somewhere that in order for this code to work, the TCP client has to be installed and enabled in Windows Vista. The TCP client is not, be default, enabled in Vista. To install it, Control Panel — Programs and Features — Turn Windows Features On or Off — TCP Client.

    However, I tried this after unchecking my TCP client and it still worked – so maybe it’s just an old wive’s tale! I’d suggest trying it without the TCP client installed – but don’t count on it working if you’re planning on distributing this without further research.

    The second line after the opening try { has ” time-a.nist.gov”, which is the time server. You can swap this out with some of the other time servers below.If you were feeling ambitious, you might try creating a class that will try a random time server on it’s first try and then randomly try other time servers if the first x fail.

    Name | IP | Location
    time-a.nist.gov | 129.6.15.28 | NIST, Gaithersburg, Maryland
    time-b.nist.gov | 129.6.15.29 | NIST, Gaithersburg, Maryland
    time-a.timefreq.bldrdoc.gov| 132.163.4.101 | NIST, Boulder, Colorado
    time-b.timefreq.bldrdoc.gov | 132.163.4.102 | NIST, Boulder, Colorado
    time-c.timefreq.bldrdoc.gov | 132.163.4.103 | NIST, Boulder, Colorado
    utcnist.colorado.edu | 128.138.140.44 | University of Colorado, Boulder
    time.nist.gov | 192.43.244.18 | NCAR, Boulder, Colorado
    time-nw.nist.gov | 131.107.1.10 | Microsoft, Redmond, Washington
    nist1.datum.com | 209.0.72.7 | Datum, San Jose, California
    nist1.dc.certifiedtime.com | 216.200.93.8 | Abovnet, Virginia
    nist1.nyc.certifiedtime.com | 208.184.49.9 | Abovnet, New York City
    nist1.sjc.certifiedtime.com | 208.185.146.41 | Abovnet, San Jose, California

    Tags: , , , ,

  • 21 Sep 2009 /  Code, Technology

    This is an older post, from another [now dead] blog of mine, but I thought I’d throw it up here as well.

    When working in C#, if you are trying to load a jpeg file and you use the System.Drawing.Image.FromFile method on a file that you do not have the proper permissions for, you will get an “Out of Memory” exception. Somewhat misleading – but that is the error resulting from improper permissions.

    I found a web site that claimed that you create a FileStream object from your file and then use the FromStream method on the Image object.

    My solution for loading a file was as follows
    [cc lang="csharp"]
    private void addPhotoButton_Click(object sender, EventArgs e)
    {
    DialogResult newPhoto;
    Image newPhotoImage;
    string newPhotoFileName;
    newPhoto = openFileDialogAddPhoto.ShowDialog();
    if(newPhoto == DialogResult.OK)
    {
    newPhotoFileName = openFileDialogAddPhoto.FileName;
    FileStream photoStream =
    new FileStream(newPhotoFileName,
    FileMode.Open, FileAccess.Read);
    newPhotoImage = Image.FromStream(photoStream);
    }
    }
    [/cc]

    Tags: , ,