Software by Steven

Sometimes a square peg is a round hole

Cluster size, Cache Lines and Why Old Video Games had 3-letter High Scores

Like many other software developers, I’m constantly trying to deepen my understanding of things, to learn more and push myself further. As such, I took it upon myself recently to try and make a high score system for a 3D game I’d made last year. I wasn’t content with just a simple “cout >> score”, however, I wanted to make this an academic exercise. I wanted the “perfect” high score system.

So I started researching. Given that this dealt with IO, I wanted to make sure I adhered to a few good practices, and to learn about others. For this, I had to consider the low-level implementation details. First consideration…

Cluster Size

Windows file systems (NTFS, FAT and exFAT) allocate space on the hard drive according to something called cluster size. File allocation on the hard drive must be a multiple of the cluster size. Files up to the cluster size (lets say 4 KB) will take up that much space on the disk. If the file contents amount to less (lets say a 2 KB text file), the rest of the drive space is padded. This is done for speed reasons. Files larger than the cluster size (6 KB) will have extra clusters allocated to allow for the entire file to be stored (8 KB on disk). The relationship can be generalized as:

disk space = round up ( file size / cluster size )

One downside of this is that every file which isn’t equal to exactly the cluster size has wasted space. Another is that since the OS reads in information in entire clusters, performance suffers if file size is less than the cluster size (only some of the data on the cluster is relevant). While the age of terabyte hard drives makes the first downside negligible, the second should still be a concern. However, given this being an academic exercise, I opted to make my high scores file be the cluster size.

So just how big might the cluster size be? Optimal cluster size varies by file system, operating system and hard drive size, but by and large a reliable figure is 4KB. According to Microsoft this is the default on NTFS for any Windows NT 4.0 machine, or anything newer than Windows 2000 with a hard drive size under 16 TB. Given that the game will require DirectX 9 and will likely run on a home computer, the Windows 4-6 family is almost certainly to be the host OS, and NTFS is a near-guarantee for the file system. So until 16 TB hard drives are common, I should be safe with a 4KB file size :)

Cache Line Size

Here’s where I had to begin to take a lot of consideration and care. With varying types of memory in computers (registers, L1 Cache, L2 Cache etc) I wanted to try and make my data structures as accommodating as possible. For this, I wanted to be sure they would fit on the L1 Cache as cleanly as possible. This way, more data could be kept there for quick transfer to the CPU rather than having to retrieve values from secondary memory like RAM. Turns out, both AMD and Intel recommend that for this, structure size should be an integral multiple of the cache line size, and aligned on that byte boundary. In other words, an L1 Cache with a cache line size of 64 works best on a 64 (or 128) byte structure stored in memory at an address divisible by 64.

Turns out, nearly all modern processors are 64-bit and work off a 64 byte L1 Cache. It’s not a guarantee to say than an n-bit processor will have a n-byte L1 Cache line size, but based on my research in the Pentium family, it’s a fair assumption. So I started trying to juggle how to work with these 64 bytes. I didn’t see score going above 4 billion, so 4 bytes seemed sufficient for this. I wanted to store a date as well, which at an 8 byte value seemed enough. With 12 bytes down and 52 to go, it seemed 64 was a bit much. Aiming for 32, I then gave an extra 4 bytes for a bit field. 16 bytes down, 16 to go. And I needed a name. 16 bytes makes for an array of 8 wchar_t variables, which meant a name of 7 letters could be supported (plus 1 null byte on the end). Why wchar_t? I wanted to go beyond the 1-byte “char” to not limit myself to ASCII characters.

#define CACHE_LINE_SIZE 32
#define NAME_SIZE (CACHE_LINE_SIZE - sizeof(int)*2 - sizeof(long))/2
typedef struct score_t {               // Engineered to be CACHE_LINE_SIZE bytes in size
 int score;                            // Most commonly used member first, used in ordering (4 bytes)
 int flags;                            // Score flags (4 bytes), helps natural alignment of next item (the date)
 long date;                            // Time stamp (8 bytes)
 wchar_t name[NAME_SIZE];              // Player's name, takes up remainder. Place last to ease natural alignment
};

With this structure, I was able to fit 2 scores in one cache line. Upon seeing if I could fit 4, I realized something. At a 16 byte struct size, I would’ve stripped out the bit field, which if you do the math would allow for 3 letters to be specified as the earner of the high score. That sounded familiar…

Pac-Man Hi Score

Pac-Man Hi Score

Notice how there’s only three letters in the name field? A lot of old 8-bit games only allowed for 3 letters… Pac-man (above), Galaga, Contra, and Mario among them. Given the knowledge that they ran on an 8-bit processor and the assumption that an 8-byte cache line size was involved, along with an assumption that the character set was likely limited to ASCII characters, you have the following variation on the above:

#define CACHE_LINE_SIZE 8
#define NAME_SIZE (CACHE_LINE_SIZE - sizeof(int))
typedef struct score_t {               // Engineered to be CACHE_LINE_SIZE bytes in size
 int score;                            // Most commonly used member first, used in ordering (4 bytes)
 char name[NAME_SIZE];                 // Player's name, takes up remainder. Place last to ease natural alignment
};

Do the math and it works out that there is just enough room for 3 letters to be given. Sure, the size of an int wouldn’t’ve been 4 bytes on that processor, but I’m going to guess that based on the perfect Pac-man high score being realized at 3,333,360 that 4 bytes were allocated for the score field all the same. Maybe first, middle and last initials weren’t the real reason for having a 3-letter high score entry.

At the end of it all, I used the first code snippet in my game: 4,096 bytes of high scores at 32 bytes/score makes for a hefty 128 high scores. However the real joy I took out of this wasn’t the code but gaining new insight into some of my first games, with a  like-mindedness of trying to eke out every cycle possible. Code on.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: