I have been playing around with writing some basic encoders, figuring a good first step would to start with something on the easier side, I decided to jump in with a Base32 encoder.
First things first, here are some details on Base32 from RFC 4648, Section 6.
The Base32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but that need not be human readable.
A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, “=”, is used to signify a special processing function.) The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single character in the base 32 alphabet. When a bit stream is encoded via the base 32 encoding, the bit stream must be presumed to be ordered with the most-significant- bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, the eighth bit will be the low- order bit in the first 8bit byte, and so on. Through the development process, I came up with 3 basic designs for the process.
Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3, below, are selected from US-ASCII digits and uppercase letters.
In my tinkering around with this, I came up with a couple of options, some quick and dirty, and some a
little more in-depth. I’ll outline each of the methods that I used, and surprisingly they all seemed to perform within about a 1/1000th of a second (in some very basic performance testing that I did).
The first method that I tired was basically a simple method that took 1 bit at a time and shifted it into a buffer. This method, simply put, builds the 5-bit blocks bit-by-bit.
public static string ToBase32String(byte[] bytes) { if (bytes == null || bytes.Length < 1) return string.Empty; StringBuilder sb = new StringBuilder(); byte index = 0; int hi = 0; int currentByte = 0; int totalBytes = bytes.Length; byte workingByte = bytes[currentByte]; int bits = 0; index = 0; while (true) { index = (byte)((index << 1) | ((workingByte >> (7 - bits)) & 1)); bits++; hi++; if (hi == 5) { if (index == 0 && currentByte >= totalBytes) { index = 32; } sb.Append(ValidRFC4648Chars[index]); if (index == 32 && ((sb.Length > 8 && (sb.Length % 8) == 0) || (sb.Length == 8))) { break; } index = 0; hi = 0; } if (bits == 8) { currentByte++; bits = 0; if (currentByte >= totalBytes) { workingByte = 0; if ((sb.Length % 8) == 0 && sb.Length >= 8) { break; } } else { workingByte = bytes[currentByte]; } } } if (sb.Length < 8) { return sb.ToString().PadRight(8, '='); } return sb.ToString(); }
The next method that I tired, was the simplest method. I say simplest because it uses the most built in .Net stuff and doesn’t do as much binary arithmetic and bit shifting. I created a bit string (base-2 number string) using Convert.ToString(byte, 2) then I pull 5-bit (in this case, 5-character) chunks from the string and then encode them.
public static string ToRFC4648Base32_Strings(byte[] bytes) { StringBuilder sb = new StringBuilder(); int pos = 0; byte index = 0; StringBuilder byteString = new StringBuilder(); foreach (byte b in bytes) { byteString.Append(Convert.ToString(b, 2).PadLeft(8, '0')); } string bString = byteString.ToString(); while (pos < (byteString.Length / 5)) { index = Convert.ToByte(bString.Substring(pos * 5, 5), 2); pos++; sb.Append(ValidRFC4648Chars[index]); } if ((pos * 5) < byteString.Length) { index = Convert.ToByte(bString.Substring(pos * 5).PadRight(5, '0'), 2); sb.Append(ValidRFC4648Chars[index]); } while ((sb.Length < 8) || (sb.Length > 8 && sb.Length % 8 != 0)) { sb.Append("="); } return sb.ToString(); }
The final method, the one that I like the best, pulls the data out in a more block based manner and really just accomplishes the task.
public static string ToRFC4648Base32(byte[] bytes) { if (bytes == null || bytes.Length < 1) return string.Empty; StringBuilder sb = new StringBuilder(); int currentByte = 0; int start = 0; int stop = 5; byte workByte = 0; byte index = 0; int numBytes = bytes.Length; while (currentByte < numBytes || (sb.Length < 8 || (sb.Length > 8 && sb.Length % 8 != 0))) { if (start == 0) { workByte = (byte)(bytes[currentByte] >> 3); start += 5; stop += 5; } else { if (currentByte < numBytes) { workByte = bytes[currentByte]; } else { workByte = 0; } workByte = (byte)(((workByte << start) & 0xff) >> 3); byte nextByte = 0; if (stop > 8) { if (currentByte < numBytes - 1) { nextByte = bytes[++currentByte]; } else { currentByte++; } workByte = (byte)(workByte | (nextByte >> (16 - stop))); start -= 3; stop -= 3; } else { if (currentByte < numBytes) { nextByte = bytes[currentByte]; } else { nextByte = 0; } workByte = (byte)(workByte | ((nextByte & (((1 << (8 - stop)) - 1) << (8 - stop))))); start += 5; stop += 5; } } if (workByte == 0 && currentByte == numBytes && (numBytes % 5) == 0) break; index = workByte; if (index == 0 && currentByte >= numBytes) index = 32; sb.Append(ValidRFC4648Chars[index]); } return sb.ToString(); }
From the limited testing I did, the latter code performed an average encoding speed of 103,057,258-bit/s. For the testing I encoded 5000 blocks of 32-bytes of random data. On average it took 0.0124028 seconds to perform those encodings. Your results, and math may vary.
With all of these, I’m sure there are many things that could be done to improve performance, but for my playing around they perform perfectly well. Feel free to use and update as you wish!