Quite some years ago, I started my adventures in the wonderful world of programming. The tool of choice for me at that point of time was QBasic. Besides the obvious starters like printing information to the screen and sending random numbers to the PC speaker, I also started digging through the source files of both gorilla.bas and nibbles.bas which came with every installation of MS-DOS :)
After QBasic, I started to look at Pascal and after some time at Pascal in combination with assembly. Fortunately, my local library had books about assembly, including the ones from Peter Norton :) Mostly used the assembly to efficiently do graphical things. Waaaaaaaay before you had DirectX/OpenGL/nVidia/shaders/etc you only had Mode 13h / Mode-X and things like paged memory / protected mode / etc. The best was, you could render the most amazing things on a 386 in real-time :) When I was learning assembly, I really got fascinated with cool and very clever tricks you can do with bit-operations.
The other day I was reading a very informative article about Unicode that for the first time made all the details about Unicode completely clear to me. If you still have some issues with wrapping your head around Unicode, this is definitely an article I can recommend!
A side-effect of reading this article was that it made me think about ASCII and some clever tricks that were built in.
Upper and lower case in ASCII
If you look at the ASCII values for the letters a-z and A-Z in ASCII, you will see that for each letter, the difference between the upper and lower case is exactly 32. If you look at the ASCII number for the letter g, it is 103. the ASCII number for the letter G is 71.
If you look at the bit representation, this comes down to the following:
Because of the 32 offset, the only difference between these two letters is bit 5: if bit 5 is set, we are dealing with a lower case letter, if this bit is not set we are dealing with an uppercase letter.
This means that if you have a string that you know only consists of letters, you can easily toggle the case of all letters by doing a bitwise XOR with
0010 0000 for each of the letters.
To create a lowercase version of the string is just a matter of doing a bitwise OR with
0010 000 for each of the letters. All the letters that are already lowercase will stay lowercase and all the letters that were uppercase will now have the bit 5 set and therefore become lowercase.
Alternatively, if you want to create an uppercase version of a string, you can do a bitwise AND operation with the value
1101 111 forcing bit 5 to be reset, resulting in uppercase letters.
Note that you will have to ensure you only do this on the characters that represent letters (you cannot convert the & character into an uppercase or lowercase :) )
Integer values in ASCII
One of the other cool tricks that I have used when I was playing around with Assembly and Pascal is the fact that the characters 0-9 have been put in a particular place also in the ASCII table, starting at 48:
So if you know that a specific character is a number, you can get the numerical value as an integer very easy by just performing an AND operation on the ASCII value with the mask
0000 1111. For example, performing an AND on
0011 0110 (the binary representation for the ASCII character '6') with the mask
0000 1111 will result in
0000 0110 which is exactly the same as the value represented by the character '6'.
Besides all of the above clever tricks, one additional cool thing you can do with ASCII is creating the ASCII art:
I really think it is amazing that they thought about these kinds of clever things when designing the ASCII table. With the abundance of memory and computing power nowadays, I am not sure if these kinds of bit tricks would still be included during a design phase.
Are you aware of any other cool or clever tricks you can do with ASCII, or any other clever bit tricks in general, let me know in a comment!