Latest revision as of 15:15, 22 May 2024

Inside a computer, everything is bits. The topic of information representation discusses how different information inside a computer program actually looks like in the background.

Word

Instead of one bit at a time, computers do operations on several at the same time. The word size of the computer is the number of bits representing each piece of data that it processes.

Notably, since instructions are also "data" received by the CPU, each instruction is also constrained by the word size.

Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).

Byte and Endianess

Most modern day computers are byte-addressable memory. The byte is the lowest unit of data that has an address and can be accessed at once.

Since we are storing multiple bytes of a word across memory, the order in which we store them, endianess, needs to be specified.

There are two choices: Big Endian and Little Endian. Let's consider how to store 0x10203040 in a 32-bit machine

In Big Endian, the most significant byte is stored at the lowest part of an address (i.e. big end first). Addresses would look something like 0x10, 0x20, 0x30, 0x40. BE is used on the internet.

In Little Endian, the least significant byte is stored at the lowest part of an address (i.e. little end first). Addresses would look something like 0x40, 0x30, 0x20, 0x10. LE is used on intel machines.

Besides different CPUs using different endianess to run instructions, most file formats specify endianess to support different machines. For example, a Unicode text file has a BOM (byte order mark) at the start to denote whether the file is BE or LE.

Signed Integers

While a series of bits naturally represent a binary number, negative numbers are a bit more complicated.

The first method to represent signed integers with bits is signed magnitude. Under this system, the first bit (sign bit) is used as a negative sign; If it is 0, then the number is positive, otherwise it is negative.

The second method is two's complement.

Floating Point

To store decimal values, the floating point representation is used. Similar to scientific notation, it uses a series of bits called the mantissa to store significant binary digits, and another series of bits called the exponent to store the order of magnitude of the number. The final calculation is something like

$1.m\times 2^{e}$

To handle negative numbers, the first bit is used as a sign bit.

To handle negative exponents, all exponents are subtracted by a constant to center the range of exponents on 0. For example, when 8 bits is used to represent the exponent (as is usually the case), the range of possible values is 0 to 255, so 127 is subtracted from the exponent so that the actual range is -127 to 128.

Arrays

The simplest structured group of values is an array, in which multiple values of the same type is stored consecutively in a block of memory. We then keep track of the address of the first element in the array and the size of each element so that we can find the n-th element easily.

For example, consider an array of 10 32-bit integers arr. To access the 6th element, I can simply move 5 integer sizes after the first element to get the address I want:

*(&arr + 5*sizeof(int))

Notably, this is also a major reason why array indices start at 0 in most programming languages.

Struct

A struct boils down to an array that can have different types of values, each taking up different sizes.

@@ Line 9: / Line 9: @@
 == Byte and Endianess ==
-Most modern day computers are '''byte-addressable'''. The byte is the lowest unit of data that has an address and can be accessed at once.
+Most modern day computers are [[byte-addressable memory]]. The byte is the lowest unit of data that has an address and can be accessed at once.
 Since we are storing multiple bytes of a word across memory, the order in which we store them, '''endianess''', needs to be specified.
@@ Line 26: / Line 26: @@
 The first method to represent signed integers with bits is '''signed magnitude'''. Under this system, the first bit (''sign bit'') is used as a negative sign; If it is 0, then the number is positive, otherwise it is negative.
-The second method is called '''2's complement''', and it is more widely used due to several advantages covered later.
+The second method is [[two's complement]].
-In this system, think of the first bit in the bit pattern as negative. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number represented by 2's complement.
+= Floating Point =
+To store decimal values, the '''floating point''' representation is used. Similar to scientific notation, it uses a series of bits called the '''mantissa''' to store significant binary digits, and another series of bits called the '''exponent''' to store the order of magnitude of the number. The final calculation is something like
-The advantage of using 2's complement over signed magnitude is twofold:
+<math>1.m\times2^{e}</math>
-* Normal binary arithmetic can be applied.
+To handle negative numbers, the first bit is used as a sign bit.
-* There is no negative zero
-To negate signed numbers, flip all bits and add 1. This comes from math: <math>-2 = -4 + ((4 - 1) - 2) + 1</math>
+To handle negative exponents, all exponents are subtracted by a constant to center the range of exponents on 0. For example, when 8 bits is used to represent the exponent (as is usually the case), the range of possible values is 0 to 255, so 127 is subtracted from the exponent so that the actual range is -127 to 128.
 = Arrays =
+The simplest structured group of values is an '''array''', in which multiple values of the same type is stored consecutively in a block of memory. We then keep track of the address of the first element in the array and the size of each element so that we can find the n-th element easily.
+For example, consider an array of 10 32-bit integers <c>arr</c>. To access the 6th element, I can simply move 5 integer sizes after the first element to get the address I want:
+<sh>
+*(&arr + 5*sizeof(int))
+</sh>
+Notably, this is also a major reason why array indices start at 0 in most programming languages.
+= Struct =
+A '''struct''' boils down to an array that can have different types of values, each taking up different sizes.
 [[Category:Computer Architecture]]

Anonymous

Search

Information Representation: Difference between revisions

Namespaces

More

Page actions

Latest revision as of 15:15, 22 May 2024

Contents

Word

Byte and Endianess

Signed Integers

Floating Point

Arrays

Struct

Navigation

Navigation

Wiki tools

Wiki tools

Anonymous

Search

Information Representation: Difference between revisions

Latest revision as of 15:15, 22 May 2024

Word

Byte and Endianess

Signed Integers

Floating Point

Arrays

Struct

Navigation

Wiki tools

Page tools

Categories