Information Representation: Difference between revisions

From Rice Wiki
No edit summary
Line 1: Line 1:
Inside a computer, everything is bits. The topic of '''information representation''' discusses how different information inside a computer program actually looks like in the background.
Inside a computer, everything is bits. The topic of '''information representation''' discusses how different information inside a computer program actually looks like in the background.
= Word =
Instead of one bit at a time, computers do operations on several at the same time. The '''word size''' of the computer is the number of bits representing each piece of data that it processes.
Notably, since instructions are also "data" received by the CPU, each instruction is also constrained by the word size.
Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).
== Byte and Endianess ==
Most modern day computers are '''byte-addressable'''. The byte is the lowest unit of data that has an address and can be accessed at once.
Since we are storing multiple bytes of a word across memory, the order in which we store them, '''endianess''', needs to be specified.
There are two choices: Big Endian and Little Endian. Let's consider how to store 0x10203040 in a 32-bit machine
In '''Big Endian''', the most significant byte is stored at the lowest part of an address (i.e. big end first). Addresses would look something like 0x10, 0x20, 0x30, 0x40. BE is used on the internet.
In '''Little Endian''', the least significant byte is stored at the lowest part of an address (i.e. little end first). Addresses would look something like 0x40, 0x30, 0x20, 0x10. LE is used on intel machines.
Besides different CPUs using different endianess to run instructions, most file formats specify endianess to support different machines. For example, a Unicode text file has a BOM (byte order mark) at the start to denote whether the file is BE or LE.


= Signed Integers =
= Signed Integers =
Line 16: Line 36:


To negate signed numbers, flip all bits and add 1. This comes from math: <math>-2 = -4 + ((4 - 1) - 2) + 1</math>
To negate signed numbers, flip all bits and add 1. This comes from math: <math>-2 = -4 + ((4 - 1) - 2) + 1</math>
= Arrays =
[[Category:Computer Architecture]]
[[Category:Computer Architecture]]

Revision as of 21:32, 2 April 2024

Inside a computer, everything is bits. The topic of information representation discusses how different information inside a computer program actually looks like in the background.

Word

Instead of one bit at a time, computers do operations on several at the same time. The word size of the computer is the number of bits representing each piece of data that it processes.

Notably, since instructions are also "data" received by the CPU, each instruction is also constrained by the word size.

Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).

Byte and Endianess

Most modern day computers are byte-addressable. The byte is the lowest unit of data that has an address and can be accessed at once.

Since we are storing multiple bytes of a word across memory, the order in which we store them, endianess, needs to be specified.

There are two choices: Big Endian and Little Endian. Let's consider how to store 0x10203040 in a 32-bit machine

In Big Endian, the most significant byte is stored at the lowest part of an address (i.e. big end first). Addresses would look something like 0x10, 0x20, 0x30, 0x40. BE is used on the internet.

In Little Endian, the least significant byte is stored at the lowest part of an address (i.e. little end first). Addresses would look something like 0x40, 0x30, 0x20, 0x10. LE is used on intel machines.

Besides different CPUs using different endianess to run instructions, most file formats specify endianess to support different machines. For example, a Unicode text file has a BOM (byte order mark) at the start to denote whether the file is BE or LE.

Signed Integers

While a series of bits naturally represent a binary number, negative numbers are a bit more complicated.

The first method to represent signed integers with bits is signed magnitude. Under this system, the first bit (sign bit) is used as a negative sign; If it is 0, then the number is positive, otherwise it is negative.

The second method is called 2's complement, and it is more widely used due to several advantages covered later.

In this system, think of the first bit in the bit pattern as negative. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number represented by 2's complement.

The advantage of using 2's complement over signed magnitude is twofold:

  • Normal binary arithmetic can be applied.
  • There is no negative zero

To negate signed numbers, flip all bits and add 1. This comes from math:

Arrays