Hardware: what constitutes a computer

Computer hardwares are the physical objects which constitutes the computer in the real world. Most of the computer hardwares are made of complex electronic circuits with semiconductors. Those circuits, in theory, are no different from the battery-and-wire circuit one could make on a circuit board, but in a much smaller scale.

Computer hardware can be divided into two categories: peripheral hardware and core hardware.

Peripheral hardware includes the screen, keyboard, mouse, speaker, trackpad, etc. A computer can function properly without peripheral hardwares. Yes, without a screen or keyboard, you may not interact directly with the computer, but the computer can still run perfectly and can receive instructions in other forms, such as via the internet. In comparison, a computer will not function without core hardwares, which includes the CPU, GPU, ram, motherboard, disk, etc.

Check these videos¹ to see how different hardwares are assembled in a computer, these ² for how computer hardware works in the electronic circuits level.

Peripheral hardware matters little for our purpose. Let us focus intead on how core hardwares function.

Too boring? You may skip this chapter

This chapter is a brief overview of how each parts of the computer work together. You will understand this chapter better after you have got your hands dirty.

If you want to start coding right away, go to choosing os to see why Linux is a better operating system for coding.

Central Processing Unit (CPU)

CPU is the brain and the most important part of the computer. All data, computation, input, output, and programming logics are processed by the CPU. CPU is the most important factor that determines how smoothly and fast the computer runs.

Here are some terms that describes a CPU:

Architecture.

The architecture describes the low level circuit design of the CPU. Some would say the “architecture of the computer” when he means the “architecture of the CPU of the computer”. For most user, the architecture only matters when installing softwares, as some softwares may not work on certain architectures.

Colloquially, the term architecture may be mistaken as the term instruction set, which defines how the software interacts with the CPU (i.e., the API). Common instruction set includes x86-64, Arm, Mips, Risc-V, and LoongArch. Most intel and Amd’s CPU are of x86-64 (also known as amd64). Mac’s M series chips, chips of most cell phone, and Qualcomm’s Snapdragon are of Arm.

Instruction set is only a abstract definition for how CPU shall function. It is up to the manufactures to design the CPUs to fullfill the standard.

Intel and Amd have jointly developped x86-64 instruction set based on earlier ones for over forty years. This is both a blessing and a curse, as it makes x86 the most popular, stable, and well-supported instruction set, while, for compatibility, some outdated designs have to remain.

Apple’s M series chips use a special Arm instruction set developped by Apple. Apple designs M series chips to be power efficient, and one way to acheive it is to reduce the chip’s ’complexity by removing legacy support. It is not of surprise, as a result, M series computer are more power efficient in the price of all softwares have to be rewritten for it.

Another aspect of architecture is bit width. Most CPU architecture today are 64 bits, which means the size of the register in CPU is 64 bits wide. A register in CPU is where data is being processed. A 64 bit register can process data of 64 bits (That is, 8 bytes, or 1/128 of a KB) for each CPU clock cycle. Apple’s M series, Intel’s Core Series, and Amd’s Ryzen series are all of 64 bits.
Frequency, or Clock Rate³

Frequency is the number of clock cycles a CPU can perform in a second. Most CPU can perform a single task in one clock cycle, such as adding two numbers. However, more complicated instruction may take multiple cycles. Some CPU may execute multiple instructions in one cycle.

Frequency is measured in Hertz, which is times per second. The higher the frequency usually means the faster the CPU.

The first Intel CPU, Intel 4004, released in 1971, had a frequency of 740 kHz. In 2000 Intel managed to produced Pentium 4 with the freqency of 2 GHz, a 2700-time increase over 30 years. Since then it became increasingly difficult to further increase the frequency. In 2024, Intel released Core i9-14900KS, with frequency of 6.2 GHz, only three times of its 2000 predecessor. Yet this chip runs more than 1000 times much faster than Pentium 4 thanks to innovations in other areas.
Threads, Cores

A CPU with 4 threads can perform 4 tasks at a time. One way of acheiving it is to solder multiple smaller CPUs into one. Each of the smaller CPU is called a core. Some cores can execute multiple threads syncronously (e.g., Intel’s Hyperthreading).

Integrating several cores into one chip enables flexibility in CPU design. For example, some CPUs have efficency cores for lighter task and performance cores for heavy tasks.
Cache

For CPU to run faster, it needs to obtain data faster. Data are stored in memory (RAM) and storage device (SSD). Retrieving data from memory may take thousands of CPU cycles, but it only takes one cycle to execute it. Cache was invented to solve this problem.

This is how caches work: when CPU is to retrieve data, it predicts data required for the subsequent operations and stored them in the cache. CPU can retrieve data from cache super fast, usually under 10 or 100 cycles.

Sometimes there are multiple levels of cache, denoted as L1, L2, L3, etc. L1 caches is the fastest but has smallest capcity; L3 is slowest but can hold more data. CPU may have two L1 caches, L1d and L1i. L1d stores data, where L1i stores instructions. This design takes into consideration the cost-effectiveness of different memory, as faster memory usually costs much more.

Caches, RAM and hard disk memory constitutes the memory hierarchy⁴ paradigm explained in the next session.

History of Intel Processors 5

This tables describes the interesting evolution of Intel CPU.

Name	Year	Frequency	Cache	Cores	N.B.
Intel 4004	1971	740 kHz	None	1	First single chip IC processor, 4 bit
Intel 8008	1972	500 kHz	None	1	8 bit
Intel 8085	1976	3 MHz	None	1	8 bit pre x86
Intel 8086	1978	10 MHz	None	1	16 bit x86
Intel386™ DX	1985	33 MHz	None	1	32 bit x86
Intel486™ DX	1989	50 MHz	8 Kb	1	32 bit x86
Intel® Pentium®	1993	66 MHz	8 Kb L1d, 8Kb L1i	1	32 bit x86. First super-scalar x86 processor. (Meaning executing 2 instruction at one clock cycle)
Intel® Pentium® II	1997	300 MHz	512 Kb L2	1	32 bit x86. Has 2 level cache
Intel® Pentium® 4	2000	2 GHz	256 Kb L2	1	32 bit x86. First x86 processor with hyper-threading
Intel® Pentium® 4 Extreme Edition	2004	2 GHz	2 Mb L2	1	32 bit x86
Intel® Core™2 Duo Processor E4300	2007	1.8 GHz	2 Mb L2	2	64 bit x86
Intel® Core™ i7-950 Processor	2009	3.06 GHz	8 Mb	4 cores, 8 threads	64 bit
Intel® Core™ i7-3970X	2012	3.5 GHz	15 Mb	6 cores, 12 threads
Intel® Core™ i7-6700	2015	3.4 GHz (Turbo)	8 Mb	4 cores, 8 threads	Integraged Intel® HD Graphics 530 GPU
Intel® Core™ i7-8665UE	2018	4.0 GHz (Turbo)	8 Mb	4 cores, 8 threads	Integraged Intel® HD Graphics 620 GPU
Intel® Core™ i9 processor 14900KS	2024	6.2 GHz (Turbo)	32 Mb	24 cores, 32 threads	Integraged with Intel® UHD Graphics 770 GPU

The Memory Hierarchy

Moder computer are designed with memory hierarchy principle for effective data storage, fast retrieval while maintaing cost.

Caches, the fastest, most expensive, and the smallest in capacity stay at the top of the hierarchy. Random Access Memory (RAM) lays in the middle. It is slower than cache but faster than hard disk. Hard disk is the cheapest, largest, and the slowest.

Please see the table in the next session for comparison of the speed of caches, RAM, and hard disk, and the time for CPU to execute various other tasks.

Characterising Performance of Storage Devices: Bandwidth and Latency

Two comman benchmarking metrics for storage devices are bandwidth and latency.

Bandwidth (or throughputs), measured in bytes per second, is the maximum amount of data that can be transferred per second. There are two operations for a storage device, read and write, and bandwidth for reading is likely to be higher than writing. Moreover, the bandwidth would diminish dramatically when the device needs read and write simultaneously, or the capacity of the device is close to full.

Latency is the time between the CPU sends instruction for retrieving the data and the first byte of data delivered to the CPU.

In ideal situation bandwidth and latency are independent. In reality, however, they are weakly correlated: it is of no use that you can transfer 10 GB per second but it took 10 seconds to initiate the transfer.

Accurately benchmarking storage devices is very diffiult for two reason. First, the performance of the storage device is unstable. It dependends on the CPU,the design of the motherboard, and the type and content of the data. Second, most of the low level input and output are controlled solely by the hardware and is not accessible to the user or the operating system.

Let us now introduce memory and secondary storage devices, the lower part of the hierarchy.

Random Access Memory (RAM)

Random Access Memory, or RAM, is in the middle layer of the memory hierarchy. Slower and cheaper than cache, but faster and more expensive than hard disks.

RAM is volatile storage device; this measn all data stored in RAM will be lost after powering off.

When computers are executing some program stored in SSD, it will first load all instructions and data into RAM for faster retrieval. The exact behaviour will depend on operating system.

Secondary Storage (Harddisk, SSD, HDD)

Secondary storage devices (or, colloquially, the hard disk) is at the bottom of the hierarchy. They are static storage devices. The data will not be lost after powering off. All persistent data and software, including the operating system, are stored in a secondary storage device.

Solid State Drive (SSD) and Hard Disk Drive (HDD)

The technology for storage devices has underwent various innovation since its inception. In the early days people used tapes and floppy disks. Two common modern technology are Solid State Drive (SSD) and Hard Disk Drive (HDD).

HDD is usually cheaper, has larger capacity and longer lifetime. It uses a spinning magnetic disk for data storage and a magnetic needle for reading and writing. You can hear the disking spinning when it is powered on.

SSD is the newer technology that boasts for its high bandwidth and low latency. It does not use magnetic disk but semiconductor logical gates. This means SSD looks like a chip and can be made smaller.

Historically, SSD has been very expensive. Their price has dropped dramatically since 2023, thanks to intense market competitions from the Chinese firms.

Comparison of Time Needed for Various Operations

The following table compares the time needed to execute various operations.

Event	Time (ns)	Time (\( \mu s\))	Time (ms)	Scale
One CPU cycle	0.25			0.5
L1 cache access	0.5			1
L2 cache access	7			14
Mutex lock/unlock	25			50
Main memory access	100	0.1		200
Transmitting an Ethernet Packet	15000	15		\(3 \times 10^4\)
SSD Access	250000	250		\(5 \times 10^5\)
Move Data to GPU		800	0.8	\(1.6 \times 10^6\)
Internet Round Trip		5000	5	\(1 \times 10^7\)
HDD Access		10000	10	\(2 \times 10^7\)
Cacluating Primes to 100,000			100	\(2 \times 10^8\)
Send a Packet From Beijing to Edinburgh			500	\(1 \times 10^9\)
Computer Reboot			30,000	\(6 \times 10^{10}\)

To make sense of these numbers, the following table assumes one CPU cycle to take 0.25 seconds, and enlong the time for other events on the same proportion.

Computer Event	Real Life Event	Time	Scale
One CPU cycle	Eye Blinking	0.25 s	0.5
L1 cache access	Typing a Word	0.5 s	1
L2 cache access	Running 50 meters	7 s	14
Mutex lock/unlock	A round in Basketball	25 s	50
Main memory access	Reading a short passage	100 s	200
Transmitting an Ethernet Packet	Flight from London to Moscow	4.2 h	\(3 \times 10^4\)
SSD Access	Life span of may fly	70 h	\(5 \times 10^5\)
Move Data to GPU	Slovania Won Independence	10 days	\(1.6 \times 10^6\)
Internet Round Trip	China’s Chang’e 6 Returns Samples from Moon	2 months	\(1 \times 10^7\)
HDD Access	Wheat Maturation	4 months	\(2 \times 10^7\)
Cacluating Primes to 100,000	One US president Term	4 years	\(2 \times 10^8\)
Send a Packet From Beijing to Edinburgh	Pyramid of Giza Built	20 years	\(1 \times 10^9\)
Computer Reboot	Span of Byzantine Empire	1200 years	\(6 \times 10^{10}\)

Graphics Processing Unit (GPU) 6

While CPU has a dozen of big and powerful cores, GPU has thousands of small cores. The GPU’s cores can effectively handle computation, but has difficulties in processing logics.

This makes GPU suitable for running algorithms that can be splitted into smaller parallel tasks, provided those smaller tasks are mutually independent. These algorithms are sometimes called embarrasingly parallel algorithms. Most algorithms in linear algebra and statistics, such as vector addition, dot product, matrix multiplication, and finding the mean value are embarrassingly parallel.

Most graphical algorithms are also embarrassingly parallel, as the color of one pixel does not depend on the other. In fact, most graphical algorithms takes the form of matrices. Thus comes the name of GPU, as its original purpose is to process graphical algorithms.

In recent years people found that GPU can execute machine learning algorithms more efficiently than CPU. It shall be of no surprise as machine learning utilises a lot of linear algebra and statistical algorithms.

Other Hardwares

Footnotes

Assemble the core parts; Chinese Version

Branch Education. Branch Education on Bilibili.

Intel Explanation for Clock Rrate

⁴

Wikipedia: Memory Hierarchy

⁵

Intel Processors; i7-950; i7-3970x; i7-6700; i7-8665UE; i9-1490KS

⁶

Intel on Difference of CPU and GPU

Yetin Guide for Coding