Yetin Guide for Coding

It is better to teach people to fish than to give them fish. – Chinese Proverb

This is a guide for self-studying computer science, i.e., coding. The target audience is those with little or no coding experience, and those who have learned to code only for a specific task but lack systematic knowledge. The only prerequisites are a personal computer and the determination to learn.

The abundance of information on the internet makes self-studying coding possible, but it does not make it easy. This information is disorganised and uneven in quality, and some of it is blatantly false. It is nearly impossible for beginners to organise the information, judge its correctness, work through the difficulty of understanding it, and make a coherent plan for what to learn next. This book will help you overcome these problems.

Specifically, it will help you in two ways:

  1. It will teach you enough so that you can study computer science by yourself;
  2. It will provide you with enough resources for further study.

If you do not plan to study computer science, I encourage you to try. Computer science and related technologies have influenced every aspect of our lives. Not understanding how they work is the modern illiteracy.

Computer science is a young subject, and much of its terminology has not been settled. The term ‘coding’ is often used synonymously with computer science. This book adopts this custom and uses ‘coding’ to mean both the act of writing code and the theory behind it.

Logistics

As of January 2025, the book is still an early draft. The content is prereleased on book.yetin.net and GitHub.

All suggestions and criticisms are warmly welcomed as issues on GitHub.

Who wrote this book?

This book is written by Harry Han. See GitHub and the postscript for how this book was written.

TODO: Why did the author choose to write another tutorial on coding?

The Roadmap

A proper path for a beginner to learn coding ‘only’ takes these five steps:

  1. Set up a computer and its development environment;
  2. Learn how to use the computer and, at a basic level, how a computer works;
  3. Learn a handful of programming languages and tools;
  4. Learn the theory of coding, including algorithms, operating systems, etc.;
  5. Put all of this knowledge into practice by writing a large project.

Upon completing these five steps, you can call yourself a programmer. The ocean of computer science knowledge is wide and deep, but at this point you will be equipped with enough knowledge and experience to explore it freely.

Step 0: Setting Up a Computer

One cannot learn coding without a computer.

For practical advice on buying a computer, see the appendix.

A functional computer also needs an operating system. I recommend Linux, which is designed by and for programmers and can greatly boost your productivity.

See the Software and Choosing Operating System chapters.

To make an informed choice of hardware and software, you need to learn the basics of how the pieces of hardware and software work together to produce the powerful machine that we call a computer today. See What is a Computer.

Step 1: Learn the Basics

The first step after obtaining the computer is getting to know how to use it most effectively.

Computers usually provide two interfaces for human interaction: the graphical interface and the command line interface. A graphical user interface (GUI) allows users to control the computer by clicking buttons, using dropdown menus, and various other graphical utilities. In comparison, users of the command line interface (CLI) control the computer by typing commands in the terminal.

Although a GUI may seem more intuitive and easier to use, the CLI has unique advantages in efficiency, customisability, and automation. You may not believe it, but almost every task that is done in a GUI can be done in a different way in the CLI, including writing code, viewing images, and even browsing the internet.

An education in coding without learning CLI would be incomplete.

See how to use the computer on the command line here.

Step 2: Acquire Several Programming Languages

Now the real work begins: you need to master at least a handful of programming languages. Learning a programming language is not as difficult as you may imagine.

Step 3: Dive Deeper

In this step, the act of coding becomes a science. Many programs may initially need only some intuition, but ingenuity wears out quickly. Very soon you will encounter truly difficult problems, some of which remain unsolved today1, and one has to resort to the literature, textbooks, and laborious study for a solution, if one exists.

Step 4: Grand Finale

A Bad Example: The Naive Method of Learning Coding

The naive definition of coding is to write code in a programming language that can run on a computer. Achieving this is easy: open the Python interactive mode, a Jupyter notebook, or any online code editor for Python, and type

print("hello world!")

The next step is to find some tutorials on Python syntax and some mock projects. After writing several hundred lines of code, finishing ten tutorials, and uploading them to GitHub, you proudly add Python to your CV after only one week of work.

This method, however, will not get you far. The reason is that each tutorial only teaches you to solve a specific problem, while in coding you will frequently encounter new problems whose solutions are not written up as tutorials and published online.

Creating truly amazing software, such as an operating system or a programming language, requires a deep understanding of many disciplines. To name a few, one needs to master the programming tools, including the programming language, design an effective software structure, understand various programming protocols and standards, know how the low-level hardware works, and study many fields of applied and even pure mathematics, such as algorithmic analysis, logic, and category theory.

Studying these subjects is a laborious process, and very few may claim to have mastered all of them. In many circumstances, someone who knows only a few of them can be called an expert.2

You should nevertheless feel encouraged to take this journey and let it kindle your ambition, because by following this tutorial to the end you can proudly and rightfully call yourself a master of computer programming.

1

The most famous unsolved problem in computer science, which is also one of the seven Millennium Prize Problems of the Clay Mathematics Institute, is P vs NP. Put simply, it asks whether we can find the solution to a problem quickly whenever we can verify the correctness of any proposed solution quickly. This is quite a philosophical problem, but it is intimately related to computer science. To give a naive example: if there were a magical machine that gave a correct yes-or-no answer to any question asked, could we solve all the problems in the world? A naive approach is to ask the machine, for a question Q, whether ‘a’ is the first letter of a computer program that can solve Q, and then to ask in turn about ‘b’, ‘c’, ‘d’, spaces, and so on and so forth. It seems that we would eventually obtain a valid program that solves any mysterious problem Q. However, this argument is a fallacy, as such a program may not exist or, if it exists, may not terminate in a finite number of steps.

2

The author, at the time of writing, is a final-year undergraduate student. In no way can the author be regarded as an expert in these fields. He is actively learning and consulting experts while composing this book. Come, readers! We shall learn and advance together.

What is a Computer

The computer is such a powerful and magical machine that many fail to realise it is ‘only’ made from preconfigured electronic circuits.

The electronic circuits are the hardware, and all the rules the hardware follows are the software.

They are explained in the following subchapters.

Hardware: what constitutes a computer

Computer hardware is the set of physical objects which constitute the computer in the real world. Most computer hardware is made of complex electronic circuits built from semiconductors. Those circuits are, in theory, no different from the battery-and-wire circuits one could make on a circuit board, only on a much smaller scale.

Computer hardware can be divided into two categories: peripheral hardware and core hardware.

Peripheral hardware includes the screen, keyboard, mouse, speaker, trackpad, etc. A computer can function properly without peripheral hardware. Yes, without a screen or keyboard you cannot interact directly with the computer, but the computer can still run perfectly and can receive instructions in other forms, such as via the internet. In comparison, a computer will not function without its core hardware, which includes the CPU, GPU, RAM, motherboard, disk, etc.

See these videos1 for how the different pieces of hardware are assembled into a computer, and these 2 for how computer hardware works at the level of electronic circuits.

Peripheral hardware matters little for our purpose. Let us focus instead on how the core hardware functions.

Too boring? You may skip this chapter

This chapter is a brief overview of how the parts of the computer work together. You will understand this chapter better after you have got your hands dirty.

If you want to start coding right away, go to Choosing Operating System to see why Linux is a better operating system for coding.

Central Processing Unit (CPU)

The CPU is the brain and the most important part of the computer. All data, computation, input, output, and program logic are processed by the CPU. It is the most important factor in how smoothly and quickly the computer runs.

Here are some terms that describe a CPU:

  1. Architecture.

    The architecture describes the low-level circuit design of the CPU. Some say the “architecture of the computer” when they mean the “architecture of the CPU of the computer”. For most users, the architecture only matters when installing software, as some software may not work on certain architectures.

    Colloquially, the term architecture is often confused with the term instruction set, which defines how software interacts with the CPU (i.e., the API). Common instruction sets include x86-64, Arm, MIPS, RISC-V, and LoongArch. Most Intel and AMD CPUs use x86-64 (also known as amd64). Apple’s M series chips, the chips of most cell phones, and Qualcomm’s Snapdragon use Arm.

    An instruction set is only an abstract definition of how a CPU shall function. It is up to the manufacturers to design CPUs that fulfil the standard.

    Intel and AMD have developed the x86 family of instruction sets, of which x86-64 is the latest, for over forty years. This is both a blessing and a curse: it makes x86 the most popular, stable, and well-supported instruction set, but some outdated designs have to remain for compatibility.

    Apple’s M series chips implement the Arm instruction set on a design developed by Apple. Apple designs the M series chips to be power efficient, and one way to achieve this is to reduce the chip’s complexity by removing legacy support. Unsurprisingly, M series computers are more power efficient, at the price that software written for x86-64 has to be ported or recompiled for them.

    Another aspect of architecture is bit width. Most CPU architectures today are 64-bit, which means the registers in the CPU are 64 bits wide. A register is where data is held while it is being processed. A CPU with 64-bit registers can process 64 bits of data (that is, 8 bytes, or 1/128 of a KB) in a single operation. Apple’s M series, Intel’s Core series, and AMD’s Ryzen series are all 64-bit.

  2. Frequency, or Clock Rate3

    Frequency is the number of clock cycles a CPU can perform in a second. Most CPUs can perform a simple task, such as adding two numbers, in one clock cycle. However, more complicated instructions may take multiple cycles, and some CPUs may execute multiple instructions in one cycle.

    Frequency is measured in hertz (Hz), that is, cycles per second. A higher frequency usually means a faster CPU.

    The first Intel CPU, the Intel 4004, released in 1971, had a frequency of 740 kHz. In 2000 Intel managed to produce the Pentium 4 with a frequency of 2 GHz, a 2700-fold increase over about 30 years. Since then it has become increasingly difficult to raise the frequency further. In 2024, Intel released the Core i9-14900KS, with a frequency of 6.2 GHz, only about three times that of its 2000 predecessor. Yet this chip runs more than 1000 times faster than the Pentium 4, thanks to innovations in other areas.

  3. Threads, Cores

    A CPU with 4 threads can perform 4 tasks at a time. One way of achieving this is to integrate multiple smaller CPUs into one chip. Each of the smaller CPUs is called a core. Some cores can execute multiple threads simultaneously (e.g., Intel’s Hyper-Threading).

    Integrating several cores into one chip enables flexibility in CPU design. For example, some CPUs have efficiency cores for lighter tasks and performance cores for heavy tasks.

  4. Cache

    For a CPU to run faster, it needs to obtain data faster. Data are stored in memory (RAM) and on storage devices (SSD). Retrieving data from memory may take hundreds of CPU cycles, while processing it may take only one cycle. The cache was invented to solve this problem.

    This is how caches work: when the CPU retrieves data, it predicts the data required for subsequent operations and stores them in the cache. The CPU can retrieve data from the cache very quickly, usually in under 100 cycles and often in under 10.

    Usually there are multiple levels of cache, denoted L1, L2, L3, etc. The L1 cache is the fastest but has the smallest capacity; L3 is the slowest but can hold more data. A CPU may have two L1 caches, L1d and L1i: L1d stores data, whereas L1i stores instructions. This design takes into consideration the cost-effectiveness of different kinds of memory, as faster memory usually costs much more.

    Caches, RAM, and hard disks constitute the memory hierarchy4, explained in the next section.
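The arithmetic quoted in the list above can be checked in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes 1 KB means 1024 bytes. The frequencies are the ones quoted in the text.

```python
# Bit width: a 64-bit register holds 64 / 8 = 8 bytes.
BITS_PER_BYTE = 8
BYTES_PER_KB = 1024  # assumption: 1 KB = 1024 bytes

register_bytes = 64 // BITS_PER_BYTE
registers_per_kb = BYTES_PER_KB // register_bytes

# Clock frequencies quoted in the text, in hertz.
intel_4004_hz = 740e3        # 740 kHz (1971)
pentium_4_hz = 2e9           # 2 GHz (2000)
core_i9_14900ks_hz = 6.2e9   # 6.2 GHz (2024)

growth_1971_to_2000 = pentium_4_hz / intel_4004_hz
growth_2000_to_2024 = core_i9_14900ks_hz / pentium_4_hz

print(f"A 64-bit register holds {register_bytes} bytes, 1/{registers_per_kb} of a KB")
print(f"1971 to 2000: about {growth_1971_to_2000:.0f} times faster clock")
print(f"2000 to 2024: about {growth_2000_to_2024:.1f} times faster clock")
```

The first ratio comes out at roughly 2700 and the second at roughly 3, which is why the text calls the growth after 2000 slow by comparison.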

History of Intel Processors5

This table describes the evolution of Intel CPUs.

| Name | Year | Frequency | Cache | Cores | N.B. |
| --- | --- | --- | --- | --- | --- |
| Intel 4004 | 1971 | 740 kHz | None | 1 | First single-chip IC processor, 4 bit |
| Intel 8008 | 1972 | 500 kHz | None | 1 | 8 bit |
| Intel 8085 | 1976 | 3 MHz | None | 1 | 8 bit, pre x86 |
| Intel 8086 | 1978 | 10 MHz | None | 1 | 16 bit x86 |
| Intel386™ DX | 1985 | 33 MHz | None | 1 | 32 bit x86 |
| Intel486™ DX | 1989 | 50 MHz | 8 KB | 1 | 32 bit x86 |
| Intel® Pentium® | 1993 | 66 MHz | 8 KB L1d, 8 KB L1i | 1 | 32 bit x86. First super-scalar x86 processor (meaning it executes 2 instructions in one clock cycle) |
| Intel® Pentium® II | 1997 | 300 MHz | 512 KB L2 | 1 | 32 bit x86. Has a 2-level cache |
| Intel® Pentium® 4 | 2000 | 2 GHz | 256 KB L2 | 1 | 32 bit x86. First x86 processor with hyper-threading |
| Intel® Pentium® 4 Extreme Edition | 2004 | 2 GHz | 2 MB L2 | 1 | 32 bit x86 |
| Intel® Core™2 Duo Processor E4300 | 2007 | 1.8 GHz | 2 MB L2 | 2 | 64 bit x86 |
| Intel® Core™ i7-950 Processor | 2009 | 3.06 GHz | 8 MB | 4 cores, 8 threads | 64 bit |
| Intel® Core™ i7-3970X | 2012 | 3.5 GHz | 15 MB | 6 cores, 12 threads | |
| Intel® Core™ i7-6700 | 2015 | 3.4 GHz (Turbo) | 8 MB | 4 cores, 8 threads | Integrated Intel® HD Graphics 530 GPU |
| Intel® Core™ i7-8665UE | 2018 | 4.0 GHz (Turbo) | 8 MB | 4 cores, 8 threads | Integrated Intel® HD Graphics 620 GPU |
| Intel® Core™ i9 processor 14900KS | 2024 | 6.2 GHz (Turbo) | 32 MB | 24 cores, 32 threads | Integrated Intel® UHD Graphics 770 GPU |

The Memory Hierarchy

Modern computers are designed around the memory hierarchy principle, which balances storage capacity and fast retrieval against cost.

Caches, the fastest, most expensive, and smallest in capacity, sit at the top of the hierarchy. Random Access Memory (RAM) lies in the middle: it is slower than cache but faster than a hard disk. The hard disk is the cheapest, largest, and slowest.

Please see the table in the section Comparison of Time Needed for Various Operations for a comparison of the speed of caches, RAM, and hard disks, and of the time the CPU needs to execute various other tasks.

Characterising Performance of Storage Devices: Bandwidth and Latency

Two common benchmarking metrics for storage devices are bandwidth and latency.

Bandwidth (or throughput), measured in bytes per second, is the maximum amount of data that can be transferred per second. There are two operations for a storage device, read and write, and the bandwidth for reading is likely to be higher than for writing. Moreover, the bandwidth can diminish dramatically when the device needs to read and write simultaneously, or when the device is close to full.

Latency is the time between the CPU sending the instruction to retrieve data and the first byte of data being delivered to the CPU.

In an ideal situation, bandwidth and latency are independent. In reality, however, they are weakly correlated, and both matter: it is of little use to be able to transfer 10 GB per second if it takes 10 seconds to initiate the transfer.

Accurately benchmarking storage devices is very difficult for two reasons. First, the performance of a storage device is unstable. It depends on the CPU, the design of the motherboard, and the type and content of the data. Second, most low-level input and output is controlled solely by the hardware and is not accessible to the user or the operating system.
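To see how the two metrics combine, here is a minimal sketch that estimates the time of a transfer as the latency plus the size divided by the bandwidth. This simple model and the two devices in it are assumptions made up for illustration; real devices behave less regularly, as explained below.

```python
def transfer_time(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Estimate a transfer: wait for the first byte, then stream the data."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

GB = 10**9

# Hypothetical device 1: huge bandwidth (10 GB/s) but 10 s of latency.
fast_but_laggy = transfer_time(1 * GB, 10 * GB, 10.0)

# Hypothetical device 2: modest bandwidth (1 GB/s) but 1 ms of latency.
slow_but_prompt = transfer_time(1 * GB, 1 * GB, 0.001)

print(f"10 GB/s with 10 s latency: {fast_but_laggy:.3f} s for 1 GB")
print(f" 1 GB/s with 1 ms latency: {slow_but_prompt:.3f} s for 1 GB")
```

For a 1 GB transfer the device with ten times the bandwidth is still about ten times slower overall, because its latency dominates.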

Let us now introduce memory and secondary storage devices, the lower part of the hierarchy.

Random Access Memory (RAM)

Random Access Memory, or RAM, is the middle layer of the memory hierarchy. It is slower and cheaper than cache, but faster and more expensive than hard disks.

RAM is a volatile storage device; this means all data stored in RAM will be lost after powering off.

When a computer executes a program stored on an SSD, it first loads the instructions and data into RAM for faster retrieval. The exact behaviour depends on the operating system.

Secondary Storage (Hard Disk, SSD, HDD)

Secondary storage devices (or, colloquially, hard disks) are at the bottom of the hierarchy. They are non-volatile storage devices: the data will not be lost after powering off. All persistent data and software, including the operating system, are stored on a secondary storage device.

Solid State Drive (SSD) and Hard Disk Drive (HDD)

The technology of storage devices has undergone various innovations since its inception. In the early days people used tapes and floppy disks. Two common modern technologies are the Solid State Drive (SSD) and the Hard Disk Drive (HDD).

An HDD is usually cheaper and has a larger capacity and a longer lifetime. It uses a spinning magnetic disk for data storage and a magnetic head for reading and writing. You can hear the disk spinning when it is powered on.

The SSD is the newer technology and boasts high bandwidth and low latency. It does not use a magnetic disk but semiconductor memory cells (NAND flash). This means an SSD looks like a chip and can be made smaller.

Historically, SSDs have been very expensive. Their prices have dropped dramatically since 2023, thanks to intense market competition from Chinese firms.

Comparison of Time Needed for Various Operations

The following table compares the time needed to execute various operations.

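| Event | Time (ns) | Time (\( \mu s\)) | Time (ms) | Scale (L1 cache access = 1) |
| --- | --- | --- | --- | --- |
| One CPU cycle | 0.25 | | | 0.5 |
| L1 cache access | 0.5 | | | 1 |
| L2 cache access | 7 | | | 14 |
| Mutex lock/unlock | 25 | | | 50 |
| Main memory access | 100 | 0.1 | | 200 |
| Transmitting an Ethernet Packet | 15000 | 15 | | \(3 \times 10^4\) |
| SSD Access | 250000 | 250 | | \(5 \times 10^5\) |
| Move Data to GPU | | 800 | 0.8 | \(1.6 \times 10^6\) |
| Internet Round Trip | | 5000 | 5 | \(1 \times 10^7\) |
| HDD Access | | 10000 | 10 | \(2 \times 10^7\) |
| Calculating Primes to 100,000 | | | 100 | \(2 \times 10^8\) |
| Send a Packet From Beijing to Edinburgh | | | 500 | \(1 \times 10^9\) |
| Computer Reboot | | | 30,000 | \(6 \times 10^{10}\) |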

To make sense of these numbers, the following table assumes that one CPU cycle takes 0.25 seconds and stretches the time of the other events in the same proportion.

| Computer Event | Real Life Event | Time | Scale (L1 cache access = 1) |
| --- | --- | --- | --- |
| One CPU cycle | Eye Blinking | 0.25 s | 0.5 |
| L1 cache access | Typing a Word | 0.5 s | 1 |
| L2 cache access | Running 50 meters | 7 s | 14 |
| Mutex lock/unlock | A round in Basketball | 25 s | 50 |
| Main memory access | Reading a short passage | 100 s | 200 |
| Transmitting an Ethernet Packet | Flight from London to Moscow | 4.2 h | \(3 \times 10^4\) |
| SSD Access | Life span of a mayfly | 70 h | \(5 \times 10^5\) |
| Move Data to GPU | Slovenia Won Independence | 10 days | \(1.6 \times 10^6\) |
| Internet Round Trip | China’s Chang’e 6 Returns Samples from Moon | 2 months | \(1 \times 10^7\) |
| HDD Access | Wheat Maturation | 4 months | \(2 \times 10^7\) |
| Calculating Primes to 100,000 | One US President Term | 4 years | \(2 \times 10^8\) |
| Send a Packet From Beijing to Edinburgh | Pyramid of Giza Built | 20 years | \(1 \times 10^9\) |
| Computer Reboot | Span of Byzantine Empire | 1200 years | \(6 \times 10^{10}\) |
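The stretching used in the table above amounts to reading every nanosecond as one second. The following sketch redoes the conversion for a few rows; the function names are hypothetical, and the durations are copied from the first table. The real-life events in the table are loose matches, so the computed values will only roughly agree with its Time column.

```python
# Durations from the first table, in nanoseconds.
EVENTS_NS = {
    "One CPU cycle": 0.25,
    "L1 cache access": 0.5,
    "Main memory access": 100,
    "Transmitting an Ethernet packet": 15_000,
    "HDD access": 10_000_000,
    "Computer reboot": 30_000_000_000,
}

L1_CACHE_NS = 0.5  # the Scale column is measured relative to this


def scaled_seconds(ns):
    """Stretch time so that 1 ns becomes 1 s (0.25 ns becomes 0.25 s)."""
    return float(ns)


def relative_scale(ns):
    """How many L1 cache accesses fit in the given time."""
    return ns / L1_CACHE_NS


def humanise(seconds):
    """Express a duration in the largest unit that fits."""
    units = (("years", 365 * 24 * 3600), ("days", 24 * 3600), ("hours", 3600))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:g} s"


for event, ns in EVENTS_NS.items():
    print(f"{event:32} {humanise(scaled_seconds(ns)):>12} {relative_scale(ns):>14g}")
```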

Graphics Processing Unit (GPU) 6

While a CPU has a dozen or so big, powerful cores, a GPU has thousands of small cores. The GPU’s cores handle arithmetic efficiently but struggle with complex control logic.

This makes the GPU suitable for running algorithms that can be split into smaller parallel tasks, provided those smaller tasks are mutually independent. These algorithms are sometimes called embarrassingly parallel algorithms. Most algorithms in linear algebra and statistics, such as vector addition, the dot product, matrix multiplication, and finding the mean, are embarrassingly parallel.

Most graphical algorithms are also embarrassingly parallel, as the colour of one pixel does not depend on the others. In fact, most graphical algorithms take the form of matrix operations. Hence the name GPU, as its original purpose was to run graphical algorithms.

In recent years people have found that GPUs can execute machine learning algorithms more efficiently than CPUs. This should come as no surprise, as machine learning makes heavy use of linear algebra and statistical algorithms.
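As an illustration of what embarrassingly parallel means, the sketch below splits a vector addition into independent chunks and hands each chunk to a separate worker. It runs on CPU threads from Python's standard library rather than on a GPU, so it shows only the structure of the idea, not a speed-up. The function names are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def add_chunk(chunk_pair):
    """Add two equally long slices element by element."""
    xs, ys = chunk_pair
    return [x + y for x, y in zip(xs, ys)]


def parallel_vector_add(a, b, workers=4):
    """Add two vectors by splitting them into chunks that never interact."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    chunk = max(1, -(-len(a) // workers))  # ceiling division
    pieces = [(a[i:i + chunk], b[i:i + chunk]) for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the input chunks
        results = list(pool.map(add_chunk, pieces))
    return [value for part in results for value in part]


print(parallel_vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

Each output element depends only on the two input elements at the same position, so no chunk ever has to wait for another. This independence is what lets a GPU spread the work over thousands of cores.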

Other Hardware

Footnotes

Software

Computer hardware is everything that exists physically, and software is everything else. A computer needs both to function properly.

Examples of software include:

  1. Application software, such as the web browser, PDF viewer, music player, and Python interpreter, with which the user interacts directly;
  2. The desktop environment, which allows you to open, close, resize, and organise application windows with your mouse and keyboard. Graphical application software depends on the desktop environment;
  3. The device drivers that control your mouse, keyboard, and other accessories;
  4. The software that manages the CPU, memory, disk, etc. This is the operating system and system software. Every other piece of software depends on it.

This is the software hierarchy.

Lower-level software provides support for higher-level software. At the lowest level is the operating system, which communicates directly with the hardware. At the highest level is application software, with which the users interact directly.

The operating system and other related low-level software are called system software.

Application software cannot access the memory or send an instruction to the CPU directly; instead, it makes such requests to the system software, which then communicates directly with the hardware and returns the required information to the application software.
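A small sketch can make this request-and-reply pattern concrete. In Python, the functions `os.pipe`, `os.write`, `os.read`, and `os.close` are thin wrappers around requests to the operating system. The program below never touches the hardware itself; it only asks the system software to move some bytes on its behalf. The message is arbitrary.

```python
import os

# Ask the operating system for a pipe: a small in-memory channel it manages.
read_end, write_end = os.pipe()

# Ask the operating system to store some bytes in the pipe.
os.write(write_end, b"hello from user space")
os.close(write_end)

# Ask the operating system to hand the bytes back.
message = os.read(read_end, 1024)
os.close(read_end)

print(message.decode())
```

Higher-level conveniences such as `print` and `open` eventually make the same kind of requests underneath.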

Operating System

The operating system (OS) is arguably the most important piece of software.

Its precise definition is:

[Operating system is] an intermediary between the user of the computer and the computer hardware. 1

To put it another way, the operating system is the magical software that allows users to send instructions to the silicon chips by using the keyboard, mouse, etc., and returns the chips’ responses in human-readable form.

An operating system, however, is not:

  1. tied to the hardware. Just like any other software, an operating system can, ideally, run on any hardware;
  2. tied to the user interface and appearance.

Kernel, the Core of the Operating System

The kernel is the most important part of the operating system and has control of almost everything in the computer. It provides many abstract and innovative functionalities that every other piece of software relies upon and that ensure the security, speed, and stability of the computer.

Here are some of the typical subsystems of a kernel:

  1. Threads and scheduler: A thread is a sequence of instructions that the computer executes. Each running program has at least one thread, and some programs spawn multiple threads for parallel execution. A CPU can only run a dozen or so threads at the same instant, but thousands of them may be waiting to be executed. To solve this problem, the scheduler, as part of the kernel, lets the CPU run certain threads for a short time, chosen according to their priority, then switches to other threads, runs them for another short time, and switches back. From the user’s perspective, thousands of programs seem to run concurrently.
  2. Memory management: It is common for a program to request a large amount of memory at the start of execution but only use it later, with some of it staying unused for the whole duration. Much memory would be wasted if it were all allocated at the time of the request. Virtual memory was invented to solve this problem. This is how it works: at the start of execution each program is given a practically unlimited amount of virtual memory, and only at the time of use is this virtual memory mapped to physical RAM. (The full scheme is much more complicated.)

The kernel, as magical as it seems, is written in programming languages just like any other software. The kernels of Windows, Linux, and MacOS are all written mainly in C and assembly language. Currently, there are attempts to write parts of the Linux kernel in Rust.

You can check the source code of the most famous open-source kernel here: Linux Kernel.
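The time-slicing idea behind the scheduler can be sketched as a toy round-robin simulation. This is a simplification made up for illustration: it ignores priorities, which a real kernel scheduler takes into account, and the task names and work units are hypothetical.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate one CPU core sharing its time between several tasks.

    tasks maps a task name to the units of work it still needs.
    Returns the order in which the core ran them, as (name, units) pairs.
    """
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)  # run for one slice at most
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the queue and wait for a turn.
            queue.append((name, remaining - run))
    return timeline


for name, units in round_robin({"browser": 3, "editor": 5, "music": 2}, time_slice=2):
    print(f"run {name} for {units} unit(s)")
```

With a short enough time slice, every task makes progress so frequently that, to a human observer, they all appear to run at once.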

Other parts of the Operating System

Besides the kernel, many utility programs are necessary for a functioning computer. These programs are system software. Some of them are considered part of the operating system.

For example:

  1. A bootloader to load the operating system when the computer is booting up;
  2. Various device drivers for keyboards, mice, USB devices, Wi-Fi, the GPU, etc.;
  3. The desktop environment, which is a suite of software that controls the windows and graphical output of the computer;
  4. Utility software for assembling assembly code, searching symbols, linking object files, managing archives, etc.

Some other software seems even more mundane but is essential:

  1. An application for user login;
  2. Software to create and delete files and directories.

Many operating systems ship with this system software. As it is integrated into the OS very closely, many consider it a part of the operating system.

There is much debate on whether a given piece of software should be considered system software or application software. For example, the web browser on most operating systems is application software. On ChromeOS, however, the browser may be considered operating system utility software, as every application is launched by the browser, which acts as the only user interface.

The following list shows which level of software the user is interacting with when booting up the computer.

  1. You turn on the computer by pressing the power button: you are working directly with the hardware;
  2. The logo of your computer brand shows up: the motherboard is powered on, and the kernel is being loaded into RAM (likely together with the initial RAM file system);
  3. The logo of your operating system shows up, sometimes with pages of logs: the kernel is loaded and is being initialised;
  4. The start-up screen shows up: the operating system is fully loaded;
  5. You enter the username and password: a piece of system software is trying to authenticate your identity;
  6. You enter the desktop and open the browser: you are now working with application software.

Examples of Operating System

The most popular operating systems today include Windows, MacOS, and the family of Linux distributions. There are, however, many lesser-known ones. The BSDs and Solaris belong to the Unix family. AmigaOS, OS/2, BeOS, RISC OS, MorphOS, and Haiku were famous in their day. Minix is another one, designed for teaching.

The most popular operating systems for personal computers are MacOS and Windows. They are also the only preinstalled operating systems on most commercially available computers.

Linux is overwhelmingly more popular, however, on professional servers and supercomputers. Since 2017, all of the top 500 supercomputers have run on Linux.

Linux is also suitable for personal use.

If you have not used Linux before, you should try it, as it is an objectively superior OS for writing code in most circumstances.

Choosing an Operating System for Personal Use

The short answer is to choose Linux for coding, Windows to play games, and Mac to waste some money.

See the Choosing Operating System chapter for a detailed comparison of Linux, MacOS, and Windows.

Desktop Environment

A desktop environment is a suite of software that controls the appearance of application windows, how they are arranged, and how they interact with the mouse and the keyboard, i.e., it decides the graphical user interface.

Windows and MacOS are bundled with a desktop environment that cannot be uninstalled. Desktop environments in general, however, do not depend on the operating system and can be installed and uninstalled without affecting the OS.

Some examples of desktop environments are GNOME, KDE Plasma, and Xfce.

There are some innovative desktop environments (or window managers) that are, although vastly different from the window-and-drag style of Windows and Mac, equally effective. Examples include i3, Sway, and Hyprland.

It is the desktop environment that defines the immediate user experience of a computer. As Linux systems may have very different desktop environments, the user experience and the GUI vary dramatically among Linux systems.

Package Manager

Package management is one of the most complicated tasks in programming.

Think of the following tricky scenarios for package management:

Too Many Dependencies

The dependency tree can be expansive. For example, a package may depend on dozens of other packages, which in turn depend on dozens more. A huge package may have thousands of dependencies in total. 2
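To see how quickly dependencies accumulate, here is a minimal sketch that collects everything a package needs, directly or indirectly, by walking the dependency graph. The package names and the graph are invented for illustration and do not describe any real distribution.

```python
def all_dependencies(package, direct_deps):
    """Return every package that `package` depends on, directly or indirectly."""
    seen = set()
    stack = list(direct_deps.get(package, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            # The dependencies of a dependency are needed too.
            stack.extend(direct_deps.get(dep, []))
    return seen


# Hypothetical dependency graph: package -> packages it needs directly.
DIRECT_DEPS = {
    "app": ["gui", "net"],
    "gui": ["fonts", "libc"],
    "net": ["tls", "libc"],
    "tls": ["libc"],
}

deps = all_dependencies("app", DIRECT_DEPS)
print(f"app needs {len(deps)} packages: {sorted(deps)}")
```

The hypothetical app lists only two direct dependencies, yet five packages must be installed before it can run.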

Different name, same software

Packages may be bundled differently in different operating systems. For example, ALSA (Advanced Linux Sound Architecture) provides kernel sound card drivers and a user space programming library on Linux systems. The user space library is called alsa-lib on Arch. There is no alsa-lib package in Debian, where the library is instead provided by libasound2. Such examples are countless, and this is a major issue for software portability.

Circular Dependency

A circular dependency is when software A depends on B, B depends on C, but C depends on A again. (This forms a circle.) Circular dependencies are rare in application software, but every operating system itself has this problem. The reason is simple: an operating system is a piece of software that needs the C library to run, but the C library itself requires the operating system to work. To solve this problem, the operating system software has to be compiled and configured together with the C library on a working computer and bundled into a single file, called the image. The image can then be installed onto a computer without an operating system. Where did the first operating system come from, you may ask. In the early days, all programs were written on punched cards, which directly recorded the binary instructions that a computer can understand!
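A package manager has to notice such circles before it tries to install anything. The sketch below shows one common way to detect a cycle, a depth-first search that remembers which packages are currently being visited. It is not taken from any particular package manager, and the graphs are the A, B, C example from the text.

```python
def find_cycle(direct_deps):
    """Return a list of packages that form a circle, or None if there is none."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # packages on the current chain of dependencies

    def visit(package):
        state[package] = IN_PROGRESS
        path.append(package)
        for dep in direct_deps.get(package, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current chain: the chain closes a circle.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[package] = DONE
        return None

    for package in direct_deps:
        if state.get(package, UNVISITED) == UNVISITED:
            cycle = visit(package)
            if cycle:
                return cycle
    return None


print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
print(find_cycle({"A": ["B"], "B": ["C"], "C": []}))
```

The first call reports the circle from A through B and C back to A, while the second graph has no circle and yields None.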

Dependency Hell

Dependency hell is the case where software B and software C depend on two different versions of software A. This usually takes place in web, Python, data science, and AI programming, where software updates are frequent. On a single system, usually only one version of software A can exist. Say software A version v1.0 was installed on the computer, and software B depended on it. Software A was then updated to v2.0, in which new features were added and some obsolete features were removed. Software C depends on the new features, but software B depends on the obsolete features: this is a dead end, and software A, B, and C cannot coexist on the same system. (The fundamental assumption is that two different versions of the same software cannot coexist on the system. If they do, they have to be named differently, such as gcc-12, gcc-13, etc.) One method of solving this problem is the virtual environment, in which a group of software may run independently of the software outside it. Examples of technologies that implement virtual environments include python-venv, docker, and npm.
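The situation can be sketched as a simple check: collect which version of each dependency every package asks for, and report the dependencies that are wanted in more than one version. The function name and the requirement table are hypothetical, and real package managers compare version ranges rather than single major versions.

```python
def find_conflicts(requirements):
    """Find dependencies that different packages need in different versions.

    requirements maps a package to {dependency: required major version}.
    Returns {dependency: {version: [packages that need it]}} for conflicts only.
    """
    wanted = {}
    for package, deps in requirements.items():
        for dep, version in deps.items():
            wanted.setdefault(dep, {}).setdefault(version, []).append(package)
    return {dep: versions for dep, versions in wanted.items() if len(versions) > 1}


# Hypothetical requirements: B still needs A v1, while C and D need A v2.
REQUIREMENTS = {
    "B": {"A": 1},
    "C": {"A": 2},
    "D": {"A": 2, "E": 1},
}

for dep, versions in find_conflicts(REQUIREMENTS).items():
    for version, packages in versions.items():
        print(f"{dep} v{version} is required by {', '.join(packages)}")
```

A virtual environment sidesteps the conflict by giving B one environment with A v1 and giving C and D another with A v2.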

The Solution: Package Manager

The package manager was invented to solve these problems. A package manager can install, remove, update, and otherwise manage all the packages on a system. Software, programming libraries, scripts, or anything else that can be downloaded from the internet can be considered a package and installed by the package manager.

A package manager is an essential tool for programming.

Most Linux distributions have a built-in package manager, which can be regarded as part of the operating system. MacOS, however, does not have an official package manager, and Windows only gained one (winget) recently. There are third-party package managers, such as Homebrew for Mac and Chocolatey for Windows, but their effectiveness is questionable.

Footnotes

1

Page 1, Silberschatz, A., Galvin, P.B. and Gagne, G. (2018). Operating system concepts. [online] Hoboken, N.J Wiley. Available at: https://os.ecci.ucr.ac.cr/slides/Abraham-Silberschatz-Operating-System-Concepts-10th-2018.pdf.

2

Firefox has a total of 252 unique dependencies on Arch Linux. Its full dependency tree is here, which is the output of pactree firefox.

4

OS is short for operating system. There are MacOS, ChromeOS, HarmonyOS, etc.

5

MacBooks made before 2020 with Intel CPUs had the option to dual boot Windows. It is practically impossible to run any system except MacOS on laptops made after 2020 with an M series chip, because Apple is deliberately opaque about the design of M series chips, which hinders developers of other operating systems from supporting Apple products. Asahi Linux seems to be the only barely working exception.

Choosing Operating System

I will convince you in this chapter to choose Linux for coding, Windows for playing games, and MacOS for donating money to Apple.

Let us investigate the differences between these operating systems.

Skip this chapter?

This chapter presents many lesser-known facts about Windows, MacOS, and Linux distributions.

If you have already made up your mind to use Linux, check Let There be Linux.

If you do not plan to use Linux, I strongly recommend that you try it, and you will likely benefit from reading this chapter.

If you have decided to skip Linux, go to Let’s Write Code.

Windows

Windows is a widely available operating system developed by Microsoft. It ships with a desktop environment and various application software, and Microsoft provides a C/C++ compiler with libraries and a developer suite (Visual Studio) for it. This makes it ready to use out of the box. Windows is also compatible with almost all software, including most video games.

Windows, contrary to common belief, is not free. It costs around 100 GBP. This fact may be overlooked because, when purchasing a computer with Windows preinstalled, the price of Windows is included in the price of the computer.

The important drawback is that Microsoft, not the user, has complete control of a Windows system, and it often abuses this unchecked power.

There are many controversies, but here are a few facts:

  1. Microsoft’s privacy statement explicitly states that it will collect your personal data and share them with third-party organisations. It does not say exactly what data is collected, but a previous statement said it would collect personal information including passwords and browsing history. 1
  2. Microsoft’s software explicitly exploits user data. For example, Microsoft Recall, launched in June 2024, is a software service that helps users find what they have previously done on the computer. Recall works by periodically taking snapshots of the screen and processing them with Microsoft Copilot. Any other program that takes screenshots of a computer and can share them over the internet without the user’s permission would be called spyware. 2

In conclusion, Microsoft has total power over Windows, not the user. This is a big problem for coding.

User Experience

Most people have a satisfactory experience on Windows:

  1. It is preinstalled on most computers;
  2. It is a stable OS, and almost all software can run on it;
  3. Microsoft’s services, including OneDrive, gaming service, and Copilot, may be helpful.

There are, however, significant inconveniences:

  1. Windows may force a system update, or keep popping up reminders. Users cannot turn off the reminders;
  2. Many built-in programs cannot be uninstalled, including Microsoft Edge3, Copilot, etc. Software that cannot be uninstalled is called malware;
  3. Windows has built-in advertisements that cannot be removed;
  4. Windows has no general-purpose package manager in common use. (Neither does Mac, but most Linux distributions do.) To install software on Windows, one usually has to search the internet and pick the most plausible download link. This is not only an inconvenience but also a severe security concern, as hacked software may be installed. 4
  5. Windows is bulky and takes an unreasonable amount of resources. Windows will usually take 7 GB of RAM and around 100 GB of disk space. In comparison, a fully functional, ready-to-use Linux distribution with a full graphical interface only needs 2 GB of RAM and 20 GB of disk space.
  6. Some Windows apps may be laggy for no apparent reason, and the user can do nothing about it.

Coding Experience

The overall coding experience on Windows is mediocre:

  1. Windows is not designed for programmers but for ordinary users and gamers.5
  2. Microsoft pushes programmers to use Visual Studio, which is a bundle of programming tools necessary for software development. Visual Studio is not open source, so a programmer cannot view, edit, or learn from its source code, and if something goes wrong (which often happens), the programmer can do nothing except wait for Microsoft to fix it. This is a major inconvenience for learners.
  3. Many programming tools, such as neovim, tmux, and fish, are not available or cannot be used conveniently on Windows.
  4. There is no long-established official package manager.

That being said, most games are developed on Windows, as Windows has the best support for most game development tools.

MacOS

MacOS is a POSIX-compliant Unix system developed by Apple. It is only available on Apple computers. Many consumers praise MacOS for its stability, beautiful appearance, and convenient interaction with other Apple devices.

Like Microsoft, Apple has complete control over MacOS, a privilege that it has abused:

  1. Apple secretly throttled older devices (making them slower) without notifying users. Many suspected Apple did it to encourage consumers to buy new Apple products. Apple denied this but agreed to pay 113 million US dollars in a settlement. 6
  2. Apple deprecated OpenGL and does not support Vulkan, in order to promote Metal, a graphics framework created by Apple. This makes many AAA games not functional on MacOS.

User Experience

Apple’s products are famous for their beautiful appearance, smooth animations, and long battery life. The interaction between Apple devices is also praised. Apart from these, however, Apple’s products are poor in many respects:

  1. MacOS has serious compatibility issues. Much software cannot run on MacOS, including most games and many developer tools. Some software vendors refuse to offer MacOS support, and support for other software was deliberately undermined by Apple;
  2. Apple promotes the software made by Apple, and hampers third-party software. 7
  3. Purchasing applications through Apple’s App Store costs more, as Apple takes up to 30% of any purchase.
  4. MacOS consumes a lot of RAM and disk space. A brand new installation of MacOS likely takes more than 40 GB of disk space and 4 GB of RAM.

Coding Experience

Mac’s coding experience is poor:

  1. New Macs use Apple’s M series chips, which, although fast and efficient, have dropped all legacy support. This makes many programming tools incompatible with MacOS, and there is no way to fix them.
  2. Any serious programming on Apple requires Xcode, which is a developer suite like Visual Studio. Xcode is poorly made, slow, buggy, hard to use, and occupies over 20 GB. There is no alternative.
  3. Apple promotes programming languages and libraries made by Apple and has stopped supporting their competitors. This is especially bad for programming learners.
  4. Distributing MacOS software requires enrolment in the Apple Developer Program, which costs 99 US dollars per year;
  5. There is no official package manager.

Linux

Short History

Linux is a free and open-source kernel developed by the community. The Linux kernel was first developed by Linus Torvalds in the early 1990s as a free alternative to the expensive Unix systems. At the same time, Richard Stallman and the Free Software Foundation had developed free alternatives to the other parts of an operating system, but had failed to create a functional kernel.

As a result, people created the GNU/Linux8 operating system by combining the Linux kernel and GNU software, which quickly gained popularity among open-source enthusiasts.

In the early days of Linux, most contributors were open-source enthusiasts working as volunteers. As the project became more popular, the Linux Foundation was established in 2000 to support Linux development. Since then, many corporations, including Google, Microsoft, and Huawei, have sponsored the project. Many Linux maintainers have become full-time employees of these companies. (The companies pay them to work on Linux full time.)

As of 2024, Linus Torvalds still leads the Linux development.

Linux distribution

A Linux distribution is an operating system built on the Linux kernel and bundled with various system and application software.

There are many free and open-source Linux distributions. The most famous ones are Debian, Ubuntu, Red Hat, Fedora, Arch, Manjaro, CentOS, PureOS, etc. Most of these distributions are built by different groups of volunteers and open-source enthusiasts.

There are very few computers sold with Linux preinstalled. To use Linux, one has to buy a non-Apple computer and install the desired distribution on his own. See obtaining Linux.

As Linux distributions are free and open source, a user is able to modify his system, including the kernel and desktop environment, in any way he fancies. This makes Linux systems very different from each other.

One significant difference among Linux distributions is the amount of configuration required. Arch distributes only the most essential software, without a desktop environment. A functional Arch computer thus requires the user to install many programs and configure them properly. In comparison, Ubuntu and Manjaro are preconfigured and work out of the box.

User Experience

Different Linux distributions lead to different user experiences. This may be Linux’s greatest advantage: you may choose the distribution that fits your needs.

Many modern Linux distributions, including Ubuntu, Manjaro, and Linux Mint, focus on user experience. Using them is similar to using Windows or MacOS: everything will simply work, and everything can be done by clicking through the GUI with a mouse.

That being said, here are the common advantages of all Linux distributions:

  1. Linux is free and open source. This means the user has the ultimate power over the system.
  2. Linux systems generally have no telemetry. If a distribution includes any, you can uninstall it.
  3. Linux is very customisable: you can use your creativity to make a special system that fits your needs.
  4. The Linux kernel is lightweight. An Arch system with a fully functional graphical desktop environment and common application software often takes less than 20 GB of disk space. The operating system and the desktop environment may take less than 2 GB of RAM.

Here are some disadvantages:

  1. Most Linux systems do not have customer support, as they are free software developed by the community. There is, however, abundant support on the internet.
  2. With great power comes great responsibility. Linux grants the ultimate power to its users, and it will not prevent them from breaking their systems.
  3. Some software may not run on Linux, including certain video games like League of Legends and Valorant. The good news is that many Linux developers are working on this issue. For example, Wine is an open-source compatibility layer that enables most Windows software to run on Linux. Proton, based on Wine, was developed by Valve and allows most games on Steam to run on Linux. (The Steam Deck runs SteamOS, which is based on Arch Linux.)

The Coding Experience

Linux distributions are particularly suitable for coding:

  1. They are made by developers for developers, not for marketing or sales. Almost every programming library and tool can run on Linux. Many utilities that can boost your coding productivity are hard to find on Windows and MacOS, but are readily available on Linux. Some examples are Neovim, cscope, ripgrep, fd, fish, and zsh.
  2. Most Linux distributions have a built-in package manager, which is extremely helpful for coding.
  3. Linux is free and open source, meaning you can view and edit the source code of any software. This is particularly helpful for learners.
  4. Linux is customisable. A Linux user has more choices for every piece of software, including the desktop environment. Linux systems can also be freely customised to boost your productivity. See ricing.

The Power of Open Source

Linux is free and open-source software. The word free means both free of charge and freedom: anyone can obtain a copy of it without charge and has absolute freedom over this copy, including viewing the source code.

The success of the Linux kernel may largely be attributed to its being open source. Anyone can view the source code, and many of the more enthusiastic users were ready to fix bugs, propose new functionality, and contribute to Linux in other ways. Many Linux users have contributed code to the Linux kernel by sending patches to Linus Torvalds or one of the maintainers, who would review each patch and accept those he deemed good. The work of Linus Torvalds himself, as of 2024, mostly consists of reviewing patches instead of writing code.

Many companies have strong incentives to contribute to Linux. For example, if a hardware manufacturer has developed new hardware that requires a special device driver, it will likely write the driver and submit it to the Linux kernel. If the driver is accepted into the kernel (which is very likely), its customers can use the hardware on a Linux system without trouble.

Many big companies like Amazon and Google also have strong incentives to use Linux. For example, Amazon Web Services provides cloud servers as a service. Each server has to run an operating system. Were it Windows, Unix, or MacOS, Amazon would need to pay a great deal for licenses and would not be able to customise the system. As a result, Amazon chose the free alternative, Linux.

As free software, Linux is particularly suitable for individuals learning to code, as it gives the user the most freedom to explore and learn the ins and outs of the operating system.

Conclusion: Use Linux

This may be a biased opinion: I regard Linux as the best operating system for coding and personal use.

If you are not using Linux, try it.

Trying Linux costs nothing except some time, effort, and courage, but it will return many rewards.

Footnotes

2

Microsoft Recall; Recall Controversy; Recall can be abused by hackers: Abuse Recall; Reddit Discussion

3

Edge can not be uninstalled

4

It is a common practice of hackers to create a fraudulent website that looks similar to a famous website and offers hacked software for download. Any computer that downloads and runs the fraudulent software will be hacked. The OBS hack is a recent notorious example. OBS is a popular free and open-source program for video recording and live streaming. Hackers created a fake website that looked similar to OBS’s official website, with a similar URL, but the linked download was a hacked version of OBS. The hackers advertised the fake website with Google Ads, which made it show up first among Google search results. Many people downloaded the fraudulent software and were hacked. See GitHub discussion, OBS forum, YouTube discussion.

5

You can not find any allusion to programming in Windows 11’s official advertisement.

7

TODO

9

In general, open-source software is distributed under a license similar to GPLv2, which states that anyone can distribute, modify, or otherwise use the software, but that the software is provided “as is”, meaning the user shall accept the software in its current form, and the creators will not be held accountable for any potential faults. Closed-source software also comes with licenses containing similar disclaimers. MacOS’s terms and conditions state ‘THE APPLE SOFTWARE AND SERVICES ARE PROVIDED “AS IS” AND “AS AVAILABLE”, WITH ALL FAULTS AND WITHOUT WARRANTY OF ANY KIND,’. For Microsoft Windows, the only warranty is that ‘properly licensed software will perform substantially as described in any Microsoft materials that accompany the software’.

Let There be Linux

And God said, “Let there be light,” and there was light. – Genesis 1:3

This chapter will teach you how to install Linux on a computer and how to use it. Linux is likely not compatible with Apple’s hardware, however.1

Let us start by choosing and installing a Linux distribution. Next, we will learn to use the command line and set up a development environment. Various other details are also discussed in the chapter subsections.

Which Distribution to Use?

For beginners, I recommend Manjaro and Linux Mint, both of which are stable and will work out of the box.

For the more adventurous, I recommend Arch, which requires extensive configuration.

To install the Linux distribution, you need to download the official image.

Manjaro (and many other Linux distributions) offers different images with different desktop environments. (As explained before, the operating system is different from the desktop environment.) I recommend the KDE Plasma and Gnome desktop environments for beginners. After installation, you can always install more desktop environments onto your system.

It is best to watch these videos2 and check their official website to know more about the different desktop environments.

Installing Linux

There are four steps to install a Linux distribution (besides obtaining the installation image):

  1. Write the installation image onto a USB drive; (This is the most convenient way to boot an image. An image can also be booted from a CD, the network, etc.)
  2. Shut down the computer, enter the BIOS, and set the proper boot order;
  3. Restart the computer, and boot into the installation image;
  4. Install the OS with the installation software. Reboot the computer. The installation is complete.

In summary, you have to make a USB drive a bootable device containing the installation image of a Linux distribution, boot into this distribution, and use its installation software.

Besides the written explanation here, check this video. (To Do: make a video about it).

Preparing the USB

After downloading the installation image, you have to write it onto a USB drive to make the drive a bootable device.

The easiest way is to use balenaEtcher.

There are other ways to do it. Check this wiki.

Set Boot Order

The next step is to set the boot order in BIOS, so that the computer can boot into the USB.

First shut down the computer. Then plug in the USB drive and power the computer on. While the computer is booting up (before anything appears on the screen), keep pressing the hotkey to open the BIOS or boot order page.

Different keys need to be pressed on computers from different manufacturers, but it is likely ESC, F1, F2, F12, or Backspace. When not sure, search the internet.

There are now two possibilities:

  1. The computer enters the boot order page, and you should select the USB drive to boot from;
  2. The computer enters the BIOS directly, and you should look for the boot options and move the USB drive to the top of the boot order. Then exit the BIOS and reboot.

The computer should now boot into the live installation media.

Install the Operating System

You can explore the live installation media, which should behave much like a Linux system installed on the hard drive.

Look for the OS installation software, open it, and follow the instructions for setting up languages, keyboard, users, etc.

You will likely be asked to setup the following:

  1. Hostname, which is the name of your computer on the local network. Pick a good name. You can change the hostname later.
  2. User and password. These are the user name and password you use to sign into the computer.
  3. Root user password. The root user is the superuser in Linux. You can set it to be the same as your user password. It is customary on Linux to log in with an ordinary user account and only invoke the privileges of the root user when needed (usually through sudo, which stands for super user do).
  4. Disk partitions. For beginners, I recommend letting Linux occupy the whole disk, which irrevocably wipes all data on the disk. Some older computers have two disks, a smaller and faster SSD and a bigger and slower HDD. Install Linux onto the SSD.
  5. Swap memory3. Swap is a portion of the hard disk reserved for memory usage. When RAM is close to full, some less frequently used data is transferred to swap. When the computer hibernates, open files stored in RAM are written to the filesystem, and everything else is transferred to swap, to be transferred back to RAM after hibernation. Swap is optional, but I recommend always using swap, unless your disk is very small (<100GB). The swap should be larger than the RAM.

Reboot the computer and enjoy.
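After the first boot, you may want to confirm that the swap you configured is in use. A small sketch, assuming a Linux system that exposes the kernel's /proc/meminfo file: it prints the total RAM and the total swap (both in kB) so the two can be compared.

```shell
# MemTotal is the installed RAM; SwapTotal is the configured swap (0 kB if none).
grep -E '^(MemTotal|SwapTotal):' /proc/meminfo
```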

Footnotes

1

Apple has ‘closed’ its computers in the sense that it does not allow users to manipulate the hardware as they wish. As a result, one has to ‘hack’ the Mac to install Linux, which is a difficult and unreliable procedure. There are efforts to do this, nevertheless. See Asahi Linux.

3

See Arch Wiki for more.

Linux on the Command Line

You can navigate a Gnome or KDE Linux system with only the GUI (Graphical User Interface), and explore, configure, and execute all tasks by clicking buttons.

The GUI is not the only method of controlling the computer. The alternative is the command line interface (CLI), that is, controlling the computer by typing commands in the terminal.

CLI has unique advantages:

  1. CLI can boost your productivity.
  2. CLI can help you automate tasks.
  3. Some workflows may only work with CLI, such as connecting to a remote server.
  4. Many utilities were developed with CLI in mind and may be more convenient to use than their GUI counterparts. Examples are make, gcc, git, find, grep, tmux, etc.

On most modern computers, the command line is only accessible via a shell, which usually runs in a virtual terminal. The virtual terminal1 is usually called the terminal emulator, or just ‘the terminal’. It processes the input and the output, and is usually opened in a window.

A shell, in contrast, is a command interpreter that executes the commands a user types into the terminal.

The shell and the terminal are two pieces of software that work together to provide the command line interface. Some sources may wrongly use ‘shell’ or ‘terminal’ to denote the command line interface itself.

In this guide, we will focus on Bash, the Bourne Again SHell, which is the default shell in most Linux systems and one of the most widely used.

The choice of terminal emulator, in comparison, is a matter of taste. You can use any terminal emulator.

Learn Bash

You can master the command line in bash in six steps:

  1. Start the terminal, and identify which shell is running;
  2. Learn common shell commands, including man, apropos, echo, cat, wc, ping, etc;
  3. Navigate, view, create, rename, and move files and directories;
  4. Use the package manager to install new command line utilities and learn to use them;
  5. Learn miscellaneous advanced concepts, including variables, pipelines, PATH, bashrc, aliases, input/output redirection, glob patterns, etc;
  6. Write bash scripts.

Subsections of this chapter will guide you through the first five steps. Writing bash scripts will be introduced in the Let’s Write Code chapter.

References

  1. Effective Shell
  2. Gnu Bash Manual
  3. Advanced Bash Scripting Guides

Footnotes

1

In the early days, computers were huge and expensive, and were only used in big corporations. There would be one central computer, called the server, and many users could use it simultaneously from different rooms via a terminal, which looked like a small TV with a keyboard. The terminal would register user input, send it to the server, and display the server’s response. In 2024, personal computers are common, and there is no need for a physical terminal. The custom of using the terminal, however, has been handed down, and users instead rely on a virtual terminal, which imitates the behavior of the physical terminal, to control the computer. See wikipedia entry for more.

Hi, Terminal

Most Linux systems are preinstalled with at least a shell and a terminal.

If you are using Gnome or KDE, search the ‘terminal’ app. On i3, sway, and hyprland, use mod+enter to open the default terminal. (mod key is likely windows or alt key).

The shell runs in the terminal, which shall look like this:

[username@hostname ~]$ 

This is the shell prompt. Commands entered are shown after the prompt and are executed after pressing Enter.

All prompts are abbreviated to a single $ hereafter.

Try hello world:

$ echo "Hello, World!"  # echo displays a line of text
Hello, World!  # text after a hash sign is ignored

Although bash is the most widely-used shell, your default shell may not be bash. Check your default shell with:

$ echo $SHELL  # $SHELL is an environment variable holding your default shell
/bin/bash  # your default shell is bash

You can temporarily enter the bash shell with the command bash, and leave it again with exit.
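A minimal sketch of a temporary bash session. Interactively you would simply type bash and later leave with exit; here bash -c runs a single command in a child bash so that the example finishes on its own. The variable name child_version is only illustrative.

```shell
# Run one command inside a child bash process, then return to the current shell.
child_version=$(bash -c 'echo "$BASH_VERSION"')
echo "the child shell was bash $child_version"
```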

You can change the default shell with chsh.

$ chsh -s /bin/bash  # will prompt you to enter password.

The change takes effect after the next login (or simply reboot).

Terminal Keyboard Shortcuts

Terminals have some idiosyncratic shortcuts. Notably, Ctrl + C does not copy but terminates the current command. Copy and paste in the terminal are Ctrl + Shift + C and Ctrl + Shift + V, respectively.

Here are commonly used shortcuts implemented in most terminals:

Shortcut            Description
Ctrl + Shift + C    copy
Ctrl + Shift + V    paste
Up Arrow            show last command
Down Arrow          show next command
Tab                 auto-complete command
Ctrl + L            clear the screen
Ctrl + C            cancel a command
Ctrl + R            search the history
Ctrl + D            exit the terminal

These are some less common shortcuts:

Shortcut    Description
Ctrl + A    move to the beginning of the line
Ctrl + E    move to the end of the line
Ctrl + U    delete from the cursor to the beginning of the line
Ctrl + K    delete from the cursor to the end of the line
Ctrl + W    delete the word before the cursor

For full reference of terminal shortcuts, see Appendix and fly on command line.

Common Shell Commands

There are many shell commands. Some commonly used ones are presented here with practical examples.

echo, cat, and Output Redirection

As seen in the previous section, echo displays text.

echo can also print bash variables, which are preceded by the dollar sign, $.

$ echo $XDG_CURRENT_DESKTOP  # may not work on certain distros
sway  # I am using the sway window manager. TODO: check other distros
$ MY_VAR="YES!"  # Variable definition. Strictly no spaces around `=`!
$ echo $MY_VAR
YES!
$ echo $EDITOR 
nvim  # default editor

cat can be used to show the contents of a file.

System information on most Linux systems is stored in the file /etc/os-release.

$ cat /etc/os-release  # May not work on some distributions
NAME="Manjaro Linux"
PRETTY_NAME="Manjaro Linux"
ID=manjaro
ID_LIKE=arch
BUILD_ID=rolling
ANSI_COLOR="32;1;24;144;200"
HOME_URL="https://manjaro.org/"
DOCUMENTATION_URL="https://wiki.manjaro.org/"
SUPPORT_URL="https://forum.manjaro.org/"
BUG_REPORT_URL="https://docs.manjaro.org/reporting-bugs/"
PRIVACY_POLICY_URL="https://manjaro.org/privacy-policy/"
LOGO=manjarolinux

If cat /etc/os-release does not work, try cat /etc/*release. * is a glob wild card explained later. If none of them works, try hostnamectl. ctl stands for control.

A file can be created with echo and output redirection >, or redirection with append >>.

$ echo "Hello" > hello.txt 
$ cat hello.txt 
Hello
$ echo "World" >> hello.txt 
$ cat hello.txt 
Hello 
World
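The difference between > and >> is easy to get wrong, so here is a small sketch. It uses mktemp to pick a scratch file (an assumption of this example, not part of the text above) and wc -l to count the lines in it.

```shell
scratch=$(mktemp)               # scratch file; the path is chosen by mktemp
echo "first" > "$scratch"       # > creates the file, or empties it if it exists
echo "second" >> "$scratch"     # >> appends a new line
after_append=$(wc -l < "$scratch")
echo "third" > "$scratch"       # > again: both earlier lines are discarded
after_overwrite=$(wc -l < "$scratch")
echo "lines after append: $after_append, after overwrite: $after_overwrite"
rm "$scratch"
```

In short, use >> when the existing contents of a file must survive.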

The behavior of the commands can be modified with flags.

$ cat -n hello.txt  # number lines in output
$ cat --help  # show cat help docs

Most commands will offer --help and --version flags to show short help documents and version. This is not a requirement, but a custom most programmers follow when creating the command.

Getting Help with man

The --help flag will only show a short help message. For a detailed manual, use man <command>.

Try man echo, man cat, and man man.

Man pages can be navigated with the arrow keys or by scrolling. To exit, press q. To search for a pattern, enter /<pattern>. Press h for help on navigating man pages.

Arch users may need to install the manual pages manually with pacman -S man-db.

man also offers manuals for system calls and library functions, which are kept in different sections.

Section 2 holds system calls and section 3 holds library functions. For example, man 2 open shows the manual for the open system call, and man 3 printf shows the C library function printf rather than the shell command of the same name.

apropos, less, and pipe

If you have forgotten the name of a command, use apropos <keywords> to search for commands related to <keywords>.

apropos will usually output a long list. You can browse its output interactively by piping it to less:

$ apropos <keywords> | less

Navigating the contents presented by less is the same as navigating a man page. (Indeed, man pages are likely displayed with less.)

| is the pipe operator, which redirects the output of the previous command to be the input of the next command. It is explained in advanced concepts.
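A short sketch of a pipeline with more than one stage. printf is used here only to produce four sample lines; grep keeps the lines containing the letter g, and wc -l counts what is left. The variable name count is illustrative.

```shell
# printf writes four lines; grep keeps those containing "g"; wc -l counts them.
count=$(printf 'make\ngcc\ngit\ngrep\n' | grep g | wc -l)
echo "$count"   # three of the four names contain a g
```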

cp and Glob Pattern

cp copies files and directories.

Here are some examples:

$ cp file.txt newfile.txt  # create a new file named newfile.txt with the contents of file.txt
$ cp file.txt -t new/dir  # create a copy of file.txt and place it under new/dir
$ cp -r /path/to/dir /new/dir  # copy the directory /path/to/dir recursively to /new/dir

A small detail many may miss is that, if /new/dir does not exist yet, cp -r /path/to/dir /new/dir will create /new/dir and make it a copy of /path/to/dir. (If /new/dir already exists, the copy is placed inside it as /new/dir/dir.) To copy the contents of /path/to/dir and place them under an existing /new/dir, use the following. Note that * does not match hidden files.

$ cp -r /path/to/dir/* -t /new/dir  # copy every file and directory in /path/to/dir recursively and place them under /new/dir
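The three behaviours of cp -r can be compared side by side in a throwaway directory. This is only a sketch: the directory names src, fresh, existing, and flat are made up for the experiment, and the destination is given as the last argument rather than with -t, which has the same effect here.

```shell
work=$(mktemp -d)                       # throwaway directory for the experiment
mkdir -p "$work/src/sub" "$work/existing" "$work/flat"
touch "$work/src/a.txt" "$work/src/sub/b.txt"

cp -r "$work/src" "$work/fresh"         # fresh is absent: it becomes a copy of src
cp -r "$work/src" "$work/existing"      # existing is present: src lands inside it
cp -r "$work/src/"* "$work/flat"        # glob: only the contents of src are copied

ls "$work/fresh" "$work/existing" "$work/flat"
```

The listing shows a.txt and sub directly under fresh and flat, but a nested src directory under existing.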

Miscellanea

  • clear: clear the terminal screen
  • sleep: sleep for some seconds
  • rmdir: remove an empty directory
  • whatis: display one-line manual page descriptions
  • whereis: find where a command is located
  • ps: list processes and their PIDs
  • kill: kill a process by PID
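Several of these commands are typically used together. The sketch below starts a long sleep in the background, looks it up with ps, and ends it with kill. The & operator and the special variable $! are not introduced above; they start a background job and give its PID, respectively.

```shell
sleep 30 &                 # & starts sleep in the background
pid=$!                     # $! holds the PID of the most recent background job
ps -p "$pid"               # show the entry for that PID
kill "$pid"                # ask the process to terminate (TERM signal)
wait "$pid" 2>/dev/null || true   # collect the ended job; its exit status is non-zero
```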

More commands

There are many more shell commands, and we cannot list every one here.

To learn more, man pages and ChatGPT are your friends.

Navigating the File System

Create and Navigate Directories and Files

When opening a terminal, the default working directory is your home directory. Check your current working directory with pwd:

$ pwd
/home/<username>

Check the files in the current directory with ls:

$ ls 
Desktop      Documents  fork   Pictures  research  Screenshots  Templates
didacticism  Downloads  Music  Public    sandbox   study        Videos

You can also use cd, short for change directory, to change to home directory:

$ cd ~
$ cd $HOME  # same as above

The symbol ~ and variable $HOME will be automatically expanded into your home directory path. Check echo ~ and echo $HOME.

Make a new directory with mkdir:

$ mkdir scratch

To make nested directories in one go, use the -p (parents) flag:

$ mkdir -p scratch/try/new_dir; cd scratch/try/new_dir
$ pwd
/home/<username>/scratch/try/new_dir

; is the bash command separator.
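A related sketch: ; runs the second command regardless of whether the first succeeded, whereas the && operator (not covered above) runs it only on success. The latter is often the safer choice when the second command depends on the first. The scratch directory from mktemp is an assumption of this example.

```shell
base=$(mktemp -d)    # throwaway parent directory
# cd runs only if mkdir succeeded
mkdir -p "$base/scratch/try/new_dir" && cd "$base/scratch/try/new_dir"
pwd
```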

You can make an empty file by touch

$ touch empty.txt
$ ls 
empty.txt

You can also use relative paths to navigate directories:

$ cd ..  # go to the parent directory
$ cd ../..  # go to the parent of the parent

ls

ls is one of the most frequently used commands.

In Linux (and Unix), files and directories whose names begin with . are hidden.

$ touch .secret
$ ls  # prints nothing

To view hidden files, pass -a:

$ ls -a
. .. .secret  # . and .. are the current and parent directories

Use the -l flag to list details:

$ ls -al 
drwxr-xr-x  4 virtus virtus 4096 Oct 14 21:28 .
drwx------ 27 virtus virtus 4096 Oct 14 21:29 ..
-rw-r--r--  1 virtus virtus    0 Oct 14 21:28 .secret

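The first column, such as drwxr-xr-x, shows the file type and permissions. The leading d means directory. The remaining nine characters are three rwx groups, giving the read, write, and execute permissions of the owner, the group, and others, in that order (this will be explained later).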
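As a preview, the sketch below shows how the permission string changes when permissions change. It uses chmod, which appears later in this guide; script.sh is a made-up file name.

```shell
demo=$(mktemp -d)
cd "$demo"
touch script.sh

# 640 means: owner rw-, group r--, others ---
chmod 640 script.sh
ls -l script.sh     # the line starts with -rw-r-----

# u+x adds the execute permission for the owner (user) only
chmod u+x script.sh
ls -l script.sh     # the line now starts with -rwxr-----
```

The leading - shows that script.sh is a regular file and not a directory.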

You can sort files by modification time with -t, and reverse the order with -r:

$ ls -lt  # sort by modification time, newest first
$ ls -ltr  # sort by modification time, oldest first

rm

rm is used to remove files.

$ touch a_file
$ rm a_file

To remove a directory, pass the -r flag, which stands for recursive.

$ mkdir a_dir; touch a_dir/a a_dir/b
$ rm a_dir  
rm: cannot remove 'a_dir': Is a directory
$ rm -r a_dir  # succeeds

NOTE: remove with caution! rm deletes the file from the file system, and the removal is extremely difficult to revert.

You can instead use a tool like trash-cli, which will move the files into a trash-can.

Package Manager

Package manager on Linux

Different Linux distros are equipped with different package managers, but their functionalities are similar.

Arch and Arch-derived distros, including Manjaro, Garuda, and EndeavourOS, use pacman. Debian and its derivatives, including Ubuntu and Linux Mint, use apt. Red Hat and related distros use dnf.

Installing software and its dependencies needs just one line:

$ sudo pacman -S neovim  # Arch, Manjaro, EndeavourOS, etc
$ sudo apt install neovim  # Debian, Ubuntu, Linux Mint, etc
$ sudo dnf install neovim  # RHEL, CentOS, Fedora, etc
$ sudo snap install nvim --classic  # Ubuntu; the snap is published under the name nvim

These commands install the package system-wide and need sudo privilege, which requires you to enter the current user’s password. The password is not shown as you type, but it is still being received. Just type the password and press enter when done. You should then have neovim, a terminal-based text editor, installed.

If you do not have sudo privilege, the command will be refused. Check the FAQ for how to grant sudo privilege to a user.

This is how to remove a package:

$ sudo pacman -R neovim  # Arch, Manjaro, EndeavourOS, etc
$ sudo apt remove neovim  # Debian, Ubuntu, Linux Mint, etc
$ sudo dnf remove neovim  # RHEL, CentOS, Fedora, etc
$ sudo snap remove nvim  # Ubuntu; the snap is published under the name nvim

Sometimes a package manager will not allow you to remove certain packages, because other packages depend on them. If you insist on removing them, your system may break.

For system upgrades:

$ sudo pacman -Syu  # Arch, Manjaro, EndeavourOS, etc
$ sudo apt update && sudo apt upgrade  # Debian, Ubuntu, Linux Mint, etc
$ sudo dnf update  # RHEL, CentOS, Fedora, etc

Certain packages are registered under different names in different package managers. For example, the libgl library is called libgl1-mesa-glx for apt and mesa-libGL for dnf. libgl is not listed as a single package but included in glut for pacman.

The easiest way to find the name of your desired package is to search on Google.

You can check more usages of package manager here.

Package Manager on MacOS and Windows

Homebrew is a third-party package manager for macOS, and Winget is the one for Windows. The overall experience of using them, however, is inferior to that of Linux package managers.

Advanced (but Common) Concepts

Bashrc

Bash can be configured with the bashrc file, located at $HOME/.bashrc.

The bashrc file contains bash commands that are executed whenever an interactive bash shell starts.

To see it work, append this line to the end of ~/.bashrc:

echo "I will become a great programmer!"

When you open a new terminal, you will see I will become a great programmer! printed before the first prompt.

After you update .bashrc, the current session does not automatically load the new configuration. You can load it manually with source ~/.bashrc.

Prompt

The bash prompt is stored in the variable PS1. You can change its value by running the following command:

PS1="\
\[\033[01;32m\]\u@\h\[\033[00m\]\
:\
\[\033[01;34m\]\w\[\033[00m\]\
:\033[02m\d \t\033[00m\n\$ "

A different prompt will be shown. If you like it, append the command to .bashrc.

For more prompt customisation, see the wiki and the prompt generator.

Alias

An alias in bash is defined this way:

alias ll="ls -lar --sort t --color"  # append to .bashrc for a persistent change

Now ll is automatically expanded into ls -lar --sort t --color. To bypass an alias and run the original command, prepend a backslash: if ls itself were aliased, \ls would run the unaliased ls.

Path and Executable

In Linux, an executable file can be directly executed by entering its path.

Let us create a bash file and make it executable:

$ cd ~; mkdir executable_demo; cd executable_demo;
$ echo "echo hi executable" > hi.bash
$ chmod +x hi.bash  # make it executable
$ ~/executable_demo/hi.bash  # execute it by entering its full path
hi executable
$ ./hi.bash  # execute it by entering its relative path
hi executable

You can append the directory to the PATH environment variable. (For a persistent change, modify the bashrc.)

$ PATH="$PATH:$HOME/executable_demo"
$ hi.bash  
hi executable

When you enter a command, bash searches the directories listed in $PATH, in order, and executes the first matching executable.

Check your current PATH variable this way:

$ echo $PATH  # it should look something like this
/home/virtus/.local/bin:/home/virtus/bin:/usr/local/bin:/usr/bin:/home/virtus/.cargo/bin/
$ echo $PATH | tr ':' '\n'  # replace each : with \n, the newline character
/home/virtus/.local/bin
/home/virtus/bin
/usr/local/bin
/usr/bin
/home/virtus/.cargo/bin/
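The "first match wins" rule can be checked with a small experiment. This is a sketch under assumed names: the directories first and second and the script hello are invented for the demonstration. command -v is a shell builtin that prints what bash would run for a given name.

```shell
# Create two directories that each contain an executable named hello.
demo=$(mktemp -d)
mkdir "$demo/first" "$demo/second"
printf '#!/bin/sh\necho "from first"\n'  > "$demo/first/hello"
printf '#!/bin/sh\necho "from second"\n' > "$demo/second/hello"
chmod +x "$demo/first/hello" "$demo/second/hello"

# Put both directories at the front of PATH; first is searched before second.
PATH="$demo/first:$demo/second:$PATH"

command -v hello   # prints the full path of the hello that bash will run
hello              # prints: from first
```

Swapping the order of the two directories in PATH makes bash pick the other script.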

Glob pattern

Strings in bash containing *, ?, or [ are glob patterns: wildcards that match strings according to these rules (among others):

  1. * matches any string, including the empty string;
  2. ? matches exactly one character of any sort;
  3. [...] matches any single character contained in the brackets.

Glob patterns are different from regular expressions, or regex.

See the following examples:

$ touch a.txt b.txt abc.txt ab.md ac.md a.md  # create some files for experiment
$ ls *.txt  # list any files that end with .txt
abc.txt  a.txt  b.txt
$ ls ??.*  # list files of any extension that have exactly two characters before the .
ab.md  ac.md
$ ls [ab].txt
a.txt b.txt
$ ls [ab]*.txt
abc.txt  a.txt  b.txt
$ rm *  # remove all (non-hidden) files in the current directory; use with care
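To make the difference between glob patterns and regex concrete, here is a short sketch with made-up file names. It assumes bash's default settings, under which a pattern that matches nothing is left unchanged.

```shell
demo=$(mktemp -d)
cd "$demo"
touch notes.txt notes.md todo.txt

# Glob: the shell itself expands *.txt into file names before echo runs.
echo *.txt                  # notes.txt todo.txt

# Regex: grep filters lines of text. Here \. is a literal dot and .* means "any characters".
ls | grep '^n.*\.txt$'      # notes.txt

# The same characters mean different things. As a glob, n.*txt requires a literal dot right after n.
echo n.*txt                 # n.*txt (nothing matches, so the pattern is printed as it is)
```

The glob is expanded by bash before the command starts, whereas the regex is interpreted by grep itself, which is why the regex is quoted.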

Stdin, Stdout, Stderr

Stdin, stdout, and stderr are abbreviations for standard input, standard output, and standard error. These are the stdio (standard input and output) streams, which the operating system uses as an abstraction layer to handle the inputs and outputs of all programs.

For simplicity:

  1. Stdin, stdout, and stderr are represented by fd 0, 1, and 2, respectively. Fd means file descriptor.
  2. If a bash command runs successfully, it may return some information through stdout, which will be printed on the terminal;
  3. If a command fails, it may return information through stderr, which will also be printed on the terminal and, from the user's perspective, likely looks no different from stdout;
  4. Many commands can handle input both from command arguments and from stdin;
  5. Bash has many tools to handle stdio, including | (pipeline), > (output redirection), < (input redirection), etc.
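The file descriptors become visible once the streams are redirected to different places. The following is a minimal sketch; the file names are invented, and the exact wording of the error message depends on your system.

```shell
demo=$(mktemp -d)
cd "$demo"
touch exists.txt

# ls writes the name of the existing file to stdout (fd 1)
# and a complaint about the missing file to stderr (fd 2).
# "|| true" only keeps a script going even though ls reports a failure.
ls exists.txt missing.txt > out.log 2> err.log || true

cat out.log    # exists.txt
cat err.log    # the error message about missing.txt

# 2>&1 sends stderr to wherever stdout currently points, merging both streams.
ls exists.txt missing.txt > both.log 2>&1 || true

# < feeds a file to the stdin of a command.
wc -l < out.log   # 1
```

Without any redirection, both messages would appear on the terminal and look alike, as described in point 3.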

See stdio(3) for details.

Symbols controlling input and output are the most exploited features in bash. They are used to produce obfuscated code like this:

$ for i in $(echo -e 'G\nM\nK'); do du -hsx *  2>/dev/null | grep '[0-9]'$i | sort -rn; done

This code lists the files and directories in the current directory from largest to smallest.

The following section is a gentle introduction to bash I/O symbols.

Pipeline, |

A pipeline redirects the stdout of the previous command to the stdin of the next command. It is used with commands that accept input from stdin.

Here is a naive example:

$ echo "hello" | wc -c  # count the bytes in the output: prints 6 (five letters plus a newline)

Many useful and creative commands depend on pipeline.

$ grep -r "pattern" . | wc -l  # count the number of lines that contain "pattern" in the current directory
$ ls -l | grep "drwx"  # list all directories in the current directory
$ grep -r "pattern" . | sort | uniq  # sort and remove duplicates
$ find . -type f | less  # list all files recursively and open in less

Most commands treat stdin and command arguments differently.

One example is counting the total number of lines in all files under the current directory, using find, wc, xargs, and a pipeline.

wc -l <filename> counts the number of lines in a file; find . -type f lists all files under the current directory. find . -type f | wc -l will, however, only count the number of lines in the output of find . -type f, i.e., how many files there are under the current directory.

We need, instead, to pass the output of find as the arguments of wc. xargs is the command used for this.

$ find . -type f | xargs wc -l # count total number of lines in files of current directory
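One caveat: by default xargs splits its input on whitespace, so a file name containing a space is passed as two separate arguments. A common remedy, sketched below with made-up file names, is to separate the names with a NUL byte by using find -print0 together with xargs -0.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'one\ntwo\n' > plain.txt
printf 'three\n'    > "with space.txt"

# -print0 and -0 separate file names with a NUL byte, so spaces in names are safe.
find . -type f -print0 | xargs -0 wc -l

# To get only the grand total, concatenate the files and count once.
total=$(find . -type f -print0 | xargs -0 cat | wc -l)
echo "$total"   # 3
```

With more than one file, wc -l prints a count per file followed by a total line; the second form prints the total alone.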

Bash History, !, arrow keys, and Ctrl-r.

The history command shows the bash command history.

Here is an example:

$ history
  (omitted)
  928  a
  929  history 
  930  man history 
  931  ls
  933  man history
  934  clear

You can use the up and down arrow keys to replace the current command with commands from the history. Press Ctrl-r to search the history interactively.

!<n> is the shortcut to execute command number n in the history.

Check man history for more.

Command Line Workflows

Almost everything that can be done with a GUI can also be done with the CLI.

Here is a selected list of common Linux workflows on the CLI, along with some esoteric examples. For more examples, please consult Stack Overflow and ChatGPT.

Controlling the Computer

The CLI can shut down or reboot the computer.

$ shutdown now # shut down the computer immediately
$ reboot # reboot the computer

Sound, display, and keyboard

$ # These commands may not be present on all systems or may not work in all environments
$ brightnessctl set 10%-  # decrease the brightness of the screen by 10%
$ amixer set Master 5%+  # increase the volume by 5%. amixer is an ALSA tool and may not work on every audio setup.
$ nmtui  # an intuitive tool for connecting to Wi-Fi
$ gammastep -O 4000  # set the display color temperature to 4000K (for night light)

Many system configurations, such as the mouse speed, the display, and the screen resolution, are controlled by the desktop environment. Desktop environments often offer CLI tools for more control.

Here is an example for GNOME:

$ gsettings set org.gnome.desktop.background picture-uri "file:///path/to/your/image.jpg"

Flash iso image onto a USB and make it a bootable device

The first step of installing an operating system is to flash its installation image onto a USB drive. This can be done with the command cp on Linux.

First, find the path of your USB.

$ lsblk  # short for list block device
sda           8:0    1  58.6G  0 disk  # This is the USB
├─sda1        8:1    1   4.4G  0 part 
├─sda2        8:2    1   4.9M  0 part 
├─sda3        8:3    1   300K  0 part 
└─sda4        8:4    1  54.2G  0 part 
nvme0n1     259:0    0 476.9G  0 disk # This is the SSD
├─nvme0n1p1 259:1    0     1G  0 part /boot
├─nvme0n1p2 259:2    0    16G  0 part [SWAP]
└─nvme0n1p3 259:3    0 459.9G  0 part /

What is the output of lsblk?

A physical disk is presented as a block device on Linux. Linux, like Unix, adopts the philosophy of “everything is a file”. Block devices are presented as files under the /dev directory.

/dev/sda is the path of the block device representing the USB drive; /dev/sda1 is its first partition.

USB devices are usually named sda, sdb, etc. You can identify yours by its capacity, or by comparing the output of lsblk before and after plugging in the USB drive. The path for the USB drive will be in the form of /dev/sda.

Assuming the path of the required image is live-disk.iso, the following command will flash the image onto the USB drive and make it a bootable device. Writing to a device usually requires root privilege, and everything already on the USB drive will be overwritten.

Note, the destination is not /dev/sda1, but /dev/sda, without the number.

$ cp live-disk.iso /dev/sda  # This may take a while.

There are also other methods; see the Arch Wiki.

Making a bootable device on Windows or on Mac is more complicated.

The easiest method on Windows or on Mac is to download a media creation tool, such as balenaEtcher, and let it do the job.

Media creation tools may be more than 200 MB in size, and many times they do not work as well as the cp command.

Esoteric Examples

$ browsh # open a browser in terminal
$ tiv image.jpg # view an image in terminal
$ beep # make a beep sound.

Linux Ricing

Manually Install Linux

FAQ

Sudo Privilege

If you used a modern OS installer when installing the OS, you should have been granted sudo privilege automatically.

If not, you can log in as a root user, and manually grant sudo access to a normal user.

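$ su -  # log in as the root user. Enter the root password
$ visudo  # open /etc/sudoers in an editor (vi by default) and check the syntax on save
$ nano /etc/sudoers  # edit the same file directly in nano; unlike visudo, this does not check the syntax

Enter this line into /etc/sudoers: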

<username> ALL=(ALL:ALL) ALL

You need to know the root password to follow this procedure. If you do not know the root password, try this. Otherwise, the only option left is reinstalling the OS.

NOTE: This procedure is only for a personal OS, and is bad practice in general. If you are a system administrator, you should set up user groups with appropriate privileges and organise users into the proper groups.

Linux, Unix?

Let’s Write Code

C

C++

Rust

Bash

Python

Python Reproducibility

Java

Assembly

Operating System

Appendix

The Power of Open Source

XZ Anecdote1

In March 2024, a backdoor was found in the popular open-source archiving tool xz. Hackers could use this backdoor to gain control of a computer in certain circumstances. The backdoor was introduced through a series of malicious changes written by one of xz’s maintainers.

The interesting part of the story is that the backdoor was found by Andres Freund, a developer at Microsoft, when he noticed that ssh login time had increased from about 0.3 seconds to about 0.8 seconds. He then analysed the execution of the ssh login, and the source code and commit history of xz, and found the backdoor.

Such a backdoor, if it existed in closed-source software on Windows or macOS, would likely never be discovered: it is common for Windows or macOS to slow down suddenly for no apparent reason, and reviewing the logs and source code of closed-source software is harder still.

Many, however, cite this incident to criticise open-source software for a lack of reliability. I hereby refute their opinion. Closed-source software does not offer more reliability, and its creators may inject malicious backdoors with equal probability; the difference is that no one will ever find those backdoors.

1

Andres Freund found the backdoor. News report.

Buying a Computer

This is a short guide for choosing a computer. For comprehensive advice, search on YouTube or Bilibili.

Laptop

The pragmatic advice is to pick a laptop from a reputable company, within your price range, that was made no more than one year ago.

My list of reputable companies includes Lenovo, Xiaomi, Asus, Acer, Dell, and HP.

I advise against buying old, second-hand, or refurbished laptops, because a computer made today may be similar in price to one made three years ago, but of much better quality. This is due to the rapid development of computer manufacturing technology, which has never slowed down since its inception.

I would also avoid Chromebooks and laptops made by Microsoft, as they offer a poor quality-to-price ratio.

Desktop

Choices for pre-assembled desktops made by reputable companies are few.

If you are not an expert, it is never a good idea to purchase a computer from a lesser-known brand, because those manufacturers may exploit your ignorance and sell poor products at high prices.

So my advice is either to buy a laptop from a reputable company, or to seek advice from a trustworthy source before buying a desktop.

There is also the option to buy each piece of the hardware separately and assemble the computer by yourself. Again, do your research beforehand.

Apple Computers

Apple computers are of great quality, but they are also much more expensive compared to other products of similar quality.

If you want to buy an Apple computer, check the alternatives in a similar price range. You will likely find them equipped with more RAM and a more powerful CPU and GPU.

That being said, Apple computers can be a good choice if you don’t care about money.

Dedicated GPU?

This passage does not apply to Apple computers.

There are two kinds of GPU on the market for non-Apple computers: integrated and dedicated. Most CPUs incorporate a small GPU on the same die; these are called integrated GPUs. A dedicated GPU is independent of the CPU, likely more expensive, but much more powerful.

Some computers only have an integrated GPU, and the classic question is whether one needs a dedicated GPU.

The short answer is that anyone performing tasks related to gaming, video or photo editing, AI/ML, and simulation will need a dedicated GPU. Others likely do not.

A dedicated GPU, nevertheless, is always helpful if it can be fitted into the budget.

How much RAM?

The more the better, and 12 GB is the minimum.

Terminal References

Glossary

A

  • API, Application Programming Interface

B

C

  • CPU, Central Processing Unit

D

  • Disk

E

  • e.g., exempli gratia (Latin), for the sake of the example
  • etc., et cetera (Latin), and others

F

  • Frequency, or Clock Rate

G

  • GPU, Graphics Processing Unit

H

I

  • i.e., id est (Latin), that is (literally)

J

K

L

M

N

  • N.B., nota bene (Latin): Take good notes! (imperative). Literally means note well.

O

P

Q

R

  • RAM, Random Access Memory

S

T

U

V

W

X

Y

Z

Writing Guide

呂不韋乃使其客人人著所聞,集論以為八覽、六論、十二紀,二十餘萬言,以為備天地萬物古今之事,號曰呂氏春秋。布咸陽市門,懸千金其上,延諸侯游士賔客有能增損一字者予千金。– 史記 呂不韋列傳第二十五

Lü Buwei ordered … to write a book called Lü’s Annals, recording everything that had taken place under heaven since the creation. (After it was finished) the book was exhibited at the market gate, with a thousand pieces of gold hung above it as a reward for anyone who could add or delete a single word. – The Records of the Grand Historian, Sima Qian

The ultimate standard for a tutorial like this is to be informative, succinct, easy to read, and enjoyable to read, in descending order of importance.

To put it rhetorically, the removal of any one word will hamper the passage, while the inclusion of another will make it redundant.

The goal, however, is neither to write grand literature in the style of Shakespeare or Walter Scott, nor to produce an encyclopedia of boring facts.

Informative

A passage is informative if it provides good information. Delete every sentence that does not achieve this.

Succinct

A succinct passage is an informative passage that is short. The goal is to express the same information in the fewest words without losing any quality or making the writing awkward.

Here is a non-succinct passage:

The common standard for a piece of succinct writings is that it can express a lot of information in relatively small number of words, while at the same time expressing the same idea without losing any details or make the writing bad in expression.

In general, I believe all writing should be succinct, technical writing in particular.

Pointers to Further Study

Bibliography

  1. Silberschatz, A., Galvin, P.B. and Gagne, G. (2018). Operating system concepts. [online] Hoboken, N.J Wiley. Available at: https://os.ecci.ucr.ac.cr/slides/Abraham-Silberschatz-Operating-System-Concepts-10th-2018.pdf.

Postscript

Postscript comes from the Latin phrase post scriptum, meaning written after.

I designed this book to serve as a guide rather than as the definitive source of learning. This means it does not contain every detail one needs to know; instead, it surveys the areas the author deems important for a beginner and offers the author’s comments along with information for further study. In particular, this book gives equal emphasis to the theories and development of computer science and to pragmatic tips for writing code.

Such a design is deliberate, as the difficulty of learning to code today seems to lie not in a lack of information, but in its lack of organisation and credibility.

The author strives to make this book informative, succinct, easy to read, and attractive, and hopes it will be helpful to fellow comrades in their early journey through computer science.

The author acknowledges that he is constrained and prejudiced by his experience. Rather than writing only undeniable facts, and so turning this book into a lifeless encyclopedia, the author believes it is worthwhile to invigorate it by presenting his opinions where appropriate, at the price of potentially conflicting with those of his audience.

As of July 2024, the book is still in early draft. The content is prereleased on book.yetin.net and Github.

This book is written in markdown and built with mdBook. It is deployed on Netlify with domain name registered on Squarespace.