Process

From Rice Wiki
Revision as of 18:27, 4 October 2024 by Rice

A program is a passive set of machine code instructions and data stored in an executable image. A process can be thought of as this passive program in action. More formally, it is one or more threads in their own address space.

Processes are independent, separate tasks. If one process crashes, it does not directly affect any other process. The end goal of processes is to give the illusion of having multiple CPUs when executing multiple programs. This is done by virtualization.

Machine state

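Because a process represents the execution of a program, it includes first and foremost the state of that program. This includes the memory the process can address (its address space), the CPU registers (such as the program counter and stack pointer), and I/O information such as the files it has open.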

Beyond the state of the program, each process is allocated its own process identifier (PID) and virtual memory among other things.

Virtualization

Virtualization of the CPU is what allows more processes than CPUs to exist. It is implemented by low-level time-sharing mechanisms. On top of these mechanisms reside scheduling policies, which decide which process should run next.
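The split between mechanism and policy can be illustrated with a toy simulation. This is only a sketch: the round-robin policy, the function name round_robin, and the job names are assumptions chosen for illustration, not something the text above prescribes.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """Simulate one CPU shared by several jobs.

    jobs maps a (hypothetical) job name to the units of CPU time it needs.
    Returns the order in which jobs were given the CPU.
    """
    ready = deque(jobs.items())
    timeline = []
    while ready:
        # Policy: pick the job at the front of the queue.
        name, remaining = ready.popleft()
        # Mechanism: let it run for one time slice.
        timeline.append(name)
        remaining -= time_slice
        if remaining > 0:
            # Not finished yet, so it goes to the back of the queue.
            ready.append((name, remaining))
    return timeline


print(round_robin({"A": 3, "B": 1, "C": 2}, time_slice=1))
```

With a single simulated CPU, all three jobs make progress in interleaved slices, which is what creates the illusion of several CPUs.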

API

Typical process APIs include the following:

  • Create - Start a new process to run a program
  • Destroy - Kill a process forcefully
  • Wait - Wait for a process to stop running
  • Status - Obtain current information about a process

Many other miscellaneous controls are possible, such as suspending a process and resuming it later.
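As a rough illustration, the operations above map onto Python's standard subprocess module. The function name demo_process_api and the two short child programs are made up for this sketch.

```python
import subprocess
import sys


def demo_process_api():
    """Create, inspect, wait for, and destroy child processes."""
    # Create: start a child that exits on its own with status 3.
    quick = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    # Wait: block until the child has finished and collect its exit status.
    quick_status = quick.wait()

    # Create: start a second child that would otherwise sleep for a minute.
    sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # Status: poll() returns None while the child is still running.
    still_running = sleeper.poll() is None
    # Destroy: kill the child forcefully, then collect it.
    sleeper.kill()
    sleeper.wait()
    return quick_status, still_running, sleeper.returncode


if __name__ == "__main__":
    print(demo_process_api())
```

The exact return code of the killed child depends on the platform, so the sketch only reports it rather than assuming a value.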

Process creation

  1. Load code and static data into memory. Modern operating systems do this lazily, loading pieces only as they are needed.
  2. Allocate memory for the run-time stack and the heap.
  3. Perform initialization tasks, especially those related to I/O.
  4. Start the program at its entry point.

Process state

Figure: a process state diagram from the Three Easy Pieces textbook.

A process can be in one of three states:

  1. Running - Its instructions are currently being executed by the CPU.
  2. Ready - It is ready to run, but the OS has chosen not to run it at the moment (for example, because another process holds the CPU under time-sharing).
  3. Blocked - It is waiting for some event, such as I/O, to finish (for example, a requested write to disk).

A process is scheduled when the OS moves it from Ready to Running. It is descheduled when it is moved from Running to Ready. This decision is made by the scheduler.
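The states and the moves between them can be sketched as a small state machine. The scheduled and descheduled transitions come from the text above. The two I/O transitions are an assumption based on the usual form of this diagram, and the names State, TRANSITIONS, and move are hypothetical.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    READY = "ready"
    BLOCKED = "blocked"


# Allowed transitions, keyed by (from, to), with the event that causes each.
TRANSITIONS = {
    (State.READY, State.RUNNING): "scheduled",
    (State.RUNNING, State.READY): "descheduled",
    # Assumed from the usual diagram: I/O moves a process in and out of Blocked.
    (State.RUNNING, State.BLOCKED): "I/O initiated",
    (State.BLOCKED, State.READY): "I/O done",
}


def move(current, target):
    """Return target if the transition is allowed, else raise ValueError."""
    if (current, target) not in TRANSITIONS:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


state = State.READY
state = move(state, State.RUNNING)   # scheduled
state = move(state, State.BLOCKED)   # starts I/O
state = move(state, State.READY)     # I/O finished, waiting for the CPU again
```

Note that in this sketch a blocked process cannot go straight back to Running: it first becomes Ready and must be scheduled again.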

The kernel also keeps a table with the metadata of every process. This includes the PID, the parent, the page tables, etc.

Processes in Linux

When the computer starts, the only program the kernel executes in user space is init (visible with ps -f 1). This process is given a PID of 1 and is the only user-space process with no parent process.

In contrast to the init process, all other processes are created by parent processes. This is done with fork and exec. First, the parent (for example, a shell) calls fork to create a clone of itself. Then the clone calls exec to replace the program it is running with another program.

The fork syscall clones the calling process almost perfectly. For example, the child process does not start running at main; instead, it continues from the point where the parent called fork. The difference the two processes can observe directly is the return value of fork: in the parent it is the PID of the child, whereas in the child it is 0. The resulting execution order is notably non-deterministic, as the CPU scheduler is free to run either process before the other.
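A minimal sketch of this behaviour, using Python's os.fork wrapper around the syscall (Unix only). The function name fork_demo and the pipe used to report the child's view back to the parent are choices made for this example.

```python
import os


def fork_demo():
    """Fork once and report what each side saw as fork's return value."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Report that value to the parent, then exit
        # immediately so the child never runs the rest of the parent's program.
        try:
            os.close(read_end)
            os.write(write_end, str(pid).encode())
        finally:
            os._exit(0)
    # Parent: fork returned the child's PID.
    os.close(write_end)
    child_saw = int(os.read(read_end, 16).decode())
    os.close(read_end)
    # Collect the finished child so it does not linger as a zombie.
    _, status = os.waitpid(pid, 0)
    return pid, child_saw, status


if __name__ == "__main__":
    parent_saw, child_saw, _ = fork_demo()
    print(f"parent saw {parent_saw}, child saw {child_saw}")
```

Both branches of the if statement execute, each in a different process, which is why the same call appears to return two different values.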

The wait syscall halts execution of the parent until a child has finished. It also lets the parent collect the child's exit status.
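A short sketch of waiting, using os.waitpid from Python's standard library (Unix only). The function name wait_demo and the exit code 7 are arbitrary choices for illustration.

```python
import os


def wait_demo(exit_code=7):
    """Fork a child that exits with exit_code and wait for it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: finish immediately with the chosen exit code.
        os._exit(exit_code)
    # Parent: block here until the child has terminated.
    reaped_pid, status = os.waitpid(pid, 0)
    # Decode the raw status: did the child exit normally, and with what code?
    return reaped_pid == pid, os.WIFEXITED(status), os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(wait_demo())
```

Because the parent blocks in waitpid, any code placed after that call is guaranteed to run after the child has finished, which removes the ordering uncertainty introduced by fork.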

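To run a different program, the exec syscall can be used. It does not create a new process. Instead, it replaces the program running in the current process with the one specified in its arguments, and on success it never returns to the caller.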
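The usual combination of the three calls can be sketched as follows, using os.fork, os.execvp, and os.waitpid (Unix only). The function name run_program and the exit code 127 for a failed exec are conventions assumed for this example.

```python
import os
import sys


def run_program(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replace the child's program. On success this call never returns.
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed (for example, program not found).
            os._exit(127)
    # Parent: wait for the child and decode its exit code.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_program([sys.executable, "-c", "import sys; sys.exit(5)"]))
```

The PID of the child stays the same across exec; only the program it runs changes.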

Motivation of fork

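The motivation behind this unintuitive set of APIs (fork, wait, exec) is to give the parent's code a chance to alter the environment of the child process. Between fork and exec, the child is still running code written by the parent, so it can make changes (such as redirecting its output) before the new program starts.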
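Output redirection is a common example of such an alteration. The sketch below assumes a Unix system. The function name run_with_stdout_to_file is hypothetical, and the example illustrates the idea rather than how any particular shell implements it.

```python
import os
import sys


def run_with_stdout_to_file(argv, path):
    """Run argv in a child whose standard output goes to the file at path."""
    pid = os.fork()
    if pid == 0:
        try:
            # Between fork and exec the child still runs the parent's code,
            # so it can rearrange its own environment: here, it points file
            # descriptor 1 (standard output) at a file.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)
                os.close(fd)
            # The new program inherits the redirected descriptor.
            os.execvp(argv[0], argv)
        finally:
            # Only reached if something above failed.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    run_with_stdout_to_file([sys.executable, "-c", "print('hello')"], "out.txt")
```

The program being run needs no knowledge of the redirection: it writes to standard output as usual, and the parent's own standard output is unaffected because the change was made only in the child.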

Sources