How fork and exec work on Unix

March 11, 2023

This article is about how fork and exec work on Unix. You may already know this, but some people don't. When I learned about it a few years ago, I was amazed.
What we want to do is start a process. We've talked a lot about system calls on this blog: every time you start a process or open a file, that is a system call. So you might think there is a system call like this:
start_process(["ls", "-l", "my_cool_directory"])
This is a reasonable idea, and apparently it is how things work on DOS or Windows. I was going to say that this is not how it works on Linux. However, I looked at the documentation and there is in fact a posix_spawn function that does basically this (on Linux it is a C library function rather than a system call of its own). That is beyond the scope of this article, though.
Fork and exec
On Linux, posix_spawn is implemented behind the scenes with two system calls, fork and exec (actually execve; modern C libraries may use a fork variant such as vfork or clone), and those are what people usually use directly anyway. On OS X people apparently use posix_spawn, and fork and exec are discouraged, but we are going to talk about Linux.
Every process on Linux lives in the process tree. You can view the tree by running the pstree command. The root of the tree is init, with PID 1. Every process (except init) has a parent process, and a process can have many child processes.
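To make the parent/child relationship concrete, here is a minimal sketch in Python that walks from the current process up towards init. It assumes a Linux system with /proc mounted; the helper names parent_pid and ancestors are made up for this illustration.

```python
import os


def parent_pid(pid):
    """Return the parent PID of `pid` from /proc/<pid>/status, or 0 if unknown."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            for line in status_file:
                if line.startswith("PPid:"):
                    return int(line.split()[1])
    except OSError:
        pass  # no /proc, or the process is not visible to us
    return 0


def ancestors(pid):
    """Return [pid, its parent, its grandparent, ...] for as far as we can see."""
    chain = [pid]
    while True:
        pid = parent_pid(pid)
        if pid <= 0:  # init reports a parent PID of 0
            break
        chain.append(pid)
    return chain


if __name__ == "__main__":
    print(" <- ".join(str(p) for p in ancestors(os.getpid())))
```

On an ordinary system the chain ends at PID 1; inside a container it may stop earlier, because the parent can live outside the PID namespace.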
So, suppose I want to start a process called ls to list a directory. Can I just start an ls process directly? No.
What I have to do instead is create a child process that is a clone of myself, and then that child process has its "brain" eaten and becomes ls.
In the beginning it looks like this:
My parent
|- me
Then I run fork() to create a child process, which is a clone of myself:
My parent
|- me
|-- clone of me
Then I have the child process run exec("ls"), and it looks like this:
My parent
|- me
|-- ls
When ls exits, I am almost back to being by myself:
My parent
|- me
|-- ls (zombie)
At this point ls is actually a zombie process. That means it is dead, but it is still hanging around in case I want to check its return value (using the wait system call). Once I have collected its return value, I am all by myself again.
My parent
|- me
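The zombie step can be observed directly. The following is a rough sketch in Python, assuming Linux with /proc mounted; zombie_demo and proc_state are hypothetical helper names. The child exits right away, the parent looks at it before and after calling wait, and os.waitid with WNOWAIT is used only to make sure the child has really exited before we look.

```python
import os


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        stat = stat_file.read()
    # Format is "pid (comm) state ..."; comm may contain spaces or ")",
    # so split after the last ")".
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_demo(exit_code=7):
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: die immediately with a known return value
    # Block until the child has exited, but do not reap it yet (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before_wait = proc_state(pid)  # dead but not yet waited for
    _, status = os.waitpid(pid, 0)  # collect the return value
    still_listed = os.path.exists(f"/proc/{pid}")
    return state_before_wait, os.WEXITSTATUS(status), still_listed


if __name__ == "__main__":
    print(zombie_demo())
```

On a typical Linux system this should report state "Z" (zombie) before the wait, the exit code 7, and no /proc entry once the parent has collected the return value.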
Code implementation for fork and exec
This is an exercise you have to do if you are going to write a shell.
It turns out that with some C or Python skills you can write a very simple shell (like bash) in a few hours. (At least if you have someone sitting next to you who knows a bit more than you do; without that it takes a little longer.) I have done it, and it is really great.
Here is what fork and exec look like in a program, written as a small C program. Remember, fork can fail too.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    // From here on, two processes are running this code.
    // Who is "I"? It may be the child process or the parent process.
    if (pid == 0) {
        // I am the child process now.
        // "ls" eats my brain and I turn into a completely different program.
        execlp("ls", "ls", (char *)NULL);
        // exec only returns if it failed.
        perror("execlp");
        _exit(127);
    } else if (pid == -1) {
        // Oh no, the fork failed, it's a disaster!
        perror("fork");
        return 1;
    } else {
        // I am the parent process, yeah!
        // I carry on being my cool self.
        // I can wait for the child process to finish if needed.
        waitpid(pid, NULL, 0);
    }
    return 0;
}
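The same fork, exec, wait pattern is the core of a shell. As a rough sketch (not a real shell: no pipes, redirection, or builtins other than exit), here it is in Python, where os.fork, os.execvp, and os.waitpid are thin wrappers over the system calls. The function name run_command and the "$ " prompt are choices made for this example.

```python
import os
import shlex
import sys


def run_command(argv):
    """Fork, exec `argv` in the child, wait in the parent; return the exit code."""
    sys.stdout.flush()  # avoid duplicating buffered output in the child
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # only comes back (by raising) on failure
        except OSError as err:
            print(f"{argv[0]}: {err.strerror}", file=sys.stderr, flush=True)
        finally:
            os._exit(127)  # conventional "command not found" exit code
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)  # shell convention for death by signal


def main():
    while True:
        try:
            argv = shlex.split(input("$ "))
        except EOFError:
            break
        except ValueError as err:  # for example, an unbalanced quote
            print(f"parse error: {err}", file=sys.stderr)
            continue
        if not argv:
            continue
        if argv[0] == "exit":
            break
        run_command(argv)


if __name__ == "__main__":
    main()
```

The os._exit call in the child matters: if exec fails, the child must not fall through and keep running the parent's read loop as a second shell.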
What do I mean by "eats my brain"?
A process has many properties:
Open files (including open network connections)
Environment variables
Signal handlers (what happens when you press Ctrl+C in your program?)
Memory (your "address space")
Registers
The executable it is running (/proc/$pid/exe)
Cgroups and namespaces (related to Linux containers)
The current working directory
The user the program is running as
Other things I have not thought of yet
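When you run execve and let another program eat your brain, almost everything stays the same! You keep the same environment variables (unless you pass a different environment to execve), the same open files, the same ignored signals, and more. One caveat: a signal that was being caught by a handler function goes back to its default action, because the handler's code no longer exists in the new program.
The only things that change are the memory, the registers, and the program that is running, which is a pretty big deal.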
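One way to see what survives the brain transplant is to let the new program report back over a file descriptor it inherited. This is a minimal sketch in Python; the environment variable BRAIN_DEMO and the function exec_and_report are invented for the example. Note that Python marks new file descriptors as close-on-exec by default, so the pipe has to be made inheritable explicitly.

```python
import os
import sys

# Program run by the child after exec: it reports an environment variable and
# its working directory through the file descriptor number given in argv.
CHILD_CODE = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "report = os.environ.get('BRAIN_DEMO', '?') + '|' + os.getcwd()\n"
    "os.write(fd, report.encode())\n"
)


def exec_and_report(env_value):
    read_fd, write_fd = os.pipe()
    os.set_inheritable(write_fd, True)  # keep this descriptor open across exec
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.environ["BRAIN_DEMO"] = env_value  # set before the exec
            os.execv(sys.executable,
                     [sys.executable, "-c", CHILD_CODE, str(write_fd)])
        finally:
            os._exit(127)  # only reached if the exec failed
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()  # reads until the child closes its end
    os.waitpid(pid, 0)
    return data.decode()


if __name__ == "__main__":
    print(exec_and_report("still here"))
```

The freshly started interpreter knows nothing about the parent's Python objects, yet it still sees the environment variable, the working directory, and the open pipe.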
Why fork is not that expensive (copy on write)
You might ask: "If I have a process that uses 2 GB of memory, does that mean that every time I start a child process, all 2 GB of memory has to be copied? That sounds expensive!"
In fact, Linux implements copy on write for fork() calls. For the 2 GB of memory in the new process, it just says "look at the old process, it's the same!". Then, if either process writes to some of that memory, the kernel copies just the affected pages for it at that point. As long as the memory in the two processes stays the same, nothing needs to be copied.
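Copy on write itself is invisible to a program; what can be observed is its effect, namely that a write in one process never shows up in the other. A small sketch in Python, with fork_and_modify as a made-up name:

```python
import os


def fork_and_modify():
    data = bytearray(b"a" * 1024 * 1024)  # 1 MiB that both processes start with
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            data[0] = ord("b")  # the child writes to its view of the memory
            code = 0 if data[:1] == b"b" else 1
        finally:
            os._exit(code)  # report through the exit code
    _, status = os.waitpid(pid, 0)
    child_saw_its_write = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    # The parent's copy is untouched by the child's write.
    return child_saw_its_write, bytes(data[:1])


if __name__ == "__main__":
    print(fork_and_modify())
```

The child sees its own change while the parent still sees the original byte. How many pages the kernel actually copied along the way is an implementation detail that this sketch does not measure.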
Why do you need to know all this?
You might say: well, these details sound neat, but why do they matter? Does it make any difference to my day-to-day programming whether signal handlers or environment variables are inherited?
It can! For example, there is a very interesting bug on Kamal's blog. It discusses how Python sets the signal handler for SIGPIPE to ignore. That means that if you run a program from Python, it will ignore SIGPIPE by default! So the program behaves differently depending on whether it was started from a Python script or from a shell, and in this case that caused a strange problem.
So the environment of your program (environment variables, signal handlers, and so on) can matter, and it is inherited from the parent process. Knowing this is useful when debugging.
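The inheritance of an ignored signal can be reproduced in a few lines. This sketch uses SIGUSR1 rather than SIGPIPE, because a Python child would set SIGPIPE to ignore again at startup and hide the effect; the function name exec_child_ignores_sigusr1 is made up, and it must be called from the main thread.

```python
import os
import signal
import sys

# Program run by the child after exec: exit 0 if SIGUSR1 is ignored, else 1.
CHECK = (
    "import signal, sys\n"
    "ignored = signal.getsignal(signal.SIGUSR1) == signal.SIG_IGN\n"
    "sys.exit(0 if ignored else 1)\n"
)


def exec_child_ignores_sigusr1(reset_before_exec):
    """Ignore SIGUSR1 in the parent, then ask an exec'd child what it sees."""
    previous = signal.signal(signal.SIGUSR1, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            try:
                if reset_before_exec:
                    # Undo the inherited setting between fork and exec.
                    signal.signal(signal.SIGUSR1, signal.SIG_DFL)
                os.execv(sys.executable, [sys.executable, "-c", CHECK])
            finally:
                os._exit(127)  # only reached if the exec failed
        _, status = os.waitpid(pid, 0)
    finally:
        if previous is not None:
            signal.signal(signal.SIGUSR1, previous)  # restore the parent
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print(exec_child_ignores_sigusr1(False), exec_child_ignores_sigusr1(True))
```

Resetting the disposition between fork and exec is the usual remedy. In Python 3, subprocess.Popen does this for SIGPIPE by default through its restore_signals parameter, so the behavior described above mostly concerns older Python versions or code that calls fork and exec by hand.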
