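In the previous getpid system call experiment, I found that getpid only enters the system call the first time it runs. Later calls return directly and do not seem to go through the system call at all.
So first let's look at the flow of a system call made directly with int $0x80.
The function is as follows: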
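    #include <stdio.h>
    #include <sys/types.h>

    /* Call getpid directly through the legacy i386 int $0x80 gate. */
    int GetpidAsm(int argc, char **argv)
    {
        pid_t pid;
        asm volatile(
            "mov $20, %%eax\n\t"    /* 20 is the getpid system call number on i386 */
            "int $0x80\n\t"         /* trap into the kernel */
            "mov %%eax, %0\n\t"     /* the return value comes back in eax */
            : "=m"(pid)
            :
            : "eax"                 /* tell the compiler that eax is overwritten */
        );
        printf("current process's pid(ASM):%d\n", pid);
        return 0;
    }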
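When the system call runs, execution stops at the breakpoint set on sys_getpid, as shown in the figure below:
The SYSCALL_DEFINE macro in the figure stands out. One reference explains it as follows: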
It is used (obviously) to define the given block of code as a system call. For example, fs/ioctl.c has the following code :
    SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
    {
        /* do freaky ioctl stuff */
    }
Such a definition means that the ioctl syscall is declared and takes three arguments. The number next to the SYSCALL_DEFINE means the number of arguments. For example, in the case of getpid(void), declared in kernel/timer.c, we have the following code :
    SYSCALL_DEFINE0(getpid)
    {
        return task_tgid_vnr(current);
    }
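Nowadays, though, getpid(void) lives in kernel/sys.c.
- The logic is a little clearer now, so let's dig into SYSCALL_DEFINE.
The macro is defined in the file Linux/include/linux/syscalls.h.
An earlier article, "Linux系統調用之SYSCALL_DEFINE", already explains what this construction is for and why it is done this way. The ingenuity is hard not to admire.
This is why, when we call getpid(), we cannot see what the system actually does: the source contains no function written out as sys_getpid. The definition is generated from a stack of macros when the preprocessor expands them.
So let's do that expansion by hand for getpid.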
With execution stopped at the sys_getpid breakpoint, the debugger shows this location:

    SYSCALL_DEFINE0(getpid)
    {
        return task_tgid_vnr(current);
    }
The macros expand according to these definitions:

    #define SYSCALL_METADATA(sname, nb, ...)
    ...
    #define SYSCALL_DEFINE0(sname) \
        SYSCALL_METADATA(_##sname, 0); \
        asmlinkage long sys_##sname(void)

So SYSCALL_DEFINE0(getpid) expands to asmlinkage long sys_getpid(void).
After expansion, the function therefore becomes:

    asmlinkage long sys_getpid(void)
    {
        return task_tgid_vnr(current);
    }

At this point, where the call to sys_getpid() comes from is plain to see.
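The next thing evaluated is the argument, current, which I did not fully understand at first:

    DECLARE_PER_CPU(struct task_struct *, current_task);

    static __always_inline struct task_struct *get_current(void)
    {
        return this_cpu_read_stable(current_task);
    }

    #define current get_current()

    #define this_cpu_read_stable(var) percpu_from_op("mov", var, "p" (&(var)))

current is just a macro: it reads the per-CPU variable current_task, that is, the task_struct of the task running on this CPU. On a single-CPU system the per-CPU indirection should make no practical difference.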
- Execution then reaches task_tgid_vnr:

    static inline pid_t task_tgid_vnr(struct task_struct *tsk)
    {
        return pid_vnr(task_tgid(tsk));
    }
- Its argument, task_tgid(tsk), is evaluated first. It returns a pointer to a struct pid, taken from the thread group leader:

    static inline struct pid *task_tgid(struct task_struct *task)
    {
        return task->group_leader->pids[PIDTYPE_PID].pid;
    }
- upid, pid and pid_link are defined as follows:

    /*
     * struct upid is used to get the id of the struct pid, as it is
     * seen in particular namespace. Later the struct pid is found with
     * find_pid_ns() using the int nr and struct pid_namespace *ns.
     */

    struct upid {
        /* Try to keep pid_chain in the same cacheline as nr for find_vpid */
        int nr;
        struct pid_namespace *ns;
        struct hlist_node pid_chain;
    };

    struct pid
    {
        atomic_t count;
        unsigned int level;
        /* lists of tasks that use this pid */
        struct hlist_head tasks[PIDTYPE_MAX];
        struct rcu_head rcu;
        struct upid numbers[1];
    };

    extern struct pid init_struct_pid;

    struct pid_link
    {
        struct hlist_node node;
        struct pid *pid;
    };
- pid_vnr calls pid_nr_ns, passing the namespace it obtains from task_active_pid_ns(current). I do not fully understand this part yet.
In the end, the value returned is the nr that pid_nr_ns returns:

    pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)
    {
        struct upid *upid;
        pid_t nr = 0;

        if (pid && ns->level <= pid->level) {
            upid = &pid->numbers[ns->level];
            if (upid->ns == ns)
                nr = upid->nr;
        }
        return nr;
    }
    EXPORT_SYMBOL_GPL(pid_nr_ns);

    pid_t pid_vnr(struct pid *pid)
    {
        return pid_nr_ns(pid, task_active_pid_ns(current));
    }
- Obtaining the namespace:

    struct pid_namespace *task_active_pid_ns(struct task_struct *tsk)
    {
        return ns_of_pid(task_pid(tsk));
    }
    EXPORT_SYMBOL_GPL(task_active_pid_ns);

    /*
     * ns_of_pid() returns the pid namespace in which the specified pid was
     * allocated.
     *
     * NOTE:
     * ns_of_pid() is expected to be called for a process (task) that has
     * an attached 'struct pid' (see attach_pid(), detach_pid()) i.e @pid
     * is expected to be non-NULL. If @pid is NULL, caller should handle
     * the resulting NULL pid-ns.
     */
    static inline struct pid_namespace *ns_of_pid(struct pid *pid)
    {
        struct pid_namespace *ns = NULL;
        if (pid)
            ns = pid->numbers[pid->level].ns;
        return ns;
    }
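- The data structures are complicated, and I do not understand some of the key parts. Reading the code at face value, though, nothing on the kernel side looks suspicious up to this point.
A thought: the process itself keeps no local or global variable holding its pid, because it has no need to (once the process exits and is reaped, the pid is void, and a new pid is assigned on every start).
So could the compiler be responsible, keeping the value in a register? This is process 1 after all, and the value never changes afterwards. Did the compiler notice that the pid of process 1 never changes, cache it, and read it from there every time it is needed?
When we use the standard API, we normally include the header unistd.h.
The information we need is hidden in there.
Most of the interfaces defined in unistd.h are wrapper functions around system calls, such as fork, pipe and the various I/O primitives (read, write, close and so on).
No good ideas yet; to be continued next time.
Here is an answer from an expert on stackoverflow, which I have not fully understood yet: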
What is better “int 0x80” or “syscall”?
My answer here covers your question.
In practice, recent kernels are implementing a VDSO, notably to dynamically optimize system calls (the kernel sets the VDSO to some code best for the current processor). So you should use the VDSO, and you'll better use, for existing syscalls, the interface provided by the libc.
Notice that, AFAIK, a significant part of the cost of simple syscalls is going from user-space to kernel and back. Hence, for some syscalls (probably gettimeofday, getpid...) the VDSO might avoid even that (and technically might avoid doing a real syscall). For most syscalls (like open, read, send, mmap ....) the kernel cost of the syscall is large enough to make any improvement of the user-space to kernel space transition (e.g. using SYSENTER or SYSCALL machine instructions instead of INT) insignificant.
Note this sentence:
**Hence, for some syscalls (probably gettimeofday, getpid...) the VDSO might avoid even that (and technically might avoid doing a real syscall).**
The answer also points to a daunting amount of background reading:
It is explained in Linux Assembly Howto. And you should read wikipedia syscall page (and also about VDSO), and also intro(2) & syscalls(2) man pages. See also this answer and this one. Look also inside Gnu Libc & musl-libc source code. Learn also to use strace to find out which syscalls are made by a given command or process.
See also the calling conventions and Application Binary Interface specification relevant to your system. For x86-64 it is here.
- Another source turned up, showing that getpid caches PIDs:
C library/kernel differences
Since glibc version 2.3.4, the glibc wrapper function for getpid() caches PIDs, so as to avoid additional system calls when a process calls getpid() repeatedly. Normally this caching is invisible, but its correct operation relies on support in the wrapper functions for fork(2), vfork(2), and clone(2): if an application bypasses the glibc wrappers for these system calls by using syscall(2), then a call to getpid() in the child will return the wrong value (to be precise: it will return the PID of the parent process). See also clone(2) for discussion of a case where getpid() may return the wrong value even when invoking clone(2) via the glibc wrapper function.
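My train of thought broke off here, and I am not sure how to untangle it.
But when the getpid API is called, how does it connect to the system call? How do the two map onto each other? glibc contains the following code: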
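    pid_t getpid(void)
    {
        pid_t (*f)(void);   /* pointer to the next getpid in lookup order */
        f = (pid_t (*)(void)) dlsym (RTLD_NEXT, "getpid");
        if (f == NULL)
            error (EXIT_FAILURE, 0, "dlsym (RTLD_NEXT, \"getpid\"): %s", dlerror ());
        return (pid2 = f()) + 26;   /* pid2 is declared elsewhere in that source file */
    }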
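Roughly, dlsym looks up the address of getpid in the loaded shared libraries (with RTLD_NEXT, the next definition after the current object), and the function then calls it through that pointer.
I came across the vDSO earlier; could this be related?
I also do not understand why 26 is added to the return value.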