Skip to the content.

关于信号中断与慢系统调用的深度发现

2015-08-19 11:22:59


这段时间在看Unix网络编程卷1,在5.9节处理SIGCHLD信号,关于处理僵死进程第四步如下写道:信号是在父进程阻塞于慢系统调用(accept)时由父进程捕获的,内核就会使慢系统调用(accept)返回一个EINTR错误。

看到上面那段落的时候,想到我前段时间写网络服务器遇到的问题,链接地址:http://bbs.csdn.net/topics/391032981,其实里面也有我关于这方面问题的困惑。

总结一下我论坛的那个问题,其实我无论如何是不能通过信号中断,测试epoll_wait出错errno置EINTR进而执行程序退出的,说个很简单的理由,当选用IDE进行调试程序的时候,若程序增加的有断点,程序会在步入断点的时候接收到SIGTRAP信号,这样一样会使慢系统调用返回错误并置errno为EINTR,这个时候的我肯定是不想程序就此退出。当然那个问题后来我是通过SocketPair解决的,其一用于epoll_wait监听,另一个在需要退出的时候写入数据,epoll_wait返回的时候,首先判断用于监听的那个套接字是否需要接收数据,若有数据需要接收那就是我通知程序退出呢。

接下来关于信号中断与慢系统调用的两个测试,测试环境Centos7 64,所有代码均无错误处理,请自行包含相关头文件

void sig_func(int signo)
{
	printf("catch %d\n", signo);
}

int main(int argc, char *argv[])
{
	signal(SIGINT, sig_func);

	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	struct sockaddr_in addr;
	addr.sin_family = AF_INET;
	addr.sin_port = htons(1234);
	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

	bind(lfd, (struct sockaddr *)&addr, sizeof(addr));

	listen(lfd, 10);

	while (true) {
		struct sockaddr_in caddr;
		socklen_t len = sizeof(caddr);

		int cfd = accept(lfd, (struct sockaddr *)&caddr, &len);
		printf("accept %d\n", cfd);
	}
}

多次Ctrl+C,程序作如下输出:

[root@bogon Debug]# ./server 
^Ccatch 2
^Ccatch 2

现像是通过Ctrl+C产生中断,并不会使accept函数返回,这和我们预想的不一样啊,这可能就是因为系统的signal已如Unix网络编程书中所说的那样,默认增加了自动重启这个标志,这时只能求助于手册,手册中有如此写道:

Furthermore, certain
       blocking system calls are automatically restarted if interrupted by a signal handler (see signal(7)).  The BSD semantics are equivalent to calling sigaction(2) with the
       following flags:

           sa.sa_flags = SA_RESTART;
void sig_func(int signo)
{
	printf("catch %d\n", signo);
}

int main(int argc, char *argv[])
{
	signal(SIGINT, sig_func);

	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	struct sockaddr_in addr;
	addr.sin_family = AF_INET;
	addr.sin_port = htons(1234);
	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

	bind(lfd, (struct sockaddr *)&addr, sizeof(addr));

	listen(lfd, 10);

	fd_set fds;
	FD_ZERO(&fds);
	FD_SET(lfd, &fds);
	int res = select(lfd+1, &fds, NULL, NULL, NULL);
	printf("select %d -- errno %d\n", res, errno);
}

单次Ctrl+C,程序作如下输出:

[root@bogon Debug]# ./server 
^Ccatch 2
select -1 -- errno 4

代码其实基本还是上面的代码,只是将想验证的系统调用换成了select调用,这个又和我们预想的一样,但是若真如我们手册中看到那样,这又是错误的啦,这时我们只能再次求助于帮助手册,又好好翻了翻手册,终于找到原因啦。

   Interruption of system calls and library functions by signal handlers
       If a signal handler is invoked while a system call or library function call is blocked, then either:

       * the call is automatically restarted after the signal handler returns; or

       * the call fails with the error EINTR.

       Which of these two behaviors occurs depends on the interface and whether or not the signal handler was established  using  the  SA_RESTART  flag
       (see sigaction(2)).  The details vary across UNIX systems; below, the details for Linux.

       If a blocked call to one of the following interfaces is interrupted by a signal handler, then the call will be automatically restarted after the
       signal handler returns if the SA_RESTART flag was used; otherwise the call will fail with the error EINTR:

           * read(2), readv(2), write(2), writev(2), and ioctl(2) calls on "slow" devices.  A "slow" device is one where the I/O call may block for  an
             indefinite time, for example, a terminal, pipe, or socket.  (A disk is not a slow device according to this definition.)  If an I/O call on
             a slow device has already transferred some data by the time it is interrupted by a signal handler, then the call  will  return  a  success
             status (normally, the number of bytes transferred).

           * open(2), if it can block (e.g., when opening a FIFO; see fifo(7)).

           * wait(2), wait3(2), wait4(2), waitid(2), and waitpid(2).

           * Socket  interfaces: accept(2), connect(2), recv(2), recvfrom(2), recvmsg(2), send(2), sendto(2), and sendmsg(2), unless a timeout has been
             set on the socket (see below).

           * File locking interfaces: flock(2) and fcntl(2) F_SETLKW.

           * POSIX message queue interfaces: mq_receive(3), mq_timedreceive(3), mq_send(3), and mq_timedsend(3).

           * futex(2) FUTEX_WAIT (since Linux 2.6.22; beforehand, always failed with EINTR).

           * POSIX semaphore interfaces: sem_wait(3) and sem_timedwait(3) (since Linux 2.6.22; beforehand, always failed with EINTR).

       The following interfaces are never restarted after being interrupted by a signal handler, regardless of the use of SA_RESTART; they always  fail
       with the error EINTR when interrupted by a signal handler:

           * Socket  interfaces,  when  a timeout has been set on the socket using setsockopt(2): accept(2), recv(2), recvfrom(2), and recvmsg(2), if a
             receive timeout (SO_RCVTIMEO) has been set; connect(2), send(2), sendto(2), and sendmsg(2), if a send timeout (SO_SNDTIMEO) has been set.

           * Interfaces used to wait for signals: pause(2), sigsuspend(2), sigtimedwait(2), and sigwaitinfo(2).

           * File descriptor multiplexing interfaces: epoll_wait(2), epoll_pwait(2), poll(2), ppoll(2), select(2), and pselect(2).

           * System V IPC interfaces: msgrcv(2), msgsnd(2), semop(2), and semtimedop(2).

           * Sleep interfaces: clock_nanosleep(2), nanosleep(2), and usleep(3).

           * read(2) from an inotify(7) file descriptor.

           * io_getevents(2).

       The sleep(3) function is also never restarted if interrupted by a handler, but gives a success return: the number of seconds remaining to sleep.

这里大致说明一下就是:当一个信号被捕获处理时正阻塞着系统调用或是库函数,将会有两种情况发生,一种是调用将会自动重启当信号处理函数返回的时候,另一种是调用失败返回errno置EINTR。因为signal默认增加SA_RESTART标志,所以上面的是中断因为有标志自动重启的,下面的是无论如何都会返回错误的。

经过测试,若自己通过sigaction函数提供Signal函数调用,并在函数中取消SA_RESTART标志,这样的话,第一个测试程序将会有如下输出:

[root@bogon Debug]# ./server 
sigaction successed.
^Ccatch 2
accept -1
^Ccatch 2
accept -1

关于Signal函数这里不再实现。

这里我们就不再验证epoll_wait,因为根据论坛提的问题来看,epoll_wait也是会返回错误的。至于为什么在多线程的时候某种情况下就不会返回,这就要牵涉到捕获信号的机制了吧,相关书籍我没有看过,我这里用不太准确的话描述一下,就是若假定程序有两个线程,主线程启动,执行signal捕获信号(这里是主线程执行该语句哦),另一个线程执行epoll_wait,这个时候若产生信号,将会由主线程负责捕获和处理所有信号,这对另一个线程没有产生任何影响,所以epoll_wait才不会返回的。总结性来说,多线程中只会有一个线程来捕获和处理信号。