From: Jeff Dike
Cc: user-mode-linux-devel@lists.sourceforge.net

The following stress-test crashed the 2.6 kernel - this was fixed in the
2.6.4-1um release - this is the second part (historically speaking) of the fix:

$ while /bin/true ; do /bin/true ; done

Also, since it's a one-liner, I've added this recent (on top of 2.6.10) patch
for arch/um/kernel/skas/process.c:

#@@ -323,9 +323,10 @@
 	block_signals();
 	if(sigsetjmp(fork_buf, 1) == 0)
 		new_thread_proc(stack, handler);
-	set_signals(flags);
 
 	remove_sigstack();
+
+	set_signals(flags);
 }
 
 void thread_wait(void *sw, void *fb)

Quoting from Jeff Dike about the main one:

"The patch below fixes the 2.6 scheduler crash. It's another subtle signal
problem. This one is caused by siglongjmp restoring the old signal mask before
calling longjmp. See the comment for the gory details.

I've had the stress test running for more than an hour with this patch.
Without it, it would be lucky to go for five minutes.

Jeff"

Quoting from Jeff Dike about the second patch:

"This patch fixes a long-standing problem in skas mode process creation.
Chris Aker has been seeing it at linode, and found a way of reproducing it.
Once I spotted the bug, I found an easier way: ping flood the UML from the
host while running

	while true; do ls > /dev/null; done

In 10-15 seconds, UML will simply exit back to the shell with a segfault, no
panic, no output, no nothing.

When UML sets up the kernel stack for a new process, it sends itself a
SA_ONSTACK signal with the signal stack being the new kernel stack. It calls
setjmp there to set up a context that it can longjmp to when the new process
is run for the first time.

The problem was that, while signals were blocked during this, they were
re-enabled before SA_ONSTACK was disabled. Thus, a signal arriving at the
wrong time, between signals being turned on and SA_ONSTACK being disabled,
would cause the signal to be handled on the stack, destroying the context that
had been set up there.
When the new process ran, it would longjmp to this trashed stack, and UML would
die.

Jeff"

About the backport itself I have a doubt - in the original patch, there is this
hunk for arch/um/kernel/time_kern.c, which is not very well explained:

 irqreturn_t um_timer(int irq, void *dev, struct pt_regs *regs)
 {
+	unsigned long flags;
+
 	do_timer(regs);
-	write_seqlock_irq(&xtime_lock);
+	write_seqlock_irqsave(&xtime_lock, flags);
 	timer();
-	write_sequnlock_irq(&xtime_lock);
+	write_sequnlock_irqrestore(&xtime_lock, flags);
 	return(IRQ_HANDLED);
 }

The 2.4 code is different, but it still takes a write_lock() (not a
write_lock_irq()). Since I was in doubt, I have turned the write_lock() into a
write_lock_irqsave() anyway. If this was unneeded, it would only affect
performance, not correctness.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso
---

 um-linux-2.4.27-paolo/arch/um/include/signal_user.h |    2 ++
 um-linux-2.4.27-paolo/arch/um/kernel/skas/process.c |   13 +++++++++++++
 um-linux-2.4.27-paolo/arch/um/kernel/time_kern.c    |    6 ++++--
 3 files changed, 19 insertions(+), 2 deletions(-)

diff -puN arch/um/include/signal_user.h~uml-sched-fix-2 arch/um/include/signal_user.h
--- um-linux-2.4.27/arch/um/include/signal_user.h~uml-sched-fix-2	2005-04-15 11:13:10.000000000 +0200
+++ um-linux-2.4.27-paolo/arch/um/include/signal_user.h	2005-04-15 11:13:10.000000000 +0200
@@ -11,6 +11,8 @@ extern int signal_stack_size;
 extern int change_sig(int signal, int on);
 extern void set_sigstack(void *stack, int size);
 extern void set_handler(int sig, void (*handler)(int), int flags, ...);
+extern int set_signals(int enable);
+extern int get_signals(void);
 
 #endif
 
diff -puN arch/um/kernel/skas/process.c~uml-sched-fix-2 arch/um/kernel/skas/process.c
--- um-linux-2.4.27/arch/um/kernel/skas/process.c~uml-sched-fix-2	2005-04-15 11:13:10.000000000 +0200
+++ um-linux-2.4.27-paolo/arch/um/kernel/skas/process.c	2005-04-15 11:13:10.000000000 +0200
@@ -193,15 +193,28 @@ void userspace(union uml_pt_regs *regs)
 void new_thread(void *stack, void **switch_buf_ptr, void **fork_buf_ptr,
 		void (*handler)(int))
 {
+	unsigned long flags;
 	sigjmp_buf switch_buf, fork_buf;
 
 	*switch_buf_ptr = &switch_buf;
 	*fork_buf_ptr = &fork_buf;
 
+	/* Somewhat subtle - siglongjmp restores the signal mask before doing
+	 * the longjmp.  This means that when jumping from one stack to another
+	 * when the target stack has interrupts enabled, an interrupt may occur
+	 * on the source stack.  This is bad when starting up a process because
+	 * it's not supposed to get timer ticks until it has been scheduled.
+	 * So, we disable interrupts around the sigsetjmp to ensure that
+	 * they can't happen until we get back here where they are safe.
+	 */
+	flags = get_signals();
+	block_signals();
 	if(sigsetjmp(fork_buf, 1) == 0)
 		new_thread_proc(stack, handler);
 
 	remove_sigstack();
+
+	set_signals(flags);
 }
 
 void thread_wait(void *sw, void *fb)
diff -puN arch/um/kernel/time_kern.c~uml-sched-fix-2 arch/um/kernel/time_kern.c
--- um-linux-2.4.27/arch/um/kernel/time_kern.c~uml-sched-fix-2	2005-04-15 11:13:10.000000000 +0200
+++ um-linux-2.4.27-paolo/arch/um/kernel/time_kern.c	2005-04-15 11:13:10.000000000 +0200
@@ -94,12 +94,14 @@ void boot_timer_handler(int sig)
 
 void um_timer(int irq, void *dev, struct pt_regs *regs)
 {
+	unsigned long flags;
+
 	do_timer(regs);
-	write_lock(&xtime_lock);
+	write_lock_irqsave(&xtime_lock, flags);
 	vxtime_lock();
 	timer();
 	vxtime_unlock();
-	write_unlock(&xtime_lock);
+	write_unlock_irqrestore(&xtime_lock, flags);
 }
 
 long um_time(int * tloc)
_
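
For anyone who wants to see the ordering problem from the second quote outside
of UML, here is a minimal userspace sketch. It is not UML code: the names
signal_lands_on_private_stack(), probe_handler() and PRIV_STACK_SIZE are made
up for illustration, and plain sigprocmask()/sigaltstack() calls stand in for
block_signals()/set_signals() and set_sigstack()/remove_sigstack(). It only
shows that a signal which becomes pending while the private stack is active
runs on that stack if signals are unblocked before the stack is disabled, and
on the normal stack if the order is reversed.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for sigaltstack() and SA_ONSTACK */
#endif
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PRIV_STACK_SIZE (64 * 1024)	/* illustrative size, not UML's */

static char *priv_stack;			/* stands in for the new kernel stack */
static volatile sig_atomic_t ran_on_priv;	/* written by the handler */

/* Record whether the handler's own frame lives inside priv_stack. */
static void probe_handler(int sig)
{
	char probe;
	uintptr_t p = (uintptr_t)&probe;
	uintptr_t base = (uintptr_t)priv_stack;

	(void)sig;
	ran_on_priv = (p >= base && p < base + PRIV_STACK_SIZE);
}

/*
 * Hypothetical demo helper.  Returns 1 if the pending signal was handled on
 * the private stack, 0 if it was handled on the normal stack, and -1 if the
 * setup failed or the handler never ran.
 */
int signal_lands_on_private_stack(int unblock_first)
{
	struct sigaction sa;
	sigset_t all, old;
	stack_t on, off;

	ran_on_priv = -1;
	priv_stack = malloc(PRIV_STACK_SIZE);
	if (priv_stack == NULL)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_handler;
	sa.sa_flags = SA_ONSTACK;	/* use the alternate stack if one is set */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) < 0) {
		free(priv_stack);
		return -1;
	}

	on.ss_sp = priv_stack;
	on.ss_size = PRIV_STACK_SIZE;
	on.ss_flags = 0;
	off = on;
	off.ss_flags = SS_DISABLE;

	sigfillset(&all);
	sigprocmask(SIG_BLOCK, &all, &old);	/* plays the role of block_signals() */
	sigdelset(&old, SIGUSR1);		/* make sure the demo signal is deliverable */
	sigaltstack(&on, NULL);			/* private stack is now the signal stack */
	raise(SIGUSR1);				/* the "interrupt": stays pending while blocked */

	if (unblock_first) {
		/* Old order: the pending signal runs on the private stack. */
		sigprocmask(SIG_SETMASK, &old, NULL);
		sigaltstack(&off, NULL);
	} else {
		/* Fixed order: the stack is disabled before signals come back. */
		sigaltstack(&off, NULL);
		sigprocmask(SIG_SETMASK, &old, NULL);
	}

	free(priv_stack);
	return ran_on_priv;
}
```

Calling signal_lands_on_private_stack(1) models the old ordering and
signal_lands_on_private_stack(0) the ordering after the one-liner. In the real
bug the handler's frame overwrote the setjmp context stored on that stack; the
sketch only detects where the handler ran and does not try to reproduce the
crash.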