Date: Thu, 16 Nov 2000 18:33:46 -0500
From: Bob Matthews
To: riel@nl.linux.org
Cc: johnsonm@redhat.com
Subject: Hung kswapd (2.4.0-t11p5)
Message-ID: <3A146EDA.36D1F9C4@redhat.com>
ReSent-To: linux-mm@kvack.org

Rik,

Al Viro suggested I send this bug report to you.

We're running some heavy-duty kernel stress tests and have been able to
cook up a case which reliably locks up the machine, and has done so on
every 2.4.0 kernel we've tried.  No oops is generated.

Machine: 8 x PIII @ 550MHz, 8GB RAM
Config:  2.4.0-test11-p5, 64GB highmem support, 8 x 128MB swap
         partitions, all pri=0

We run 5 parallel 'memtst' processes.  Each memtst mallocs 1900MB of
memory and then steps through it, writing 0xaaaaaaaa + i at mem[i].  It
then verifies what was written and repeats the process.
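Each memtst process boils down to a loop like the following (a minimal
sketch, not the actual memtst source; the constant name, word-sized
stores, and error handling here are assumptions):

    #include <stdio.h>
    #include <stdlib.h>

    #define TEST_BYTES (1900UL * 1024 * 1024)   /* 1900MB per process */

    int main(void)
    {
            unsigned long n = TEST_BYTES / sizeof(unsigned long);
            unsigned long *mem = malloc(TEST_BYTES);
            unsigned long i;

            if (!mem) {
                    perror("malloc");
                    return 1;
            }

            for (;;) {
                    /* Step through the buffer, writing 0xaaaaaaaa + i
                     * at mem[i]... */
                    for (i = 0; i < n; i++)
                            mem[i] = 0xaaaaaaaa + i;
                    /* ...then verify what was written, and repeat. */
                    for (i = 0; i < n; i++)
                            if (mem[i] != 0xaaaaaaaa + i)
                                    fprintf(stderr,
                                            "miscompare at word %lu\n", i);
            }
    }

Five of these in parallel commit roughly 9.3GB against 8GB of RAM,
which is what pushes the machine into swap.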
The lockup occurs very soon after the machine first dips into swap
(according to /usr/bin/top).  According to sysrq-T, most of the
processes are waiting here:

(gdb) list *0xc0131d10
0xc0131d10 is in wakeup_kswapd (vmscan.c:1137).
1132
1133            if (waitqueue_active(&kswapd_wait))
1134                    wake_up(&kswapd_wait);
1135            schedule();
1136
1137            remove_wait_queue(&kswapd_done, &wait);
1138            __set_current_state(TASK_RUNNING);
1139    }
1140
1141    /*

kswapd itself appears to be stuck here:

(gdb) list *0xc01394c2
0xc01394c2 is in create_buffers (buffer.c:1240).
1235
1236            /*
1237             * Set our state for sleeping, then check again for buffer heads.
1238             * This ensures we won't miss a wake_up from an interrupt.
1239             */
1240            wait_event(buffer_wait, nr_unused_buffer_heads >= MAX_BUF_PER_PAGE);
1241            goto try_again;
1242    }
1243
1244    static int create_page_buffers(int rw, struct page *page, kdev_t dev, int b[], int size)

I threw a show_stack into the sysrq-T code and got this stack
traceback:

(gdb) list *0xc0139535
0xc0139535 is in create_page_buffers (buffer.c:1257).
1252     * Allocate async buffer heads pointing to this page, just for I/O.
1253     * They don't show up in the buffer hash table, but they *are*
1254     * registered in page->buffers.
1255     */
1256    head = create_buffers(page, size, 1);
1257    if (page->buffers)
1258            BUG();
1259    if (!head)
1260            BUG();
1261    tail = head;

(gdb) list *0xc013ac6b
0xc013ac6b is in brw_page (buffer.c:2091).
2086     * create_page_buffers() might sleep.
2087     */
2088    fresh = 0;
2089    if (!page->buffers) {
2090            create_page_buffers(rw, page, dev, b, size);
2091            fresh = 1;
2092    }
2093    if (!page->buffers)
2094            BUG();
2095

(gdb) list *0xc0131f56
0xc0131f56 is in rw_swap_page_base (page_io.c:89).
84
85      /* Note! For consistency we do all of the logic,
86       * decrementing the page count, and unlocking the page in the
87       * swap lock map - in the IO completion handler.
88       */
89      if (!wait)
90              return 1;
91
92      wait_on_page(page);
93      /* This shouldn't happen, but check to be sure. */

(gdb) list *0xc01300e5
0xc01300e5 is in lru_cache_add (swap.c:263).
258             BUG();
259     DEBUG_ADD_PAGE
260     add_page_to_active_list(page);
261     /* This should be relatively rare */
262     if (!page->age)
263             deactivate_page_nolock(page);
264     spin_unlock(&pagemap_lru_lock);
265 }
266
267 /**

(gdb) list *0xc01289bc
0xc01289bc is in add_to_page_cache_locked (/usr/src/linux/include/asm/spinlock.h:94).
89              spin_lock_string
90              :"=m" (lock->lock) : : "memory");
91      }
92
93      static inline void spin_unlock(spinlock_t *lock)
94      {
95      #if SPINLOCK_DEBUG
96              if (lock->magic != SPINLOCK_MAGIC)
97                      BUG();
98              if (!spin_is_locked(lock))

(gdb) list *0xc0132027
0xc0132027 is in rw_swap_page (page_io.c:119).
114             PAGE_BUG(page);
115     if (!PageSwapCache(page))
116             PAGE_BUG(page);
117     if (page->mapping != &swapper_space)
118             PAGE_BUG(page);
119     if (!rw_swap_page_base(rw, entry, page, wait))
120             UnlockPage(page);
121 }
122
123 /*

(gdb) list *0xc01305db
0xc01305db is in try_to_swap_out (vmscan.c:205).
200     set_pte(page_table, swp_entry_to_pte(entry));
201     spin_unlock(&mm->page_table_lock);
202
203     /* OK, do a physical asynchronous write to swap. */
204     rw_swap_page(WRITE, page, 0);
205     deactivate_page(page);
206
207 out_free_success:
208     page_cache_release(page);
209     return 1;

(gdb) list *0xc01307f5
0xc01307f5 is in swap_out_vma (vmscan.c:259).
254     do {
255             int result;
256             mm->swap_address = address + PAGE_SIZE;
257             result = try_to_swap_out(mm, vma, address, pte, gfp_mask);
258             if (result)
259                     return result;
260             if (!mm->swap_cnt)
261                     return 0;
262             address += PAGE_SIZE;
263             pte++;

It's not clear to me whether the swapper is hung here, or if sysrq-T is
always catching it at the same place, but the machine is pretty much
useless at this point.

We are also seeing another symptom which may be related.  Just before
the lockup, the console fills with "alloc_page: 0-order allocation
failed" messages.  I throttled these back to print just 1 out of every
100,000 messages (see the sketch at the end of this message).  We
typically get several hundred thousand before the lockup.

If I can provide any more information to you, please don't hesitate to
contact me.

Bob

-- 
Bob Matthews
Red Hat, Inc.
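The throttling mentioned above is just a counter wrapped around the
print.  A minimal userspace sketch of the same technique (the function
name, counter name, and driver loop are illustrative assumptions, not
the actual kernel patch):

    #include <stdio.h>

    /* Emit only 1 out of every 100,000 occurrences of a message; in
     * the kernel the same counter test simply wraps the printk(). */
    static unsigned long failures;

    static void alloc_failed(void)
    {
            if (failures++ % 100000 == 0)
                    printf("alloc_page: 0-order allocation failed "
                           "(%lu so far)\n", failures);
    }

    int main(void)
    {
            unsigned long i;

            /* Simulate several hundred thousand back-to-back failures. */
            for (i = 0; i < 500000; i++)
                    alloc_failed();
            return 0;
    }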