From mboxrd@z Thu Jan 1 00:00:00 1970 Message-Id: <200109082102.f88L2vo18714@mailb.telia.com> Content-Type: text/plain; charset="iso-8859-1" From: Roger Larsson Subject: [SMP lock BUG?] Re: Feedback on preemptible kernel patch Date: Sat, 8 Sep 2001 22:58:18 +0200 References: In-Reply-To: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: Arjan Filius , Robert Love Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org List-ID: Hi, This is interesting. [Assumes UP Athlon - correct] Note that all BUGs out in highmem.h:95 (kmap_atomic) and that test is only on if you have enabled HIGHMEM_DEBUG [my analyze is done with a 2.4.10-pre2 kernel, but I checked with later patches and I do not think they fix it either...] The preemptive kernel puts more SMP stress on the kernel than running with multiple CPUs. So this might be a potential bug in the kernel proper, running with a SMP computer. If I understand the bug correctly, a process gets a page fault. Starts to map in the page. But before the final part it checks - and the page is already there!!! Correct? On Saturday den 8 September 2001 19:33, Arjan Filius wrote: > Hello Robert, > > > I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1. > But it seems to hit highmem (see below) (i do have 1.5GB ram) > 2.4.10-pre4 plain runs just fine. > > With the kernel option mem=850M the patched kernel boots an seems to run > fine. However i didn't do any stress testing yet, but i still notice > hickups while playing mp3 files at -10 nice level with mpg123 on a 1.1GHz > Athlon, and removing for example a _large_ file (reiser-on-lvm). > > My syslog output with highmem: > > Sep 8 18:10:16 sjoerd kernel: kernel BUG at > /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95! Sep 8 18:10:16 sjoerd > kernel: invalid operand: 0000 > Sep 8 18:10:16 sjoerd kernel: CPU: 0 > Sep 8 18:10:16 sjoerd kernel: EIP: 0010:[do_wp_page+636/1088] > [- - -] > sjoerd kernel: Call Trace: [handle_mm_fault+141/224] > [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64] > [do_exit+595/640] Sep 8 18:10:16 sjoerd kernel: [error_code+52/64] Lets look at this example. You need to add some inline functions... handle_mm_fault takes the mm->page_table_lock [this should prevent reschedules] allocs pmd allocs pte handle_pte_fault(...) handle_pte_fault [inline, most likely path] pte is present it is a write access but the pte is not writeable - call do_wp_page do_wp_page plays some games with the lock... finally calls copy_cow_page [inline] with the page_table_lock UNLOCKED! copy_cow_page calls clear_user_highpage or copy_user_highpage both clear_user_highpage and copy_user_highpage calls kmap_atomic kmap_atomic page is a highmem page but during the time this process was unlocked some other thread has allocated the page in question... BUG out. So somewere between the UNLOCK (might be a lot later) and the BUG test in kmap_atomic the process running in kernel got preempted. (most likely during the page copy since it will take some time) Another process (thread) started to run - hit the same page fault but succeeded in its alloc. Back to the first process it continues, finally checks - the page is there... and BUGS. Note that this can happen in a pure SMP kernel. But let the processes (threads) run on two CPUs. And let the first get an interrupt/bh after unlock - the other can pass and add the page before the first one can continue - same result! /RogerL -- Roger Larsson Skelleftea Sweden -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/