From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Mon, 11 Sep 2000 21:36:35 -0400 (EDT) From: bcrl@redhat.com Subject: Re: [PATCH] workaround for lost dirty bits on x86 SMP In-Reply-To: <200009120059.RAA78304@google.engr.sgi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Kanoj Sarcar Cc: linux-mm@kvack.org, torvalds@transmeta.com List-ID: On Mon, 11 Sep 2000, Kanoj Sarcar wrote: > One of the worst races is in the page stealing path, when the stealer > thread checks whether the page is dirty, decides to pte_clear(), and > right then, the user dirties the pte, before the stealer thread has done > the flush_tlb. Are you trying to handle this situation? That's the one. It also crops up in msync, munmap and such. > FWIW, previous notes/patches on this topic can be found at > > http://reality.sgi.com/kanoj_engr/smppte.patch > > and this also tries to handle the filemap cases. Right, that doesn't look so good -- it walks over the memory an extra pass or two, which is not good. > I _think_ that with your patch, the page fault rate would go up, so > it would be appropriate to generate some benchmark numbers. Yes, the fault rate will go up, but only for clean-but-writable pages that are written to and only on SMP kernels. We already do this on SPARC (which the _PAGE_W trick was modeled after) and MIPS, so the overhead should be reasonably low. Note that we may wish to do this anyways to keep track of the number of pinned pages in the system. > I would be willing to port the patch on the web page to 2.4, but > thus far, my impression is that Linus is not happy with its > implementation ... The alternative is to replace pte_clear on a clean but writable pte with an xchg to get the old pte value. For non-threaded programs the locked bus cycles will slow things down for no real gain. Another possibility is to implement real tlb shootdown. Fwiw, with the patch, running a make -j bzImage on a 4 way box does not seem to have made a difference. A patched run: bcrl@toolbox linux-v2.4.0-test8]$ time make -j -s bzImage init.c:74: warning: `get_bad_pmd_table' defined but not used Root device is (8, 1) Boot sector 512 bytes. Setup is 4522 bytes. System is 873 kB 294.04user 23.81system 1:26.78elapsed 366%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (382013major+542948minor)pagefaults 0swaps [bcrl@toolbox linux-v2.4.0-test8]$ vs unpatched: [bcrl@toolbox linux-v2.4.0-test8]$ time make -j -s bzImage init.c:74: warning: `get_bad_pmd_table' defined but not used Root device is (8, 1) Boot sector 512 bytes. Setup is 4522 bytes. System is 873 kB 294.19user 23.94system 1:26.88elapsed 366%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (382013major+542947minor)pagefaults 0swaps [bcrl@toolbox linux-v2.4.0-test8]$ So it's in the noise for this case (hot cache for both runs). It probably makes a bigger difference for threaded programs in the presense of lots of msync/memory pressure, but the overhead in tlb shootdown on cleaning the dirty bit probably dwarfs the extra page fault, not to mention the actual io taking place. There are other areas that need improving before write-faults on clean-writable pages make much of a difference. -ben -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/