From: Kanoj Sarcar <kanoj@google.engr.sgi.com>
To: bcrl@redhat.com
Cc: linux-mm@kvack.org, torvalds@transmeta.com
Subject: Re: [PATCH] workaround for lost dirty bits on x86 SMP
Date: Mon, 11 Sep 2000 19:34:04 -0700 (PDT) [thread overview]
Message-ID: <200009120234.TAA39381@google.engr.sgi.com> (raw)
In-Reply-To: <Pine.LNX.3.96.1000911210010.7937B-100000@kanga.kvack.org> from "bcrl@redhat.com" at Sep 11, 2000 09:36:35 PM
>
> On Mon, 11 Sep 2000, Kanoj Sarcar wrote:
>
> > One of the worst races is in the page stealing path, when the stealer
> > thread checks whether the page is dirty, decides to pte_clear(), and
> > right then, the user dirties the pte, before the stealer thread has done
> > the flush_tlb. Are you trying to handle this situation?
>
> That's the one. It also crops up in msync, munmap and such.
>
> > FWIW, previous notes/patches on this topic can be found at
> >
> > http://reality.sgi.com/kanoj_engr/smppte.patch
> >
> > and this also tries to handle the filemap cases.
>
> Right, that doesn't look so good -- it walks over the memory an extra pass
> or two, which is not good.
Not really, I thought I had it in a state where the bare minimum
was being done. Of course, this was the non-PAE version ...
>
> > I _think_ that with your patch, the page fault rate would go up, so
> > it would be appropriate to generate some benchmark numbers.
>
> Yes, the fault rate will go up, but only for clean-but-writable pages that
> are written to and only on SMP kernels. We already do this on SPARC
> (which the _PAGE_W trick was modeled after) and MIPS, so the overhead
> should be reasonably low. Note that we may wish to do this anyways to
> keep track of the number of pinned pages in the system.
Yes, this approach is mentioned in the web page, alongwith the
types of apps that will see the greatest hit:
"Note that an alternate solution to the ia32 SMP pte race is to change
PAGE_SHARED in include/asm-i386/pgtable.h to not drop in _PAGE_RW. On a
write fault, mark the pte dirty, as well as drop in the _PAGE_RW bit into
the pte. Basically, behave as if the processor does not have a hardware
dirty bit, and adopt the same strategy as in the alpha/mips code.
Disadvantage - take an extra fault on shm/file mmap'ed pages when a
program accesses the page, *then* dirties it (no change if the first access
is a write)."
>
> > I would be willing to port the patch on the web page to 2.4, but
> > thus far, my impression is that Linus is not happy with its
> > implementation ...
>
> The alternative is to replace pte_clear on a clean but writable pte with
> an xchg to get the old pte value. For non-threaded programs the
> locked bus cycles will slow things down for no real gain. Another
> possibility is to implement real tlb shootdown.
>
> Fwiw, with the patch, running a make -j bzImage on a 4 way box does not
> seem to have made a difference. A patched run:
>
> bcrl@toolbox linux-v2.4.0-test8]$ time make -j -s bzImage
> init.c:74: warning: `get_bad_pmd_table' defined but not used
> Root device is (8, 1)
> Boot sector 512 bytes.
> Setup is 4522 bytes.
> System is 873 kB
> 294.04user 23.81system 1:26.78elapsed 366%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+0outputs (382013major+542948minor)pagefaults 0swaps
> [bcrl@toolbox linux-v2.4.0-test8]$
>
> vs unpatched:
>
> [bcrl@toolbox linux-v2.4.0-test8]$ time make -j -s bzImage
> init.c:74: warning: `get_bad_pmd_table' defined but not used
> Root device is (8, 1)
> Boot sector 512 bytes.
> Setup is 4522 bytes.
> System is 873 kB
> 294.19user 23.94system 1:26.88elapsed 366%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+0outputs (382013major+542947minor)pagefaults 0swaps
> [bcrl@toolbox linux-v2.4.0-test8]$
>
> So it's in the noise for this case (hot cache for both runs). It probably
> makes a bigger difference for threaded programs in the presense of lots of
> msync/memory pressure, but the overhead in tlb shootdown on cleaning the
With the above type of apps, you do not need too high a _memory_ pressure
to trigger this, just compute pressure. Each time you come in to drop in
the "dirty" bit, you would need to do establish_pte(), which does a
flush_tlb_page(), which gets costly when you have higher cpu counts.
This of course depends on how smart flush_tlb_page is, and the processor
involved.
Kanoj
> dirty bit probably dwarfs the extra page fault, not to mention the actual
> io taking place. There are other areas that need improving before
> write-faults on clean-writable pages make much of a difference.
>
> -ben
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
next prev parent reply other threads:[~2000-09-12 2:34 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2000-09-12 0:43 Ben LaHaise
2000-09-12 0:59 ` Kanoj Sarcar
2000-09-12 1:36 ` bcrl
2000-09-12 1:56 ` Rik van Riel
2000-09-12 2:34 ` Kanoj Sarcar [this message]
2000-09-12 3:39 ` Benjamin C.R. LaHaise
2000-09-12 6:13 ` Kanoj Sarcar
2000-09-12 10:24 ` Stephen C. Tweedie
2000-09-12 16:54 ` Ben LaHaise
2000-09-15 19:56 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200009120234.TAA39381@google.engr.sgi.com \
--to=kanoj@google.engr.sgi.com \
--cc=bcrl@redhat.com \
--cc=linux-mm@kvack.org \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox