From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hugh@veritas.com>,
Andrea Arcangeli <andrea@suse.de>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [patch][rfc] remove ZERO_PAGE?
Date: Mon, 30 Jul 2007 06:30:08 +0200
Message-ID: <20070730043008.GB7222@wotan.suse.de>
In-Reply-To: <alpine.LFD.0.999.0707292026190.4161@woody.linux-foundation.org>
On Sun, Jul 29, 2007 at 08:45:25PM -0700, Linus Torvalds wrote:
>
>
> On Mon, 30 Jul 2007, Nick Piggin wrote:
> >
> > Well the issue wasn't exactly that, but the fact that a lot of processes
> > all exited at once, while each having a significant number of ZERO_PAGE
> > mappings. The freeing rate ends up going right down (OK it wasn't quite a
> > livelock, finishing in > 2 hours, but without ZERO_PAGE bouncing they
> > exit in 5 seconds).
>
> Umm. Isn't this because of the new page->mapping counting for reserved
> pages?
>
> The one I was violently against, and told you (and Hugh) was pointless and
> bad?
>
> Or what "bouncing" are you talking about?
page->mapcount, yes. I don't quite remember anybody being violently against
it, but that's in the past now anyway.
> In other words, this doesn't really sound like a ZERO_PAGE problem at all,
> but a problem that was engineered by unnecessarily trying to count those
> pages in the first place. No?
Yeah, but that's a little unfair: it was engineered by removing the code
to *not* count those pages :) Another option I gave you was to add
back some of that code to avoid this refcounting, but you were violently
against that :).
> > > Kernel builds with/without this? If the page faults really are that big a
> > > deal, this should all be visible.
> >
> > Sorry if it was misleading: the kernel build numbers weren't really about
> > where ZERO_PAGE hurts us, but just trying to show that it doesn't help too
> > much (for which I realise anything short of a kernel.org release is sadly
> > inadequate).
> >
> > Anyway, I'll see if I can get anything significant...
>
> The thing that really riles me up about this is that the whole damn thing
> seems to be so pointless. This "ZERO_PAGE is bad" thing has been a
> constant background noise, where people are pushing their opinions with no
> real technical reasons.
>
> I want technical reasons, but I get the feeling that this pogrom is about
> anything but technical arguments.
>
> So I _really_ don't want to hear you blaming ZERO_PAGE for something that
> you introduced yourself with Hugh, and that had _zero_ to do with
> ZERO_PAGE, and that I spent weeks saying was pointless, to the point where
> I just gave up.
>
> Now that you apparently have found the perfect load that proved me right,
> instead of blaming the pointless refcounting, you want to blame the
> poor ZERO_PAGE. Again.
No, I'm not saying ZERO_PAGE is bad because it has a refcounting scalability
problem. We can avoid or eliminate that by *not refcounting it* or doing
something crazy like per-node ZERO_PAGEs.

My first patch to fix the problem was actually to not refcount it. Remember?

The technical argument for this patch is that I'm trying to reason that
ZERO_PAGE is no good in the first place. In this debate, the refcounting
issue is really moot (IOW, I'm not trying to argue the ZERO_PAGE is bad
*because* of the refcounting problem, but because it is not good).
> And THAT is what makes me irritated with this patch. I hate the
> background, and what looks like intellectual dishonesty. I _told_ people
> that refcounting reserved pages was pointless and bad. Did you listen? No.
> And now that it causes problems, rather than blame the refcounting, you
> blame the victim.
>
> The zero page is *cheaper* to set up than normal pages. It always was. You
> just *made* it more expensive, because of the irrational fear of
> PageReserved() that protected us from all those unnecessary bounces in the
> first place.
>
> So a totally equivalent fix would be to just re-instate the PageReserved
> checks. It would likely even be easier these days (one logical place to do
> so would be in "vm_normal_page()", which automatically would catch all
> unmapping cases, but you'd still have to make sure you don't *increment*
> the page count when you map it too, of course).
I didn't like PageReserved for a number of reasons including that it allowed
people to be sloppy with refcounting. However I wouldn't mind adding back
some checks to skip counting (although if we do that, let's add a new page
flag for it, and PageReserved can eventually disappear).

But we should do it on the right level. That is, the struct page itself is
a refcounted object. If we skip refcounting for userspace mappings, it
should be done there, rather than an ugly hack in put_page.

The fear of PageReserved was not irrational -- as I said, I needed to get
rid of the put_page special casing for the lockless pagecache. I'm not
against properly special casing special pages.
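
To make "skip counting at the mapping level" concrete, here is a minimal
userspace sketch. It is a toy model, not kernel code: struct toy_page, the
nocount flag, toy_map() and toy_unmap() are all hypothetical names invented
for illustration, and the real mapcount accounting is more involved. The
only point it shows is that the check lives in the map/unmap paths, so a
shared page carrying the flag is never written to from those paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of a page descriptor; hypothetical, not kernel code. */
struct toy_page {
	int  mapcount;	/* number of user mappings tracked for this page */
	bool nocount;	/* hypothetical flag: skip mapping accounting */
};

/* Establish a user mapping. Flagged pages are left untouched. */
static void toy_map(struct toy_page *page)
{
	if (page->nocount)
		return;		/* no write to the shared descriptor */
	page->mapcount++;
}

/* Tear down a user mapping. Mirrors toy_map() so counts stay balanced. */
static void toy_unmap(struct toy_page *page)
{
	if (page->nocount)
		return;
	assert(page->mapcount > 0);
	page->mapcount--;
}
```

Mapping and unmapping a flagged page any number of times leaves its count
alone, while an ordinary page is counted as usual. The skip has to appear
on both sides, otherwise the counts drift.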
> But if you can actually show that ZERO_PAGE literally slows things down
> (and none of this page count bouncing crud that you were the one that
> introduced in the first place), then _that_ would be a totally different
> issue. At that point, you have an independent reason for removing code
> that has basically been there since day 1, and all my arguments go away.
>
> See?
>
> I'd love to hear "here's a real-life load, and yes, the ZERO_PAGE logic
> really does hurt more than it helps, it's time to remove it". At that
> point I'll happily apply the patch.
>
> But what I *don't* want to hear is "we screwed up the reference-counting
> of ZERO_PAGE, so now it's so expensive that we want to remove the page
> entirely". That just makes me sad. And a bit angry.
I'd say it will be hard to actually get a significant real world
improvement from removing it vs de-refcounting it. Maybe on an mmap_sem
constrained workload, but that seems to be a lot better after the glibc
fix and private futexes...

But is there a good reason to keep it? You say it is cheaper to set up,
but make -j still does 500 extra page faults per second per cpu because
of ZERO_PAGE, making it effectively more expensive even ignoring the
refcounting problem.
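
One way to look at the extra-fault effect from userspace is to count minor
faults while touching fresh anonymous memory, once writing only and once
reading each page before writing it. The sketch below does that with
getrusage(); minor_faults() and touch_pages() are hypothetical helper names.
It assumes that a read fault on untouched anonymous memory maps the shared
zero page and that the later write takes a second, copy-on-write fault.
Actual counts depend on the kernel version and on settings such as
transparent huge pages, so no expected numbers are given.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_minflt;
}

/*
 * Map npages of fresh anonymous memory and write one byte to each page.
 * If read_first is nonzero, read each page before writing it.
 * Stores the minor faults taken in *faults; returns 0 on success.
 */
static int touch_pages(size_t npages, int read_first, long *faults)
{
	long psz = sysconf(_SC_PAGESIZE);
	long before, after;
	unsigned int sink = 0;
	size_t len, i;
	void *mem;
	volatile unsigned char *p;

	assert(faults != NULL);
	if (psz <= 0)
		return -1;
	len = npages * (size_t)psz;
	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	p = mem;

	before = minor_faults();
	for (i = 0; i < npages; i++) {
		if (read_first)
			sink += p[i * (size_t)psz];	/* read access first */
		p[i * (size_t)psz] = 1;			/* then write access */
	}
	after = minor_faults();
	(void)sink;

	munmap(mem, len);
	if (before < 0 || after < 0)
		return -1;
	*faults = after - before;
	return 0;
}
```

Calling touch_pages(n, 0, &w) and touch_pages(n, 1, &r) and comparing w
with r gives a rough per-page picture of the cost of the read-then-write
pattern on a given machine.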