From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
Rik van Riel <riel@redhat.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: page_waitqueue() considered harmful
Date: Tue, 27 Sep 2016 18:49:03 +0200 [thread overview]
Message-ID: <20160927164903.GO5016@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <CA+55aFzqQkbHLHr+n+=ZsG=UzFCz1XywEYKCmbz+wmrX7g=67g@mail.gmail.com>
On Tue, Sep 27, 2016 at 09:31:54AM -0700, Linus Torvalds wrote:
> On Tue, Sep 27, 2016 at 7:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Right, I never really liked that patch. In any case, the below seems to
> > boot, although the lock_page_wait() thing did get my brain in a bit of a
> > twist. Doing explicit loops with PG_contended inside wq->lock would be
> > more obvious but results in much more code.
> >
> > We could muck about with PG_contended naming/placement if any of this
> > shows benefit.
> >
> > It does boot on my x86_64 and builds a kernel, so it must be perfect ;-)
>
> This patch looks very much like what I was thinking of. Except you
> made that bit clearing more complicated than I would have done.
>
> I see why you did it (it's hard to clear the bit when the wait-queue
> that is associated with it can be associated with multiple pages), but
> I think it would be perfectly fine to just not even try to make the
> "contended" bit be an exact bit. You can literally leave it set
> (giving us the existing behavior), but then when you hit the
> __unlock_page() case, and you look up the page_waitqueue(), and find
> that the waitqueue is empty, *then* you clear it.
>
> So you'd end up going through the slow path one too many times per
> page, but considering that right now we *always* go through that
> slow-path, and the "one too many times" is "two times per IO rather
> than just once", it really is not a performance issue. I'd rather go
> for simple and robust.
My clear already does that same thing, once we find nobody to wake up we
clear, this means we did the waitqueue lookup once in vain. But yes it
allows the bit to be cleared while there are still waiters (for other
bits) on the waitqueue.
The other benefit of doing what you suggest (and Nick does) is that you
can then indeed use the bit for other waitqueue users like PG_writeback.
I never really got that part of the patch as it wasn't spelled out, but
it does make sense now that I understand the intent.
And assuming collisions are rare, that works just fine.
> I get a bit nervous when I see you being so careful in counting the
> number of waiters that match the page key - if any of that code ever
> gets it wrong (because two different pages that shared a waitqueue
> happen to race at just the right time), and the bit gets cleared too
> early, you will get some *very* hard-to-debug problems.
Right, so I don't care about the actual number, just it being 0 or not.
Maybe I should've returned bool.
But yes, its a tad more tricky than I'd liked, mostly because I was
lazy. Doing the SetPageContended under the wq->lock would've made things
more obvious.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-09-27 16:49 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-26 20:58 Linus Torvalds
2016-09-26 21:23 ` Rik van Riel
2016-09-26 21:30 ` Linus Torvalds
2016-09-26 23:11 ` Kirill A. Shutemov
2016-09-27 1:01 ` Rik van Riel
2016-09-27 7:30 ` Peter Zijlstra
2016-09-27 8:54 ` Mel Gorman
2016-09-27 9:11 ` Kirill A. Shutemov
2016-09-27 9:42 ` Mel Gorman
2016-09-27 9:52 ` Minchan Kim
2016-09-27 12:11 ` Kirill A. Shutemov
2016-09-29 8:01 ` Peter Zijlstra
2016-09-29 12:55 ` Nicholas Piggin
2016-09-29 13:16 ` Peter Zijlstra
2016-09-29 13:54 ` Nicholas Piggin
2016-09-29 15:05 ` Rik van Riel
2016-09-27 8:03 ` Jan Kara
2016-09-27 8:31 ` Mel Gorman
2016-09-27 14:34 ` Peter Zijlstra
2016-09-27 15:08 ` Nicholas Piggin
2016-09-27 16:31 ` Linus Torvalds
2016-09-27 16:49 ` Peter Zijlstra [this message]
2016-09-28 10:45 ` Mel Gorman
2016-09-28 11:11 ` Peter Zijlstra
2016-09-28 16:10 ` Linus Torvalds
2016-09-29 13:08 ` Peter Zijlstra
2016-10-03 10:47 ` Mel Gorman
2016-09-27 14:53 ` Nicholas Piggin
2016-09-27 15:17 ` Nicholas Piggin
2016-09-27 16:52 ` Peter Zijlstra
2016-09-27 17:06 ` Nicholas Piggin
2016-09-28 7:05 ` Peter Zijlstra
2016-09-28 11:05 ` Paul E. McKenney
2016-09-28 11:16 ` Peter Zijlstra
2016-09-28 12:58 ` Paul E. McKenney
2016-09-29 1:31 ` Nicholas Piggin
2016-09-29 2:12 ` Paul E. McKenney
2016-09-29 6:21 ` Peter Zijlstra
2016-09-29 6:42 ` Nicholas Piggin
2016-09-29 7:14 ` Peter Zijlstra
2016-09-29 7:43 ` Peter Zijlstra
2016-09-28 7:40 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160927164903.GO5016@twins.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jack@suse.cz \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=riel@redhat.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox