linux-mm.kvack.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>, Oleg Nesterov <oleg@redhat.com>,
	 Michal Hocko <mhocko@kernel.org>, Linux-MM <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Date: Thu, 23 Jul 2020 17:07:27 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2007231650170.1281@eggly.anvils>
In-Reply-To: <CAHk-=wgvGOnMF0ePU4xS236bOsP8jouj3rps+ysCaGXvCjh2Dg@mail.gmail.com>

On Thu, 23 Jul 2020, Linus Torvalds wrote:
> On Thu, Jul 23, 2020 at 4:11 PM Hugh Dickins <hughd@google.com> wrote:
> > On Thu, 23 Jul 2020, Linus Torvalds wrote:
> > >
> > > I'll send a new version after I actually test it.
> >
> > I'll give it a try when you're happy with it.
> 
> Ok, what I described is what I've been running for a while now. But I
> don't put much stress on my system with my normal workload, so..
> 
> > I did try yesterday's
> > with my swapping loads on home machines (3 of 4 survived 16 hours),
> > and with some google stresstests on work machines (0 of 10 survived).
> >
> > I've not spent long analyzing the crashes, all of them in or below
> > __wake_up_common() called from __wake_up_locked_key_bookmark():
> > sometimes gets to run the curr->func() and crashes on something
> > inside there (often list_del's lib/list_debug.c:53!), sometimes
> > cannot get that far. Looks like the wait queue entries on the list
> > were not entirely safe with that patch.
> 
> Hmm. The bug Oleg pointed out should be pretty theoretical. But I
> think the new approach with WQ_FLAG_WOKEN was much better anyway,
> despite me missing that one spot in the first version of the patch.
> 
> So here's two patches - the first one does that wake_page_function()
> conversion, and the second one just does the memory ordering cleanup I
> mentioned.
> 
> I don't think the second one should matter on x86, but who knows.
> 
> I don't enable list debugging, but I find list corruption surprising.
> All of _that_ should be inside the page waitqueue lock, the only
> unlocked part was the "list_empty_careful()" part.
> 
> But I'll walk over my patch mentally one more time. Here's the current
> version, anyway.

Thanks, I'll start some tests going shortly.

I do have to "port" these patches to a different kernel, and my first
assumption on seeing crashes was that I'd screwed that up; but that
seemed much less likely once the home test on top of v5.8-rc5 crashed
in much the same way.  The latter was not a list_del() crash, but on
curr->func itself; but I take them all as just indicating that the
wait queue entry can in rare cases be freed and reused.

(And the amount of "port"ing was close to nil here: our trees did
differ on an "unlikely" that one end had added or removed, plus I
did start off by reverting two of my three patches. But perhaps I'm
missing a subtle dependence on differences elsewhere in the tree.)

I say that for full disclosure, so you don't wrack your brains
too much, when it may still turn out to be a screwup on my part.

Hugh


