From: "Arechiga Lopez, Jesus A" <jesus.a.arechiga.lopez@intel.com>
To: "Williams, Dan J" <dan.j.williams@intel.com>,
"Torvalds, Linus" <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>,
Brian Foster <bfoster@redhat.com>, Linux-MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-xfs <linux-xfs@vger.kernel.org>,
Hugh Dickins <hughd@google.com>,
"tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
"Gross, Mark" <mark.gross@intel.com>
Subject: RE: writeback completion soft lockup BUG in folio_wake_bit()
Date: Tue, 25 Oct 2022 15:58:32 +0000
Message-ID: <SA1PR11MB582719B82C984A4AEEB36BAE84319@SA1PR11MB5827.namprd11.prod.outlook.com>
In-Reply-To: <6356f72ca12e0_4da32941c@dwillia2-xfh.jf.intel.com.notmuch>
> -----Original Message-----
> From: Williams, Dan J <dan.j.williams@intel.com>
> Sent: Monday, October 24, 2022 3:36 PM
> To: Torvalds, Linus <torvalds@linux-foundation.org>; Williams, Dan J
> <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@infradead.org>; Brian Foster
> <bfoster@redhat.com>; Linux-MM <linux-mm@kvack.org>; linux-fsdevel
> <linux-fsdevel@vger.kernel.org>; linux-xfs <linux-xfs@vger.kernel.org>;
> Hugh Dickins <hughd@google.com>; Arechiga Lopez, Jesus A
> <jesus.a.arechiga.lopez@intel.com>; tim.c.chen@linux.intel.com
> Subject: Re: writeback completion soft lockup BUG in folio_wake_bit()
>
> Linus Torvalds wrote:
> > On Mon, Oct 24, 2022 at 1:13 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
> > >
> > > Arechiga reports that his test case that failed "fast" before now
> > > ran for 28 hours without a soft lockup report with the proposed
> > > patches applied. So, I would consider those:
> > >
> > > Tested-by: Jesus Arechiga Lopez <jesus.a.arechiga.lopez@intel.com>
> >
> > Ok, great.
> >
> > I really like that patch myself (and obviously liked it back when it
> > was originally proposed), but I think it was always held back by the
> > fact that we didn't really have any hard data for it.
> >
> > It does sound like we now very much have hard data for "the page
> > waitlist complexity is now a bigger problem than the historical
> > problem it tried to solve".
> >
> > So I'll happily apply it. The only question is whether it's a "let's
> > do this for 6.2", or if it's something that we'd want to back-port
> > anyway, and might as well apply sooner rather than later as a fix.
> >
> > I think that in turn then depends on just how artificial the test case
> > was. If the test case was triggered by somebody seeing problems in
> > real life loads, that would make the urgency a lot higher. But if it
> > was purely a synthetic test case with no accompanying "this is what
> > made us look at this" problem, it might be a 6.2 thing.
> >
> > Arechiga?
>
> I will let Arechiga reply as well, but my sense is that this is more in the latter
> camp of not urgent because the test case is trying to generate platform
> stress (success!), not necessarily trying to get real work done.
Yes, as Dan mentioned, the test is trying to generate platform stress. We have been seeing the soft lockup events on test targets (2-socket systems with high-core-count CPUs and a lot of RAM).
The workload stresses every core/CPU thread in various ways and logs results to a shared log file (every core writes to the same log file). We found that this shared log file was related to the soft lockups.
With this change applied to 5.19, the soft lockups no longer seem to occur with this workload and configuration.
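
To illustrate the access pattern only, here is a minimal Python sketch of many workers appending to one shared log file. This is not the actual test workload: the function names, the record format, and the use of threads are all assumptions made for illustration, and it is not expected to reproduce the soft lockup by itself.

```python
import threading


def worker(path, worker_id, records):
    """Append `records` lines to the shared log at `path` (hypothetical format)."""
    # Every worker opens the same file in append mode, so all writes
    # land at the end of one shared file.
    with open(path, "a") as log:
        for i in range(records):
            log.write(f"worker={worker_id} record={i}\n")
            # Flush after each record so it reaches the kernel immediately
            # instead of sitting in the user-space buffer.
            log.flush()


def run_shared_log(path, workers, records):
    """Start `workers` threads that all log to the same file, then wait for them."""
    threads = [
        threading.Thread(target=worker, args=(path, w, records))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def count_lines(path):
    """Return the number of lines currently in the log file."""
    with open(path) as f:
        return sum(1 for _ in f)
```

Calling `run_shared_log(path, workers=os.cpu_count(), records=...)` after `import os` would start one writer per CPU thread, which is the closest analogue in this sketch to "every core writing to the same log file".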