linux-mm.kvack.org archive mirror
* Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
@ 2024-09-12 21:18 Christian Theune
  2024-09-12 21:55 ` Matthew Wilcox
  0 siblings, 1 reply; 81+ messages in thread
From: Christian Theune @ 2024-09-12 21:18 UTC (permalink / raw)
  To: linux-mm, linux-xfs, linux-fsdevel, linux-kernel
  Cc: torvalds, axboe, Daniel Dao, Dave Chinner, willy, clm,
	regressions, regressions

Hello everyone,

I’d like to raise awareness of a bug causing data loss somewhere in the interaction between MM and XFS that seems to have been around since Dec 2021 (https://github.com/torvalds/linux/commit/6795801366da0cd3d99e27c37f020a8f16714886).

We started encountering this bug when upgrading to 6.1 around June 2023, and we have had at least 16 instances of data loss in a fleet of 1.5k VMs.

This bug is very hard to reproduce but has been known to exist as a “fluke” for a while already. I have invested a number of days trying to come up with workloads that trigger it quicker than the stochastic “once every few weeks in a fleet of 1.5k machines”, but it has eluded me so far. I know that this also affects Facebook/Meta as well as Cloudflare, who are both running newer kernels (at least 6.1, 6.6, and 6.9) with the above-mentioned patch reverted. I’m from a much smaller team and company, and seeing that those guys are running with this patch reverted (which makes their kernels basically untested/unsupported deviations from mainline) smells like desperation. I’m wondering why this isn’t tackled more urgently by more hands to (hopefully) make it shallow.

The issue appears to happen mostly on nodes that are running some kind of database or specifically storage-oriented load. In our case we see this happening with PostgreSQL and MySQL. Cloudflare IIRC saw this with RocksDB load and Meta is talking about nfsd load.

I suspect that low memory (but not OOM-low) pressure and maybe swap conditions increase the chance of triggering it - but I might be completely wrong on that suspicion.

There is a bug report I started back then: https://bugzilla.kernel.org/show_bug.cgi?id=217572 and there have been discussions on the XFS list: https://lore.kernel.org/lkml/CA+wXwBS7YTHUmxGP3JrhcKMnYQJcd6=7HE+E1v-guk01L2K3Zw@mail.gmail.com/T/ but ultimately this didn’t receive sufficient interest to keep it moving forward and I ran out of steam. Unfortunately we can’t be stuck on 5.15 forever, and other kernel developers correctly keep pointing out that we should be updating, but that isn’t an option as long as this time bomb still exists.

Jens pointed out that Meta's findings and their notes on the revert included "When testing nfsd on top of v5.19, we hit lockups in filemap_read(). These ended up being because the xarray for the files being read had pages from other files mixed in."

I know and admire XFS for the very high standards its developers hold regarding testing and avoiding data loss, but ultimately that doesn’t matter if we’re going to be stuck with this bug forever.

I’m able to help fund efforts, help create a reproducer, generally donate my time (I’m not a kernel developer myself) and even provide access to machines that did see the crash (but don’t carry customer data), but I’m not making any progress or getting any traction here.

Jens encouraged me to raise the visibility in this way - so that’s what I’m trying here.

Please help.

In appreciation of all the hard work everyone is putting in and with hugs and love,
Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick




* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 21:18 Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards) Christian Theune
@ 2024-09-12 21:55 ` Matthew Wilcox
  2024-09-12 22:11   ` Christian Theune
  2024-09-12 22:12   ` Jens Axboe
  0 siblings, 2 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-12 21:55 UTC (permalink / raw)
  To: Christian Theune
  Cc: linux-mm, linux-xfs, linux-fsdevel, linux-kernel, torvalds,
	axboe, Daniel Dao, Dave Chinner, clm, regressions, regressions

On Thu, Sep 12, 2024 at 11:18:34PM +0200, Christian Theune wrote:
> This bug is very hard to reproduce but has been known to exist as a
> “fluke” for a while already. I have invested a number of days trying
> to come up with workloads to trigger it quicker than that stochastic
> “once every few weeks in a fleet of 1.5k machines", but it eludes
> me so far. I know that this also affects Facebook/Meta as well as
> Cloudflare who are both running newer kernels (at least 6.1, 6.6,
> and 6.9) with the above mentioned patch reverted. I’m from a much
> smaller company and seeing that those guys are running with this patch
> reverted (that now makes their kernel basically an untested/unsupported
> deviation from the mainline) smells like desperation. I’m with a
> much smaller team and company and I’m wondering why this isn’t
> tackled more urgently from more hands to make it shallow (hopefully).

This passive-aggressive nonsense is deeply aggravating.  I've known
about this bug for much longer, but like you I am utterly unable to
reproduce it.  I've spent months looking for the bug, and I cannot.




* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 21:55 ` Matthew Wilcox
@ 2024-09-12 22:11   ` Christian Theune
  2024-09-12 22:12   ` Jens Axboe
  1 sibling, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-09-12 22:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-xfs, linux-fsdevel, linux-kernel, torvalds,
	axboe, Daniel Dao, Dave Chinner, clm, regressions, regressions

Hi Matthew,

> On 12. Sep 2024, at 23:55, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Thu, Sep 12, 2024 at 11:18:34PM +0200, Christian Theune wrote:
>> This bug is very hard to reproduce but has been known to exist as a
>> “fluke” for a while already. I have invested a number of days trying
>> to come up with workloads to trigger it quicker than that stochastic
>> “once every few weeks in a fleet of 1.5k machines", but it eludes
>> me so far. I know that this also affects Facebook/Meta as well as
>> Cloudflare who are both running newer kernels (at least 6.1, 6.6,
>> and 6.9) with the above mentioned patch reverted. I’m from a much
>> smaller company and seeing that those guys are running with this patch
>> reverted (that now makes their kernel basically an untested/unsupported
>> deviation from the mainline) smells like desperation. I’m with a
>> much smaller team and company and I’m wondering why this isn’t
>> tackled more urgently from more hands to make it shallow (hopefully).
> 
> This passive-aggressive nonsense is deeply aggravating.  I've known
> about this bug for much longer, but like you I am utterly unable to
> reproduce it.  I've spent months looking for the bug, and I cannot.

I’m sorry. I’ve honestly tried my best not to make this message personally hurtful to anybody involved while also trying to communicate the seriousness of this issue that we’re stuck with. Apparently I failed.

As I’m not a kernel developer, I tried to stick to describing the issue, and I’m not sure what strategies would typically need to be applied when individual efforts fail.

I’m not sure why it’s nonsense, though.

Kind regards,
Christian Theune

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick




* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 21:55 ` Matthew Wilcox
  2024-09-12 22:11   ` Christian Theune
@ 2024-09-12 22:12   ` Jens Axboe
  2024-09-12 22:25     ` Linus Torvalds
  1 sibling, 1 reply; 81+ messages in thread
From: Jens Axboe @ 2024-09-12 22:12 UTC (permalink / raw)
  To: Matthew Wilcox, Christian Theune
  Cc: linux-mm, linux-xfs, linux-fsdevel, linux-kernel, torvalds,
	Daniel Dao, Dave Chinner, clm, regressions, regressions

On 9/12/24 3:55 PM, Matthew Wilcox wrote:
> On Thu, Sep 12, 2024 at 11:18:34PM +0200, Christian Theune wrote:
>> This bug is very hard to reproduce but has been known to exist as a
>> “fluke” for a while already. I have invested a number of days trying
>> to come up with workloads to trigger it quicker than that stochastic
>> “once every few weeks in a fleet of 1.5k machines", but it eludes
>> me so far. I know that this also affects Facebook/Meta as well as
>> Cloudflare who are both running newer kernels (at least 6.1, 6.6,
>> and 6.9) with the above mentioned patch reverted. I’m from a much
>> smaller company and seeing that those guys are running with this patch
>> reverted (that now makes their kernel basically an untested/unsupported
>> deviation from the mainline) smells like desperation. I’m with a
>> much smaller team and company and I’m wondering why this isn’t
>> tackled more urgently from more hands to make it shallow (hopefully).
> 
> This passive-aggressive nonsense is deeply aggravating.  I've known
> about this bug for much longer, but like you I am utterly unable to
> reproduce it.  I've spent months looking for the bug, and I cannot.

What passive aggressiveness?! There's a data corruption bug where we
know what causes it, yet we continue to ship it. That's aggravating.

People are aware of the bug, and since there's no good reproducer, it's
hard to fix. That part is fine and understandable. What seems amiss here
is the fact that large folio support for xfs hasn't just been reverted
until the issue is understood and resolved.

When I saw Christian's report, I seemed to recall that we ran into this
at Meta too. And we did, and hence have been reverting it since our 5.19
release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
things that are known broken.

-- 
Jens Axboe



* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:12   ` Jens Axboe
@ 2024-09-12 22:25     ` Linus Torvalds
  2024-09-12 22:30       ` Jens Axboe
                         ` (4 more replies)
  0 siblings, 5 replies; 81+ messages in thread
From: Linus Torvalds @ 2024-09-12 22:25 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matthew Wilcox, Christian Theune, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner, clm,
	regressions, regressions

On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@kernel.dk> wrote:
>
> When I saw Christian's report, I seemed to recall that we ran into this
> at Meta too. And we did, and hence have been reverting it since our 5.19
> release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
> things that are known broken.

I do think that if we have big sites just reverting it as known broken
and can't figure out why, we should do so upstream too.

Yes,  it's going to make it even harder to figure out what's wrong.
Not great. But if this causes filesystem corruption, that sure isn't
great either. And people end up going "I'll use ext4 which doesn't
have the problem", that's not exactly helpful either.

And yeah, the reason ext4 doesn't have the problem is simply because
ext4 doesn't enable large folios. So that doesn't pin anything down
either (ie it does *not* say "this is an xfs bug" - it obviously might
be, but it's probably more likely some large-folio issue).

Other filesystems do enable large folios (afs, bcachefs, erofs, nfs,
smb), but maybe they're just not used under the kind of load that shows it.

Honestly, the fact that it hasn't been reverted after apparently
people knowing about it for months is a bit shocking to me. Filesystem
people tend to take unknown corruption issues as a big deal. What
makes this so special? Is it because the XFS people don't consider it
an XFS issue, so...

                Linus



* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:25     ` Linus Torvalds
@ 2024-09-12 22:30       ` Jens Axboe
  2024-09-12 22:56         ` Linus Torvalds
  2024-09-13 12:11       ` Christian Brauner
                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Jens Axboe @ 2024-09-12 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Christian Theune, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner, clm,
	regressions, regressions

On 9/12/24 4:25 PM, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> When I saw Christian's report, I seemed to recall that we ran into this
>> at Meta too. And we did, and hence have been reverting it since our 5.19
>> release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
>> things that are known broken.
> 
> I do think that if we have big sites just reverting it as known broken
> and can't figure out why, we should do so upstream too.

Agree. I suspect it would've come up internally shortly too, as we're
just now preparing to roll 6.11 as the next kernel. That always starts
with a list of "what commits are in our 6.9 tree that aren't upstream"
and then porting those, and this one is in that (pretty short) list.

> Yes,  it's going to make it even harder to figure out what's wrong.
> Not great. But if this causes filesystem corruption, that sure isn't
> great either. And people end up going "I'll use ext4 which doesn't
> have the problem", that's not exactly helpful either.

Until someone has a good reproducer for it, it is going to remain
elusive. And it's a two-liner to enable it again for testing, hence
should not be a hard thing to do.
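
For concreteness: the "two-liner" is, if memory serves, just the opt-in
call that the revert removes. A sketch, assuming it lives in
xfs_setup_inode() as in mainline (paraphrased from memory, not a
verbatim hunk of the commit referenced at the top of the thread):

/*
 * Sketch only; not the actual patch.  Enabling large folios for XFS
 * amounts to opting each inode's page cache mapping in while the VFS
 * inode is being set up:
 */
static void xfs_setup_inode_sketch(struct xfs_inode *ip)
{
	struct inode	*inode = VFS_I(ip);

	/* ... existing i_mapping / a_ops setup ... */

	/* Allow multi-page folios in this file's page cache. */
	mapping_set_large_folios(inode->i_mapping);
}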

> And yeah, the reason ext4 doesn't have the problem is simply because
> ext4 doesn't enable large folios. So that doesn't pin anything down
> either (ie it does *not* say "this is an xfs bug" - it obviously might
> be, but it's probably more likely some large-folio issue).
> 
> Other filesystems do enable large folios (afs, bcachefs, erofs, nfs,
> smb), but maybe just not be used under the kind of load to show it.

It might be an iomap thing... Other file systems do use it, but to
various degrees, and XFS is definitely the primary user.

> Honestly, the fact that it hasn't been reverted after apparently
> people knowing about it for months is a bit shocking to me. Filesystem
> people tend to take unknown corruption issues as a big deal. What
> makes this so special? Is it because the XFS people don't consider it
> an XFS issue, so...

Double agree, I was pretty surprised when I learned of all this today.

-- 
Jens Axboe



* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:30       ` Jens Axboe
@ 2024-09-12 22:56         ` Linus Torvalds
  2024-09-13  3:44           ` Matthew Wilcox
  0 siblings, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-12 22:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matthew Wilcox, Christian Theune, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner, clm,
	regressions, regressions

On Thu, 12 Sept 2024 at 15:30, Jens Axboe <axboe@kernel.dk> wrote:
>
> It might be an iomap thing... Other file systems do use it, but to
> various degrees, and XFS is definitely the primary user.

I have to say, I looked at the iomap code, and it's disgusting.

The "I don't support large folios" check doesn't even say "don't do
large folios". That's what the regular __filemap_get_folio() code does
for reads, and that's the sane thing to do. But that's not what the
iomap code does. AT ALL.

No, the iomap code limits "len" of a write in iomap_write_begin() to
be within one page, and then magically depends on

 (a) __iomap_get_folio() using that length to decide how big a folio to allocate

 (b) iomap_write_begin() doing its own "what is the real length:" based on that.

 (c) the *caller* then having to do the same thing, to see what length
iomap_write_begin() _actually_ used (because it wasn't the 'bytes'
that was passed in).

Honestly, the iomap code is just odd. Having these kinds of subtle
interdependencies doesn't make sense. The two code sequences don't
even use the same logic, with iomap_write_begin() doing

        if (!mapping_large_folio_support(iter->inode->i_mapping))
                len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
        [... alloc folio ...]
        if (pos + len > folio_pos(folio) + folio_size(folio))
                len = folio_pos(folio) + folio_size(folio) - pos;

and the caller (iomap_write_iter) doing

                offset = offset_in_folio(folio, pos);
                if (bytes > folio_size(folio) - offset)
                        bytes = folio_size(folio) - offset;

and yes, the two completely different ways of picking 'len' (called
'bytes' in the second case) had *better* match.

I do think they match, but code shouldn't be organized this way.

It's not just the above kind of odd thing either, it's things like
iomap_get_folio() using that fgf_set_order(len), which does

        unsigned int shift = ilog2(size);

        if (shift <= PAGE_SHIFT)
                return 0;

so now it has done that potentially expensive ilog2() for the common
case of "len < PAGE_SIZE", but dammit, it should never have even
bothered looking at 'len' if the inode didn't support large folios in
the first place, and we shouldn't have had that special odd 'len =
min_t(..)" magic rule to force an order-0 thing, because

Yeah, yeah, modern CPU's all have reasonably cheap bit finding
instructions. But the code simply shouldn't have this kind of thing in
the first place.

The folio should have been allocated *before* iomap_write_begin(), the
"no large folios" should just have fixed the order to zero there, and
the actual real-life length of the write should have been limited in
*one* piece of code after the allocation point instead of then having
two different pieces of code depending on matching (subtle and
undocumented) logic.
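
A minimal sketch of that shape (illustrative only, the helper name is
made up here): pick the folio first, then derive the usable length from
it in exactly one place that both iomap_write_begin() and its caller
consume:

/*
 * Illustrative only, not a proposed patch: one clamp, applied after the
 * folio has been chosen, instead of two separate computations that have
 * to stay in sync.
 */
static size_t iomap_clamp_write_len(struct folio *folio, loff_t pos,
		size_t len)
{
	size_t in_folio = folio_pos(folio) + folio_size(folio) - pos;

	return min(len, in_folio);
}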

Put another way: I most certainly don't see the bug here - it may look
_odd_, but not wrong - but at the same time, looking at that code
doesn't make me get the warm and fuzzies about the iomap large-folio
situation either.

                Linus



* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:56         ` Linus Torvalds
@ 2024-09-13  3:44           ` Matthew Wilcox
  2024-09-13 13:23             ` Christian Theune
  0 siblings, 1 reply; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-13  3:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Christian Theune, linux-mm, linux-xfs, linux-fsdevel,
	linux-kernel, Daniel Dao, Dave Chinner, clm, regressions,
	regressions

On Thu, Sep 12, 2024 at 03:56:17PM -0700, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:30, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > It might be an iomap thing... Other file systems do use it, but to
> > various degrees, and XFS is definitely the primary user.
> 
> I have to say, I looked at the iomap code, and it's disgusting.

I'm not going to comment on this because I think it's unrelated to
the problem.

We have reports of bad entries being returned from page cache lookups.
Sometimes they're pages which have been freed, sometimes they're pages
which are very definitely in use by a different filesystem.

I think that's what the underlying problem is here (or else we have
two problems).  I'm not convinced that it's necessarily related to large
folios, but it's certainly easier to reproduce with large folios.

I've looked at a number of explanations for this.  Could it be a page
that's being freed without being removed from the xarray?  We seem to
have debug that would trigger in that case, so I don't think so.

Could it be a page with a messed-up refcount?  Again, I think we'd
notice the VM_BUG_ON_PAGE() in put_page_testzero(), so I don't think
it's that either.

My current best guess is that we have an xarray node with a stray pointer
in it; that the node is freed from one xarray, allocated to a different
xarray, but not properly cleared.  But I can't reproduce the problem,
so that's pure speculation on my part.
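
To make that concrete, here is a tiny userspace model of the
hypothesised failure mode (nothing below is kernel code, the structures
are made up): a node recycled into a second tree keeps one slot from its
previous life, so lookups in the second tree hand back an entry that
belongs to the first file.

#include <stdio.h>

#define SLOTS 8

/* Toy stand-ins for an xarray node and its slots; illustration only. */
struct toy_node {
	const char *tree;		/* which file's mapping owns the node */
	const char *slot[SLOTS];
};

int main(void)
{
	struct toy_node n;
	int i;

	/* The node starts life in file A's page cache tree. */
	n.tree = "file A";
	for (i = 0; i < SLOTS; i++)
		n.slot[i] = "folio of file A";

	/*
	 * The node is freed and recycled into file B's tree, but
	 * (hypothetically) one slot never gets cleared or rewritten.
	 */
	n.tree = "file B";
	for (i = 0; i < SLOTS; i++)
		if (i != 3)
			n.slot[i] = "folio of file B";

	/* A lookup walking file B's tree now sees a stray entry. */
	for (i = 0; i < SLOTS; i++)
		printf("lookup in %s, slot %d -> %s\n", n.tree, i, n.slot[i]);
	return 0;
}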



* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:25     ` Linus Torvalds
  2024-09-12 22:30       ` Jens Axboe
@ 2024-09-13 12:11       ` Christian Brauner
  2024-09-16 13:29         ` Matthew Wilcox
  2024-09-13 15:30       ` Chris Mason
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Christian Brauner @ 2024-09-13 12:11 UTC (permalink / raw)
  To: Linus Torvalds, Pankaj Raghav, Luis Chamberlain
  Cc: Jens Axboe, Matthew Wilcox, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner,
	clm, regressions, regressions

On Thu, Sep 12, 2024 at 03:25:50PM GMT, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > When I saw Christian's report, I seemed to recall that we ran into this
> > at Meta too. And we did, and hence have been reverting it since our 5.19
> > release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
> > things that are known broken.
> 
> I do think that if we have big sites just reverting it as known broken
> and can't figure out why, we should do so upstream too.
> 
> Yes,  it's going to make it even harder to figure out what's wrong.
> Not great. But if this causes filesystem corruption, that sure isn't
> great either. And people end up going "I'll use ext4 which doesn't
> have the problem", that's not exactly helpful either.
> 
> And yeah, the reason ext4 doesn't have the problem is simply because
> ext4 doesn't enable large folios. So that doesn't pin anything down
> either (ie it does *not* say "this is an xfs bug" - it obviously might
> be, but it's probably more likely some large-folio issue).
> 
> Other filesystems do enable large folios (afs, bcachefs, erofs, nfs,
> smb), but maybe just not be used under the kind of load to show it.
> 
> Honestly, the fact that it hasn't been reverted after apparently
> people knowing about it for months is a bit shocking to me. Filesystem
> people tend to take unknown corruption issues as a big deal. What
> makes this so special? Is it because the XFS people don't consider it
> an XFS issue, so...

So this issue is new to me as well. One of the items this cycle is the
work to enable support for block sizes that are larger than page sizes
via the large block size (LBS) series that's been sitting in -next for a
long time. That work specifically targets xfs and builds on top of the
large folio support.

If the support for large folios is going to be reverted in xfs then I
see no point to merge the LBS work now. So I'm holding off on sending
that pull request until a decision is made (for xfs). As far as I
understand, supporting larger block sizes will not be meaningful without
large folio support.



* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13  3:44           ` Matthew Wilcox
@ 2024-09-13 13:23             ` Christian Theune
  0 siblings, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-09-13 13:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Jens Axboe, linux-mm, linux-xfs, linux-fsdevel,
	linux-kernel, Daniel Dao, Dave Chinner, clm, regressions,
	regressions, mironov.ivan

Hi,

> On 13. Sep 2024, at 05:44, Matthew Wilcox <willy@infradead.org> wrote:
> 
> My current best guess is that we have an xarray node with a stray pointer
> in it; that the node is freed from one xarray, allocated to a different
> xarray, but not properly cleared.  But I can't reproduce the problem,
> so that's pure speculation on my part.

I’d love to help with the reproduction. I understand that BZ is unloved, and I guess putting everything I’ve seen so far from various sources into a single spot might help - unfortunately that makes for a pretty long mail. I selectively didn’t inline some of the more far-fetched things.

A tiny bit of context about me: I’m a seasoned developer, but not a kernel developer. I don’t know the subsystems from a code perspective. I stare at kernel code (or C code generally) mostly only when things go wrong. I did my share of debugging hard things over the last 25 years and I am good at trying to attack things from multiple angles.

I have 9 non-production VMs that exhibited the issue last year. I can put those on custom compiled kernels and instrument them as needed. Feel free to use me as a resource here.

Rabbit hole 1: the stalls and stack traces
==========================================

I’ve reviewed all of the stall messages (see below) that I could find and noticed:

- All of the VMs that are affected have at least 2 CPUs. I haven’t seen this on any single-CPU VMs AFAICT, but I wouldn’t fully rule out that it could be happening there as well. OTOH (obviously?) race conditions are easier to produce on a multi-processor machine than on a single core … ;)

- I’ve only ever seen it on virtual machines; however, there’s a Red Hat bug report from 2023 whose data points to an Asus board, i.e. what looks like a physical machine, so I guess the physical/virtual distinction is not relevant.

- Most call stacks come from the VFS, but I’ve seen two that originate from a page fault (if I’m reading things correctly) - so trying to swap a page in? That’s interesting because it would hint at a reproducer that doesn’t need FS code to be involved.

Here’s the stalls that I could recover from my efforts last year:

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	1-....: (1 GPs behind) idle=bcec/1/0x4000000000000000 softirq=32229387/32229388 fqs=3407711
	(t=6825807 jiffies g=51307757 q=12582143 ncpus=2)
CPU: 1 PID: 135430 Comm: systemd-journal Not tainted 6.1.57 #1-NixOS
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:__rcu_read_unlock+0x1d/0x30
Code: ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 3c 25 c0 0b 02 00 83 af 64 04 00 00 01 75 0a 8b 87 68 04 00 00 <85> c0 75 05 c3 cc cc cc cc e9 45 fe ff ff 0f 1f 44 00 00 0f 1f 44
RSP: 0018:ffffa9c442887c78 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffca97c0ed4000 RCX: 0000000000000000
RDX: ffff88a1919bb6d0 RSI: ffff88a1919bb6d0 RDI: ffff88a187480000
RBP: 0000000000000044 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000100cca
R13: ffff88a2a48836b0 R14: 0000000000001be0 R15: ffffca97c0ed4000
FS:  00007fa45ec86c40(0000) GS:ffff88a2fad00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa45f436985 CR3: 000000010b4f8000 CR4: 00000000000006e0
Call Trace:
 <IRQ>
 ? rcu_dump_cpu_stacks+0xc8/0x100
 ? rcu_sched_clock_irq.cold+0x15b/0x2fb
 ? sched_slice+0x87/0x140
 ? perf_event_task_tick+0x64/0x370
 ? __cgroup_account_cputime_field+0x5b/0xa0
 ? update_process_times+0x77/0xb0
 ? tick_sched_handle+0x34/0x50
 ? tick_sched_timer+0x6f/0x80
 ? tick_sched_do_timer+0xa0/0xa0
 ? __hrtimer_run_queues+0x112/0x2b0
 ? hrtimer_interrupt+0xfe/0x220
 ? __sysvec_apic_timer_interrupt+0x7f/0x170
 ? sysvec_apic_timer_interrupt+0x99/0xc0
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? __rcu_read_unlock+0x1d/0x30
 ? xas_load+0x30/0x40
 __filemap_get_folio+0x10a/0x370
 filemap_fault+0x139/0x910
 ? preempt_count_add+0x47/0xa0
 __do_fault+0x31/0x80
 do_fault+0x299/0x410
 __handle_mm_fault+0x623/0xb80
 handle_mm_fault+0xdb/0x2d0
 do_user_addr_fault+0x19c/0x560
 exc_page_fault+0x66/0x150
 asm_exc_page_fault+0x22/0x30
RIP: 0033:0x7fa45f4369af
Code: Unable to access opcode bytes at 0x7fa45f436985.
RSP: 002b:00007fff3ec0a580 EFLAGS: 00010246
RAX: 0000002537ea8ea4 RBX: 00007fff3ec0aab0 RCX: 0000000000000000
RDX: 00007fa45a3dffd0 RSI: 00007fa45a3e0010 RDI: 000055e348682520
RBP: 0000000000000015 R08: 000055e34862fd00 R09: 00007fff3ec0b1b0
R10: 0000000000000000 R11: 0000000000000000 R12: 00007fff3ec0a820
R13: 00007fff3ec0a640 R14: 2f4f057952ecadbd R15: 0000000000000000
 </TASK>


rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:         1-....: (21000 ticks this GP) idle=d1e4/1/0x4000000000000000 softirq=87308049/87308049 fqs=5541
        (t=21002 jiffies g=363533457 q=100563 ncpus=5)
rcu: rcu_preempt kthread starved for 8417 jiffies! g363533457 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=4
rcu:         Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:0     pid:15    ppid:2      flags:0x00004000
Call Trace:
 <TASK>
 ? rcu_gp_cleanup+0x570/0x570
 __schedule+0x35d/0x1370
 ? get_nohz_timer_target+0x18/0x190
 ? _raw_spin_unlock_irqrestore+0x23/0x40
 ? __mod_timer+0x281/0x3d0
 ? rcu_gp_cleanup+0x570/0x570
 schedule+0x5d/0xe0
 schedule_timeout+0x94/0x150
 ? __bpf_trace_tick_stop+0x10/0x10
 rcu_gp_fqs_loop+0x15b/0x650
 rcu_gp_kthread+0x1a9/0x280
 kthread+0xe9/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 4:
NMI backtrace for cpu 4
CPU: 4 PID: 529675 Comm: connection Not tainted 6.1.57 #1-NixOS
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:xas_descend+0x3/0x90
Code: 48 8b 57 08 48 89 57 10 e9 3a c6 2c 00 48 8b 57 10 48 89 07 48 c1 e8 20 48 89 57 08 e9 26 c6 2c 00 cc cc cc cc cc cc 0f b6 0e <48> 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48
RSP: 0018:ffffa37c47ccfbf8 EFLAGS: 00000246
RAX: ffff92832453e912 RBX: ffffa37c47ccfd78 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffff92832453e910 RDI: ffffa37c47ccfc08
RBP: 0000000000006305 R08: ffffa37c47ccfe70 R09: ffff92830f538138
R10: ffffa37c47ccfe68 R11: ffff92830f538138 R12: 0000000000006305
R13: ffff92832b518900 R14: 0000000000006305 R15: ffffa37c47ccfe98
FS:  00007fcbee42b6c0(0000) GS:ffff9287a9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc0b10d0d8 CR3: 0000000107632000 CR4: 00000000003506e0
Call Trace:
 <NMI>
 ? nmi_cpu_backtrace.cold+0x1b/0x76
 ? nmi_cpu_backtrace_handler+0xd/0x20
 ? nmi_handle+0x5d/0x120
 ? xas_descend+0x3/0x90
 ? default_do_nmi+0x69/0x170
 ? exc_nmi+0x13c/0x170
 ? end_repeat_nmi+0x16/0x67
 ? xas_descend+0x3/0x90
 ? xas_descend+0x3/0x90
 ? xas_descend+0x3/0x90
 </NMI>
 <TASK>
 xas_load+0x30/0x40
 filemap_get_read_batch+0x16e/0x250
 filemap_get_pages+0xa9/0x630
 ? current_time+0x3c/0x100
 ? atime_needs_update+0x104/0x180
 ? touch_atime+0x46/0x1f0
 filemap_read+0xd2/0x340
 xfs_file_buffered_read+0x4f/0xd0 [xfs]
 xfs_file_read_iter+0x6a/0xd0 [xfs]
 vfs_read+0x23c/0x310
 ksys_read+0x6b/0xf0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x64/0xce
RIP: 0033:0x7fd0ccf0f78c
Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 a9 bb f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 ff bb f8 ff 48
RSP: 002b:00007fcbee427320 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd0ccf0f78c
RDX: 0000000000000014 RSI: 00007fcbee427500 RDI: 0000000000000129
RBP: 00007fcbee427430 R08: 0000000000000000 R09: 00a9b630ab4578b9
R10: 0000000000000001 R11: 0000000000000246 R12: 00007fcbee42a9f8
R13: 0000000000000014 R14: 00000000040ef680 R15: 0000000000000129
 </TASK>
CPU: 1 PID: 529591 Comm: connection Not tainted 6.1.57 #1-NixOS
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:xas_descend+0x18/0x90
Code: c1 e8 20 48 89 57 08 e9 26 c6 2c 00 cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 <48> 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d fd 00 00 00
RSP: 0018:ffffa37c47b7fbf8 EFLAGS: 00000216
RAX: fffff6e88448e000 RBX: ffffa37c47b7fd78 RCX: 0000000000000000
RDX: 000000000000000d RSI: ffff92832453e910 RDI: ffffa37c47b7fc08
RBP: 000000000000630d R08: ffffa37c47b7fe70 R09: ffff92830f538138
R10: ffffa37c47b7fe68 R11: ffff92830f538138 R12: 000000000000630d
R13: ffff92830b9a3b00 R14: 000000000000630d R15: ffffa37c47b7fe98
FS:  00007fcbf07bb6c0(0000) GS:ffff9287a9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc0ac8e360 CR3: 0000000107632000 CR4: 00000000003506e0
Call Trace:
 <IRQ>
 ? rcu_dump_cpu_stacks+0xc8/0x100
 ? rcu_sched_clock_irq.cold+0x15b/0x2fb
 ? sched_slice+0x87/0x140
 ? perf_event_task_tick+0x64/0x370
 ? __cgroup_account_cputime_field+0x5b/0xa0
 ? update_process_times+0x77/0xb0
 ? tick_sched_handle+0x34/0x50
 ? tick_sched_timer+0x6f/0x80
 ? tick_sched_do_timer+0xa0/0xa0
 ? __hrtimer_run_queues+0x112/0x2b0
 ? hrtimer_interrupt+0xfe/0x220
 ? __sysvec_apic_timer_interrupt+0x7f/0x170
 ? sysvec_apic_timer_interrupt+0x99/0xc0
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? xas_descend+0x18/0x90
 xas_load+0x30/0x40
 filemap_get_read_batch+0x16e/0x250
 filemap_get_pages+0xa9/0x630
 ? current_time+0x3c/0x100
 ? atime_needs_update+0x104/0x180
 ? touch_atime+0x46/0x1f0
 filemap_read+0xd2/0x340
 xfs_file_buffered_read+0x4f/0xd0 [xfs]
 xfs_file_read_iter+0x6a/0xd0 [xfs]
 vfs_read+0x23c/0x310
 ksys_read+0x6b/0xf0
 do_syscall_64+0x3a/0x90
 </TASK>


rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    0-....: (21000 ticks this GP) idle=91fc/1/0x4000000000000000 softirq=85252827/85252827 fqs=4704
        (t=21002 jiffies g=167843445 q=13889 ncpus=3)
CPU: 0 PID: 2202919 Comm: .postgres-wrapp Not tainted 6.1.31 #1-NixOS
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:xas_descend+0x26/0x70
Code: cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 <75> 08 48 3d fd 00 00 0>
RSP: 0018:ffffb427c4917bf0 EFLAGS: 00000246
RAX: ffff98871f8dbdaa RBX: ffffb427c4917d70 RCX: 0000000000000002
RDX: 0000000000000005 RSI: ffff988876d3c000 RDI: ffffb427c4917c00
RBP: 000000000000f177 R08: ffffb427c4917e68 R09: ffff988846485d38
R10: ffffb427c4917e60 R11: ffff988846485d38 R12: 000000000000f177
R13: ffff988827b4ae00 R14: 000000000000f176 R15: ffffb427c4917e90
FS:  00007ff8de817800(0000) GS:ffff98887ac00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff881c8c000 CR3: 000000010dfea000 CR4: 00000000000006f0

(stack trace is missing in this one)

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:         1-....: (20915 ticks this GP) idle=4b8c/1/0x4000000000000000 softirq=138338523/138338526 fqs=6063
        (t=21000 jiffies g=180955121 q=35490 ncpus=2)
CPU: 1 PID: 1415835 Comm: .postgres-wrapp Not tainted 6.1.57 #1-NixOS
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:filemap_get_read_batch+0x16e/0x250
Code: 85 ff 00 00 00 48 83 c4 40 5b 5d c3 cc cc cc cc f0 ff 0e 0f 84 e1 00 00 00 48 c7 44 24 18 03 00 00 00 48 89 e7 e8 42 ab 6d 00 <48> 89 c7 48 85 ff 74 ba 48 81 ff 06 04 00 00 0f 85 fe fe ff ff 48
RSP: 0018:ffffac01c6887c00 EFLAGS: 00000246
RAX: ffffe5a104574000 RBX: ffffac01c6887d70 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff96db861bcb68 RDI: ffffac01c6887c00
RBP: 0000000000014781 R08: ffffac01c6887e68 R09: ffff96dad46fad38
R10: ffffac01c6887e60 R11: ffff96dad46fad38 R12: 0000000000014781
R13: ffff96db86f47000 R14: 0000000000014780 R15: ffffac01c6887e90
FS:  00007f9ba0a12800(0000) GS:ffff96dbbbd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9b5050a018 CR3: 000000010ac82000 CR4: 00000000000006e0
Call Trace:
 <IRQ>
 ? rcu_dump_cpu_stacks+0xc8/0x100
 ? rcu_sched_clock_irq.cold+0x15b/0x2fb
 ? sched_slice+0x87/0x140
 ? timekeeping_update+0xdd/0x130
 ? __cgroup_account_cputime_field+0x5b/0xa0
 ? update_process_times+0x77/0xb0
 ? update_wall_time+0xc/0x20
 ? tick_sched_handle+0x34/0x50
 ? tick_sched_timer+0x6f/0x80
 ? tick_sched_do_timer+0xa0/0xa0
 ? __hrtimer_run_queues+0x112/0x2b0
 ? hrtimer_interrupt+0xfe/0x220
 ? __sysvec_apic_timer_interrupt+0x7f/0x170
 ? sysvec_apic_timer_interrupt+0x99/0xc0
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? filemap_get_read_batch+0x16e/0x250
 filemap_get_pages+0xa9/0x630
 ? iomap_iter+0x78/0x310
 ? iomap_file_buffered_write+0x8f/0x2f0
 filemap_read+0xd2/0x340
 xfs_file_buffered_read+0x4f/0xd0 [xfs]
 xfs_file_read_iter+0x6a/0xd0 [xfs]
 vfs_read+0x23c/0x310
 __x64_sys_pread64+0x94/0xc0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x64/0xce
RIP: 0033:0x7f9ba0b0d787
Code: 48 e8 5d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 f3 0f 1e fa 80 3d 7d bc 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
RSP: 002b:00007ffe56bb0878 EFLAGS: 00000202 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9ba0b0d787
RDX: 0000000000002000 RSI: 00007f9b5c85ce80 RDI: 000000000000003a
RBP: 0000000000000001 R08: 000000000a00000d R09: 0000000000000000
R10: 0000000014780000 R11: 0000000000000202 R12: 00007f9b90052ab0
R13: 00005566dc227f75 R14: 00005566dc22c510 R15: 00005566de3cf0c0
 </TASK>


rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	1-....: (21000 ticks this GP) idle=947c/1/0x4000000000000000 softirq=299845076/299845076 fqs=5249
	(t=21002 jiffies g=500931101 q=17117 ncpus=2)
CPU: 1 PID: 1660396 Comm: nix-collect-gar Not tainted 6.1.55 #1-NixOS
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:__xas_next+0x0/0xe0
Code: 48 3d 00 10 00 00 77 c8 48 89 c8 c3 cc cc cc cc e9 f5 fe ff ff 48 c7 47 18 01 00 00 00 31 c9 48 89 c8 c3 cc cc cc cc 0f 1f 00 <48> 8b 47 18 a8 02 75 0e 48 83 47 08 01 48 85 c0 0f 84 b5 00 00 00
RSP: 0018:ffffb170866f7bf8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffb170866f7d70 RCX: 0000000000000000
RDX: 0000000000000020 RSI: ffff8ab97d5306d8 RDI: ffffb170866f7c00
RBP: 00000000000011e4 R08: 0000000000000000 R09: ffff8ab9a4dc3d38
R10: ffffb170866f7e60 R11: ffff8ab9a4dc3d38 R12: 00000000000011e4
R13: ffff8ab946fda400 R14: 00000000000011e4 R15: ffffb170866f7e90
FS:  00007f17d22e3f80(0000) GS:ffff8ab9bdd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000013279e8 CR3: 00000000137c8000 CR4: 00000000000006e0
Call Trace:
 <IRQ>
 ? rcu_dump_cpu_stacks+0xc8/0x100
 ? rcu_sched_clock_irq.cold+0x15b/0x2fb
 ? sched_slice+0x87/0x140
 ? perf_event_task_tick+0x64/0x370
 ? __cgroup_account_cputime_field+0x5b/0xa0
 ? update_process_times+0x77/0xb0
 ? tick_sched_handle+0x34/0x50
 ? tick_sched_timer+0x6f/0x80
 ? tick_sched_do_timer+0xa0/0xa0
 ? __hrtimer_run_queues+0x112/0x2b0
 ? hrtimer_interrupt+0xfe/0x220
 ? __sysvec_apic_timer_interrupt+0x7f/0x170
 ? sysvec_apic_timer_interrupt+0x99/0xc0
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? __xas_prev+0xe0/0xe0
 ? xas_load+0x30/0x40
 filemap_get_read_batch+0x16e/0x250
 filemap_get_pages+0xa9/0x630
 ? atime_needs_update+0x104/0x180
 ? touch_atime+0x46/0x1f0
 filemap_read+0xd2/0x340
 xfs_file_buffered_read+0x4f/0xd0 [xfs]
 xfs_file_read_iter+0x6a/0xd0 [xfs]
 vfs_read+0x23c/0x310
 __x64_sys_pread64+0x94/0xc0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x64/0xce
RIP: 0033:0x7f17d3a2d7c7
Code: 08 89 3c 24 48 89 4c 24 18 e8 75 db f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 c5 db f8 ff 48 8b
RSP: 002b:00007ffffd9d0fb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f17d3a2d7c7
RDX: 0000000000001000 RSI: 000056435be0ccf8 RDI: 0000000000000006
RBP: 00000000011e4000 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000011e4000 R11: 0000000000000293 R12: 0000000000001000
R13: 000056435be0ccf8 R14: 0000000000001000 R15: 000056435bdea370
 </TASK>

I’ve pulled together, in a more compact form, the various states the stall was detected in:

 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? __rcu_read_unlock+0x1d/0x30
 ? xas_load+0x30/0x40
 __filemap_get_folio+0x10a/0x370
 filemap_fault+0x139/0x910

 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? __xas_prev+0xe0/0xe0
 ? xas_load+0x30/0x40
 filemap_get_read_batch+0x16e/0x250

 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? filemap_get_read_batch+0x16e/0x250

 xas_load+0x30/0x40
 filemap_get_read_batch+0x16e/0x250

RIP: 0010:xas_descend+0x26/0x70 (this one was missing the stack trace)

I tried reading through the xarray code, but my C and kernel knowledge is stretched thin trying to understand some of the internals: I couldn’t figure out how __rcu_read_unlock appears from within xas_load, similar to __xas_prev. I stopped diving deeper at that point.

My original bug report also includes an initial grab of multiple stall reports over time on a single machine where the situation unfolded with different stack traces over many hours. It’s a bit long so I’m opting to provide the link: https://bugzilla.kernel.org/show_bug.cgi?id=217572#c0 

I was also wondering whether the stall is stuck or spinning; one of my early comments noted that with 3 CPUs I had a total of 60% spent in system time, so this sounds like it might be spinning between xas_load and xas_descend. I see there’s some kind of retry mechanism in there and while-loops that might get stuck if the data structures are borked. I think it’s alternating between xas_load and xas_descend, though, so not stuck in xas_descend’s own loop.

The Red Hat report "large folio related page cache iteration hang” (https://bugzilla.redhat.com/show_bug.cgi?id=2213967) does show a “kernel BUG” message in addition to the known stack around xas_load:

kernel: watchdog: BUG: soft lockup - CPU#28 stuck for 26s! [rocksdb:low:2195]
kernel: Modules linked in: tls nf_conntrack_netbios_ns nf_conntrack_broadcast nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel tcp_bbr rfkill ip_set nf_tables nfnetlink nct6775 nct6775_core tun hwmon_vid jc42 vfat fat ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm snd_timer cdc_ether irqbypass acpi_ipmi snd usbnet wmi_bmof rapl ipmi_si k10temp soundcore i2c_piix4 joydev mii ipmi_devintf ipmi_msghandler fuse loop xfs uas usb_storage raid1 hid_cp2112 igb crct10dif_pclmul ast crc32_pclmul nvme crc32c_intel polyval_clmulni dca polyval_generic i2c_algo_bit nvme_core ghash_clmulni_intel ccp sha512_ssse3 wmi sp5100_tco nvme_common
kernel: CPU: 28 PID: 2195 Comm: rocksdb:low Not tainted 6.3.5-100.fc37.x86_64 #1
kernel: Hardware name: To Be Filled By O.E.M. X570D4U/X570D4U, BIOS T1.29b 05/17/2022
kernel: RIP: 0010:xas_load+0x45/0x50
kernel: Code: 3d 00 10 00 00 77 07 5b 5d c3 cc cc cc cc 0f b6 4b 10 48 8d 68 fe 38 48 fe 72 ec 48 89 ee 48 89 df e8 cf fd ff ff 80 7d 00 00 <75> c7 eb d9 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90
kernel: RSP: 0018:ffffaab80392fb40 EFLAGS: 00000246
kernel: RAX: fffff69f82a7c000 RBX: ffffaab80392fb58 RCX: 0000000000000000
kernel: RDX: 0000000000000010 RSI: ffff94a4268a6480 RDI: ffffaab80392fb58
kernel: RBP: ffff94a4268a6480 R08: 0000000000000000 R09: 000000000000424a
kernel: R10: ffff94af1ec69ab0 R11: 0000000000000000 R12: 0000000000001610
kernel: R13: 000000000000160c R14: 000000000000160c R15: ffffaab80392fdf0
kernel: FS: 00007f49f7bfe6c0(0000) GS:ffff94b63f100000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f01446e9000 CR3: 000000014a4be000 CR4: 0000000000750ee0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: <IRQ>
kernel: ? watchdog_timer_fn+0x1a8/0x210
kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
kernel: ? __hrtimer_run_queues+0x112/0x2b0
kernel: ? hrtimer_interrupt+0xf8/0x230
kernel: ? __sysvec_apic_timer_interrupt+0x61/0x130
kernel: ? sysvec_apic_timer_interrupt+0x6d/0x90
kernel: </IRQ>
kernel: <TASK>
kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
kernel: ? xas_load+0x45/0x50
kernel: filemap_get_read_batch+0x179/0x270
kernel: filemap_get_pages+0xab/0x6a0
kernel: ? touch_atime+0x48/0x1b0
kernel: ? filemap_read+0x33f/0x350
kernel: filemap_read+0xdf/0x350
kernel: xfs_file_buffered_read+0x4f/0xd0 [xfs]
kernel: xfs_file_read_iter+0x74/0xe0 [xfs]
kernel: vfs_read+0x240/0x310
kernel: __x64_sys_pread64+0x98/0xd0
kernel: do_syscall_64+0x5f/0x90
kernel: ? native_flush_tlb_local+0x34/0x40
kernel: ? flush_tlb_func+0x10d/0x240
kernel: ? do_syscall_64+0x6b/0x90
kernel: ? sched_clock_cpu+0xf/0x190
kernel: ? irqtime_account_irq+0x40/0xc0
kernel: ? __irq_exit_rcu+0x4b/0xf0
kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
kernel: RIP: 0033:0x7f4a0c23c227
kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 b5 e3 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 05 e4 f8 ff 48 8b
kernel: RSP: 002b:00007f49f7bf8310 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
kernel: RAX: ffffffffffffffda RBX: 000000000000424a RCX: 00007f4a0c23c227
kernel: RDX: 000000000000424a RSI: 00007f04294a35c0 RDI: 00000000000004be
kernel: RBP: 00007f49f7bf8460 R08: 0000000000000000 R09: 00007f49f7bf84a0
kernel: R10: 000000000160c718 R11: 0000000000000293 R12: 000000000000424a
kernel: R13: 000000000160c718 R14: 00007f04294a35c0 R15: 0000000000000000
kernel: </TASK>
...
kernel: ------------[ cut here ]------------
kernel: kernel BUG at fs/inode.c:612!
kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 21 PID: 2195 Comm: rocksdb:low Tainted: G L 6.3.5-100.fc37.x86_64 #1
kernel: Hardware name: To Be Filled By O.E.M. X570D4U/X570D4U, BIOS T1.29b 05/17/2022
kernel: RIP: 0010:clear_inode+0x76/0x80
kernel: Code: 2d a8 40 75 2b 48 8b 93 28 01 00 00 48 8d 83 28 01 00 00 48 39 c2 75 1a 48 c7 83 98 00 00 00 60 00 00 00 5b 5d c3 cc cc cc cc <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 90 90 90 90 90 90 90 90 90 90 90 90
kernel: RSP: 0018:ffffaab80392fe58 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: ffff94af1ec69938 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff94af1ec69ab8
kernel: RBP: ffff94af1ec69ab8 R08: ffffaab80392fd38 R09: 0000000000000002
kernel: R10: 0000000000000001 R11: 0000000000000005 R12: ffffffffc08b9860
kernel: R13: ffff94af1ec69938 R14: 00000000ffffff9c R15: ffff94979dd5da40
kernel: FS: 00007f49f7bfe6c0(0000) GS:ffff94b63ef40000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007eefca8e2000 CR3: 000000014a4be000 CR4: 0000000000750ee0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: <TASK>
kernel: ? die+0x36/0x90
kernel: ? do_trap+0xda/0x100
kernel: ? clear_inode+0x76/0x80
kernel: ? do_error_trap+0x6a/0x90
kernel: ? clear_inode+0x76/0x80
kernel: ? exc_invalid_op+0x50/0x70
kernel: ? clear_inode+0x76/0x80
kernel: ? asm_exc_invalid_op+0x1a/0x20
kernel: ? clear_inode+0x76/0x80
kernel: ? clear_inode+0x1d/0x80
kernel: evict+0x1b8/0x1d0
kernel: do_unlinkat+0x174/0x320
kernel: __x64_sys_unlink+0x42/0x70
kernel: do_syscall_64+0x5f/0x90
kernel: ? __irq_exit_rcu+0x4b/0xf0
kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
kernel: RIP: 0033:0x7f4a0c23faab
kernel: Code: f0 ff ff 73 01 c3 48 8b 0d 82 63 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 63 0d 00 f7 d8 64 89 01 48
kernel: RSP: 002b:00007f49f7bfab58 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
kernel: RAX: ffffffffffffffda RBX: 00007f49f7bfac38 RCX: 00007f4a0c23faab
kernel: RDX: 00007f49f7bfadd0 RSI: 00007f4a0bc2fd30 RDI: 00007f49dd3c32d0
kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: ffffffffffffdf58 R11: 0000000000000206 R12: 0000000000280bc0
kernel: R13: 00007f4a0bca77b8 R14: 00007f49f7bfadd0 R15: 00007f49f7bfadd0
kernel: </TASK>
kernel: Modules linked in: tls nf_conntrack_netbios_ns nf_conntrack_broadcast nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel tcp_bbr rfkill ip_set nf_tables nfnetlink nct6775 nct6775_core tun hwmon_vid jc42 vfat fat ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm snd_hwdep snd_pcm snd_timer cdc_ether irqbypass acpi_ipmi snd usbnet wmi_bmof rapl ipmi_si k10temp soundcore i2c_piix4 joydev mii ipmi_devintf ipmi_msghandler fuse loop xfs uas usb_storage raid1 hid_cp2112 igb crct10dif_pclmul ast crc32_pclmul nvme crc32c_intel polyval_clmulni dca polyval_generic i2c_algo_bit nvme_core ghash_clmulni_intel ccp sha512_ssse3 wmi sp5100_tco nvme_common
kernel: ---[ end trace 0000000000000000 ]---
kernel: RIP: 0010:clear_inode+0x76/0x80
kernel: Code: 2d a8 40 75 2b 48 8b 93 28 01 00 00 48 8d 83 28 01 00 00 48 39 c2 75 1a 48 c7 83 98 00 00 00 60 00 00 00 5b 5d c3 cc cc cc cc <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 90 90 90 90 90 90 90 90 90 90 90 90
kernel: RSP: 0018:ffffaab80392fe58 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: ffff94af1ec69938 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff94af1ec69ab8
kernel: RBP: ffff94af1ec69ab8 R08: ffffaab80392fd38 R09: 0000000000000002
kernel: R10: 0000000000000001 R11: 0000000000000005 R12: ffffffffc08b9860
kernel: R13: ffff94af1ec69938 R14: 00000000ffffff9c R15: ffff94979dd5da40
kernel: FS: 00007f49f7bfe6c0(0000) GS:ffff94b63ef40000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007eefca8e2000 CR3: 000000014a4be000 CR4: 0000000000750ee0
kernel: PKRU: 55555554
kernel: note: rocksdb:low[2195] exited with irqs disabled
kernel: note: rocksdb:low[2195] exited with preempt_count 1

The above report shows rocksdb as the workload, with relatively short uptimes of around 30 minutes. Maybe there’s a reproducer around there somewhere? I’ve CCed the reporter from there to maybe get some insight into his workload.

Rabbit hole 2: things that were already considered 
==================================================

There were a number of potential vectors and bugfixes that were discussed/referenced but didn’t turn out to fix the issue overall. Some of them might be obvious red herrings by now, but I’m not sure which.

* [GIT PULL] xfs, iomap: fix data corruption due to stale cached iomaps (https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/)

* cbc02854331e ("XArray: Do not return sibling entries from xa_load()”) did not help here

* I think I’ve seen that the affected data on disk ended up being null bytes, but I can’t verify that.

* There was a fix close to this in “_filemap_get_folio and NULL pointer dereference” (https://bugzilla.kernel.org/show_bug.cgi?id=217441) and "having TRANSPARENT_HUGEPAGE enabled hangs some applications (supervisor read access in kernel mode)” https://bugzilla.kernel.org/show_bug.cgi?id=216646 but their traces looked slightly different from the ones discussed here as did their outcome. Interestingly: those are also on the page fault path, not an fs path.

* memcg was in the stack and under question at some point but it also happens without it

* I was wondering whether increased readahead sizes might cause issues (most of our VMs run with 128 KiB, but DB VMs run with 1 MiB). However, this might also be a red herring, as the single- vs. multi-core situation correlates strongly in our case.

Maybe off-topic, but maybe it spurs ideas. A situation that felt similar to the stalls here: I remember debugging a memory issue in Python’s small object allocator a number of years ago that resulted in segfaults, and I wonder whether the stalls we’re seeing are only a delayed symptom of an earlier corruption somewhere else. The Python issue was a third-party module that caused an out-of-bounds write into an adjacent byte that was used as a pointer for arena management. That one was also extremely hard to track down due to this indirection / "magic at a distance” behaviour.

That’s all the input I have.

My offer stands: I can take time and run a number of machines that exhibited the behaviour on custom kernels to gather data. 

Cheers,
Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick




* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:25     ` Linus Torvalds
  2024-09-12 22:30       ` Jens Axboe
  2024-09-13 12:11       ` Christian Brauner
@ 2024-09-13 15:30       ` Chris Mason
  2024-09-13 15:51         ` Matthew Wilcox
  2024-09-13 16:04       ` David Howells
  2024-09-16  0:00       ` Dave Chinner
  4 siblings, 1 reply; 81+ messages in thread
From: Chris Mason @ 2024-09-13 15:30 UTC (permalink / raw)
  To: Linus Torvalds, Jens Axboe
  Cc: Matthew Wilcox, Christian Theune, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner,
	regressions, regressions



On 9/12/24 6:25 PM, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> When I saw Christian's report, I seemed to recall that we ran into this
>> at Meta too. And we did, and hence have been reverting it since our 5.19
>> release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
>> things that are known broken.
> 
> I do think that if we have big sites just reverting it as known broken
> and can't figure out why, we should do so upstream too.

I've mentioned this in the past to both Willy and Dave Chinner, but so
far all of my attempts to reproduce it on purpose have failed.  It's
awkward because I don't like to send bug reports that I haven't
reproduced on a non-facebook kernel, but I'm pretty confident this bug
isn't specific to us.

I'll double down on repros again during Plumbers and hopefully come up
with a recipe for explosions.  One other important datapoint is that we
also enable huge folios via tmpfs mount -o huge=within_size.

That hasn't hit problems, and we've been doing it for years, but of
course the tmpfs usage is pretty different from iomap/xfs.

We have two workloads that have reliably seen large folio bugs in prod.
This is all on bare metal systems, some two socket, some single socket,
nothing really exotic.

1) On 5.19 kernels, knfsd reading and writing to XFS.  We needed
O(hundreds) of knfsd servers running for about 8 hours to see one hit.

The issue looked similar to Christian Theune's rcu stalls, but since it
was just one CPU spinning away, I was able to perf probe and drgn my way
to some details.  The xarray for the file had a series of large folios:

[ index 0 large folio from the correct file ]
[ index 1: large folio from the correct file ]
...
[ index N: large folio from a completely different file ]
[ index N+1: large folio from the correct file ]

I'm being sloppy with index numbers, but the important part is that
we've got a large folio from the wrong file in the middle of the bunch.

filemap_read() iterates over batches of folios from the xarray, but if
one of the folios in the batch has folio->offset out of order with the
rest, the whole thing turns into an infinite loop.  It's not really a
filemap_read() bug; the batch coming back from the xarray is just incorrect.

2) On 6.9 kernels, we saw a BUG_ON() during inode eviction because
mapping->nrpages was non-zero.  I'm assuming it's really just a
different window into the same bug.  Crash dump analysis was less
conclusive because the xarray itself was always empty, but turning off
large folios made the problem go away.

This happened ~5-10 times a day, and the service had a few thousand
machines running 6.9.  If I can't make an artificial repro, I'll try and
talk the service owners into setting up a production shadow to hammer on
it with additional debugging.

We also disabled large folios for our 6.4 kernel, but Stefan actually
tracked that bug down:

commit a48d5bdc877b85201e42cef9c2fdf5378164c23a
Author: Stefan Roesch <shr@devkernel.io>
Date:   Mon Nov 6 10:19:18 2023 -0800

    mm: fix for negative counter: nr_file_hugepages

We didn't have time to revalidate with large folios turned back on afterwards.

-chris


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13 15:30       ` Chris Mason
@ 2024-09-13 15:51         ` Matthew Wilcox
  2024-09-13 16:33           ` Chris Mason
  0 siblings, 1 reply; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-13 15:51 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Jens Axboe, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner,
	regressions, regressions

On Fri, Sep 13, 2024 at 11:30:41AM -0400, Chris Mason wrote:
> I've mentioned this in the past to both Willy and Dave Chinner, but so
> far all of my attempts to reproduce it on purpose have failed.  It's
> awkward because I don't like to send bug reports that I haven't
> reproduced on a non-facebook kernel, but I'm pretty confident this bug
> isn't specific to us.

I don't think the bug is specific to you either.  It's been hit by
several people ... but it's really hard to hit ;-(  

> I'll double down on repros again during plumbers and hopefully come up
> with a recipe for explosions.  On other important datapoint is that we

I appreciate the effort!

> The issue looked similar to Christian Theune's rcu stalls, but since it
> was just one CPU spinning away, I was able to perf probe and drgn my way
> to some details.  The xarray for the file had a series of large folios:
> 
> [ index 0 large folio from the correct file ]
> [ index 1: large folio from the correct file ]
> ...
> [ index N: large folio from a completely different file ]
> [ index N+1: large folio from the correct file ]
> 
> I'm being sloppy with index numbers, but the important part is that
> we've got a large folio from the wrong file in the middle of the bunch.

If you could get the precise index numbers, that would be an important
clue.  It would be interesting to know the index number in the xarray
where the folio was found rather than folio->index (as I suspect that
folio->index is completely bogus because folio->mapping is wrong).
But gathering that info is going to be hard.

Maybe something like this?

+++ b/mm/filemap.c
@@ -2317,6 +2317,12 @@ static void filemap_get_read_batch(struct address_space *mapping,
                if (unlikely(folio != xas_reload(&xas)))
                        goto put_folio;

+{
+       struct address_space *fmapping = READ_ONCE(folio->mapping);
+       if (fmapping != NULL && fmapping != mapping)
+               printk("bad folio at %lx\n", xas.xa_index);
+}
+
                if (!folio_batch_add(fbatch, folio))
                        break;
                if (!folio_test_uptodate(folio))

(could use VM_BUG_ON_FOLIO() too, but I'm not sure that the identity of
the bad folio we've found is as interesting as where we found it)



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:25     ` Linus Torvalds
                         ` (2 preceding siblings ...)
  2024-09-13 15:30       ` Chris Mason
@ 2024-09-13 16:04       ` David Howells
  2024-09-13 16:37         ` Chris Mason
  2024-09-16  0:00       ` Dave Chinner
  4 siblings, 1 reply; 81+ messages in thread
From: David Howells @ 2024-09-13 16:04 UTC (permalink / raw)
  To: Chris Mason
  Cc: dhowells, Linus Torvalds, Jens Axboe, Matthew Wilcox,
	Christian Theune, linux-mm, linux-xfs, linux-fsdevel,
	linux-kernel, Daniel Dao, Dave Chinner, regressions, regressions

Chris Mason <clm@meta.com> wrote:

> I've mentioned this in the past to both Willy and Dave Chinner, but so
> far all of my attempts to reproduce it on purpose have failed.

Could it be a splice bug?

David



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13 15:51         ` Matthew Wilcox
@ 2024-09-13 16:33           ` Chris Mason
  2024-09-13 18:15             ` Matthew Wilcox
  0 siblings, 1 reply; 81+ messages in thread
From: Chris Mason @ 2024-09-13 16:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Jens Axboe, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner,
	regressions, regressions

[-- Attachment #1: Type: text/plain, Size: 2470 bytes --]



On 9/13/24 11:51 AM, Matthew Wilcox wrote:
> On Fri, Sep 13, 2024 at 11:30:41AM -0400, Chris Mason wrote:
>> I've mentioned this in the past to both Willy and Dave Chinner, but so
>> far all of my attempts to reproduce it on purpose have failed.  It's
>> awkward because I don't like to send bug reports that I haven't
>> reproduced on a non-facebook kernel, but I'm pretty confident this bug
>> isn't specific to us.
> 
> I don't think the bug is specific to you either.  It's been hit by
> several people ... but it's really hard to hit ;-(  
> 
>> I'll double down on repros again during plumbers and hopefully come up
>> with a recipe for explosions.  One other important datapoint is that we
> 
> I appreciate the effort!
> 
>> The issue looked similar to Christian Theune's rcu stalls, but since it
>> was just one CPU spinning away, I was able to perf probe and drgn my way
>> to some details.  The xarray for the file had a series of large folios:
>>
>> [ index 0 large folio from the correct file ]
>> [ index 1: large folio from the correct file ]
>> ...
>> [ index N: large folio from a completely different file ]
>> [ index N+1: large folio from the correct file ]
>>
>> I'm being sloppy with index numbers, but the important part is that
>> we've got a large folio from the wrong file in the middle of the bunch.
> 
> If you could get the precise index numbers, that would be an important
> clue.  It would be interesting to know the index number in the xarray
> where the folio was found rather than folio->index (as I suspect that
> folio->index is completely bogus because folio->mapping is wrong).
> But gathering that info is going to be hard.

This particular debug session was late at night while we were urgently
trying to roll out some NFS features.  I didn't really save many of the
details because my plan was to reproduce it and make a full bug report.

Also, I was explaining the details to people in workplace chat, which is
wildly bad at rendering long lines of structured text, especially when
half the people in the chat are on a mobile device.

You're probably wondering why all of that is important...what I'm really
trying to say is that I've attached a screenshot of the debugging output.

It came from an older drgn script, where I'm still clinging to "radix",
and you probably can't trust the string representation of the page flags
because I wasn't yet using Omar's helpers and may have hard coded them
from an older kernel.

-chris

[-- Attachment #2: xarray-debug.png --]
[-- Type: image/png, Size: 933545 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13 16:04       ` David Howells
@ 2024-09-13 16:37         ` Chris Mason
  0 siblings, 0 replies; 81+ messages in thread
From: Chris Mason @ 2024-09-13 16:37 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, Jens Axboe, Matthew Wilcox, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	Dave Chinner, regressions, regressions

On 9/13/24 12:04 PM, David Howells wrote:
> Chris Mason <clm@meta.com> wrote:
> 
>> I've mentioned this in the past to both Willy and Dave Chinner, but so
>> far all of my attempts to reproduce it on purpose have failed.
> 
> Could it be a splice bug?

I really wanted it to be a splice bug, but I believe the 6.9 workload I
mentioned isn't using splice.  I didn't 100% verify though.

-chris


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13 16:33           ` Chris Mason
@ 2024-09-13 18:15             ` Matthew Wilcox
  2024-09-13 21:24               ` Linus Torvalds
  0 siblings, 1 reply; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-13 18:15 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Jens Axboe, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner,
	regressions, regressions

On Fri, Sep 13, 2024 at 12:33:49PM -0400, Chris Mason wrote:
> > If you could get the precise index numbers, that would be an important
> > clue.  It would be interesting to know the index number in the xarray
> > where the folio was found rather than folio->index (as I suspect that
> > folio->index is completely bogus because folio->mapping is wrong).
> > But gathering that info is going to be hard.
> 
> This particular debug session was late at night while we were urgently
> trying to roll out some NFS features.  I didn't really save many of the
> details because my plan was to reproduce it and make a full bug report.
> 
> Also, I was explaining the details to people in workplace chat, which is
> wildly bad at rendering long lines of structured text, especially when
> half the people in the chat are on a mobile device.
> 
> You're probably wondering why all of that is important...what I'm really
> trying to say is that I've attached a screenshot of the debugging output.
> 
> It came from a older drgn script, where I'm still clinging to "radix",
> and you probably can't trust the string representation of the page flags
> because I wasn't yet using Omar's helpers and may have hard coded them
> from an older kernel.

That's all _fine_.  This is enormously helpful.

First, we see the same folio appear three times.  I think that's
particularly significant.  Modulo 64 (the number of entries per node), the
indices the bad folio is found at are 16, 32 and 48.  So I think the _current_
order of the folio is 4, but at the time the folio was put in the xarray,
it was order 6.  Except ... at order-6 we elide a level of the xarray.
So we shouldn't be able to see this.  Hm.
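
If it helps to see the arithmetic: an order-N folio occupies 2^N consecutive
slots, so within a 64-slot node its head can only land at offsets that are
multiples of 2^N.  A throwaway userspace toy (nothing here is kernel code,
and it assumes the usual XA_CHUNK_SHIFT of 6):

#include <stdio.h>

int main(void)
{
	const unsigned int chunk_shift = 6;		/* 64 slots per xa_node */
	const unsigned int chunk_size = 1u << chunk_shift;

	for (unsigned int order = 0; order <= chunk_shift; order++) {
		unsigned int span = 1u << order;	/* slots one folio covers */

		printf("order %u: ", order);
		if (span == chunk_size) {
			printf("fills a whole node, so a level is normally elided\n");
			continue;
		}
		for (unsigned int off = 0; off < chunk_size; off += span)
			printf("%u ", off);
		printf("\n");
	}
	return 0;
}

For order 4 that prints 0 16 32 48, which is exactly the spacing above.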

Oh!  I think split is the key.  Let's say we have an order-6 (or
larger) folio.  And we call split_huge_page() (whatever it's called
in your kernel version).  That calls xas_split_alloc() followed
by xas_split().  xas_split_alloc() puts entry in node->slots[0] and
initialises node->slots[1..XA_CHUNK_SIZE] to a sibling entry.

Now, if we do allocate those nodes in xas_split_alloc(), we're supposed to
free them with radix_tree_node_rcu_free() which zeroes all the slots.
But what if we don't, somehow?  (this is my best current theory).
Then we allocate the node to a different tree, but any time we try to
look something up, unless it's the index for which we allocated the node,
we find a sibling entry and it points to a stale pointer.
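
To make that concrete, here is a silly userspace model of the theory (the
sibling tagging is faked; none of this is real xarray code):

#include <stdio.h>
#include <stdint.h>

#define CHUNK		64
#define SIBLING_OF(n)	((void *)(uintptr_t)(0x100 | (n)))	/* fake tagged value */
#define IS_SIBLING(e)	(((uintptr_t)(e) & ~0x3fUL) == 0x100)
#define SIBLING_SLOT(e)	((unsigned int)((uintptr_t)(e) & 0x3f))

struct toy_node { void *slots[CHUNK]; };

/* the redirect xas_descend() does when it lands on a sibling entry */
static void *toy_lookup(struct toy_node *node, unsigned int offset)
{
	void *entry = node->slots[offset];

	if (IS_SIBLING(entry))
		entry = node->slots[SIBLING_SLOT(entry)];
	return entry;
}

int main(void)
{
	static char folio_A[] = "folio of file A";
	static char folio_B[] = "folio of file B";
	struct toy_node node;

	/* split-style preparation for file A's mapping: slot 0 holds the
	 * entry, every other slot becomes "sibling of slot 0" */
	node.slots[0] = folio_A;
	for (unsigned int i = 1; i < CHUNK; i++)
		node.slots[i] = SIBLING_OF(0);

	/* pretend the node is then freed *without* the slots being zeroed
	 * and recycled into file B's tree, where only one slot gets set */
	node.slots[5] = folio_B;

	printf("offset  5 -> %s\n", (char *)toy_lookup(&node, 5));	/* fine */
	printf("offset 23 -> %s\n", (char *)toy_lookup(&node, 23));	/* stale folio from file A */
	return 0;
}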

I'm going to think on this a bit more, but so far this is all good
evidence for my leading theory.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13 18:15             ` Matthew Wilcox
@ 2024-09-13 21:24               ` Linus Torvalds
  2024-09-13 21:30                 ` Matthew Wilcox
  0 siblings, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-13 21:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Chris Mason, Jens Axboe, Christian Theune, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner,
	regressions, regressions

On Fri, 13 Sept 2024 at 11:15, Matthew Wilcox <willy@infradead.org> wrote:
>
> Oh!  I think split is the key.  Let's say we have an order-6 (or
> larger) folio.  And we call split_huge_page() (whatever it's called
> in your kernel version).  That calls xas_split_alloc() followed
> by xas_split().  xas_split_alloc() puts entry in node->slots[0] and
> initialises node->slots[1..XA_CHUNK_SIZE] to a sibling entry.

Hmm. Not only does the splitting seem to be indicated by the debug
logs, it also ends up being a fairly complicated case. *The* most
complicated case of adding a new folio by far, I'd say.

And I wonder if it's even necessary?

Because I think the *common* case is through filemap_add_folio(),
isn't it? And that code path really doesn't care what the size of the
folio is.

So instead of splitting, that code path would seem to be perfectly
happy with instead erroring out, and simply re-doing the new folio
allocation using the same size that the old conflicting folio had (at
which point it won't be conflicting any more).

No?

It's possible that I'm entirely missing something, but at least the
filemap_add_folio() case looks like it really would actually be
happier with a "oh, that size conflicts with an existing entry, let's
just allocate a smaller size then"

                Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13 21:24               ` Linus Torvalds
@ 2024-09-13 21:30                 ` Matthew Wilcox
  0 siblings, 0 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-13 21:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Jens Axboe, Christian Theune, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, Dave Chinner,
	regressions, regressions

On Fri, Sep 13, 2024 at 02:24:02PM -0700, Linus Torvalds wrote:
> On Fri, 13 Sept 2024 at 11:15, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > Oh!  I think split is the key.  Let's say we have an order-6 (or
> > larger) folio.  And we call split_huge_page() (whatever it's called
> > in your kernel version).  That calls xas_split_alloc() followed
> > by xas_split().  xas_split_alloc() puts entry in node->slots[0] and
> > initialises node->slots[1..XA_CHUNK_SIZE] to a sibling entry.
> 
> Hmm. The splitting does seem to be not just indicated by the debug
> logs, but it ends up being a fairly complicated case. *The* most
> complicated case of adding a new folio by far, I'd say.
> 
> And I wonder if it's even necessary?

Unfortunately, we need to handle things like "we are truncating a file
which has a folio which now extends many pages beyond the end of the
file" and so we have to split the folio which now crosses EOF.  Or we
could write it back and drop it, but that has its own problems.

Part of the "large block size" patches sitting in Christian's tree is
solving these problems for folios which can't be split down to order-0,
so there may be ways we can handle this better now, but if we don't
split we might end up wasting a lot of memory in file tails.

> It's possible that I'm entirely missing something, but at least the
> filemap_add_folio() case looks like it really would actually be
> happier with a "oh, that size conflicts with an existing entry, let's
> just allocate a smaller size then"

Pretty sure we already do that; it's mostly handled through the
readahead path which checks for conflicting folios already in the cache.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-12 22:25     ` Linus Torvalds
                         ` (3 preceding siblings ...)
  2024-09-13 16:04       ` David Howells
@ 2024-09-16  0:00       ` Dave Chinner
  2024-09-16  4:20         ` Linus Torvalds
  2024-09-16  7:14         ` Christian Theune
  4 siblings, 2 replies; 81+ messages in thread
From: Dave Chinner @ 2024-09-16  0:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Matthew Wilcox, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, clm,
	regressions, regressions

On Thu, Sep 12, 2024 at 03:25:50PM -0700, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@kernel.dk> wrote:
> Honestly, the fact that it hasn't been reverted after apparently
> people knowing about it for months is a bit shocking to me. Filesystem
> people tend to take unknown corruption issues as a big deal. What
> makes this so special? Is it because the XFS people don't consider it
> an XFS issue, so...

I don't think this is a data corruption/loss problem - it certainly
hasn't ever appeared that way to me.  The "data loss" appeared to be
in incomplete postgres dump files after the system was rebooted and
this is exactly what would happen when you randomly crash the
system. i.e. dirty data in memory is lost, and application data
being written at the time is in an inconsistent state after the
system recovers. IOWs, there was no clear evidence of actual data
corruption occurring, and data loss is definitely expected when the
page cache iteration hangs and the system is forcibly rebooted
without being able to sync or unmount the filesystems...

All the hangs seem to be caused by folio lookup getting stuck
on a rogue xarray entry in truncate or readahead. If we find an
invalid entry or a folio from a different mapping or with an
unexpected index, we skip it and try again.  Hence this does not
appear to be a data corruption vector, either - it results in a
livelock from endless retry because of the bad entry in the xarray.
This endless retry livelock appears to be what is being reported.

IOWs, there is no evidence of real runtime data corruption or loss
from this pagecache livelock bug.  We also haven't heard of any
random file data corruption events since we've enabled large folios
on XFS. Hence there really is no evidence to indicate that there is
a large folio xarray lookup bug that results in data corruption in
the existing code, and therefore there is no obvious reason for
turning off the functionality we are already building significant
new functionality on top of.

It's been 10 months since I asked Christian to help isolate a
reproducer so we can track this down. Nothing came from that, so
we're still exactly where we were back in November 2023 -
waiting for information on a way to reproduce this issue more
reliably.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-16  0:00       ` Dave Chinner
@ 2024-09-16  4:20         ` Linus Torvalds
  2024-09-16  8:47           ` Chris Mason
  2024-09-16  7:14         ` Christian Theune
  1 sibling, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-16  4:20 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jens Axboe, Matthew Wilcox, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, clm,
	regressions, regressions

On Mon, 16 Sept 2024 at 02:00, Dave Chinner <david@fromorbit.com> wrote:
>
> I don't think this is a data corruption/loss problem - it certainly
> hasn't ever appeared that way to me.  The "data loss" appeared to be
> in incomplete postgres dump files after the system was rebooted and
> this is exactly what would happen when you randomly crash the
> system.

Ok, that sounds better, indeed.

Of course, "hang due to internal xarray corruption" isn't _much_
better, but still..

> All the hangs seem to be caused by folio lookup getting stuck
> on a rogue xarray entry in truncate or readahead. If we find an
> invalid entry or a folio from a different mapping or with an
> unexpected index, we skip it and try again.

We *could* perhaps change the "retry the optimistic lookup forever" to
be a "retry and take lock after optimistic failure". At least in the
common paths.

That's what we do with some dcache locking, because the "retry on
race" caused some potential latency issues under ridiculous loads.

And if we retry with the lock, at that point we can actually notice
corruption, because at that point we can say "we have the lock, and we
see a bad folio with the wrong mapping pointer, and now it's not some
possible race condition due to RCU".
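
Something with roughly this shape, in completely generic terms (made-up
names, nothing from mm/filemap.c):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_mapping {
	pthread_mutex_t lock;
	void *entry;
};

/* stand-in for "folio->mapping and folio->index look right" */
static bool entry_sane(struct toy_mapping *mapping, void *entry)
{
	(void)mapping;
	return entry != NULL;
}

static void *toy_lookup(struct toy_mapping *mapping)
{
	/* optimistic pass, standing in for the RCU walk */
	void *entry = __atomic_load_n(&mapping->entry, __ATOMIC_ACQUIRE);

	if (entry_sane(mapping, entry))
		return entry;

	/* instead of retrying forever, take the lock and look again: a bad
	 * entry seen under the lock can no longer be blamed on a racing
	 * update, so complain loudly */
	pthread_mutex_lock(&mapping->lock);
	entry = mapping->entry;
	if (!entry_sane(mapping, entry))
		fprintf(stderr, "corrupt entry %p in mapping %p\n",
			entry, (void *)mapping);
	pthread_mutex_unlock(&mapping->lock);
	return entry;
}

int main(void)
{
	struct toy_mapping mapping = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.entry = NULL,		/* forces the locked path for the demo */
	};

	toy_lookup(&mapping);
	return 0;
}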

That, in turn, might then result in better bug reports. Which would at
least be forward progress rather than "we have this bug".

Let me think about it. Unless somebody else gets to it before I do
(hint hint to anybody who is comfy with that filemap_read() path etc).

                 Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-16  0:00       ` Dave Chinner
  2024-09-16  4:20         ` Linus Torvalds
@ 2024-09-16  7:14         ` Christian Theune
  2024-09-16 12:16           ` Matthew Wilcox
  2024-09-18  8:31           ` Christian Theune
  1 sibling, 2 replies; 81+ messages in thread
From: Christian Theune @ 2024-09-16  7:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Jens Axboe, Matthew Wilcox, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, clm, regressions,
	regressions


> On 16. Sep 2024, at 02:00, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Thu, Sep 12, 2024 at 03:25:50PM -0700, Linus Torvalds wrote:
>> On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@kernel.dk> wrote:
>> Honestly, the fact that it hasn't been reverted after apparently
>> people knowing about it for months is a bit shocking to me. Filesystem
>> people tend to take unknown corruption issues as a big deal. What
>> makes this so special? Is it because the XFS people don't consider it
>> an XFS issue, so...
> 
> I don't think this is a data corruption/loss problem - it certainly
> hasn't ever appeared that way to me.  The "data loss" appeared to be
> in incomplete postgres dump files after the system was rebooted and
> this is exactly what would happen when you randomly crash the
> system. i.e. dirty data in memory is lost, and application data
> being written at the time is in an inconsistent state after the
> system recovers. IOWs, there was no clear evidence of actual data
> corruption occurring, and data loss is definitely expected when the
> page cache iteration hangs and the system is forcibly rebooted
> without being able to sync or unmount the filesystems…
> All the hangs seem to be caused by folio lookup getting stuck
> on a rogue xarray entry in truncate or readahead. If we find an
> invalid entry or a folio from a different mapping or with an
> unexpected index, we skip it and try again.  Hence this does not
> appear to be a data corruption vector, either - it results in a
> livelock from endless retry because of the bad entry in the xarray.
> This endless retry livelock appears to be what is being reported.
> 
> IOWs, there is no evidence of real runtime data corruption or loss
> from this pagecache livelock bug.  We also haven't heard of any
> random file data corruption events since we've enabled large folios
> on XFS. Hence there really is no evidence to indicate that there is
> a large folio xarray lookup bug that results in data corruption in
> the existing code, and therefore there is no obvious reason for
> turning off the functionality we are already building significant
> new functionality on top of.

Right, understood. 

However, the timeline of one of the encounters with PostgreSQL (the first comment in Bugzilla) still makes me feel uneasy:
 
T0                   : one postgresql process blocked with a different trace (not involving xas_load)
T+a few minutes      : another process stuck with the relevant xas_load/descend trace
T+a few more minutes : other processes blocked in xas_load (this time the systemd journal)
T+14m                : the journal gets coredumped, likely due to some watchdog 

Things go back to normal.

T+14h                : another postgres process gets fully stuck on the xas_load/descend trace


I agree with your analysis for the case where the process gets stuck in an infinite loop, but I’ve seen at least one instance where it appears to have left the loop at some point, and IMHO that would be a condition that would allow data corruption.

> It's been 10 months since I asked Christain to help isolate a
> reproducer so we can track this down. Nothing came from that, so
> we're still at exactly where we were at back in november 2023 -
> waiting for information on a way to reproduce this issue more
> reliably.

Sorry for dropping the ball from my side as well - I’ve learned my lesson from trying to go through Bugzilla here. ;)

You mentioned above that this might involve read-ahead code and that’s something I noticed before: the machines that carry databases do run with a higher read-ahead setting (1 MiB vs. 128 KiB elsewhere).

Also, I’m still puzzled about the one variation that seems to involve page faults and not XFS. That’s something I haven’t yet seen a response to: whether this IS in fact interesting or not.

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-16  4:20         ` Linus Torvalds
@ 2024-09-16  8:47           ` Chris Mason
  2024-09-17  9:32             ` Matthew Wilcox
  0 siblings, 1 reply; 81+ messages in thread
From: Chris Mason @ 2024-09-16  8:47 UTC (permalink / raw)
  To: Linus Torvalds, Dave Chinner
  Cc: Jens Axboe, Matthew Wilcox, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On 9/16/24 12:20 AM, Linus Torvalds wrote:
> On Mon, 16 Sept 2024 at 02:00, Dave Chinner <david@fromorbit.com> wrote:
>>
>> I don't think this is a data corruption/loss problem - it certainly
>> hasn't ever appeared that way to me.  The "data loss" appeared to be
>> in incomplete postgres dump files after the system was rebooted and
>> this is exactly what would happen when you randomly crash the
>> system.
> 
> Ok, that sounds better, indeed.

I think Dave is right because in practice most filesystems have enough
files of various sizes that we're likely to run into the lockups or BUGs
already mentioned.

But, if the impacted files are relatively small (say 16K), and all
exactly the same size, we could probably share pages between them and
give the wrong data to applications.

It should crash eventually (that's probably the nrpages > 0 assertion
we hit during inode eviction on 6.9), but it seems like there's a window
to return the wrong data.


filemap_fault() has:

        if (unlikely(folio->mapping != mapping)) {

So I think we're probably in better shape on mmap.

> 
> Of course, "hang due to internal xarray corruption" isn't _much_
> better, but still..
> 
>> All the hangs seem to be caused by folio lookup getting stuck
>> on a rogue xarray entry in truncate or readahead. If we find an
>> invalid entry or a folio from a different mapping or with a
>> unexpected index, we skip it and try again.
> 
> We *could* perhaps change the "retry the optimistic lookup forever" to
> be a "retry and take lock after optimistic failure". At least in the
> common paths.
> 
> That's what we do with some dcache locking, because the "retry on
> race" caused some potential latency issues under ridiculous loads.
> 
> And if we retry with the lock, at that point we can actually notice
> corruption, because at that point we can say "we have the lock, and we
> see a bad folio with the wrong mapping pointer, and now it's not some
> possible race condition due to RCU".
> 
> That, in turn, might then result in better bug reports. Which would at
> least be forward progress rather than "we have this bug".
> 
> Let me think about it. Unless somebody else gets to it before I do
> (hint hint to anybody who is comfy with that filemap_read() path etc).

I've got a bunch of assertions around incorrect folio->mapping and I'm
trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
on those, and our systems do run pretty short on ram, so it feels right
at least.  We'll see.

-chris



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-16  7:14         ` Christian Theune
@ 2024-09-16 12:16           ` Matthew Wilcox
  2024-09-18  8:31           ` Christian Theune
  1 sibling, 0 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-16 12:16 UTC (permalink / raw)
  To: Christian Theune
  Cc: Dave Chinner, Linus Torvalds, Jens Axboe, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, clm, regressions,
	regressions

On Mon, Sep 16, 2024 at 09:14:45AM +0200, Christian Theune wrote:
> Also, I’m still puzzled about the one variation that seems to involve page faults and not XFS. That’s something I haven’t seen a response to yet whether this IS in fact interesting or not. 

It's not; once the page cache is corrupted, it doesn't matter whether
we go through the filesystem to get the page or not.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-13 12:11       ` Christian Brauner
@ 2024-09-16 13:29         ` Matthew Wilcox
  2024-09-18  9:51           ` Christian Brauner
  0 siblings, 1 reply; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-16 13:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Pankaj Raghav, Luis Chamberlain, Jens Axboe,
	Christian Theune, linux-mm, linux-xfs, linux-fsdevel,
	linux-kernel, Daniel Dao, Dave Chinner, clm, regressions,
	regressions

On Fri, Sep 13, 2024 at 02:11:22PM +0200, Christian Brauner wrote:
> So this issue it new to me as well. One of the items this cycle is the
> work to enable support for block sizes that are larger than page sizes
> via the large block size (LBS) series that's been sitting in -next for a
> long time. That work specifically targets xfs and builds on top of the
> large folio support.
> 
> If the support for large folios is going to be reverted in xfs then I
> see no point to merge the LBS work now. So I'm holding off on sending
> that pull request until a decision is made (for xfs). As far as I
> understand, supporting larger block sizes will not be meaningful without
> large folio support.

This is unwarranted; please send this pull request.  We're not going to
rip out all of the infrastructure although we might end up disabling it
by default.  There's a bunch of other work queued up behind that, and not
having it in Linus' tree is just going to make everything more painful.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-16  8:47           ` Chris Mason
@ 2024-09-17  9:32             ` Matthew Wilcox
  2024-09-17  9:36               ` Chris Mason
                                 ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-17  9:32 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Jens Axboe, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
> I've got a bunch of assertions around incorrect folio->mapping and I'm
> trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
> on those, and our systems do run pretty short on ram, so it feels right
> at least.  We'll see.

I've been running with some variant of this patch the whole way across
the Atlantic, and not hit any problems.  But maybe with the right
workload ...?

There are two things being tested here.  One is whether we have a
cross-linked node (ie a node that's in two trees at the same time).
The other is whether the slab allocator is giving us a node that already
contains non-NULL entries.

If you could throw this on top of your kernel, we might stand a chance
of catching the problem sooner.  If it is one of these problems and not
something weirder.

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 0b618ec04115..006556605eb3 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1179,6 +1179,8 @@ struct xa_node {
 
 void xa_dump(const struct xarray *);
 void xa_dump_node(const struct xa_node *);
+void xa_dump_index(unsigned long index, unsigned int shift);
+void xa_dump_entry(const void *entry, unsigned long index, unsigned long shift);
 
 #ifdef XA_DEBUG
 #define XA_BUG_ON(xa, x) do {					\
diff --git a/lib/xarray.c b/lib/xarray.c
index 32d4bac8c94c..6bb35bdca30e 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -6,6 +6,8 @@
  * Author: Matthew Wilcox <willy@infradead.org>
  */
 
+#define XA_DEBUG
+
 #include <linux/bitmap.h>
 #include <linux/export.h>
 #include <linux/list.h>
@@ -206,6 +208,7 @@ static __always_inline void *xas_descend(struct xa_state *xas,
 	unsigned int offset = get_offset(xas->xa_index, node);
 	void *entry = xa_entry(xas->xa, node, offset);
 
+	XA_NODE_BUG_ON(node, node->array != xas->xa);
 	xas->xa_node = node;
 	while (xa_is_sibling(entry)) {
 		offset = xa_to_sibling(entry);
@@ -309,6 +312,7 @@ bool xas_nomem(struct xa_state *xas, gfp_t gfp)
 		return false;
 	xas->xa_alloc->parent = NULL;
 	XA_NODE_BUG_ON(xas->xa_alloc, !list_empty(&xas->xa_alloc->private_list));
+	XA_NODE_BUG_ON(xas->xa_alloc, memchr_inv(&xas->xa_alloc->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));
 	xas->xa_node = XAS_RESTART;
 	return true;
 }
@@ -345,6 +349,7 @@ static bool __xas_nomem(struct xa_state *xas, gfp_t gfp)
 		return false;
 	xas->xa_alloc->parent = NULL;
 	XA_NODE_BUG_ON(xas->xa_alloc, !list_empty(&xas->xa_alloc->private_list));
+	XA_NODE_BUG_ON(xas->xa_alloc, memchr_inv(&xas->xa_alloc->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));
 	xas->xa_node = XAS_RESTART;
 	return true;
 }
@@ -388,6 +393,7 @@ static void *xas_alloc(struct xa_state *xas, unsigned int shift)
 	}
 	XA_NODE_BUG_ON(node, shift > BITS_PER_LONG);
 	XA_NODE_BUG_ON(node, !list_empty(&node->private_list));
+	XA_NODE_BUG_ON(node, memchr_inv(&node->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));
 	node->shift = shift;
 	node->count = 0;
 	node->nr_values = 0;


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-17  9:32             ` Matthew Wilcox
@ 2024-09-17  9:36               ` Chris Mason
  2024-09-17 10:11               ` Christian Theune
  2024-09-17 11:13               ` Chris Mason
  2 siblings, 0 replies; 81+ messages in thread
From: Chris Mason @ 2024-09-17  9:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Dave Chinner, Jens Axboe, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On 9/17/24 5:32 AM, Matthew Wilcox wrote:
> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>> trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
>> on those, and our systems do run pretty short on ram, so it feels right
>> at least.  We'll see.
> 
> I've been running with some variant of this patch the whole way across
> the Atlantic, and not hit any problems.  But maybe with the right
> workload ...?
> 
> There are two things being tested here.  One is whether we have a
> cross-linked node (ie a node that's in two trees at the same time).
> The other is whether the slab allocator is giving us a node that already
> contains non-NULL entries.
> 
> If you could throw this on top of your kernel, we might stand a chance
> of catching the problem sooner.  If it is one of these problems and not
> something weirder.
> 

I was able to corrupt the xarray one time, hitting a crash during
unmount.  It wasn't the xfs filesystem I was actually hammering so I
guess that tells us something, but it was after ~3 hours of stress runs,
so not really useful.

I'll try with your patch as well.

-chris


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-17  9:32             ` Matthew Wilcox
  2024-09-17  9:36               ` Chris Mason
@ 2024-09-17 10:11               ` Christian Theune
  2024-09-17 11:13               ` Chris Mason
  2 siblings, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-09-17 10:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Chris Mason, Linus Torvalds, Dave Chinner, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions



> On 17. Sep 2024, at 11:32, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>> trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
>> on those, and our systems do run pretty short on ram, so it feels right
>> at least.  We'll see.
> 
> I've been running with some variant of this patch the whole way across
> the Atlantic, and not hit any problems.  But maybe with the right
> workload ...?

I can start running my non-prod machines that were affected previously. I’d run this on top of 6.11?

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-17  9:32             ` Matthew Wilcox
  2024-09-17  9:36               ` Chris Mason
  2024-09-17 10:11               ` Christian Theune
@ 2024-09-17 11:13               ` Chris Mason
  2024-09-17 13:25                 ` Matthew Wilcox
  2 siblings, 1 reply; 81+ messages in thread
From: Chris Mason @ 2024-09-17 11:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Dave Chinner, Jens Axboe, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

[-- Attachment #1: Type: text/plain, Size: 7813 bytes --]

On 9/17/24 5:32 AM, Matthew Wilcox wrote:
> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>> trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
>> on those, and our systems do run pretty short on ram, so it feels right
>> at least.  We'll see.
> 
> I've been running with some variant of this patch the whole way across
> the Atlantic, and not hit any problems.  But maybe with the right
> workload ...?
> 
> There are two things being tested here.  One is whether we have a
> cross-linked node (ie a node that's in two trees at the same time).
> The other is whether the slab allocator is giving us a node that already
> contains non-NULL entries.
> 
> If you could throw this on top of your kernel, we might stand a chance
> of catching the problem sooner.  If it is one of these problems and not
> something weirder.
> 

This fires in roughly 10 seconds for me on top of v6.11.  Since array seems
to always be 1, I'm not sure if the assertion is right, but hopefully you
can trigger it yourself.

reader.c is attached.  It just has one thread doing large reads and two
threads fadvising things away.  The important part seems to be two threads
in parallel calling fadvise DONTNEED at the same time; just one thread
wasn't enough.

[root@kerneltest003-kvm ~]# cat small.sh
#!/bin/bash

mkfs.xfs -f /dev/vdb
mount /dev/vdb /xfs
fallocate -l10g /xfs/file1
./reader /xfs/file1
[root@kerneltest003-kvm ~]# ./small.sh
meta-data=/dev/vdb               isize=512    agcount=10, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=2684354550, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[  102.013720] XFS (vdb): Mounting V5 Filesystem c3531255-dee1-4b86-8e14-2baa3cc900f8
[  102.029638] XFS (vdb): Ending clean mount
[  104.204205] node ffff888119f86ba8 offset 13 parent ffff888119f84988 shift 6 count 0 values 0 array 0000000000000001 list ffffffff81f93230 0000000000000000 marks 0 0 0
+[  104.206996] ------------[ cut here ]------------
[  104.207948] kernel BUG at lib/xarray.c:211!
[  104.208729] Oops: invalid opcode: 0000 [#1] SMP PTI
[  104.209627] CPU: 51 UID: 0 PID: 862 Comm: reader Not tainted 6.11.0-dirty #24
[  104.211232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  104.213402] RIP: 0010:xas_load+0xe4/0x120
[  104.214144] Code: 00 10 00 00 76 c4 48 83 fa 02 75 ad 41 b8 02 04 00 00 eb a5 40 f6 c6 03 75 12 48 89 f7 e8 44 f5 ff ff 0f 0b 49 83 f8 02 75 10 <0f> 0b 48 c7 c7 76 58 98 82 e8 7e 3b 1a ff eb e8 40 f6 c6 03 75 0a
[  104.217593] RSP: 0018:ffffc90001b57b90 EFLAGS: 00010296
[  104.218729] RAX: 0000000000000000 RBX: ffffc90001b57bc8 RCX: 0000000000000000
[  104.220019] RDX: ffff88b177aee180 RSI: ffff88b177ae0b80 RDI: ffff88b177ae0b80
[  104.221394] RBP: 000000000027ffff R08: ffffffff8396b4a8 R09: 0000000000000003
[  104.222679] R10: ffffffff8326b4c0 R11: ffffffff837eb4c0 R12: ffffc90001b57d48
[  104.223985] R13: ffffc90001b57c48 R14: ffffc90001b57c50 R15: 0000000000000000
[  104.225277] FS:  00007fcee02006c0(0000) GS:ffff88b177ac0000(0000) knlGS:0000000000000000
[  104.226726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.227768] CR2: 00007fcee01fff78 CR3: 000000011bdc2004 CR4: 0000000000770ef0
[  104.229055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  104.230341] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  104.231625] PKRU: 55555554
[  104.232131] Call Trace:
[  104.232586]  <TASK>
[  104.232984]  ? die+0x33/0x90
[  104.233531]  ? do_trap+0xda/0x100
[  104.234206]  ? do_error_trap+0x65/0x80
[  104.234893]  ? xas_load+0xe4/0x120
[  104.235524]  ? exc_invalid_op+0x4e/0x70
[  104.236231]  ? xas_load+0xe4/0x120
[  104.236855]  ? asm_exc_invalid_op+0x16/0x20
[  104.237638]  ? xas_load+0xe4/0x120
[  104.238268]  xas_find+0x18c/0x1f0
[  104.238878]  find_lock_entries+0x6d/0x2f0
[  104.239617]  mapping_try_invalidate+0x5e/0x150
[  104.240432]  ? update_load_avg+0x78/0x750
[  104.241167]  ? psi_group_change+0x122/0x310
[  104.241929]  ? sched_balance_newidle+0x306/0x3b0
[  104.242770]  ? psi_task_switch+0xd6/0x230
[  104.243506]  ? __switch_to_asm+0x2a/0x60
[  104.244224]  ? __schedule+0x316/0xa00
[  104.244896]  ? schedule+0x1c/0xd0
[  104.245530]  ? schedule_preempt_disabled+0xa/0x10
[  104.246386]  ? __mutex_lock.constprop.0+0x2cf/0x5a0
[  104.247274]  ? __lru_add_drain_all+0x150/0x1e0
[  104.248089]  generic_fadvise+0x230/0x280
[  104.248802]  ? __fdget+0x8c/0xe0
[  104.249407]  ksys_fadvise64_64+0x4c/0xa0
[  104.250126]  __x64_sys_fadvise64+0x18/0x20
[  104.250868]  do_syscall_64+0x5b/0x170
[  104.251543]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  104.252463] RIP: 0033:0x7fcee0e5cd6e
[  104.253131] Code: b8 ff ff ff ff eb c3 67 e8 7f cf 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 41 89 ca b8 dd 00 00 00 0f 05 <89> c2 f7 da 3d 00 f0 ff ff b8 00 00 00 00 0f 47 c2 c3 41 57 41 56
[  104.256446] RSP: 002b:00007fcee01ffe88 EFLAGS: 00000202 ORIG_RAX: 00000000000000dd
[  104.257800] RAX: ffffffffffffffda RBX: 00007fcee0200cdc RCX: 00007fcee0e5cd6e
[  104.259085] RDX: 0000000280000000 RSI: 0000000000000000 RDI: 0000000000000003
[  104.260365] RBP: 00007fcee01ffed0 R08: 0000000000000000 R09: 00007fcee02006c0
[  104.261648] R10: 0000000000000004 R11: 0000000000000202 R12: ffffffffffffff88
[  104.262964] R13: 0000000000000000 R14: 00007ffc16078a70 R15: 00007fcedfa00000
[  104.264258]  </TASK>
[  104.264669] Modules linked in: intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel bochs drm_vram_helper drm_kms_helper kvm drm_ttm_helper intel_agp ttm i2c_piix4 intel_gtt agpgart i2c_smbus evdev button serio_raw sch_fq_codel usbip_core drm loop drm_panel_orientation_quirks backlight bpf_preload virtio_rng ip_tables autofs4
[  104.270152] ---[ end trace 0000000000000000 ]---
[  104.271179] RIP: 0010:xas_load+0xe4/0x120
[  104.271968] Code: 00 10 00 00 76 c4 48 83 fa 02 75 ad 41 b8 02 04 00 00 eb a5 40 f6 c6 03 75 12 48 89 f7 e8 44 f5 ff ff 0f 0b 49 83 f8 02 75 10 <0f> 0b 48 c7 c7 76 58 98 82 e8 7e 3b 1a ff eb e8 40 f6 c6 03 75 0a
[  104.275460] RSP: 0018:ffffc90001b57b90 EFLAGS: 00010296
[  104.276481] RAX: 0000000000000000 RBX: ffffc90001b57bc8 RCX: 0000000000000000
[  104.277797] RDX: ffff88b177aee180 RSI: ffff88b177ae0b80 RDI: ffff88b177ae0b80
[  104.279101] RBP: 000000000027ffff R08: ffffffff8396b4a8 R09: 0000000000000003
[  104.280400] R10: ffffffff8326b4c0 R11: ffffffff837eb4c0 R12: ffffc90001b57d48
[  104.281705] R13: ffffc90001b57c48 R14: ffffc90001b57c50 R15: 0000000000000000
[  104.283014] FS:  00007fcee02006c0(0000) GS:ffff88b177ac0000(0000) knlGS:0000000000000000
[  104.284487] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.285539] CR2: 00007fcee01fff78 CR3: 000000011bdc2004 CR4: 0000000000770ef0
[  104.286838] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  104.288139] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  104.289468] PKRU: 55555554
[  104.289983] Kernel panic - not syncing: Fatal exception
[  104.292343] Kernel Offset: disabled
[  104.292990] ---[ end Kernel panic - not syncing: Fatal exception ]---

[-- Attachment #2: reader.c --]
[-- Type: text/plain, Size: 2147 bytes --]

/*
 * gcc -Wall -o reader reader.c -lpthread
 */
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <unistd.h>
#include <errno.h>
#include <err.h>
#include <pthread.h>

struct thread_data {
	int fd;
	size_t size;
};

static void *drop_pages(void *arg)
{
	struct thread_data *td = arg;
	int ret;
	unsigned long nr_pages = td->size / 4096;
	unsigned int seed = 0x55443322;
	off_t offset;
	unsigned long nr_drops = 0;

	while (1) {
		offset = rand_r(&seed) % nr_pages;
		offset = offset * 4096;
		ret = posix_fadvise(td->fd,  offset, 4096, POSIX_FADV_DONTNEED);
		if (ret < 0)
			err(1, "fadvise dontneed");

		/* every once and a while, drop everything */
		if (nr_drops > nr_pages / 2) {
			ret = posix_fadvise(td->fd,  0, td->size, POSIX_FADV_DONTNEED);
			if (ret < 0)
				err(1, "fadvise dontneed");
			fprintf(stderr, "+");
			nr_drops = 0;
		}
		nr_drops++;
	}
	return NULL;
}

#define READ_BUF (2 * 1024 * 1024)
static void *read_pages(void *arg)
{
	struct thread_data *td = arg;
	char buf[READ_BUF];
	ssize_t ret;
	loff_t offset;

	while (1) {
		offset = 0;
		while(offset < td->size) {
			ret = pread(td->fd, buf, READ_BUF, offset);
			if (ret < 0)
				err(1, "read");
			if (ret == 0)
				break;
			offset += ret;
		}
	}
	return NULL;
}

int main(int ac, char **av)
{
	int fd;
	int ret;
	struct stat st;
	struct thread_data td;
	pthread_t drop_tid;
	pthread_t drop2_tid;
	pthread_t read_tid;

	if (ac != 2)
		err(1, "usage: reader filename\n");

	fd = open(av[1], O_RDONLY, 0600);
	if (fd < 0)
		err(1, "unable to open %s", av[1]);

	ret = fstat(fd, &st);
	if (ret < 0)
		err(1, "stat");

	td.fd = fd;
	td.size = st.st_size;

	ret = pthread_create(&drop_tid, NULL, drop_pages, &td);
	if (ret)
		err(1, "pthread_create");
	ret = pthread_create(&drop2_tid, NULL, drop_pages, &td);
	if (ret)
		err(1, "pthread_create");
	ret = pthread_create(&read_tid, NULL, read_pages, &td);
	if (ret)
		err(1, "pthread_create");

	pthread_join(drop_tid, NULL);
	pthread_join(drop2_tid, NULL);
	pthread_join(read_tid, NULL);
}

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-17 11:13               ` Chris Mason
@ 2024-09-17 13:25                 ` Matthew Wilcox
  2024-09-18  6:37                   ` Jens Axboe
  0 siblings, 1 reply; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-17 13:25 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Jens Axboe, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Tue, Sep 17, 2024 at 01:13:05PM +0200, Chris Mason wrote:
> On 9/17/24 5:32 AM, Matthew Wilcox wrote:
> > On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
> >> I've got a bunch of assertions around incorrect folio->mapping and I'm
> >> trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
> >> on those, and our systems do run pretty short on ram, so it feels right
> >> at least.  We'll see.
> > 
> > I've been running with some variant of this patch the whole way across
> > the Atlantic, and not hit any problems.  But maybe with the right
> > workload ...?
> > 
> > There are two things being tested here.  One is whether we have a
> > cross-linked node (ie a node that's in two trees at the same time).
> > The other is whether the slab allocator is giving us a node that already
> > contains non-NULL entries.
> > 
> > If you could throw this on top of your kernel, we might stand a chance
> > of catching the problem sooner.  If it is one of these problems and not
> > something weirder.
> > 
> 
> This fires in roughly 10 seconds for me on top of v6.11.  Since array seems
> to always be 1, I'm not sure if the assertion is right, but hopefully you
> can trigger yourself.

Whoops.

$ git grep XA_RCU_FREE
lib/xarray.c:#define XA_RCU_FREE        ((struct xarray *)1)
lib/xarray.c:   node->array = XA_RCU_FREE;

so you walked into a node which is currently being freed by RCU.  Which
isn't a problem, of course.  I don't know why I do that; it doesn't seem
like anyone tests it.  The jetlag is seriously kicking in right now,
so I'm going to refrain from saying anything more because it probably
won't be coherent.



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-17 13:25                 ` Matthew Wilcox
@ 2024-09-18  6:37                   ` Jens Axboe
  2024-09-18  9:28                     ` Chris Mason
  0 siblings, 1 reply; 81+ messages in thread
From: Jens Axboe @ 2024-09-18  6:37 UTC (permalink / raw)
  To: Matthew Wilcox, Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On 9/17/24 7:25 AM, Matthew Wilcox wrote:
> On Tue, Sep 17, 2024 at 01:13:05PM +0200, Chris Mason wrote:
>> On 9/17/24 5:32 AM, Matthew Wilcox wrote:
>>> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>>>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>>>> trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
>>>> on those, and our systems do run pretty short on ram, so it feels right
>>>> at least.  We'll see.
>>>
>>> I've been running with some variant of this patch the whole way across
>>> the Atlantic, and not hit any problems.  But maybe with the right
>>> workload ...?
>>>
>>> There are two things being tested here.  One is whether we have a
>>> cross-linked node (ie a node that's in two trees at the same time).
>>> The other is whether the slab allocator is giving us a node that already
>>> contains non-NULL entries.
>>>
>>> If you could throw this on top of your kernel, we might stand a chance
>>> of catching the problem sooner.  If it is one of these problems and not
>>> something weirder.
>>>
>>
>> This fires in roughly 10 seconds for me on top of v6.11.  Since array seems
>> to always be 1, I'm not sure if the assertion is right, but hopefully you
>> can trigger yourself.
> 
> Whoops.
> 
> $ git grep XA_RCU_FREE
> lib/xarray.c:#define XA_RCU_FREE        ((struct xarray *)1)
> lib/xarray.c:   node->array = XA_RCU_FREE;
> 
> so you walked into a node which is currently being freed by RCU.  Which
> isn't a problem, of course.  I don't know why I do that; it doesn't seem
> like anyone tests it.  The jetlag is seriously kicking in right now,
> so I'm going to refrain from saying anything more because it probably
> won't be coherent.

Based on a modified reproducer from Chris (N threads reading from a
file, M threads dropping pages), I can pretty quickly reproduce the
xas_descend() spin on 6.9 in a VM with 128 CPUs. Here's some debugging
output with a modified version of your patch too, that ignores
XA_RCU_FREE:

node ffff8e838a01f788 max 59 parent 0000000000000000 shift 0 count 0 values 0 array ffff8e839dfa86a0 list ffff8e838a01f7a0 ffff8e838a01f7a0 marks 0 0 0
WARNING: CPU: 106 PID: 1554 at lib/xarray.c:405 xas_alloc.cold+0x26/0x4b

which is:

XA_NODE_BUG_ON(node, memchr_inv(&node->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));

and:

node ffff8e838a01f788 offset 59 parent ffff8e838b0419c8 shift 0 count 252 values 0 array ffff8e839dfa86a0 list ffff8e838a01f7a0 ffff8e838a01f7a0 marks 0 0 0

which is:

XA_NODE_BUG_ON(node, node->count > XA_CHUNK_SIZE);

and for this particular run, 2 threads spinning:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-1 rcu_node (CPUs 16-31): P1555
rcu: 	Tasks blocked on level-1 rcu_node (CPUs 64-79): P1556
rcu: 	(detected by 97, t=2102 jiffies, g=7821, q=293800 ncpus=128)
task:reader          state:R  running task     stack:0     pid:1555  tgid:1551  ppid:1      flags:0x00004006
Call Trace:
 <TASK>
 ? __schedule+0x37f/0xaa0
 ? sysvec_apic_timer_interrupt+0x96/0xb0
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? xas_load+0x74/0xe0
 ? xas_load+0x10/0xe0
 ? xas_find+0x162/0x1b0
 ? find_lock_entries+0x1ac/0x360
 ? find_lock_entries+0x76/0x360
 ? mapping_try_invalidate+0x5d/0x130
 ? generic_fadvise+0x110/0x240
 ? xfd_validate_state+0x1e/0x70
 ? ksys_fadvise64_64+0x50/0x90
 ? __x64_sys_fadvise64+0x18/0x20
 ? do_syscall_64+0x5d/0x180
 ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
 </TASK>
task:reader          state:R  running task     stack:0     pid:1556  tgid:1551  ppid:1      flags:0x00004006

The reproducer takes ~30 seconds, and will lead to anywhere from 1..N
threads spinning here.
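
For reference, the modification is roughly this on top of Chris's reader.c
(replacing the thread setup at the end of main(); the thread counts are just
knobs, this is not the exact program I ran):

#define NR_READERS	16
#define NR_DROPPERS	16

	pthread_t readers[NR_READERS], droppers[NR_DROPPERS];
	int i;

	for (i = 0; i < NR_DROPPERS; i++)
		if (pthread_create(&droppers[i], NULL, drop_pages, &td))
			err(1, "pthread_create");
	for (i = 0; i < NR_READERS; i++)
		if (pthread_create(&readers[i], NULL, read_pages, &td))
			err(1, "pthread_create");

	for (i = 0; i < NR_DROPPERS; i++)
		pthread_join(droppers[i], NULL);
	for (i = 0; i < NR_READERS; i++)
		pthread_join(readers[i], NULL);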

Now for the kicker - this doesn't reproduce in 6.10 and onwards. There
are only a few changes here that are relevant, seemingly, and the prime
candidates are:

commit a4864671ca0bf51c8e78242951741df52c06766f
Author: Kairui Song <kasong@tencent.com>
Date:   Tue Apr 16 01:18:55 2024 +0800

    lib/xarray: introduce a new helper xas_get_order

and the followup filemap change:

commit 6758c1128ceb45d1a35298912b974eb4895b7dd9
Author: Kairui Song <kasong@tencent.com>
Date:   Tue Apr 16 01:18:56 2024 +0800

    mm/filemap: optimize filemap folio adding

and reverting those two on 6.10 hits it again almost immediately. I didn't
look into these commits in any depth, but it looks like they inadvertently
also fixed this corruption issue.
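
(For reference, the revert on top of a 6.10 tree is roughly

  git revert 6758c1128ceb a4864671ca0b

in that order, since the filemap change depends on the xarray helper;
treat that as a sketch rather than an exact recipe.)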

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-16  7:14         ` Christian Theune
  2024-09-16 12:16           ` Matthew Wilcox
@ 2024-09-18  8:31           ` Christian Theune
  1 sibling, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-09-18  8:31 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Jens Axboe, Matthew Wilcox, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, clm, regressions,
	regressions



> On 16. Sep 2024, at 09:14, Christian Theune <ct@flyingcircus.io> wrote:
> 
>> 
>> On 16. Sep 2024, at 02:00, Dave Chinner <david@fromorbit.com> wrote:
>> 
>> I don't think this is a data corruption/loss problem - it certainly
>> hasn't ever appeared that way to me.  The "data loss" appeared to be
>> in incomplete postgres dump files after the system was rebooted and
>> this is exactly what would happen when you randomly crash the
>> system. i.e. dirty data in memory is lost, and application data
>> being written at the time is in an inconsistent state after the
>> system recovers. IOWs, there was no clear evidence of actual data
>> corruption occurring, and data loss is definitely expected when the
>> page cache iteration hangs and the system is forcibly rebooted
>> without being able to sync or unmount the filesystems…
>> All the hangs seem to be caused by folio lookup getting stuck
>> on a rogue xarray entry in truncate or readahead. If we find an
>> invalid entry or a folio from a different mapping or with an
>> unexpected index, we skip it and try again.  Hence this does not
>> appear to be a data corruption vector, either - it results in a
>> livelock from endless retry because of the bad entry in the xarray.
>> This endless retry livelock appears to be what is being reported.
>> 
>> IOWs, there is no evidence of real runtime data corruption or loss
>> from this pagecache livelock bug.  We also haven't heard of any
>> random file data corruption events since we've enabled large folios
>> on XFS. Hence there really is no evidence to indicate that there is
>> a large folio xarray lookup bug that results in data corruption in
>> the existing code, and therefore there is no obvious reason for
>> turning off the functionality we are already building significant
>> new functionality on top of.

I’ve been chewing more on this and reviewed the tickets I have. We did see a PostgreSQL database end up reporting "ERROR: invalid page in block 30896 of relation base/16389/103292".

My understanding of the argument that this bug does not corrupt data is that the error would only lead to a crash-consistent state. So applications that can properly recover from a crash-consistent state would only experience data loss to the point of the crash (which is fine and expected) but should not end up in a further corrupted state.

PostgreSQL reporting this error indicates - to my knowledge - that it did not see a crash-consistent state of the file system.

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18  6:37                   ` Jens Axboe
@ 2024-09-18  9:28                     ` Chris Mason
  2024-09-18 12:23                       ` Chris Mason
  2024-09-18 13:34                       ` Matthew Wilcox
  0 siblings, 2 replies; 81+ messages in thread
From: Chris Mason @ 2024-09-18  9:28 UTC (permalink / raw)
  To: Jens Axboe, Matthew Wilcox
  Cc: Linus Torvalds, Dave Chinner, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

[-- Attachment #1: Type: text/plain, Size: 5699 bytes --]

One or more of the originally attached files triggered the rule module.access.rule.exestrip_notify

The following attachments were deleted from the original message.
radixcheck.py

Original Message:

On 9/18/24 2:37 AM, Jens Axboe wrote:
> On 9/17/24 7:25 AM, Matthew Wilcox wrote:
>> On Tue, Sep 17, 2024 at 01:13:05PM +0200, Chris Mason wrote:
>>> On 9/17/24 5:32 AM, Matthew Wilcox wrote:
>>>> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>>>>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>>>>> trying to bash on the ENOMEM for readahead case.  There's a GFP_NOWARN
>>>>> on those, and our systems do run pretty short on ram, so it feels right
>>>>> at least.  We'll see.
>>>>
>>>> I've been running with some variant of this patch the whole way across
>>>> the Atlantic, and not hit any problems.  But maybe with the right
>>>> workload ...?
>>>>
>>>> There are two things being tested here.  One is whether we have a
>>>> cross-linked node (ie a node that's in two trees at the same time).
>>>> The other is whether the slab allocator is giving us a node that already
>>>> contains non-NULL entries.
>>>>
>>>> If you could throw this on top of your kernel, we might stand a chance
>>>> of catching the problem sooner.  If it is one of these problems and not
>>>> something weirder.
>>>>
>>>
>>> This fires in roughly 10 seconds for me on top of v6.11.  Since array seems
>>> to always be 1, I'm not sure if the assertion is right, but hopefully you
>>> can trigger it yourself.
>>
>> Whoops.
>>
>> $ git grep XA_RCU_FREE
>> lib/xarray.c:#define XA_RCU_FREE        ((struct xarray *)1)
>> lib/xarray.c:   node->array = XA_RCU_FREE;
>>
>> so you walked into a node which is currently being freed by RCU.  Which
>> isn't a problem, of course.  I don't know why I do that; it doesn't seem
>> like anyone tests it.  The jetlag is seriously kicking in right now,
>> so I'm going to refrain from saying anything more because it probably
>> won't be coherent.
> 
> Based on a modified reproducer from Chris (N threads reading from a
> file, M threads dropping pages), I can pretty quickly reproduce the
> xas_descend() spin on 6.9 in a vm with 128 cpus. Here's some debugging
> output with a modified version of your patch too, that ignores
> XA_RCU_FREE:

Jens and I are running slightly different versions of reader.c, but we're
seeing the same thing.  v6.11 lasts all night long, and reverting those
two commits falls over in about 5 minutes or less.

I switched from a VM to bare metal, and managed to hit an assertion I'd
added to filemap_get_read_batch() (should look familiar):

{
	struct address_space *fmapping = READ_ONCE(folio->mapping);
	BUG_ON(fmapping && fmapping != mapping);
}

Walking the xarray in the crashdump shows that it's probably the same
corruption I saw in 5.19.  drgn is printing like so:

print("0x%x mapping 0x%x radix index %d page index %d flags 0x%x (%s) size %d" % (page.address_of_(), page.mapping.value_(), index, page.index, page.flags, decode_page_flags(page), folio._folio_nr_pages))

And I attached radixcheck.py if you want to see the full script.

These are all from the correct mapping:
0xffffea0088b17200 mapping 0xffff88a22a9614e8 radix index 53 page index 53 flags 0x15ffff000000000c (PG_referenced|PG_uptodate|PG_reported) size 59472
0xffffea008773e940 mapping 0xffff88a22a9614e8 radix index 54 page index 54 flags 0x15ffff000000000c (PG_referenced|PG_uptodate|PG_reported) size 4244589144
0xffffea0084ad1d00 mapping 0xffff88a22a9614e8 radix index 55 page index 55 flags 0x15ffff000000000c (PG_referenced|PG_uptodate|PG_reported) size 4040059330
0xffffea0088c9d840 mapping 0xffff88a22a9614e8 radix index 56 page index 56 flags 0x15ffff000000000c (PG_referenced|PG_uptodate|PG_reported) size 5958
0xffffea00879c6300 mapping 0xffff88a22a9614e8 radix index 57 page index 57 flags 0x15ffff000000000c (PG_referenced|PG_uptodate|PG_reported) size 112
0xffffea0086630980 mapping 0xffff88a22a9614e8 radix index 58 page index 58 flags 0x15ffff000000000c (PG_referenced|PG_uptodate|PG_reported) size 4025236287
0xffffea0008eb6580 mapping 0xffff88a22a9614e8 radix index 59 page index 59 flags 0x5ffff000000012c (PG_referenced|PG_uptodate|PG_lru|PG_active|PG_reported) size 269
0xffffea00072db000 mapping 0xffff88a22a9614e8 radix index 60 page index 60 flags 0x5ffff000000416c (PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|PG_private|PG_reported) size 4
0xffffea000919b600 mapping 0xffff88a22a9614e8 radix index 64 page index 64 flags 0x5ffff000000416c (PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|PG_private|PG_reported) size 4

These last 3 are not:
0xffffea0008fa7000 mapping 0xffff888124910768 radix index 208 page index 192 flags 0x5ffff000000416c (PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|PG_private|PG_reported) size 64
0xffffea0008fa7000 mapping 0xffff888124910768 radix index 224 page index 192 flags 0x5ffff000000416c (PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|PG_private|PG_reported) size 64
0xffffea0008fa7000 mapping 0xffff888124910768 radix index 240 page index 192 flags 0x5ffff000000416c (PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|PG_private|PG_reported) size 64

I think the bug was in __filemap_add_folio()'s usage of xas_split_alloc()
and the tree changing before taking the lock.  It's just a guess, but that
was always my biggest suspect.
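
For readers following along, the suspect sequence in the pre-6.10
__filemap_add_folio() looks roughly like this (condensed; the full context
is in the patches quoted later in the thread):

	do {
		/* walk the tree and pre-allocate split nodes without the lock */
		unsigned int order = xa_get_order(xas.xa, xas.xa_index);

		if (order > folio_order(folio))
			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
					order, gfp);
		/*
		 * nothing stops the tree from changing between here and the
		 * lock below, which is the window I'm guessing at
		 */
		xas_lock_irq(&xas);
		xas_for_each_conflict(&xas, entry) {
			...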

To reproduce, I used:

mkfs.xfs -f <some device>
mount <some device> /xfs
for x in `seq 1 8` ; do
	fallocate -l100m /xfs/file$x
	./reader /xfs/file$x &
done

New reader.c is attached.  Jens changed his so that every reader thread
uses its own offset in the file, and he found that it reproduced more
consistently; a sketch of that variant follows the attached reader.c below.

-chris

[-- Attachment #2: reader.c --]
[-- Type: text/plain, Size: 1808 bytes --]

/*
 * gcc -Wall -o reader reader.c -lpthread
 */
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <unistd.h>
#include <errno.h>
#include <err.h>
#include <pthread.h>

struct thread_data {
	int fd;
	int read_size;
	size_t size;
};

static void *drop_pages(void *arg)
{
	struct thread_data *td = arg;
	int ret;

	while (1) {
		ret = posix_fadvise(td->fd,  0, td->size, POSIX_FADV_DONTNEED);
		if (ret < 0)
			err(1, "fadvise dontneed");
	}
	return NULL;
}

#define READ_BUF (2 * 1024 * 1024)
static void *read_pages(void *arg)
{
	struct thread_data *td = arg;
	char buf[READ_BUF];
	ssize_t ret;
	loff_t offset = 8192;

	while (1) {
		ret = pread(td->fd, buf, td->read_size, offset);
		if (ret < 0)
			err(1, "read");
		if (ret == 0)
			break;
	}
	return NULL;
}

int main(int ac, char **av)
{
	int fd;
	int ret;
	struct stat st;
	/* slots 0 and 1 become the page-dropping threads; the rest are read sizes */
	int sizes[9] = { 0, 0, 8192, 16834, 32768, 65536, 128 * 1024, 256 * 1024, 1024 * 1024 };
	int nr_tids = 9;
	struct thread_data tds[9];
	int i;
	int sleeps = 0;
	pthread_t tids[nr_tids];

	if (ac != 2)
		err(1, "usage: reader filename\n");

	fd = open(av[1], O_RDONLY, 0600);
	if (fd < 0)
		err(1, "unable to open %s", av[1]);

	ret = fstat(fd, &st);
	if (ret < 0)
		err(1, "stat");


	for (i = 0; i < nr_tids; i++) {
		struct thread_data *td = tds + i;

		td->fd = fd;
		td->size = st.st_size;
		td->read_size = sizes[i];

		if (i < 2)
			ret = pthread_create(tids + i, NULL, drop_pages, td);
		else
			ret = pthread_create(tids + i, NULL, read_pages, td);
		if (ret)
			err(1, "pthread_create");
	}
	for (i = 0; i < nr_tids; i++) {
		pthread_detach(tids[i]);
	}
	while(1) {
		sleep(122);
		sleeps++;
		fprintf(stderr, ":%d:", sleeps * 122);

	}
}
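
Since Jens's exact variant isn't attached, here is a rough sketch of the
per-thread-offset idea (hypothetical, not his actual code): derive each
reader's starting offset from its own read size instead of the shared 8192.

/* hypothetical per-thread-offset variant of read_pages() */
static void *read_pages_own_offset(void *arg)
{
	struct thread_data *td = arg;
	char buf[READ_BUF];
	ssize_t ret;
	/* each reader thread has a distinct read_size, so reuse it as the offset */
	loff_t offset = td->read_size;

	while (1) {
		ret = pread(td->fd, buf, td->read_size, offset);
		if (ret < 0)
			err(1, "read");
		if (ret == 0)
			break;
	}
	return NULL;
}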

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-16 13:29         ` Matthew Wilcox
@ 2024-09-18  9:51           ` Christian Brauner
  0 siblings, 0 replies; 81+ messages in thread
From: Christian Brauner @ 2024-09-18  9:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Pankaj Raghav, Luis Chamberlain, Jens Axboe,
	Christian Theune, linux-mm, linux-xfs, linux-fsdevel,
	linux-kernel, Daniel Dao, Dave Chinner, clm, regressions,
	regressions

On Mon, Sep 16, 2024 at 02:29:49PM GMT, Matthew Wilcox wrote:
> On Fri, Sep 13, 2024 at 02:11:22PM +0200, Christian Brauner wrote:
> > So this issue is new to me as well. One of the items this cycle is the
> > work to enable support for block sizes that are larger than page sizes
> > via the large block size (LBS) series that's been sitting in -next for a
> > long time. That work specifically targets xfs and builds on top of the
> > large folio support.
> > 
> > If the support for large folios is going to be reverted in xfs then I
> > see no point to merge the LBS work now. So I'm holding off on sending
> > that pull request until a decision is made (for xfs). As far as I
> > understand, supporting larger block sizes will not be meaningful without
> > large folio support.
> 
> This is unwarranted; please send this pull request.  We're not going to
> rip out all of the infrastructure although we might end up disabling it
> by default.  There's a bunch of other work queued up behind that, and not
> having it in Linus' tree is just going to make everything more painful.

Now that there's a reproducer and hopefully soon a fix, I think we can
try and merge this next week.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18  9:28                     ` Chris Mason
@ 2024-09-18 12:23                       ` Chris Mason
  2024-09-18 13:34                       ` Matthew Wilcox
  1 sibling, 0 replies; 81+ messages in thread
From: Chris Mason @ 2024-09-18 12:23 UTC (permalink / raw)
  To: Jens Axboe, Matthew Wilcox
  Cc: Linus Torvalds, Dave Chinner, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On 9/18/24 5:28 AM, Chris Mason wrote:
> And I attached radixcheck.py if you want to see the full script.

Since the attachment didn't actually make it through:

#!/usr/bin/env -S drgn -c vmcore

from drgn.helpers.linux.fs import *
from drgn.helpers.linux.mm import *
from drgn.helpers.linux.list import *
from drgn.helpers.linux.xarray import *
from drgn import *
import os
import sys
import time

mapping = Object(prog, 'struct address_space', address=0xffff88a22a9614e8)
#p = path_lookup(prog, sys.argv[1]);
#mapping = p.dentry.d_inode.i_mapping

for index, x in xa_for_each(mapping.i_pages.address_of_()):
    if xa_is_zero(x):
        continue
    if xa_is_value(x):
        continue

    page = Object(prog, 'struct page', address=x)
    folio = Object(prog, 'struct folio', address=x)

    print("0x%x mapping 0x%x radix index %d page index %d flags 0x%x (%s) size %d" % (page.address_of_(), page.mapping.value_(), index, page.index, page.flags, decode_page_flags(page), folio._folio_nr_pages))



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18  9:28                     ` Chris Mason
  2024-09-18 12:23                       ` Chris Mason
@ 2024-09-18 13:34                       ` Matthew Wilcox
  2024-09-18 13:51                         ` Linus Torvalds
  2024-09-19  1:43                         ` Dave Chinner
  1 sibling, 2 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-18 13:34 UTC (permalink / raw)
  To: Chris Mason
  Cc: Jens Axboe, Linus Torvalds, Dave Chinner, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Wed, Sep 18, 2024 at 11:28:52AM +0200, Chris Mason wrote:
> I think the bug was in __filemap_add_folio()'s usage of xas_split_alloc()
> and the tree changing before taking the lock.  It's just a guess, but that
> was always my biggest suspect.

Oh god, that's it.

there should have been an xas_reset() after calling xas_split_alloc().

and 6758c1128ceb calls xas_reset() after calling xas_split_alloc().

i wonder if xas_split_alloc() should call xas_reset() to prevent this
from ever being a problem again?


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18 13:34                       ` Matthew Wilcox
@ 2024-09-18 13:51                         ` Linus Torvalds
  2024-09-18 14:12                           ` Matthew Wilcox
  2024-09-19  1:43                         ` Dave Chinner
  1 sibling, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-18 13:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Chris Mason, Jens Axboe, Dave Chinner, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Wed, 18 Sept 2024 at 15:35, Matthew Wilcox <willy@infradead.org> wrote:
>
> Oh god, that's it.
>
> there should have been an xas_reset() after calling xas_split_alloc().

I think it is worse than that.

Even *without* an xas_split_alloc(), I think the old code was wrong,
because it drops the xas lock without doing the xas_reset.

> i wonder if xas_split_alloc() should call xas_reset() to prevent this
> from ever being a problem again?

See above: I think the code in filemap_add_folio() was buggy entirely
unrelated to the xas_split_alloc(), although it is probably *much*
easier to trigger issues with it (ie the alloc will just make any
races much bigger)

But even when it doesn't do the alloc, it takes and drops the lock,
and it's unclear how much xas state it just randomly re-uses over the
lock drop.

(Maybe none of the other operations end up mattering, but it does look
very wrong).

So I think it might be better to do the xas_reset() when you do the
xas_lock_irq(), no? Isn't _that_ a more logical point where "any
old state is unreliable, now we need to reset the walk"?

                   Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18 13:51                         ` Linus Torvalds
@ 2024-09-18 14:12                           ` Matthew Wilcox
  2024-09-18 14:39                             ` Linus Torvalds
  2024-09-18 16:37                             ` Chris Mason
  0 siblings, 2 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-18 14:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Jens Axboe, Dave Chinner, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Wed, Sep 18, 2024 at 03:51:39PM +0200, Linus Torvalds wrote:
> On Wed, 18 Sept 2024 at 15:35, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > Oh god, that's it.
> >
> > there should have been an xas_reset() after calling xas_split_alloc().
> 
> I think it is worse than that.
> 
> Even *without* an xas_split_alloc(), I think the old code was wrong,
> because it drops the xas lock without doing the xas_reset.

That's actually OK.  The first time around the loop, we haven't walked the
tree, so we start from the top as you'd expect.  The only other reason to
go around the loop again is that memory allocation failed for a node, and
in that case we call xas_nomem() and that (effectively) calls xas_reset().

So in terms of the expected API for xa_state users, it would be consistent
for xas_split_alloc() to call xas_reset().

You might argue that this API is too subtle, but it was intended to
be easy to use.  The problem was that xas_split_alloc() got added much
later and I forgot to maintain the invariant that makes it work as well
as be easy to use.
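
To spell the pattern out (a minimal sketch of the intended usage, not the
actual filemap code): the only way back around the loop is xas_nomem(),
which resets the walk, so no stale state survives the lock being dropped.

	XA_STATE(xas, &mapping->i_pages, index);

	do {
		xas_lock_irq(&xas);
		xas_store(&xas, folio);	/* may record -ENOMEM in the xa_state */
		xas_unlock_irq(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	if (xas_error(&xas))
		return xas_error(&xas);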


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18 14:12                           ` Matthew Wilcox
@ 2024-09-18 14:39                             ` Linus Torvalds
  2024-09-18 17:12                               ` Matthew Wilcox
  2024-09-18 16:37                             ` Chris Mason
  1 sibling, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-18 14:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Chris Mason, Jens Axboe, Dave Chinner, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Wed, 18 Sept 2024 at 16:12, Matthew Wilcox <willy@infradead.org> wrote:
>
>
> That's actually OK.  The first time around the loop, we haven't walked the
> tree, so we start from the top as you'd expect.  The only other reason to
> go around the loop again is that memory allocation failed for a node, and
> in that case we call xas_nomem() and that (effectively) calls xas_reset().

Well, that's quite subtle and undocumented. But yes, I see the
(open-coded) xas_reset() in xas_nomem().

So yes, in practice it seems to be only the xas_split_alloc() path in
there that can have this problem, but maybe this should at the very
least be clearly documented.

The fact that this bug was fixed basically entirely by mistake does
say "this is much too subtle".

Of course, the fact that an xas_reset() not only resets the walk, but
also clears any pending errors (because it's all the same "xa_node"
thing), doesn't make things more obvious. Because right now you
*could* treat errors as "cumulative", but if a xas_split_alloc() does
an xas_reset() on success, that means that it's actually a big
conceptual change and you can't do the "cumulative" thing any more.

End result: it would probably make sense to change "xas_split_alloc()"
to explicitly *not* have that "check xas_error() afterwards as if it
could be cumulative", and instead make it very clearly have no history
and change the semantics to

 (a) return the error - instead of having people have to check for
errors separately afterwards

 (b) do the xas_reset() in the success path

so that it explicitly does *not* work for accumulating previous errors
(which presumably was never really the intent of the interface, but
people certainly _could_ use it that way).
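
IOW, something like this sketch (only illustrating the proposed semantics;
xas_split_alloc_nodes() is a made-up name for the existing allocation step):

	int xas_split_alloc(struct xa_state *xas, void *entry,
			unsigned int order, gfp_t gfp)
	{
		int err = xas_split_alloc_nodes(xas, entry, order, gfp);

		if (err)
			return err;	/* (a) report the failure directly */

		xas_reset(xas);		/* (b) success leaves no history behind */
		return 0;
	}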

             Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18 14:12                           ` Matthew Wilcox
  2024-09-18 14:39                             ` Linus Torvalds
@ 2024-09-18 16:37                             ` Chris Mason
  1 sibling, 0 replies; 81+ messages in thread
From: Chris Mason @ 2024-09-18 16:37 UTC (permalink / raw)
  To: Matthew Wilcox, Linus Torvalds
  Cc: Jens Axboe, Dave Chinner, Christian Theune, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On 9/18/24 10:12 AM, Matthew Wilcox wrote:
> On Wed, Sep 18, 2024 at 03:51:39PM +0200, Linus Torvalds wrote:
>> On Wed, 18 Sept 2024 at 15:35, Matthew Wilcox <willy@infradead.org> wrote:
>>>
>>> Oh god, that's it.
>>>
>>> there should have been an xas_reset() after calling xas_split_alloc().
>>
>> I think it is worse than that.
>>
>> Even *without* an xas_split_alloc(), I think the old code was wrong,
>> because it drops the xas lock without doing the xas_reset.
> 
> That's actually OK.  The first time around the loop, we haven't walked the
> tree, so we start from the top as you'd expect.  The only other reason to
> go around the loop again is that memory allocation failed for a node, and
> in that case we call xas_nomem() and that (effectively) calls xas_reset().
> 
> So in terms of the expected API for xa_state users, it would be consistent
> for xas_split_alloc() to call xas_reset().
> 
> You might argue that this API is too subtle, but it was intended to
> be easy to use.  The problem was that xas_split_alloc() got added much
> later and I forgot to maintain the invariant that makes it work as well
> as be easy to use.
> 

Ok, missing xas_reset() makes a ton of sense as the root cause, and it
also explains why tmpfs hasn't seen the problem.

We'll start validating 6.11 and make noise if the large folios cause
problems again.  Thanks everyone!

-chris



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18 14:39                             ` Linus Torvalds
@ 2024-09-18 17:12                               ` Matthew Wilcox
  0 siblings, 0 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-18 17:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Jens Axboe, Dave Chinner, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Wed, Sep 18, 2024 at 04:39:56PM +0200, Linus Torvalds wrote:
> The fact that this bug was fixed basically entirely by mistake does
> say "this is much too subtle".

Yup.

> Of course, the fact that an xas_reset() not only resets the walk, but
> also clears any pending errors (because it's all the same "xa_node"
> thing), doesn't make things more obvious. Because right now you
> *could* treat errors as "cumulative", but if a xas_split_alloc() does
> an xas_reset() on success, that means that it's actually a big
> conceptual change and you can't do the "cumulative" thing any more.

So ... the way xas was intended to work is that the first thing we did
that set an error meant that everything after it was a no-op.  You
can see that in functions like xas_start() which do:

        if (xas_error(xas))
                return NULL;

Obviously something like xas_unlock() isn't a no-op because you still
want to unlock even if you had an error.

The xas_split_alloc() was done in too much of a hurry.  I had thought
that I wouldn't need it, and then found out that it was a prerequisite
for something I needed to do, and so I wasn't in the right frame of mind
when I wrote it.

It's actually a giant pain and I wanted to redo it even before this, as
well as clear up some pieces from xas_nomem() / __xas_nomem().  The
restriction on "we can only split to one additional level" is awful,
and has caused some contortions elsewhere.

> End result: it would probably make sense to change "xas_split_alloc()"
> to explicitly *not* have that "check xas_error() afterwards as if it
> could be cumulative", and instead make it very clearly have no history
> and change the semantics to

What it really should do is just return if it's already in an error state.
That makes it consistent with the rest of the API, and we don't have to
worry about it losing an already-found error.

But also all the other infelicities with it need to be fixed.
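
Concretely, something like this at the top of the function (sketch only):

	void xas_split_alloc(struct xa_state *xas, void *entry,
			unsigned int order, gfp_t gfp)
	{
		if (xas_error(xas))
			return;	/* no-op in an error state, like the rest of the API */

		/* ... existing allocation logic unchanged ... */
	}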


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-18 13:34                       ` Matthew Wilcox
  2024-09-18 13:51                         ` Linus Torvalds
@ 2024-09-19  1:43                         ` Dave Chinner
  2024-09-19  3:03                           ` Linus Torvalds
  1 sibling, 1 reply; 81+ messages in thread
From: Dave Chinner @ 2024-09-19  1:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Chris Mason, Jens Axboe, Linus Torvalds, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Wed, Sep 18, 2024 at 02:34:57PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 18, 2024 at 11:28:52AM +0200, Chris Mason wrote:
> > I think the bug was in __filemap_add_folio()'s usage of xas_split_alloc()
> > and the tree changing before taking the lock.  It's just a guess, but that
> > was always my biggest suspect.
> 
> Oh god, that's it.
> 
> there should have been an xas_reset() after calling xas_split_alloc().
> 
> and 6758c1128ceb calls xas_reset() after calling xas_split_alloc().

Should we be asking for 6758c1128ceb to be backported to all
stable kernels then?

-Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  1:43                         ` Dave Chinner
@ 2024-09-19  3:03                           ` Linus Torvalds
  2024-09-19  3:12                             ` Linus Torvalds
  0 siblings, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-19  3:03 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, Chris Mason, Jens Axboe, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Thu, 19 Sept 2024 at 03:43, Dave Chinner <david@fromorbit.com> wrote:
>
> Should we be asking for 6758c1128ceb to be backported to all
> stable kernels then?

I think we should just do the simple one-liner of adding a
"xas_reset()" to after doing xas_split_alloc() (or do it inside the
xas_split_alloc()).

That said, I do also think it would be really good if the 'xa_lock*()'
family of functions also had something like a

        WARN_ON_ONCE(xas->xa_node && !xa_err(xas->xa_node));

which I think would have caught this. Because right now nothing at all
checks "we dropped the xa lock, and held xas state over it".

               Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  3:03                           ` Linus Torvalds
@ 2024-09-19  3:12                             ` Linus Torvalds
  2024-09-19  3:38                               ` Jens Axboe
  2024-09-19  6:34                               ` Christian Theune
  0 siblings, 2 replies; 81+ messages in thread
From: Linus Torvalds @ 2024-09-19  3:12 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, Chris Mason, Jens Axboe, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think we should just do the simple one-liner of adding a
> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
> xas_split_alloc()).

.. and obviously that should be actually *verified* to fix the issue
not just with the test-case that Chris and Jens have been using, but
on Christian's real PostgreSQL load.

Christian?

Note that the xas_reset() needs to be done after the check for errors
- or like Willy suggested, xas_split_alloc() needs to be re-organized.

So the simplest fix is probably to just add a

                        if (xas_error(&xas))
                                goto error;
                }
+               xas_reset(&xas);
                xas_lock_irq(&xas);
                xas_for_each_conflict(&xas, entry) {
                        old = entry;

in __filemap_add_folio() in mm/filemap.c

(The above is obviously a whitespace-damaged pseudo-patch for the
pre-6758c1128ceb state. I don't actually carry a stable tree around on
my laptop, but I hope it's clear enough what I'm rambling about)

               Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  3:12                             ` Linus Torvalds
@ 2024-09-19  3:38                               ` Jens Axboe
  2024-09-19  4:32                                 ` Linus Torvalds
  2024-09-19  4:36                                 ` Matthew Wilcox
  2024-09-19  6:34                               ` Christian Theune
  1 sibling, 2 replies; 81+ messages in thread
From: Jens Axboe @ 2024-09-19  3:38 UTC (permalink / raw)
  To: Linus Torvalds, Dave Chinner
  Cc: Matthew Wilcox, Chris Mason, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On 9/18/24 9:12 PM, Linus Torvalds wrote:
> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I think we should just do the simple one-liner of adding a
>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>> xas_split_alloc()).
> 
> .. and obviously that should be actually *verified* to fix the issue
> not just with the test-case that Chris and Jens have been using, but
> on Christian's real PostgreSQL load.
> 
> Christian?
> 
> Note that the xas_reset() needs to be done after the check for errors
> - or like Willy suggested, xas_split_alloc() needs to be re-organized.
> 
> So the simplest fix is probably to just add a
> 
>                         if (xas_error(&xas))
>                                 goto error;
>                 }
> +               xas_reset(&xas);
>                 xas_lock_irq(&xas);
>                 xas_for_each_conflict(&xas, entry) {
>                         old = entry;
> 
> in __filemap_add_folio() in mm/filemap.c
> 
> (The above is obviously a whitespace-damaged pseudo-patch for the
> pre-6758c1128ceb state. I don't actually carry a stable tree around on
> my laptop, but I hope it's clear enough what I'm rambling about)

I kicked off a quick run with this on 6.9 with my debug patch as well,
and it still fails for me... I'll double check everything is sane. For
reference, below is the 6.9 filemap patch.

diff --git a/mm/filemap.c b/mm/filemap.c
index 30de18c4fd28..88093e2b7256 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -883,6 +883,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		if (order > folio_order(folio))
 			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
 					order, gfp);
+		xas_reset(&xas);
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
 			old = entry;

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  3:38                               ` Jens Axboe
@ 2024-09-19  4:32                                 ` Linus Torvalds
  2024-09-19  4:42                                   ` Jens Axboe
  2024-09-19  4:36                                 ` Matthew Wilcox
  1 sibling, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-19  4:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Thu, 19 Sept 2024 at 05:38, Jens Axboe <axboe@kernel.dk> wrote:
>
> I kicked off a quick run with this on 6.9 with my debug patch as well,
> and it still fails for me... I'll double check everything is sane. For
> reference, below is the 6.9 filemap patch.

Ok, that's interesting. So it's *not* just about "that code didn't do
xas_reset() after xas_split_alloc()".

Now, another thing that commit 6758c1128ceb ("mm/filemap: optimize
filemap folio adding") does is that it now *only* calls xa_get_order()
under the xa lock, and then it verifies it against the
xas_split_alloc() that it did earlier.

The old code did "xas_split_alloc()" with one order (all outside the
lock), and then re-did the xas_get_order() lookup inside the lock. But
if it changed in between, it ended up doing the "xas_split()" with the
new order, even though "xas_split_alloc()" was done with the *old*
order.

That seems dangerous, and maybe the lack of xas_reset() was never the
*major* issue?

Willy? You know this code much better than I do. Maybe we should just
back-port 6758c1128ceb in its entirety.

Regardless, I'd want to make sure that we really understand the root
cause. Because it certainly looks like *just* the lack of xas_reset()
wasn't it.

                Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  3:38                               ` Jens Axboe
  2024-09-19  4:32                                 ` Linus Torvalds
@ 2024-09-19  4:36                                 ` Matthew Wilcox
  2024-09-19  4:46                                   ` Jens Axboe
                                                     ` (2 more replies)
  1 sibling, 3 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-19  4:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Dave Chinner, Chris Mason, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Wed, Sep 18, 2024 at 09:38:41PM -0600, Jens Axboe wrote:
> On 9/18/24 9:12 PM, Linus Torvalds wrote:
> > On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> I think we should just do the simple one-liner of adding a
> >> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
> >> xas_split_alloc()).
> > 
> > .. and obviously that should be actually *verified* to fix the issue
> > not just with the test-case that Chris and Jens have been using, but
> > on Christian's real PostgreSQL load.
> > 
> > Christian?
> > 
> > Note that the xas_reset() needs to be done after the check for errors
> > - or like Willy suggested, xas_split_alloc() needs to be re-organized.
> > 
> > So the simplest fix is probably to just add a
> > 
> >                         if (xas_error(&xas))
> >                                 goto error;
> >                 }
> > +               xas_reset(&xas);
> >                 xas_lock_irq(&xas);
> >                 xas_for_each_conflict(&xas, entry) {
> >                         old = entry;
> > 
> > in __filemap_add_folio() in mm/filemap.c
> > 
> > (The above is obviously a whitespace-damaged pseudo-patch for the
> > pre-6758c1128ceb state. I don't actually carry a stable tree around on
> > my laptop, but I hope it's clear enough what I'm rambling about)
> 
> I kicked off a quick run with this on 6.9 with my debug patch as well,
> and it still fails for me... I'll double check everything is sane. For
> reference, below is the 6.9 filemap patch.
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 30de18c4fd28..88093e2b7256 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -883,6 +883,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
>  		if (order > folio_order(folio))
>  			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
>  					order, gfp);
> +		xas_reset(&xas);
>  		xas_lock_irq(&xas);
>  		xas_for_each_conflict(&xas, entry) {
>  			old = entry;

My brain is still mushy, but I think there is still a problem (both with
the simple fix for 6.9 and indeed with 6.10).

For splitting a folio, we have the folio locked, so we know it's not
going anywhere.  The tree may get rearranged around it while we don't
have the xa_lock, but we're somewhat protected.

In this case we're splitting something that was, at one point, a shadow
entry.  There's no struct there to lock.  So I think we can have a
situation where we replicate 'old' (in 6.10) or xa_load() (in 6.9)
into the nodes we allocate in xas_split_alloc().  In 6.10, that's at
least guaranteed to be a shadow entry, but in 6.9, it might already be a
folio by this point because we've raced with something else also doing a
split.

Probably xas_split_alloc() needs to just do the alloc, like the name
says, and drop the 'entry' argument.  ICBW, but I think it explains
what you're seeing?  Maybe it doesn't?


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  4:32                                 ` Linus Torvalds
@ 2024-09-19  4:42                                   ` Jens Axboe
  0 siblings, 0 replies; 81+ messages in thread
From: Jens Axboe @ 2024-09-19  4:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On 9/18/24 10:32 PM, Linus Torvalds wrote:
> On Thu, 19 Sept 2024 at 05:38, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> I kicked off a quick run with this on 6.9 with my debug patch as well,
>> and it still fails for me... I'll double check everything is sane. For
>> reference, below is the 6.9 filemap patch.

Confirmed with a few more runs, still hits, basically as quickly as it
did before. So no real change observed with the added xas_reset().

> Ok, that's interesting. So it's *not* just about "that code didn't do
> xas_reset() after xas_split_alloc()".
> 
> Now, another thing that commit 6758c1128ceb ("mm/filemap: optimize
> filemap folio adding") does is that it now *only* calls xa_get_order()
> under the xa lock, and then it verifies it against the
> xas_split_alloc() that it did earlier.
> 
> The old code did "xas_split_alloc()" with one order (all outside the
> lock), and then re-did the xas_get_order() lookup inside the lock. But
> if it changed in between, it ended up doing the "xas_split()" with the
> new order, even though "xas_split_alloc()" was done with the *old*
> order.
> 
> That seems dangerous, and maybe the lack of xas_reset() was never the
> *major* issue?
> 
> Willy? You know this code much better than I do. Maybe we should just
> back-port 6758c1128ceb in its entirety.
> 
> Regardless, I'd want to make sure that we really understand the root
> cause. Because it certainly looks like *just* the lack of xas_reset()
> wasn't it.

Just for sanity's sake, I backported 6758c1128ceb (and the associated
xarray xas_get_order() change) to 6.9 and kicked that off.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  4:36                                 ` Matthew Wilcox
@ 2024-09-19  4:46                                   ` Jens Axboe
  2024-09-19  5:20                                     ` Jens Axboe
  2024-09-19  4:46                                   ` Linus Torvalds
  2024-09-20 13:54                                   ` Chris Mason
  2 siblings, 1 reply; 81+ messages in thread
From: Jens Axboe @ 2024-09-19  4:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Dave Chinner, Chris Mason, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On 9/18/24 10:36 PM, Matthew Wilcox wrote:
> On Wed, Sep 18, 2024 at 09:38:41PM -0600, Jens Axboe wrote:
>> On 9/18/24 9:12 PM, Linus Torvalds wrote:
>>> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>>
>>>> I think we should just do the simple one-liner of adding a
>>>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>>>> xas_split_alloc()).
>>>
>>> .. and obviously that should be actually *verified* to fix the issue
>>> not just with the test-case that Chris and Jens have been using, but
>>> on Christian's real PostgreSQL load.
>>>
>>> Christian?
>>>
>>> Note that the xas_reset() needs to be done after the check for errors
>>> - or like Willy suggested, xas_split_alloc() needs to be re-organized.
>>>
>>> So the simplest fix is probably to just add a
>>>
>>>                         if (xas_error(&xas))
>>>                                 goto error;
>>>                 }
>>> +               xas_reset(&xas);
>>>                 xas_lock_irq(&xas);
>>>                 xas_for_each_conflict(&xas, entry) {
>>>                         old = entry;
>>>
>>> in __filemap_add_folio() in mm/filemap.c
>>>
>>> (The above is obviously a whitespace-damaged pseudo-patch for the
>>> pre-6758c1128ceb state. I don't actually carry a stable tree around on
>>> my laptop, but I hope it's clear enough what I'm rambling about)
>>
>> I kicked off a quick run with this on 6.9 with my debug patch as well,
>> and it still fails for me... I'll double check everything is sane. For
>> reference, below is the 6.9 filemap patch.
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 30de18c4fd28..88093e2b7256 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -883,6 +883,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
>>  		if (order > folio_order(folio))
>>  			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
>>  					order, gfp);
>> +		xas_reset(&xas);
>>  		xas_lock_irq(&xas);
>>  		xas_for_each_conflict(&xas, entry) {
>>  			old = entry;
> 
> My brain is still mushy, but I think there is still a problem (both with
> the simple fix for 6.9 and indeed with 6.10).
> 
> For splitting a folio, we have the folio locked, so we know it's not
> going anywhere.  The tree may get rearranged around it while we don't
> have the xa_lock, but we're somewhat protected.
> 
> In this case we're splitting something that was, at one point, a shadow
> entry.  There's no struct there to lock.  So I think we can have a
> situation where we replicate 'old' (in 6.10) or xa_load() (in 6.9)
> into the nodes we allocate in xas_split_alloc().  In 6.10, that's at
> least guaranteed to be a shadow entry, but in 6.9, it might already be a
> folio by this point because we've raced with something else also doing a
> split.
> 
> Probably xas_split_alloc() needs to just do the alloc, like the name
> says, and drop the 'entry' argument.  ICBW, but I think it explains
> what you're seeing?  Maybe it doesn't?

Since I can hit it pretty reliably and quickly, I'm happy to test
whatever you want on top of 6.9. From the other email, I backported:

a4864671ca0b ("lib/xarray: introduce a new helper xas_get_order")
6758c1128ceb ("mm/filemap: optimize filemap folio adding")

to 6.9 and kicked off a test with that 5 min ago, and it's still going.
I'd say with 90% confidence that it should've hit already, but let's
leave it churning for an hour and see what pops out the other end.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  4:36                                 ` Matthew Wilcox
  2024-09-19  4:46                                   ` Jens Axboe
@ 2024-09-19  4:46                                   ` Linus Torvalds
  2024-09-20 13:54                                   ` Chris Mason
  2 siblings, 0 replies; 81+ messages in thread
From: Linus Torvalds @ 2024-09-19  4:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jens Axboe, Dave Chinner, Chris Mason, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Thu, 19 Sept 2024 at 06:36, Matthew Wilcox <willy@infradead.org> wrote:
>
> Probably xas_split_alloc() needs to just do the alloc, like the name
> says, and drop the 'entry' argument.  ICBW, but I think it explains
> what you're seeing?  Maybe it doesn't?

.. or we make the rule be that you have to re-check that the order and
the entry still match when you do the actual xas_split()..

Like commit 6758c1128ceb does, in this case.

We do have another xas_split_alloc() - in the hugepage case - but
there we do have

                xas_lock(&xas);
                xas_reset(&xas);
                if (xas_load(&xas) != folio)
                        goto fail;

and the folio is locked over the whole sequence, so I think that code
is probably safe and guarantees that we're splitting with the same
details we alloc'ed.

                Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  4:46                                   ` Jens Axboe
@ 2024-09-19  5:20                                     ` Jens Axboe
  0 siblings, 0 replies; 81+ messages in thread
From: Jens Axboe @ 2024-09-19  5:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Dave Chinner, Chris Mason, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On 9/18/24 10:46 PM, Jens Axboe wrote:
> On 9/18/24 10:36 PM, Matthew Wilcox wrote:
>> On Wed, Sep 18, 2024 at 09:38:41PM -0600, Jens Axboe wrote:
>>> On 9/18/24 9:12 PM, Linus Torvalds wrote:
>>>> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
>>>> <torvalds@linux-foundation.org> wrote:
>>>>>
>>>>> I think we should just do the simple one-liner of adding a
>>>>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>>>>> xas_split_alloc()).
>>>>
>>>> .. and obviously that should be actually *verified* to fix the issue
>>>> not just with the test-case that Chris and Jens have been using, but
>>>> on Christian's real PostgreSQL load.
>>>>
>>>> Christian?
>>>>
>>>> Note that the xas_reset() needs to be done after the check for errors
>>>> - or like Willy suggested, xas_split_alloc() needs to be re-organized.
>>>>
>>>> So the simplest fix is probably to just add a
>>>>
>>>>                         if (xas_error(&xas))
>>>>                                 goto error;
>>>>                 }
>>>> +               xas_reset(&xas);
>>>>                 xas_lock_irq(&xas);
>>>>                 xas_for_each_conflict(&xas, entry) {
>>>>                         old = entry;
>>>>
>>>> in __filemap_add_folio() in mm/filemap.c
>>>>
>>>> (The above is obviously a whitespace-damaged pseudo-patch for the
>>>> pre-6758c1128ceb state. I don't actually carry a stable tree around on
>>>> my laptop, but I hope it's clear enough what I'm rambling about)
>>>
>>> I kicked off a quick run with this on 6.9 with my debug patch as well,
>>> and it still fails for me... I'll double check everything is sane. For
>>> reference, below is the 6.9 filemap patch.
>>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 30de18c4fd28..88093e2b7256 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -883,6 +883,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
>>>  		if (order > folio_order(folio))
>>>  			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
>>>  					order, gfp);
>>> +		xas_reset(&xas);
>>>  		xas_lock_irq(&xas);
>>>  		xas_for_each_conflict(&xas, entry) {
>>>  			old = entry;
>>
>> My brain is still mushy, but I think there is still a problem (both with
>> the simple fix for 6.9 and indeed with 6.10).
>>
>> For splitting a folio, we have the folio locked, so we know it's not
>> going anywhere.  The tree may get rearranged around it while we don't
>> have the xa_lock, but we're somewhat protected.
>>
>> In this case we're splitting something that was, at one point, a shadow
>> entry.  There's no struct there to lock.  So I think we can have a
>> situation where we replicate 'old' (in 6.10) or xa_load() (in 6.9)
>> into the nodes we allocate in xas_split_alloc().  In 6.10, that's at
>> least guaranteed to be a shadow entry, but in 6.9, it might already be a
>> folio by this point because we've raced with something else also doing a
>> split.
>>
>> Probably xas_split_alloc() needs to just do the alloc, like the name
>> says, and drop the 'entry' argument.  ICBW, but I think it explains
>> what you're seeing?  Maybe it doesn't?
> 
> Since I can hit it pretty reliably and quickly, I'm happy to test
> whatever you want on top of 6.9. From the other email, I backported:
> 
> a4864671ca0b ("lib/xarray: introduce a new helper xas_get_order")
> 6758c1128ceb ("mm/filemap: optimize filemap folio adding")
> 
> to 6.9 and kicked off a test with that 5 min ago, and it's still going.
> I'd say with 90% confidence that it should've hit already, but let's
> leave it churning for an hour and see what pops out the other end.

45 min later, I think I can conclusively call the backport of those two
on top of 6.9 good.

Below is what I'm running, which is those two commits (modulo the test
bits, for clarity). Rather than attempt to fix this differently for 6.9,
perhaps it's not a bad idea to just get those two into stable? It's not a lot
of churn, and at least that keeps it consistent rather than doing
something differently for stable.

I'll try and do a patch that just ensures the order is consistent across
lock cycles as Linus suggested, just to verify that this is indeed the
main issue. Will keep the xas_reset() as well.

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index cb571dfcf4b1..da2f5bba7944 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1548,6 +1551,7 @@ void xas_create_range(struct xa_state *);
 
 #ifdef CONFIG_XARRAY_MULTI
 int xa_get_order(struct xarray *, unsigned long index);
+int xas_get_order(struct xa_state *xas);
 void xas_split(struct xa_state *, void *entry, unsigned int order);
 void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t);
 #else
@@ -1556,6 +1560,11 @@ static inline int xa_get_order(struct xarray *xa, unsigned long index)
 	return 0;
 }
 
+static inline int xas_get_order(struct xa_state *xas)
+{
+	return 0;
+}
+
 static inline void xas_split(struct xa_state *xas, void *entry,
 		unsigned int order)
 {
diff --git a/lib/xarray.c b/lib/xarray.c
index 5e7d6334d70d..c0514fb16d33 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1765,39 +1780,52 @@ void *xa_store_range(struct xarray *xa, unsigned long first,
 EXPORT_SYMBOL(xa_store_range);
 
 /**
- * xa_get_order() - Get the order of an entry.
- * @xa: XArray.
- * @index: Index of the entry.
+ * xas_get_order() - Get the order of an entry.
+ * @xas: XArray operation state.
+ *
+ * Called after xas_load, the xas should not be in an error state.
  *
  * Return: A number between 0 and 63 indicating the order of the entry.
  */
-int xa_get_order(struct xarray *xa, unsigned long index)
+int xas_get_order(struct xa_state *xas)
 {
-	XA_STATE(xas, xa, index);
-	void *entry;
 	int order = 0;
 
-	rcu_read_lock();
-	entry = xas_load(&xas);
-
-	if (!entry)
-		goto unlock;
-
-	if (!xas.xa_node)
-		goto unlock;
+	if (!xas->xa_node)
+		return 0;
 
 	for (;;) {
-		unsigned int slot = xas.xa_offset + (1 << order);
+		unsigned int slot = xas->xa_offset + (1 << order);
 
 		if (slot >= XA_CHUNK_SIZE)
 			break;
-		if (!xa_is_sibling(xas.xa_node->slots[slot]))
+		if (!xa_is_sibling(xa_entry(xas->xa, xas->xa_node, slot)))
 			break;
 		order++;
 	}
 
-	order += xas.xa_node->shift;
-unlock:
+	order += xas->xa_node->shift;
+	return order;
+}
+EXPORT_SYMBOL_GPL(xas_get_order);
+
+/**
+ * xa_get_order() - Get the order of an entry.
+ * @xa: XArray.
+ * @index: Index of the entry.
+ *
+ * Return: A number between 0 and 63 indicating the order of the entry.
+ */
+int xa_get_order(struct xarray *xa, unsigned long index)
+{
+	XA_STATE(xas, xa, index);
+	int order = 0;
+	void *entry;
+
+	rcu_read_lock();
+	entry = xas_load(&xas);
+	if (entry)
+		order = xas_get_order(&xas);
 	rcu_read_unlock();
 
 	return order;
diff --git a/mm/filemap.c b/mm/filemap.c
index 30de18c4fd28..b8d525825d3f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -852,7 +852,9 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
-	bool huge = folio_test_hugetlb(folio);
+	void *alloced_shadow = NULL;
+	int alloced_order = 0;
+	bool huge;
 	bool charged = false;
 	long nr = 1;
 
@@ -869,6 +871,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 
 	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
 	xas_set_order(&xas, index, folio_order(folio));
+	huge = folio_test_hugetlb(folio);
 	nr = folio_nr_pages(folio);
 
 	gfp &= GFP_RECLAIM_MASK;
@@ -876,13 +879,10 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	folio->mapping = mapping;
 	folio->index = xas.xa_index;
 
-	do {
-		unsigned int order = xa_get_order(xas.xa, xas.xa_index);
+	for (;;) {
+		int order = -1, split_order = 0;
 		void *entry, *old = NULL;
 
-		if (order > folio_order(folio))
-			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
-					order, gfp);
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
 			old = entry;
@@ -890,19 +890,33 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 				xas_set_err(&xas, -EEXIST);
 				goto unlock;
 			}
+			/*
+			 * If a larger entry exists,
+			 * it will be the first and only entry iterated.
+			 */
+			if (order == -1)
+				order = xas_get_order(&xas);
+		}
+
+		/* entry may have changed before we re-acquire the lock */
+		if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
+			xas_destroy(&xas);
+			alloced_order = 0;
 		}
 
 		if (old) {
-			if (shadowp)
-				*shadowp = old;
-			/* entry may have been split before we acquired lock */
-			order = xa_get_order(xas.xa, xas.xa_index);
-			if (order > folio_order(folio)) {
+			if (order > 0 && order > folio_order(folio)) {
 				/* How to handle large swap entries? */
 				BUG_ON(shmem_mapping(mapping));
+				if (!alloced_order) {
+					split_order = order;
+					goto unlock;
+				}
 				xas_split(&xas, old, order);
 				xas_reset(&xas);
 			}
+			if (shadowp)
+				*shadowp = old;
 		}
 
 		xas_store(&xas, folio);
@@ -918,9 +932,24 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 				__lruvec_stat_mod_folio(folio,
 						NR_FILE_THPS, nr);
 		}
+
 unlock:
 		xas_unlock_irq(&xas);
-	} while (xas_nomem(&xas, gfp));
+
+		/* split needed, alloc here and retry. */
+		if (split_order) {
+			xas_split_alloc(&xas, old, split_order, gfp);
+			if (xas_error(&xas))
+				goto error;
+			alloced_shadow = old;
+			alloced_order = split_order;
+			xas_reset(&xas);
+			continue;
+		}
+
+		if (!xas_nomem(&xas, gfp))
+			break;
+	}
 
 	if (xas_error(&xas))
 		goto error;

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  3:12                             ` Linus Torvalds
  2024-09-19  3:38                               ` Jens Axboe
@ 2024-09-19  6:34                               ` Christian Theune
  2024-09-19  6:57                                 ` Linus Torvalds
  1 sibling, 1 reply; 81+ messages in thread
From: Christian Theune @ 2024-09-19  6:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions


> On 19. Sep 2024, at 05:12, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> 
>> I think we should just do the simple one-liner of adding a
>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>> xas_split_alloc()).
> 
> .. and obviously that should be actually *verified* to fix the issue
> not just with the test-case that Chris and Jens have been using, but
> on Christian's real PostgreSQL load.
> 
> Christian?

Happy to! I see there’s still some back and forth on the specific patches. Let me know which kernel version and which patches I should start trying out. I’m losing track while following the discussion.

In preparation: I’m wondering whether the known reproducer gives insight how I might force my load to trigger it more easily? Would running the reproducer above and combining that with a running PostgreSQL benchmark make sense? 

Otherwise we’d likely only be getting insight after weeks of not seeing crashes … 

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  6:34                               ` Christian Theune
@ 2024-09-19  6:57                                 ` Linus Torvalds
  2024-09-19 10:19                                   ` Christian Theune
  0 siblings, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-19  6:57 UTC (permalink / raw)
  To: Christian Theune
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On Thu, 19 Sept 2024 at 08:35, Christian Theune <ct@flyingcircus.io> wrote:
>
> Happy to! I see there’s still some back and forth on the specific
> patches. Let me know which kernel version and which patches I should
> start trying out. I’m losing track while following the discussion.

Yeah, right now Jens is still going to run some more testing, but I
think the plan is to just backport

  a4864671ca0b ("lib/xarray: introduce a new helper xas_get_order")
  6758c1128ceb ("mm/filemap: optimize filemap folio adding")

and I think we're at the point where you might as well start testing
that if you have the cycles for it. Jens is mostly trying to confirm
the root cause, but even without that, I think you running your load
with those two changes back-ported is worth it.

(Or even just try running it on plain 6.10 or 6.11, both of which
already have those commits)

> In preparation: I’m wondering whether the known reproducer gives
> insight how I might force my load to trigger it more easily? Would
> running the reproducer above and combining that with a running
> PostgreSQL benchmark make sense?
>
> Otherwise we’d likely only be getting insight after weeks of not
> seeing crashes …

So considering how well the reproducer works for Jens and Chris, my
main worry is whether your load might have some _additional_ issue.

Unlikely, but still .. The two commits fix the reproducer, so I think
the important thing to make sure is that it really fixes the original
issue too.

And yeah, I'd be surprised if it doesn't, but at the same time I would
_not_ suggest you try to make your load look more like the case we
already know gets fixed.

So yes, it will be "weeks of not seeing crashes" until we'd be
_really_ confident it's all the same thing, but I'd rather still have
you test that, than test something else than what caused issues
originally, if you see what I mean.

         Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  6:57                                 ` Linus Torvalds
@ 2024-09-19 10:19                                   ` Christian Theune
  2024-09-30 17:34                                     ` Christian Theune
  0 siblings, 1 reply; 81+ messages in thread
From: Christian Theune @ 2024-09-19 10:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions



> On 19. Sep 2024, at 08:57, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> Yeah, right now Jens is still going to run some more testing, but I
> think the plan is to just backport
> 
>  a4864671ca0b ("lib/xarray: introduce a new helper xas_get_order")
>  6758c1128ceb ("mm/filemap: optimize filemap folio adding")
> 
> and I think we're at the point where you might as well start testing
> that if you have the cycles for it. Jens is mostly trying to confirm
> the root cause, but even without that, I think you running your load
> with those two changes back-ported is worth it.
> 
> (Or even just try running it on plain 6.10 or 6.11, both of which
> already have those commits)

I’ve discussed this with my team and we’re preparing to switch all our 
non-prod machines as well as those production machines that have shown
the error before.

This will require a bit of user communication and reboot scheduling.
Our release prep will be able to roll this out starting early next week
and the production machines in question around Sept 30.

We would run with 6.11, as our understanding so far is that running the
most current kernel would generate the most insight and be easier for
you all to work with?

(Generally we run the mostly vanilla LTS that has surpassed x.y.50+ so
we might later downgrade to 6.6 when this is fixed.)

> So considering how well the reproducer works for Jens and Chris, my
> main worry is whether your load might have some _additional_ issue.
> 
> Unlikely, but still .. The two commits fix the reproducer, so I think
> the important thing to make sure is that it really fixes the original
> issue too.
> 
> And yeah, I'd be surprised if it doesn't, but at the same time I would
> _not_ suggest you try to make your load look more like the case we
> already know gets fixed.
> 
> So yes, it will be "weeks of not seeing crashes" until we'd be
> _really_ confident it's all the same thing, but I'd rather still have
> you test that, than test something else than what caused issues
> originally, if you see what I mean.

Agreed, I’m all onboard with that.

Kind regards,
Christian Theune

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19  4:36                                 ` Matthew Wilcox
  2024-09-19  4:46                                   ` Jens Axboe
  2024-09-19  4:46                                   ` Linus Torvalds
@ 2024-09-20 13:54                                   ` Chris Mason
  2024-09-24 15:58                                     ` Matthew Wilcox
                                                       ` (2 more replies)
  2 siblings, 3 replies; 81+ messages in thread
From: Chris Mason @ 2024-09-20 13:54 UTC (permalink / raw)
  To: Matthew Wilcox, Jens Axboe
  Cc: Linus Torvalds, Dave Chinner, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On 9/19/24 12:36 AM, Matthew Wilcox wrote:
> On Wed, Sep 18, 2024 at 09:38:41PM -0600, Jens Axboe wrote:
>> On 9/18/24 9:12 PM, Linus Torvalds wrote:
>>> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>>
>>>> I think we should just do the simple one-liner of adding a
>>>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>>>> xas_split_alloc()).
>>>
>>> .. and obviously that should be actually *verified* to fix the issue
>>> not just with the test-case that Chris and Jens have been using, but
>>> on Christian's real PostgreSQL load.
>>>
>>> Christian?
>>>
>>> Note that the xas_reset() needs to be done after the check for errors
>>> - or like Willy suggested, xas_split_alloc() needs to be re-organized.
>>>
>>> So the simplest fix is probably to just add a
>>>
>>>                         if (xas_error(&xas))
>>>                                 goto error;
>>>                 }
>>> +               xas_reset(&xas);
>>>                 xas_lock_irq(&xas);
>>>                 xas_for_each_conflict(&xas, entry) {
>>>                         old = entry;
>>>
>>> in __filemap_add_folio() in mm/filemap.c
>>>
>>> (The above is obviously a whitespace-damaged pseudo-patch for the
>>> pre-6758c1128ceb state. I don't actually carry a stable tree around on
>>> my laptop, but I hope it's clear enough what I'm rambling about)
>>
>> I kicked off a quick run with this on 6.9 with my debug patch as well,
>> and it still fails for me... I'll double check everything is sane. For
>> reference, below is the 6.9 filemap patch.
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 30de18c4fd28..88093e2b7256 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -883,6 +883,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
>>  		if (order > folio_order(folio))
>>  			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
>>  					order, gfp);
>> +		xas_reset(&xas);
>>  		xas_lock_irq(&xas);
>>  		xas_for_each_conflict(&xas, entry) {
>>  			old = entry;
> 
> My brain is still mushy, but I think there is still a problem (both with
> the simple fix for 6.9 and indeed with 6.10).
> 
> For splitting a folio, we have the folio locked, so we know it's not
> going anywhere.  The tree may get rearranged around it while we don't
> have the xa_lock, but we're somewhat protected.
> 
> In this case we're splitting something that was, at one point, a shadow
> entry.  There's no struct there to lock.  So I think we can have a
> situation where we replicate 'old' (in 6.10) or xa_load() (in 6.9)
> into the nodes we allocate in xas_split_alloc().  In 6.10, that's at
> least guaranteed to be a shadow entry, but in 6.9, it might already be a
> folio by this point because we've raced with something else also doing a
> split.
> 
> Probably xas_split_alloc() needs to just do the alloc, like the name
> says, and drop the 'entry' argument.  ICBW, but I think it explains
> what you're seeing?  Maybe it doesn't?

Jens and I went through a lot of iterations making the repro more
reliable, and we were able to pretty consistently show a UAF with
the debug code that Willy suggested:

XA_NODE_BUG_ON(xas->xa_alloc, memchr_inv(&xas->xa_alloc->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));

But, I didn't really catch what Willy was saying about xas_split_alloc()
until this morning.

xas_split_alloc() does the allocation and also shoves an entry into some of
the slots.  When the tree changes, the entry we've stored is wildly 
wrong, but xas_reset() doesn't undo any of that.  So when we actually
use the xas->xa_alloc nodes we've setup, they are pointing to the
wrong things.

Which is probably why the commits in 6.10 added this:

/* entry may have changed before we re-acquire the lock */
if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
	xas_destroy(&xas);
        alloced_order = 0;
}

The only way to undo the work done by xas_split_alloc() is to call
xas_destroy().

To prove this theory, I tried making a minimal version that also
called destroy, but it all ended up less minimal than the code
that's actually in 6.10.  I've got a long test going now with
an extra cond_resched() to make the race bigger, and a printk of victory.

It hasn't fired yet, and I need to hop on an airplane, so I'll just leave
it running for now.  But long story short, I think we should probably
just tag all of these for stable:

https://lore.kernel.org/all/20240415171857.19244-2-ryncsn@gmail.com/T/#mdb85922624c39ea7efb775a044af4731890ff776

Also, Willy's proposed changes to xas_split_alloc() seem like a good
idea.

-chris



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-20 13:54                                   ` Chris Mason
@ 2024-09-24 15:58                                     ` Matthew Wilcox
  2024-09-24 17:16                                     ` Sam James
  2024-09-24 19:17                                     ` Chris Mason
  2 siblings, 0 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-24 15:58 UTC (permalink / raw)
  To: Chris Mason
  Cc: Jens Axboe, Linus Torvalds, Dave Chinner, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Fri, Sep 20, 2024 at 03:54:55PM +0200, Chris Mason wrote:
> On 9/19/24 12:36 AM, Matthew Wilcox wrote:
> > My brain is still mushy, but I think there is still a problem (both with
> > the simple fix for 6.9 and indeed with 6.10).
> > 
> > For splitting a folio, we have the folio locked, so we know it's not
> > going anywhere.  The tree may get rearranged around it while we don't
> > have the xa_lock, but we're somewhat protected.
> > 
> > In this case we're splitting something that was, at one point, a shadow
> > entry.  There's no struct there to lock.  So I think we can have a
> > situation where we replicate 'old' (in 6.10) or xa_load() (in 6.9)
> > into the nodes we allocate in xas_split_alloc().  In 6.10, that's at
> > least guaranteed to be a shadow entry, but in 6.9, it might already be a
> > folio by this point because we've raced with something else also doing a
> > split.
> > 
> > Probably xas_split_alloc() needs to just do the alloc, like the name
> > says, and drop the 'entry' argument.  ICBW, but I think it explains
> > what you're seeing?  Maybe it doesn't?
> 
> Jens and I went through a lot of iterations making the repro more
> reliable, and we were able to pretty consistently show a UAF with
> the debug code that Willy suggested:
> 
> XA_NODE_BUG_ON(xas->xa_alloc, memchr_inv(&xas->xa_alloc->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));
> 
> But, I didn't really catch what Willy was saying about xas_split_alloc()
> until this morning.
> 
> xas_split_alloc() does the allocation and also shoves an entry into some of
> the slots.  When the tree changes, the entry we've stored is wildly 
> wrong, but xas_reset() doesn't undo any of that.  So when we actually
> use the xas->xa_alloc nodes we've setup, they are pointing to the
> wrong things.
> 
> Which is probably why the commits in 6.10 added this:
> 
> /* entry may have changed before we re-acquire the lock */
> if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
> 	xas_destroy(&xas);
>         alloced_order = 0;
> }
> 
> The only way to undo the work done by xas_split_alloc() is to call
> xas_destroy().

I hadn't fully understood this until today.  Here's what the code in 6.9
did (grossly simplified):

        do {
                unsigned int order = xa_get_order(xas.xa, xas.xa_index);
                if (order > folio_order(folio))
                        xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
                                        order, gfp);
                xas_lock_irq(&xas);
                if (old) {
                        order = xa_get_order(xas.xa, xas.xa_index);
                        if (order > folio_order(folio)) {
                                xas_split(&xas, old, order);
                        }
                }
                xas_store(&xas, folio);
                xas_unlock_irq(&xas);
        } while (xas_nomem(&xas, gfp));

The intent was that xas_store() would use the node allocated by
xas_nomem() and xas_split() would use the nodes allocated by
xas_split_alloc().  That doesn't end up happening if the split already
happened before getting the lock.  So if we were looking for a minimal
fix for pre-6.10, calling xas_destroy if we don't call xas_split()
would fix the problem.  But I think we're better off backporting the
6.10 patches.
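
(Purely as an illustration, not the fix that was merged: an untested sketch
of that minimal pre-6.10 idea, written against the grossly simplified loop
above.  The point is just to throw the pre-filled nodes away whenever
xas_split() did not end up consuming them.)

        do {
                bool split_alloced = false;
                unsigned int order = xa_get_order(xas.xa, xas.xa_index);

                if (order > folio_order(folio)) {
                        xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
                                        order, gfp);
                        split_alloced = true;
                }
                xas_lock_irq(&xas);
                if (old) {
                        /* re-check under the lock; the split may already
                         * have been done by somebody else */
                        order = xa_get_order(xas.xa, xas.xa_index);
                        if (order > folio_order(folio)) {
                                xas_split(&xas, old, order);
                                split_alloced = false;
                        }
                }
                /* nodes from xas_split_alloc() were not consumed by
                 * xas_split(); free them so xas_store() cannot pick up
                 * a node whose slots still point at a stale entry */
                if (split_alloced)
                        xas_destroy(&xas);
                xas_store(&xas, folio);
                xas_unlock_irq(&xas);
        } while (xas_nomem(&xas, gfp));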

For 6.12, I'm going to put this in -next:

http://git.infradead.org/?p=users/willy/xarray.git;a=commitdiff;h=6684aba0780da9f505c202f27e68ee6d18c0aa66

and then send it to Linus in a couple of weeks as an "obviously correct"
bit of hardening.  We really should have called xas_reset() before
retaking the lock.

Beyond that, I really want to revisit how, when and what we split.
A few months ago we came to the realisation that splitting order-9
folios to 512 order-0 folios was just legacy thinking.  What each user
really wants is to specify a precise page and say "I want this page to
end up in a folio that is of order N" (where N is smaller than the order
of the folio that it's currently in).  That is, if we truncate a file
which is currently a multiple of 2MB in size to one which has a tail of,
say, 13377ea bytes, we'd want to create a 1MB folio which we leave at
the end of the file, then a 512kB folio which we free, then a 256kB
folio which we keep, a 128kB folio which we discard, a 64kB folio which
we discard, ...

So we need to do that first, then all this code becomes way easier and
xas_split_alloc() no longer needs to fill in the node at the wrong time.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-20 13:54                                   ` Chris Mason
  2024-09-24 15:58                                     ` Matthew Wilcox
@ 2024-09-24 17:16                                     ` Sam James
  2024-09-25 16:06                                       ` Kairui Song
  2024-09-24 19:17                                     ` Chris Mason
  2 siblings, 1 reply; 81+ messages in thread
From: Sam James @ 2024-09-24 17:16 UTC (permalink / raw)
  To: clm, stable, Kairui Song, Matthew Wilcox
  Cc: axboe, ct, david, dqminh, linux-fsdevel, linux-kernel, linux-mm,
	linux-xfs, regressions, regressions, torvalds

Kairui, could you send them to the stable ML to be queued if Willy is
fine with it?


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-20 13:54                                   ` Chris Mason
  2024-09-24 15:58                                     ` Matthew Wilcox
  2024-09-24 17:16                                     ` Sam James
@ 2024-09-24 19:17                                     ` Chris Mason
  2024-09-24 19:24                                       ` Linus Torvalds
  2 siblings, 1 reply; 81+ messages in thread
From: Chris Mason @ 2024-09-24 19:17 UTC (permalink / raw)
  To: Matthew Wilcox, Jens Axboe
  Cc: Linus Torvalds, Dave Chinner, Christian Theune, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On 9/20/24 3:54 PM, Chris Mason wrote:

[ ... ]

> xas_split_alloc() does the allocation and also shoves an entry into some of
> the slots.  When the tree changes, the entry we've stored is wildly 
> wrong, but xas_reset() doesn't undo any of that.  So when we actually
> use the xas->xa_alloc nodes we've setup, they are pointing to the
> wrong things.
> 
> Which is probably why the commits in 6.10 added this:
> 
> /* entry may have changed before we re-acquire the lock */
> if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
> 	xas_destroy(&xas);
>         alloced_order = 0;
> }
> 
> The only way to undo the work done by xas_split_alloc() is to call
> xas_destroy().
> 
> To prove this theory, I tried making a minimal version that also
> called destroy, but it all ended up less minimal than the code
> that's actually in 6.10.  I've got a long test going now with
> an extra cond_resched() to make the race bigger, and a printk of victory.
> 
> It hasn't fired yet, and I need to hop on an airplane, so I'll just leave
> it running for now.  But long story short, I think we should probably
> just tag all of these for stable:
> 
> https://lore.kernel.org/all/20240415171857.19244-2-ryncsn@gmail.com/T/#mdb85922624c39ea7efb775a044af4731890ff776
> 
> Also, Willy's proposed changes to xas_split_alloc() seem like a good
> idea.

A few days of load and some extra printks later, it turns out that
taking the writer lock in __filemap_add_folio() makes us dramatically
more likely to just return EEXIST than go into the xas_split_alloc() dance.

With the changes in 6.10, we only get into that xas_destroy() case above
when the conflicting entry is a shadow entry, so I changed my repro to
use memory pressure instead of fadvise.

I also added a schedule_timeout(1) after the split alloc, and with all
of that I'm able to consistently make the xas_destroy() case trigger
without causing any system instability.  Kairui Song's patches do seem
to have fixed things nicely.

-chris



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-24 19:17                                     ` Chris Mason
@ 2024-09-24 19:24                                       ` Linus Torvalds
  0 siblings, 0 replies; 81+ messages in thread
From: Linus Torvalds @ 2024-09-24 19:24 UTC (permalink / raw)
  To: Chris Mason
  Cc: Matthew Wilcox, Jens Axboe, Dave Chinner, Christian Theune,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Tue, 24 Sept 2024 at 12:18, Chris Mason <clm@meta.com> wrote:
>
> A few days of load and some extra printks later, it turns out that
> taking the writer lock in __filemap_add_folio() makes us dramatically
> more likely to just return EEXIST than go into the xas_split_alloc() dance.

.. and that sounds like a good thing, except for the test coverage, I guess.

Which you seem to have fixed:

> With the changes in 6.10, we only get into that xas_destroy() case above
> when the conflicting entry is a shadow entry, so I changed my repro to
> use memory pressure instead of fadvise.
>
> I also added a schedule_timeout(1) after the split alloc, and with all
> of that I'm able to consistently make the xas_destroy() case trigger
> without causing any system instability.  Kairui Song's patches do seem
> to have fixed things nicely.

<confused thumbs up / fingers crossed emoji>

              Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-24 17:16                                     ` Sam James
@ 2024-09-25 16:06                                       ` Kairui Song
  2024-09-25 16:42                                         ` Christian Theune
  2024-09-27 14:51                                         ` Sam James
  0 siblings, 2 replies; 81+ messages in thread
From: Kairui Song @ 2024-09-25 16:06 UTC (permalink / raw)
  To: Sam James, stable
  Cc: clm, Matthew Wilcox, axboe, ct, david, dqminh, linux-fsdevel,
	linux-kernel, linux-mm, linux-xfs, regressions, regressions,
	torvalds

On Wed, Sep 25, 2024 at 1:16 AM Sam James <sam@gentoo.org> wrote:
>
> Kairui, could you send them to the stable ML to be queued if Willy is
> fine with it?
>

Hi Sam,

Thanks for adding me to the discussion.

Yes I'd like to, just not sure if people are still testing and
checking the commits.

And I haven't sent a separate fix just for stable before, so can
anyone teach me: should I send only the two patches for a minimal change,
or a whole series (with some minor cleanup patches as dependencies)
for minimal conflicts? Or can the stable team just pick these up?


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-25 16:06                                       ` Kairui Song
@ 2024-09-25 16:42                                         ` Christian Theune
  2024-09-27 14:51                                         ` Sam James
  1 sibling, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-09-25 16:42 UTC (permalink / raw)
  To: Kairui Song
  Cc: Sam James, stable, clm, Matthew Wilcox, axboe, Dave Chinner,
	dqminh, linux-fsdevel, linux-kernel, linux-mm, linux-xfs,
	regressions, regressions, torvalds



> On 25. Sep 2024, at 18:06, Kairui Song <ryncsn@gmail.com> wrote:
> 
> On Wed, Sep 25, 2024 at 1:16 AM Sam James <sam@gentoo.org> wrote:
>> 
>> Kairui, could you send them to the stable ML to be queued if Willy is
>> fine with it?
>> 
> 
> Hi Sam,
> 
> Thanks for adding me to the discussion.
> 
> Yes I'd like to, just not sure if people are still testing and
> checking the commits.

As the one who raised the issue recently: we’re rolling out 6.11 for testing on a couple of hundred machines right now. I’ve scheduled this internally to run for 8-12 weeks due to the fleeting nature of the bug and will report back if it pops up again or after that time has elapsed.

AFAICT this is a fix in any case even if we should find more issues in my fleet later.

Cheers,
Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-25 16:06                                       ` Kairui Song
  2024-09-25 16:42                                         ` Christian Theune
@ 2024-09-27 14:51                                         ` Sam James
  2024-09-27 14:58                                           ` Jens Axboe
  1 sibling, 1 reply; 81+ messages in thread
From: Sam James @ 2024-09-27 14:51 UTC (permalink / raw)
  To: Kairui Song, Greg KH
  Cc: stable, clm, Matthew Wilcox, axboe, ct, david, dqminh,
	linux-fsdevel, linux-kernel, linux-mm, linux-xfs, regressions,
	regressions, torvalds

Kairui Song <ryncsn@gmail.com> writes:

> On Wed, Sep 25, 2024 at 1:16 AM Sam James <sam@gentoo.org> wrote:
>>
>> Kairui, could you send them to the stable ML to be queued if Willy is
>> fine with it?
>>
>
> Hi Sam,

Hi Kairui,

>
> Thanks for adding me to the discussion.
>
> Yes I'd like to, just not sure if people are still testing and
> checking the commits.
>
> And I haven't sent a separate fix just for stable before, so can
> anyone teach me: should I send only the two patches for a minimal change,
> or a whole series (with some minor cleanup patches as dependencies)
> for minimal conflicts? Or can the stable team just pick these up?

Please see https://www.kernel.org/doc/html/v6.11/process/stable-kernel-rules.html.

If Option 2 can't work (because of conflicts), please follow Option 3
(https://www.kernel.org/doc/html/v6.11/process/stable-kernel-rules.html#option-3).

Just explain the background and link to this thread in a cover letter
and mention it's your first time. Greg didn't bite me when I fumbled my
way around it :)

(greg, please correct me if I'm talking rubbish)

thanks,
sam


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-27 14:51                                         ` Sam James
@ 2024-09-27 14:58                                           ` Jens Axboe
  2024-10-01 21:10                                             ` Kairui Song
  0 siblings, 1 reply; 81+ messages in thread
From: Jens Axboe @ 2024-09-27 14:58 UTC (permalink / raw)
  To: Sam James, Kairui Song, Greg KH
  Cc: stable, clm, Matthew Wilcox, ct, david, dqminh, linux-fsdevel,
	linux-kernel, linux-mm, linux-xfs, regressions, regressions,
	torvalds

On 9/27/24 8:51 AM, Sam James wrote:
> Kairui Song <ryncsn@gmail.com> writes:
> 
>> On Wed, Sep 25, 2024 at 1:16?AM Sam James <sam@gentoo.org> wrote:
>>>
>>> Kairui, could you send them to the stable ML to be queued if Willy is
>>> fine with it?
>>>
>>
>> Hi Sam,
> 
> Hi Kairui,
> 
>>
>> Thanks for adding me to the discussion.
>>
>> Yes I'd like to, just not sure if people are still testing and
>> checking the commits.
>>
>> And I haven't sent a separate fix just for stable before, so can
>> anyone teach me: should I send only the two patches for a minimal change,
>> or a whole series (with some minor cleanup patches as dependencies)
>> for minimal conflicts? Or can the stable team just pick these up?
> 
> Please see https://www.kernel.org/doc/html/v6.11/process/stable-kernel-rules.html.
> 
> If Option 2 can't work (because of conflicts), please follow Option 3
> (https://www.kernel.org/doc/html/v6.11/process/stable-kernel-rules.html#option-3).
> 
> Just explain the background and link to this thread in a cover letter
> and mention it's your first time. Greg didn't bite me when I fumbled my
> way around it :)
> 
> (greg, please correct me if I'm talking rubbish)

It needs two cherry-picks, one of which won't pick cleanly. So I suggest
whoever submits this to stable does:

1) Cherry-pick the two commits and fix up the simple issue with one of them.
   I forget what it was since it's been a week and a half since I did
   it, but it's trivial to fix up.

   Don't forget to add the "commit XXX upstream" to the commit message.

2) Test that it compiles and boots and send an email to
   stable@vger.kernel.org with the patches attached and CC the folks in
   this thread, to help spot if there are mistakes.

and that should be it. Worst case, we'll need a few different patches
since this affects anything back to 5.19, and each currently maintained
stable kernel version will need it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-19 10:19                                   ` Christian Theune
@ 2024-09-30 17:34                                     ` Christian Theune
  2024-09-30 18:46                                       ` Linus Torvalds
                                                         ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Christian Theune @ 2024-09-30 17:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

Hi,

we’ve been running a number of VMs on 6.11 since last week. We’ve encountered one hung-task situation multiple times now, though it seems to resolve itself after a bit of time. I do not see any CPU spinning during this time.

So far the situation seems to be related to cgroup-based IO throttling / weighting:

Here are three examples of similar tracebacks from jobs that perform a certain amount of IO and are either given a weight or an explicit limit like this:

IOWeight=10
IOReadIOPSMax=/dev/vda 188
IOWriteIOPSMax=/dev/vda 188
	
Telemetry for the affected VM does not show it actually reaching 188 IOPS (the load is mostly writing); instead it follows a kind of Gaussian curve …

The underlying storage and network were completely inconspicuous during the whole time.

Sep 27 00:51:20 <redactedhostname>13 kernel: INFO: task nix-build:5300 blocked for more than 122 seconds.
Sep 27 00:51:20 <redactedhostname>13 kernel:       Not tainted 6.11.0 #1-NixOS
Sep 27 00:51:20 <redactedhostname>13 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 27 00:51:20 <redactedhostname>13 kernel: task:nix-build       state:D stack:0     pid:5300  tgid:5298  ppid:5297   flags:0x00000002
Sep 27 00:51:20 <redactedhostname>13 kernel: Call Trace:
Sep 27 00:51:20 <redactedhostname>13 kernel:  <TASK>
Sep 27 00:51:20 <redactedhostname>13 kernel:  __schedule+0x3a3/0x1300
Sep 27 00:51:20 <redactedhostname>13 kernel:  ? xfs_vm_writepages+0x67/0x90 [xfs]
Sep 27 00:51:20 <redactedhostname>13 kernel:  schedule+0x27/0xf0
Sep 27 00:51:20 <redactedhostname>13 kernel:  io_schedule+0x46/0x70
Sep 27 00:51:20 <redactedhostname>13 kernel:  folio_wait_bit_common+0x13f/0x340
Sep 27 00:51:20 <redactedhostname>13 kernel:  ? __pfx_wake_page_function+0x10/0x10
Sep 27 00:51:20 <redactedhostname>13 kernel:  folio_wait_writeback+0x2b/0x80
Sep 27 00:51:20 <redactedhostname>13 kernel:  __filemap_fdatawait_range+0x80/0xe0
Sep 27 00:51:20 <redactedhostname>13 kernel:  filemap_write_and_wait_range+0x85/0xb0
Sep 27 00:51:20 <redactedhostname>13 kernel:  xfs_setattr_size+0xd9/0x3c0 [xfs]
Sep 27 00:51:20 <redactedhostname>13 kernel:  xfs_vn_setattr+0x81/0x150 [xfs]
Sep 27 00:51:20 <redactedhostname>13 kernel:  notify_change+0x2ed/0x4f0
Sep 27 00:51:20 <redactedhostname>13 kernel:  ? do_truncate+0x98/0xf0
Sep 27 00:51:20 <redactedhostname>13 kernel:  do_truncate+0x98/0xf0
Sep 27 00:51:20 <redactedhostname>13 kernel:  do_ftruncate+0xfe/0x160
Sep 27 00:51:20 <redactedhostname>13 kernel:  __x64_sys_ftruncate+0x3e/0x70
Sep 27 00:51:20 <redactedhostname>13 kernel:  do_syscall_64+0xb7/0x200
Sep 27 00:51:20 <redactedhostname>13 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
Sep 27 00:51:20 <redactedhostname>13 kernel: RIP: 0033:0x7f1ed1912c2b
Sep 27 00:51:20 <redactedhostname>13 kernel: RSP: 002b:00007f1eb73fd3f8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
Sep 27 00:51:20 <redactedhostname>13 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1ed1912c2b
Sep 27 00:51:20 <redactedhostname>13 kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000012
Sep 27 00:51:20 <redactedhostname>13 kernel: RBP: 0000000000000012 R08: 0000000000000000 R09: 00007f1eb73fd3a0
Sep 27 00:51:20 <redactedhostname>13 kernel: R10: 0000000000132000 R11: 0000000000000246 R12: 00005601d0150290
Sep 27 00:51:20 <redactedhostname>13 kernel: R13: 00005601d58ae0b8 R14: 0000000000000001 R15: 00005601d58bec58
Sep 27 00:51:20 <redactedhostname>13 kernel:  </TASK>

Sep 28 10:13:04 release2405dev00 kernel: INFO: task nix-channel:507080 blocked for more than 122 seconds.
Sep 28 10:13:04 release2405dev00 kernel:       Not tainted 6.11.0 #1-NixOS
Sep 28 10:13:04 release2405dev00 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 10:13:04 release2405dev00 kernel: task:nix-channel     state:D stack:0     pid:507080 tgid:507080 ppid:507061 flags:0x00000002
Sep 28 10:13:04 release2405dev00 kernel: Call Trace:
Sep 28 10:13:04 release2405dev00 kernel:  <TASK>
Sep 28 10:13:04 release2405dev00 kernel:  __schedule+0x3a3/0x1300
Sep 28 10:13:04 release2405dev00 kernel:  ? xfs_vm_writepages+0x67/0x90 [xfs]
Sep 28 10:13:04 release2405dev00 kernel:  schedule+0x27/0xf0
Sep 28 10:13:04 release2405dev00 kernel:  io_schedule+0x46/0x70
Sep 28 10:13:04 release2405dev00 kernel:  folio_wait_bit_common+0x13f/0x340
Sep 28 10:13:04 release2405dev00 kernel:  ? __pfx_wake_page_function+0x10/0x10
Sep 28 10:13:04 release2405dev00 kernel:  folio_wait_writeback+0x2b/0x80
Sep 28 10:13:04 release2405dev00 kernel:  __filemap_fdatawait_range+0x80/0xe0
Sep 28 10:13:04 release2405dev00 kernel:  file_write_and_wait_range+0x88/0xb0
Sep 28 10:13:04 release2405dev00 kernel:  xfs_file_fsync+0x5e/0x2a0 [xfs]
Sep 28 10:13:04 release2405dev00 kernel:  __x64_sys_fdatasync+0x52/0x90
Sep 28 10:13:04 release2405dev00 kernel:  do_syscall_64+0xb7/0x200
Sep 28 10:13:04 release2405dev00 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
Sep 28 10:13:04 release2405dev00 kernel: RIP: 0033:0x7f5b9371270a
Sep 28 10:13:04 release2405dev00 kernel: RSP: 002b:00007ffd678149f0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Sep 28 10:13:04 release2405dev00 kernel: RAX: ffffffffffffffda RBX: 0000559a4d023a18 RCX: 00007f5b9371270a
Sep 28 10:13:04 release2405dev00 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
Sep 28 10:13:04 release2405dev00 kernel: RBP: 0000000000000000 R08: 0000000000000001 R09: 0000559a4d027878
Sep 28 10:13:04 release2405dev00 kernel: R10: 0000000000000016 R11: 0000000000000293 R12: 0000000000000001
Sep 28 10:13:04 release2405dev00 kernel: R13: 000000000000002e R14: 0000559a4d0278fc R15: 00007ffd67814bf0
Sep 28 10:13:04 release2405dev00 kernel:  </TASK>

Sep 28 03:39:19 <redactedhostname>10 kernel: INFO: task nix-build:94696 blocked for more than 122 seconds.
Sep 28 03:39:19 <redactedhostname>10 kernel:       Not tainted 6.11.0 #1-NixOS
Sep 28 03:39:19 <redactedhostname>10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 03:39:19 <redactedhostname>10 kernel: task:nix-build       state:D stack:0     pid:94696 tgid:94696 ppid:94695  flags:0x00000002
Sep 28 03:39:19 <redactedhostname>10 kernel: Call Trace:
Sep 28 03:39:19 <redactedhostname>10 kernel:  <TASK>
Sep 28 03:39:19 <redactedhostname>10 kernel:  __schedule+0x3a3/0x1300
Sep 28 03:39:19 <redactedhostname>10 kernel:  schedule+0x27/0xf0
Sep 28 03:39:19 <redactedhostname>10 kernel:  io_schedule+0x46/0x70
Sep 28 03:39:19 <redactedhostname>10 kernel:  folio_wait_bit_common+0x13f/0x340
Sep 28 03:39:19 <redactedhostname>10 kernel:  ? __pfx_wake_page_function+0x10/0x10
Sep 28 03:39:19 <redactedhostname>10 kernel:  folio_wait_writeback+0x2b/0x80
Sep 28 03:39:19 <redactedhostname>10 kernel:  truncate_inode_partial_folio+0x5e/0x1b0
Sep 28 03:39:19 <redactedhostname>10 kernel:  truncate_inode_pages_range+0x1de/0x400
Sep 28 03:39:19 <redactedhostname>10 kernel:  evict+0x29f/0x2c0
Sep 28 03:39:19 <redactedhostname>10 kernel:  ? iput+0x6e/0x230
Sep 28 03:39:19 <redactedhostname>10 kernel:  ? _atomic_dec_and_lock+0x39/0x50
Sep 28 03:39:19 <redactedhostname>10 kernel:  do_unlinkat+0x2de/0x330
Sep 28 03:39:19 <redactedhostname>10 kernel:  __x64_sys_unlink+0x3f/0x70
Sep 28 03:39:19 <redactedhostname>10 kernel:  do_syscall_64+0xb7/0x200
Sep 28 03:39:19 <redactedhostname>10 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
Sep 28 03:39:19 <redactedhostname>10 kernel: RIP: 0033:0x7f37c062d56b
Sep 28 03:39:19 <redactedhostname>10 kernel: RSP: 002b:00007fff71638018 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
Sep 28 03:39:19 <redactedhostname>10 kernel: RAX: ffffffffffffffda RBX: 0000562038c30500 RCX: 00007f37c062d56b
Sep 28 03:39:19 <redactedhostname>10 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000562038c31c80
Sep 28 03:39:19 <redactedhostname>10 kernel: RBP: 0000562038c30690 R08: 0000000000016020 R09: 0000000000000000
Sep 28 03:39:19 <redactedhostname>10 kernel: R10: 0000000000000050 R11: 0000000000000206 R12: 00007fff71638058
Sep 28 03:39:19 <redactedhostname>10 kernel: R13: 00007fff7163803c R14: 00007fff71638960 R15: 0000562040b8a500
Sep 28 03:39:19 <redactedhostname>10 kernel:  </TASK>

Hope this helps,
Christian

> On 19. Sep 2024, at 12:19, Christian Theune <ct@flyingcircus.io> wrote:
> 
> 
> 
>> On 19. Sep 2024, at 08:57, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> 
>> Yeah, right now Jens is still going to run some more testing, but I
>> think the plan is to just backport
>> 
>> a4864671ca0b ("lib/xarray: introduce a new helper xas_get_order")
>> 6758c1128ceb ("mm/filemap: optimize filemap folio adding")
>> 
>> and I think we're at the point where you might as well start testing
>> that if you have the cycles for it. Jens is mostly trying to confirm
>> the root cause, but even without that, I think you running your load
>> with those two changes back-ported is worth it.
>> 
>> (Or even just try running it on plain 6.10 or 6.11, both of which
>> already have those commits)
> 
> I’ve discussed this with my team and we’re preparing to switch all our 
> non-prod machines as well as those production machines that have shown
> the error before.
> 
> This will require a bit of user communication and reboot scheduling.
> Our release prep will be able to roll this out starting early next week
> and the production machines in question around Sept 30.
> 
> We would run with 6.11, as our understanding so far is that running the
> most current kernel would generate the most insight and be easier for
> you all to work with?
> 
> (Generally we run the mostly vanilla LTS that has surpassed x.y.50+ so
> we might later downgrade to 6.6 when this is fixed.)
> 
>> So considering how well the reproducer works for Jens and Chris, my
>> main worry is whether your load might have some _additional_ issue.
>> 
>> Unlikely, but still .. The two commits fix the reproducer, so I think
>> the important thing to make sure is that it really fixes the original
>> issue too.
>> 
>> And yeah, I'd be surprised if it doesn't, but at the same time I would
>> _not_ suggest you try to make your load look more like the case we
>> already know gets fixed.
>> 
>> So yes, it will be "weeks of not seeing crashes" until we'd be
>> _really_ confident it's all the same thing, but I'd rather still have
>> you test that, than test something else than what caused issues
>> originally, if you see what I mean.
> 
> Agreed, I’m all onboard with that.
> 
> Kind regards,
> Christian Theune
> 
> -- 
> Christian Theune · ct@flyingcircus.io · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
> 

Kind regards,
Christian Theune

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 17:34                                     ` Christian Theune
@ 2024-09-30 18:46                                       ` Linus Torvalds
  2024-09-30 19:25                                         ` Christian Theune
  2024-10-01  0:56                                       ` Chris Mason
  2024-10-01  2:22                                       ` Dave Chinner
  2 siblings, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-30 18:46 UTC (permalink / raw)
  To: Christian Theune
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On Mon, 30 Sept 2024 at 10:35, Christian Theune <ct@flyingcircus.io> wrote:
>
> Sep 27 00:51:20 <redactedhostname>13 kernel:  folio_wait_bit_common+0x13f/0x340
> Sep 27 00:51:20 <redactedhostname>13 kernel:  folio_wait_writeback+0x2b/0x80

Gaah. Every single case you point to is that folio_wait_writeback() case.

And this might be an old old annoyance.

folio_wait_writeback() is insane. It does

        while (folio_test_writeback(folio)) {
                trace_folio_wait_writeback(folio, folio_mapping(folio));
                folio_wait_bit(folio, PG_writeback);
        }

and the reason that is insane is that PG_writeback isn't some kind of
exclusive state. So folio_wait_bit() will return once somebody has
ended writeback, but *new* writeback can easily have been started
afterwards. So then we go back to wait...

And even after it eventually returns (possibly after having waited for
hundreds of other processes writing back that folio - imagine lots of
other threads doing writes to it and 'fdatasync()' or whatever) the
caller *still* can't actually assume that the writeback bit is clear,
because somebody else might have started writeback again.

Anyway, it's insane, but it's insane for a *reason*. We've tried to
fix this before, long before it was a folio op. See commit
c2407cf7d22d ("mm: make wait_on_page_writeback() wait for multiple
pending writebacks").

IOW, this code is known-broken and might have extreme unfairness
issues (although I had blissfully forgotten about it), because while
the actual writeback *bit* itself is set and cleared atomically, the
wakeup for the bit is asynchronous and can be delayed almost
arbitrarily, so you can get basically spurious wakeups that were from
a previous bit clear.

So the "wait many times" is crazy, but it's sadly a necessary crazy as
things are right now.

Now, many callers hold the page lock while doing this, and in that
case new writeback cases shouldn't happen, and so repeating the loop
should be extremely limited.

But "many" is not "all". For example, __filemap_fdatawait_range() very
much doesn't hold the lock on the pages it waits for, so afaik this
can cause that unfairness and starvation issue.

That said, while every one of your traces are for that
folio_wait_writeback(), the last one is for the truncate case, and
that one *does* hold the page lock and so shouldn't see this potential
unfairness issue.

So the code here is questionable, and might cause some issues, but the
starvation of folio_wait_writeback() can't explain _all_ the cases you
see.

                  Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 18:46                                       ` Linus Torvalds
@ 2024-09-30 19:25                                         ` Christian Theune
  2024-09-30 20:12                                           ` Linus Torvalds
  0 siblings, 1 reply; 81+ messages in thread
From: Christian Theune @ 2024-09-30 19:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions


> On 30. Sep 2024, at 20:46, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Mon, 30 Sept 2024 at 10:35, Christian Theune <ct@flyingcircus.io> wrote:
>> 
>> Sep 27 00:51:20 <redactedhostname>13 kernel:  folio_wait_bit_common+0x13f/0x340
>> Sep 27 00:51:20 <redactedhostname>13 kernel:  folio_wait_writeback+0x2b/0x80
> 
> Gaah. Every single case you point to is that folio_wait_writeback() case.
> 
> And this might be an old old annoyance.

I’m being told that I’m somewhat of a truffle pig for dirty code … how long ago does “old old” refer to, btw?

> […]
> IOW, this code is known-broken and might have extreme unfairness
> issues (although I had blissfully forgotten about it), because while
> the actual writeback *bit* itself is set and cleared atomically, the
> wakeup for the bit is asynchronous and can be delayed almost
> arbitrarily, so you can get basically spurious wakeups that were from
> a previous bit clear.

I wonder whether the extreme unfairness gets exacerbated in a cgroup-throttled context … It’s a limited number of workloads we
have seen this with, some of which are parallelized and others aren’t. (And I guess non-parallelized code shouldn’t suffer much from this?)

Maybe I can reproduce this more easily and  ...

> So the code here is questionable, and might cause some issues, but the
> starvation of folio_wait_writeback() can't explain _all_ the cases you
> see.

… also get you more data and maybe dig for more cases more systematically.
Anything particular you’d like me to look for? Any specific additional data
points that would help?

We’re going to keep with 6.11 in staging and avoid rolling it out to the production machines for now.

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 19:25                                         ` Christian Theune
@ 2024-09-30 20:12                                           ` Linus Torvalds
  2024-09-30 20:56                                             ` Matthew Wilcox
  0 siblings, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-09-30 20:12 UTC (permalink / raw)
  To: Christian Theune
  Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

On Mon, 30 Sept 2024 at 12:25, Christian Theune <ct@flyingcircus.io> wrote:
>
> I’m being told that I’m somewhat of a truffle pig for dirty code … how long ago does “old old” refer to, btw?

It's basically been that way forever. The code has changed many times,
but we've basically always had that "wait on bit will wait not until
the next wakeup, but until it actually sees the bit being clear".

And by "always" I mean "going back at least to before the git tree". I
didn't search further. It's not new.

The only reason I pointed at that (relatively recent) commit from 2021
is that when we rewrote the page bit waiting logic (for some unrelated
horrendous scalability issues with tens of thousands of pages on wait
queues), the rewritten code _tried_ to not do it, and instead go "we
were woken up by a bit clear op, so now we've waited enough".

And that then caused problems as explained in that commit c2407cf7d22d
("mm: make wait_on_page_writeback() wait for multiple pending
writebacks") because the wakeups aren't atomic wrt the actual bit
setting/clearing/testing.

IOW - that 2021 commit didn't _introduce_ the issue, it just went back
to the horrendous behavior that we've always had, and temporarily
tried to avoid.

Note that "horrendous behavior" is really "you probably can't hit it
under any normal load". So it's not like it's a problem in practice.

Except your load clearly triggers *something*. And maybe this is part of it.

                 Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 20:12                                           ` Linus Torvalds
@ 2024-09-30 20:56                                             ` Matthew Wilcox
  2024-09-30 22:42                                               ` Davidlohr Bueso
  2024-09-30 23:53                                               ` Linus Torvalds
  0 siblings, 2 replies; 81+ messages in thread
From: Matthew Wilcox @ 2024-09-30 20:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christian Theune, Dave Chinner, Chris Mason, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Mon, Sep 30, 2024 at 01:12:37PM -0700, Linus Torvalds wrote:
> It's basically been that way forever. The code has changed many times,
> but we've basically always had that "wait on bit will wait not until
> the next wakeup, but until it actually sees the bit being clear".
> 
> And by "always" I mean "going back at least to before the git tree". I
> didn't search further. It's not new.
> 
> The only reason I pointed at that (relatively recent) commit from 2021
> is that when we rewrote the page bit waiting logic (for some unrelated
> horrendous scalability issues with tens of thousands of pages on wait
> queues), the rewritten code _tried_ to not do it, and instead go "we
> were woken up by a bit clear op, so now we've waited enough".
> 
> And that then caused problems as explained in that commit c2407cf7d22d
> ("mm: make wait_on_page_writeback() wait for multiple pending
> writebacks") because the wakeups aren't atomic wrt the actual bit
> setting/clearing/testing.

Could we break out if folio->mapping has changed?  Clearly if it has,
we're no longer waiting for the folio we thought we were waiting for,
but for a folio which now belongs to a different file.

maybe this:

+void __folio_wait_writeback(struct address_space *mapping, struct folio *folio)
+{
+       while (folio_test_writeback(folio) && folio->mapping == mapping) {
+               trace_folio_wait_writeback(folio, mapping);
+               folio_wait_bit(folio, PG_writeback);
+       }
+}

[...]

 void folio_wait_writeback(struct folio *folio)
 {
-       while (folio_test_writeback(folio)) {
-               trace_folio_wait_writeback(folio, folio_mapping(folio));
-               folio_wait_bit(folio, PG_writeback);
-       }
+       __folio_wait_writeback(folio->mapping, folio);
 }
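
The unlocked waiters could then also pass in the mapping they actually care
about.  As an untested sketch only (assuming the helper above), the per-folio
loop in __filemap_fdatawait_range() might become something like:

        for (i = 0; i < nr_folios; i++) {
                struct folio *folio = fbatch.folios[i];

                /* stop waiting as soon as the folio has been recycled
                 * into a different file; we only care about writeback
                 * against the mapping we are flushing */
                __folio_wait_writeback(mapping, folio);
        }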



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 20:56                                             ` Matthew Wilcox
@ 2024-09-30 22:42                                               ` Davidlohr Bueso
  2024-09-30 23:00                                                 ` Davidlohr Bueso
  2024-09-30 23:53                                               ` Linus Torvalds
  1 sibling, 1 reply; 81+ messages in thread
From: Davidlohr Bueso @ 2024-09-30 22:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Christian Theune, Dave Chinner, Chris Mason,
	Jens Axboe, linux-mm, linux-xfs, linux-fsdevel, linux-kernel,
	Daniel Dao, regressions, regressions

On Mon, 30 Sep 2024, Matthew Wilcox wrote:
>On Mon, Sep 30, 2024 at 01:12:37PM -0700, Linus Torvalds wrote:
>> It's basically been that way forever. The code has changed many times,
>> but we've basically always had that "wait on bit will wait not until
>> the next wakeup, but until it actually sees the bit being clear".
>>
>> And by "always" I mean "going back at least to before the git tree". I
>> didn't search further. It's not new.
>>
>> The only reason I pointed at that (relatively recent) commit from 2021
>> is that when we rewrote the page bit waiting logic (for some unrelated
>> horrendous scalability issues with tens of thousands of pages on wait
>> queues), the rewritten code _tried_ to not do it, and instead go "we
>> were woken up by a bit clear op, so now we've waited enough".
>>
>> And that then caused problems as explained in that commit c2407cf7d22d
>> ("mm: make wait_on_page_writeback() wait for multiple pending
>> writebacks") because the wakeups aren't atomic wrt the actual bit
>> setting/clearing/testing.
>
>Could we break out if folio->mapping has changed?  Clearly if it has,
>we're no longer waiting for the folio we thought we were waiting for,
>but for a folio which now belongs to a different file.
>
>maybe this:
>
>+void __folio_wait_writeback(struct address_space *mapping, struct folio *folio)
>+{
>+       while (folio_test_writeback(folio) && folio->mapping == mapping) {

READ_ONCE(folio->mapping)?

>+               trace_folio_wait_writeback(folio, mapping);
>+               folio_wait_bit(folio, PG_writeback);
>+       }
>+}
>
>[...]
>
> void folio_wait_writeback(struct folio *folio)
> {
>-       while (folio_test_writeback(folio)) {
>-               trace_folio_wait_writeback(folio, folio_mapping(folio));
>-               folio_wait_bit(folio, PG_writeback);
>-       }
>+       __folio_wait_writeback(folio->mapping, folio);
> }
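
i.e. something like this, purely as a sketch of what I mean (untested,
no other changes to the diff above):

void __folio_wait_writeback(struct address_space *mapping, struct folio *folio)
{
        /* Re-read folio->mapping on every iteration; it can change under us. */
        while (folio_test_writeback(folio) &&
               READ_ONCE(folio->mapping) == mapping) {
                trace_folio_wait_writeback(folio, mapping);
                folio_wait_bit(folio, PG_writeback);
        }
}

The READ_ONCE() is mostly documentation that folio->mapping can change
concurrently (and it avoids load tearing); it doesn't add any ordering
on its own.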

Also, the last sentence in the description would need to be dropped.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 22:42                                               ` Davidlohr Bueso
@ 2024-09-30 23:00                                                 ` Davidlohr Bueso
  0 siblings, 0 replies; 81+ messages in thread
From: Davidlohr Bueso @ 2024-09-30 23:00 UTC (permalink / raw)
  To: Matthew Wilcox, Linus Torvalds, Christian Theune, Dave Chinner,
	Chris Mason, Jens Axboe, linux-mm, linux-xfs, linux-fsdevel,
	linux-kernel, Daniel Dao, regressions, regressions

On Mon, 30 Sep 2024, Davidlohr Bueso wrote:

>Also, the last sentence in the description would need to be dropped.

No, never mind this; it is fine. I was mostly thinking about the pathological
unbounded scenario, which is removed, but after re-reading the description
it is still valid.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 20:56                                             ` Matthew Wilcox
  2024-09-30 22:42                                               ` Davidlohr Bueso
@ 2024-09-30 23:53                                               ` Linus Torvalds
  1 sibling, 0 replies; 81+ messages in thread
From: Linus Torvalds @ 2024-09-30 23:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christian Theune, Dave Chinner, Chris Mason, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Mon, 30 Sept 2024 at 13:57, Matthew Wilcox <willy@infradead.org> wrote:
>
> Could we break out if folio->mapping has changed?  Clearly if it has,
> we're no longer waiting for the folio we thought we were waiting for,
> but for a folio which now belongs to a different file.

Sounds like a sane check to me, but it's also not clear that this
would make any difference.

The most likely reason for starvation I can see is a slow thread
(possibly due to cgroup throttling like Christian alluded to) would
simply be continually unlucky, because every time it gets woken up,
some other thread has already dirtied the data and caused writeback
again.

I would think that kind of behavior (perhaps some DB transaction
header kind of folio) would be more likely than the mapping changing
(and then remaining under writeback for some other mapping).

But I really don't know.

I would much prefer to limit the folio_wait_bit() loop based on something else.

For example, the basic reason for that loop (unless there is some
other hidden one) is that the folio writeback bit is not atomic wrt
the wakeup. Maybe we could *make* it atomic, by simply taking the
folio waitqueue lock before clearing the bit?

(Only if it has the "waiters" bit set, of course!)

Handwavy.

Anyway, this writeback handling is nasty. folio_end_writeback() has a
big comment about the subtle folio reference issue too, and ignoring
that we also have this:

        if (__folio_end_writeback(folio))
                folio_wake_bit(folio, PG_writeback);

(which is the cause of the non-atomicity: __folio_end_writeback() will
clear the bit, and return the "did we have waiters", and then
folio_wake_bit() will get the waitqueue lock and wake people up).

And notice how __folio_end_writeback() clears the bit with

                ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);

which does that "clear bit and look if it had waiters" atomically. But
that function then has a comment that says

 * This must only be used for flags which are changed with the folio
 * lock held.  For example, it is unsafe to use for PG_dirty as that
 * can be set without the folio lock held.  [...]

but the code that uses it here does *NOT* hold the folio lock.

I think the comment is wrong, and the code is fine (the important
point is that the folio lock _serializes_ the writers, and while
clearing doesn't hold the folio lock, you can't clear it without
setting it, and setting the writeback flag *does* hold the folio
lock).

So my point is not that this code is wrong, but that this code is all
kinds of subtle and complex. I think it would be good to change the
rules so that we serialize with waiters, but being complex and subtle
means it sounds all kinds of nasty.
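
Just to make the handwaving concrete, something like this - a sketch
only: it assumes folio_waitqueue()/wait_page_key were visible outside
mm/filemap.c, it takes the waitqueue lock unconditionally rather than
only when the waiters bit is set, and it ignores the folio reference
and stats subtleties in the real folio_end_writeback():

static void folio_end_writeback_locked(struct folio *folio)
{
        wait_queue_head_t *q = folio_waitqueue(folio);
        struct wait_page_key key = {
                .folio = folio,
                .bit_nr = PG_writeback,
                .page_match = 0,
        };
        unsigned long flags;

        /*
         * Clear PG_writeback and wake the waiters under the same
         * waitqueue lock, so the clear and the wakeup cannot be
         * reordered relative to a waiter adding itself to the queue.
         */
        spin_lock_irqsave(&q->lock, flags);
        if (__folio_end_writeback(folio))
                __wake_up_locked_key(q, TASK_NORMAL, &key);
        spin_unlock_irqrestore(&q->lock, flags);
}

The obvious cost is taking the waitqueue lock on every writeback
completion, which is why the real thing would only do this when the
waiters bit is actually set.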

            Linus


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 17:34                                     ` Christian Theune
  2024-09-30 18:46                                       ` Linus Torvalds
@ 2024-10-01  0:56                                       ` Chris Mason
  2024-10-01  7:54                                         ` Christian Theune
  2024-10-10  6:29                                         ` Christian Theune
  2024-10-01  2:22                                       ` Dave Chinner
  2 siblings, 2 replies; 81+ messages in thread
From: Chris Mason @ 2024-10-01  0:56 UTC (permalink / raw)
  To: Christian Theune, Linus Torvalds
  Cc: Dave Chinner, Matthew Wilcox, Jens Axboe, linux-mm, linux-xfs,
	linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

[-- Attachment #1: Type: text/plain, Size: 2681 bytes --]

On 9/30/24 7:34 PM, Christian Theune wrote:
> Hi,
> 
> we’ve been running a number of VMs since last week on 6.11. We’ve encountered one hung task situation multiple times now that seems to be resolving itself after a bit of time, though. I do not see spinning CPU during this time.
> 
> The situation seems to be related to cgroups-based IO throttling / weighting so far:
> 
> Here are three examples of similar tracebacks where jobs that do perform a certain amount of IO (either given a weight or given an explicit limit like this:
> 
> IOWeight=10
> IOReadIOPSMax=/dev/vda 188
> IOWriteIOPSMax=/dev/vda 188
> 	
> Telemetry for the affected VM does not show that it actually reaches 188 IOPS (the load is mostly writing) but creates a kind of gaussian curve … 
> 
> The underlying storage and network was completely inconspicuous during the whole time.

Not disagreeing with Linus at all, but given that you've got IO
throttling too, we might really just be waiting.  It's hard to tell
because the hung task timeouts only give you information about one process.

I've attached a minimal version of a script we use here to show all the
D state processes, it might help explain things.  The only problem is
you have to actually ssh to the box and run it when you're stuck.

The idea is to print the stack trace of every D state process, and then
also print out how often each unique stack trace shows up.  When we're
deadlocked on something, there are normally a bunch of the same stack
(say waiting on writeback) and then one jerk sitting around in a
different stack who is causing all the trouble.

(I made some quick changes to make this smaller, so apologies if you get
silly errors)

Example output:

 sudo ./walker.py
15 rcu_tasks_trace_kthread D
[<0>] __wait_rcu_gp+0xab/0x120
[<0>] synchronize_rcu+0x46/0xd0
[<0>] rcu_tasks_wait_gp+0x86/0x2a0
[<0>] rcu_tasks_one_gp+0x300/0x430
[<0>] rcu_tasks_kthread+0x9a/0xb0
[<0>] kthread+0xad/0xe0
[<0>] ret_from_fork+0x1f/0x30

1440504 dd D
[<0>] folio_wait_bit_common+0x149/0x2d0
[<0>] filemap_read+0x7bd/0xd10
[<0>] blkdev_read_iter+0x5b/0x130
[<0>] __x64_sys_read+0x1ce/0x3f0
[<0>] do_syscall_64+0x3d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

-----
stack summary

1 hit:
[<0>] __wait_rcu_gp+0xab/0x120
[<0>] synchronize_rcu+0x46/0xd0
[<0>] rcu_tasks_wait_gp+0x86/0x2a0
[<0>] rcu_tasks_one_gp+0x300/0x430
[<0>] rcu_tasks_kthread+0x9a/0xb0
[<0>] kthread+0xad/0xe0
[<0>] ret_from_fork+0x1f/0x30

-----
[<0>] folio_wait_bit_common+0x149/0x2d0
[<0>] filemap_read+0x7bd/0xd10
[<0>] blkdev_read_iter+0x5b/0x130
[<0>] __x64_sys_read+0x1ce/0x3f0
[<0>] do_syscall_64+0x3d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

[-- Attachment #2: walker.py.txt --]
[-- Type: text/plain, Size: 3020 bytes --]

#!/usr/bin/env python3
#
# this walks all the tasks on the system and prints out a stack trace
# of any tasks waiting in D state.  If you pass -a, it will print out
# the stack of every task it finds.
#
# It also makes a histogram of the common stacks so you can see where
# more of the tasks are.  Usually when we're deadlocked, we care about
# the least common stacks.
#
import sys
import os
import argparse

parser = argparse.ArgumentParser(description='Show kernel stacks')
parser.add_argument('-a', '--all_tasks', action='store_true', help='Dump all stacks')
parser.add_argument('-p', '--pid', type=str, help='Filter on pid')
parser.add_argument('-c', '--command', type=str, help='Filter on command name')
options = parser.parse_args()

stacks = {}

# parse the units from a number and normalize into KB
def parse_number(s):
    try:
        words = s.split()
        unit = words[-1].lower()
        number = int(words[1])
        tag = words[0].lower().rstrip(':')

        # we store in kb
        if unit == "mb":
            number = number * 1024
        elif unit == "gb":
            number = number * 1024 * 1024
        elif unit == "tb":
            number = number * 1024 * 1024 * 1024

        return (tag, number)
    except:
        return (None, None)

# read /proc/pid/stack and add it to the hashes
def add_stack(path, pid, cmd, status):
    global stacks

    try:
        stack = open(os.path.join(path, "stack"), 'r').read()
    except:
        return

    if (status != "D" and not options.all_tasks):
        return

    print("%s %s %s" % (pid, cmd, status))
    print(stack)
    v = stacks.get(stack)
    if v:
        v += 1
    else:
        v = 1
    stacks[stack] = v


# worker to read all the files for one individual task
def run_one_task(path):

    try:
        stat = open(os.path.join(path, "stat"), 'r').read()
    except:
        return
    words = stat.split()
    pid, cmd, status = words[0:3]

    cmd = cmd.lstrip('(')
    cmd = cmd.rstrip(')')

    if options.command and options.command != cmd:
        return

    add_stack(path, pid, cmd, status)

def print_usage():
    sys.stderr.write("Usage: %s [-a]\n" % sys.argv[0])
    sys.exit(1)

# for a given pid in string form, read the files from proc
def run_pid(name):
    try:
        pid = int(name)
    except:
        return

    p = os.path.join("/proc", name, "task")
    if not os.path.exists(p):
        return

    try:
        for t in os.listdir(p):
            run_one_task(os.path.join(p, t))
    except:
        pass

if options.pid:
    run_pid(options.pid)
else:
    for name in os.listdir("/proc"):
        run_pid(name)

values = {}
for stack, count in stacks.items():
    l = values.setdefault(count, [])
    l.append(stack)

counts = list(values.keys())
counts.sort(reverse=True)
if counts:
    print("-----\nstack summary\n")

for x in counts:
    if x == 1:
        print("1 hit:")
    else:
        print("%d hits: " % x)

    l = values[x]
    for stack in l:
        print(stack)
        print("-----")

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-30 17:34                                     ` Christian Theune
  2024-09-30 18:46                                       ` Linus Torvalds
  2024-10-01  0:56                                       ` Chris Mason
@ 2024-10-01  2:22                                       ` Dave Chinner
  2 siblings, 0 replies; 81+ messages in thread
From: Dave Chinner @ 2024-10-01  2:22 UTC (permalink / raw)
  To: Christian Theune
  Cc: Linus Torvalds, Matthew Wilcox, Chris Mason, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

On Mon, Sep 30, 2024 at 07:34:39PM +0200, Christian Theune wrote:
> Hi,
> 
> we’ve been running a number of VMs since last week on 6.11. We’ve
> encountered one hung task situation multiple times now that seems
> to be resolving itself after a bit of time, though. I do not see
> spinning CPU during this time.
> 
> The situation seems to be related to cgroups-based IO throttling /
> weighting so far:

.....

> Sep 28 03:39:19 <redactedhostname>10 kernel: INFO: task nix-build:94696 blocked for more than 122 seconds.
> Sep 28 03:39:19 <redactedhostname>10 kernel:       Not tainted 6.11.0 #1-NixOS
> Sep 28 03:39:19 <redactedhostname>10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 28 03:39:19 <redactedhostname>10 kernel: task:nix-build       state:D stack:0     pid:94696 tgid:94696 ppid:94695  flags:0x00000002
> Sep 28 03:39:19 <redactedhostname>10 kernel: Call Trace:
> Sep 28 03:39:19 <redactedhostname>10 kernel:  <TASK>
> Sep 28 03:39:19 <redactedhostname>10 kernel:  __schedule+0x3a3/0x1300
> Sep 28 03:39:19 <redactedhostname>10 kernel:  schedule+0x27/0xf0
> Sep 28 03:39:19 <redactedhostname>10 kernel:  io_schedule+0x46/0x70
> Sep 28 03:39:19 <redactedhostname>10 kernel:  folio_wait_bit_common+0x13f/0x340
> Sep 28 03:39:19 <redactedhostname>10 kernel:  folio_wait_writeback+0x2b/0x80
> Sep 28 03:39:19 <redactedhostname>10 kernel:  truncate_inode_partial_folio+0x5e/0x1b0
> Sep 28 03:39:19 <redactedhostname>10 kernel:  truncate_inode_pages_range+0x1de/0x400
> Sep 28 03:39:19 <redactedhostname>10 kernel:  evict+0x29f/0x2c0
> Sep 28 03:39:19 <redactedhostname>10 kernel:  do_unlinkat+0x2de/0x330

That's not what I'd call expected behaviour.

By the time we are that far through eviction of a newly unlinked
inode, we've already removed the inode from the writeback lists and
we've supposedly waited for all writeback to complete.

IOWs, there shouldn't be a cached folio in writeback state at this
point in time - we're supposed to have guaranteed all writeback has
already completed before we call truncate_inode_pages_final()....

So how are we getting a partial folio that is still under writeback
at this point in time?

-Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-01  0:56                                       ` Chris Mason
@ 2024-10-01  7:54                                         ` Christian Theune
  2024-10-10  6:29                                         ` Christian Theune
  1 sibling, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-10-01  7:54 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions


> On 1. Oct 2024, at 02:56, Chris Mason <clm@meta.com> wrote:
> 
> I've attached a minimal version of a script we use here to show all the
> D state processes, it might help explain things.  The only problem is
> you have to actually ssh to the box and run it when you're stuck.

Thanks, I’ll dig into this next week when I’m back from vacation.

I can set up alerts when this happens and hope that I’ll be fast enough, as the situation does seem to resolve itself at some point. It’s happened quite a bit in the fleet, so I guess I should be able to catch it.

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-09-27 14:58                                           ` Jens Axboe
@ 2024-10-01 21:10                                             ` Kairui Song
  0 siblings, 0 replies; 81+ messages in thread
From: Kairui Song @ 2024-10-01 21:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sam James, Greg KH, stable, clm, Matthew Wilcox, ct, david,
	dqminh, linux-fsdevel, linux-kernel, linux-mm, linux-xfs,
	regressions, regressions, torvalds

On Fri, Sep 27, 2024 at 10:58 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/27/24 8:51 AM, Sam James wrote:
> > Kairui Song <ryncsn@gmail.com> writes:
> >
> >>> On Wed, Sep 25, 2024 at 1:16 AM Sam James <sam@gentoo.org> wrote:
> >>>
> >>> Kairui, could you send them to the stable ML to be queued if Willy is
> >>> fine with it?
> >>>
> >>
> >> Hi Sam,
> >
> > Hi Kairui,
> >
> >>
> >> Thanks for adding me to the discussion.
> >>
> >> Yes I'd like to, just not sure if people are still testing and
> >> checking the commits.
> >>
> >> And I haven't sent a separate fix just for stable before, so can
> >> anyone teach me, should I send only two patches for a minimal change,
> >> or send a whole series (with some minor clean up patch as dependency)
> >> for minimal conflicts? Or the stable team can just pick these up?
> >
> > Please see https://www.kernel.org/doc/html/v6.11/process/stable-kernel-rules.html.
> >
> > If Option 2 can't work (because of conflicts), please follow Option 3
> > (https://www.kernel.org/doc/html/v6.11/process/stable-kernel-rules.html#option-3).
> >
> > Just explain the background and link to this thread in a cover letter
> > and mention it's your first time. Greg didn't bite me when I fumbled my
> > way around it :)
> >
> > (greg, please correct me if I'm talking rubbish)
>
> It needs two cherry picks, one of them won't pick cleanly. So I suggest
> whoever submits this to stable does:
>
> 1) Cherry pick the two commits, fixup the simple issue with one of them.
>    I forget what it was since it's been a week and a half since I did
>    it, but it's trivial to fixup.
>
>    Don't forget to add the "commit XXX upstream" to the commit message.
>
> 2) Test that it compiles and boots and send an email to
>    stable@vger.kernel.org with the patches attached and CC the folks in
>    this thread, to help spot if there are mistakes.
>
> and that should be it. Worst case, we'll need a few different patches
> since this affects anything back to 5.19, and each currently maintained
> stable kernel version will need it.
>

Hi Sam, Jens,

Thanks very much. The currently maintained upstream kernels are
6.10, 6.6, 6.1, 5.15, 5.10, 5.4 and 4.19.

I think only 6.6 and 6.1 need the backport. I've sent a fix for these two;
it's three cherry-picks from the 6.10 series, so the conflicts are
minimal. The stable series can be applied without conflict to both.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-01  0:56                                       ` Chris Mason
  2024-10-01  7:54                                         ` Christian Theune
@ 2024-10-10  6:29                                         ` Christian Theune
  2024-10-11  7:27                                           ` Christian Theune
  1 sibling, 1 reply; 81+ messages in thread
From: Christian Theune @ 2024-10-10  6:29 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions


> On 1. Oct 2024, at 02:56, Chris Mason <clm@meta.com> wrote:
> 
> Not disagreeing with Linus at all, but given that you've got IO
> throttling too, we might really just be waiting.  It's hard to tell
> because the hung task timeouts only give you information about one process.
> 
> I've attached a minimal version of a script we use here to show all the
> D state processes, it might help explain things.  The only problem is
> you have to actually ssh to the box and run it when you're stuck.
> 
> The idea is to print the stack trace of every D state process, and then
> also print out how often each unique stack trace shows up.  When we're
> deadlocked on something, there are normally a bunch of the same stack
> (say waiting on writeback) and then one jerk sitting around in a
> different stack who is causing all the trouble.

I think I should be able to trigger this. I’ve seen around 100 of those issues over the last week and the chance of it happening correlates with a certain workload that should be easy to trigger. Also, the condition remains for around 5 minutes, so I should be able to trace it when I see the alert in an interactive session.

I’ve verified I can run your script and I’ll get back to you in the next days.

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-10  6:29                                         ` Christian Theune
@ 2024-10-11  7:27                                           ` Christian Theune
  2024-10-11  9:08                                             ` Christian Theune
  0 siblings, 1 reply; 81+ messages in thread
From: Christian Theune @ 2024-10-11  7:27 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

Hi,

> On 10. Oct 2024, at 08:29, Christian Theune <ct@flyingcircus.io> wrote:
> 
> 
>> On 1. Oct 2024, at 02:56, Chris Mason <clm@meta.com> wrote:
>> 
>> Not disagreeing with Linus at all, but given that you've got IO
>> throttling too, we might really just be waiting.  It's hard to tell
>> because the hung task timeouts only give you information about one process.
>> 
>> I've attached a minimal version of a script we use here to show all the
>> D state processes, it might help explain things.  The only problem is
>> you have to actually ssh to the box and run it when you're stuck.
>> 
>> The idea is to print the stack trace of every D state process, and then
>> also print out how often each unique stack trace shows up.  When we're
>> deadlocked on something, there are normally a bunch of the same stack
>> (say waiting on writeback) and then one jerk sitting around in a
>> different stack who is causing all the trouble.
> 
> I think I should be able to trigger this. I’ve seen around 100 of those issues over the last week and the chance of it happening correlates with a certain workload that should be easy to trigger. Also, the condition remains for around 5 minutes, so I should be able to trace it when I see the alert in an interactive session.
> 
> I’ve verified I can run your script and I’ll get back to you in the next days.

I wasn’t able to create a reproducer after all so I’ve set up alerting.

I just caught one right away, but it unblocked quickly after I logged in:

The original message that triggered the alert was:

[Oct11 09:18] INFO: task nix-build:157920 blocked for more than 122 seconds.
[  +0.000937]       Not tainted 6.11.0 #1-NixOS
[  +0.000540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000902] task:nix-build       state:D stack:0     pid:157920 tgid:157920 ppid:157919 flags:0x00000002
[  +0.001098] Call Trace:
[  +0.000306]  <TASK>
[  +0.000279]  __schedule+0x3a3/0x1300
[  +0.000478]  schedule+0x27/0xf0
[  +0.000392]  io_schedule+0x46/0x70
[  +0.000436]  folio_wait_bit_common+0x13f/0x340
[  +0.000572]  ? __pfx_wake_page_function+0x10/0x10
[  +0.000592]  folio_wait_writeback+0x2b/0x80
[  +0.000466]  truncate_inode_partial_folio+0x5e/0x1b0
[  +0.000586]  truncate_inode_pages_range+0x1de/0x400
[  +0.000595]  evict+0x29f/0x2c0
[  +0.000396]  ? iput+0x6e/0x230
[  +0.000408]  ? _atomic_dec_and_lock+0x39/0x50
[  +0.000542]  do_unlinkat+0x2de/0x330
[  +0.000402]  __x64_sys_unlink+0x3f/0x70
[  +0.000419]  do_syscall_64+0xb7/0x200
[  +0.000407]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  +0.000556] RIP: 0033:0x7f2bb5d1056b
[  +0.000473] RSP: 002b:00007ffc013c8588 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[  +0.000942] RAX: ffffffffffffffda RBX: 000055963c267500 RCX: 00007f2bb5d1056b
[  +0.000859] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055963c268c80
[  +0.000800] RBP: 000055963c267690 R08: 0000000000016020 R09: 0000000000000000
[  +0.000977] R10: 00000000000000f0 R11: 0000000000000206 R12: 00007ffc013c85c8
[  +0.000826] R13: 00007ffc013c85ac R14: 00007ffc013c8ed0 R15: 00005596441e42b0
[  +0.000833]  </TASK>

Then after logging in I caught it once with walker.py - this was about a minute after the alert triggered, I think. I’ll add timestamps to walker.py for the next instances:

157920 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] truncate_inode_partial_folio+0x5e/0x1b0
[<0>] truncate_inode_pages_range+0x1de/0x400
[<0>] evict+0x29f/0x2c0
[<0>] do_unlinkat+0x2de/0x330
[<0>] __x64_sys_unlink+0x3f/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] truncate_inode_partial_folio+0x5e/0x1b0
[<0>] truncate_inode_pages_range+0x1de/0x400
[<0>] evict+0x29f/0x2c0
[<0>] do_unlinkat+0x2de/0x330
[<0>] __x64_sys_unlink+0x3f/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

I tried once again after 1-2 seconds and got this:

157920 nix-build D
[<0>] xlog_wait_on_iclog+0x167/0x180 [xfs]
[<0>] xfs_log_force_seq+0x8d/0x150 [xfs]
[<0>] xfs_file_fsync+0x195/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] xlog_wait_on_iclog+0x167/0x180 [xfs]
[<0>] xfs_log_force_seq+0x8d/0x150 [xfs]
[<0>] xfs_file_fsync+0x195/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

and after that the process was done and exited. The last traceback already looks unblocked.

I’m going to gather a few more instances during the day and will post them as a batch later.

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-11  7:27                                           ` Christian Theune
@ 2024-10-11  9:08                                             ` Christian Theune
  2024-10-11 13:06                                               ` Chris Mason
  0 siblings, 1 reply; 81+ messages in thread
From: Christian Theune @ 2024-10-11  9:08 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]


> On 11. Oct 2024, at 09:27, Christian Theune <ct@flyingcircus.io> wrote:
> 
> I’m going to gather a few more instances during the day and will post them as a batch later.

I’ve received 8 alerts in the last few hours and managed to get detailed, repeated walker output from two of them:

- FC-41287.log
- FC-41289.log

The other logs are tracebacks as the kernel reported them, but the situation resolved itself faster than I could log in and run the walker script. In FC-41289.log I’m also providing output from `ps auxf` to show what the process tree looks like; maybe that helps, too.

My observations: 

- different entry points from the XFS code: unlink, f(data)sync, truncate
- in none of the cases I caught could I see any real competing traffic (aside from maybe occasional journal writes and very little background noise); all affected machines are staging environments that saw basically no usage during that timeframe

I’m stopping my alerting now as it’s been interrupting me every few minutes and I’m running out of steam sitting around waiting for the alert. ;)

Christian


[-- Attachment #2: FC-41281.log --]
[-- Type: application/octet-stream, Size: 2345 bytes --]

[195020.405783] INFO: task nix-build:157920 blocked for more than 122 seconds.
[195020.406720]       Not tainted 6.11.0 #1-NixOS
[195020.407260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[195020.408162] task:nix-build       state:D stack:0     pid:157920 tgid:157920 ppid:157919 flags:0x00000002
[195020.409260] Call Trace:
[195020.409566]  <TASK>
[195020.409845]  __schedule+0x3a3/0x1300
[195020.410323]  schedule+0x27/0xf0
[195020.410715]  io_schedule+0x46/0x70
[195020.411151]  folio_wait_bit_common+0x13f/0x340
[195020.411723]  ? __pfx_wake_page_function+0x10/0x10
[195020.412315]  folio_wait_writeback+0x2b/0x80
[195020.412781]  truncate_inode_partial_folio+0x5e/0x1b0
[195020.413367]  truncate_inode_pages_range+0x1de/0x400
[195020.413962]  evict+0x29f/0x2c0
[195020.414358]  ? iput+0x6e/0x230
[195020.414766]  ? _atomic_dec_and_lock+0x39/0x50
[195020.415308]  do_unlinkat+0x2de/0x330
[195020.415710]  __x64_sys_unlink+0x3f/0x70
[195020.416129]  do_syscall_64+0xb7/0x200
[195020.416536]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[195020.417092] RIP: 0033:0x7f2bb5d1056b
[195020.417565] RSP: 002b:00007ffc013c8588 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[195020.418507] RAX: ffffffffffffffda RBX: 000055963c267500 RCX: 00007f2bb5d1056b
[195020.419366] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055963c268c80
[195020.420166] RBP: 000055963c267690 R08: 0000000000016020 R09: 0000000000000000
[195020.421143] R10: 00000000000000f0 R11: 0000000000000206 R12: 00007ffc013c85c8
[195020.421969] R13: 00007ffc013c85ac R14: 00007ffc013c8ed0 R15: 00005596441e42b0
[195020.422802]  </TASK>

157920 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] truncate_inode_partial_folio+0x5e/0x1b0
[<0>] truncate_inode_pages_range+0x1de/0x400
[<0>] evict+0x29f/0x2c0
[<0>] do_unlinkat+0x2de/0x330
[<0>] __x64_sys_unlink+0x3f/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] truncate_inode_partial_folio+0x5e/0x1b0
[<0>] truncate_inode_pages_range+0x1de/0x400
[<0>] evict+0x29f/0x2c0
[<0>] do_unlinkat+0x2de/0x330
[<0>] __x64_sys_unlink+0x3f/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

[-- Attachment #3: FC-41282.log --]
[-- Type: application/octet-stream, Size: 1781 bytes --]

[208400.702546] INFO: task nix-build:330993 blocked for more than 122 seconds.
[208400.703012]       Not tainted 6.11.0 #1-NixOS
[208400.703260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[208400.703760] task:nix-build       state:D stack:0     pid:330993 tgid:330993 ppid:330992 flags:0x00004002
[208400.704588] Call Trace:
[208400.704744]  <TASK>
[208400.704874]  __schedule+0x3a3/0x1300
[208400.705085]  ? wb_update_bandwidth+0x52/0x70
[208400.705329]  schedule+0x27/0xf0
[208400.705523]  io_schedule+0x46/0x70
[208400.705734]  folio_wait_bit_common+0x13f/0x340
[208400.706021]  ? __pfx_wake_page_function+0x10/0x10
[208400.706296]  folio_wait_writeback+0x2b/0x80
[208400.706644]  __filemap_fdatawait_range+0x80/0xe0
[208400.707037]  filemap_write_and_wait_range+0x85/0xb0
[208400.707436]  xfs_setattr_size+0xd9/0x3c0 [xfs]
[208400.707955]  xfs_vn_setattr+0x81/0x150 [xfs]
[208400.708365]  notify_change+0x2ed/0x4f0
[208400.708638]  ? do_truncate+0x98/0xf0
[208400.708855]  do_truncate+0x98/0xf0
[208400.709050]  do_ftruncate+0xfe/0x160
[208400.709329]  __x64_sys_ftruncate+0x3e/0x70
[208400.709656]  do_syscall_64+0xb7/0x200
[208400.710041]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[208400.710367] RIP: 0033:0x7fab32912c2b
[208400.710614] RSP: 002b:00007ffee94d7e18 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[208400.711093] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fab32912c2b
[208400.711503] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000011
[208400.711947] RBP: 0000000000000011 R08: 0000000000000000 R09: 00007ffee94d7dc0
[208400.712355] R10: 0000000000068000 R11: 0000000000000246 R12: 000055f3aca90b20
[208400.712776] R13: 000055f3acb2d3d8 R14: 0000000000000001 R15: 000055f3acb3a3a8
[208400.713222]  </TASK>

[-- Attachment #4: FC-41283.log --]
[-- Type: application/octet-stream, Size: 1724 bytes --]

[820710.966217] INFO: task nix-build:884370 blocked for more than 122 seconds.
[820710.966643]       Not tainted 6.11.0 #1-NixOS
[820710.966890] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[820710.967307] task:nix-build       state:D stack:0     pid:884370 tgid:884370 ppid:884369 flags:0x00000002
[820710.967913] Call Trace:
[820710.968056]  <TASK>
[820710.968189]  __schedule+0x3a3/0x1300
[820710.968391]  schedule+0x27/0xf0
[820710.968563]  io_schedule+0x46/0x70
[820710.968758]  folio_wait_bit_common+0x13f/0x340
[820710.968998]  ? __pfx_wake_page_function+0x10/0x10
[820710.969258]  folio_wait_writeback+0x2b/0x80
[820710.969485]  truncate_inode_partial_folio+0x5e/0x1b0
[820710.969753]  truncate_inode_pages_range+0x1de/0x400
[820710.970041]  evict+0x29f/0x2c0
[820710.970230]  ? iput+0x6e/0x230
[820710.970399]  ? _atomic_dec_and_lock+0x39/0x50
[820710.970633]  do_unlinkat+0x2de/0x330
[820710.970837]  __x64_sys_unlink+0x3f/0x70
[820710.971042]  do_syscall_64+0xb7/0x200
[820710.971257]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[820710.971528] RIP: 0033:0x7f09e0e2d56b
[820710.971740] RSP: 002b:00007ffed1ddeb58 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
[820710.972131] RAX: ffffffffffffffda RBX: 00005587092aa500 RCX: 00007f09e0e2d56b
[820710.972503] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005587092abc80
[820710.972875] RBP: 00005587092aa690 R08: 0000000000016020 R09: 0000000000000000
[820710.973249] R10: 0000000000000080 R11: 0000000000000202 R12: 00007ffed1ddeb98
[820710.973623] R13: 00007ffed1ddeb7c R14: 00007ffed1ddf4a0 R15: 0000558711268cd0
[820710.973998]  </TASK>
[820710.974122] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

[-- Attachment #5: FC-41285.log --]
[-- Type: application/octet-stream, Size: 1567 bytes --]

[217499.576744] INFO: task nix-build:176931 blocked for more than 122 seconds.
[217499.577213]       Not tainted 6.11.0 #1-NixOS
[217499.577455] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[217499.577910] task:nix-build       state:D stack:0     pid:176931 tgid:176931 ppid:176930 flags:0x00004002
[217499.578417] Call Trace:
[217499.578560]  <TASK>
[217499.578697]  __schedule+0x3a3/0x1300
[217499.578920]  ? xfs_vm_writepages+0x67/0x90 [xfs]
[217499.579333]  schedule+0x27/0xf0
[217499.579515]  io_schedule+0x46/0x70
[217499.579721]  folio_wait_bit_common+0x13f/0x340
[217499.579981]  ? __pfx_wake_page_function+0x10/0x10
[217499.580241]  folio_wait_writeback+0x2b/0x80
[217499.580475]  __filemap_fdatawait_range+0x80/0xe0
[217499.580740]  file_write_and_wait_range+0x88/0xb0
[217499.581004]  xfs_file_fsync+0x5e/0x2a0 [xfs]
[217499.581586]  __x64_sys_fdatasync+0x52/0x90
[217499.581856]  do_syscall_64+0xb7/0x200
[217499.582069]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[217499.582349] RIP: 0033:0x7f56be82f70a
[217499.582563] RSP: 002b:00007fff458db490 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[217499.582988] RAX: ffffffffffffffda RBX: 000055af3319bbf8 RCX: 00007f56be82f70a
[217499.583372] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
[217499.583760] RBP: 0000000000000000 R08: 0000000000000001 R09: 000055af3b0fefe8
[217499.584169] R10: 000000000000007e R11: 0000000000000293 R12: 0000000000000001
[217499.584552] R13: 0000000000000197 R14: 000055af3b0ff33e R15: 00007fff458db690
[217499.584951]  </TASK>

[-- Attachment #6: FC-41286.log --]
[-- Type: application/octet-stream, Size: 2237 bytes --]

[217499.576744] INFO: task nix-build:176931 blocked for more than 122 seconds.
[217499.577213]       Not tainted 6.11.0 #1-NixOS
[217499.577455] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[217499.577910] task:nix-build       state:D stack:0     pid:176931 tgid:176931 ppid:176930 flags:0x00004002
[217499.578417] Call Trace:
[217499.578560]  <TASK>
[217499.578697]  __schedule+0x3a3/0x1300
[217499.578920]  ? xfs_vm_writepages+0x67/0x90 [xfs]
[217499.579333]  schedule+0x27/0xf0
[217499.579515]  io_schedule+0x46/0x70
[217499.579721]  folio_wait_bit_common+0x13f/0x340
[217499.579981]  ? __pfx_wake_page_function+0x10/0x10
[217499.580241]  folio_wait_writeback+0x2b/0x80
[217499.580475]  __filemap_fdatawait_range+0x80/0xe0
[217499.580740]  file_write_and_wait_range+0x88/0xb0
[217499.581004]  xfs_file_fsync+0x5e/0x2a0 [xfs]
[217499.581586]  __x64_sys_fdatasync+0x52/0x90
[217499.581856]  do_syscall_64+0xb7/0x200
[217499.582069]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[217499.582349] RIP: 0033:0x7f56be82f70a
[217499.582563] RSP: 002b:00007fff458db490 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[217499.582988] RAX: ffffffffffffffda RBX: 000055af3319bbf8 RCX: 00007f56be82f70a
[217499.583372] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
[217499.583760] RBP: 0000000000000000 R08: 0000000000000001 R09: 000055af3b0fefe8
[217499.584169] R10: 000000000000007e R11: 0000000000000293 R12: 0000000000000001
[217499.584552] R13: 0000000000000197 R14: 000055af3b0ff33e R15: 00007fff458db690
[217499.584951]  </TASK>
[217565.040136] systemd[1]: fc-agent.service: Deactivated successfully.
[217565.041118] systemd[1]: Finished Flying Circus Management Task.
[217565.041814] systemd[1]: fc-agent.service: Consumed 18.400s CPU time, received 28.9M IP traffic, sent 158.2K IP traffic.
[217637.400585] systemd[1]: Created slice Slice /user/1003.
[217637.407307] systemd[1]: Starting User Runtime Directory /run/user/1003...
[217637.426906] systemd[1]: Finished User Runtime Directory /run/user/1003.
[217637.439512] systemd[1]: Starting User Manager for UID 1003...
[217637.644565] systemd[1]: Started User Manager for UID 1003.
[217637.652243] systemd[1]: Started Session 3 of User ctheune.

[-- Attachment #7: FC-41287.log --]
[-- Type: application/octet-stream, Size: 5752 bytes --]

[215042.580872] INFO: task nix-build:240798 blocked for more than 122 seconds.
[215042.581318]       Not tainted 6.11.0 #1-NixOS
[215042.581624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[215042.582070] task:nix-build       state:D stack:0     pid:240798 tgid:240798 ppid:240797 flags:0x00000002
[215042.582573] Call Trace:
[215042.582713]  <TASK>
[215042.582860]  __schedule+0x3a3/0x1300
[215042.583069]  ? xfs_vm_writepages+0x67/0x90 [xfs]
[215042.583469]  schedule+0x27/0xf0
[215042.583651]  io_schedule+0x46/0x70
[215042.583859]  folio_wait_bit_common+0x13f/0x340
[215042.584108]  ? __pfx_wake_page_function+0x10/0x10
[215042.584364]  folio_wait_writeback+0x2b/0x80
[215042.584594]  __filemap_fdatawait_range+0x80/0xe0
[215042.584856]  file_write_and_wait_range+0x88/0xb0
[215042.585109]  xfs_file_fsync+0x5e/0x2a0 [xfs]
[215042.585471]  __x64_sys_fdatasync+0x52/0x90
[215042.585698]  do_syscall_64+0xb7/0x200
[215042.585916]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[215042.586191] RIP: 0033:0x7ff0c831270a
[215042.586406] RSP: 002b:00007ffe1482b960 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[215042.586818] RAX: ffffffffffffffda RBX: 0000564877f8abf8 RCX: 00007ff0c831270a
[215042.587197] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
[215042.587574] RBP: 0000000000000000 R08: 0000000000000001 R09: 000056487ff73f78
[215042.587960] R10: 0000000000000082 R11: 0000000000000293 R12: 0000000000000001
[215042.588337] R13: 00000000000001a0 R14: 000056487ff742e0 R15: 00007ffe1482bb60
[215042.588716]  </TASK>
[215120.626730] systemd[1]: Created slice Slice /user/1003.
[215120.633868] systemd[1]: Starting User Runtime Directory /run/user/1003...
[215120.664698] systemd[1]: Finished User Runtime Directory /run/user/1003.
[215120.673752] systemd[1]: Starting User Manager for UID 1003...
[215121.175903] systemd[1]: Started User Manager for UID 1003.
[215121.182026] systemd[1]: Started Session 1 of User ctheune.

[215135.177690429]

240798 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

[215140.478882357]

240798 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----


[215145.029642882]

240798 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f


-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----


[215150.173831058]

240798 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

[215155.155491198]

240798 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] file_write_and_wait_range+0x88/0xb0
[<0>] xfs_file_fsync+0x5e/0x2a0 [xfs]
[<0>] __x64_sys_fdatasync+0x52/0x90
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----


[215163.267601] systemd[1]: fc-agent.service: Deactivated successfully.
[215163.268172] systemd[1]: Finished Flying Circus Management Task.
[215163.269162] systemd[1]: fc-agent.service: Consumed 19.683s CPU time, received 28.9M IP traffic, sent 152.5K IP traffic.

[-- Attachment #8: FC-41288.log --]
[-- Type: application/octet-stream, Size: 1684 bytes --]

[217748.915126] INFO: task nix-build:198761 blocked for more than 122 seconds.
[217748.916085]       Not tainted 6.11.0 #1-NixOS
[217748.916639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[217748.917636] task:nix-build       state:D stack:0     pid:198761 tgid:198761 ppid:198760 flags:0x00000002
[217748.918897] Call Trace:
[217748.919271]  <TASK>
[217748.919593]  __schedule+0x3a3/0x1300
[217748.920118]  ? xfs_btree_insrec+0x32c/0x570 [xfs]
[217748.921070]  schedule+0x27/0xf0
[217748.921483]  io_schedule+0x46/0x70
[217748.921921]  folio_wait_bit_common+0x13f/0x340
[217748.922508]  ? __pfx_wake_page_function+0x10/0x10
[217748.923115]  folio_wait_writeback+0x2b/0x80
[217748.923647]  truncate_inode_partial_folio+0x5e/0x1b0
[217748.924286]  truncate_inode_pages_range+0x1de/0x400
[217748.924903]  evict+0x29f/0x2c0
[217748.925325]  ? iput+0x6e/0x230
[217748.925722]  ? _atomic_dec_and_lock+0x39/0x50
[217748.926290]  do_unlinkat+0x2de/0x330
[217748.926751]  __x64_sys_unlink+0x3f/0x70
[217748.927247]  do_syscall_64+0xb7/0x200
[217748.927716]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[217748.928375] RIP: 0033:0x7f177b02d56b
[217748.928860] RSP: 002b:00007ffe6bb88658 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
[217748.929804] RAX: ffffffffffffffda RBX: 000055fd811835b0 RCX: 00007f177b02d56b
[217748.930700] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055fd81184d30
[217748.931615] RBP: 000055fd81183740 R08: 0000000000016020 R09: 0000000000000000
[217748.932508] R10: 0000000000000030 R11: 0000000000000202 R12: 00007ffe6bb88698
[217748.933422] R13: 00007ffe6bb8867c R14: 00007ffe6bb88fa0 R15: 000055fd890e1a50
[217748.934341]  </TASK>

[-- Attachment #9: FC-41289.log --]
[-- Type: application/octet-stream, Size: 166376 bytes --]

[218237.291578] INFO: task nix-build:176536 blocked for more than 122 seconds.
[218237.292026]       Not tainted 6.11.0 #1-NixOS
[218237.292261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[218237.292695] task:nix-build       state:D stack:0     pid:176536 tgid:176536 ppid:176535 flags:0x00000002
[218237.293188] Call Trace:
[218237.293326]  <TASK>
[218237.293458]  __schedule+0x3a3/0x1300
[218237.293673]  ? xfs_vm_writepages+0x67/0x90 [xfs]
[218237.294063]  schedule+0x27/0xf0
[218237.294240]  io_schedule+0x46/0x70
[218237.294426]  folio_wait_bit_common+0x13f/0x340
[218237.294696]  ? __pfx_wake_page_function+0x10/0x10
[218237.295038]  folio_wait_writeback+0x2b/0x80
[218237.295270]  __filemap_fdatawait_range+0x80/0xe0
[218237.295541]  filemap_write_and_wait_range+0x85/0xb0
[218237.295804]  xfs_setattr_size+0xd9/0x3c0 [xfs]
[218237.296173]  xfs_vn_setattr+0x81/0x150 [xfs]
[218237.296530]  notify_change+0x2ed/0x4f0
[218237.296777]  ? do_truncate+0x98/0xf0
[218237.296996]  do_truncate+0x98/0xf0
[218237.297183]  do_ftruncate+0xfe/0x160
[218237.297378]  __x64_sys_ftruncate+0x3e/0x70
[218237.297632]  do_syscall_64+0xb7/0x200
[218237.297836]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[218237.298111] RIP: 0033:0x7f0453b12c2b
[218237.298316] RSP: 002b:00007ffe9f6db828 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[218237.298742] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0453b12c2b
[218237.299116] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000011
[218237.299492] RBP: 0000000000000011 R08: 0000000000000000 R09: 00007ffe9f6db7d0
[218237.299869] R10: 0000000000018000 R11: 0000000000000246 R12: 00005562ff27fb20
[218237.300241] R13: 00005562ff31c3d8 R14: 0000000000000001 R15: 00005562ff3293a8
[218237.300631]  </TASK>
[218261.984778] systemd[1]: Created slice Slice /user/1003.
[218261.989545] systemd[1]: Starting User Runtime Directory /run/user/1003...
[218262.000938] systemd[1]: Finished User Runtime Directory /run/user/1003.
[218262.005583] systemd[1]: Starting User Manager for UID 1003...
[218262.105005] systemd[1]: Started User Manager for UID 1003.
[218262.109759] systemd[1]: Started Session 7 of User ctheune.


[218269.921479398]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

[218274.052571366]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

[218278.588908363]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

[218283.450120071]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

[218287.296514668]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

[218290.957136179]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

# dstat
You did not select any stats, using -cdngy by default.
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  5   1  92   1   1|7869B  145k|   0     0 |  17B  442B| 611   801
  2   1   0  97   0|   0  1488k|  21k 7707B|   0     0 |1030  1385
  2   1   0  96   1|   0  1076k|  20k 5616B|   0     0 | 981  1296
  1   0   0  99   0|   0  1196k|  11k  454B|   0     0 | 711   987 ^C

[218298.44665228]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           2  0.0  0.0      0     0 ?        S    Oct08   0:00 [kthreadd]
root           3  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [pool_workqueue_release]
root           4  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-rcu_gp]
root           5  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-sync_wq]
root           6  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-slub_flushwq]
root           7  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-netns]
root          10  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/0:0H-kblockd]
root          13  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mm_percpu_wq]
root          14  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_kthread]
root          15  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_rude_kthread]
root          16  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_trace_kthread]
root          17  0.0  0.0      0     0 ?        S    Oct08   0:25  \_ [ksoftirqd/0]
root          18  0.0  0.0      0     0 ?        I    Oct08   1:12  \_ [rcu_preempt]
root          19  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_par_gp_kthread_worker/0]
root          20  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_gp_kthread_worker]
root          21  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [migration/0]
root          22  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [idle_inject/0]
root          23  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [cpuhp/0]
root          24  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kdevtmpfs]
root          25  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-inet_frag_wq]
root          26  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kauditd]
root          27  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [khungtaskd]
root          28  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [oom_reaper]
root          29  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-writeback]
root          30  0.0  0.0      0     0 ?        S    Oct08   0:02  \_ [kcompactd0]
root          31  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [ksmd]
root          32  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [khugepaged]
root          33  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kintegrityd]
root          34  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kblockd]
root          35  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-blkcg_punt_bio]
root          36  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [irq/9-acpi]
root          37  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md]
root          38  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md_bitmap]
root          39  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-devfreq_wq]
root          44  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kswapd0]
root          45  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kthrotld]
root          46  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mld]
root          47  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ipv6_addrconf]
root          54  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kstrp]
root          55  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/u5:0]
root         102  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [hwrng]
root         109  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [watchdogd]
root         149  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ata_sff]
root         150  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_0]
root         151  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_0]
root         152  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_1]
root         153  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_1]
root         184  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfsalloc]
root         185  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs_mru_cache]
root         186  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vda1]
root         187  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vda1]
root         188  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vda1]
root         189  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vda1]
root         190  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vda1]
root         191  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vda1]
root         192  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vda1]
root         193  0.0  0.0      0     0 ?        S    Oct08   0:20  \_ [xfsaild/vda1]
root         531  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root         644  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vdc1]
root         645  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vdc1]
root         646  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vdc1]
root         647  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vdc1]
root         648  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vdc1]
root         649  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vdc1]
root         650  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vdc1]
root         651  0.0  0.0      0     0 ?        S    Oct08   0:05  \_ [xfsaild/vdc1]
root         723  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ttm]
root        1286  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-tls-strp]
root        2772  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root      171717  0.0  0.0      0     0 ?        I    09:03   0:00  \_ [kworker/u4:3-events_power_efficient]
root      174477  0.0  0.0      0     0 ?        I    10:01   0:00  \_ [kworker/0:2-xfs-conv/vdc1]
root      174683  0.0  0.0      0     0 ?        I    10:06   0:00  \_ [kworker/u4:2-events_power_efficient]
root      175378  0.0  0.0      0     0 ?        I    10:20   0:00  \_ [kworker/u4:4-events_unbound]
root      176049  0.0  0.0      0     0 ?        I    10:34   0:00  \_ [kworker/0:3-xfs-conv/vdc1]
root      176150  0.0  0.0      0     0 ?        I<   10:35   0:00  \_ [kworker/0:1H-xfs-log/vda1]
root      176358  0.0  0.0      0     0 ?        I    10:40   0:00  \_ [kworker/0:0-xfs-conv/vdc1]
root      176402  0.0  0.0      0     0 ?        I    10:41   0:00  \_ [kworker/u4:0-events_power_efficient]
root      176544  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:1-writeback]
root      176545  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:5-writeback]
root      176546  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:6-events_power_efficient]
root      176549  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:1-xfs-conv/vdc1]
root      176550  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:4-xfs-conv/vdc1]
root      176551  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:5-xfs-conv/vdc1]
root      176552  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:6-xfs-conv/vdc1]
root      176553  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:7-xfs-conv/vdc1]
root      176554  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:8-xfs-conv/vdc1]
root      176555  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:9-xfs-conv/vdc1]
root      176556  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:10-xfs-conv/vdc1]
root      176557  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:11-xfs-conv/vdc1]
root      176558  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:12-kthrotld]
root      176559  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:13-xfs-conv/vdc1]
root      176560  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:14-xfs-conv/vdc1]
root      176561  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:15-xfs-conv/vdc1]
root      176562  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:16-xfs-conv/vdc1]
root      176563  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:17-xfs-conv/vdc1]
root      176564  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:18-xfs-conv/vdc1]
root      176565  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:19-xfs-conv/vdc1]
root      176566  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:20-xfs-conv/vdc1]
root      176567  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:21-xfs-conv/vdc1]
root      176568  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:22-xfs-conv/vdc1]
root      176569  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:23-xfs-conv/vdc1]
root      176570  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:24-xfs-conv/vdc1]
root      176571  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:25-xfs-conv/vdc1]
root      176572  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:26-xfs-conv/vdc1]
root      176573  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:27-xfs-conv/vdc1]
root      176574  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:28-xfs-conv/vdc1]
root      176575  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:29-xfs-conv/vdc1]
root      176576  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:30-xfs-conv/vdc1]
root      176577  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:31-xfs-conv/vdc1]
root      176578  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:32-xfs-conv/vdc1]
root      176579  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:33-xfs-conv/vdc1]
root      176580  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:34-xfs-conv/vdc1]
root      176581  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:35-xfs-conv/vdc1]
root      176582  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:36-xfs-conv/vdc1]
root      176583  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:37-xfs-conv/vdc1]
root      176584  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:38-xfs-conv/vdc1]
root      176585  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:39-xfs-conv/vdc1]
root      176586  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:40-xfs-conv/vdc1]
root      176587  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:41-xfs-buf/vdc1]
root      176588  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:42-xfs-conv/vdc1]
root      176589  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:43-xfs-conv/vdc1]
root      176590  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:44-xfs-conv/vdc1]
root      176591  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:45-xfs-conv/vdc1]
root      176592  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:46-xfs-conv/vdc1]
root      176593  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:47-xfs-conv/vdc1]
root      176594  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:48-xfs-conv/vdc1]
root      176595  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:49-xfs-conv/vdc1]
root      176596  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:50-xfs-conv/vdc1]
root      176597  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:51-xfs-conv/vdc1]
root      176598  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:52-xfs-conv/vdc1]
root      176599  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:53-xfs-conv/vdc1]
root      176600  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:54-xfs-conv/vdc1]
root      176601  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:55-xfs-conv/vdc1]
root      176602  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:56-xfs-conv/vdc1]
root      176603  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:57-xfs-conv/vdc1]
root      176604  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:58-xfs-conv/vdc1]
root      176605  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:59-xfs-conv/vdc1]
root      176606  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:60-xfs-conv/vdc1]
root      176607  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:61-xfs-conv/vdc1]
root      176608  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:62-xfs-conv/vdc1]
root      176609  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:63-xfs-conv/vdc1]
root      176610  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:64-xfs-conv/vdc1]
root      176611  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:65-xfs-conv/vdc1]
root      176612  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:66-xfs-conv/vdc1]
root      176613  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:67-xfs-conv/vdc1]
root      176614  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:68-xfs-conv/vdc1]
root      176615  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:69-xfs-conv/vdc1]
root      176616  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:70-xfs-conv/vdc1]
root      176617  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:71-xfs-conv/vdc1]
root      176618  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:72-xfs-conv/vdc1]
root      176619  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:73-xfs-conv/vdc1]
root      176620  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:74-xfs-conv/vdc1]
root      176621  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:75-xfs-conv/vdc1]
root      176622  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:76-xfs-conv/vdc1]
root      176623  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:77-xfs-conv/vdc1]
root      176624  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:78-xfs-conv/vdc1]
root      176625  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:79-xfs-conv/vdc1]
root      176626  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:80-xfs-conv/vdc1]
root      176627  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:81-xfs-conv/vdc1]
root      176628  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:82-xfs-conv/vdc1]
root      176629  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:83-xfs-conv/vdc1]
root      176630  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:84-xfs-conv/vdc1]
root      176631  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:85-xfs-conv/vdc1]
root      176632  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:86-xfs-conv/vdc1]
root      176633  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:87-xfs-conv/vdc1]
root      176634  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:88-xfs-conv/vdc1]
root      176635  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:89-xfs-conv/vdc1]
root      176636  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:90-xfs-conv/vdc1]
root      176637  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:91-xfs-conv/vdc1]
root      176638  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:92-xfs-conv/vdc1]
root      176639  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:93-xfs-conv/vdc1]
root      176640  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:94-xfs-conv/vdc1]
root      176641  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:95-xfs-conv/vdc1]
root      176642  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:96-xfs-conv/vdc1]
root      176643  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:97-xfs-conv/vdc1]
root      176644  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:98-xfs-conv/vdc1]
root      176645  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:99-xfs-conv/vdc1]
root      176646  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:100-xfs-conv/vdc1]
root      176647  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:101-xfs-conv/vdc1]
root      176648  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:102-xfs-conv/vdc1]
root      176649  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:103-xfs-conv/vdc1]
root      176650  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:104-xfs-conv/vdc1]
root      176651  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:105-xfs-conv/vdc1]
root      176652  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:106-xfs-conv/vdc1]
root      176653  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:107-xfs-conv/vdc1]
root      176654  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:108-xfs-conv/vdc1]
root      176655  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:109-xfs-conv/vdc1]
root      176656  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:110-xfs-conv/vdc1]
root      176657  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:111-xfs-conv/vdc1]
root      176658  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:112-xfs-conv/vdc1]
root      176659  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:113-xfs-conv/vdc1]
root      176660  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:114-xfs-conv/vdc1]
root      176661  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:115-xfs-conv/vdc1]
root      176662  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:116-xfs-conv/vdc1]
root      176663  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:117-xfs-conv/vdc1]
root      176664  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:118-xfs-conv/vdc1]
root      176665  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:119-xfs-conv/vdc1]
root      176666  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:120-xfs-conv/vdc1]
root      176667  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:121-xfs-conv/vdc1]
root      176668  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:122-xfs-conv/vdc1]
root      176669  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:123-xfs-conv/vdc1]
root      176670  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:124-xfs-conv/vdc1]
root      176671  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:125-xfs-conv/vdc1]
root      176672  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:126-xfs-conv/vdc1]
root      176673  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:127-xfs-conv/vdc1]
root      176674  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:128-xfs-conv/vdc1]
root      176675  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:129-xfs-conv/vdc1]
root      176676  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:130-xfs-conv/vdc1]
root      176677  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:131-xfs-conv/vdc1]
root      176678  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:132-xfs-conv/vdc1]
root      176679  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:133-xfs-conv/vdc1]
root      176680  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:134-xfs-conv/vdc1]
root      176681  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:135-xfs-conv/vdc1]
root      176682  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:136-xfs-conv/vdc1]
root      176683  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:137-xfs-conv/vdc1]
root      176684  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:138-xfs-conv/vdc1]
root      176685  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:139-xfs-conv/vdc1]
root      176686  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:140-xfs-conv/vdc1]
root      176687  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:141-xfs-conv/vdc1]
root      176688  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:142-xfs-conv/vdc1]
root      176689  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:143-xfs-conv/vdc1]
root      176690  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:144-xfs-conv/vdc1]
root      176691  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:145-xfs-conv/vdc1]
root      176692  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:146-xfs-conv/vdc1]
root      176693  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:147-xfs-conv/vdc1]
root      176694  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:148-xfs-conv/vdc1]
root      176695  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:149-xfs-conv/vdc1]
root      176696  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:150-xfs-conv/vdc1]
root      176697  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:151-xfs-conv/vdc1]
root      176698  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:152-xfs-conv/vdc1]
root      176699  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:153-xfs-conv/vdc1]
root      176700  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:154-xfs-conv/vdc1]
root      176701  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:155-xfs-conv/vdc1]
root      176702  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:156-xfs-conv/vdc1]
root      176703  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:157-xfs-conv/vdc1]
root      176704  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:158-xfs-buf/vda1]
root      176705  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:159-xfs-conv/vdc1]
root      176706  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:160-xfs-conv/vdc1]
root      176707  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:161-xfs-conv/vdc1]
root      176708  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:162-xfs-conv/vdc1]
root      176709  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:163-xfs-conv/vdc1]
root      176710  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:164-xfs-conv/vdc1]
root      176711  0.2  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:165-xfs-conv/vda1]
root      176712  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:166-xfs-conv/vdc1]
root      176713  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:167-xfs-conv/vdc1]
root      176714  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:168-xfs-conv/vdc1]
root      176715  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:169-xfs-conv/vdc1]
root      176716  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:170-xfs-conv/vdc1]
root      176717  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:171-xfs-conv/vdc1]
root      176718  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:172-xfs-conv/vdc1]
root      176719  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:173-xfs-conv/vdc1]
root      176720  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:174-xfs-conv/vdc1]
root      176721  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:175-xfs-conv/vdc1]
root      176722  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:176-xfs-conv/vdc1]
root      176723  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:177-xfs-conv/vdc1]
root      176724  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:178-xfs-conv/vdc1]
root      176725  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:179-xfs-conv/vdc1]
root      176726  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:180-xfs-conv/vdc1]
root      176727  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:181-xfs-conv/vdc1]
root      176728  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:182-xfs-conv/vdc1]
root      176729  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:183-xfs-conv/vdc1]
root      176730  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:184-xfs-conv/vdc1]
root      176731  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:185-xfs-conv/vdc1]
root      176732  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:186-xfs-conv/vdc1]
root      176733  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:187-xfs-conv/vdc1]
root      176734  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:188-xfs-conv/vdc1]
root      176735  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:189-xfs-conv/vdc1]
root      176736  0.3  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:190-cgroup_destroy]
root      176737  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:191-xfs-conv/vdc1]
root      176738  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:192-xfs-conv/vdc1]
root      176739  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:193-xfs-conv/vdc1]
root      176740  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:194-xfs-conv/vdc1]
root      176741  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:195-xfs-conv/vdc1]
root      176742  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:196-xfs-conv/vdc1]
root      176743  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:197-xfs-conv/vdc1]
root      176744  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:198-xfs-conv/vdc1]
root      176745  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:199-xfs-conv/vdc1]
root      176746  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:200-xfs-conv/vdc1]
root      176747  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:201-xfs-conv/vdc1]
root      176748  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:202-xfs-conv/vdc1]
root      176749  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:203-xfs-conv/vdc1]
root      176750  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:204-xfs-conv/vdc1]
root      176751  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:205-xfs-conv/vdc1]
root      176752  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:206-xfs-buf/vda1]
root      176753  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:207-xfs-conv/vdc1]
root      176754  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:208-xfs-conv/vdc1]
root      176755  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:209-xfs-conv/vdc1]
root      176756  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:210-xfs-conv/vdc1]
root      176757  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:211-xfs-conv/vdc1]
root      176758  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:212-xfs-conv/vdc1]
root      176759  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:213-xfs-conv/vdc1]
root      176760  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:214-xfs-conv/vdc1]
root      176761  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:215-xfs-conv/vdc1]
root      176762  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:216-xfs-conv/vdc1]
root      176763  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:217-xfs-conv/vdc1]
root      176764  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:218-xfs-conv/vdc1]
root      176765  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:219-xfs-conv/vdc1]
root      176766  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:220-xfs-conv/vdc1]
root      176767  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:221-xfs-conv/vdc1]
root      176768  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:222-xfs-conv/vdc1]
root      176769  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:223-xfs-conv/vdc1]
root      176770  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:224-xfs-conv/vdc1]
root      176771  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:225-xfs-conv/vdc1]
root      176772  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:226-xfs-conv/vdc1]
root      176773  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:227-xfs-conv/vdc1]
root      176774  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:228-xfs-conv/vdc1]
root      176775  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:229-xfs-conv/vdc1]
root      176776  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:230-xfs-conv/vdc1]
root      176777  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:231-xfs-conv/vdc1]
root      176778  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:232-xfs-conv/vdc1]
root      176779  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:233-xfs-conv/vdc1]
root      176780  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:234-xfs-conv/vdc1]
root      176781  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:235-xfs-conv/vdc1]
root      176782  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:236-xfs-conv/vdc1]
root      176783  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:237-xfs-conv/vdc1]
root      176784  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:238-xfs-conv/vdc1]
root      176785  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:239-xfs-conv/vdc1]
root      176786  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:240-xfs-conv/vdc1]
root      176787  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:241-xfs-conv/vdc1]
root      176788  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:242-xfs-conv/vdc1]
root      176789  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:243-xfs-conv/vdc1]
root      176790  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:244-xfs-conv/vdc1]
root      176791  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:245-xfs-conv/vdc1]
root      176792  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:246-xfs-conv/vdc1]
root      176793  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:247-xfs-conv/vdc1]
root      176794  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:248-xfs-conv/vdc1]
root      176795  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:249-xfs-conv/vdc1]
root      176796  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:250-xfs-conv/vdc1]
root      176797  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:251-xfs-conv/vdc1]
root      176798  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:252-xfs-conv/vdc1]
root      176799  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:253-xfs-conv/vdc1]
root      176800  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:254-xfs-conv/vdc1]
root      176801  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:255-xfs-conv/vdc1]
root      176802  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:256-xfs-buf/vda1]
root      176803  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:257-xfs-conv/vdc1]
root      176804  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:7-events_unbound]
root      176813  0.0  0.0      0     0 ?        I<   10:44   0:00  \_ [kworker/0:2H-kblockd]
root      176814  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/u4:8-events_unbound]
root      176815  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/0:258]
root           1  0.0  0.3  21852 13056 ?        Ss   Oct08   0:19 /run/current-system/systemd/lib/systemd/systemd
root         399  0.0  1.8 139764 75096 ?        Ss   Oct08   0:13 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-journald
root         455  0.0  0.2  33848  8168 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-udevd
systemd+     811  0.0  0.1  16800  6660 ?        Ss   Oct08   0:10 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-oomd
systemd+     816  0.0  0.1  91380  7952 ?        Ssl  Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-timesyncd
root         837  0.0  0.0  80596  3288 ?        Ssl  Oct08   1:37 /nix/store/ag3xk1l8ij06vx434abk8643f8p7i08c-qemu-host-cpu-only-8.2.6-ga/bin/qemu-ga --statedir /run/qemu-ga
root         840  0.0  0.0 226896  1984 ?        Ss   Oct08   0:00 /nix/store/k34f0d079arcgfjsq78gpkdbd6l6nnq4-cron-4.1/bin/cron -n
message+     850  0.0  0.1  13776  6080 ?        Ss   Oct08   0:05 /nix/store/0hm8vh65m378439kl16xv0p6l7c51asj-dbus-1.14.10/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         876  0.0  0.1  17468  7968 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-logind
nscd        1074  0.0  0.1 555748  6016 ?        Ssl  Oct08   0:28 /nix/store/zza9hvd6iawqdcxvinf4yxv580av3s9f-nsncd-unstable-2024-01-16/bin/nsncd
telegraf    1092  0.3  3.4 6344672 138484 ?      S<Lsl Oct08  13:05 /nix/store/8bnbkyh26j97l0pw02gb7lngh4n6k3r5-telegraf-1.30.3/bin/telegraf -config /nix/store/nh4k7bx1asm0kn1klhbmg52wk1qdcwpw-config.toml -config-directory /nix/store/dj77wnb5j
root        1093  0.0  1.5 1109328 60864 ?       Ssl  Oct08   2:24 /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/fn9jcsr2kp2kq3m2qd6qrkv6xh7jcj5g-fail2ban-1.0.2/bin/.fail2ban-server-wrapped -xf start
sensucl+    1094  0.0  0.9 898112 38340 ?        Ssl  Oct08   1:41 /nix/store/qqc6v89xn0g2w123wx85blkpc4pz2ags-ruby-2.7.8/bin/ruby /nix/store/dpvf0jdq1mbrdc90aapyrn2wvjbpckyv-sensu-check-env/bin/sensu-client -L warn -c /nix/store/ly677hg5b7szz
root        1098  0.0  0.1  11564  7568 ?        Ss   Oct08   0:00 sshd: /nix/store/1m888byzaqaig6azrrfpmjdyhgfliaga-openssh-9.7p1/bin/sshd -D -f /etc/ssh/sshd_config [listener] 0 of 10-100 startups
root      176967  0.0  0.2  14380  9840 ?        Ss   10:47   0:00  \_ sshd: ctheune [priv]
ctheune   176988  0.2  0.1  14540  5856 ?        S    10:47   0:00      \_ sshd: ctheune@pts/0
ctheune   176992  0.0  0.1 230756  5968 pts/0    Ss   10:47   0:00          \_ -bash
root      176998  0.0  0.0 228796  3956 pts/0    S+   10:47   0:00              \_ sudo -i
root      177001  0.0  0.0 228796  1604 pts/1    Ss   10:47   0:00                  \_ sudo -i
root      177002  0.0  0.1 230892  6064 pts/1    S    10:47   0:00                      \_ -bash
root      177041  0.0  0.1 232344  4264 pts/1    R+   10:48   0:00                          \_ ps auxf
root        1101  0.0  0.0 226928  1944 tty1     Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login --noclear --keep-baud tty1 115200,38400,9600 linux
root        1102  0.0  0.0 226928  2192 ttyS0    Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login ttyS0 --keep-baud vt220
_du4651+    1105  0.0  2.2 2505204 90824 ?       Ssl  Oct08   1:15 /nix/store/ff5j2is3di7praysyv232wfvcq7hvkii-filebeat-oss-7.17.16/bin/filebeat -e -c /nix/store/xlb56lv0f3j03l3v34x5jfvq8wng18ww-filebeat-journal-services19.gocept.net.json -pat
mysql       2809  0.3 18.6 4784932 750856 ?      Ssl  Oct08  11:47 /nix/store/9iq211dy95nqn484nx5z5mv3c7pc2h27-percona-server_lts-8.0.36-28/bin/mysqld --defaults-extra-file=/nix/store/frvxmffp9fpgq06bx89rgczyn6k6i51y-my.cnf --user=mysql --data
root      176527  0.0  0.0 227904  3236 ?        SNs  10:43   0:00 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/s8g5ls9d611hjq5psyd15sqbpqgrlwck-unit-script-fc-agent-start/bin/fc-agent-start
root      176535  0.1  1.1 279068 46452 ?        SN   10:43   0:00  \_ /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/gavi1rlv3ja79vl5hg3lgh07absa8yb9-python3.11-fc-agent-1.0/bin/.fc-manage-wrapped --enc-p
root      176536  3.5  1.8 635400 72368 ?        DNl  10:43   0:09      \_ nix-build --no-build-output <nixpkgs/nixos> -A system -I https://hydra.flyingcircus.io/build/496886/download/1/nixexprs.tar.xz --out-link /run/fc-agent-built-system
ctheune   176972  0.1  0.2  20028 11856 ?        Ss   10:47   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd --user
ctheune   176974  0.0  0.0  20368  3004 ?        S    10:47   0:00  \_ (sd-pam)
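
For reference, the per-task traces in these captures are plain /proc/<pid>/stack dumps taken while the task sits in uninterruptible sleep. A minimal sketch of how such snapshots can be gathered (hypothetical helper, not the exact script used for the captures here; needs root and a kernel built with CONFIG_STACKTRACE):

# Illustration only: print pid, comm and kernel stack of every task that is
# currently in uninterruptible sleep (state D), roughly in the format of the
# captures above.
for status in /proc/[0-9]*/status; do
    pid=${status%/status}; pid=${pid#/proc/}
    if grep -q '^State:.*D (disk sleep)' "$status" 2>/dev/null; then
        echo "$pid $(cat /proc/$pid/comm 2>/dev/null) D"
        cat "/proc/$pid/stack" 2>/dev/null
        echo
    fi
done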

[218305.88474928]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           2  0.0  0.0      0     0 ?        S    Oct08   0:00 [kthreadd]
root           3  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [pool_workqueue_release]
root           4  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-rcu_gp]
root           5  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-sync_wq]
root           6  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-slub_flushwq]
root           7  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-netns]
root          10  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/0:0H-kblockd]
root          13  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mm_percpu_wq]
root          14  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_kthread]
root          15  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_rude_kthread]
root          16  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_trace_kthread]
root          17  0.0  0.0      0     0 ?        S    Oct08   0:25  \_ [ksoftirqd/0]
root          18  0.0  0.0      0     0 ?        I    Oct08   1:12  \_ [rcu_preempt]
root          19  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_par_gp_kthread_worker/0]
root          20  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_gp_kthread_worker]
root          21  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [migration/0]
root          22  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [idle_inject/0]
root          23  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [cpuhp/0]
root          24  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kdevtmpfs]
root          25  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-inet_frag_wq]
root          26  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kauditd]
root          27  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [khungtaskd]
root          28  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [oom_reaper]
root          29  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-writeback]
root          30  0.0  0.0      0     0 ?        S    Oct08   0:02  \_ [kcompactd0]
root          31  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [ksmd]
root          32  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [khugepaged]
root          33  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kintegrityd]
root          34  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kblockd]
root          35  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-blkcg_punt_bio]
root          36  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [irq/9-acpi]
root          37  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md]
root          38  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md_bitmap]
root          39  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-devfreq_wq]
root          44  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kswapd0]
root          45  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kthrotld]
root          46  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mld]
root          47  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ipv6_addrconf]
root          54  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kstrp]
root          55  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/u5:0]
root         102  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [hwrng]
root         109  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [watchdogd]
root         149  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ata_sff]
root         150  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_0]
root         151  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_0]
root         152  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_1]
root         153  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_1]
root         184  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfsalloc]
root         185  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs_mru_cache]
root         186  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vda1]
root         187  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vda1]
root         188  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vda1]
root         189  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vda1]
root         190  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vda1]
root         191  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vda1]
root         192  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vda1]
root         193  0.0  0.0      0     0 ?        S    Oct08   0:20  \_ [xfsaild/vda1]
root         531  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root         644  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vdc1]
root         645  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vdc1]
root         646  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vdc1]
root         647  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vdc1]
root         648  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vdc1]
root         649  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vdc1]
root         650  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vdc1]
root         651  0.0  0.0      0     0 ?        S    Oct08   0:05  \_ [xfsaild/vdc1]
root         723  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ttm]
root        1286  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-tls-strp]
root        2772  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root      171717  0.0  0.0      0     0 ?        I    09:03   0:00  \_ [kworker/u4:3-events_power_efficient]
root      174477  0.0  0.0      0     0 ?        I    10:01   0:00  \_ [kworker/0:2-xfs-conv/vdc1]
root      174683  0.0  0.0      0     0 ?        I    10:06   0:00  \_ [kworker/u4:2-events_unbound]
root      175378  0.0  0.0      0     0 ?        I    10:20   0:00  \_ [kworker/u4:4-writeback]
root      176049  0.0  0.0      0     0 ?        I    10:34   0:00  \_ [kworker/0:3-xfs-conv/vdc1]
root      176150  0.0  0.0      0     0 ?        I<   10:35   0:00  \_ [kworker/0:1H-xfs-log/vda1]
root      176358  0.0  0.0      0     0 ?        I    10:40   0:00  \_ [kworker/0:0-xfs-conv/vdc1]
root      176402  0.0  0.0      0     0 ?        I    10:41   0:00  \_ [kworker/u4:0-events_power_efficient]
root      176544  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:1-writeback]
root      176545  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:5-writeback]
root      176546  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:6-events_power_efficient]
root      176549  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:1-xfs-conv/vdc1]
root      176550  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:4-xfs-conv/vdc1]
root      176551  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:5-xfs-conv/vdc1]
root      176552  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:6-xfs-conv/vdc1]
root      176553  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:7-xfs-conv/vdc1]
root      176554  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:8-xfs-conv/vdc1]
root      176555  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:9-xfs-conv/vdc1]
root      176556  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:10-xfs-conv/vdc1]
root      176557  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:11-xfs-conv/vdc1]
root      176558  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:12-kthrotld]
root      176559  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:13-xfs-conv/vdc1]
root      176560  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:14-xfs-conv/vdc1]
root      176561  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:15-xfs-conv/vdc1]
root      176562  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:16-xfs-conv/vdc1]
root      176563  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:17-xfs-conv/vdc1]
root      176564  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:18-xfs-conv/vdc1]
root      176565  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:19-xfs-conv/vdc1]
root      176566  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:20-xfs-conv/vdc1]
root      176567  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:21-xfs-conv/vdc1]
root      176568  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:22-xfs-conv/vdc1]
root      176569  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:23-xfs-conv/vdc1]
root      176570  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:24-xfs-conv/vdc1]
root      176571  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:25-xfs-conv/vdc1]
root      176572  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:26-xfs-conv/vdc1]
root      176573  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:27-xfs-conv/vdc1]
root      176574  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:28-xfs-conv/vdc1]
root      176575  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:29-xfs-conv/vdc1]
root      176576  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:30-xfs-conv/vdc1]
root      176577  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:31-xfs-conv/vdc1]
root      176578  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:32-xfs-conv/vdc1]
root      176579  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:33-xfs-conv/vdc1]
root      176580  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:34-xfs-conv/vdc1]
root      176581  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:35-xfs-conv/vdc1]
root      176582  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:36-xfs-conv/vdc1]
root      176583  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:37-xfs-conv/vdc1]
root      176584  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:38-xfs-conv/vdc1]
root      176585  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:39-xfs-conv/vdc1]
root      176586  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:40-xfs-conv/vdc1]
root      176587  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:41-xfs-buf/vdc1]
root      176588  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:42-xfs-conv/vdc1]
root      176589  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:43-xfs-conv/vdc1]
root      176590  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:44-xfs-conv/vdc1]
root      176591  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:45-xfs-conv/vdc1]
root      176592  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:46-xfs-conv/vdc1]
root      176593  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:47-xfs-conv/vdc1]
root      176594  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:48-xfs-conv/vdc1]
root      176595  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:49-xfs-conv/vdc1]
root      176596  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:50-xfs-conv/vdc1]
root      176597  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:51-xfs-conv/vdc1]
root      176598  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:52-xfs-conv/vdc1]
root      176599  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:53-xfs-conv/vdc1]
root      176600  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:54-xfs-conv/vdc1]
root      176601  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:55-xfs-conv/vdc1]
root      176602  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:56-xfs-conv/vdc1]
root      176603  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:57-xfs-conv/vdc1]
root      176604  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:58-xfs-conv/vdc1]
root      176605  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:59-xfs-conv/vdc1]
root      176606  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:60-xfs-conv/vdc1]
root      176607  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:61-xfs-conv/vdc1]
root      176608  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:62-xfs-conv/vdc1]
root      176609  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:63-xfs-conv/vdc1]
root      176610  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:64-xfs-conv/vdc1]
root      176611  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:65-xfs-conv/vdc1]
root      176612  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:66-xfs-conv/vdc1]
root      176613  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:67-xfs-conv/vdc1]
root      176614  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:68-xfs-conv/vdc1]
root      176615  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:69-xfs-conv/vdc1]
root      176616  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:70-xfs-conv/vdc1]
root      176617  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:71-xfs-conv/vdc1]
root      176618  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:72-xfs-conv/vdc1]
root      176619  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:73-xfs-conv/vdc1]
root      176620  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:74-xfs-conv/vdc1]
root      176621  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:75-xfs-conv/vdc1]
root      176622  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:76-xfs-conv/vdc1]
root      176623  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:77-xfs-conv/vdc1]
root      176624  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:78-xfs-conv/vdc1]
root      176625  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:79-xfs-conv/vdc1]
root      176626  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:80-xfs-conv/vdc1]
root      176627  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:81-xfs-conv/vdc1]
root      176628  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:82-xfs-conv/vdc1]
root      176629  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:83-xfs-conv/vdc1]
root      176630  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:84-xfs-conv/vdc1]
root      176631  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:85-xfs-conv/vdc1]
root      176632  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:86-xfs-conv/vdc1]
root      176633  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:87-xfs-conv/vdc1]
root      176634  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:88-xfs-conv/vdc1]
root      176635  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:89-xfs-conv/vdc1]
root      176636  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:90-xfs-conv/vdc1]
root      176637  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:91-xfs-conv/vdc1]
root      176638  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:92-xfs-conv/vdc1]
root      176639  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:93-xfs-conv/vdc1]
root      176640  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:94-xfs-conv/vdc1]
root      176641  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:95-xfs-conv/vdc1]
root      176642  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:96-xfs-conv/vdc1]
root      176643  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:97-xfs-conv/vdc1]
root      176644  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:98-xfs-conv/vdc1]
root      176645  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:99-xfs-conv/vdc1]
root      176646  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:100-xfs-conv/vdc1]
root      176647  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:101-xfs-conv/vdc1]
root      176648  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:102-xfs-conv/vdc1]
root      176649  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:103-xfs-conv/vdc1]
root      176650  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:104-xfs-conv/vdc1]
root      176651  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:105-xfs-conv/vdc1]
root      176652  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:106-xfs-conv/vdc1]
root      176653  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:107-xfs-conv/vdc1]
root      176654  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:108-xfs-conv/vdc1]
root      176655  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:109-xfs-conv/vdc1]
root      176656  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:110-xfs-conv/vdc1]
root      176657  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:111-xfs-conv/vdc1]
root      176658  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:112-xfs-conv/vdc1]
root      176659  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:113-xfs-conv/vdc1]
root      176660  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:114-xfs-conv/vdc1]
root      176661  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:115-xfs-conv/vdc1]
root      176662  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:116-xfs-conv/vdc1]
root      176663  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:117-xfs-conv/vdc1]
root      176664  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:118-xfs-conv/vdc1]
root      176665  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:119-xfs-conv/vdc1]
root      176666  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:120-xfs-conv/vdc1]
root      176667  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:121-xfs-conv/vdc1]
root      176668  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:122-xfs-conv/vdc1]
root      176669  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:123-xfs-conv/vdc1]
root      176670  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:124-xfs-conv/vdc1]
root      176671  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:125-xfs-conv/vdc1]
root      176672  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:126-xfs-conv/vdc1]
root      176673  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:127-xfs-conv/vdc1]
root      176674  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:128-xfs-conv/vdc1]
root      176675  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:129-xfs-conv/vdc1]
root      176676  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:130-xfs-conv/vdc1]
root      176677  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:131-xfs-conv/vdc1]
root      176678  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:132-xfs-conv/vdc1]
root      176679  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:133-xfs-conv/vdc1]
root      176680  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:134-xfs-conv/vdc1]
root      176681  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:135-xfs-conv/vdc1]
root      176682  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:136-xfs-conv/vdc1]
root      176683  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:137-xfs-conv/vdc1]
root      176684  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:138-xfs-conv/vdc1]
root      176685  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:139-xfs-conv/vdc1]
root      176686  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:140-xfs-conv/vdc1]
root      176687  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:141-xfs-conv/vdc1]
root      176688  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:142-xfs-conv/vdc1]
root      176689  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:143-xfs-conv/vdc1]
root      176690  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:144-xfs-conv/vdc1]
root      176691  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:145-xfs-conv/vdc1]
root      176692  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:146-xfs-conv/vdc1]
root      176693  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:147-xfs-conv/vdc1]
root      176694  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:148-xfs-conv/vdc1]
root      176695  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:149-xfs-conv/vdc1]
root      176696  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:150-xfs-conv/vdc1]
root      176697  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:151-xfs-conv/vdc1]
root      176698  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:152-xfs-conv/vdc1]
root      176699  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:153-xfs-conv/vdc1]
root      176700  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:154-xfs-conv/vdc1]
root      176701  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:155-xfs-conv/vdc1]
root      176702  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:156-xfs-conv/vdc1]
root      176703  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:157-xfs-conv/vdc1]
root      176704  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:158-xfs-buf/vda1]
root      176705  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:159-xfs-conv/vdc1]
root      176706  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:160-xfs-conv/vdc1]
root      176707  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:161-xfs-conv/vdc1]
root      176708  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:162-xfs-conv/vdc1]
root      176709  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:163-xfs-conv/vdc1]
root      176710  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:164-xfs-conv/vdc1]
root      176711  0.2  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:165-xfs-conv/vda1]
root      176712  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:166-xfs-conv/vdc1]
root      176713  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:167-xfs-conv/vdc1]
root      176714  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:168-xfs-conv/vdc1]
root      176715  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:169-xfs-conv/vdc1]
root      176716  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:170-xfs-conv/vdc1]
root      176717  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:171-xfs-conv/vdc1]
root      176718  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:172-xfs-conv/vdc1]
root      176719  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:173-xfs-conv/vdc1]
root      176720  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:174-xfs-conv/vdc1]
root      176721  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:175-xfs-conv/vdc1]
root      176722  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:176-xfs-conv/vdc1]
root      176723  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:177-xfs-conv/vdc1]
root      176724  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:178-xfs-conv/vdc1]
root      176725  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:179-xfs-conv/vdc1]
root      176726  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:180-xfs-conv/vdc1]
root      176727  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:181-xfs-conv/vdc1]
root      176728  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:182-xfs-conv/vdc1]
root      176729  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:183-xfs-conv/vdc1]
root      176730  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:184-xfs-conv/vdc1]
root      176731  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:185-xfs-conv/vdc1]
root      176732  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:186-xfs-conv/vdc1]
root      176733  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:187-xfs-conv/vdc1]
root      176734  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:188-xfs-conv/vdc1]
root      176735  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:189-xfs-conv/vdc1]
root      176736  0.3  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:190-cgroup_destroy]
root      176737  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:191-xfs-conv/vdc1]
root      176738  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:192-xfs-conv/vdc1]
root      176739  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:193-xfs-conv/vdc1]
root      176740  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:194-xfs-conv/vdc1]
root      176741  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:195-xfs-conv/vdc1]
root      176742  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:196-xfs-conv/vdc1]
root      176743  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:197-xfs-conv/vdc1]
root      176744  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:198-xfs-conv/vdc1]
root      176745  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:199-xfs-conv/vdc1]
root      176746  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:200-xfs-conv/vdc1]
root      176747  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:201-xfs-conv/vdc1]
root      176748  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:202-xfs-conv/vdc1]
root      176749  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:203-xfs-conv/vdc1]
root      176750  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:204-xfs-conv/vdc1]
root      176751  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:205-xfs-conv/vdc1]
root      176752  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:206-xfs-buf/vda1]
root      176753  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:207-xfs-conv/vdc1]
root      176754  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:208-xfs-conv/vdc1]
root      176755  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:209-xfs-conv/vdc1]
root      176756  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:210-xfs-conv/vdc1]
root      176757  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:211-xfs-conv/vdc1]
root      176758  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:212-xfs-conv/vdc1]
root      176759  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:213-xfs-conv/vdc1]
root      176760  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:214-xfs-conv/vdc1]
root      176761  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:215-xfs-conv/vdc1]
root      176762  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:216-xfs-conv/vdc1]
root      176763  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:217-xfs-conv/vdc1]
root      176764  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:218-xfs-conv/vdc1]
root      176765  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:219-xfs-conv/vdc1]
root      176766  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:220-xfs-conv/vdc1]
root      176767  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:221-xfs-conv/vdc1]
root      176768  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:222-xfs-conv/vdc1]
root      176769  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:223-xfs-conv/vdc1]
root      176770  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:224-xfs-conv/vdc1]
root      176771  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:225-xfs-conv/vdc1]
root      176772  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:226-xfs-conv/vdc1]
root      176773  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:227-xfs-conv/vdc1]
root      176774  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:228-xfs-conv/vdc1]
root      176775  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:229-xfs-conv/vdc1]
root      176776  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:230-xfs-conv/vdc1]
root      176777  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:231-xfs-conv/vdc1]
root      176778  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:232-xfs-conv/vdc1]
root      176779  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:233-xfs-conv/vdc1]
root      176780  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:234-xfs-conv/vdc1]
root      176781  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:235-xfs-conv/vdc1]
root      176782  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:236-xfs-conv/vdc1]
root      176783  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:237-xfs-conv/vdc1]
root      176784  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:238-xfs-conv/vdc1]
root      176785  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:239-xfs-conv/vdc1]
root      176786  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:240-xfs-conv/vdc1]
root      176787  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:241-xfs-conv/vdc1]
root      176788  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:242-xfs-conv/vdc1]
root      176789  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:243-xfs-conv/vdc1]
root      176790  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:244-xfs-conv/vdc1]
root      176791  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:245-xfs-conv/vdc1]
root      176792  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:246-xfs-conv/vdc1]
root      176793  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:247-xfs-conv/vdc1]
root      176794  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:248-xfs-conv/vdc1]
root      176795  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:249-xfs-conv/vdc1]
root      176796  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:250-xfs-conv/vdc1]
root      176797  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:251-xfs-conv/vdc1]
root      176798  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:252-xfs-conv/vdc1]
root      176799  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:253-xfs-conv/vdc1]
root      176800  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:254-xfs-conv/vdc1]
root      176801  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:255-xfs-conv/vdc1]
root      176802  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:256-xfs-buf/vda1]
root      176803  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:257-xfs-conv/vdc1]
root      176804  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:7-events_unbound]
root      176813  0.0  0.0      0     0 ?        I<   10:44   0:00  \_ [kworker/0:2H-kblockd]
root      176814  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/u4:8-events_unbound]
root      176815  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/0:258]
root           1  0.0  0.3  21852 13056 ?        Ss   Oct08   0:19 /run/current-system/systemd/lib/systemd/systemd
root         399  0.0  1.8 139764 75096 ?        Ss   Oct08   0:13 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-journald
root         455  0.0  0.2  33848  8168 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-udevd
systemd+     811  0.0  0.1  16800  6660 ?        Ss   Oct08   0:10 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-oomd
systemd+     816  0.0  0.1  91380  7952 ?        Ssl  Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-timesyncd
root         837  0.0  0.0  80596  3288 ?        Ssl  Oct08   1:37 /nix/store/ag3xk1l8ij06vx434abk8643f8p7i08c-qemu-host-cpu-only-8.2.6-ga/bin/qemu-ga --statedir /run/qemu-ga
root         840  0.0  0.0 226896  1984 ?        Ss   Oct08   0:00 /nix/store/k34f0d079arcgfjsq78gpkdbd6l6nnq4-cron-4.1/bin/cron -n
message+     850  0.0  0.1  13776  6080 ?        Ss   Oct08   0:05 /nix/store/0hm8vh65m378439kl16xv0p6l7c51asj-dbus-1.14.10/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         876  0.0  0.1  17468  7968 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-logind
nscd        1074  0.0  0.1 555748  6016 ?        Ssl  Oct08   0:28 /nix/store/zza9hvd6iawqdcxvinf4yxv580av3s9f-nsncd-unstable-2024-01-16/bin/nsncd
telegraf    1092  0.3  3.4 6344672 138484 ?      S<Lsl Oct08  13:05 /nix/store/8bnbkyh26j97l0pw02gb7lngh4n6k3r5-telegraf-1.30.3/bin/telegraf -config /nix/store/nh4k7bx1asm0kn1klhbmg52wk1qdcwpw-config.toml -config-directory /nix/store/dj77wnb5j
root        1093  0.0  1.5 1109328 60864 ?       Ssl  Oct08   2:24 /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/fn9jcsr2kp2kq3m2qd6qrkv6xh7jcj5g-fail2ban-1.0.2/bin/.fail2ban-server-wrapped -xf start
sensucl+    1094  0.0  0.9 898112 38340 ?        Ssl  Oct08   1:41 /nix/store/qqc6v89xn0g2w123wx85blkpc4pz2ags-ruby-2.7.8/bin/ruby /nix/store/dpvf0jdq1mbrdc90aapyrn2wvjbpckyv-sensu-check-env/bin/sensu-client -L warn -c /nix/store/ly677hg5b7szz
root        1098  0.0  0.1  11564  7568 ?        Ss   Oct08   0:00 sshd: /nix/store/1m888byzaqaig6azrrfpmjdyhgfliaga-openssh-9.7p1/bin/sshd -D -f /etc/ssh/sshd_config [listener] 0 of 10-100 startups
root      176967  0.0  0.2  14380  9840 ?        Ss   10:47   0:00  \_ sshd: ctheune [priv]
ctheune   176988  0.2  0.1  14540  5856 ?        S    10:47   0:00      \_ sshd: ctheune@pts/0
ctheune   176992  0.0  0.1 230756  5968 pts/0    Ss   10:47   0:00          \_ -bash
root      176998  0.0  0.0 228796  3956 pts/0    S+   10:47   0:00              \_ sudo -i
root      177001  0.0  0.0 228796  1604 pts/1    Ss   10:47   0:00                  \_ sudo -i
root      177002  0.0  0.1 230892  6064 pts/1    S    10:47   0:00                      \_ -bash
root      177048  0.0  0.0 232344  3944 pts/1    R+   10:48   0:00                          \_ ps auxf
root        1101  0.0  0.0 226928  1944 tty1     Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login --noclear --keep-baud tty1 115200,38400,9600 linux
root        1102  0.0  0.0 226928  2192 ttyS0    Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login ttyS0 --keep-baud vt220
_du4651+    1105  0.0  2.2 2505204 90824 ?       Ssl  Oct08   1:15 /nix/store/ff5j2is3di7praysyv232wfvcq7hvkii-filebeat-oss-7.17.16/bin/filebeat -e -c /nix/store/xlb56lv0f3j03l3v34x5jfvq8wng18ww-filebeat-journal-services19.gocept.net.json -pat
mysql       2809  0.3 18.6 4784932 750856 ?      Ssl  Oct08  11:47 /nix/store/9iq211dy95nqn484nx5z5mv3c7pc2h27-percona-server_lts-8.0.36-28/bin/mysqld --defaults-extra-file=/nix/store/frvxmffp9fpgq06bx89rgczyn6k6i51y-my.cnf --user=mysql --data
root      176527  0.0  0.0 227904  3236 ?        SNs  10:43   0:00 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/s8g5ls9d611hjq5psyd15sqbpqgrlwck-unit-script-fc-agent-start/bin/fc-agent-start
root      176535  0.1  1.1 279068 46452 ?        SN   10:43   0:00  \_ /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/gavi1rlv3ja79vl5hg3lgh07absa8yb9-python3.11-fc-agent-1.0/bin/.fc-manage-wrapped --enc-p
root      176536  3.5  1.8 635400 72368 ?        DNl  10:43   0:09      \_ nix-build --no-build-output <nixpkgs/nixos> -A system -I https://hydra.flyingcircus.io/build/496886/download/1/nixexprs.tar.xz --out-link /run/fc-agent-built-system
ctheune   176972  0.1  0.2  20028 11856 ?        Ss   10:47   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd --user
ctheune   176974  0.0  0.0  20368  3004 ?        S    10:47   0:00  \_ (sd-pam)

[218314.012140606]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           2  0.0  0.0      0     0 ?        S    Oct08   0:00 [kthreadd]
root           3  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [pool_workqueue_release]
root           4  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-rcu_gp]
root           5  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-sync_wq]
root           6  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-slub_flushwq]
root           7  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-netns]
root          10  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/0:0H-kblockd]
root          13  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mm_percpu_wq]
root          14  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_kthread]
root          15  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_rude_kthread]
root          16  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_trace_kthread]
root          17  0.0  0.0      0     0 ?        S    Oct08   0:25  \_ [ksoftirqd/0]
root          18  0.0  0.0      0     0 ?        I    Oct08   1:12  \_ [rcu_preempt]
root          19  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_par_gp_kthread_worker/0]
root          20  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_gp_kthread_worker]
root          21  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [migration/0]
root          22  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [idle_inject/0]
root          23  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [cpuhp/0]
root          24  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kdevtmpfs]
root          25  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-inet_frag_wq]
root          26  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kauditd]
root          27  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [khungtaskd]
root          28  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [oom_reaper]
root          29  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-writeback]
root          30  0.0  0.0      0     0 ?        S    Oct08   0:02  \_ [kcompactd0]
root          31  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [ksmd]
root          32  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [khugepaged]
root          33  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kintegrityd]
root          34  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kblockd]
root          35  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-blkcg_punt_bio]
root          36  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [irq/9-acpi]
root          37  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md]
root          38  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md_bitmap]
root          39  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-devfreq_wq]
root          44  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kswapd0]
root          45  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kthrotld]
root          46  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mld]
root          47  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ipv6_addrconf]
root          54  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kstrp]
root          55  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/u5:0]
root         102  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [hwrng]
root         109  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [watchdogd]
root         149  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ata_sff]
root         150  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_0]
root         151  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_0]
root         152  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_1]
root         153  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_1]
root         184  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfsalloc]
root         185  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs_mru_cache]
root         186  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vda1]
root         187  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vda1]
root         188  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vda1]
root         189  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vda1]
root         190  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vda1]
root         191  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vda1]
root         192  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vda1]
root         193  0.0  0.0      0     0 ?        S    Oct08   0:20  \_ [xfsaild/vda1]
root         531  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root         644  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vdc1]
root         645  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vdc1]
root         646  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vdc1]
root         647  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vdc1]
root         648  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vdc1]
root         649  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vdc1]
root         650  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vdc1]
root         651  0.0  0.0      0     0 ?        S    Oct08   0:05  \_ [xfsaild/vdc1]
root         723  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ttm]
root        1286  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-tls-strp]
root        2772  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root      171717  0.0  0.0      0     0 ?        I    09:03   0:00  \_ [kworker/u4:3-writeback]
root      174477  0.0  0.0      0     0 ?        I    10:01   0:00  \_ [kworker/0:2-xfs-conv/vdc1]
root      174683  0.0  0.0      0     0 ?        I    10:06   0:00  \_ [kworker/u4:2-events_unbound]
root      175378  0.0  0.0      0     0 ?        I    10:20   0:00  \_ [kworker/u4:4-events_power_efficient]
root      176049  0.0  0.0      0     0 ?        I    10:34   0:00  \_ [kworker/0:3-xfs-conv/vdc1]
root      176150  0.0  0.0      0     0 ?        I<   10:35   0:00  \_ [kworker/0:1H-xfs-log/vda1]
root      176358  0.0  0.0      0     0 ?        I    10:40   0:00  \_ [kworker/0:0-xfs-conv/vdc1]
root      176402  0.0  0.0      0     0 ?        I    10:41   0:00  \_ [kworker/u4:0-events_power_efficient]
root      176544  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:1-writeback]
root      176545  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:5-writeback]
root      176546  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:6-events_power_efficient]
root      176549  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:1-xfs-conv/vdc1]
root      176550  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:4-xfs-conv/vdc1]
root      176551  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:5-xfs-conv/vdc1]
root      176552  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:6-xfs-conv/vdc1]
root      176553  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:7-xfs-conv/vdc1]
root      176554  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:8-xfs-conv/vdc1]
root      176555  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:9-xfs-conv/vdc1]
root      176556  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:10-xfs-conv/vdc1]
root      176557  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:11-xfs-conv/vdc1]
root      176558  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:12-kthrotld]
root      176559  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:13-xfs-conv/vdc1]
root      176560  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:14-xfs-conv/vdc1]
root      176561  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:15-xfs-conv/vdc1]
root      176562  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:16-xfs-conv/vdc1]
root      176563  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:17-xfs-conv/vdc1]
root      176564  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:18-xfs-conv/vdc1]
root      176565  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:19-xfs-conv/vdc1]
root      176566  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:20-xfs-conv/vdc1]
root      176567  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:21-xfs-conv/vdc1]
root      176568  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:22-xfs-conv/vdc1]
root      176569  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:23-xfs-conv/vdc1]
root      176570  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:24-xfs-conv/vdc1]
root      176571  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:25-xfs-conv/vdc1]
root      176572  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:26-xfs-conv/vdc1]
root      176573  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:27-xfs-conv/vdc1]
root      176574  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:28-xfs-conv/vdc1]
root      176575  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:29-xfs-conv/vdc1]
root      176576  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:30-xfs-conv/vdc1]
root      176577  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:31-xfs-conv/vdc1]
root      176578  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:32-xfs-conv/vdc1]
root      176579  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:33-xfs-conv/vdc1]
root      176580  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:34-xfs-conv/vdc1]
root      176581  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:35-xfs-conv/vdc1]
root      176582  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:36-xfs-conv/vdc1]
root      176583  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:37-xfs-conv/vdc1]
root      176584  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:38-xfs-conv/vdc1]
root      176585  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:39-xfs-conv/vdc1]
root      176586  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:40-xfs-conv/vdc1]
root      176587  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:41-xfs-buf/vdc1]
root      176588  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:42-xfs-conv/vdc1]
root      176589  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:43-xfs-conv/vdc1]
root      176590  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:44-xfs-conv/vdc1]
root      176591  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:45-xfs-conv/vdc1]
root      176592  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:46-xfs-conv/vdc1]
root      176593  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:47-xfs-conv/vdc1]
root      176594  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:48-xfs-conv/vdc1]
root      176595  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:49-xfs-conv/vdc1]
root      176596  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:50-xfs-conv/vdc1]
root      176597  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:51-xfs-conv/vdc1]
root      176598  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:52-xfs-conv/vdc1]
root      176599  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:53-xfs-conv/vdc1]
root      176600  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:54-xfs-conv/vdc1]
root      176601  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:55-xfs-conv/vdc1]
root      176602  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:56-xfs-conv/vdc1]
root      176603  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:57-xfs-conv/vdc1]
root      176604  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:58-xfs-conv/vdc1]
root      176605  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:59-xfs-conv/vdc1]
root      176606  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:60-xfs-conv/vdc1]
root      176607  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:61-xfs-conv/vdc1]
root      176608  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:62-xfs-conv/vdc1]
root      176609  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:63-xfs-conv/vdc1]
root      176610  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:64-xfs-conv/vdc1]
root      176611  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:65-xfs-conv/vdc1]
root      176612  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:66-xfs-conv/vdc1]
root      176613  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:67-xfs-conv/vdc1]
root      176614  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:68-xfs-conv/vdc1]
root      176615  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:69-xfs-conv/vdc1]
root      176616  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:70-xfs-conv/vdc1]
root      176617  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:71-xfs-conv/vdc1]
root      176618  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:72-xfs-conv/vdc1]
root      176619  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:73-xfs-conv/vdc1]
root      176620  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:74-xfs-conv/vdc1]
root      176621  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:75-xfs-conv/vdc1]
root      176622  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:76-xfs-conv/vdc1]
root      176623  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:77-xfs-conv/vdc1]
root      176624  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:78-xfs-conv/vdc1]
root      176625  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:79-xfs-conv/vdc1]
root      176626  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:80-xfs-conv/vdc1]
root      176627  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:81-xfs-conv/vdc1]
root      176628  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:82-xfs-conv/vdc1]
root      176629  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:83-xfs-conv/vdc1]
root      176630  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:84-xfs-conv/vdc1]
root      176631  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:85-xfs-conv/vdc1]
root      176632  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:86-xfs-conv/vdc1]
root      176633  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:87-xfs-conv/vdc1]
root      176634  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:88-xfs-conv/vdc1]
root      176635  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:89-xfs-conv/vdc1]
root      176636  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:90-xfs-conv/vdc1]
root      176637  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:91-xfs-conv/vdc1]
root      176638  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:92-xfs-conv/vdc1]
root      176639  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:93-xfs-conv/vdc1]
root      176640  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:94-xfs-conv/vdc1]
root      176641  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:95-xfs-conv/vdc1]
root      176642  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:96-xfs-conv/vdc1]
root      176643  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:97-xfs-conv/vdc1]
root      176644  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:98-xfs-conv/vdc1]
root      176645  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:99-xfs-conv/vdc1]
root      176646  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:100-xfs-conv/vdc1]
root      176647  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:101-xfs-conv/vdc1]
root      176648  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:102-xfs-conv/vdc1]
root      176649  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:103-xfs-conv/vdc1]
root      176650  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:104-xfs-conv/vdc1]
root      176651  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:105-xfs-conv/vdc1]
root      176652  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:106-xfs-conv/vdc1]
root      176653  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:107-xfs-conv/vdc1]
root      176654  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:108-xfs-conv/vdc1]
root      176655  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:109-xfs-conv/vdc1]
root      176656  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:110-xfs-conv/vdc1]
root      176657  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:111-xfs-conv/vdc1]
root      176658  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:112-xfs-conv/vdc1]
root      176659  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:113-xfs-conv/vdc1]
root      176660  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:114-xfs-conv/vdc1]
root      176661  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:115-xfs-conv/vdc1]
root      176662  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:116-xfs-conv/vdc1]
root      176663  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:117-xfs-conv/vdc1]
root      176664  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:118-xfs-conv/vdc1]
root      176665  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:119-xfs-conv/vdc1]
root      176666  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:120-xfs-conv/vdc1]
root      176667  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:121-xfs-conv/vdc1]
root      176668  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:122-xfs-conv/vdc1]
root      176669  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:123-xfs-conv/vdc1]
root      176670  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:124-xfs-conv/vdc1]
root      176671  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:125-xfs-conv/vdc1]
root      176672  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:126-xfs-conv/vdc1]
root      176673  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:127-xfs-conv/vdc1]
root      176674  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:128-xfs-conv/vdc1]
root      176675  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:129-xfs-conv/vdc1]
root      176676  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:130-xfs-conv/vdc1]
root      176677  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:131-xfs-conv/vdc1]
root      176678  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:132-xfs-conv/vdc1]
root      176679  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:133-xfs-conv/vdc1]
root      176680  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:134-xfs-conv/vdc1]
root      176681  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:135-xfs-conv/vdc1]
root      176682  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:136-xfs-conv/vdc1]
root      176683  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:137-xfs-conv/vdc1]
root      176684  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:138-xfs-conv/vdc1]
root      176685  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:139-xfs-conv/vdc1]
root      176686  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:140-xfs-conv/vdc1]
root      176687  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:141-xfs-conv/vdc1]
root      176688  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:142-xfs-conv/vdc1]
root      176689  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:143-xfs-conv/vdc1]
root      176690  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:144-xfs-conv/vdc1]
root      176691  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:145-xfs-conv/vdc1]
root      176692  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:146-xfs-conv/vdc1]
root      176693  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:147-xfs-conv/vdc1]
root      176694  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:148-xfs-conv/vdc1]
root      176695  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:149-xfs-conv/vdc1]
root      176696  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:150-xfs-conv/vdc1]
root      176697  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:151-xfs-conv/vdc1]
root      176698  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:152-xfs-conv/vdc1]
root      176699  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:153-xfs-conv/vdc1]
root      176700  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:154-xfs-conv/vdc1]
root      176701  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:155-xfs-conv/vdc1]
root      176702  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:156-xfs-conv/vdc1]
root      176703  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:157-xfs-conv/vdc1]
root      176704  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:158-xfs-buf/vda1]
root      176705  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:159-xfs-conv/vdc1]
root      176706  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:160-xfs-conv/vdc1]
root      176707  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:161-xfs-conv/vdc1]
root      176708  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:162-xfs-conv/vdc1]
root      176709  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:163-xfs-conv/vdc1]
root      176710  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:164-xfs-conv/vdc1]
root      176711  0.2  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:165-xfs-conv/vda1]
root      176712  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:166-xfs-conv/vdc1]
root      176713  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:167-xfs-conv/vdc1]
root      176714  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:168-xfs-conv/vdc1]
root      176715  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:169-xfs-conv/vdc1]
root      176716  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:170-xfs-conv/vdc1]
root      176717  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:171-xfs-conv/vdc1]
root      176718  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:172-xfs-conv/vdc1]
root      176719  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:173-xfs-conv/vdc1]
root      176720  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:174-xfs-conv/vdc1]
root      176721  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:175-xfs-conv/vdc1]
root      176722  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:176-xfs-conv/vdc1]
root      176723  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:177-xfs-conv/vdc1]
root      176724  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:178-xfs-conv/vdc1]
root      176725  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:179-xfs-conv/vdc1]
root      176726  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:180-xfs-conv/vdc1]
root      176727  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:181-xfs-conv/vdc1]
root      176728  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:182-xfs-conv/vdc1]
root      176729  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:183-xfs-conv/vdc1]
root      176730  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:184-xfs-conv/vdc1]
root      176731  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:185-xfs-conv/vdc1]
root      176732  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:186-xfs-conv/vdc1]
root      176733  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:187-xfs-conv/vdc1]
root      176734  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:188-xfs-conv/vdc1]
root      176735  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:189-xfs-conv/vdc1]
root      176736  0.3  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:190-cgroup_destroy]
root      176737  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:191-xfs-conv/vdc1]
root      176738  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:192-xfs-conv/vdc1]
root      176739  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:193-xfs-conv/vdc1]
root      176740  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:194-xfs-conv/vdc1]
root      176741  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:195-xfs-conv/vdc1]
root      176742  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:196-xfs-conv/vdc1]
root      176743  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:197-xfs-conv/vdc1]
root      176744  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:198-xfs-conv/vdc1]
root      176745  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:199-xfs-conv/vdc1]
root      176746  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:200-xfs-conv/vdc1]
root      176747  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:201-xfs-conv/vdc1]
root      176748  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:202-xfs-conv/vdc1]
root      176749  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:203-xfs-conv/vdc1]
root      176750  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:204-xfs-conv/vdc1]
root      176751  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:205-xfs-conv/vdc1]
root      176752  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:206-xfs-buf/vda1]
root      176753  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:207-xfs-conv/vdc1]
root      176754  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:208-xfs-conv/vdc1]
root      176755  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:209-xfs-conv/vdc1]
root      176756  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:210-xfs-conv/vdc1]
root      176757  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:211-xfs-conv/vdc1]
root      176758  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:212-xfs-conv/vdc1]
root      176759  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:213-xfs-conv/vdc1]
root      176760  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:214-xfs-conv/vdc1]
root      176761  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:215-xfs-conv/vdc1]
root      176762  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:216-xfs-conv/vdc1]
root      176763  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:217-xfs-conv/vdc1]
root      176764  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:218-xfs-conv/vdc1]
root      176765  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:219-xfs-conv/vdc1]
root      176766  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:220-xfs-conv/vdc1]
root      176767  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:221-xfs-conv/vdc1]
root      176768  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:222-xfs-conv/vdc1]
root      176769  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:223-xfs-conv/vdc1]
root      176770  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:224-xfs-conv/vdc1]
root      176771  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:225-xfs-conv/vdc1]
root      176772  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:226-xfs-conv/vdc1]
root      176773  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:227-xfs-conv/vdc1]
root      176774  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:228-xfs-conv/vdc1]
root      176775  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:229-xfs-conv/vdc1]
root      176776  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:230-xfs-conv/vdc1]
root      176777  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:231-xfs-conv/vdc1]
root      176778  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:232-xfs-conv/vdc1]
root      176779  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:233-xfs-conv/vdc1]
root      176780  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:234-xfs-conv/vdc1]
root      176781  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:235-xfs-conv/vdc1]
root      176782  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:236-xfs-conv/vdc1]
root      176783  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:237-xfs-conv/vdc1]
root      176784  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:238-xfs-conv/vdc1]
root      176785  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:239-xfs-conv/vdc1]
root      176786  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:240-xfs-conv/vdc1]
root      176787  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:241-xfs-conv/vdc1]
root      176788  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:242-xfs-conv/vdc1]
root      176789  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:243-xfs-conv/vdc1]
root      176790  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:244-xfs-conv/vdc1]
root      176791  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:245-xfs-conv/vdc1]
root      176792  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:246-xfs-conv/vdc1]
root      176793  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:247-xfs-conv/vdc1]
root      176794  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:248-xfs-conv/vdc1]
root      176795  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:249-xfs-conv/vdc1]
root      176796  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:250-xfs-conv/vdc1]
root      176797  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:251-xfs-conv/vdc1]
root      176798  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:252-xfs-conv/vdc1]
root      176799  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:253-xfs-conv/vdc1]
root      176800  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:254-xfs-conv/vdc1]
root      176801  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:255-xfs-conv/vdc1]
root      176802  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:256-xfs-buf/vda1]
root      176803  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:257-xfs-conv/vdc1]
root      176804  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:7-events_unbound]
root      176813  0.0  0.0      0     0 ?        I<   10:44   0:00  \_ [kworker/0:2H-kblockd]
root      176814  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/u4:8-events_unbound]
root      176815  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/0:258]
root           1  0.0  0.3  21852 13056 ?        Ss   Oct08   0:19 /run/current-system/systemd/lib/systemd/systemd
root         399  0.0  1.8 139764 75096 ?        Ss   Oct08   0:13 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-journald
root         455  0.0  0.2  33848  8168 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-udevd
systemd+     811  0.0  0.1  16800  6660 ?        Ss   Oct08   0:10 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-oomd
systemd+     816  0.0  0.1  91380  7952 ?        Ssl  Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-timesyncd
root         837  0.0  0.0  80596  3288 ?        Ssl  Oct08   1:37 /nix/store/ag3xk1l8ij06vx434abk8643f8p7i08c-qemu-host-cpu-only-8.2.6-ga/bin/qemu-ga --statedir /run/qemu-ga
root         840  0.0  0.0 226896  1984 ?        Ss   Oct08   0:00 /nix/store/k34f0d079arcgfjsq78gpkdbd6l6nnq4-cron-4.1/bin/cron -n
message+     850  0.0  0.1  13776  6080 ?        Ss   Oct08   0:05 /nix/store/0hm8vh65m378439kl16xv0p6l7c51asj-dbus-1.14.10/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         876  0.0  0.1  17468  7968 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-logind
nscd        1074  0.0  0.1 555748  6016 ?        Ssl  Oct08   0:28 /nix/store/zza9hvd6iawqdcxvinf4yxv580av3s9f-nsncd-unstable-2024-01-16/bin/nsncd
telegraf    1092  0.3  3.4 6344672 138484 ?      S<Lsl Oct08  13:05 /nix/store/8bnbkyh26j97l0pw02gb7lngh4n6k3r5-telegraf-1.30.3/bin/telegraf -config /nix/store/nh4k7bx1asm0kn1klhbmg52wk1qdcwpw-config.toml -config-directory /nix/store/dj77wnb5j
root        1093  0.0  1.5 1109328 60864 ?       Ssl  Oct08   2:24 /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/fn9jcsr2kp2kq3m2qd6qrkv6xh7jcj5g-fail2ban-1.0.2/bin/.fail2ban-server-wrapped -xf start
sensucl+    1094  0.0  0.9 898112 38340 ?        Ssl  Oct08   1:41 /nix/store/qqc6v89xn0g2w123wx85blkpc4pz2ags-ruby-2.7.8/bin/ruby /nix/store/dpvf0jdq1mbrdc90aapyrn2wvjbpckyv-sensu-check-env/bin/sensu-client -L warn -c /nix/store/ly677hg5b7szz
root        1098  0.0  0.1  11564  7568 ?        Ss   Oct08   0:00 sshd: /nix/store/1m888byzaqaig6azrrfpmjdyhgfliaga-openssh-9.7p1/bin/sshd -D -f /etc/ssh/sshd_config [listener] 0 of 10-100 startups
root      176967  0.0  0.2  14380  9840 ?        Ss   10:47   0:00  \_ sshd: ctheune [priv]
ctheune   176988  0.2  0.1  14540  5856 ?        S    10:47   0:00      \_ sshd: ctheune@pts/0
ctheune   176992  0.0  0.1 230756  5968 pts/0    Ss   10:47   0:00          \_ -bash
root      176998  0.0  0.0 228796  3956 pts/0    S+   10:47   0:00              \_ sudo -i
root      177001  0.0  0.0 228796  1604 pts/1    Ss   10:47   0:00                  \_ sudo -i
root      177002  0.0  0.1 230892  6064 pts/1    S    10:47   0:00                      \_ -bash
root      177061  0.0  0.1 232344  4048 pts/1    R+   10:48   0:00                          \_ ps auxf
root        1101  0.0  0.0 226928  1944 tty1     Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login --noclear --keep-baud tty1 115200,38400,9600 linux
root        1102  0.0  0.0 226928  2192 ttyS0    Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login ttyS0 --keep-baud vt220
_du4651+    1105  0.0  2.2 2505204 90824 ?       Ssl  Oct08   1:15 /nix/store/ff5j2is3di7praysyv232wfvcq7hvkii-filebeat-oss-7.17.16/bin/filebeat -e -c /nix/store/xlb56lv0f3j03l3v34x5jfvq8wng18ww-filebeat-journal-services19.gocept.net.json -pat
mysql       2809  0.3 18.6 4784932 750856 ?      Ssl  Oct08  11:47 /nix/store/9iq211dy95nqn484nx5z5mv3c7pc2h27-percona-server_lts-8.0.36-28/bin/mysqld --defaults-extra-file=/nix/store/frvxmffp9fpgq06bx89rgczyn6k6i51y-my.cnf --user=mysql --data
root      176527  0.0  0.0 227904  3236 ?        SNs  10:43   0:00 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/s8g5ls9d611hjq5psyd15sqbpqgrlwck-unit-script-fc-agent-start/bin/fc-agent-start
root      176535  0.1  1.1 279068 46452 ?        SN   10:43   0:00  \_ /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/gavi1rlv3ja79vl5hg3lgh07absa8yb9-python3.11-fc-agent-1.0/bin/.fc-manage-wrapped --enc-p
root      176536  3.3  1.8 635400 72368 ?        DNl  10:43   0:09      \_ nix-build --no-build-output <nixpkgs/nixos> -A system -I https://hydra.flyingcircus.io/build/496886/download/1/nixexprs.tar.xz --out-link /run/fc-agent-built-system
ctheune   176972  0.1  0.2  20028 11856 ?        Ss   10:47   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd --user
ctheune   176974  0.0  0.0  20368  3004 ?        S    10:47   0:00  \_ (sd-pam)

[218321.967537846]
176536 nix-build D
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----
stack summary

1 hit:
[<0>] folio_wait_bit_common+0x13f/0x340
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] __filemap_fdatawait_range+0x80/0xe0
[<0>] filemap_write_and_wait_range+0x85/0xb0
[<0>] xfs_setattr_size+0xd9/0x3c0 [xfs]
[<0>] xfs_vn_setattr+0x81/0x150 [xfs]
[<0>] notify_change+0x2ed/0x4f0
[<0>] do_truncate+0x98/0xf0
[<0>] do_ftruncate+0xfe/0x160
[<0>] __x64_sys_ftruncate+0x3e/0x70
[<0>] do_syscall_64+0xb7/0x200
[<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f

-----

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           2  0.0  0.0      0     0 ?        S    Oct08   0:00 [kthreadd]
root           3  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [pool_workqueue_release]
root           4  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-rcu_gp]
root           5  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-sync_wq]
root           6  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-slub_flushwq]
root           7  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-netns]
root          10  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/0:0H-kblockd]
root          13  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mm_percpu_wq]
root          14  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_kthread]
root          15  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_rude_kthread]
root          16  0.0  0.0      0     0 ?        I    Oct08   0:00  \_ [rcu_tasks_trace_kthread]
root          17  0.0  0.0      0     0 ?        S    Oct08   0:25  \_ [ksoftirqd/0]
root          18  0.0  0.0      0     0 ?        I    Oct08   1:12  \_ [rcu_preempt]
root          19  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_par_gp_kthread_worker/0]
root          20  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [rcu_exp_gp_kthread_worker]
root          21  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [migration/0]
root          22  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [idle_inject/0]
root          23  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [cpuhp/0]
root          24  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kdevtmpfs]
root          25  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-inet_frag_wq]
root          26  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kauditd]
root          27  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [khungtaskd]
root          28  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [oom_reaper]
root          29  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-writeback]
root          30  0.0  0.0      0     0 ?        S    Oct08   0:02  \_ [kcompactd0]
root          31  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [ksmd]
root          32  0.0  0.0      0     0 ?        SN   Oct08   0:00  \_ [khugepaged]
root          33  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kintegrityd]
root          34  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kblockd]
root          35  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-blkcg_punt_bio]
root          36  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [irq/9-acpi]
root          37  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md]
root          38  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-md_bitmap]
root          39  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-devfreq_wq]
root          44  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [kswapd0]
root          45  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kthrotld]
root          46  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-mld]
root          47  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ipv6_addrconf]
root          54  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-kstrp]
root          55  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/u5:0]
root         102  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [hwrng]
root         109  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [watchdogd]
root         149  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ata_sff]
root         150  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_0]
root         151  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_0]
root         152  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [scsi_eh_1]
root         153  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-scsi_tmf_1]
root         184  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfsalloc]
root         185  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs_mru_cache]
root         186  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vda1]
root         187  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vda1]
root         188  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vda1]
root         189  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vda1]
root         190  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vda1]
root         191  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vda1]
root         192  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vda1]
root         193  0.0  0.0      0     0 ?        S    Oct08   0:20  \_ [xfsaild/vda1]
root         531  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root         644  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-buf/vdc1]
root         645  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-conv/vdc1]
root         646  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-reclaim/vdc1]
root         647  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-blockgc/vdc1]
root         648  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-inodegc/vdc1]
root         649  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-log/vdc1]
root         650  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-xfs-cil/vdc1]
root         651  0.0  0.0      0     0 ?        S    Oct08   0:05  \_ [xfsaild/vdc1]
root         723  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-ttm]
root        1286  0.0  0.0      0     0 ?        I<   Oct08   0:00  \_ [kworker/R-tls-strp]
root        2772  0.0  0.0      0     0 ?        S    Oct08   0:00  \_ [psimon]
root      171717  0.0  0.0      0     0 ?        I    09:03   0:00  \_ [kworker/u4:3-events_power_efficient]
root      174477  0.0  0.0      0     0 ?        I    10:01   0:00  \_ [kworker/0:2-xfs-conv/vdc1]
root      174683  0.0  0.0      0     0 ?        I    10:06   0:00  \_ [kworker/u4:2-writeback]
root      175378  0.0  0.0      0     0 ?        I    10:20   0:00  \_ [kworker/u4:4-events_unbound]
root      176049  0.0  0.0      0     0 ?        I    10:34   0:00  \_ [kworker/0:3-xfs-conv/vdc1]
root      176150  0.0  0.0      0     0 ?        I<   10:35   0:00  \_ [kworker/0:1H-xfs-log/vda1]
root      176358  0.0  0.0      0     0 ?        I    10:40   0:00  \_ [kworker/0:0-xfs-conv/vdc1]
root      176402  0.0  0.0      0     0 ?        I    10:41   0:00  \_ [kworker/u4:0-events_power_efficient]
root      176544  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:1-writeback]
root      176545  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:5-writeback]
root      176546  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:6-events_power_efficient]
root      176549  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:1-xfs-conv/vdc1]
root      176550  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:4-xfs-conv/vdc1]
root      176551  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:5-xfs-conv/vdc1]
root      176552  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:6-xfs-conv/vdc1]
root      176553  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:7-xfs-conv/vdc1]
root      176554  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:8-xfs-conv/vdc1]
root      176555  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:9-xfs-conv/vdc1]
root      176556  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:10-xfs-conv/vdc1]
root      176557  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:11-xfs-conv/vdc1]
root      176558  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:12-kthrotld]
root      176559  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:13-xfs-conv/vdc1]
root      176560  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:14-xfs-conv/vdc1]
root      176561  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:15-xfs-conv/vdc1]
root      176562  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:16-xfs-conv/vdc1]
root      176563  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:17-xfs-conv/vdc1]
root      176564  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:18-xfs-conv/vdc1]
root      176565  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:19-xfs-conv/vdc1]
root      176566  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:20-xfs-conv/vdc1]
root      176567  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:21-xfs-conv/vdc1]
root      176568  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:22-xfs-conv/vdc1]
root      176569  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:23-xfs-conv/vdc1]
root      176570  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:24-xfs-conv/vdc1]
root      176571  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:25-xfs-conv/vdc1]
root      176572  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:26-xfs-conv/vdc1]
root      176573  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:27-xfs-conv/vdc1]
root      176574  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:28-xfs-conv/vdc1]
root      176575  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:29-xfs-conv/vdc1]
root      176576  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:30-xfs-conv/vdc1]
root      176577  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:31-xfs-conv/vdc1]
root      176578  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:32-xfs-conv/vdc1]
root      176579  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:33-xfs-conv/vdc1]
root      176580  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:34-xfs-conv/vdc1]
root      176581  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:35-xfs-conv/vdc1]
root      176582  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:36-xfs-conv/vdc1]
root      176583  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:37-xfs-conv/vdc1]
root      176584  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:38-xfs-conv/vdc1]
root      176585  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:39-xfs-conv/vdc1]
root      176586  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:40-xfs-conv/vdc1]
root      176587  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:41-xfs-buf/vdc1]
root      176588  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:42-xfs-conv/vdc1]
root      176589  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:43-xfs-conv/vdc1]
root      176590  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:44-xfs-conv/vdc1]
root      176591  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:45-xfs-conv/vdc1]
root      176592  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:46-xfs-conv/vdc1]
root      176593  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:47-xfs-conv/vdc1]
root      176594  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:48-xfs-conv/vdc1]
root      176595  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:49-xfs-conv/vdc1]
root      176596  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:50-xfs-conv/vdc1]
root      176597  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:51-xfs-conv/vdc1]
root      176598  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:52-xfs-conv/vdc1]
root      176599  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:53-xfs-conv/vdc1]
root      176600  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:54-xfs-conv/vdc1]
root      176601  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:55-xfs-conv/vdc1]
root      176602  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:56-xfs-conv/vdc1]
root      176603  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:57-xfs-conv/vdc1]
root      176604  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:58-xfs-conv/vdc1]
root      176605  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:59-xfs-conv/vdc1]
root      176606  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:60-xfs-conv/vdc1]
root      176607  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:61-xfs-conv/vdc1]
root      176608  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:62-xfs-conv/vdc1]
root      176609  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:63-xfs-conv/vdc1]
root      176610  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:64-xfs-conv/vdc1]
root      176611  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:65-xfs-conv/vdc1]
root      176612  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:66-xfs-conv/vdc1]
root      176613  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:67-xfs-conv/vdc1]
root      176614  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:68-xfs-conv/vdc1]
root      176615  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:69-xfs-conv/vdc1]
root      176616  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:70-xfs-conv/vdc1]
root      176617  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:71-xfs-conv/vdc1]
root      176618  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:72-xfs-conv/vdc1]
root      176619  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:73-xfs-conv/vdc1]
root      176620  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:74-xfs-conv/vdc1]
root      176621  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:75-xfs-conv/vdc1]
root      176622  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:76-xfs-conv/vdc1]
root      176623  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:77-xfs-conv/vdc1]
root      176624  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:78-xfs-conv/vdc1]
root      176625  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:79-xfs-conv/vdc1]
root      176626  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:80-xfs-conv/vdc1]
root      176627  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:81-xfs-conv/vdc1]
root      176628  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:82-xfs-conv/vdc1]
root      176629  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:83-xfs-conv/vdc1]
root      176630  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:84-xfs-conv/vdc1]
root      176631  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:85-xfs-conv/vdc1]
root      176632  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:86-xfs-conv/vdc1]
root      176633  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:87-xfs-conv/vdc1]
root      176634  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:88-xfs-conv/vdc1]
root      176635  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:89-xfs-conv/vdc1]
root      176636  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:90-xfs-conv/vdc1]
root      176637  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:91-xfs-conv/vdc1]
root      176638  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:92-xfs-conv/vdc1]
root      176639  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:93-xfs-conv/vdc1]
root      176640  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:94-xfs-conv/vdc1]
root      176641  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:95-xfs-conv/vdc1]
root      176642  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:96-xfs-conv/vdc1]
root      176643  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:97-xfs-conv/vdc1]
root      176644  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:98-xfs-conv/vdc1]
root      176645  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:99-xfs-conv/vdc1]
root      176646  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:100-xfs-conv/vdc1]
root      176647  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:101-xfs-conv/vdc1]
root      176648  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:102-xfs-conv/vdc1]
root      176649  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:103-xfs-conv/vdc1]
root      176650  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:104-xfs-conv/vdc1]
root      176651  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:105-xfs-conv/vdc1]
root      176652  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:106-xfs-conv/vdc1]
root      176653  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:107-xfs-conv/vdc1]
root      176654  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:108-xfs-conv/vdc1]
root      176655  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:109-xfs-conv/vdc1]
root      176656  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:110-xfs-conv/vdc1]
root      176657  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:111-xfs-conv/vdc1]
root      176658  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:112-xfs-conv/vdc1]
root      176659  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:113-xfs-conv/vdc1]
root      176660  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:114-xfs-conv/vdc1]
root      176661  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:115-xfs-conv/vdc1]
root      176662  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:116-xfs-conv/vdc1]
root      176663  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:117-xfs-conv/vdc1]
root      176664  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:118-xfs-conv/vdc1]
root      176665  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:119-xfs-conv/vdc1]
root      176666  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:120-xfs-conv/vdc1]
root      176667  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:121-xfs-conv/vdc1]
root      176668  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:122-xfs-conv/vdc1]
root      176669  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:123-xfs-conv/vdc1]
root      176670  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:124-xfs-conv/vdc1]
root      176671  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:125-xfs-conv/vdc1]
root      176672  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:126-xfs-conv/vdc1]
root      176673  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:127-xfs-conv/vdc1]
root      176674  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:128-xfs-conv/vdc1]
root      176675  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:129-xfs-conv/vdc1]
root      176676  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:130-xfs-conv/vdc1]
root      176677  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:131-xfs-conv/vdc1]
root      176678  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:132-xfs-conv/vdc1]
root      176679  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:133-xfs-conv/vdc1]
root      176680  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:134-xfs-conv/vdc1]
root      176681  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:135-xfs-conv/vdc1]
root      176682  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:136-xfs-conv/vdc1]
root      176683  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:137-xfs-conv/vdc1]
root      176684  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:138-xfs-conv/vdc1]
root      176685  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:139-xfs-conv/vdc1]
root      176686  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:140-xfs-conv/vdc1]
root      176687  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:141-xfs-conv/vdc1]
root      176688  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:142-xfs-conv/vdc1]
root      176689  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:143-xfs-conv/vdc1]
root      176690  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:144-xfs-conv/vdc1]
root      176691  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:145-xfs-conv/vdc1]
root      176692  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:146-xfs-conv/vdc1]
root      176693  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:147-xfs-conv/vdc1]
root      176694  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:148-xfs-conv/vdc1]
root      176695  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:149-xfs-conv/vdc1]
root      176696  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:150-xfs-conv/vdc1]
root      176697  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:151-xfs-conv/vdc1]
root      176698  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:152-xfs-conv/vdc1]
root      176699  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:153-xfs-conv/vdc1]
root      176700  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:154-xfs-conv/vdc1]
root      176701  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:155-xfs-conv/vdc1]
root      176702  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:156-xfs-conv/vdc1]
root      176703  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:157-xfs-conv/vdc1]
root      176704  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:158-xfs-buf/vda1]
root      176705  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:159-xfs-conv/vdc1]
root      176706  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:160-xfs-conv/vdc1]
root      176707  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:161-xfs-conv/vdc1]
root      176708  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:162-xfs-conv/vdc1]
root      176709  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:163-xfs-conv/vdc1]
root      176710  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:164-xfs-conv/vdc1]
root      176711  0.2  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:165-xfs-conv/vda1]
root      176712  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:166-xfs-conv/vdc1]
root      176713  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:167-xfs-conv/vdc1]
root      176714  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:168-xfs-conv/vdc1]
root      176715  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:169-xfs-conv/vdc1]
root      176716  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:170-xfs-conv/vdc1]
root      176717  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:171-xfs-conv/vdc1]
root      176718  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:172-xfs-conv/vdc1]
root      176719  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:173-xfs-conv/vdc1]
root      176720  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:174-xfs-conv/vdc1]
root      176721  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:175-xfs-conv/vdc1]
root      176722  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:176-xfs-conv/vdc1]
root      176723  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:177-xfs-conv/vdc1]
root      176724  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:178-xfs-conv/vdc1]
root      176725  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:179-xfs-conv/vdc1]
root      176726  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:180-xfs-conv/vdc1]
root      176727  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:181-xfs-conv/vdc1]
root      176728  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:182-xfs-conv/vdc1]
root      176729  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:183-xfs-conv/vdc1]
root      176730  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:184-xfs-conv/vdc1]
root      176731  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:185-xfs-conv/vdc1]
root      176732  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:186-xfs-conv/vdc1]
root      176733  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:187-xfs-conv/vdc1]
root      176734  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:188-xfs-conv/vdc1]
root      176735  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:189-xfs-conv/vdc1]
root      176736  0.3  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:190-cgroup_destroy]
root      176737  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:191-xfs-conv/vdc1]
root      176738  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:192-xfs-conv/vdc1]
root      176739  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:193-xfs-conv/vdc1]
root      176740  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:194-xfs-conv/vdc1]
root      176741  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:195-xfs-conv/vdc1]
root      176742  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:196-xfs-conv/vdc1]
root      176743  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:197-xfs-conv/vdc1]
root      176744  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:198-xfs-conv/vdc1]
root      176745  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:199-xfs-conv/vdc1]
root      176746  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:200-xfs-conv/vdc1]
root      176747  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:201-xfs-conv/vdc1]
root      176748  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:202-xfs-conv/vdc1]
root      176749  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:203-xfs-conv/vdc1]
root      176750  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:204-xfs-conv/vdc1]
root      176751  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:205-xfs-conv/vdc1]
root      176752  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:206-xfs-buf/vda1]
root      176753  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:207-xfs-conv/vdc1]
root      176754  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:208-xfs-conv/vdc1]
root      176755  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:209-xfs-conv/vdc1]
root      176756  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:210-xfs-conv/vdc1]
root      176757  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:211-xfs-conv/vdc1]
root      176758  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:212-xfs-conv/vdc1]
root      176759  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:213-xfs-conv/vdc1]
root      176760  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:214-xfs-conv/vdc1]
root      176761  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:215-xfs-conv/vdc1]
root      176762  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:216-xfs-conv/vdc1]
root      176763  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:217-xfs-conv/vdc1]
root      176764  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:218-xfs-conv/vdc1]
root      176765  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:219-xfs-conv/vdc1]
root      176766  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:220-xfs-conv/vdc1]
root      176767  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:221-xfs-conv/vdc1]
root      176768  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:222-xfs-conv/vdc1]
root      176769  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:223-xfs-conv/vdc1]
root      176770  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:224-xfs-conv/vdc1]
root      176771  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:225-xfs-conv/vdc1]
root      176772  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:226-xfs-conv/vdc1]
root      176773  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:227-xfs-conv/vdc1]
root      176774  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:228-xfs-conv/vdc1]
root      176775  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:229-xfs-conv/vdc1]
root      176776  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:230-xfs-conv/vdc1]
root      176777  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:231-xfs-conv/vdc1]
root      176778  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:232-xfs-conv/vdc1]
root      176779  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:233-xfs-conv/vdc1]
root      176780  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:234-xfs-conv/vdc1]
root      176781  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:235-xfs-conv/vdc1]
root      176782  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:236-xfs-conv/vdc1]
root      176783  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:237-xfs-conv/vdc1]
root      176784  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:238-xfs-conv/vdc1]
root      176785  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:239-xfs-conv/vdc1]
root      176786  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:240-xfs-conv/vdc1]
root      176787  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:241-xfs-conv/vdc1]
root      176788  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:242-xfs-conv/vdc1]
root      176789  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:243-xfs-conv/vdc1]
root      176790  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:244-xfs-conv/vdc1]
root      176791  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:245-xfs-conv/vdc1]
root      176792  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:246-xfs-conv/vdc1]
root      176793  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:247-xfs-conv/vdc1]
root      176794  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:248-xfs-conv/vdc1]
root      176795  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:249-xfs-conv/vdc1]
root      176796  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:250-xfs-conv/vdc1]
root      176797  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:251-xfs-conv/vdc1]
root      176798  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:252-xfs-conv/vdc1]
root      176799  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:253-xfs-conv/vdc1]
root      176800  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:254-xfs-conv/vdc1]
root      176801  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:255-xfs-conv/vdc1]
root      176802  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:256-xfs-buf/vda1]
root      176803  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/0:257-xfs-conv/vdc1]
root      176804  0.0  0.0      0     0 ?        I    10:43   0:00  \_ [kworker/u4:7-events_unbound]
root      176813  0.0  0.0      0     0 ?        I<   10:44   0:00  \_ [kworker/0:2H-kblockd]
root      176814  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/u4:8-events_unbound]
root      176815  0.0  0.0      0     0 ?        I    10:44   0:00  \_ [kworker/0:258]
root           1  0.0  0.3  21852 13056 ?        Ss   Oct08   0:19 /run/current-system/systemd/lib/systemd/systemd
root         399  0.0  1.8 139764 75096 ?        Ss   Oct08   0:13 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-journald
root         455  0.0  0.2  33848  8168 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-udevd
systemd+     811  0.0  0.1  16800  6660 ?        Ss   Oct08   0:10 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-oomd
systemd+     816  0.0  0.1  91380  7952 ?        Ssl  Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-timesyncd
root         837  0.0  0.0  80596  3288 ?        Ssl  Oct08   1:37 /nix/store/ag3xk1l8ij06vx434abk8643f8p7i08c-qemu-host-cpu-only-8.2.6-ga/bin/qemu-ga --statedir /run/qemu-ga
root         840  0.0  0.0 226896  1984 ?        Ss   Oct08   0:00 /nix/store/k34f0d079arcgfjsq78gpkdbd6l6nnq4-cron-4.1/bin/cron -n
message+     850  0.0  0.1  13776  6080 ?        Ss   Oct08   0:05 /nix/store/0hm8vh65m378439kl16xv0p6l7c51asj-dbus-1.14.10/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         876  0.0  0.1  17468  7968 ?        Ss   Oct08   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd-logind
nscd        1074  0.0  0.1 555748  6016 ?        Ssl  Oct08   0:28 /nix/store/zza9hvd6iawqdcxvinf4yxv580av3s9f-nsncd-unstable-2024-01-16/bin/nsncd
telegraf    1092  0.3  3.4 6344672 138484 ?      S<Lsl Oct08  13:05 /nix/store/8bnbkyh26j97l0pw02gb7lngh4n6k3r5-telegraf-1.30.3/bin/telegraf -config /nix/store/nh4k7bx1asm0kn1klhbmg52wk1qdcwpw-config.toml -config-directory /nix/store/dj77wnb5j
root        1093  0.0  1.5 1109328 60864 ?       Ssl  Oct08   2:24 /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/fn9jcsr2kp2kq3m2qd6qrkv6xh7jcj5g-fail2ban-1.0.2/bin/.fail2ban-server-wrapped -xf start
sensucl+    1094  0.0  0.9 898112 38340 ?        Ssl  Oct08   1:41 /nix/store/qqc6v89xn0g2w123wx85blkpc4pz2ags-ruby-2.7.8/bin/ruby /nix/store/dpvf0jdq1mbrdc90aapyrn2wvjbpckyv-sensu-check-env/bin/sensu-client -L warn -c /nix/store/ly677hg5b7szz
root        1098  0.0  0.1  11564  7568 ?        Ss   Oct08   0:00 sshd: /nix/store/1m888byzaqaig6azrrfpmjdyhgfliaga-openssh-9.7p1/bin/sshd -D -f /etc/ssh/sshd_config [listener] 0 of 10-100 startups
root      176967  0.0  0.2  14380  9840 ?        Ss   10:47   0:00  \_ sshd: ctheune [priv]
ctheune   176988  0.2  0.1  14540  5856 ?        S    10:47   0:00      \_ sshd: ctheune@pts/0
ctheune   176992  0.0  0.1 230756  5968 pts/0    Ss   10:47   0:00          \_ -bash
root      176998  0.0  0.0 228796  3956 pts/0    S+   10:47   0:00              \_ sudo -i
root      177001  0.0  0.0 228796  1604 pts/1    Ss   10:47   0:00                  \_ sudo -i
root      177002  0.0  0.1 230892  6064 pts/1    S    10:47   0:00                      \_ -bash
root      177075  0.0  0.1 232344  4048 pts/1    R+   10:48   0:00                          \_ ps auxf
root        1101  0.0  0.0 226928  1944 tty1     Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login --noclear --keep-baud tty1 115200,38400,9600 linux
root        1102  0.0  0.0 226928  2192 ttyS0    Ss+  Oct08   0:00 agetty --login-program /nix/store/gwihsgkd13xmk8vwfn2k1nkdi9bys42x-shadow-4.14.6/bin/login ttyS0 --keep-baud vt220
_du4651+    1105  0.0  2.2 2505204 90952 ?       Ssl  Oct08   1:15 /nix/store/ff5j2is3di7praysyv232wfvcq7hvkii-filebeat-oss-7.17.16/bin/filebeat -e -c /nix/store/xlb56lv0f3j03l3v34x5jfvq8wng18ww-filebeat-journal-services19.gocept.net.json -pat
mysql       2809  0.3 18.6 4784932 750856 ?      Ssl  Oct08  11:47 /nix/store/9iq211dy95nqn484nx5z5mv3c7pc2h27-percona-server_lts-8.0.36-28/bin/mysqld --defaults-extra-file=/nix/store/frvxmffp9fpgq06bx89rgczyn6k6i51y-my.cnf --user=mysql --data
root      176527  0.0  0.0 227904  3236 ?        SNs  10:43   0:00 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/s8g5ls9d611hjq5psyd15sqbpqgrlwck-unit-script-fc-agent-start/bin/fc-agent-start
root      176535  0.0  1.1 279068 46452 ?        SN   10:43   0:00  \_ /nix/store/h723hb9m43lybmvfxkk6n7j4v664qy7b-python3-3.11.9/bin/python3.11 /nix/store/gavi1rlv3ja79vl5hg3lgh07absa8yb9-python3.11-fc-agent-1.0/bin/.fc-manage-wrapped --enc-p
root      176536  3.2  1.8 635400 72368 ?        DNl  10:43   0:09      \_ nix-build --no-build-output <nixpkgs/nixos> -A system -I https://hydra.flyingcircus.io/build/496886/download/1/nixexprs.tar.xz --out-link /run/fc-agent-built-system
ctheune   176972  0.0  0.2  20028 11856 ?        Ss   10:47   0:00 /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/lib/systemd/systemd --user
ctheune   176974  0.0  0.0  20368  3004 ?        S    10:47   0:00  \_ (sd-pam)


[218342.027043] systemd[1]: fc-agent.service: Deactivated successfully.
[218342.027658] systemd[1]: Finished Flying Circus Management Task.
[218342.028479] systemd[1]: fc-agent.service: Consumed 17.942s CPU time, received 28.8M IP traffic, sent 133.3K IP traffic.

[218331.821045432] (no further output from walker.py)

[-- Attachment #10: Type: text/plain, Size: 281 bytes --]



-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-11  9:08                                             ` Christian Theune
@ 2024-10-11 13:06                                               ` Chris Mason
  2024-10-11 13:50                                                 ` Christian Theune
  2024-10-12 17:01                                                 ` Linus Torvalds
  0 siblings, 2 replies; 81+ messages in thread
From: Chris Mason @ 2024-10-11 13:06 UTC (permalink / raw)
  To: Christian Theune
  Cc: Linus Torvalds, Dave Chinner, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions



On 10/11/24 5:08 AM, Christian Theune wrote:
> 
>> On 11. Oct 2024, at 09:27, Christian Theune <ct@flyingcircus.io> wrote:
>>
>> I’m going to gather a few more instances during the day and will post them as a batch later.
> 
> I’ve received 8 alerts in the last hours and managed to get detailed, repeated walker output from two of them:
> 
> - FC-41287.log
> - FC-41289.log

These are really helpful.

If io throttling were the cause, the traces should also have a process
that's waiting to submit the IO, but that's not present here.

Another common pattern is hung tasks caused by a process stuck in the
kernel burning CPU while holding a lock or otherwise being responsible
for waking the hung task.  Your process listings don't show that either.

One part I wanted to mention:

[820710.974122] Future hung task reports are suppressed, see sysctl
kernel.hung_task_warnings

By default you only get 10 or so hung task notifications per boot, and
after that they are suppressed.  So if, for example, you're watching a
count of hung task messages across a lot of machines and concluding that
things are pretty stable because you're not seeing hung task messages
anymore, the kernel might have just stopped complaining.

This isn't exactly new kernel behavior, but it can be a surprise.
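
As a side note, the counter can be re-armed so that later stalls keep
getting reported: writing -1 to the sysctl means "unlimited warnings"
(see hung_task_warnings in Documentation/admin-guide/sysctl/kernel.rst).
A minimal userspace sketch, purely illustrative and not part of the
original report:

#include <stdio.h>

/* Re-arm the hung-task warning counter; -1 = report an unlimited
 * number of warnings. */
int main(void)
{
	FILE *f = fopen("/proc/sys/kernel/hung_task_warnings", "w");

	if (!f) {
		perror("hung_task_warnings");
		return 1;
	}
	fputs("-1\n", f);
	return fclose(f) ? 1 : 0;
}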

Anyway, this leaves me with ~3 theories:

- Linus's starvation observation.  It doesn't feel like there's enough
load to cause this, especially given that we're sitting in truncate,
where it should be pretty unlikely to have multiple procs banging on the
page in question.

- Willy's folio->mapping check idea.  I _think_ this is also wrong: the
reference counts we have in the truncate path check folio->mapping
before returning, and we shouldn't be able to reuse the folio in a
different mapping while we have the reference held.

If this is the problem, it would mean our original bug is not fully
fixed.  But the fact that you're not seeing other problems, and that
these hung tasks do resolve, should mean we're ok.  We can add a printk
or just run a drgn script to check.

- It's actually taking the IO a long time to finish.  We can poke at the
pending requests, how does the device look in the VM?  (virtio, scsi etc).
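
For the last point, one quick way to poke at the pending requests from
inside the guest is the per-device inflight counter in sysfs
(/sys/block/<dev>/inflight holds the number of reads and writes
currently in flight). A rough sketch; the device name vdc is only an
assumption taken from the process listings above:

#include <stdio.h>

/* Print in-flight request counts for one block device. */
int main(void)
{
	unsigned long reads, writes;
	FILE *f = fopen("/sys/block/vdc/inflight", "r");

	if (!f) {
		perror("/sys/block/vdc/inflight");
		return 1;
	}
	if (fscanf(f, "%lu %lu", &reads, &writes) == 2)
		printf("vdc in-flight: %lu reads, %lu writes\n",
		       reads, writes);
	fclose(f);
	return 0;
}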

-chris


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-11 13:06                                               ` Chris Mason
@ 2024-10-11 13:50                                                 ` Christian Theune
  2024-10-12 17:01                                                 ` Linus Torvalds
  1 sibling, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-10-11 13:50 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Dave Chinner, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

Hi,

> On 11. Oct 2024, at 15:06, Chris Mason <clm@meta.com> wrote:
> 
> - It's actually taking the IO a long time to finish.  We can poke at the
> pending requests, how does the device look in the VM?  (virtio, scsi etc).

I _think_ that’s not it. This is a Qemu w/ virtio-block + Ceph stack with 2x10G networking, fully SSD-backed. Over the last 24 hours, operation latency has stayed below 0.016ms and Ceph’s slow request warning (30s limit) has not triggered.

Also, aside from a VM that was exhausting its Qemu io throttling for a minute (and was stuck in completely different tracebacks), the only blocked task reports from the last 48 hours were for this specific process.

I’d expect that we’d see a lot more reports about IO issues from multiple VMs and multiple loads at the same time when the storage misbehaves. We did experience those in the distant past, with older Ceph versions and spinning rust, so I’m pretty confident (at the moment) that this isn’t a storage issue per se.

Incidentally, this now reminds me of a different (or maybe not so different?) issue that I’ve been trying to track down with mdraid/xfs:
https://marc.info/?l=linux-raid&m=172295385102939&w=2

So far this has only been observed on an older kernel (5.15.138), and we ended up seeing IOPS stuck in the md device but not below it. However, MD isn’t involved here. I made the connection because the original traceback also shows the task stuck in “wait_on_page_writeback”, but maybe that’s a red herring:

[Aug 6 09:35] INFO: task .backy-wrapped:2615 blocked for more than 122 seconds.
[ +0.008130] Not tainted 5.15.138 #1-NixOS
[ +0.005194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.008895] task:.backy-wrapped state:D stack: 0 pid: 2615 ppid: 1 flags:0x00000002
[ +0.000005] Call Trace:
[ +0.000002] <TASK>
[ +0.000004] __schedule+0x373/0x1580
[ +0.000009] ? xlog_cil_commit+0x559/0x880 [xfs]
[ +0.000041] schedule+0x5b/0xe0
[ +0.000001] io_schedule+0x42/0x70
[ +0.000001] wait_on_page_bit_common+0x119/0x380
[ +0.000005] ? __page_cache_alloc+0x80/0x80
[ +0.000002] wait_on_page_writeback+0x22/0x70
[ +0.000001] truncate_inode_pages_range+0x26f/0x6d0
[ +0.000006] evict+0x15f/0x180
[ +0.000003] __dentry_kill+0xde/0x170
[ +0.000001] dput+0x15b/0x330
[ +0.000002] do_renameat2+0x34e/0x5b0
[ +0.000003] __x64_sys_rename+0x3f/0x50
[ +0.000002] do_syscall_64+0x3a/0x90
[ +0.000002] entry_SYSCALL_64_after_hwframe+0x62/0xcc
[ +0.000003] RIP: 0033:0x7fdd1885275b
[ +0.000002] RSP: 002b:00007ffde643ad18 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ +0.000002] RAX: ffffffffffffffda RBX: 00007ffde643adb0 RCX: 00007fdd1885275b
[ +0.000001] RDX: 0000000000000000 RSI: 00007fdd09a3d3d0 RDI: 00007fdd098549d0
[ +0.000001] RBP: 00007ffde643ad60 R08: 00000000ffffffff R09: 0000000000000000
[ +0.000001] R10: 00007ffde643af90 R11: 0000000000000246 R12: 00000000ffffff9c
[ +0.000000] R13: 00000000ffffff9c R14: 000000000183cab0 R15: 00007fdd0b128810
[ +0.000001] </TASK>
[ +0.000011] INFO: task kworker/u64:0:2380262 blocked for more than 122 seconds.
[ +0.008309] Not tainted 5.15.138 #1-NixOS
[ +0.005190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.008895] task:kworker/u64:0 state:D stack: 0 pid:2380262 ppid: 2 flags:0x00004000
[ +0.000004] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ +0.000006] Call Trace:
[ +0.000001] <TASK>
[ +0.000001] __schedule+0x373/0x1580
[ +0.000003] schedule+0x5b/0xe0
[ +0.000001] md_bitmap_startwrite+0x177/0x1e0
[ +0.000004] ? finish_wait+0x90/0x90
[ +0.000004] add_stripe_bio+0x449/0x770 [raid456]
[ +0.000005] raid5_make_request+0x1cf/0xbd0 [raid456]
[ +0.000003] ? kmem_cache_alloc_node_trace+0x391/0x3e0
[ +0.000004] ? linear_map+0x44/0x90 [dm_mod]
[ +0.000005] ? finish_wait+0x90/0x90
[ +0.000001] ? __blk_queue_split+0x516/0x580
[ +0.000003] md_handle_request+0x122/0x1b0
[ +0.000003] md_submit_bio+0x6e/0xb0
[ +0.000001] __submit_bio+0x18f/0x220
[ +0.000002] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ +0.000002] submit_bio_noacct+0xbe/0x2d0
[ +0.000001] kcryptd_crypt+0x392/0x550 [dm_crypt]
[ +0.000002] process_one_work+0x1d6/0x360
[ +0.000003] worker_thread+0x4d/0x3b0
[ +0.000002] ? process_one_work+0x360/0x360
[ +0.000001] kthread+0x118/0x140
[ +0.000001] ? set_kthread_struct+0x50/0x50
[ +0.000001] ret_from_fork+0x22/0x30
[ +0.000004] </TASK>
…(more md kworker tasks pile up here)

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-11 13:06                                               ` Chris Mason
  2024-10-11 13:50                                                 ` Christian Theune
@ 2024-10-12 17:01                                                 ` Linus Torvalds
  2024-12-02 10:44                                                   ` Christian Theune
  1 sibling, 1 reply; 81+ messages in thread
From: Linus Torvalds @ 2024-10-12 17:01 UTC (permalink / raw)
  To: Chris Mason
  Cc: Christian Theune, Dave Chinner, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao,
	regressions, regressions

[-- Attachment #1: Type: text/plain, Size: 3012 bytes --]

On Fri, 11 Oct 2024 at 06:06, Chris Mason <clm@meta.com> wrote:
>
> - Linus's starvation observation.  It doesn't feel like there's enough
> load to cause this, especially given us sitting in truncate, where it
> should be pretty unlikely to have multiple procs banging on the page in
> question.

Yeah, I think the starvation can only possibly happen in
fdatasync-like paths where it's waiting for existing writeback without
holding the page lock. And while Christian has had those backtraces
too, the truncate path is not one of them.

That said, just because I wanted to see how nasty it is, I looked into
changing the rules for folio_wake_bit().

Christian, just to clarify, this is not for you to test - this is
very experimental - but maybe Willy has comments on it.

Because it *might* be possible to do something like the attached,
where we change the page flags atomically and without any locks if
there are no waiters, but if there is a waiter on the page we always
clear the page flag bit atomically under the waitqueue lock as we wake
up the waiter.

I changed the name (and the return value) of the
folio_xor_flags_has_waiters() function to just not have any
possibility of semantic mixup, but basically instead of doing the xor
atomically and unconditionally (and returning whether we had waiters),
it now does it conditionally only if we do *not* have waiters, and
returns true if successful.

And if there were waiters, it moves the flag clearing into the wakeup function.

That in turn means that the "while writeback" loop can go back to being
just a non-looping "if writeback", and folio_wait_writeback() can't
get into any starvation with new writebacks always showing up.
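
For reference, the waiter in mm/page-writeback.c is roughly the loop
below (tracepoint omitted). The re-check is what lets new writebacks
keep a waiter trapped; with the flag cleared under the waitqueue lock
at wakeup time, a single test would do. This is only a sketch of the
idea, not a hunk from the attached patch:

/* before: loop, because writeback may start again between the wakeup
 * and the re-test */
void folio_wait_writeback(struct folio *folio)
{
	while (folio_test_writeback(folio))
		folio_wait_bit(folio, PG_writeback);
}

/* after (sketch): wait for at most the writeback that was already in
 * progress when we were called */
void folio_wait_writeback(struct folio *folio)
{
	if (folio_test_writeback(folio))
		folio_wait_bit(folio, PG_writeback);
}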

The reason I say it *might* be possible to do something like this is
that it changes __folio_end_writeback() to no longer necessarily clear
the writeback bit under the XA lock. If there are waiters, we'll clear
it later (after releasing the lock) in the caller.

Willy? What do you think? Clearly this now makes PG_writeback not
synchronized with the PAGECACHE_TAG_WRITEBACK tag, but the reason I
think it might be ok is that the code that *sets* the PG_writeback bit
in __folio_start_writeback() only ever starts with a page that isn't
under writeback, and has a

        VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);

at the top of the function even outside the XA lock. So I don't think
these *need* to be synchronized under the XA lock, and I think the
folio flag wakeup atomicity might be more important than the XA
writeback tag vs folio writeback bit.

But I'm not going to really argue for this patch at all - I wanted to
look at how bad it was, I wrote it, I'm actually running it on my
machine now and it didn't *immediately* blow up in my face, so it
*may* work just fine.

The patch is fairly simple, and apart from the XA tagging issue it
seems very straightforward. I'm just not sure it's worth synchronizing
one part only to de-synchronize another at the same time.

                   Linus

[-- Attachment #2: 0001-Test-atomic-folio-bit-waiting.patch --]
[-- Type: text/x-patch, Size: 5519 bytes --]

From 9d4f0d60abc4dce5b7cfbad4576a2829832bb838 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 12 Oct 2024 09:34:24 -0700
Subject: [PATCH] Test atomic folio bit waiting

---
 include/linux/page-flags.h | 26 ++++++++++++++++----------
 mm/filemap.c               | 28 ++++++++++++++++++++++++++--
 mm/page-writeback.c        |  6 +++---
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1b3a76710487..b30a73e1c2c7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -730,22 +730,28 @@ TESTPAGEFLAG_FALSE(Ksm, ksm)
 u64 stable_page_flags(const struct page *page);
 
 /**
- * folio_xor_flags_has_waiters - Change some folio flags.
+ * folio_xor_flags_no_waiters - Change folio flags if no waiters
  * @folio: The folio.
- * @mask: Bits set in this word will be changed.
+ * @mask: Which flags to change.
  *
- * This must only be used for flags which are changed with the folio
- * lock held.  For example, it is unsafe to use for PG_dirty as that
- * can be set without the folio lock held.  It can also only be used
- * on flags which are in the range 0-6 as some of the implementations
- * only affect those bits.
+ * This does the optimistic fast case of changing page flag bits
+ * when there are no waiters. Only flags in the first word can be
+ * modified, and the old value must be stable (typically this clears
+ * the locked or writeback bit or similar).
  *
- * Return: Whether there are tasks waiting on the folio.
+ * Return: true if it succeeded
  */
-static inline bool folio_xor_flags_has_waiters(struct folio *folio,
+static inline bool folio_xor_flags_no_waiters(struct folio *folio,
 		unsigned long mask)
 {
-	return xor_unlock_is_negative_byte(mask, folio_flags(folio, 0));
+	const unsigned long waiter_mask = 1ul << PG_waiters;
+	unsigned long *flags = folio_flags(folio, 0);
+	unsigned long val = READ_ONCE(*flags);
+	do {
+		if (val & waiter_mask)
+			return false;
+	} while (!try_cmpxchg_release(flags, &val, val ^ mask));
+	return true;
 }
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 664e607a71ea..5fbaf6cea964 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1164,6 +1164,14 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	return (flags & WQ_FLAG_EXCLUSIVE) != 0;
 }
 
+/*
+ * Clear the folio bit and wake waiters atomically under
+ * the folio waitqueue lock.
+ *
+ * Note that the fast-path alternative to calling this is
+ * to atomically clear the bit and check that the PG_waiters
+ * bit was not set.
+ */
 static void folio_wake_bit(struct folio *folio, int bit_nr)
 {
 	wait_queue_head_t *q = folio_waitqueue(folio);
@@ -1175,6 +1183,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	key.page_match = 0;
 
 	spin_lock_irqsave(&q->lock, flags);
+	clear_bit_unlock(bit_nr, folio_flags(folio, 0));
 	__wake_up_locked_key(q, TASK_NORMAL, &key);
 
 	/*
@@ -1507,7 +1516,7 @@ void folio_unlock(struct folio *folio)
 	BUILD_BUG_ON(PG_waiters != 7);
 	BUILD_BUG_ON(PG_locked > 7);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	if (folio_xor_flags_has_waiters(folio, 1 << PG_locked))
+	if (!folio_xor_flags_no_waiters(folio, 1 << PG_locked))
 		folio_wake_bit(folio, PG_locked);
 }
 EXPORT_SYMBOL(folio_unlock);
@@ -1535,10 +1544,25 @@ void folio_end_read(struct folio *folio, bool success)
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_uptodate(folio), folio);
 
+	/*
+	 * Try to clear 'locked' at the same time as setting 'uptodate'
+	 *
+	 * Note that if we have lock bit waiters and this fast-case fails,
+	 * we'll have to clear the lock bit atomically under the folio wait
+	 * queue lock, so then we'll set 'uptodate' separately.
+	 *
+	 * Note that this is purely an "avoid multiple atomics in the
+	 * common case" optimization - while the locked bit needs to be cleared
+	 * synchronously wrt waiters, the uptodate bit has no such
+	 * requirements.
+	 */
 	if (likely(success))
 		mask |= 1 << PG_uptodate;
-	if (folio_xor_flags_has_waiters(folio, mask))
+	if (!folio_xor_flags_no_waiters(folio, mask)) {
+		if (success)
+			set_bit(PG_uptodate, folio_flags(folio, 0));
 		folio_wake_bit(folio, PG_locked);
+	}
 }
 EXPORT_SYMBOL(folio_end_read);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fcd4c1439cb9..3277bc3ceff9 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -3081,7 +3081,7 @@ bool __folio_end_writeback(struct folio *folio)
 		unsigned long flags;
 
 		xa_lock_irqsave(&mapping->i_pages, flags);
-		ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
+		ret = !folio_xor_flags_no_waiters(folio, 1 << PG_writeback);
 		__xa_clear_mark(&mapping->i_pages, folio_index(folio),
 					PAGECACHE_TAG_WRITEBACK);
 		if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT) {
@@ -3099,7 +3099,7 @@ bool __folio_end_writeback(struct folio *folio)
 
 		xa_unlock_irqrestore(&mapping->i_pages, flags);
 	} else {
-		ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
+		ret = !folio_xor_flags_no_waiters(folio, 1 << PG_writeback);
 	}
 
 	lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr);
@@ -3184,7 +3184,7 @@ EXPORT_SYMBOL(__folio_start_writeback);
  */
 void folio_wait_writeback(struct folio *folio)
 {
-	while (folio_test_writeback(folio)) {
+	if (folio_test_writeback(folio)) {
 		trace_folio_wait_writeback(folio, folio_mapping(folio));
 		folio_wait_bit(folio, PG_writeback);
 	}
-- 
2.46.1.608.gc56f2c11c8


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
  2024-10-12 17:01                                                 ` Linus Torvalds
@ 2024-12-02 10:44                                                   ` Christian Theune
  0 siblings, 0 replies; 81+ messages in thread
From: Christian Theune @ 2024-12-02 10:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Dave Chinner, Matthew Wilcox, Jens Axboe, linux-mm,
	linux-xfs, linux-fsdevel, linux-kernel, Daniel Dao, regressions,
	regressions

Hi,

Waking this thread up again: we’ve been running the original fix on top of 6.11 for roughly 8 weeks now and have not had a single occurrence of this. I’d be willing to call this fixed.

@Linus: we didn’t specify an actual deadline, but I guess 8 weeks without any hit is good enough?

My plan would be to migrate our fleet to 6.6 now. AFAICT the relevant patch series is the one at
https://lore.kernel.org/all/20240415171857.19244-4-ryncsn@gmail.com/T/#u and it was released in 6.6.54.

I’d like to revive the discussion on the second issue, though: it ended with Linus’ last post,
and I couldn’t find whether it has been followed up elsewhere or still needs to be worked on.

Christian

Kind regards,
Christian Theune

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Managing Directors: Christian Theune, Christian Zagrodnick



^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, newest message: 2024-12-02 10:44 UTC

Thread overview: 81+ messages
2024-09-12 21:18 Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards) Christian Theune
2024-09-12 21:55 ` Matthew Wilcox
2024-09-12 22:11   ` Christian Theune
2024-09-12 22:12   ` Jens Axboe
2024-09-12 22:25     ` Linus Torvalds
2024-09-12 22:30       ` Jens Axboe
2024-09-12 22:56         ` Linus Torvalds
2024-09-13  3:44           ` Matthew Wilcox
2024-09-13 13:23             ` Christian Theune
2024-09-13 12:11       ` Christian Brauner
2024-09-16 13:29         ` Matthew Wilcox
2024-09-18  9:51           ` Christian Brauner
2024-09-13 15:30       ` Chris Mason
2024-09-13 15:51         ` Matthew Wilcox
2024-09-13 16:33           ` Chris Mason
2024-09-13 18:15             ` Matthew Wilcox
2024-09-13 21:24               ` Linus Torvalds
2024-09-13 21:30                 ` Matthew Wilcox
2024-09-13 16:04       ` David Howells
2024-09-13 16:37         ` Chris Mason
2024-09-16  0:00       ` Dave Chinner
2024-09-16  4:20         ` Linus Torvalds
2024-09-16  8:47           ` Chris Mason
2024-09-17  9:32             ` Matthew Wilcox
2024-09-17  9:36               ` Chris Mason
2024-09-17 10:11               ` Christian Theune
2024-09-17 11:13               ` Chris Mason
2024-09-17 13:25                 ` Matthew Wilcox
2024-09-18  6:37                   ` Jens Axboe
2024-09-18  9:28                     ` Chris Mason
2024-09-18 12:23                       ` Chris Mason
2024-09-18 13:34                       ` Matthew Wilcox
2024-09-18 13:51                         ` Linus Torvalds
2024-09-18 14:12                           ` Matthew Wilcox
2024-09-18 14:39                             ` Linus Torvalds
2024-09-18 17:12                               ` Matthew Wilcox
2024-09-18 16:37                             ` Chris Mason
2024-09-19  1:43                         ` Dave Chinner
2024-09-19  3:03                           ` Linus Torvalds
2024-09-19  3:12                             ` Linus Torvalds
2024-09-19  3:38                               ` Jens Axboe
2024-09-19  4:32                                 ` Linus Torvalds
2024-09-19  4:42                                   ` Jens Axboe
2024-09-19  4:36                                 ` Matthew Wilcox
2024-09-19  4:46                                   ` Jens Axboe
2024-09-19  5:20                                     ` Jens Axboe
2024-09-19  4:46                                   ` Linus Torvalds
2024-09-20 13:54                                   ` Chris Mason
2024-09-24 15:58                                     ` Matthew Wilcox
2024-09-24 17:16                                     ` Sam James
2024-09-25 16:06                                       ` Kairui Song
2024-09-25 16:42                                         ` Christian Theune
2024-09-27 14:51                                         ` Sam James
2024-09-27 14:58                                           ` Jens Axboe
2024-10-01 21:10                                             ` Kairui Song
2024-09-24 19:17                                     ` Chris Mason
2024-09-24 19:24                                       ` Linus Torvalds
2024-09-19  6:34                               ` Christian Theune
2024-09-19  6:57                                 ` Linus Torvalds
2024-09-19 10:19                                   ` Christian Theune
2024-09-30 17:34                                     ` Christian Theune
2024-09-30 18:46                                       ` Linus Torvalds
2024-09-30 19:25                                         ` Christian Theune
2024-09-30 20:12                                           ` Linus Torvalds
2024-09-30 20:56                                             ` Matthew Wilcox
2024-09-30 22:42                                               ` Davidlohr Bueso
2024-09-30 23:00                                                 ` Davidlohr Bueso
2024-09-30 23:53                                               ` Linus Torvalds
2024-10-01  0:56                                       ` Chris Mason
2024-10-01  7:54                                         ` Christian Theune
2024-10-10  6:29                                         ` Christian Theune
2024-10-11  7:27                                           ` Christian Theune
2024-10-11  9:08                                             ` Christian Theune
2024-10-11 13:06                                               ` Chris Mason
2024-10-11 13:50                                                 ` Christian Theune
2024-10-12 17:01                                                 ` Linus Torvalds
2024-12-02 10:44                                                   ` Christian Theune
2024-10-01  2:22                                       ` Dave Chinner
2024-09-16  7:14         ` Christian Theune
2024-09-16 12:16           ` Matthew Wilcox
2024-09-18  8:31           ` Christian Theune
