* gfs2 is unhappy on pagecache/for-next

From: Christoph Hellwig @ 2022-06-19  7:05 UTC
To: Bob Peterson, Andreas Gruenbacher, Matthew Wilcox
Cc: cluster-devel, linux-mm

When trying to run xfstests on gfs2 (locally with the lock_nolock
cluster manager) the first mount already hits this warning in
inode_to_wb, called from mark_buffer_dirty.  This all seems to be
standard code from folio_account_dirtied, so I'm not sure what is
going on there.

[ 30.440408] ------------[ cut here ]------------
[ 30.440409] WARNING: CPU: 1 PID: 931 at include/linux/backing-dev.h:261 __folio_mark_dirty+0x2f0/0x380
[ 30.446424] Modules linked in:
[ 30.446828] CPU: 1 PID: 931 Comm: kworker/1:2 Not tainted 5.19.0-rc2+ #1702
[ 30.447714] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 30.448770] Workqueue: gfs_recovery gfs2_recover_func
[ 30.449441] RIP: 0010:__folio_mark_dirty+0x2f0/0x380
[ 30.450113] Code: e8 b5 69 12 01 85 c0 0f 85 6a fe ff ff 48 8b 83 a8 01 00 00 be ff ff ff ff 48 8d 78 2
[ 30.452490] RSP: 0018:ffffc90001b77bd0 EFLAGS: 00010046
[ 30.453141] RAX: 0000000000000000 RBX: ffff8881004a3d00 RCX: 0000000000000001
[ 30.454067] RDX: 0000000000000000 RSI: ffffffff82f592db RDI: ffffffff830380ae
[ 30.454970] RBP: ffffea000455f680 R08: 0000000000000001 R09: ffffffff84747570
[ 30.455921] R10: 0000000000000017 R11: ffff88810260b1c0 R12: 0000000000000282
[ 30.456910] R13: ffff88810dd92170 R14: 0000000000000001 R15: 0000000000000001
[ 30.457871] FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[ 30.458912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.459608] CR2: 00007efc1d5adc80 CR3: 0000000116416000 CR4: 00000000000006e0
[ 30.460564] Call Trace:
[ 30.460871]  <TASK>
[ 30.461130]  mark_buffer_dirty+0x173/0x1d0
[ 30.461687]  update_statfs_inode+0x146/0x187
[ 30.462276]  gfs2_recover_func.cold+0x48f/0x864
[ 30.462875]  ? add_lock_to_list+0x8b/0xf0
[ 30.463337]  ? __lock_acquire+0xf7e/0x1e30
[ 30.463812]  ? lock_acquire+0xd4/0x300
[ 30.464267]  ? lock_acquire+0xe4/0x300
[ 30.464715]  ? gfs2_recover_func.cold+0x217/0x864
[ 30.465334]  process_one_work+0x239/0x550
[ 30.465920]  ? process_one_work+0x550/0x550
[ 30.466485]  worker_thread+0x4d/0x3a0
[ 30.466966]  ? process_one_work+0x550/0x550
[ 30.467509]  kthread+0xe2/0x110
[ 30.467941]  ? kthread_complete_and_exit+0x20/0x20
[ 30.468558]  ret_from_fork+0x22/0x30
[ 30.469047]  </TASK>
[ 30.469346] irq event stamp: 36146
[ 30.469796] hardirqs last enabled at (36145): [<ffffffff8139185c>] folio_memcg_lock+0x8c/0x180
[ 30.470919] hardirqs last disabled at (36146): [<ffffffff82429799>] _raw_spin_lock_irqsave+0x59/0x60
[ 30.472024] softirqs last enabled at (33630): [<ffffffff81157307>] __irq_exit_rcu+0xd7/0x130
[ 30.473051] softirqs last disabled at (33619): [<ffffffff81157307>] __irq_exit_rcu+0xd7/0x130
[ 30.474107] ---[ end trace 0000000000000000 ]---
[ 30.475367] ------------[ cut here ]------------
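For context, the assertion firing at include/linux/backing-dev.h:261 is the
lockdep annotation inside inode_to_wb().  The snippet below is an approximate
reconstruction of that check, not a verbatim copy of the header, so treat the
exact condition as an assumption; the point is that the caller must hold one
of the locks that stabilize inode->i_wb:

/*
 * Approximate reconstruction of the inode_to_wb() lockdep check in
 * include/linux/backing-dev.h (paraphrased, not copied from the tree).
 */
static inline struct bdi_writeback *inode_to_wb(const struct inode *inode)
{
#ifdef CONFIG_LOCKDEP
	WARN_ON_ONCE(debug_locks &&
		     (!lockdep_is_held(&inode->i_lock) &&
		      !lockdep_is_held(&inode->i_mapping->i_pages.xa_lock) &&
		      !lockdep_is_held(&inode->i_wb->list_lock)));
#endif
	return inode->i_wb;
}

__folio_mark_dirty() takes the xa_lock of the mapping the folio actually sits
in, so if that mapping is not the host inode's own i_mapping (which appears to
be the situation with gfs2's metadata address space, see Andreas's reply
below), none of the three conditions is visible to lockdep and the warning
trips.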
* Re: gfs2 is unhappy on pagecache/for-next

From: Matthew Wilcox @ 2022-06-19 12:15 UTC
To: Christoph Hellwig
Cc: Bob Peterson, Andreas Gruenbacher, cluster-devel, linux-mm

On Sun, Jun 19, 2022 at 09:05:59AM +0200, Christoph Hellwig wrote:
> When trying to run xfstests on gfs2 (locally with the lock_nolock
> cluster manager) the first mount already hits this warning in
> inode_to_wb, called from mark_buffer_dirty.  This all seems to be
> standard code from folio_account_dirtied, so I'm not sure what is
> going on there.

I don't think this is new to pagecache/for-next.
https://lore.kernel.org/linux-mm/cf8bc8dd-8e16-3590-a714-51203e6f4ba9@redhat.com/
* Re: gfs2 is unhappy on pagecache/for-next

From: Christoph Hellwig @ 2022-06-20  6:21 UTC
To: Matthew Wilcox
Cc: Christoph Hellwig, Bob Peterson, Andreas Gruenbacher, cluster-devel, linux-mm

On Sun, Jun 19, 2022 at 01:15:06PM +0100, Matthew Wilcox wrote:
> On Sun, Jun 19, 2022 at 09:05:59AM +0200, Christoph Hellwig wrote:
> > When trying to run xfstests on gfs2 (locally with the lock_nolock
> > cluster manager) the first mount already hits this warning in
> > inode_to_wb, called from mark_buffer_dirty.  This all seems to be
> > standard code from folio_account_dirtied, so I'm not sure what is
> > going on there.
>
> I don't think this is new to pagecache/for-next.
> https://lore.kernel.org/linux-mm/cf8bc8dd-8e16-3590-a714-51203e6f4ba9@redhat.com/

Indeed, I can reproduce this on mainline as well.  I just didn't
expect a maintained file system to blow up on the very first mount
in xfstests..
* Re: gfs2 is unhappy on pagecache/for-next

From: Andreas Gruenbacher @ 2022-06-20 17:20 UTC
To: Christoph Hellwig
Cc: Matthew Wilcox, Bob Peterson, cluster-devel, Linux-MM

On Mon, Jun 20, 2022 at 8:21 AM Christoph Hellwig <hch@lst.de> wrote:
> On Sun, Jun 19, 2022 at 01:15:06PM +0100, Matthew Wilcox wrote:
> > On Sun, Jun 19, 2022 at 09:05:59AM +0200, Christoph Hellwig wrote:
> > > When trying to run xfstests on gfs2 (locally with the lock_nolock
> > > cluster manager) the first mount already hits this warning in
> > > inode_to_wb, called from mark_buffer_dirty.  This all seems to be
> > > standard code from folio_account_dirtied, so I'm not sure what is
> > > going on there.
> >
> > I don't think this is new to pagecache/for-next.
> > https://lore.kernel.org/linux-mm/cf8bc8dd-8e16-3590-a714-51203e6f4ba9@redhat.com/
>
> Indeed, I can reproduce this on mainline as well.  I just didn't
> expect a maintained file system to blow up on the very first mount
> in xfstests..

Yes, I'm aware of this.  For all I know, we've been having this issue
since Tejun added this warning in 2015 in commit aaa2cacf8184
("writeback: add lockdep annotation to inode_to_wb()"), and I don't
know what to do about it.  The only way of building a working version
of gfs2 currently is without CONFIG_LOCKDEP, or with that warning
removed.

My best guess is that it has to do with how gfs2 uses address spaces:
we have two address spaces attached to each inode, one for the
inode's data and one for the inode's metadata.  The "normal" data
address space works as it does on other filesystems.  The metadata
address space is used to flush and purge ("truncate") an inode's
metadata from memory so that we can allow other cluster nodes to
modify that inode.  The metadata can be spread out over the whole
disk, but we want to flush it in some sensible order; the address
space allows that.

We switched to that setup in commit 009d851837ab ("GFS2: Metadata
address space clean up") in 2009.  Back then, each resource group
also had its own address space, but those were merged into a single
address space (sd_aspace) in commit 70d4ee94b370 ("GFS2: Use only a
single address space for rgrps").  But then last year, Jan Kara
basically said that this has never worked and was never going to
work [1].

More recently, Willy pointed us at a similar-looking fix in nilfs
[2].  If I understand that fix correctly, it would put us back into
the state before commit 009d851837ab ("GFS2: Metadata address space
clean up"), wasting an entire struct inode for each gfs2 inode for
basically nothing.  Or maybe I'm just misunderstanding this whole
crap.

Thanks,
Andreas

[1] Jan Kara on July 28, 2021:
    https://listman.redhat.com/archives/cluster-devel/2021-July/021300.html
[2] Matthew Wilcox on May 22, 2022:
    https://lore.kernel.org/lkml/YorDHW5UmHuTq+2c@casper.infradead.org/
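To make the trade-off described above concrete, here is an illustrative
sketch; the struct and field names are invented for illustration and are not
the actual gfs2 declarations:

/*
 * Illustrative only -- invented names, not the real gfs2 data structures.
 *
 * Roughly the current arrangement: one inode with a second address_space
 * carrying that inode's metadata buffers, with no dedicated struct inode
 * backing the metadata mapping.
 */
struct gfs2_inode_sketch_current {
	struct inode		i_inode;	/* data pages via i_inode.i_mapping */
	struct address_space	i_meta_mapping;	/* metadata, flushed/purged per node */
};

/*
 * The nilfs-style alternative referenced in [2]: back the metadata pages
 * with a private struct inode so the mapping has its own host inode.
 * This is the "entire struct inode per gfs2 inode" overhead objected to
 * above.
 */
struct gfs2_inode_sketch_nilfs_style {
	struct inode	i_inode;	/* data */
	struct inode	i_meta_inode;	/* metadata via i_meta_inode.i_mapping */
};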
Thread overview (4 messages):
  2022-06-19  7:05 gfs2 is unhappy on pagecache/for-next  Christoph Hellwig
  2022-06-19 12:15 ` Matthew Wilcox
  2022-06-20  6:21   ` Christoph Hellwig
  2022-06-20 17:20     ` Andreas Gruenbacher