* 6.6.8 stable: crash in folio_mark_dirty
@ 2023-12-30 15:23 Genes Lists
2023-12-30 18:02 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Genes Lists @ 2023-12-30 15:23 UTC (permalink / raw)
To: linux-kernel
Cc: Matthew Wilcox (Oracle), Andrew Morton, linux-fsdevel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 2597 bytes --]
Apologies in advance, but I cannot git bisect this since machine was
running for 10 days on 6.6.8 before this happened.
Reporting in case it's useful (and not a hardware fail).
There is nothing interesting in the journal ahead of the crash - the
previous entry, 2 minutes prior, was from the user-space dhcp server.
- Root, efi is on nvme
- Spare root, efi is on sdg
- md raid6 on sda-sd with lvmcache from one partition on nvme drive.
- all filesystems are ext4 (other than efi).
- 32 GB mem.
regards
gene
details attached which show:
Dec 30 07:00:36 s6 kernel: <TASK>
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: ? __warn+0x81/0x130
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: ? report_bug+0x171/0x1a0
Dec 30 07:00:36 s6 kernel: ? handle_bug+0x3c/0x80
Dec 30 07:00:36 s6 kernel: ? exc_invalid_op+0x17/0x70
Dec 30 07:00:36 s6 kernel: ? asm_exc_invalid_op+0x1a/0x20
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: block_dirty_folio+0x8a/0xb0
Dec 30 07:00:36 s6 kernel: unmap_page_range+0xd17/0x1120
Dec 30 07:00:36 s6 kernel: unmap_vmas+0xb5/0x190
Dec 30 07:00:36 s6 kernel: exit_mmap+0xec/0x340
Dec 30 07:00:36 s6 kernel: __mmput+0x3e/0x130
Dec 30 07:00:36 s6 kernel: do_exit+0x31c/0xb20
Dec 30 07:00:36 s6 kernel: do_group_exit+0x31/0x80
Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group+0x18/0x20
Dec 30 07:00:36 s6 kernel: do_syscall_64+0x5d/0x90
Dec 30 07:00:36 s6 kernel: ? count_memcg_events.constprop.0+0x1a/0x30
Dec 30 07:00:36 s6 kernel: ? handle_mm_fault+0xa2/0x360
Dec 30 07:00:36 s6 kernel: ? do_user_addr_fault+0x30f/0x660
Dec 30 07:00:36 s6 kernel: ? exc_page_fault+0x7f/0x180
Dec 30 07:00:36 s6 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Dec 30 07:00:36 s6 kernel: RIP: 0033:0x7fb3c581ee2d
Dec 30 07:00:36 s6 kernel: Code: Unable to access opcode bytes at 0x7fb3c581ee03.
Dec 30 07:00:36 s6 kernel: RSP: 002b:00007fff620541e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Dec 30 07:00:36 s6 kernel: RAX: ffffffffffffffda RBX: 00007fb3c591efa8 RCX: 00007fb3c581ee2d
Dec 30 07:00:36 s6 kernel: RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
Dec 30 07:00:36 s6 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fb3c5924920
Dec 30 07:00:36 s6 kernel: R10: 00005650f2e615f0 R11: 0000000000000206 R12: 0000000000000000
Dec 30 07:00:36 s6 kernel: R13: 0000000000000000 R14: 00007fb3c591d680 R15: 00007fb3c591efc0
Dec 30 07:00:36 s6 kernel: </TASK>
[-- Attachment #2: s6-crash --]
[-- Type: text/plain, Size: 7893 bytes --]
Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: Modules linked in: algif_hash af_alg rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs nft_nat nft_chain_nat nf_nat nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rpcrdma rdma>
Dec 30 07:00:36 s6 kernel: async_xor rapl joydev async_tx intel_cstate mei_me nls_iso8859_1 vfat i2c_i801 xor cec snd raid6_pq libcrc32c intel_uncore mxm_wmi pcspkr e1000e i2c_smbus intel_wmi_thunderbolt soundcore mei>
Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
Dec 30 07:00:36 s6 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Extreme4, BIOS P4.20 10/31/2019
Dec 30 07:00:36 s6 kernel: RIP: 0010:__folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: Code: 89 fe e8 57 22 14 00 65 ff 0d b8 ff f2 62 0f 84 8d 00 00 00 49 8b 3c 24 e9 47 fe ff ff 4c 89 ff e8 b9 18 08 00 48 89 c6 eb 85 <0f> 0b e9 27 fe ff ff 48 8b 52 10 e9 56 ff ff ff 48 c7 04 >
All code
========
0: 89 fe mov %edi,%esi
2: e8 57 22 14 00 call 0x14225e
7: 65 ff 0d b8 ff f2 62 decl %gs:0x62f2ffb8(%rip) # 0x62f2ffc6
e: 0f 84 8d 00 00 00 je 0xa1
14: 49 8b 3c 24 mov (%r12),%rdi
18: e9 47 fe ff ff jmp 0xfffffffffffffe64
1d: 4c 89 ff mov %r15,%rdi
20: e8 b9 18 08 00 call 0x818de
25: 48 89 c6 mov %rax,%rsi
28: eb 85 jmp 0xffffffffffffffaf
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 27 fe ff ff jmp 0xfffffffffffffe58
31: 48 8b 52 10 mov 0x10(%rdx),%rdx
35: e9 56 ff ff ff jmp 0xffffffffffffff90
3a: 48 rex.W
3b: c7 .byte 0xc7
3c: 04 00 add $0x0,%al
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 27 fe ff ff jmp 0xfffffffffffffe2e
7: 48 8b 52 10 mov 0x10(%rdx),%rdx
b: e9 56 ff ff ff jmp 0xffffffffffffff66
10: 48 rex.W
11: c7 .byte 0xc7
12: 04 00 add $0x0,%al
Dec 30 07:00:36 s6 kernel: RSP: 0018:ffffc9000c037b00 EFLAGS: 00010046
Dec 30 07:00:36 s6 kernel: RAX: 02ffff6000008030 RBX: 0000000000000286 RCX: ffff8885d44dff08
Dec 30 07:00:36 s6 kernel: RDX: 0000000000000001 RSI: ffff88810d015ca8 RDI: ffff88810d015cb0
Dec 30 07:00:36 s6 kernel: RBP: ffff88810d015cb0 R08: ffff8885208c1300 R09: 0000000000000000
Dec 30 07:00:36 s6 kernel: R10: 0000000000000200 R11: 0000000000000002 R12: ffff88810d015ca8
Dec 30 07:00:36 s6 kernel: R13: 0000000000000001 R14: ffff88851ec72fc0 R15: ffffea00105c5e00
Dec 30 07:00:36 s6 kernel: FS: 0000000000000000(0000) GS:ffff88889ee00000(0000) knlGS:0000000000000000
Dec 30 07:00:36 s6 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 30 07:00:36 s6 kernel: CR2: 00007fb3c593b020 CR3: 0000000690e20003 CR4: 00000000003706f0
Dec 30 07:00:36 s6 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 30 07:00:36 s6 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Dec 30 07:00:36 s6 kernel: Call Trace:
Dec 30 07:00:36 s6 kernel: <TASK>
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: ? __warn (??:?)
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: ? report_bug (??:?)
Dec 30 07:00:36 s6 kernel: ? handle_bug (??:?)
Dec 30 07:00:36 s6 kernel: ? exc_invalid_op (??:?)
Dec 30 07:00:36 s6 kernel: ? asm_exc_invalid_op (??:?)
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty (??:?)
Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
Dec 30 07:00:36 s6 kernel: __mmput (??:?)
Dec 30 07:00:36 s6 kernel: do_exit (??:?)
Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
Dec 30 07:00:36 s6 kernel: ? count_memcg_events.constprop.0 (??:?)
Dec 30 07:00:36 s6 kernel: ? handle_mm_fault (??:?)
Dec 30 07:00:36 s6 kernel: ? do_user_addr_fault (??:?)
Dec 30 07:00:36 s6 kernel: ? exc_page_fault (??:?)
Dec 30 07:00:36 s6 kernel: entry_SYSCALL_64_after_hwframe (??:?)
Dec 30 07:00:36 s6 kernel: RIP: 0033:0x7fb3c581ee2d
Dec 30 07:00:36 s6 kernel: Code: Unable to access opcode bytes at 0x7fb3c581ee03.
Code starting with the faulting instruction
===========================================
Dec 30 07:00:36 s6 kernel: RSP: 002b:00007fff620541e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Dec 30 07:00:36 s6 kernel: RAX: ffffffffffffffda RBX: 00007fb3c591efa8 RCX: 00007fb3c581ee2d
Dec 30 07:00:36 s6 kernel: RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
Dec 30 07:00:36 s6 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fb3c5924920
Dec 30 07:00:36 s6 kernel: R10: 00005650f2e615f0 R11: 0000000000000206 R12: 0000000000000000
Dec 30 07:00:36 s6 kernel: R13: 0000000000000000 R14: 00007fb3c591d680 R15: 00007fb3c591efc0
Dec 30 07:00:36 s6 kernel: </TASK>
Dec 30 07:00:36 s6 kernel: ---[ end trace 0000000000000000 ]---
Dec 30 07:00:36 s6 kernel: BUG: Bad rss-counter state mm:000000008e24d57a type:MM_FILEPAGES val:-1
Dec 30 07:00:36 s6 kernel: BUG: Bad rss-counter state mm:000000008e24d57a type:MM_ANONPAGES val:1
Dec 30 07:02:23 s6 kernel: general protection fault, probably for non-canonical address 0x6d65532d66697975: 0000 [#1] PREEMPT SMP PTI
Dec 30 07:02:23 s6 kernel: CPU: 7 PID: 521578 Comm: rsync Tainted: G W 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
Dec 30 07:02:23 s6 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Extreme4, BIOS P4.20 10/31/2019
Dec 30 07:02:23 s6 kernel: RIP: 0010:__mod_memcg_lruvec_state (??:?)
Dec 30 07:02:23 s6 kernel: Code: ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 8f 40 0b 00 00 48 63 c2 89 f6 48 c1 e6 03 <48> 8b 91 10 07 00 00 48 01 f2 65 48 01 02 48 03 b7 28 06 >
All code
========
0: ff 90 90 90 90 90 call *-0x6f6f6f70(%rax)
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 66 0f 1f 00 nopw (%rax)
15: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
1a: 48 8b 8f 40 0b 00 00 mov 0xb40(%rdi),%rcx
21: 48 63 c2 movslq %edx,%rax
24: 89 f6 mov %esi,%esi
26: 48 c1 e6 03 shl $0x3,%rsi
2a:* 48 8b 91 10 07 00 00 mov 0x710(%rcx),%rdx <-- trapping instruction
31: 48 01 f2 add %rsi,%rdx
34: 65 48 01 02 add %rax,%gs:(%rdx)
38: 48 rex.W
39: 03 .byte 0x3
3a: b7 28 mov $0x28,%bh
3c: 06 (bad)
...
Code starting with the faulting instruction
===========================================
0: 48 8b 91 10 07 00 00 mov 0x710(%rcx),%rdx
7: 48 01 f2 add %rsi,%rdx
a: 65 48 01 02 add %rax,%gs:(%rdx)
e: 48 rex.W
f: 03 .byte 0x3
10: b7 28 mov $0x28,%bh
12: 06 (bad)
...
Dec 30 07:02:23 s6 kernel: RSP: 0018:ffffc9000c12fb68 EFLAGS: 00010206
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-30 15:23 6.6.8 stable: crash in folio_mark_dirty Genes Lists
@ 2023-12-30 18:02 ` Matthew Wilcox
2023-12-30 19:16 ` Genes Lists
2023-12-31 1:28 ` Hillf Danton
2023-12-31 20:59 ` Matthew Wilcox
2 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2023-12-30 18:02 UTC (permalink / raw)
To: Genes Lists; +Cc: linux-kernel, Andrew Morton, linux-fsdevel, linux-mm
On Sat, Dec 30, 2023 at 10:23:26AM -0500, Genes Lists wrote:
> Apologies in advance, but I cannot git bisect this since machine was
> running for 10 days on 6.6.8 before this happened.
Thanks for the report. Apologies, I'm on holiday until the middle of
the week so this will be extremely terse.
> - Root, efi is on nvme
> - Spare root, efi is on sdg
> - md raid6 on sda-sd with lvmcache from one partition on nvme drive.
> - all filesystems are ext4 (other than efi).
> - 32 GB mem.
> Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
This is:
WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
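For reference, the surrounding 6.6 code is roughly this (abridged sketch,
consistent with the hunk posted later in the thread):

void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
			     int warn)
{
	unsigned long flags;

	xa_lock_irqsave(&mapping->i_pages, flags);
	if (folio->mapping) {		/* Race with truncate? */
		/* line 2668: the folio is being dirtied but is not uptodate */
		WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
		folio_account_dirtied(folio, mapping);
		__xa_set_mark(&mapping->i_pages, folio_index(folio),
				PAGECACHE_TAG_DIRTY);
	}
	xa_unlock_irqrestore(&mapping->i_pages, flags);
}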
> Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
So rsync is exiting. Do you happen to know what rsync is doing?
> Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
It looks like rsync has a page from the block device mmapped? I'll have
to investigate this properly when I'm back. If you haven't heard from
me in a week, please ping me.
(I don't think I caused this, but I think I stand a fighting chance of
tracking down what the problem is, just not right now).
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-30 18:02 ` Matthew Wilcox
@ 2023-12-30 19:16 ` Genes Lists
0 siblings, 0 replies; 14+ messages in thread
From: Genes Lists @ 2023-12-30 19:16 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, Andrew Morton, linux-fsdevel, linux-mm
On Sat, 2023-12-30 at 18:02 +0000, Matthew Wilcox wrote:
>
> Thanks for the report. Apologies, I'm on holiday until the middle of
> the week so this will be extremely terse.
>
Enjoy 🙂
> >
> Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted
> So rsync is exiting. Do you happen to know what rsync is doing?
> .
There are 2 rsyncs I can think of:
(a) rsync from another server (s8) pushing files over the local
network to this machine (s6). rsync writes to the raid drives on s6.
s8 says the rsync completed successfully at 3:04 am (about 4 hours
prior to this error at 7:00 am).
(b) There is also a script running inotify which uses rsync to keep
the spare root drive sync'ed. The system had an update of a few
packages at 5:48 am, which would have caused an rsync from root on
nvme to the spare on sdg. Most likely this is the one that triggered
around 7 am. This one runs:
/usr/bin/rsync --open-noatime --no-specials --delete --atimes -axHAX --times <src> <dst>
> It looks like rsync has a page from the block device mmapped? I'll
> have to investigate this properly when I'm back. If you haven't heard
> from me in a week, please ping me.
Thank you.
>
> (I don't think I caused this, but I think I stand a fighting chance
> of
> tracking down what the problem is, just not right now).
This may or may not be related, but this same machine crashed during an
rsync like (a) above (i.e. s8 pushing files to the raid6 disks on s6)
about 3 weeks ago - it was then on the 6.6.4 kernel. In that case the
error was in md code.
https://lore.kernel.org/lkml/e2d47b6c-3420-4785-8e04-e5f217d09a46@leemhuis.info/T/
Thank you again,
gene
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-30 15:23 6.6.8 stable: crash in folio_mark_dirty Genes Lists
2023-12-30 18:02 ` Matthew Wilcox
@ 2023-12-31 1:28 ` Hillf Danton
2023-12-31 13:07 ` Matthew Wilcox
2023-12-31 20:59 ` Matthew Wilcox
2 siblings, 1 reply; 14+ messages in thread
From: Hillf Danton @ 2023-12-31 1:28 UTC (permalink / raw)
To: Genes Lists; +Cc: Matthew Wilcox, linux-kernel, linux-fsdevel, linux-mm
On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <lists@sapience.com>
> Apologies in advance, but I cannot git bisect this since machine was
> running for 10 days on 6.6.8 before this happened.
>
> Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
See what comes out if race is handled.
Only for thoughts.
--- x/mm/page-writeback.c
+++ y/mm/page-writeback.c
@@ -2661,12 +2661,19 @@ void __folio_mark_dirty(struct folio *fo
{
unsigned long flags;
+again:
xa_lock_irqsave(&mapping->i_pages, flags);
- if (folio->mapping) { /* Race with truncate? */
+ if (folio->mapping && mapping == folio->mapping) {
WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
folio_account_dirtied(folio, mapping);
__xa_set_mark(&mapping->i_pages, folio_index(folio),
PAGECACHE_TAG_DIRTY);
+ } else if (folio->mapping) { /* Race with truncate? */
+ struct address_space *tmp = folio->mapping;
+
+ xa_unlock_irqrestore(&mapping->i_pages, flags);
+ mapping = tmp;
+ goto again;
}
xa_unlock_irqrestore(&mapping->i_pages, flags);
}
--
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-31 1:28 ` Hillf Danton
@ 2023-12-31 13:07 ` Matthew Wilcox
2024-01-01 1:55 ` Hillf Danton
0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2023-12-31 13:07 UTC (permalink / raw)
To: Hillf Danton; +Cc: Genes Lists, linux-kernel, linux-fsdevel, linux-mm
On Sun, Dec 31, 2023 at 09:28:46AM +0800, Hillf Danton wrote:
> On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <lists@sapience.com>
> > Apologies in advance, but I cannot git bisect this since machine was
> > running for 10 days on 6.6.8 before this happened.
> >
> > Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> > Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> > Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> > Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> > Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> > Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> > Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> > Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> > Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> > Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> > Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> > Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
>
> See what comes out if race is handled.
> Only for thoughts.
I don't think this can happen. Look at the call trace;
block_dirty_folio() is called from unmap_page_range(). That means the
page is in the page tables. We unmap the pages in a folio from the
page tables before we set folio->mapping to NULL. Look at
invalidate_inode_pages2_range() for example:
unmap_mapping_pages(mapping, indices[i],
(1 + end - indices[i]), false);
folio_lock(folio);
folio_wait_writeback(folio);
if (folio_mapped(folio))
unmap_mapping_folio(folio);
BUG_ON(folio_mapped(folio));
if (!invalidate_complete_folio2(mapping, folio))
... and invalidate_complete_folio2() is where we set ->mapping to NULL
in __filemap_remove_folio -> page_cache_delete().
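For reference, that path looks roughly like this in 6.6 (abridged sketch):

static void page_cache_delete(struct address_space *mapping,
				struct folio *folio, void *shadow)
{
	XA_STATE(xas, &mapping->i_pages, folio->index);
	long nr = folio_nr_pages(folio);
	...
	xas_store(&xas, shadow);
	xas_init_marks(&xas);

	/* ->mapping only becomes NULL here, after the folio was unmapped */
	folio->mapping = NULL;
	/* Leave folio->index set: truncation lookup relies upon it */
	mapping->nrpages -= nr;
}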
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-30 15:23 6.6.8 stable: crash in folio_mark_dirty Genes Lists
2023-12-30 18:02 ` Matthew Wilcox
2023-12-31 1:28 ` Hillf Danton
@ 2023-12-31 20:59 ` Matthew Wilcox
2023-12-31 21:12 ` Genes Lists
2023-12-31 21:15 ` Genes Lists
2 siblings, 2 replies; 14+ messages in thread
From: Matthew Wilcox @ 2023-12-31 20:59 UTC (permalink / raw)
To: Genes Lists; +Cc: linux-kernel, Andrew Morton, linux-fsdevel, linux-mm
On Sat, Dec 30, 2023 at 10:23:26AM -0500, Genes Lists wrote:
> Apologies in advance, but I cannot git bisect this since machine was
> running for 10 days on 6.6.8 before this happened.
This problem simply doesn't make sense. There's just no way we should be
able to get a not-uptodate folio into the page tables. We do have one
pending patch which fixes a case in which we can get some very
odd-looking situations due to reusing a page which has been freed.
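For context, the fault path only installs uptodate folios; filemap_fault()
in 6.6 does roughly this (abridged sketch):

	folio = filemap_get_folio(mapping, index);
	...
	if (unlikely(!folio_test_uptodate(folio))) {
		/* never mapped while !uptodate: re-read and retry */
		goto page_not_uptodate;
	}
	...
	vmf->page = folio_file_page(folio, index);
	return ret | VM_FAULT_LOCKED;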
I appreciate your ability to reproduce this is likely nil, but if you
could add
https://lore.kernel.org/all/20231220214715.912B4C433CA@smtp.kernel.org/
to your kernel, it might make things more stable for you.
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-31 20:59 ` Matthew Wilcox
@ 2023-12-31 21:12 ` Genes Lists
2023-12-31 21:15 ` Genes Lists
1 sibling, 0 replies; 14+ messages in thread
From: Genes Lists @ 2023-12-31 21:12 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, Andrew Morton, linux-fsdevel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 998 bytes --]
On Sun, 2023-12-31 at 20:59 +0000, Matthew Wilcox wrote:
> On Sat, Dec 30, 2023 at 10:23:26AM -0500, Genes Lists wrote:
> > Apologies in advance, but I cannot git bisect this since machine was
> > running for 10 days on 6.6.8 before this happened.
>
> This problem simply doesn't make sense. There's just no way we should
> be able to get a not-uptodate folio into the page tables. We do have
> one pending patch which fixes a case in which we can get some very
> odd-looking situations due to reusing a page which has been freed.
> I appreciate your ability to reproduce this is likely nil, but if you
> could add
>
> https://lore.kernel.org/all/20231220214715.912B4C433CA@smtp.kernel.org/
>
> to your kernel, it might make things more stable for you.
>
Ok, looks like that's in mainline - the machine is now running 6.7.0-rc7 -
unless you prefer I patch 6.6.8 with the above and change to that.
thanks and sorry about bringing up a wacky problem.
gene
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-31 20:59 ` Matthew Wilcox
2023-12-31 21:12 ` Genes Lists
@ 2023-12-31 21:15 ` Genes Lists
1 sibling, 0 replies; 14+ messages in thread
From: Genes Lists @ 2023-12-31 21:15 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, Andrew Morton, linux-fsdevel, linux-mm
On Sun, 2023-12-31 at 20:59 +0000, Matthew Wilcox wrote:
> On Sat, Dec 30, 2023 at 10:23:26AM -0500, Genes Lists wrote:
> > Apologies in advance, but I cannot git bisect this since machine was
> > running for 10 days on 6.6.8 before this happened.
>
> This problem simply doesn't make sense. There's just no way we should
> be able to get a not-uptodate folio into the page tables. We do have
> one pending patch which fixes a case in which we can get some very
> odd-looking situations due to reusing a page which has been freed.
> I appreciate your ability to reproduce this is likely nil, but if you
> could add
>
> https://lore.kernel.org/all/20231220214715.912B4C433CA@smtp.kernel.org/
>
> to your kernel, it might make things more stable for you.
>
Ok, looks like that's in mainline - the machine is now running 6.7.0-rc7 -
unless you prefer I patch 6.6.8 with the above and change to that.
thanks and sorry about bringing up a wacky problem.
gene
* Re: 6.6.8 stable: crash in folio_mark_dirty
2023-12-31 13:07 ` Matthew Wilcox
@ 2024-01-01 1:55 ` Hillf Danton
2024-01-01 9:07 ` Matthew Wilcox
0 siblings, 1 reply; 14+ messages in thread
From: Hillf Danton @ 2024-01-01 1:55 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Genes Lists, linux-kernel, linux-fsdevel, linux-mm
On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <willy@infradead.org>
> On Sun, Dec 31, 2023 at 09:28:46AM +0800, Hillf Danton wrote:
> > On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <lists@sapience.com>
> > > Apologies in advance, but I cannot git bisect this since machine was
> > > running for 10 days on 6.6.8 before this happened.
> > >
> > > Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> > > Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> > > Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> > > Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> > > Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> > > Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> > > Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> > > Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> > > Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> > > Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> > > Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> > > Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
> >
> > See what comes out if race is handled.
> > Only for thoughts.
>
> I don't think this can happen. Look at the call trace;
> block_dirty_folio() is called from unmap_page_range(). That means the
> page is in the page tables. We unmap the pages in a folio from the
> page tables before we set folio->mapping to NULL. Look at
> invalidate_inode_pages2_range() for example:
>
> unmap_mapping_pages(mapping, indices[i],
> (1 + end - indices[i]), false);
> folio_lock(folio);
> folio_wait_writeback(folio);
> if (folio_mapped(folio))
> unmap_mapping_folio(folio);
> BUG_ON(folio_mapped(folio));
> if (!invalidate_complete_folio2(mapping, folio))
>
What is missed here is the same check [1] in invalidate_inode_pages2_range(),
so I built no wheel.
folio_lock(folio);
if (unlikely(folio->mapping != mapping)) {
folio_unlock(folio);
continue;
}
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658
* Re: 6.6.8 stable: crash in folio_mark_dirty
2024-01-01 1:55 ` Hillf Danton
@ 2024-01-01 9:07 ` Matthew Wilcox
2024-01-01 11:33 ` Hillf Danton
0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2024-01-01 9:07 UTC (permalink / raw)
To: Hillf Danton; +Cc: Genes Lists, linux-kernel, linux-fsdevel, linux-mm
On Mon, Jan 01, 2024 at 09:55:04AM +0800, Hillf Danton wrote:
> On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <willy@infradead.org>
> > On Sun, Dec 31, 2023 at 09:28:46AM +0800, Hillf Danton wrote:
> > > On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <lists@sapience.com>
> > > > Apologies in advance, but I cannot git bisect this since machine was
> > > > running for 10 days on 6.6.8 before this happened.
> > > >
> > > > Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> > > > Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> > > > Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> > > > Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> > > > Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> > > > Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> > > > Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> > > > Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> > > > Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> > > > Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> > > > Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> > > > Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
> > >
> > > See what comes out if race is handled.
> > > Only for thoughts.
> >
> > I don't think this can happen. Look at the call trace;
> > block_dirty_folio() is called from unmap_page_range(). That means the
> > page is in the page tables. We unmap the pages in a folio from the
> > page tables before we set folio->mapping to NULL. Look at
> > invalidate_inode_pages2_range() for example:
> >
> > unmap_mapping_pages(mapping, indices[i],
> > (1 + end - indices[i]), false);
> > folio_lock(folio);
> > folio_wait_writeback(folio);
> > if (folio_mapped(folio))
> > unmap_mapping_folio(folio);
> > BUG_ON(folio_mapped(folio));
> > if (!invalidate_complete_folio2(mapping, folio))
> >
> What is missed here is the same check [1] in invalidate_inode_pages2_range(),
> so I built no wheel.
>
> folio_lock(folio);
> if (unlikely(folio->mapping != mapping)) {
> folio_unlock(folio);
> continue;
> }
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658
That's entirely different. That's checking in the truncate path whether
somebody else already truncated this page. What I was showing was why
a page found through a page table walk cannot have been truncated (which
is actually quite interesting, because it's the page table lock that
prevents the race).
* Re: 6.6.8 stable: crash in folio_mark_dirty
2024-01-01 9:07 ` Matthew Wilcox
@ 2024-01-01 11:33 ` Hillf Danton
2024-01-01 14:11 ` Matthew Wilcox
0 siblings, 1 reply; 14+ messages in thread
From: Hillf Danton @ 2024-01-01 11:33 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Genes Lists, linux-kernel, linux-fsdevel, linux-mm
On Mon, 1 Jan 2024 09:07:52 +0000 Matthew Wilcox
> On Mon, Jan 01, 2024 at 09:55:04AM +0800, Hillf Danton wrote:
> > On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <willy@infradead.org>
> > > I don't think this can happen. Look at the call trace;
> > > block_dirty_folio() is called from unmap_page_range(). That means the
> > > page is in the page tables. We unmap the pages in a folio from the
> > > page tables before we set folio->mapping to NULL. Look at
> > > invalidate_inode_pages2_range() for example:
> > >
> > > unmap_mapping_pages(mapping, indices[i],
> > > (1 + end - indices[i]), false);
> > > folio_lock(folio);
> > > folio_wait_writeback(folio);
> > > if (folio_mapped(folio))
> > > unmap_mapping_folio(folio);
> > > BUG_ON(folio_mapped(folio));
> > > if (!invalidate_complete_folio2(mapping, folio))
> > >
> > What is missed here is the same check [1] in invalidate_inode_pages2_range(),
> > so I built no wheel.
> >
> > folio_lock(folio);
> > if (unlikely(folio->mapping != mapping)) {
> > folio_unlock(folio);
> > continue;
> > }
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658
>
> That's entirely different. That's checking in the truncate path whether
> somebody else already truncated this page. What I was showing was why
> a page found through a page table walk cannot have been truncated (which
> is actually quite interesting, because it's the page table lock that
> prevents the race).
>
Feel free to shed light on how ptl protects folio->mapping.
* Re: 6.6.8 stable: crash in folio_mark_dirty
2024-01-01 11:33 ` Hillf Danton
@ 2024-01-01 14:11 ` Matthew Wilcox
2024-01-03 10:49 ` Hillf Danton
0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2024-01-01 14:11 UTC (permalink / raw)
To: Hillf Danton; +Cc: Genes Lists, linux-kernel, linux-fsdevel, linux-mm
On Mon, Jan 01, 2024 at 07:33:16PM +0800, Hillf Danton wrote:
> On Mon, 1 Jan 2024 09:07:52 +0000 Matthew Wilcox
> > On Mon, Jan 01, 2024 at 09:55:04AM +0800, Hillf Danton wrote:
> > > On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <willy@infradead.org>
> > > > I don't think this can happen. Look at the call trace;
> > > > block_dirty_folio() is called from unmap_page_range(). That means the
> > > > page is in the page tables. We unmap the pages in a folio from the
> > > > page tables before we set folio->mapping to NULL. Look at
> > > > invalidate_inode_pages2_range() for example:
> > > >
> > > > unmap_mapping_pages(mapping, indices[i],
> > > > (1 + end - indices[i]), false);
> > > > folio_lock(folio);
> > > > folio_wait_writeback(folio);
> > > > if (folio_mapped(folio))
> > > > unmap_mapping_folio(folio);
> > > > BUG_ON(folio_mapped(folio));
> > > > if (!invalidate_complete_folio2(mapping, folio))
> > > >
> > > What is missed here is the same check [1] in invalidate_inode_pages2_range(),
> > > so I built no wheel.
> > >
> > > folio_lock(folio);
> > > if (unlikely(folio->mapping != mapping)) {
> > > folio_unlock(folio);
> > > continue;
> > > }
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658
> >
> > That's entirely different. That's checking in the truncate path whether
> > somebody else already truncated this page. What I was showing was why
> > a page found through a page table walk cannot have been truncated (which
> > is actually quite interesting, because it's the page table lock that
> > prevents the race).
> >
> Feel free to shed light on how ptl protects folio->mapping.
The documentation for __folio_mark_dirty() hints at it:
* The caller must hold folio_memcg_lock(). Most callers have the folio
* locked. A few have the folio blocked from truncation through other
* means (eg zap_vma_pages() has it mapped and is holding the page table
* lock). This can also be called from mark_buffer_dirty(), which I
* cannot prove is always protected against truncate.
Re-reading that now, I _think_ mark_buffer_dirty() always has to be
called with a reference on the bufferhead, which means that a racing
truncate will fail due to
invalidate_inode_pages2_range -> invalidate_complete_folio2 ->
filemap_release_folio -> try_to_free_buffers -> drop_buffers -> buffer_busy
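buffer_busy() being, roughly:

static inline int buffer_busy(struct buffer_head *bh)
{
	/* a held b_count reference (or a dirty/locked buffer) makes
	 * drop_buffers() fail, so invalidate_complete_folio2() bails
	 * out before ->mapping is ever cleared */
	return atomic_read(&bh->b_count) |
		(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
}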
From an mm point of view, what is implicit is that truncate calls
unmap_mapping_folio -> unmap_mapping_range_tree ->
unmap_mapping_range_vma -> zap_page_range_single -> unmap_single_vma ->
unmap_page_range -> zap_p4d_range -> zap_pud_range -> zap_pmd_range ->
zap_pte_range -> pte_offset_map_lock()
So a truncate will take the page lock, then spin on the pte lock
until the racing munmap() has finished (ok, this was an exit(), not
a munmap(), but exit() does an implicit munmap()).
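For context, the dirtying in this trace is the zap path itself; in 6.6,
zap_pte_range() transfers the pte dirty bit while holding the pte lock,
roughly (abridged sketch):

	ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
	...
	if (!PageAnon(page)) {
		if (pte_dirty(ptent)) {
			/* for this mapping this ends up in block_dirty_folio()
			 * -> __folio_mark_dirty(), i.e. the reported trace */
			set_page_dirty(page);
			...
		}
	}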
* Re: 6.6.8 stable: crash in folio_mark_dirty
2024-01-01 14:11 ` Matthew Wilcox
@ 2024-01-03 10:49 ` Hillf Danton
2024-01-03 17:53 ` Matthew Wilcox
0 siblings, 1 reply; 14+ messages in thread
From: Hillf Danton @ 2024-01-03 10:49 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Genes Lists, linux-kernel, linux-fsdevel, linux-mm
On Mon, 1 Jan 2024 14:11:02 +0000 Matthew Wilcox
>
> From an mm point of view, what is implicit is that truncate calls
> unmap_mapping_folio -> unmap_mapping_range_tree ->
> unmap_mapping_range_vma -> zap_page_range_single -> unmap_single_vma ->
> unmap_page_range -> zap_p4d_range -> zap_pud_range -> zap_pmd_range ->
> zap_pte_range -> pte_offset_map_lock()
>
> So a truncate will take the page lock, then spin on the pte lock
> until the racing munmap() has finished (ok, this was an exit(), not
> a munmap(), but exit() does an implicit munmap()).
>
But ptl fails to explain the warning reported, while the sequence in
__block_commit_write()
mark_buffer_dirty();
folio_mark_uptodate();
hints the warning is bogus.
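For reference, the sequence in __block_commit_write() is roughly
(abridged sketch):

	bh = head = folio_buffers(folio);
	...
	do {
		...
		if (block_end <= from || block_start >= to) {
			if (!buffer_uptodate(bh))
				partial = true;
		} else {
			set_buffer_uptodate(bh);
			mark_buffer_dirty(bh);	/* buffers dirtied first */
		}
		...
	} while (bh != head);

	if (!partial)
		folio_mark_uptodate(folio);	/* folio uptodate last */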
* Re: 6.6.8 stable: crash in folio_mark_dirty
2024-01-03 10:49 ` Hillf Danton
@ 2024-01-03 17:53 ` Matthew Wilcox
0 siblings, 0 replies; 14+ messages in thread
From: Matthew Wilcox @ 2024-01-03 17:53 UTC (permalink / raw)
To: Hillf Danton; +Cc: Genes Lists, linux-kernel, linux-fsdevel, linux-mm
On Wed, Jan 03, 2024 at 06:49:07PM +0800, Hillf Danton wrote:
> On Mon, 1 Jan 2024 14:11:02 +0000 Matthew Wilcox
> >
> > From an mm point of view, what is implicit is that truncate calls
> > unmap_mapping_folio -> unmap_mapping_range_tree ->
> > unmap_mapping_range_vma -> zap_page_range_single -> unmap_single_vma ->
> > unmap_page_range -> zap_p4d_range -> zap_pud_range -> zap_pmd_range ->
> > zap_pte_range -> pte_offset_map_lock()
> >
> > So a truncate will take the page lock, then spin on the pte lock
> > until the racing munmap() has finished (ok, this was an exit(), not
> > a munmap(), but exit() does an implicit munmap()).
> >
> But ptl fails to explain the warning reported, while the sequence in
> __block_commit_write()
>
> mark_buffer_dirty();
> folio_mark_uptodate();
>
> hints the warning is bogus.
The folio is locked when filesystems call __block_commit_write().
Nothing explains the reported warning, IMO, other than data corruption,
and I'm not sure that we've found the last data corrupter.
Thread overview: 14+ messages
2023-12-30 15:23 6.6.8 stable: crash in folio_mark_dirty Genes Lists
2023-12-30 18:02 ` Matthew Wilcox
2023-12-30 19:16 ` Genes Lists
2023-12-31 1:28 ` Hillf Danton
2023-12-31 13:07 ` Matthew Wilcox
2024-01-01 1:55 ` Hillf Danton
2024-01-01 9:07 ` Matthew Wilcox
2024-01-01 11:33 ` Hillf Danton
2024-01-01 14:11 ` Matthew Wilcox
2024-01-03 10:49 ` Hillf Danton
2024-01-03 17:53 ` Matthew Wilcox
2023-12-31 20:59 ` Matthew Wilcox
2023-12-31 21:12 ` Genes Lists
2023-12-31 21:15 ` Genes Lists