Re: Kernel oops with 6.14 when enabling TLS

From: Matthew Wilcox <willy@infradead.org>
To: Hannes Reinecke <hare@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Sagi Grimberg <sagi@grimberg.me>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: Kernel oops with 6.14 when enabling TLS
Date: Mon, 3 Mar 2025 15:48:48 +0000
Message-ID: <Z8XPYNw4BSAWPAWT@casper.infradead.org>
In-Reply-To: <15be2446-f096-45b9-aaf3-b371a694049d@suse.com>

On Mon, Mar 03, 2025 at 04:39:47PM +0100, Hannes Reinecke wrote:
> On 3/3/25 15:42, Matthew Wilcox wrote:
> > On Mon, Mar 03, 2025 at 02:27:06PM +0000, Matthew Wilcox wrote:
> > > We have a _lot_ of page types available.  We should mark large kmallocs
> > > as such.  I'll send a patch to do that.
> > 
> > Can you try this?  It should fix the crash, at least.  Not sure why the
> > frozen patch triggered it.
> 
> Still crashes:

It warns, but doesn't crash!  This is an improvement.

> [   63.658068] WARNING: CPU: 6 PID: 5216 at mm/slub.c:4720
> free_large_kmalloc+0x89/0xa0
> [   63.667728] RIP: 0010:free_large_kmalloc+0x89/0xa0
> [   63.842773] Call Trace:
> [   63.934398]  kfree+0x2a5/0x340
> [   63.987632]  nvmf_connect_admin_queue+0x105/0x1a0 [nvme_fabrics
> 18bfa9223bf0bd1ec571f5f45774adcc919a867e]
> [   63.987641]  nvme_tcp_start_queue+0x192/0x310 [nvme_tcp
> a0629454ac5200d03b72a09e4d2b1e27dfa113e9]
> [   63.987649]  nvme_tcp_setup_ctrl+0xf8/0x700 [nvme_tcp
> a0629454ac5200d03b72a09e4d2b1e27dfa113e9]
> [   64.043323]  nvme_tcp_create_ctrl+0x2e3/0x4d0 [nvme_tcp
> a0629454ac5200d03b72a09e4d2b1e27dfa113e9]
> [   64.043332]  nvmf_dev_write+0x323/0x3d0 [nvme_fabrics
> 18bfa9223bf0bd1ec571f5f45774adcc919a867e]
> [   64.043344]  vfs_write+0xd9/0x430
> [   64.108458] ---[ end trace 0000000000000000 ]---
> [   64.108461] page: refcount:0 mapcount:0 mapping:0000000000000000
> index:0x2 pfn:0x5e3a
> [   64.108465] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> [   64.108469] raw: 000fffffc0000000 0000000000000000 fffb0b48c0178e90
> 0000000000000000
> [   64.108472] raw: 0000000000000002 0000000000000000 00000000ffffffff
> 0000000000000000
> [   64.108473] page dumped because: Not a kmalloc allocation

Right.  So you called kfree() on something that isn't currently
kmalloced memory.  Either it used to be kmalloced memory and we freed
the slab that it used to be in, or it's a wild pointer.  Whichever
it is, that's a bug in the caller, not in slab.
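
To illustrate why this now warns instead of crashing, here is a rough
userspace model of the check. It is a sketch under stated assumptions, not
the code in mm/slub.c: the type names, the struct layout and the function
model_free_large_kmalloc() are all hypothetical. The only things taken from
the report are the idea of marking large kmalloc pages with a page type and
the "Not a kmalloc allocation" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical page types; the kernel encodes page types differently. */
enum model_page_type {
	MODEL_PAGE_FREE,          /* page went back to the page allocator */
	MODEL_PAGE_SLAB,          /* page backs a slab of small objects */
	MODEL_PAGE_LARGE_KMALLOC, /* page backs a large kmalloc allocation */
};

/* Hypothetical stand-in for struct page, reduced to what the check needs. */
struct model_page {
	enum model_page_type type;
	unsigned long pfn;
};

/*
 * Free the page only if it is currently marked as a large kmalloc
 * allocation.  Anything else (already freed, never kmalloced, wild
 * pointer) gets a warning and is left untouched.
 * Returns true if the page was freed.
 */
static bool model_free_large_kmalloc(struct model_page *page)
{
	assert(page != NULL);

	if (page->type != MODEL_PAGE_LARGE_KMALLOC) {
		fprintf(stderr,
			"WARNING: pfn %#lx: Not a kmalloc allocation\n",
			page->pfn);
		return false;
	}

	page->type = MODEL_PAGE_FREE;
	return true;
}
```

In this model, the first call on a page marked MODEL_PAGE_LARGE_KMALLOC
frees it, and a second call on the same page only warns, because the type
was cleared by the first free.  That second case corresponds to the
"used to be kmalloced memory" possibility above.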

Why it bisected to that commit, I can't say.  Maybe it changed the
timing, or maybe it was just luck (whether the allocation which is now
being freed is the last allocation in the slab or not).

> [   66.084156] page: refcount:0 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0x5de5
> [   66.093770] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> [   66.101810] raw: 000fffffc0000000 0000000000000000 dead000000000122
> 0000000000000000
> [   66.111311] raw: 0000000000000000 0000000000000000 00000000ffffffff
> 0000000000000000
> [   66.111314] page dumped because: Not a kmalloc allocation
> [   66.112001] page: refcount:0 mapcount:0 mapping:0000000000000000
> index:0xdc pfn:0x5de3
> [   66.137452] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> [   66.137460] raw: 000fffffc0000000 ff45d9a24d93f420 ff45d9a24d93f420
> 0000000000000000
> [   66.137464] raw: 00000000000000dc 0000000000000000 00000000ffffffff
> 0000000000000000

It happened again ;-)

> [   66.137466] page dumped because: Not a kmalloc allocation
> [   66.138095] page: refcount:0 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0x5de5
> [   66.180944] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> [   66.180950] raw: 000fffffc0000000 ff45d9a24da3f420 ff45d9a24da3f420
> 0000000000000000
> [   66.180953] raw: 0000000000000000 0000000000000000 00000000ffffffff
> 0000000000000000
> [   66.180954] page dumped because: Not a kmalloc allocation

And again ...

> [   66.181672] BUG: unable to handle page fault for address:
> ff40e4ea8fa50250

Oh, now it crashed.  But we have so much evidence of a bug in the caller
at this point that I don't think we can blame slab for falling over.
If you're double-freeing something that's _not_ in a freed slab, this
is the kind of thing we might expect?

You need to turn on the debugging options Vlastimil mentioned and try to
figure out what nvme is doing wrong.
