Re: Kernel oops with 6.14 when enabling TLS

From: Hannes Reinecke <hare@suse.de>
To: Vlastimil Babka <vbabka@suse.cz>, Hannes Reinecke <hare@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	Boris Pismenny <borisp@nvidia.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	linux-mm@kvack.org, Harry Yoo <harry.yoo@oracle.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: Kernel oops with 6.14 when enabling TLS
Date: Tue, 4 Mar 2025 17:20:47 +0100
Message-ID: <3aa8a453-2cfe-4b54-90ac-e9596c967c8c@suse.de>
In-Reply-To: <3a1b72be-6e2a-4b74-91f5-d51d230d22b5@suse.cz>

On 3/4/25 16:29, Vlastimil Babka wrote:
> On 3/4/25 11:26, Vlastimil Babka wrote:
>> +Cc NETWORKING [TLS] maintainers and netdev for input, thanks.
>>
>> The full error is here:
>> https://lore.kernel.org/all/fcfa11c6-2738-4a2e-baa8-09fa8f79cbf3@suse.de/
>>
>> On 3/4/25 11:20, Hannes Reinecke wrote:
>>> On 3/4/25 09:18, Vlastimil Babka wrote:
>>>> On 3/4/25 08:58, Hannes Reinecke wrote:
>>>>> On 3/3/25 23:02, Vlastimil Babka wrote:
>>>>>> On 3/3/25 17:15, Vlastimil Babka wrote:
>>>>>>> On 3/3/25 16:48, Matthew Wilcox wrote:
>>>>>>>> You need to turn on the debugging options Vlastimil mentioned and try to
>>>>>>>> figure out what nvme is doing wrong.
>>>>>>>
>>>>>>> Agree, looks like some error path going wrong?
>>>>>>> Since there seems to be actual non-large kmalloc usage involved, another
>>>>>>> debug parameter that could help: CONFIG_SLUB_DEBUG=y, and boot with
>>>>>>> "slab_debug=FZPU,kmalloc-*"
>>>>>>
>>>>>> Also make sure you have CONFIG_DEBUG_VM please.
>>>>>>
>>>>> Here you go:
>>>>>
>>>>> [  134.506802] page: refcount:0 mapcount:0 mapping:0000000000000000
>>>>> index:0x0 pfn:0x101ef8
>>>>> [  134.509253] head: order:3 mapcount:0 entire_mapcount:0
>>>>> nr_pages_mapped:0 pincount:0
>>>>> [  134.511594] flags:
>>>>> 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
>>>>> [  134.513556] page_type: f5(slab)
>>>>> [  134.513563] raw: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810
>>>>> ffff8881000402f0
>>>>> [  134.513568] raw: 0000000000000000 00000000000a000a 00000000f5000000
>>>>> 0000000000000000
>>>>> [  134.513572] head: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810
>>>>> ffff8881000402f0
>>>>> [  134.513575] head: 0000000000000000 00000000000a000a 00000000f5000000
>>>>> 0000000000000000
>>>>> [  134.513579] head: 0017ffffc0000003 ffffea000407be01 ffffffffffffffff
>>>>> 0000000000000000
>>>>> [  134.513583] head: 0000000000000008 0000000000000000 00000000ffffffff
>>>>> 0000000000000000
>>>>> [  134.513585] page dumped because: VM_BUG_ON_FOLIO(((unsigned int)
>>>>> folio_ref_count(folio) + 127u <= 127u))
>>>>> [  134.513615] ------------[ cut here ]------------
>>>>> [  134.529822] kernel BUG at ./include/linux/mm.h:1455!
>>>>
>>>> Yeah, just as I suspected, folio_get() says the refcount is 0.
>>>>
>>>>> [  134.529835] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
>>>>> DEBUG_PAGEALLOC NOPTI
>>>>> [  134.529843] CPU: 0 UID: 0 PID: 274 Comm: kworker/0:1H Kdump: loaded
>>>>> Tainted: G            E      6.14.0-rc4-default+ #309
>>>>> 03b131f1ef70944969b40df9d90a283ed638556f
>>>>> [  134.536577] Tainted: [E]=UNSIGNED_MODULE
>>>>> [  134.536580] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>>>>> 0.0.0 02/06/2015
>>>>> [  134.536583] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
>>>>> [  134.536595] RIP: 0010:__iov_iter_get_pages_alloc+0x676/0x710
>>>>> [  134.542810] Code: e8 4c 39 e0 49 0f 47 c4 48 01 45 08 48 29 45 18 e9
>>>>> 90 fa ff ff 48 83 ef 01 e9 7f fe ff ff 48 c7 c6 40 57 4f 82 e8 6a e2 ce
>>>>> ff <0f> 0b e8 43 b8 b1 ff eb c5 f7 c1 ff 0f 00 00 48 89 cf 0f 85 4f ff
>>>>> [  134.542816] RSP: 0018:ffffc900004579d8 EFLAGS: 00010282
>>>>> [  134.542821] RAX: 000000000000005c RBX: ffffc90000457a90 RCX:
>>>>> 0000000000000027
>>>>> [  134.542825] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
>>>>> ffff88817f423748
>>>>> [  134.542828] RBP: ffffc90000457d60 R08: 0000000000000000 R09:
>>>>> 0000000000000001
>>>>> [  134.554485] R10: ffffc900004579c0 R11: ffffc90000457720 R12:
>>>>> 0000000000000000
>>>>> [  134.554488] R13: ffffea000407be40 R14: ffffc90000457a70 R15:
>>>>> ffffc90000457d60
>>>>> [  134.554495] FS:  0000000000000000(0000) GS:ffff88817f400000(0000)
>>>>> knlGS:0000000000000000
>>>>> [  134.554499] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [  134.554502] CR2: 0000556b0675b600 CR3: 0000000106bd8000 CR4:
>>>>> 0000000000350ef0
>>>>> [  134.554509] Call Trace:
>>>>> [  134.554512]  <TASK>
>>>>> [  134.554516]  ? __die_body+0x1a/0x60
>>>>> [  134.554525]  ? die+0x38/0x60
>>>>> [  134.554531]  ? do_trap+0x10f/0x120
>>>>> [  134.554538]  ? __iov_iter_get_pages_alloc+0x676/0x710
>>>>> [  134.568839]  ? do_error_trap+0x64/0xa0
>>>>> [  134.568847]  ? __iov_iter_get_pages_alloc+0x676/0x710
>>>>> [  134.568855]  ? exc_invalid_op+0x53/0x60
>>>>> [  134.572489]  ? __iov_iter_get_pages_alloc+0x676/0x710
>>>>> [  134.572496]  ? asm_exc_invalid_op+0x16/0x20
>>>>> [  134.572512]  ? __iov_iter_get_pages_alloc+0x676/0x710
>>>>> [  134.576726]  ? __iov_iter_get_pages_alloc+0x676/0x710
>>>>> [  134.576733]  ? srso_return_thunk+0x5/0x5f
>>>>> [  134.576740]  ? ___slab_alloc+0x924/0xb60
>>>>> [  134.580253]  ? mempool_alloc_noprof+0x41/0x190
>>>>> [  134.580262]  ? tls_get_rec+0x3d/0x1b0 [tls
>>>>> 47f199c97f69357468c91efdbba24395e9dbfa77]
>>>>> [  134.580282]  iov_iter_get_pages2+0x19/0x30
>>>>
>>>> Presumably that's __iov_iter_get_pages_alloc() doing get_page() either in
>>>> the " if (iov_iter_is_bvec(i)) " branch or via iter_folioq_get_pages()?
>>>>
>>> Looks like it.
>>>
>>>> Which doesn't work for a sub-size kmalloc() from a slab folio, which after
>>>> the frozen refcount conversion no longer supports get_page().
>>>>
>>>> The question is whether this is a mistake specific to this path that's easy
>>>> to fix, or whether more paths do this. At the very least, pinning the
>>>> page through a kmalloc() allocation from it is useless - the object itself
>>>> has to be kfree()'d and that would never happen through a put_page()
>>>> reaching zero.
>>>>
>>> Looks like a specific mistake.
>>> tls_sw is the only user of sk_msg_zerocopy_from_iter()
>>> (which is calling into __iov_iter_get_pages_alloc()).
> 
> That's from tls_sw_sendmsg_locked(), right? But that's under:
> 
> if (!is_kvec && (full_record || eor) && !async_capable) {
> 
> Shouldn't is_kvec be true if we're dealing with a kernel buffer (kmalloc()) there?
> 
Yes, and no.

We're initializing the iter in nvme_tcp_try_send_data():

		bvec_set_page(&bvec, page, len, offset);
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);

and 'page' comes from the bio bvec. So even though bv_page may refer to a
kmalloc()ed page, the iterator is still a bvec iterator, not a kvec, and
is_kvec stays false.
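
To make the "yes, and no" concrete, here is a small userspace model of the
decision, not kernel code. All names prefixed model_ are hypothetical
stand-ins; only the condition itself is taken from the check quoted above. It
shows that the test looks at how the caller described the buffer (bvec vs.
kvec), not at where the memory behind it was allocated:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the iterator kinds; not the kernel's enum. */
enum model_iter_type { MODEL_ITER_KVEC, MODEL_ITER_BVEC };

/* What the memory behind the iterator really is (model only). */
enum model_backing { MODEL_PAGE_ALLOC, MODEL_SLAB_KMALLOC };

struct model_iter {
	enum model_iter_type type;	/* how the caller described the buffer */
	enum model_backing backing;	/* where the page actually came from */
};

/*
 * Mirrors the quoted condition:
 *   !is_kvec && (full_record || eor) && !async_capable
 * Note that 'backing' plays no role in the decision.
 */
static bool model_takes_zerocopy_path(const struct model_iter *it,
				      bool full_record, bool eor,
				      bool async_capable)
{
	bool is_kvec = (it->type == MODEL_ITER_KVEC);

	return !is_kvec && (full_record || eor) && !async_capable;
}

/*
 * The problematic combination: the zerocopy path is taken while the
 * page is slab-backed, so taking a page reference is not valid.
 */
static bool model_hits_slab_get_page(const struct model_iter *it,
				     bool full_record, bool eor,
				     bool async_capable)
{
	return model_takes_zerocopy_path(it, full_record, eor, async_capable) &&
	       it->backing == MODEL_SLAB_KMALLOC;
}
```

In this model, an iterator of type MODEL_ITER_BVEC with MODEL_SLAB_KMALLOC
backing and full_record set takes the zerocopy path, while the same memory
described as MODEL_ITER_KVEC does not.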

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
