Message-ID: <6877dfb1-9f44-4023-bb6d-e7530d03e33c@suse.com>
Date: Tue, 4 Mar 2025 11:20:14 +0100
Subject: Re: Kernel oops with 6.14 when enabling TLS
To: Vlastimil Babka, Hannes Reinecke, Matthew Wilcox
Cc: Sagi Grimberg, "linux-nvme@lists.infradead.org", "linux-block@vger.kernel.org", linux-mm@kvack.org, Harry Yoo
References: <08c29e4b-2f71-4b6d-8046-27e407214d8c@suse.com> <509dd4d3-85e9-40b2-a967-8c937909a1bf@suse.com> <15be2446-f096-45b9-aaf3-b371a694049d@suse.com> <95b0b93b-3b27-4482-8965-01963cc8beb8@suse.cz>
From: Hannes Reinecke

On 3/4/25 09:18, Vlastimil Babka wrote:
> On 3/4/25 08:58, Hannes Reinecke wrote:
>> On 3/3/25 23:02, Vlastimil Babka wrote:
>>> On 3/3/25 17:15, Vlastimil Babka wrote:
>>>> On 3/3/25 16:48, Matthew Wilcox wrote:
>>>>> You need to turn on the debugging options Vlastimil mentioned and try to
>>>>> figure out what nvme is doing wrong.
>>>>
>>>> Agree, looks like some error path going wrong?
>>>> Since there seems to be actual non-large kmalloc usage involved, another
>>>> debug parameter that could help: CONFIG_SLUB_DEBUG=y, and boot with
>>>> "slab_debug=FZPU,kmalloc-*"
>>>
>>> Also make sure you have CONFIG_DEBUG_VM please.
>>>
>> Here you go:
>>
>> [ 134.506802] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101ef8
>> [ 134.509253] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>> [ 134.511594] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
>> [ 134.513556] page_type: f5(slab)
>> [ 134.513563] raw: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810 ffff8881000402f0
>> [ 134.513568] raw: 0000000000000000 00000000000a000a 00000000f5000000 0000000000000000
>> [ 134.513572] head: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810 ffff8881000402f0
>> [ 134.513575] head: 0000000000000000 00000000000a000a 00000000f5000000 0000000000000000
>> [ 134.513579] head: 0017ffffc0000003 ffffea000407be01 ffffffffffffffff 0000000000000000
>> [ 134.513583] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
>> [ 134.513585] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
>> [ 134.513615] ------------[ cut here ]------------
>> [ 134.529822] kernel BUG at ./include/linux/mm.h:1455!
>
> Yeah, just as I suspected, folio_get() says the refcount is 0.
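For reference, the condition printed in that VM_BUG_ON_FOLIO() can be tried
out in user space. The sketch below is only a stand-alone model of the
expression from the dump: ref_zero_or_close_to_overflow() and self_check()
are made-up names for illustration, and the refcount is passed in as a plain
int instead of being read from a folio.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the check from the dump above:
 *   (unsigned int) folio_ref_count(folio) + 127u <= 127u
 * The function name is hypothetical; the refcount is a plain int here.
 */
static bool ref_zero_or_close_to_overflow(int refcount)
{
	/* Unsigned arithmetic wraps, so small negative values also match. */
	return (unsigned int)refcount + 127u <= 127u;
}

/* Hypothetical helper exercising the cases discussed in the thread. */
static void self_check(void)
{
	assert(ref_zero_or_close_to_overflow(0));     /* the refcount:0 case */
	assert(ref_zero_or_close_to_overflow(-1));    /* wrapped/underflowed */
	assert(!ref_zero_or_close_to_overflow(1));    /* normal live reference */
	assert(!ref_zero_or_close_to_overflow(-128)); /* outside the window */
}
```

Because the unsigned addition wraps, the expression is true for any refcount
in the range [-127, 0]. The page dump shows refcount:0, so it is the plain
"zero" case that fires here.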
>
>> [ 134.529835] Oops: invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
>> [ 134.529843] CPU: 0 UID: 0 PID: 274 Comm: kworker/0:1H Kdump: loaded Tainted: G E 6.14.0-rc4-default+ #309 03b131f1ef70944969b40df9d90a283ed638556f
>> [ 134.536577] Tainted: [E]=UNSIGNED_MODULE
>> [ 134.536580] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>> [ 134.536583] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
>> [ 134.536595] RIP: 0010:__iov_iter_get_pages_alloc+0x676/0x710
>> [ 134.542810] Code: e8 4c 39 e0 49 0f 47 c4 48 01 45 08 48 29 45 18 e9 90 fa ff ff 48 83 ef 01 e9 7f fe ff ff 48 c7 c6 40 57 4f 82 e8 6a e2 ce ff <0f> 0b e8 43 b8 b1 ff eb c5 f7 c1 ff 0f 00 00 48 89 cf 0f 85 4f ff
>> [ 134.542816] RSP: 0018:ffffc900004579d8 EFLAGS: 00010282
>> [ 134.542821] RAX: 000000000000005c RBX: ffffc90000457a90 RCX: 0000000000000027
>> [ 134.542825] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88817f423748
>> [ 134.542828] RBP: ffffc90000457d60 R08: 0000000000000000 R09: 0000000000000001
>> [ 134.554485] R10: ffffc900004579c0 R11: ffffc90000457720 R12: 0000000000000000
>> [ 134.554488] R13: ffffea000407be40 R14: ffffc90000457a70 R15: ffffc90000457d60
>> [ 134.554495] FS: 0000000000000000(0000) GS:ffff88817f400000(0000) knlGS:0000000000000000
>> [ 134.554499] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 134.554502] CR2: 0000556b0675b600 CR3: 0000000106bd8000 CR4: 0000000000350ef0
>> [ 134.554509] Call Trace:
>> [ 134.554512]
>> [ 134.554516] ? __die_body+0x1a/0x60
>> [ 134.554525] ? die+0x38/0x60
>> [ 134.554531] ? do_trap+0x10f/0x120
>> [ 134.554538] ? __iov_iter_get_pages_alloc+0x676/0x710
>> [ 134.568839] ? do_error_trap+0x64/0xa0
>> [ 134.568847] ? __iov_iter_get_pages_alloc+0x676/0x710
>> [ 134.568855] ? exc_invalid_op+0x53/0x60
>> [ 134.572489] ? __iov_iter_get_pages_alloc+0x676/0x710
>> [ 134.572496] ? asm_exc_invalid_op+0x16/0x20
>> [ 134.572512] ? __iov_iter_get_pages_alloc+0x676/0x710
>> [ 134.576726] ? __iov_iter_get_pages_alloc+0x676/0x710
>> [ 134.576733] ? srso_return_thunk+0x5/0x5f
>> [ 134.576740] ? ___slab_alloc+0x924/0xb60
>> [ 134.580253] ? mempool_alloc_noprof+0x41/0x190
>> [ 134.580262] ? tls_get_rec+0x3d/0x1b0 [tls 47f199c97f69357468c91efdbba24395e9dbfa77]
>> [ 134.580282] iov_iter_get_pages2+0x19/0x30
>
> Presumably that's __iov_iter_get_pages_alloc() doing get_page() either in
> the " if (iov_iter_is_bvec(i)) " branch or via iter_folioq_get_pages()?
>

Looks like it.

> Which doesn't work for a sub-size kmalloc() from a slab folio, which after
> the frozen refcount conversion no longer supports get_page().
>
> The question is if this is a mistake specific for this path that's easy to
> fix or there are more paths that do this. At the very least the pinning of
> page through a kmalloc() allocation from it is useless - the object itself
> has to be kfree()'d and that would never happen through a put_page()
> reaching zero.
>

Looks like a specific mistake.
tls_sw is the only user of sk_msg_zerocopy_from_iter()
(which is calling into __iov_iter_get_pages_alloc()).

And, more to the point, tls_sw messes up iov pacing coming in from
the upper layers.
So even if the upper layers send individual iovs (where each iov might
contain different allocation types), tls_sw is packing them together
into full records. So it might end up with iovs having _different_
allocations.
Which would explain why we only see it with TLS, but not with normal
connections.

Or so my reasoning goes. Not sure if that's correct.

So I'd be happy with an 'easy' fix for now. Obviously :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.com                              +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
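
The reasoning above (one record packed from segments with different backing
allocations, of which only some can be pinned by refcount) can be pictured
with a small user-space toy model. Everything in it is an assumption made for
illustration: struct toy_page, toy_try_pin() and toy_pin_record() are
hypothetical names, not kernel code, and a refcount of 0 simply stands in for
a slab page with a frozen refcount. The pin condition is loosely modelled on
the kind of test the kernel's sendpage_ok() helper performs, and it is not
meant as the fix for this oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
	int  refcount; /* 0 models a slab page with a frozen refcount */
	bool is_slab;  /* backing memory came from a kmalloc()-style cache */
};

/* Take a reference only when pinning by refcount is meaningful. */
static bool toy_try_pin(struct toy_page *p)
{
	if (p->is_slab || p->refcount < 1)
		return false; /* pinning would trip the refcount check */
	p->refcount++;
	return true;
}

/*
 * Walk the segments packed into one record, pinning what can be pinned.
 * Returns how many segments could NOT be pinned.
 */
static size_t toy_pin_record(struct toy_page **segs, size_t n)
{
	size_t unpinnable = 0;

	for (size_t i = 0; i < n; i++)
		if (!toy_try_pin(segs[i]))
			unpinnable++;
	return unpinnable;
}
```

With one page-allocator segment (refcount 1) and one slab segment (refcount
0) in the same record, toy_pin_record() reports a single unpinnable segment,
which is the mixed case described in the mail.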