Date: Tue, 4 Mar 2025 17:32:32 +0100
Subject: Re: Kernel oops with 6.14 when enabling TLS
From: Hannes Reinecke
To: Matthew Wilcox, Vlastimil Babka
Cc: Boris Pismenny, John Fastabend, Jakub Kicinski, Sagi Grimberg,
 linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, Harry Yoo, netdev@vger.kernel.org
References: <15be2446-f096-45b9-aaf3-b371a694049d@suse.com>
 <95b0b93b-3b27-4482-8965-01963cc8beb8@suse.cz>
 <6877dfb1-9f44-4023-bb6d-e7530d03e33c@suse.com>

On 3/4/25 17:14, Matthew Wilcox wrote:
> On Tue, Mar 04, 2025 at 11:26:07AM +0100, Vlastimil Babka wrote:
>> +Cc NETWORKING [TLS] maintainers and netdev for input, thanks.
>>
>> The full error is here:
>> https://lore.kernel.org/all/fcfa11c6-2738-4a2e-baa8-09fa8f79cbf3@suse.de/
>>
>> On 3/4/25 11:20, Hannes Reinecke wrote:
>>> On 3/4/25 09:18, Vlastimil Babka wrote:
>>>> On 3/4/25 08:58, Hannes Reinecke wrote:
>>>>> On 3/3/25 23:02, Vlastimil Babka wrote:
>>>>>> Also make sure you have CONFIG_DEBUG_VM please.
>>>>>>
>>>>> Here you go:
>>>>>
>>>>> [ 134.506802] page: refcount:0 mapcount:0 mapping:0000000000000000
>>>>> index:0x0 pfn:0x101ef8
>>>>> [ 134.509253] head: order:3 mapcount:0 entire_mapcount:0
>>>>> nr_pages_mapped:0 pincount:0
>>>>> [ 134.511594] flags:
>>>>> 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
>>>>> [ 134.513556] page_type: f5(slab)
>>>>> [ 134.513563] raw: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810
>>>>> ffff8881000402f0
>>>>> [ 134.513568] raw: 0000000000000000 00000000000a000a 00000000f5000000
>>>>> 0000000000000000
>>>>> [ 134.513572] head: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810
>>>>> ffff8881000402f0
>>>>> [ 134.513575] head: 0000000000000000 00000000000a000a 00000000f5000000
>>>>> 0000000000000000
>>>>> [ 134.513579] head: 0017ffffc0000003 ffffea000407be01 ffffffffffffffff
>>>>> 0000000000000000
>>>>> [ 134.513583] head: 0000000000000008 0000000000000000 00000000ffffffff
>>>>> 0000000000000000
>>>>> [ 134.513585] page dumped because: VM_BUG_ON_FOLIO(((unsigned int)
>>>>> folio_ref_count(folio) + 127u <= 127u))
>>>>> [ 134.513615] ------------[ cut here ]------------
>>>>> [ 134.529822] kernel BUG at ./include/linux/mm.h:1455!
>>>>
>>>> Yeah, just as I suspected, folio_get() says the refcount is 0.
>
> ... and it has a page_type of f5 (slab)
>
>>>>> [ 134.554509] Call Trace:
>>>>> [ 134.580282] iov_iter_get_pages2+0x19/0x30
>>>>
>>>> Presumably that's __iov_iter_get_pages_alloc() doing get_page() either in
>>>> the " if (iov_iter_is_bvec(i)) " branch or via iter_folioq_get_pages()?
>
> It's the bvec path:
>
>         iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
>
>>>> Which doesn't work for a sub-size kmalloc() from a slab folio, which after
>>>> the frozen refcount conversion no longer supports get_page().
>>>>
>>>> The question is if this is a mistake specific for this path that's easy to
>>>> fix or there are more paths that do this. At the very least the pinning of
>>>> page through a kmalloc() allocation from it is useless - the object itself
>>>> has to be kfree()'d and that would never happen through a put_page()
>>>> reaching zero.
>>>>
>>> Looks like a specific mistake.
>>> tls_sw is the only user of sk_msg_zerocopy_from_iter()
>>> (which is calling into __iov_iter_get_pages_alloc()).
>>>
>>> And, more to the point, tls_sw messes up iov pacing coming in from
>>> the upper layers.
>>> So even if the upper layers send individual iovs (where each iov might
>>> contain different allocation types), tls_sw is packing them together
>>> into full records. So it might end up with iovs having _different_
>>> allocations.
>>> Which would explain why we only see it with TLS, but not with normal
>>> connections.
>
> I thought we'd done all the work needed to get rid of these pointless
> refcount bumps. Turns out that's only on the block side (eg commit
> e4cc64657bec). So what does networking need in order to understand
> that some iovecs do not need to mess with the refcount?

The network stack needs to get hold of the page while transmission is
ongoing, as there is potentially rather deep queueing involved,
requiring several calls to sendmsg() and friends before the page is
finally transmitted. There may be some post-processing (checksums,
digests, you name it), too, all of which requires the page to be there.

It's all so jumbled up ... personally, I would _love_ to do away with
__iov_iter_get_pages_alloc(). Allocating a page array? Seriously?

And the problem with that is that it always takes a page(!) reference,
completely oblivious to whether you even _can_ take a page reference
(e.g. for tail pages); we've hit this problem several times now (check
for sendpage_ok() ...).
But that's not the real issue; the real issue is that the page
reference is taken down in the very bowels of
__iov_iter_get_pages_alloc(), but needs to be undone by the _caller_,
who might (or might not) have an idea that he needs to drop the
reference here.
That's why there is no straightforward conversion; you need to audit
each and every caller and try to find out where the page reference
(if any) is dropped.
Bah.

Can't we (at the very least) leave it to the caller of
__iov_iter_get_pages() to get a page reference (he has access to the
page array, after all ...)?
That would make the interface slightly better, and it'll be far more
obvious to the caller what needs to be done.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.com                              +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
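
Illustration (not part of the original mail): a minimal userspace sketch
of the two points discussed above, under loud assumptions. The first
helper reproduces only the arithmetic of the VM_BUG_ON_FOLIO condition
quoted in the oops, showing that it fires for a refcount of 0 (the
frozen slab case) and for small negative values. The rest models the
suggestion that the caller, not the page-collecting helper, decides
whether to take a reference. struct toy_page and all toy_* names are
hypothetical; toy_page_ref_ok() is only modelled on what sendpage_ok()
is understood to check (not a slab page, refcount of at least 1). None
of this is kernel code or an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
	int refcount;	/* 0 models a frozen (slab) page */
	bool is_slab;
};

/*
 * Same arithmetic as the condition in the quoted oops:
 * (unsigned int)refcount + 127u <= 127u
 * True for 0 and for -127..-1 (the unsigned sum wraps around).
 */
static bool ref_zero_or_close_to_overflow(int refcount)
{
	return (unsigned int)refcount + 127u <= 127u;
}

/* Assumption: mirrors the spirit of sendpage_ok(), not its source. */
static bool toy_page_ref_ok(const struct toy_page *p)
{
	return !p->is_slab && p->refcount >= 1;
}

/*
 * Hypothetical caller-side policy: take a reference only where that is
 * allowed, and record in took[] which pages need a matching put.
 * Returns the number of references taken.
 */
static int toy_get_refs(struct toy_page **pages, int n, bool *took)
{
	int taken = 0;

	for (int i = 0; i < n; i++) {
		took[i] = toy_page_ref_ok(pages[i]);
		if (took[i]) {
			pages[i]->refcount++;
			taken++;
		}
	}
	return taken;
}

/* Drop exactly the references recorded by toy_get_refs(). */
static void toy_put_refs(struct toy_page **pages, int n, const bool *took)
{
	for (int i = 0; i < n; i++) {
		if (!took[i])
			continue;
		assert(pages[i]->refcount > 0);
		pages[i]->refcount--;
	}
}
```

In this toy model the get and the put sit side by side in the caller,
so it is visible at the call site which pages were pinned; a slab-backed
page is simply skipped instead of tripping the refcount check.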