Date: Thu, 3 Apr 2025 09:38:24 +1100
From: Dave Chinner
To: Matthew Wilcox
Cc: Michal Hocko, Yafang Shao, Harry Yoo, Kees Cook, joel.granados@kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Josef Bacik,
	linux-mm@kvack.org, Vlastimil Babka
Subject: Re: [PATCH] proc: Avoid costly high-order page allocations when reading proc files
References: <20250401073046.51121-1-laoar.shao@gmail.com> <3315D21B-0772-4312-BCFB-402F408B0EF6@kernel.org>

On Wed, Apr 02, 2025 at 06:24:10PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 02, 2025 at 02:24:45PM +0200, Michal Hocko wrote:
> > On Wed 02-04-25 22:32:14, Dave Chinner wrote:
> > > > > > >+	/*
> > > > > > >+	 * Use vmalloc if the count is too large to avoid costly high-order page
> > > > > > >+	 * allocations.
> > > > > > >+	 */
> > > > > > >+	if (count < (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> > > > > > >+		kbuf = kvzalloc(count + 1, GFP_KERNEL);
> > > > > >
> > > > > > Why not move this check into kvmalloc family?
> > > > >
> > > > > Hmm should this check really be in kvmalloc family?
> > > >
> > > > Modifying the existing kvmalloc functions risks performance regressions.
> > > > Could we instead introduce a new variant like vkmalloc() (favoring
> > > > vmalloc over kmalloc) or kvmalloc_costless()?
> > >
> > > We should fix kvmalloc() instead of continuing to force
> > > subsystems to work around the limitations of kvmalloc().
> >
> > Agreed!
> >
> > > Have a look at xlog_kvmalloc() in XFS. It implements a basic
> > > fast-fail, no retry high order kmalloc before it falls back to
> > > vmalloc by turning off direct reclaim for the kmalloc() call.
> > > Hence if the there isn't a high-order page on the free lists ready
> > > to allocate, it falls back to vmalloc() immediately.
>
> ... but if vmalloc fails, it goes around again! This is exactly why
> we don't want filesystems implementing workarounds for MM problems.
> What a mess.

That's because we need __GFP_NOFAIL semantics for the overall
operation, and we can't pass that to kvmalloc() because it doesn't
support __GFP_NOFAIL. And when this code was written, vmalloc didn't
support __GFP_NOFAIL, either. We *had* to open code nofail semantics,
because the mm infrastructure did not provide it.

Yes, we can fix this now that __vmalloc(__GFP_NOFAIL) is a thing.
We still need to open code the kmalloc() side of the operation right
now because....

> > 	if (size > PAGE_SIZE) {
> > 		flags |= __GFP_NOWARN;
> >
> > 		if (!(flags & __GFP_RETRY_MAYFAIL))
> > 			flags |= __GFP_NORETRY;

.... this is a built-in catch-22.

If we use kvmalloc(__GFP_NOFAIL), this code results in kmalloc
with __GFP_NORETRY | __GFP_NOFAIL flags set. i.e. we are telling
the allocation that it must not retry but it also must retry until
it succeeds.
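
To make that contradiction concrete, here is a minimal userspace sketch
of the flag adjustment quoted above. It is only a model: the DEMO_*
names, their bit values and DEMO_PAGE_SIZE are stand-ins invented for
illustration, not the kernel's real gfp definitions, and nothing is
actually allocated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in bit values for illustration only; NOT the kernel's gfp bits. */
#define DEMO_GFP_NOWARN        (1u << 0)
#define DEMO_GFP_NORETRY       (1u << 1)
#define DEMO_GFP_RETRY_MAYFAIL (1u << 2)
#define DEMO_GFP_NOFAIL        (1u << 3)

/* Assumed 4KiB page size, again only for the model. */
#define DEMO_PAGE_SIZE 4096u

/* Mirrors the quoted adjustment applied before the kmalloc() attempt. */
static unsigned int demo_kmalloc_flags(unsigned int flags, size_t size)
{
	if (size > DEMO_PAGE_SIZE) {
		flags |= DEMO_GFP_NOWARN;

		if (!(flags & DEMO_GFP_RETRY_MAYFAIL))
			flags |= DEMO_GFP_NORETRY;
	}
	return flags;
}

/* True when the flags ask for "never retry" and "never fail" at once. */
static bool demo_flags_contradict(unsigned int flags)
{
	return (flags & DEMO_GFP_NORETRY) && (flags & DEMO_GFP_NOFAIL);
}
```

In this model, demo_kmalloc_flags(DEMO_GFP_NOFAIL, 64 * 1024) comes
back with both DEMO_GFP_NORETRY and DEMO_GFP_NOFAIL set, so
demo_flags_contradict() reports true. A request of DEMO_PAGE_SIZE or
less is left untouched.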

To work around this, the caller then has to use __GFP_RETRY_MAYFAIL
| __GFP_NOFAIL, which is telling the allocation that it is allowed
to fail but it also must not fail.

Again, this makes no sense at all, and on top of that it doesn't
give us the fast-fail semantics we want from the kmalloc side of
kvmalloc.

i.e. high order page allocation from kmalloc() is an optimisation,
not a requirement for kvmalloc(). If high order page allocation is
frequently more expensive than simply falling back to vmalloc(),
then we've made the wrong optimisation choices for the kvmalloc()
implementation...

> I think it might be better to do this:
>
> 		flags |= __GFP_NOWARN;
>
> 		if (!(flags & __GFP_RETRY_MAYFAIL))
> 			flags |= __GFP_NORETRY;
> +		else if (size > (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> +			flags &= ~__GFP_DIRECT_RECLAIM;
>
> I think it's entirely appropriate for a call to kvmalloc() to do
> direct reclaim if it's asking for, say, 16KiB and we don't have any of
> those available.

I disagree - we have background compaction to address the lack of
high order folios in the allocator reserves. Let that do the work of
resolving the internal resource shortage instead of slowing down
allocations that *do not require high order pages to be allocated*.

> Better than exacerbating the fragmentation problem by
> allocating 4x4KiB pages, each from different groupings.

We have no evidence that this allocation behaviour in XFS causes
or exacerbates memory fragmentation. We have been running it in
production systems for a few years now....

-Dave.
-- 
Dave Chinner
david@fromorbit.com