Date: Wed, 2 Apr 2025 11:25:39 -0700
From: Shakeel Butt
To: Vlastimil Babka
Cc: Yafang Shao, Harry Yoo, Kees Cook, joel.granados@kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Josef Bacik, linux-mm@kvack.org, Michal Hocko
Subject: Re: [PATCH] proc: Avoid costly high-order page allocations when reading proc files

On Wed, Apr 02, 2025 at 11:25:12AM +0200, Vlastimil Babka wrote:
> On 4/2/25 10:42, Yafang Shao wrote:
> > On Wed, Apr 2, 2025 at 12:15 PM Harry Yoo wrote:
> >>
> >> On Tue, Apr 01, 2025 at 07:01:04AM -0700, Kees Cook wrote:
> >> >
> >> >
> >> > On April 1, 2025 12:30:46 AM PDT, Yafang Shao wrote:
> >> > >While investigating a kcompactd 100% CPU utilization issue in production, I
> >> > >observed frequent costly high-order (order-6) page allocations triggered by
> >> > >proc file reads from monitoring tools. This can be reproduced with a simple
> >> > >test case:
> >> > >
> >> > >  fd = open(PROC_FILE, O_RDONLY);
> >> > >  size = read(fd, buff, 256KB);
> >> > >  close(fd);
> >> > >
> >> > >Although we should modify the monitoring tools to use smaller buffer sizes,
> >> > >we should also enhance the kernel to prevent these expensive high-order
> >> > >allocations.
> >> > >
> >> > >Signed-off-by: Yafang Shao
> >> > >Cc: Josef Bacik
> >> > >---
> >> > > fs/proc/proc_sysctl.c | 10 +++++++++-
> >> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> >> > >
> >> > >diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> >> > >index cc9d74a06ff0..c53ba733bda5 100644
> >> > >--- a/fs/proc/proc_sysctl.c
> >> > >+++ b/fs/proc/proc_sysctl.c
> >> > >@@ -581,7 +581,15 @@ static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
> >> > > 	error = -ENOMEM;
> >> > > 	if (count >= KMALLOC_MAX_SIZE)
> >> > > 		goto out;
> >> > >-	kbuf = kvzalloc(count + 1, GFP_KERNEL);
> >> > >+
> >> > >+	/*
> >> > >+	 * Use vmalloc if the count is too large to avoid costly high-order page
> >> > >+	 * allocations.
> >> > >+	 */
> >> > >+	if (count < (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> >> > >+		kbuf = kvzalloc(count + 1, GFP_KERNEL);
> >> >
> >> > Why not move this check into kvmalloc family?
> >>
> >> Hmm should this check really be in kvmalloc family?
> >
> > Modifying the existing kvmalloc functions risks performance regressions.
> > Could we instead introduce a new variant like vkmalloc() (favoring
> > vmalloc over kmalloc) or kvmalloc_costless()?
>
> We have gfp flags and kmalloc_gfp_adjust() to moderate how aggressive
> kmalloc() is before the vmalloc() fallback. It does e.g.:
>
> 	if (!(flags & __GFP_RETRY_MAYFAIL))
> 		flags |= __GFP_NORETRY;
>
> However if your problem is kcompactd utilization then the kmalloc() attempt
> would have to avoid ___GFP_KSWAPD_RECLAIM to avoid waking up kswapd and then
> kcompactd. Should we remove the flag for costly orders? Dunno.

I agree with the points that follow (e.g. that ad-hoc fixes are suboptimal).
The suggestion above of dropping kswapd reclaim for costly orders needs more
thought, though: by not waking kswapd for costly-order allocations, would we
end up hiding some compaction issues?

> Ideally the
> deferred compaction mechanism would limit the issue in the first place.
>
> The ad-hoc fixing up of a particular place (/proc files reading) or creating
> a new vkmalloc() and then spreading its use as you see other places
> triggering the issue seems quite suboptimal to me.
>
> >>
> >> I don't think users would expect kvmalloc() to implicitly decide on using
> >> vmalloc() without trying kmalloc() first, just because it's a high-order
> >> allocation.
> >>
> >