Date: Wed, 2 Apr 2025 11:25:12 +0200
From: Vlastimil Babka
To: Yafang Shao, Harry Yoo
Cc: Kees Cook, joel.granados@kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Josef Bacik, linux-mm@kvack.org, Michal Hocko
Subject: Re: [PATCH] proc: Avoid costly high-order page allocations when reading proc files
References: <20250401073046.51121-1-laoar.shao@gmail.com> <3315D21B-0772-4312-BCFB-402F408B0EF6@kernel.org>

On 4/2/25 10:42, Yafang Shao wrote:
> On Wed, Apr 2, 2025 at 12:15 PM Harry Yoo wrote:
>>
>> On Tue, Apr 01, 2025 at 07:01:04AM -0700, Kees Cook wrote:
>> >
>> >
>> > On April 1, 2025 12:30:46 AM PDT, Yafang Shao wrote:
>> > >While investigating a kcompactd 100% CPU utilization issue in production, I
>> > >observed frequent costly high-order (order-6) page allocations triggered by
>> > >proc file reads from monitoring tools. This can be reproduced with a simple
>> > >test case:
>> > >
>> > >  fd = open(PROC_FILE, O_RDONLY);
>> > >  size = read(fd, buff, 256KB);
>> > >  close(fd);
>> > >
>> > >Although we should modify the monitoring tools to use smaller buffer sizes,
>> > >we should also enhance the kernel to prevent these expensive high-order
>> > >allocations.
>> > >
>> > >Signed-off-by: Yafang Shao
>> > >Cc: Josef Bacik
>> > >---
>> > > fs/proc/proc_sysctl.c | 10 +++++++++-
>> > > 1 file changed, 9 insertions(+), 1 deletion(-)
>> > >
>> > >diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
>> > >index cc9d74a06ff0..c53ba733bda5 100644
>> > >--- a/fs/proc/proc_sysctl.c
>> > >+++ b/fs/proc/proc_sysctl.c
>> > >@@ -581,7 +581,15 @@ static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
>> > >         error = -ENOMEM;
>> > >         if (count >= KMALLOC_MAX_SIZE)
>> > >                 goto out;
>> > >-        kbuf = kvzalloc(count + 1, GFP_KERNEL);
>> > >+
>> > >+        /*
>> > >+         * Use vmalloc if the count is too large to avoid costly high-order page
>> > >+         * allocations.
>> > >+         */
>> > >+        if (count < (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>> > >+                kbuf = kvzalloc(count + 1, GFP_KERNEL);
>> >
>> > Why not move this check into kvmalloc family?
>>
>> Hmm should this check really be in kvmalloc family?
>
> Modifying the existing kvmalloc functions risks performance regressions.
> Could we instead introduce a new variant like vkmalloc() (favoring
> vmalloc over kmalloc) or kvmalloc_costless()?

We have gfp flags and kmalloc_gfp_adjust() to moderate how aggressive
kmalloc() is before the vmalloc() fallback. It does e.g.:

        if (!(flags & __GFP_RETRY_MAYFAIL))
                flags |= __GFP_NORETRY;

However, if your problem is kcompactd utilization, then the kmalloc()
attempt would have to avoid ___GFP_KSWAPD_RECLAIM so that it does not wake
up kswapd and then kcompactd. Should we remove the flag for costly orders?
Dunno. Ideally the deferred compaction mechanism would limit the issue in
the first place.

Fixing up one particular place ad hoc (reading /proc files), or creating a
new vkmalloc() and then spreading its use as you see other places
triggering the issue, seems quite suboptimal to me.

>>
>> I don't think users would expect kvmalloc() to implicitly decide on using
>> vmalloc() without trying kmalloc() first, just because it's a high-order
>> allocation.
>>
>
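
For reference, the order numbers discussed in this thread can be reproduced with a small user-space sketch of the arithmetic. It assumes 4 KiB pages (an assumption; PAGE_SIZE depends on architecture and config) and takes PAGE_ALLOC_COSTLY_ORDER as 3, which is how the kernel defines it. The sketch_* names are hypothetical and exist only for illustration; they are not kernel APIs, and inside the kernel get_order() does this job.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE    4096UL
/* The kernel defines PAGE_ALLOC_COSTLY_ORDER as 3. */
#define SKETCH_COSTLY_ORDER 3

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int sketch_order_for_size(size_t size)
{
        unsigned int order = 0;

        assert(size > 0);
        while ((SKETCH_PAGE_SIZE << order) < size)
                order++;
        return order;
}

/*
 * Nonzero if a physically contiguous buffer of 'size' bytes needs an
 * order above the costly threshold.
 */
static int sketch_is_costly(size_t size)
{
        return sketch_order_for_size(size) > SKETCH_COSTLY_ORDER;
}
```

Under these assumptions a 256 KiB contiguous buffer is order 6, and the cutoff in the quoted patch, PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER, comes out to 32 KiB.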