From: Yafang Shao
Date: Thu, 3 Apr 2025 11:32:10 +0800
Subject: Re: [PATCH] proc: Avoid costly high-order page allocations when reading proc files
To: Dave Chinner, Uladzislau Rezki, npiggin@gmail.com
Cc: Shakeel Butt, Michal Hocko, Harry Yoo, Kees Cook, joel.granados@kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Josef Bacik,
 linux-mm@kvack.org, Vlastimil Babka
References: <20250401073046.51121-1-laoar.shao@gmail.com> <3315D21B-0772-4312-BCFB-402F408B0EF6@kernel.org>

On Thu, Apr 3, 2025 at 9:22 AM Dave Chinner wrote:
>
> On Wed, Apr 02, 2025 at 04:10:06PM -0700, Shakeel Butt wrote:
> > On Thu, Apr 03, 2025 at 08:16:56AM +1100, Dave Chinner wrote:
> > > On Wed, Apr 02, 2025 at 02:24:45PM +0200, Michal Hocko wrote:
> > > > On Wed 02-04-25 22:32:14, Dave Chinner wrote:
> > > > > Have a look at xlog_kvmalloc() in XFS.
> > > > > It implements a basic
> > > > > fast-fail, no retry high order kmalloc before it falls back to
> > > > > vmalloc by turning off direct reclaim for the kmalloc() call.
> > > > > Hence if there isn't a high-order page on the free lists ready
> > > > > to allocate, it falls back to vmalloc() immediately.
> > > > >
> > > > > For XFS, using xlog_kvmalloc() reduced the high-order per-allocation
> > > > > overhead by around 80% when compared to a standard kvmalloc()
> > > > > call. Numbers and profiles were documented in the commit message
> > > > > (reproduced in whole below)...
> > > >
> > > > Btw. it would be really great to have such concerns to be posted to the
> > > > linux-mm ML so that we are aware of that.
> > >
> > > I have brought it up in the past, along with all the other kvmalloc
> > > API problems that are mentioned in that commit message.
> > > Unfortunately, discussion focus always ended up on calling context
> > > and API flags (e.g. whether stuff like GFP_NOFS should be supported
> > > or not) not the fast-fail-then-no-fail behaviour we need.
> > >
> > > Yes, these discussions have resulted in API changes that support
> > > some new subset of gfp flags, but the performance issues have never
> > > been addressed...
> > >
> > > > kvmalloc currently doesn't support GFP_NOWAIT semantic but it does allow
> > > > to express - I prefer SLAB allocator over vmalloc.
> > >
> > > The conditional use of __GFP_NORETRY for the kmalloc call is broken
> > > if we try to use __GFP_NOFAIL with kvmalloc() - this causes the gfp
> > > mask to hold __GFP_NOFAIL | __GFP_NORETRY....
> > >
> > > We have a hard requirement for xlog_kvmalloc() to provide
> > > __GFP_NOFAIL semantics.
> > >
> > > IOWs, we need kvmalloc() to support kmalloc(GFP_NOWAIT) for
> > > performance with fallback to vmalloc(__GFP_NOFAIL) for
> > > correctness...
> >
> > Are you asking the above kvmalloc() semantics just for xfs or for all
> > the users of kvmalloc() api?
>
> I'm suggesting that fast-fail should be the default behaviour for
> everyone.
>
> If you look at __vmalloc() internals, you'll see that it turns off
> __GFP_NOFAIL for high order allocations because "reclaim is too
> costly and it's far cheaper to fall back to order-0 pages".

This behavior was introduced in commit 7de8728f55ff ("mm: vmalloc:
refactor vm_area_alloc_pages()") and only applies when
HAVE_ARCH_HUGE_VMALLOC is enabled (added in commit 121e6f3258fe,
"mm/vmalloc: hugepage vmalloc mappings").

Instead of disabling __GFP_NOFAIL for hugevmalloc allocations, perhaps
we could simply enforce "vmap_allow_huge = false" when __GFP_NOFAIL is
specified. Or we could ...

>
> That's pretty much exactly what we are doing with xlog_kvmalloc(),
> and what I'm suggesting that kvmalloc should be doing by default.
>
> i.e. If it's necessary for mm internal implementations to avoid
> high-order reclaim when there is a faster order-0 allocation
> fallback path available for performance reasons, then we should be
> using that same behaviour anywhere optimistic high-order allocation
> is used as an optimisation for those same performance reasons.
>
> The overall __GFP_NOFAIL requirement is something XFS needs, but it
> is most definitely not something that should be enabled by default.
> However, it needs to work with kvmalloc(), and it is not possible to
> do so right now.

1. Introduce a new vmalloc() flag to explicitly disable hugepage
   mappings when needed (e.g., for __GFP_NOFAIL cases).

2. Extend kvmalloc() with finer control by allowing separate GFP flags
   for kmalloc and vmalloc, plus an option to disable hugevmalloc:

     kvmalloc(size_t size, gfp_t kmalloc_flags, gfp_t vmalloc_flags,
              bool allow_hugevmalloc);

   Then we can replace the xlog_cil_kvmalloc() with:

     kvmalloc(size, GFP_NOWAIT, __GFP_NOFAIL, false);

This is just a preliminary idea...

-- 
Regards
Yafang
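
For illustration only, the two-flag kvmalloc() idea above can be modelled
in plain userspace C. This is a rough sketch, not kernel code: every
model_* name, the MODEL_GFP_* bit values, and the two global knobs are
hypothetical and exist only for this example, and malloc() stands in for
both the contiguous and the order-0 backed allocators. It only shows the
intended control flow: the first attempt fails fast without reclaim, the
fallback carries the no-fail requirement, and the caller decides whether
a hugepage mapping may be tried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for gfp bits; these are NOT the kernel's values. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_NOWAIT 0x1u /* fail fast: never enter reclaim */
#define MODEL_GFP_NOFAIL 0x2u /* keep retrying until memory is obtained */

enum model_backend { MODEL_NONE, MODEL_KMALLOC, MODEL_VMALLOC };

struct model_alloc {
	void *ptr;
	enum model_backend backend;
	bool huge_mapping; /* would a hugepage mapping have been allowed? */
};

/* Model knobs: is a contiguous block free, and how often did we "reclaim"? */
bool model_high_order_free = true;
unsigned int model_reclaim_passes = 0;

/* Models the contiguous (kmalloc-like) attempt. */
static void *model_kmalloc(size_t size, model_gfp_t flags)
{
	if (model_high_order_free)
		return malloc(size);
	if (!(flags & MODEL_GFP_NOWAIT))
		model_reclaim_passes++; /* stands in for costly direct reclaim */
	return NULL;
}

/* Models the order-0 backed (vmalloc-like) fallback. */
static void *model_vmalloc(size_t size, model_gfp_t flags)
{
	void *p = malloc(size);

	while (!p && (flags & MODEL_GFP_NOFAIL))
		p = malloc(size); /* NOFAIL: retry instead of returning NULL */
	return p;
}

/* Try the contiguous path first, then fall back with separate flags. */
struct model_alloc model_kvmalloc(size_t size, model_gfp_t kmalloc_flags,
				  model_gfp_t vmalloc_flags,
				  bool allow_hugevmalloc)
{
	struct model_alloc a = { NULL, MODEL_NONE, false };

	assert(size > 0);
	a.ptr = model_kmalloc(size, kmalloc_flags);
	if (a.ptr) {
		a.backend = MODEL_KMALLOC;
		return a;
	}
	a.ptr = model_vmalloc(size, vmalloc_flags);
	if (a.ptr) {
		a.backend = MODEL_VMALLOC;
		a.huge_mapping = allow_hugevmalloc;
	}
	return a;
}
```

In this model, model_kvmalloc(size, MODEL_GFP_NOWAIT, MODEL_GFP_NOFAIL,
false) mirrors the proposed replacement call: with no contiguous block
available it goes straight to the fallback without counting a reclaim
pass. The caller releases the buffer with free().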