Date: Thu, 17 Apr 2025 20:03:27 -0700
From: Andrew Morton
To: Wupeng Ma
Subject: Re: [PATCH] mm: hugetlb: Fix incorrect fallback for subpool
Message-Id: <20250417200327.ef9d1aed59e198aa2c8b046e@linux-foundation.org>
In-Reply-To: <20250410062633.3102457-1-mawupeng1@huawei.com>
References: <20250410062633.3102457-1-mawupeng1@huawei.com>

On Thu, 10 Apr 2025 14:26:33 +0800 Wupeng Ma wrote:

> During our testing with hugetlb subpool enabled, we observe that
> hstate->resv_huge_pages may underflow into negative values.
> Root cause analysis reveals a race condition in subpool reservation
> fallback handling, as follows:
>
> hugetlb_reserve_pages()
>     /* Attempt subpool reservation */
>     gbl_reserve = hugepage_subpool_get_pages(spool, chg);
>
>     /* Global reservation may fail after subpool allocation */
>     if (hugetlb_acct_memory(h, gbl_reserve) < 0)
>         goto out_put_pages;
>
> out_put_pages:
>     /* This incorrectly restores reservation to subpool */
>     hugepage_subpool_put_pages(spool, chg);
>
> When hugetlb_acct_memory() fails after subpool allocation, the current
> implementation over-commits subpool reservations by returning the full
> 'chg' value instead of the 'chg - gbl_reserve' pages that were actually
> taken from the subpool.  This discrepancy propagates to global
> reservations during subsequent releases, eventually causing
> resv_huge_pages underflow.
>
> This problem can be triggered easily with the following steps:
> 1. reserve hugepages for hugetlb allocation
> 2. mount hugetlbfs with min_size to enable hugetlb subpool
> 3. alloc hugepages with two tasks (make sure the second will fail due to
>    an insufficient amount of hugepages)
> 4. wait for a few seconds and repeat step 3, which will make
>    hstate->resv_huge_pages go below zero.
>
> To fix this problem, return the correct amount of pages to the subpool
> during the fallback after hugepage_subpool_get_pages is called.
>

This has been in mm-hotfixes since April 1.  Do we have any reviewers?

> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3010,7 +3010,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	struct hugepage_subpool *spool = subpool_vma(vma);
>  	struct hstate *h = hstate_vma(vma);
>  	struct folio *folio;
> -	long retval, gbl_chg;
> +	long retval, gbl_chg, gbl_reserve;
>  	map_chg_state map_chg;
>  	int ret, idx;
>  	struct hugetlb_cgroup *h_cg = NULL;
> @@ -3163,8 +3163,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
>  						    h_cg);
>  out_subpool_put:
> -	if (map_chg)
> -		hugepage_subpool_put_pages(spool, 1);
> +	/*
> +	 * put page to subpool iff the quota of subpool's rsv_hpages is used
> +	 * during hugepage_subpool_get_pages.
> +	 */
> +	if (map_chg && !gbl_chg) {
> +		gbl_reserve = hugepage_subpool_put_pages(spool, 1);
> +		hugetlb_acct_memory(h, -gbl_reserve);
> +	}
> +
> +
>  out_end_reservation:
>  	if (map_chg != MAP_CHG_ENFORCED)
>  		vma_end_reservation(h, vma, addr);
> @@ -7216,7 +7224,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
>  					struct vm_area_struct *vma,
>  					vm_flags_t vm_flags)
>  {
> -	long chg = -1, add = -1;
> +	long chg = -1, add = -1, spool_resv, gbl_resv;
>  	struct hstate *h = hstate_inode(inode);
>  	struct hugepage_subpool *spool = subpool_inode(inode);
>  	struct resv_map *resv_map;
> @@ -7351,8 +7359,16 @@ bool hugetlb_reserve_pages(struct inode *inode,
>  	return true;
>  
>  out_put_pages:
> -	/* put back original number of pages, chg */
> -	(void)hugepage_subpool_put_pages(spool, chg);
> +	spool_resv = chg - gbl_reserve;
> +	if (spool_resv) {
> +		/* put sub pool's reservation back, chg - gbl_reserve */
> +		gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
> +		/*
> +		 * subpool's reserved pages can not be put back due to race,
> +		 * return to hstate.
> +		 */
> +		hugetlb_acct_memory(h, -gbl_resv);
> +	}
>  out_uncharge_cgroup:
>  	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
>  					    chg * pages_per_huge_page(h), h_cg);
> -- 
> 2.43.0
>