From: Joshua Hahn
To: Ackerley Tng
Cc: mawupeng1@huawei.com, akpm@linux-foundation.org, mike.kravetz@oracle.com, david@redhat.com, muchun.song@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
Date: Wed, 11 Jun 2025 17:54:41 -0700
Message-ID: <20250612005448.571615-1-joshua.hahnjy@gmail.com>

On Wed, 11 Jun 2025 08:55:30 -0700 Ackerley Tng wrote:

> Joshua Hahn writes:
>
> > On Tue, 25 Mar 2025 14:16:34 +0800 Wupeng Ma wrote:
> >
> >> During our testing with hugetlb subpool enabled, we observe that
> >> hstate->resv_huge_pages may underflow into negative values. Root cause
> >> analysis reveals a race condition in subpool reservation fallback handling
> >> as follow:
> >>
> >> hugetlb_reserve_pages()
> >>     /* Attempt subpool reservation */
> >>     gbl_reserve = hugepage_subpool_get_pages(spool, chg);
> >>
> >>     /* Global reservation may fail after subpool allocation */
> >>     if (hugetlb_acct_memory(h, gbl_reserve) < 0)
> >>         goto out_put_pages;
> >>
> >> out_put_pages:
> >>     /* This incorrectly restores reservation to subpool */
> >>     hugepage_subpool_put_pages(spool, chg);
> >>
> >> When hugetlb_acct_memory() fails after subpool allocation, the current
> >> implementation over-commits subpool reservations by returning the full
> >> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
> >> discrepancy propagates to global reservations during subsequent releases,
> >> eventually causing resv_huge_pages underflow.
> >>
> >> This problem can be trigger easily with the following steps:
> >> 1. reverse hugepage for hugeltb allocation
> >> 2. mount hugetlbfs with min_size to enable hugetlb subpool
> >> 3. alloc hugepages with two task(make sure the second will fail due to
> >>    insufficient amount of hugepages)
> >> 4. with for a few seconds and repeat step 3 which will make
> >>    hstate->resv_huge_pages to go below zero.
> >>
> >> To fix this problem, return corrent amount of pages to subpool during the
> >> fallback after hugepage_subpool_get_pages is called.
> >>
> >> Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
> >> Signed-off-by: Wupeng Ma
> >
> > Hi Wupeng,
> > Thank you for the fix! This is a problem that we've also seen happen in
> > our fleet at Meta. I was able to recreate the issue that you mentioned -- to
> > explicitly lay down the steps I used:
> >
> > 1. echo 1 > /proc/sys/vm/nr_hugepages
> > 2. mkdir /mnt/hugetlb-pool
> > 3.mount -t hugetlbfs -o min_size=2M none /mnt/hugetlb-pool
> > 4. (./get_hugepage &) && (./get_hugepage &)
> >    # get_hugepage just opens a file in /mnt/hugetlb-pool and mmaps 2M into it.
>
> Hi Joshua,
>
> Would you be able to share the source for ./get_hugepage? I'm trying to
> reproduce this too.
>
> Does ./get_hugepage just mmap and then spin in an infinite loop?
>
> Do you have to somehow limit allocation of surplus HugeTLB pages from
> the buddy allocator?
>
> Thanks!

Hi Ackerley,

The script I used for get_hugepage is very simple :-) No need to even spin
infinitely! I just make a file descriptor, ftruncate it to 2M, and mmap into
it. For good measure I set addr[0] = '.', sleep for 1 second, and then munmap
the area afterwards.

Here is a simplified version of the script (no error handling):

	int fd = open("/mnt/hugetlb-pool/hugetlb_file", O_RDWR | O_CREAT, 0666);
	ftruncate(fd, 2*1024*1024);

	char *addr = mmap(NULL, 2*1024*1024, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	addr[0] = '.';

	sleep(1);
	munmap(addr, 2*1024*1024);
	close(fd);

Hope this helps! Please let me know if it doesn't work, I would be happy to
investigate this with you.

Have a great day!
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)
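
For anyone who wants to compile the snippet above directly, here is a rough
sketch of the same sequence wrapped in a helper with error handling. The
function name get_hugepage(), its path/len parameters, the includes, and the
error reporting are illustrative additions and are not part of the original
script; only the order of calls (open, ftruncate, mmap, touch, sleep, munmap,
close) follows it.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of the file at `path`, touch the
 * first byte, hold the mapping for one second, then unmap it.
 * Returns 0 on success and -1 if any step fails.
 */
int get_hugepage(const char *path, size_t len)
{
	int fd = open(path, O_RDWR | O_CREAT, 0666);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* Size the file so that the mapping below is backed by it. */
	if (ftruncate(fd, (off_t)len) < 0) {
		perror("ftruncate");
		close(fd);
		return -1;
	}

	/*
	 * In the reproduction, one of the two concurrent tasks is expected
	 * to fail; reporting which call failed makes that visible.
	 */
	char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return -1;
	}

	/* Touch the first byte so a page is actually faulted in. */
	addr[0] = '.';

	sleep(1);

	munmap(addr, len);
	close(fd);
	return 0;
}
```

Calling get_hugepage("/mnt/hugetlb-pool/hugetlb_file", 2UL * 1024 * 1024)
from main() and returning nonzero on failure should behave like the
simplified version, except that a failing task reports which call failed
instead of crashing on a bad pointer.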