Message-ID: <9f731254-ccd3-5fde-a5a8-c2771a308588@huawei.com>
Date: Tue, 22 Apr 2025 21:02:47 +0800
Subject: Re: [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description
From: Jinjiang Tu <tujinjiang@huawei.com>
In-Reply-To: <20250419073214.2688926-1-tujinjiang@huawei.com>
References: <20250419073214.2688926-1-tujinjiang@huawei.com>

On 2025/4/19 15:32, Jinjiang Tu wrote:

Hi,

> When echo 0 > /proc/sys/vm/nr_hugepages runs concurrently with in-use
> huge pages being freed back to the huge page pool, some free huge pages
> may fail to be destroyed and are accounted as surplus. The counters then
> look like this:
>
>   HugePages_Total: 1024
>   HugePages_Free:  1024
>   HugePages_Surp:  1024
>
> When set_max_huge_pages() decreases the pool size, it first returns free
> pages to the buddy allocator, and then accounts the remaining pages as
> surplus. Between the two steps, hugetlb_lock is released to free the
> memory and then reacquired. If another process frees huge pages back to
> the pool between the two steps, those free huge pages are accounted as
> surplus.

I think this is a constraint of the nr_hugepages interface: it cannot
guarantee that all huge pages will be freed. What do you think about it?

Thanks.

> Besides, free surplus huge pages can also come from a failure to restore
> the vmemmap.
>
> Once either situation occurs, users cannot directly shrink the huge page
> pool via echo 0 > nr_hugepages; they should use one of the following two
> ways to destroy these free surplus huge pages:
> 1) echo $nr_surplus > nr_hugepages to convert the surplus free huge
>    pages into persistent free huge pages first, and then
>    echo 0 > nr_hugepages to destroy them.
> 2) allocate these free surplus huge pages; the kernel will try to
>    destroy them when they are freed.
>
> However, there is no documentation describing this, so users may be
> confused and not know how to handle such a case. Update the
> documentation accordingly.
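For readers hitting this, option 1) above boils down to something like the
following shell sequence (a minimal sketch of the recovery described in the
changelog; it assumes root privileges and reads the surplus count from
/proc/meminfo):

  # Stuck state after the race: all free pages are accounted as surplus.
  #   HugePages_Total: 1024
  #   HugePages_Free:  1024
  #   HugePages_Surp:  1024
  nr_surplus=$(awk '/HugePages_Surp/ {print $2}' /proc/meminfo)
  # Writing the surplus count converts the surplus free huge pages
  # into persistent free huge pages ...
  echo "$nr_surplus" > /proc/sys/vm/nr_hugepages
  # ... which a second write can then actually destroy.
  echo 0 > /proc/sys/vm/nr_hugepages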
>
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> ---
>  Documentation/admin-guide/mm/hugetlbpage.rst | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index 67a941903fd2..0456cefae039 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -239,6 +239,17 @@ this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
>  increased sufficiently, or the surplus huge pages go out of use and are freed--
>  no more surplus huge pages will be allowed to be allocated.
>  
> +Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` may race
> +with in-use huge pages being freed back to the pool, leaving some free huge
> +pages in the pool accounted as surplus. Besides, when the feature of freeing
> +unused vmemmap pages associated with each HugeTLB page is enabled, a free
> +huge page may be accounted as surplus if its vmemmap cannot be restored. In
> +either case, the pool cannot be shrunk directly by writing 0 to
> +``nr_hugepages``. Instead, first write the number of surplus pages to
> +``nr_hugepages`` to convert them into persistent free huge pages, and then
> +write 0 to ``nr_hugepages`` to destroy them. Alternatively, allocate the free
> +surplus huge pages; they will be destroyed when they are freed.
> +
>  With support for multiple huge page pools at run-time available, much of
>  the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
>  sysfs.
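As a sketch of option 2), the surplus pages can also be allocated and freed
from userspace through hugetlbfs. This assumes the default 2 MB huge page
size, fallocate support in hugetlbfs, and an arbitrary mount point /mnt/huge;
as far as I know hugetlbfs files cannot be populated with plain write(),
hence fallocate rather than dd:

  mkdir -p /mnt/huge
  mount -t hugetlbfs none /mnt/huge
  nr_surplus=$(awk '/HugePages_Surp/ {print $2}' /proc/meminfo)
  # Allocate all surplus free pages into a file ...
  fallocate -l $((nr_surplus * 2 * 1024 * 1024)) /mnt/huge/claim
  # ... then free them again; because they are surplus, the kernel tries
  # to destroy them on free instead of returning them to the pool.
  rm /mnt/huge/claim
  umount /mnt/huge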