From: Jinjiang Tu
To: Randy Dunlap
Subject: Re: [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description
Date: Mon, 21 Apr 2025 09:56:09 +0800
Message-ID: <64eaf1f9-1fab-8dda-ab75-5e48c0a8e0cb@huawei.com>
In-Reply-To: <1bb5fadd-583c-4c56-b52f-37eee516c1dd@infradead.org>
References: <20250419073214.2688926-1-tujinjiang@huawei.com> <1bb5fadd-583c-4c56-b52f-37eee516c1dd@infradead.org>

On 2025/4/20 1:20, Randy Dunlap wrote:
>
> On 4/19/25 12:32 AM, Jinjiang Tu wrote:
>> When echo 0 > /proc/sys/vm/nr_hugepages is concurrent with freeing in-use
>> huge pages to the huge page pool, some free huge pages may fail to be
>> destroyed and accounted as surplus.
>> The counts are like below:
>>
>> HugePages_Total:    1024
>> HugePages_Free:     1024
>> HugePages_Surp:     1024
>>
>> When set_max_huge_pages() decreases the pool size, it first returns free
>> pages to the buddy allocator, and then accounts the other pages as surplus.
>> Between the two steps, the hugetlb_lock is released to free memory and
>> then reacquired. If another process frees huge pages to the
>> pool between the two steps, these free huge pages will be accounted as
>> surplus.
>>
>> Besides, free surplus huge pages can also come from failing to restore
>> vmemmap.
>>
>> Once either situation occurs, users can't directly shrink the huge
>> page pool via echo 0 > nr_hugepages, and should use one of two ways to
>> destroy these free surplus huge pages:
>> 1) echo $nr_surplus > nr_hugepages to convert the surplus free huge pages
>> to persistent free huge pages first, and then echo 0 > nr_hugepages to
>> destroy these huge pages.
>> 2) allocate these free surplus huge pages; the kernel will try to destroy
>> them when they are freed.
>>
>> However, there is no documentation describing this, so users may be
>> confused and not know how to handle such a case. So update the
>> documentation.
>>
>> Signed-off-by: Jinjiang Tu
>> ---
>>  Documentation/admin-guide/mm/hugetlbpage.rst | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
>> index 67a941903fd2..0456cefae039 100644
>> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
>> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
>> @@ -239,6 +239,17 @@ this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
>>  increased sufficiently, or the surplus huge pages go out of use and are freed--
>>  no more surplus huge pages will be allowed to be allocated.
>>
>> +Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` may be
>> +concurrent with freeing in-use huge pages to the huge page pool, leading to some
>> +huge pages are still in the huge page pool and accounted as surplus. Besides,
>> +When the feature of freeing unused vmemmap pages associated with each hugetlb page
> when
>
>> +is enabled, free huge page may be accounted as surplus too. In such two cases, users
>> +couldn't directly shrink the huge page pool via echo 0 to ``nr_hugepages``, should
> but should
>
>
> Also, please limit each line to <80 characters.
>
>> +echo $nr_surplus to ``nr_hugepages`` to convert the surplus free huge pages to
>> +persistent free huge pages first, and then echo 0 to ``nr_hugepages`` to destroy
>> +these huge pages. Another way to destroy is allocating these free surplus huge
>> +pages and these huge pages will be tried to destroy when they are freed.
>> +
> But I don't see why this is a user problem to be solved by users...

echo xx > nr_hugepages isn't an atomic operation with respect to huge
page allocation and freeing, so we can't guarantee that all huge pages
will be destroyed after this operation. Users therefore have to check
whether the huge pages were actually destroyed.

>
>>  With support for multiple huge page pools at run-time available, much of
>>  the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
>>  sysfs.
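
To illustrate the check mentioned in the reply, here is a minimal Python sketch that reads the HugePages_* counters in the format shown in the patch description and works out which values would be written to nr_hugepages. The function names (parse_hugepage_counts, free_surplus_pages, shrink_steps) are hypothetical and not part of any kernel or library interface. The free-surplus figure is only a lower bound derived from the counters, and the sketch deliberately does not write to /proc itself.

```python
import re


def parse_hugepage_counts(meminfo_text):
    """Extract HugePages_* counters from /proc/meminfo-style text."""
    counts = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(HugePages_\w+):\s+(\d+)", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def free_surplus_pages(counts):
    """Lower bound on surplus huge pages that are sitting free in the pool.

    Assumption: every surplus page is either in use or free, so at most
    (Total - Free) surplus pages can be in use; the rest must be free.
    """
    in_use = counts["HugePages_Total"] - counts["HugePages_Free"]
    return max(0, counts["HugePages_Surp"] - in_use)


def shrink_steps(counts):
    """Values to write to nr_hugepages, in order, to empty the pool.

    Follows the first workaround from the patch description: convert free
    surplus pages to persistent pages, then shrink the pool to zero.
    """
    if free_surplus_pages(counts) > 0:
        return [counts["HugePages_Surp"], 0]
    return [0]


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        current = parse_hugepage_counts(f.read())
    print("free surplus (lower bound):", free_surplus_pages(current))
    print("values to write to nr_hugepages:", shrink_steps(current))
```

Because the operation is not atomic, the counters would need to be re-read after each write to confirm that the pool really shrank.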