From: Xueshi Hu
Date: Tue, 8 Aug 2023 17:13:11 +0800
Subject: Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
To: David Hildenbrand, mike.kravetz@oracle.com, muchun.song@linux.dev, corbet@lwn.net, akpm@linux-foundation.org, n-horiguchi@ah.jp.nec.com, osalvador@suse.de
Cc: linux-mm@kvack.org
Message-ID: <42b78b9d-0a2f-c79d-0298-e4a7283a5633@smartx.com>
In-Reply-To: <5b404d86-6b6f-b6c7-6286-f2ce3c4b5424@redhat.com>
References: <20230806074853.317203-1-xueshi.hu@smartx.com> <20230806074853.317203-2-xueshi.hu@smartx.com> <5c9ebf69-cd59-0fb3-bb85-1ab219426530@redhat.com> <79508337-08c1-7926-afd9-af21ee128949@smartx.com> <5b404d86-6b6f-b6c7-6286-f2ce3c4b5424@redhat.com>

On 8/8/23 15:58, David Hildenbrand wrote:
> On 08.08.23 04:28, Xueshi Hu wrote:
>> On 8/7/23 23:15, David Hildenbrand wrote:
>>> On 06.08.23 09:48, Xueshi Hu wrote:
>>>> There are currently three 'nr_hugepages' used to export the number of
>>>> huge pages:
>>>> 1. /proc/sys/vm/nr_hugepages
>>>> 2. /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>>>> 3. /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>>
>>>> For consistency, all three 'nr_hugepages' should return the total
>>>> number of huge pages. When written, the number of persistent huge
>>>> pages will be adjusted to the specified value.
>>>>
>>>> But, /proc/sys/vm/nr_hugepages returns the number of persistent huge
>>>> pages.
>>>>
>>>
>>> But that's documented behavior, no?
>>>
>>> Documentation/admin-guide/mm/hugetlbpage.rst
>>>
>>> ``/proc/sys/vm/nr_hugepages`` indicates the current number of
>>> "persistent" huge pages in the kernel's huge page pool. "Persistent"
>>> huge pages will be returned to the huge page pool when freed by a
>>> task. A user with root privileges can dynamically allocate more or
>>> free some persistent huge pages by increasing or decreasing the value
>>> of ``nr_hugepages``.
>>>
>>
>> Actually, Documentation/admin-guide/mm/hugetlbpage.rst is contradictory
>> about the definition of /proc/sys/vm/nr_hugepages.
>>
>> The documentation says:
>>
>> - ``/proc/sys/vm/nr_hugepages`` indicates the current number of
>> "persistent" huge.
>>
>> But, the documentation also says:
>>
>> - The ``/proc`` interfaces discussed above have been retained for
>> backwards compatibility.
>
> Yes. And why shouldn't the behavior of these toggles be retained for
> backwards compatibility?
>
>>
>> - The ``nr_hugepages`` attribute returns the total number of huge
>> pages on the specified node. When this attribute is written, the
>> number of persistent huge pages on the parent node will be adjusted to
>> the specified value, if sufficient resources exist, regardless of the
>> task's mempolicy or cpuset constraints.
>
> That is part of the "/sys/devices/system/node/node[0-9]*/hugepages/"
> documentation,
> not "/proc/sys/vm/nr_hugepages" documentation.
>
> It's in the "Per Node Hugepages Attributes" section.

The document states that compatibility has been maintained between
/sys/devices/system/node/node[0-9]*/hugepages/nr_hugepages and
/proc/sys/vm/nr_hugepages.

But /proc/sys/vm/nr_hugepages displays the number of persistent hugetlb
pages, while /sys/devices/system/node/node[0-9]*/hugepages/nr_hugepages
shows the total number of hugetlb pages.

For clarity, here is an example:

Prepare:

    echo 100 > /proc/sys/vm/nr_hugepages
    launch a program to reserve 100 hugetlb pages, then sleep
    echo 0 > /proc/sys/vm/nr_hugepages

Check the result:

    cat /proc/sys/vm/nr_hugepages
    0
    cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    100

If compatibility were maintained, both files would return the same
value, either 0 or 100.

Also, check /proc/meminfo:

    grep "HugePages_" /proc/meminfo
    HugePages_Total: 100
    HugePages_Free: 100
    HugePages_Rsvd: 100
    HugePages_Surp: 100

The documentation says:

    The ``/proc/meminfo`` file provides information about the total
    number of persistent hugetlb pages in the kernel's huge page pool.

As you can see, HugePages_Total should be the total number of hugetlb
pages rather than the "total number of persistent hugetlb pages". This
is one of the mistakes in the documentation.

>
>
> If I am not wrong, that documentation -- including the usage of the
> "persistent" term -- were introduced in 2009 already:
>
> commit 267b4c281b4a43c8f3d965c791d3a7fd62448733
> Author: Lee Schermerhorn
> Date: Mon Dec 14 17:58:30 2009 -0800
>
> hugetlb: update hugetlb documentation for NUMA controls
> Update the kernel huge tlb documentation to describe the numa memory
> policy based huge page management. Additionaly, the patch includes a
> fair amount of rework to improve consistency, eliminate duplication
> and set the context for documenting the memory policy interaction.
>
> So this has been documented behavior for a long time.
>
>>
>> So, I create the patch 4 to make the documentation more clear.
>>
>> If such subtle inconsistencies result in unexpected behavior, it can be
>> challenging for a system administrator to detect.
>
> Your patch is changing the traditional, documented behavior to then change
> the documentation to match the new implementation.

I do not intend to modify a correct API and then change the
documentation to match; that would be a waste of time. The API is easy
to misuse and the documentation is unclear, which is why I sent this
patchset.

>
> How can you be sure nobody relies on exactly that traditional, well
> documented behavior?

Since the documentation never calls out this inconsistency, I think more
people are misusing the interface than relying on the minor
inconsistency.

>
> Why change the behavior of an interface that is kept for backwards
> compatibility, that might still be in use under the assumptions that
> the behavior is exactly like that (and for at least the last 14 years)?
>

Maybe you mean "to keep backwards compatibility"?

Thanks,
Hu
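
In case it helps with reproducing this, the "Check the result" step from
the example above could be sketched as a small script. This is only a
sketch under assumptions: it looks at the 2048kB huge page size used in
the example, and HUGETLB_ROOT is a hypothetical variable (not a kernel
interface) that lets the script be pointed at a fake directory tree
instead of the real /proc and /sys.

```shell
#!/bin/sh
# Sketch: print the value reported by two of the nr_hugepages interfaces
# and say whether they agree. HUGETLB_ROOT is a hypothetical override used
# only for illustration; when unset, the real /proc and /sys are read.

# Print the contents of one counter file, or "n/a" if it cannot be read.
read_counter() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "n/a"
    fi
}

# Compare the /proc view with the /sys view for the 2048kB page size.
compare_nr_hugepages() {
    root="${HUGETLB_ROOT:-}"
    proc_val=$(read_counter "$root/proc/sys/vm/nr_hugepages")
    sys_val=$(read_counter "$root/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages")

    echo "proc: $proc_val"
    echo "sys: $sys_val"

    if [ "$proc_val" = "n/a" ] || [ "$sys_val" = "n/a" ]; then
        echo "unavailable"
    elif [ "$proc_val" = "$sys_val" ]; then
        echo "consistent"
    else
        echo "inconsistent"
    fi
}

compare_nr_hugepages
```

With the values from the example above (0 from /proc and 100 from /sys),
the last line printed would be "inconsistent".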