[PATCH v2 0/4] mm/hugetlb: fix /sys and /proc fs dealing with persistent hugepages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/4] mm/hugetlb: fix /sys and /proc fs dealing with persistent hugepages
@ 2023-08-06  7:48 Xueshi Hu
  2023-08-06  7:48 ` [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages Xueshi Hu
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Xueshi Hu @ 2023-08-06  7:48 UTC (permalink / raw)
  To: mike.kravetz, muchun.song, corbet, akpm, n-horiguchi, osalvador
  Cc: linux-mm, Xueshi Hu

The meaning persistent huge pages is ambiguous both in documentation and
code implementation. About the bugfix, see the details in the commit 
message in patch 1 and 3. 

Additionally, code relevant with hstate:max_huge_pages can be simplified
easily after the bugfix, do it in patch 2.

v1 -> v2
1. Revise the commit message in patch 1.
3. Add fix tag in patch 3.
4. Add the patch 4 to correct documentation.

Xueshi Hu (4):
  mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  mm/hugeltb: clean up hstate::max_huge_pages
  mm/hugeltb: fix nodes huge page allocation when there are surplus
    pages
  docs: hugetlbpage.rst: make the meaning of persistent hugetlb pages
    clear

 Documentation/admin-guide/mm/hugetlbpage.rst | 31 ++++++++++----------
 fs/hugetlbfs/inode.c                         |  2 +-
 mm/hugetlb.c                                 | 30 ++++++-------------
 3 files changed, 26 insertions(+), 37 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-06  7:48 [PATCH v2 0/4] mm/hugetlb: fix /sys and /proc fs dealing with persistent hugepages Xueshi Hu
@ 2023-08-06  7:48 ` Xueshi Hu
  2023-08-07 15:15   ` David Hildenbrand
  2023-08-06  7:48 ` [PATCH v2 2/4] mm/hugeltb: clean up hstate::max_huge_pages Xueshi Hu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Xueshi Hu @ 2023-08-06  7:48 UTC (permalink / raw)
  To: mike.kravetz, muchun.song, corbet, akpm, n-horiguchi, osalvador
  Cc: linux-mm, Xueshi Hu

There are currently three 'nr_hugepages' used to export the number of huge
pages:
1. /proc/sys/vm/nr_hugepages
2. /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
3. /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

For consistency, all three 'nr_hugepages' should return the total number
of huge pages. When written, the number of persistent huge pages will be
adjusted to the specified value.

But, /proc/sys/vm/nr_hugepages returns the number of persistent huge
pages.

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e327a5a7602c..76af189053f0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4606,7 +4606,7 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 			 void *buffer, size_t *length, loff_t *ppos)
 {
 	struct hstate *h = &default_hstate;
-	unsigned long tmp = h->max_huge_pages;
+	unsigned long tmp = h->nr_huge_pages;
 	int ret;
 
 	if (!hugepages_supported())
-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 2/4] mm/hugeltb: clean up hstate::max_huge_pages
  2023-08-06  7:48 [PATCH v2 0/4] mm/hugetlb: fix /sys and /proc fs dealing with persistent hugepages Xueshi Hu
  2023-08-06  7:48 ` [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages Xueshi Hu
@ 2023-08-06  7:48 ` Xueshi Hu
  2023-08-07 15:17   ` David Hildenbrand
  2023-08-06  7:48 ` [PATCH v2 3/4] mm/hugeltb: fix nodes huge page allocation when there are surplus pages Xueshi Hu
  2023-08-06  7:48 ` [PATCH v2 4/4] docs: hugetlbpage.rst: make the meaning of persistent hugetlb pages clear Xueshi Hu
  3 siblings, 1 reply; 16+ messages in thread
From: Xueshi Hu @ 2023-08-06  7:48 UTC (permalink / raw)
  To: mike.kravetz, muchun.song, corbet, akpm, n-horiguchi, osalvador
  Cc: linux-mm, Xueshi Hu

Presently, the sole use case of hstate::max_huge_pages is confined to
hugetlb_sysctl_handler_common() and hugetlbfs_size_to_hpages().
The former has been replaced with hstate::nr_huge_pages, while the latter
can be effortlessly substituted.

After hugeltb subsystem has been initialized, hstate::max_huge_pages
always equals to persistent_huge_pages(). It's a burden to maintain
the equation[1][2].

After this patch, hstate::max_huge_pages is only used in kernel command
line parameter parsing.

Renaming set_max_huge_pages() to set_nr_huge_pages() would enhance the
readability of the code.

[1]: Commit a43a83c79b4f ("mm/hugetlb: fix incorrect update of
max_huge_pages")
[2]: Commit c1470b33bb6e ("mm/hugetlb: fix incorrect hugepages count
during mem hotplug")

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 fs/hugetlbfs/inode.c |  2 +-
 mm/hugetlb.c         | 24 +++++-------------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 316c4cebd3f3..cd1a3e4bf8fb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1375,7 +1375,7 @@ hugetlbfs_size_to_hpages(struct hstate *h, unsigned long long size_opt,
 
 	if (val_type == SIZE_PERCENT) {
 		size_opt <<= huge_page_shift(h);
-		size_opt *= h->max_huge_pages;
+		size_opt *= (h->nr_huge_pages - h->surplus_huge_pages);
 		do_div(size_opt, 100);
 	}
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 76af189053f0..56647235ab21 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2343,14 +2343,13 @@ int dissolve_free_huge_page(struct page *page)
 		}
 
 		remove_hugetlb_folio(h, folio, false);
-		h->max_huge_pages--;
 		spin_unlock_irq(&hugetlb_lock);
 
 		/*
 		 * Normally update_and_free_hugtlb_folio will allocate required vmemmmap
 		 * before freeing the page.  update_and_free_hugtlb_folio will fail to
 		 * free the page if it can not allocate required vmemmap.  We
-		 * need to adjust max_huge_pages if the page is not freed.
+		 * need to adjust nr_huge_pages if the page is not freed.
 		 * Attempt to allocate vmemmmap here so that we can take
 		 * appropriate action on failure.
 		 */
@@ -2360,7 +2359,6 @@ int dissolve_free_huge_page(struct page *page)
 		} else {
 			spin_lock_irq(&hugetlb_lock);
 			add_hugetlb_folio(h, folio, false);
-			h->max_huge_pages++;
 			spin_unlock_irq(&hugetlb_lock);
 		}
 
@@ -3274,8 +3272,6 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 	string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
 	pr_warn("HugeTLB: allocating %u of page size %s failed node%d.  Only allocated %lu hugepages.\n",
 		h->max_huge_pages_node[nid], buf, nid, i);
-	h->max_huge_pages -= (h->max_huge_pages_node[nid] - i);
-	h->max_huge_pages_node[nid] = i;
 }
 
 static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
@@ -3336,7 +3332,6 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
 		pr_warn("HugeTLB: allocating %lu of page size %s failed.  Only allocated %lu hugepages.\n",
 			h->max_huge_pages, buf, i);
-		h->max_huge_pages = i;
 	}
 	kfree(node_alloc_noretry);
 }
@@ -3460,7 +3455,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 }
 
 #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
-static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
+static int set_nr_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
 	unsigned long min_count, ret;
@@ -3601,7 +3596,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			break;
 	}
 out:
-	h->max_huge_pages = persistent_huge_pages(h);
 	spin_unlock_irq(&hugetlb_lock);
 	mutex_unlock(&h->resize_lock);
 
@@ -3639,7 +3633,7 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
 	destroy_compound_hugetlb_folio_for_demote(folio, huge_page_order(h));
 
 	/*
-	 * Taking target hstate mutex synchronizes with set_max_huge_pages.
+	 * Taking target hstate mutex synchronizes with set_nr_huge_pages.
 	 * Without the mutex, pages added to target hstate could be marked
 	 * as surplus.
 	 *
@@ -3664,14 +3658,6 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
 
 	spin_lock_irq(&hugetlb_lock);
 
-	/*
-	 * Not absolutely necessary, but for consistency update max_huge_pages
-	 * based on pool changes for the demoted page.
-	 */
-	h->max_huge_pages--;
-	target_hstate->max_huge_pages +=
-		pages_per_huge_page(h) / pages_per_huge_page(target_hstate);
-
 	return rc;
 }
 
@@ -3770,13 +3756,13 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
 	} else {
 		/*
 		 * Node specific request.  count adjustment happens in
-		 * set_max_huge_pages() after acquiring hugetlb_lock.
+		 * set_nr_huge_pages() after acquiring hugetlb_lock.
 		 */
 		init_nodemask_of_node(&nodes_allowed, nid);
 		n_mask = &nodes_allowed;
 	}
 
-	err = set_max_huge_pages(h, count, nid, n_mask);
+	err = set_nr_huge_pages(h, count, nid, n_mask);
 
 	return err ? err : len;
 }
-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 3/4] mm/hugeltb: fix nodes huge page allocation when there are surplus pages
  2023-08-06  7:48 [PATCH v2 0/4] mm/hugetlb: fix /sys and /proc fs dealing with persistent hugepages Xueshi Hu
  2023-08-06  7:48 ` [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages Xueshi Hu
  2023-08-06  7:48 ` [PATCH v2 2/4] mm/hugeltb: clean up hstate::max_huge_pages Xueshi Hu
@ 2023-08-06  7:48 ` Xueshi Hu
  2023-08-06  7:48 ` [PATCH v2 4/4] docs: hugetlbpage.rst: make the meaning of persistent hugetlb pages clear Xueshi Hu
  3 siblings, 0 replies; 16+ messages in thread
From: Xueshi Hu @ 2023-08-06  7:48 UTC (permalink / raw)
  To: mike.kravetz, muchun.song, corbet, akpm, n-horiguchi, osalvador
  Cc: linux-mm, Xueshi Hu

In set_nr_huge_pages(), local variable "count" is used to record
persistent_huge_pages(), but when it cames to nodes huge page allocation,
the semantics changes to nr_huge_pages. When there exists surplus huge
pages and using the interface under
/sys/devices/system/node/node*/hugepages to change huge page pool size,
this difference can result in the allocation of an unexpected number of
huge pages.

Steps to reproduce the bug:

Starting with:

				  Node 0          Node 1    Total
	HugePages_Total             0.00            0.00     0.00
	HugePages_Free              0.00            0.00     0.00
	HugePages_Surp              0.00            0.00     0.00

create 100 huge pages in Node 0 and consume it, then set Node 0 's
nr_hugepages to 0.

yields:

				  Node 0          Node 1    Total
	HugePages_Total           200.00            0.00   200.00
	HugePages_Free              0.00            0.00     0.00
	HugePages_Surp            200.00            0.00   200.00

write 100 to Node 1's nr_hugepages

		echo 100 > /sys/devices/system/node/node1/\
	hugepages/hugepages-2048kB/nr_hugepages

gets:

				  Node 0          Node 1    Total
	HugePages_Total           200.00          400.00   600.00
	HugePages_Free              0.00          400.00   400.00
	HugePages_Surp            200.00            0.00   200.00

Kernel is expected to create only 100 huge pages and it gives 200.

Fixes: fd875dca7c71 ("hugetlbfs: fix potential over/underflow setting node specific nr_hugepages")
Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 mm/hugetlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 56647235ab21..8ed4fffdebda 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3490,7 +3490,9 @@ static int set_nr_huge_pages(struct hstate *h, unsigned long count, int nid,
 	if (nid != NUMA_NO_NODE) {
 		unsigned long old_count = count;
 
-		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
+		count += persistent_huge_pages(h) -
+			 (h->nr_huge_pages_node[nid] -
+			  h->surplus_huge_pages_node[nid]);
 		/*
 		 * User may have specified a large count value which caused the
 		 * above calculation to overflow.  In this case, they wanted
-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 4/4] docs: hugetlbpage.rst: make the meaning of persistent hugetlb pages clear
  2023-08-06  7:48 [PATCH v2 0/4] mm/hugetlb: fix /sys and /proc fs dealing with persistent hugepages Xueshi Hu
                   ` (2 preceding siblings ...)
  2023-08-06  7:48 ` [PATCH v2 3/4] mm/hugeltb: fix nodes huge page allocation when there are surplus pages Xueshi Hu
@ 2023-08-06  7:48 ` Xueshi Hu
  3 siblings, 0 replies; 16+ messages in thread
From: Xueshi Hu @ 2023-08-06  7:48 UTC (permalink / raw)
  To: mike.kravetz, muchun.song, corbet, akpm, n-horiguchi, osalvador
  Cc: linux-mm, Xueshi Hu

1. Provide a definition for persistent huge pages prior to mentioning it.
2. Correct the definition of '/proc/sys/vm/nr_hugepages'.
3. Correct several incorrect usages of the word "persistent".

Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 Documentation/admin-guide/mm/hugetlbpage.rst | 31 ++++++++++----------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index e4d4b4a8dc97..1cf3cf58bd99 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -25,7 +25,7 @@ automatically when CONFIG_HUGETLBFS is selected) configuration
 options.
 
 The ``/proc/meminfo`` file provides information about the total number of
-persistent hugetlb pages in the kernel's huge page pool.  It also displays
+hugetlb pages in the kernel's huge page pool.  It also displays
 default huge page size and information about the number of free, reserved
 and surplus huge pages in the pool of huge pages of default size.
 The huge page size is needed for generating the proper alignment and
@@ -54,10 +54,11 @@ HugePages_Rsvd
         guarantee that an application will be able to allocate a
         huge page from the pool of huge pages at fault time.
 HugePages_Surp
-	is short for "surplus," and is the number of huge pages in
-        the pool above the value in ``/proc/sys/vm/nr_hugepages``. The
-        maximum number of surplus huge pages is controlled by
-        ``/proc/sys/vm/nr_overcommit_hugepages``.
+	is short for "surplus," and is the number of huge pages which will be
+        freed back to the kernel's normal page pool upon becoming unused. In
+        contrast, persistent huge pages will not be freed back even if the task
+        no longer uses it. The maximum number of surplus huge pages is
+        controlled by ``/proc/sys/vm/nr_overcommit_hugepages``.
 	Note: When the feature of freeing unused vmemmap pages associated
 	with each hugetlb page is enabled, the number of surplus huge pages
 	may be temporarily larger than the maximum number of surplus huge
@@ -76,11 +77,11 @@ Hugetlb
 ``/proc/filesystems`` should also show a filesystem of type "hugetlbfs"
 configured in the kernel.
 
-``/proc/sys/vm/nr_hugepages`` indicates the current number of "persistent" huge
-pages in the kernel's huge page pool.  "Persistent" huge pages will be
-returned to the huge page pool when freed by a task.  A user with root
-privileges can dynamically allocate more or free some persistent huge pages
-by increasing or decreasing the value of ``nr_hugepages``.
+``/proc/sys/vm/nr_hugepages`` returns the total number of huge pages. When this
+file is written, the number of persistent huge pages will be adjusted to the
+specified value. A user with root privileges can dynamically allocate more or
+free some persistent huge pages by increasing or decreasing the value of
+``nr_hugepages``.
 
 Note: When the feature of freeing unused vmemmap pages associated with each
 hugetlb page is enabled, we can fail to free the huge pages triggered by
@@ -95,7 +96,7 @@ pool, a user with appropriate privilege can use either the mmap system call
 or shared memory system calls to use the huge pages.  See the discussion of
 :ref:`Using Huge Pages <using_huge_pages>`, below.
 
-The administrator can allocate persistent huge pages on the kernel boot
+The administrator can allocate huge pages on the kernel boot
 command line by specifying the "hugepages=N" parameter, where 'N' = the
 number of huge pages requested.  This is the most reliable method of
 allocating huge pages as memory has not yet become fragmented.
@@ -173,7 +174,7 @@ default sized persistent huge pages::
 	echo 20 > /proc/sys/vm/nr_hugepages
 
 This command will try to adjust the number of default sized huge pages in the
-huge page pool to 20, allocating or freeing huge pages, as required.
+persistent huge page pool to 20, allocating or freeing huge pages, as required.
 
 On a NUMA platform, the kernel will attempt to distribute the huge page pool
 over all the set of allowed nodes specified by the NUMA memory policy of the
@@ -406,12 +407,12 @@ default huge page size and associated pool will be used.
 
 The ``size`` option sets the maximum value of memory (huge pages) allowed
 for that filesystem (``/mnt/huge``). The ``size`` option can be specified
-in bytes, or as a percentage of the specified huge page pool (``nr_hugepages``).
-The size is rounded down to HPAGE_SIZE boundary.
+in bytes, or as a percentage of the specified persistent huge page pool
+(``nr_hugepages``). The size is rounded down to HPAGE_SIZE boundary.
 
 The ``min_size`` option sets the minimum value of memory (huge pages) allowed
 for the filesystem. ``min_size`` can be specified in the same way as ``size``,
-either bytes or a percentage of the huge page pool.
+either bytes or a percentage of the persistent huge page pool.
 At mount time, the number of huge pages specified by ``min_size`` are reserved
 for use by the filesystem.
 If there are not enough free huge pages available, the mount will fail.
-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-06  7:48 ` [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages Xueshi Hu
@ 2023-08-07 15:15   ` David Hildenbrand
  2023-08-08  2:28     ` Xueshi Hu
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-07 15:15 UTC (permalink / raw)
  To: Xueshi Hu, mike.kravetz, muchun.song, corbet, akpm, n-horiguchi,
	osalvador
  Cc: linux-mm

On 06.08.23 09:48, Xueshi Hu wrote:
> There are currently three 'nr_hugepages' used to export the number of huge
> pages:
> 1. /proc/sys/vm/nr_hugepages
> 2. /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
> 3. /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> 
> For consistency, all three 'nr_hugepages' should return the total number
> of huge pages. When written, the number of persistent huge pages will be
> adjusted to the specified value.
> 
> But, /proc/sys/vm/nr_hugepages returns the number of persistent huge
> pages.
> 

But that's documented behavior, no?

Documentation/admin-guide/mm/hugetlbpage.rst

``/proc/sys/vm/nr_hugepages`` indicates the current number of "persistent" huge
pages in the kernel's huge page pool.  "Persistent" huge pages will be
returned to the huge page pool when freed by a task.  A user with root
privileges can dynamically allocate more or free some persistent huge pages
by increasing or decreasing the value of ``nr_hugepages``.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/4] mm/hugeltb: clean up hstate::max_huge_pages
  2023-08-06  7:48 ` [PATCH v2 2/4] mm/hugeltb: clean up hstate::max_huge_pages Xueshi Hu
@ 2023-08-07 15:17   ` David Hildenbrand
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2023-08-07 15:17 UTC (permalink / raw)
  To: Xueshi Hu, mike.kravetz, muchun.song, corbet, akpm, n-horiguchi,
	osalvador
  Cc: linux-mm

On 06.08.23 09:48, Xueshi Hu wrote:
> Presently, the sole use case of hstate::max_huge_pages is confined to
> hugetlb_sysctl_handler_common() and hugetlbfs_size_to_hpages().
> The former has been replaced with hstate::nr_huge_pages, while the latter

Most probably that conversion is wrong as it changes documented 
behavior, and therefore, this patch is not applicable.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-07 15:15   ` David Hildenbrand
@ 2023-08-08  2:28     ` Xueshi Hu
  2023-08-08  7:58       ` David Hildenbrand
  0 siblings, 1 reply; 16+ messages in thread
From: Xueshi Hu @ 2023-08-08  2:28 UTC (permalink / raw)
  To: David Hildenbrand, mike.kravetz, muchun.song, corbet, akpm,
	n-horiguchi, osalvador
  Cc: linux-mm

On 8/7/23 23:15, David Hildenbrand wrote:
> On 06.08.23 09:48, Xueshi Hu wrote:
>> There are currently three 'nr_hugepages' used to export the number of 
>> huge
>> pages:
>> 1. /proc/sys/vm/nr_hugepages
>> 2. /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>> 3. /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>
>> For consistency, all three 'nr_hugepages' should return the total number
>> of huge pages. When written, the number of persistent huge pages will be
>> adjusted to the specified value.
>>
>> But, /proc/sys/vm/nr_hugepages returns the number of persistent huge
>> pages.
>>
> 
> But that's documented behavior, no?
> 
> Documentation/admin-guide/mm/hugetlbpage.rst
> 
> ``/proc/sys/vm/nr_hugepages`` indicates the current number of 
> "persistent" huge
> pages in the kernel's huge page pool.  "Persistent" huge pages will be
> returned to the huge page pool when freed by a task.  A user with root
> privileges can dynamically allocate more or free some persistent huge pages
> by increasing or decreasing the value of ``nr_hugepages``.
> 

Actually, Documentation/admin-guide/mm/hugetlbpage.rst is contradictory
about the definition of /proc/sys/vm/nr_hugepages.

The documentation says:

- ``/proc/sys/vm/nr_hugepages`` indicates the current number of
"persistent" huge.

But, the documentation also says:

- The ``/proc`` interfaces discussed above have been retained for
backwards compatibility.

- The ``nr_hugepages`` attribute returns the total number of huge pages on
the specified node.  When this attribute is written, the number of
persistent huge pages on the parent node will be adjusted to the specified
value, if sufficient resources exist, regardless of the task's mempolicy
or cpuset constraints.

So, I create the patch 4 to make the documentation more clear.

If such subtle inconsistencies result in unexpected behavior, it can be
challenging for a system administrator to detect.

Thanks,
Hu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-08  2:28     ` Xueshi Hu
@ 2023-08-08  7:58       ` David Hildenbrand
  2023-08-08  9:13         ` Xueshi Hu
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-08  7:58 UTC (permalink / raw)
  To: Xueshi Hu, mike.kravetz, muchun.song, corbet, akpm, n-horiguchi,
	osalvador
  Cc: linux-mm

On 08.08.23 04:28, Xueshi Hu wrote:
> On 8/7/23 23:15, David Hildenbrand wrote:
>> On 06.08.23 09:48, Xueshi Hu wrote:
>>> There are currently three 'nr_hugepages' used to export the number of
>>> huge
>>> pages:
>>> 1. /proc/sys/vm/nr_hugepages
>>> 2. /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>>> 3. /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>
>>> For consistency, all three 'nr_hugepages' should return the total number
>>> of huge pages. When written, the number of persistent huge pages will be
>>> adjusted to the specified value.
>>>
>>> But, /proc/sys/vm/nr_hugepages returns the number of persistent huge
>>> pages.
>>>
>>
>> But that's documented behavior, no?
>>
>> Documentation/admin-guide/mm/hugetlbpage.rst
>>
>> ``/proc/sys/vm/nr_hugepages`` indicates the current number of
>> "persistent" huge
>> pages in the kernel's huge page pool.  "Persistent" huge pages will be
>> returned to the huge page pool when freed by a task.  A user with root
>> privileges can dynamically allocate more or free some persistent huge pages
>> by increasing or decreasing the value of ``nr_hugepages``.
>>
> 
> Actually, Documentation/admin-guide/mm/hugetlbpage.rst is contradictory
> about the definition of /proc/sys/vm/nr_hugepages.
> 
> The documentation says:
> 
> - ``/proc/sys/vm/nr_hugepages`` indicates the current number of
> "persistent" huge.
> 
> But, the documentation also says:
> 
> - The ``/proc`` interfaces discussed above have been retained for
> backwards compatibility.

Yes. And why shouldn't the behavior of these toggles be retained for backwards compatibility?

> 
> - The ``nr_hugepages`` attribute returns the total number of huge pages on
> the specified node.  When this attribute is written, the number of
> persistent huge pages on the parent node will be adjusted to the specified
> value, if sufficient resources exist, regardless of the task's mempolicy
> or cpuset constraints.

That is part of the "/sys/devices/system/node/node[0-9]*/hugepages/" docuemntation,
not "/proc/sys/vm/nr_hugepages" documentation.

It's in the "Per Node Hugepages Attributes" section.


If I am not wrong, that documentation -- including the usage of the "persistent"
term -- were introduced in 2009 already:

commit 267b4c281b4a43c8f3d965c791d3a7fd62448733
Author: Lee Schermerhorn <lee.schermerhorn@hp.com>
Date:   Mon Dec 14 17:58:30 2009 -0800

     hugetlb: update hugetlb documentation for NUMA controls
     
     Update the kernel huge tlb documentation to describe the numa memory
     policy based huge page management.  Additionaly, the patch includes a fair
     amount of rework to improve consistency, eliminate duplication and set the
     context for documenting the memory policy interaction.
     
So this has been documented behavior for a long time.

> 
> So, I create the patch 4 to make the documentation more clear.
> 
> If such subtle inconsistencies result in unexpected behavior, it can be
> challenging for a system administrator to detect.

Your patch is changing the traditional, documented behavior to then change
the documentation to match the new implementation.

How can you be sure nobody relies on exactly that traditional, well documented
behavior?

Why change the behavior of an interface that is kept for backwards compatibility,
that might still be in use under the assumptions that the behavior is exactly
like that (and for at least the last 14 years)?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-08  7:58       ` David Hildenbrand
@ 2023-08-08  9:13         ` Xueshi Hu
  2023-08-10  0:17           ` Mike Kravetz
  0 siblings, 1 reply; 16+ messages in thread
From: Xueshi Hu @ 2023-08-08  9:13 UTC (permalink / raw)
  To: David Hildenbrand, mike.kravetz, muchun.song, corbet, akpm,
	n-horiguchi, osalvador
  Cc: linux-mm

On 8/8/23 15:58, David Hildenbrand wrote:
> On 08.08.23 04:28, Xueshi Hu wrote:
>> On 8/7/23 23:15, David Hildenbrand wrote:
>>> On 06.08.23 09:48, Xueshi Hu wrote:
>>>> There are currently three 'nr_hugepages' used to export the number of
>>>> huge
>>>> pages:
>>>> 1. /proc/sys/vm/nr_hugepages
>>>> 2. 
>>>> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>>>> 3. /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>>
>>>> For consistency, all three 'nr_hugepages' should return the total 
>>>> number
>>>> of huge pages. When written, the number of persistent huge pages 
>>>> will be
>>>> adjusted to the specified value.
>>>>
>>>> But, /proc/sys/vm/nr_hugepages returns the number of persistent huge
>>>> pages.
>>>>
>>>
>>> But that's documented behavior, no?
>>>
>>> Documentation/admin-guide/mm/hugetlbpage.rst
>>>
>>> ``/proc/sys/vm/nr_hugepages`` indicates the current number of
>>> "persistent" huge
>>> pages in the kernel's huge page pool.  "Persistent" huge pages will be
>>> returned to the huge page pool when freed by a task.  A user with root
>>> privileges can dynamically allocate more or free some persistent huge 
>>> pages
>>> by increasing or decreasing the value of ``nr_hugepages``.
>>>
>>
>> Actually, Documentation/admin-guide/mm/hugetlbpage.rst is contradictory
>> about the definition of /proc/sys/vm/nr_hugepages.
>>
>> The documentation says:
>>
>> - ``/proc/sys/vm/nr_hugepages`` indicates the current number of
>> "persistent" huge.
>>
>> But, the documentation also says:
>>
>> - The ``/proc`` interfaces discussed above have been retained for
>> backwards compatibility.
> 
> Yes. And why shouldn't the behavior of these toggles be retained for 
> backwards compatibility?
> 
>>
>> - The ``nr_hugepages`` attribute returns the total number of huge 
>> pages on
>> the specified node.  When this attribute is written, the number of
>> persistent huge pages on the parent node will be adjusted to the 
>> specified
>> value, if sufficient resources exist, regardless of the task's mempolicy
>> or cpuset constraints.
> 
> That is part of the "/sys/devices/system/node/node[0-9]*/hugepages/" 
> docuemntation,
> not "/proc/sys/vm/nr_hugepages" documentation.
> 
> It's in the "Per Node Hugepages Attributes" section.
The document states that compatibility has been maintained between
/sys/devices/system/node/node[0-9]*/hugepages/nr_hugepages and
/proc/sys/vm/nr_hugepages. But, /proc/sys/vm/nr_hugepages displays the 
number of
persistent hugetlb pages,
while /sys/devices/system/node/node[0-9]*/hugepages/nr_hugepages shows total
number hugetlb pages.

For clearness, an example:

Prepare:

	echo 100 > /proc/sys/vm/nr_hugepages
	launch a program to reserve 100 hupatlb pages, then sleep
	echo 0 >  /proc/sys/vm/nr_hugepages

Check the result:

	cat /proc/sys/vm/nr_hugepages

	0

	cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

	100

If the compatibility is maintained, they should return the same value,
0 or 100.

Also, check the /proc/meminfo:

	grep "HugePages_" /proc/meminfo

		HugePages_Total:     100
		HugePages_Free:      100
		HugePages_Rsvd:      100
		HugePages_Surp:      100

The documentation says:

The ``/proc/meminfo`` file provides information about the total number of
persistent hugetlb pages in the kernel's huge page pool.	

As you see, HugePages_Total should the total number of hugetlb pages
instead of "total number of persistent hugetlb pages". This is one the
mistakes the documentation made.
> 
> 
> If I am not wrong, that documentation -- including the usage of the 
> "persistent"
> term -- were introduced in 2009 already:
> 
> commit 267b4c281b4a43c8f3d965c791d3a7fd62448733
> Author: Lee Schermerhorn <lee.schermerhorn@hp.com>
> Date:   Mon Dec 14 17:58:30 2009 -0800
> 
>      hugetlb: update hugetlb documentation for NUMA controls
>      Update the kernel huge tlb documentation to describe the numa memory
>      policy based huge page management.  Additionaly, the patch includes 
> a fair
>      amount of rework to improve consistency, eliminate duplication and 
> set the
>      context for documenting the memory policy interaction.
> So this has been documented behavior for a long time.
> 
>>
>> So, I create the patch 4 to make the documentation more clear.
>>
>> If such subtle inconsistencies result in unexpected behavior, it can be
>> challenging for a system administrator to detect.
> 
> Your patch is changing the traditional, documented behavior to then change
> the documentation to match the new implementation.
I'm not intended to modify a correct api and then change the documentation,
that's totally time-wasting. The API can be easily misused and the
documentation is unclear, that's why I send the patchset.
> 
> How can you be sure nobody relies on exactly that traditional, well 
> documented
> behavior?
Considering no special statement about the inconsistency, I think more
people will just misuse it instead of relying on the minor
inconsistency.
> 
> Why change the behavior of an interface that is kept for backwards 
> compatibility,
> that might still be in use under the assumptions that the behavior is 
> exactly
> like that (and for at least the last 14 years)?
> 
Maybe, you mean "to keep backwards compatibility" ?

Thanks,
Hu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-08  9:13         ` Xueshi Hu
@ 2023-08-10  0:17           ` Mike Kravetz
  2023-08-10  7:34             ` David Hildenbrand
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Kravetz @ 2023-08-10  0:17 UTC (permalink / raw)
  To: Xueshi Hu
  Cc: David Hildenbrand, muchun.song, corbet, akpm, n-horiguchi,
	osalvador, linux-mm

On 08/08/23 17:13, Xueshi Hu wrote:
> On 8/8/23 15:58, David Hildenbrand wrote:
> > On 08.08.23 04:28, Xueshi Hu wrote:
> > > On 8/7/23 23:15, David Hildenbrand wrote:
> > > > On 06.08.23 09:48, Xueshi Hu wrote:

Sorry for jumping in late, I was away for a while.

Hu and myself discussed this previously in,
https://lore.kernel.org/linux-mm/20230802182031.GA4762@monkey/T/#r1bdc8eeebafa08699fda5b15f247f3f966ddd090

The documentation around what is displayed with the hugetlb proc/sys
interfaces is at best confusing and at worst wrong in places.

One source of confusion is use of term 'persistent hugetlb pages'.  The
documentation does not define this term.  However, there is this
definition in the code:
#define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)

All of the write/update interfaces modify the number of persistent hugetlb
pages as defined by the code (#define).  Only one read/show interface
displays the number of persistent hugetlb pages as defined by the code
(#define).  That is /proc/sys/vm/nr_hugepages (and sysctl).

When thinking about this more, I am 'guessing' that when the documentation was
originally written the term 'persistent hugetlb pages' did not refer to the
#define in the code.  Rather, it was just the number of allocated hugetlb pages
that 'persisted' until modified by the admin/user.

There is little doubt the documentation could/should be updated.

The question is 'Should we change the /proc/sys/vm/nr_hugepages (and sysctl)
interfaces to be consistent with all the other read/show interfaces?

The argument for changing is that consistency is good.  Why have one interface
that is not like the others?

The reason for not changing is that this is the oldest interface.  The
information/interfaces originally available in /proc were created in /sys.
And, as mentioned in the documentation the /proc interfaces were kept
for backward compatibility.  Unfortunately, the meaning of nr_hugepages
was changed the /sys interfaces were created.  Sigh!!!

In the thread mentioned above, I was in agreement with Hu about changing
/proc/sys/vm/nr_hugepages to be consistent with other read/show interfaces.
Now, I am not sure.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-10  0:17           ` Mike Kravetz
@ 2023-08-10  7:34             ` David Hildenbrand
  2023-08-10 22:15               ` Mike Kravetz
  2023-08-25  4:02               ` Xueshi Hu
  0 siblings, 2 replies; 16+ messages in thread
From: David Hildenbrand @ 2023-08-10  7:34 UTC (permalink / raw)
  To: Mike Kravetz, Xueshi Hu
  Cc: muchun.song, corbet, akpm, n-horiguchi, osalvador, linux-mm

On 10.08.23 02:17, Mike Kravetz wrote:
> On 08/08/23 17:13, Xueshi Hu wrote:
>> On 8/8/23 15:58, David Hildenbrand wrote:
>>> On 08.08.23 04:28, Xueshi Hu wrote:
>>>> On 8/7/23 23:15, David Hildenbrand wrote:
>>>>> On 06.08.23 09:48, Xueshi Hu wrote:
> 
> Sorry for jumping in late, I was away for a while.
> 
> Hu and myself discussed this previously in,
> https://lore.kernel.org/linux-mm/20230802182031.GA4762@monkey/T/#r1bdc8eeebafa08699fda5b15f247f3f966ddd090
> 
> The documentation around what is displayed with the hugetlb proc/sys
> interfaces is at best confusing and at worst wrong in places.
> 
> One source of confusion is use of term 'persistent hugetlb pages'.  The
> documentation does not define this term.  However, there is this
> definition in the code:
> #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
> 
> All of the write/update interfaces modify the number of persistent hugetlb
> pages as defined by the code (#define).  Only one read/show interface
> displays the number of persistent hugetlb pages as defined by the code
> (#define).  That is /proc/sys/vm/nr_hugepages (and sysctl).

Yes.

> 
> When thinking about this more, I am 'guessing' that when the documentation was
> originally written the term 'persistent hugetlb pages' did not refer to the
> #define in the code.  Rather, it was just the number of allocated hugetlb pages
> that 'persisted' until modified by the admin/user.
> 
> There is little doubt the documentation could/should be updated.

Absolutely.

> 
> The question is 'Should we change the /proc/sys/vm/nr_hugepages (and sysctl)
> interfaces to be consistent with all the other read/show interfaces?
> 
> The argument for changing is that consistency is good.  Why have one interface
> that is not like the others?
> 
> The reason for not changing is that this is the oldest interface.  The
> information/interfaces originally available in /proc were created in /sys.
> And, as mentioned in the documentation the /proc interfaces were kept
> for backward compatibility.  Unfortunately, the meaning of nr_hugepages
> was changed the /sys interfaces were created.  Sigh!!!

Indeed, they were designed to be different and to just leave the /proc 
interface alone.

> 
> In the thread mentioned above, I was in agreement with Hu about changing
> /proc/sys/vm/nr_hugepages to be consistent with other read/show interfaces.
> Now, I am not sure.

My take would be to just leave /proc/sys/vm/nr_hugepages alone. Maybe 
pr_warn_once() when the interface is used to guide people away from that 
legacy interface + clarify the docs.

Your call. :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-10  7:34             ` David Hildenbrand
@ 2023-08-10 22:15               ` Mike Kravetz
  2023-08-25  4:02               ` Xueshi Hu
  1 sibling, 0 replies; 16+ messages in thread
From: Mike Kravetz @ 2023-08-10 22:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Xueshi Hu, muchun.song, corbet, akpm, n-horiguchi, osalvador, linux-mm

On 08/10/23 09:34, David Hildenbrand wrote:
> On 10.08.23 02:17, Mike Kravetz wrote:
> > On 08/08/23 17:13, Xueshi Hu wrote:
> > > On 8/8/23 15:58, David Hildenbrand wrote:
> > > > On 08.08.23 04:28, Xueshi Hu wrote:
> > > > > On 8/7/23 23:15, David Hildenbrand wrote:
> > > > > > On 06.08.23 09:48, Xueshi Hu wrote:
> > 
> > The question is 'Should we change the /proc/sys/vm/nr_hugepages (and sysctl)
> > interfaces to be consistent with all the other read/show interfaces?
> > 
> > The argument for changing is that consistency is good.  Why have one interface
> > that is not like the others?
> > 
> > The reason for not changing is that this is the oldest interface.  The
> > information/interfaces originally available in /proc were created in /sys.
> > And, as mentioned in the documentation the /proc interfaces were kept
> > for backward compatibility.  Unfortunately, the meaning of nr_hugepages
> > was changed the /sys interfaces were created.  Sigh!!!
> 
> Indeed, they were designed to be different and to just leave the /proc
> interface alone.
> 

I am not sure if this was the 'design'.  The commit to add the sysfs interfaces
is a3437870160c from 2008.  There is no mention of changing the meaning of
nr_hugepages when read/displayed.

It matters not if this was by design.  It has been this way for 15 years and
has become the expected behavior.

> > 
> > In the thread mentioned above, I was in agreement with Hu about changing
> > /proc/sys/vm/nr_hugepages to be consistent with other read/show interfaces.
> > Now, I am not sure.
> 
> My take would be to just leave /proc/sys/vm/nr_hugepages alone. Maybe
> pr_warn_once() when the interface is used to guide people away from that
> legacy interface + clarify the docs.

Now, I tend to agree that not modifying /proc/sys/vm/nr_hugepages may be
the right thing to do.  I 'know' of a DB that makes extensive use of this
and the corresponding sysctl interface.  A pr_warn_once() may help, but I
still see the warning "Using mlock ulimits for SHM_HUGETLB is obsolete"
in system logs. :(

> Your call. :)

I REALLY would like it if all these interfaces were consistent and showed the
same information.  However, this inconsistency has been there for 15+ years.
And, I know of users making extensive use of /proc/sys/vm/nr_hugepages.

Hu, did you get a report of this inconsistency from a customer/end user?
Or, is this something you and other developers noticed?
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-10  7:34             ` David Hildenbrand
  2023-08-10 22:15               ` Mike Kravetz
@ 2023-08-25  4:02               ` Xueshi Hu
  2023-08-25 21:15                 ` Mike Kravetz
  1 sibling, 1 reply; 16+ messages in thread
From: Xueshi Hu @ 2023-08-25  4:02 UTC (permalink / raw)
  To: David Hildenbrand, Mike Kravetz
  Cc: muchun.song, corbet, akpm, n-horiguchi, osalvador, linux-mm

On 8/10/23 15:34, David Hildenbrand wrote:
> On 10.08.23 02:17, Mike Kravetz wrote:
>> On 08/08/23 17:13, Xueshi Hu wrote:
>>> On 8/8/23 15:58, David Hildenbrand wrote:
>>>> On 08.08.23 04:28, Xueshi Hu wrote:
>>>>> On 8/7/23 23:15, David Hildenbrand wrote:
>>>>>> On 06.08.23 09:48, Xueshi Hu wrote:
>>
>> Sorry for jumping in late, I was away for a while.
>>
>> Hu and myself discussed this previously in,
>> https://lore.kernel.org/linux-mm/20230802182031.GA4762@monkey/T/#r1bdc8eeebafa08699fda5b15f247f3f966ddd090
>>
>> The documentation around what is displayed with the hugetlb proc/sys
>> interfaces is at best confusing and at worst wrong in places.
>>
>> One source of confusion is use of term 'persistent hugetlb pages'.  The
>> documentation does not define this term.  However, there is this
>> definition in the code:
>> #define persistent_huge_pages(h) (h->nr_huge_pages - 
>> h->surplus_huge_pages)
>>
>> All of the write/update interfaces modify the number of persistent 
>> hugetlb
>> pages as defined by the code (#define).  Only one read/show interface
>> displays the number of persistent hugetlb pages as defined by the code
>> (#define).  That is /proc/sys/vm/nr_hugepages (and sysctl).
> 
> Yes.
> 
>>
>> When thinking about this more, I am 'guessing' that when the 
>> documentation was
>> originally written the term 'persistent hugetlb pages' did not refer 
>> to the
>> #define in the code.  Rather, it was just the number of allocated 
>> hugetlb pages
>> that 'persisted' until modified by the admin/user.
>>
>> There is little doubt the documentation could/should be updated.
> 
> Absolutely.
> 
>>
>> The question is 'Should we change the /proc/sys/vm/nr_hugepages (and 
>> sysctl)
>> interfaces to be consistent with all the other read/show interfaces?
>>
>> The argument for changing is that consistency is good.  Why have one 
>> interface
>> that is not like the others?
>>
>> The reason for not changing is that this is the oldest interface.  The
>> information/interfaces originally available in /proc were created in 
>> /sys.
>> And, as mentioned in the documentation the /proc interfaces were kept
>> for backward compatibility.  Unfortunately, the meaning of nr_hugepages
>> was changed the /sys interfaces were created.  Sigh!!!
> 
> Indeed, they were designed to be different and to just leave the /proc 
> interface alone.
> 
>>
>> In the thread mentioned above, I was in agreement with Hu about changing
>> /proc/sys/vm/nr_hugepages to be consistent with other read/show 
>> interfaces.
>> Now, I am not sure.
> 
> My take would be to just leave /proc/sys/vm/nr_hugepages alone. Maybe 
> pr_warn_once() when the interface is used to guide people away from that 
> legacy interface + clarify the docs.
> 
> Your call. :)
> 
Considering /proc/sys/vm/nr_hugepages may be widely used, and it not total
equivalent to the interfaces under /sys. What about just clarifying the
docs?

Thanks,
Hu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-25  4:02               ` Xueshi Hu
@ 2023-08-25 21:15                 ` Mike Kravetz
  2023-08-26  2:51                   ` Xueshi Hu
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Kravetz @ 2023-08-25 21:15 UTC (permalink / raw)
  To: Xueshi Hu
  Cc: David Hildenbrand, muchun.song, corbet, akpm, n-horiguchi,
	osalvador, linux-mm

On 08/25/23 12:02, Xueshi Hu wrote:
> On 8/10/23 15:34, David Hildenbrand wrote:
> > On 10.08.23 02:17, Mike Kravetz wrote:
> > > On 08/08/23 17:13, Xueshi Hu wrote:
> > > > On 8/8/23 15:58, David Hildenbrand wrote:
> > > > > On 08.08.23 04:28, Xueshi Hu wrote:
> > > > > > On 8/7/23 23:15, David Hildenbrand wrote:
> > > > > > > On 06.08.23 09:48, Xueshi Hu wrote:
> > > 
> > > Sorry for jumping in late, I was away for a while.
> > > 
> > > Hu and myself discussed this previously in,
> > > https://lore.kernel.org/linux-mm/20230802182031.GA4762@monkey/T/#r1bdc8eeebafa08699fda5b15f247f3f966ddd090
> > > 
> > > The documentation around what is displayed with the hugetlb proc/sys
> > > interfaces is at best confusing and at worst wrong in places.
> > > 
> > > One source of confusion is use of term 'persistent hugetlb pages'.  The
> > > documentation does not define this term.  However, there is this
> > > definition in the code:
> > > #define persistent_huge_pages(h) (h->nr_huge_pages -
> > > h->surplus_huge_pages)
> > > 
> > > All of the write/update interfaces modify the number of persistent
> > > hugetlb
> > > pages as defined by the code (#define).  Only one read/show interface
> > > displays the number of persistent hugetlb pages as defined by the code
> > > (#define).  That is /proc/sys/vm/nr_hugepages (and sysctl).
> > 
> > Yes.
> > 
> > > 
> > > When thinking about this more, I am 'guessing' that when the
> > > documentation was
> > > originally written the term 'persistent hugetlb pages' did not refer
> > > to the
> > > #define in the code.  Rather, it was just the number of allocated
> > > hugetlb pages
> > > that 'persisted' until modified by the admin/user.
> > > 
> > > There is little doubt the documentation could/should be updated.
> > 
> > Absolutely.
> > 
> > > 
> > > The question is 'Should we change the /proc/sys/vm/nr_hugepages (and
> > > sysctl)
> > > interfaces to be consistent with all the other read/show interfaces?
> > > 
> > > The argument for changing is that consistency is good.  Why have one
> > > interface
> > > that is not like the others?
> > > 
> > > The reason for not changing is that this is the oldest interface.  The
> > > information/interfaces originally available in /proc were created in
> > > /sys.
> > > And, as mentioned in the documentation the /proc interfaces were kept
> > > for backward compatibility.  Unfortunately, the meaning of nr_hugepages
> > > was changed the /sys interfaces were created.  Sigh!!!
> > 
> > Indeed, they were designed to be different and to just leave the /proc
> > interface alone.
> > 
> > > 
> > > In the thread mentioned above, I was in agreement with Hu about changing
> > > /proc/sys/vm/nr_hugepages to be consistent with other read/show
> > > interfaces.
> > > Now, I am not sure.
> > 
> > My take would be to just leave /proc/sys/vm/nr_hugepages alone. Maybe
> > pr_warn_once() when the interface is used to guide people away from that
> > legacy interface + clarify the docs.
> > 
> > Your call. :)
> > 
> Considering /proc/sys/vm/nr_hugepages may be widely used, and it not total
> equivalent to the interfaces under /sys. What about just clarifying the
> docs?

I believe just updating the docs with clarification may be the best approach.
We need to say that /proc/sys/vm/nr_hugepages and sysctl (vm.nr_hugepages)
display the number of persistent hugetlb pages in the default pool.  And, we
should probably define 'persistent' as well.

With that, this patch should be dropped.  Without patch 1, patch 2 is not
necessary.  However, some cleanup (possible elimination) of max_huge_pages
could be done in the future.

Patch 3 is documentation updates which we agree need to be performed.  I can
assist here if you would like.

Patch 4 is most critical as it is a bug fix.  Perhaps this can/should be sent
separately.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages
  2023-08-25 21:15                 ` Mike Kravetz
@ 2023-08-26  2:51                   ` Xueshi Hu
  0 siblings, 0 replies; 16+ messages in thread
From: Xueshi Hu @ 2023-08-26  2:51 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: David Hildenbrand, muchun.song, corbet, akpm, n-horiguchi,
	osalvador, linux-mm

On 8/26/23 05:15, Mike Kravetz wrote:
> On 08/25/23 12:02, Xueshi Hu wrote:
>> On 8/10/23 15:34, David Hildenbrand wrote:
>>> On 10.08.23 02:17, Mike Kravetz wrote:
>>>> On 08/08/23 17:13, Xueshi Hu wrote:
>>>>> On 8/8/23 15:58, David Hildenbrand wrote:
>>>>>> On 08.08.23 04:28, Xueshi Hu wrote:
>>>>>>> On 8/7/23 23:15, David Hildenbrand wrote:
>>>>>>>> On 06.08.23 09:48, Xueshi Hu wrote:
>>>>
>>>> Sorry for jumping in late, I was away for a while.
>>>>
>>>> Hu and myself discussed this previously in,
>>>> https://lore.kernel.org/linux-mm/20230802182031.GA4762@monkey/T/#r1bdc8eeebafa08699fda5b15f247f3f966ddd090
>>>>
>>>> The documentation around what is displayed with the hugetlb proc/sys
>>>> interfaces is at best confusing and at worst wrong in places.
>>>>
>>>> One source of confusion is use of term 'persistent hugetlb pages'.  The
>>>> documentation does not define this term.  However, there is this
>>>> definition in the code:
>>>> #define persistent_huge_pages(h) (h->nr_huge_pages -
>>>> h->surplus_huge_pages)
>>>>
>>>> All of the write/update interfaces modify the number of persistent
>>>> hugetlb
>>>> pages as defined by the code (#define).  Only one read/show interface
>>>> displays the number of persistent hugetlb pages as defined by the code
>>>> (#define).  That is /proc/sys/vm/nr_hugepages (and sysctl).
>>>
>>> Yes.
>>>
>>>>
>>>> When thinking about this more, I am 'guessing' that when the
>>>> documentation was
>>>> originally written the term 'persistent hugetlb pages' did not refer
>>>> to the
>>>> #define in the code.  Rather, it was just the number of allocated
>>>> hugetlb pages
>>>> that 'persisted' until modified by the admin/user.
>>>>
>>>> There is little doubt the documentation could/should be updated.
>>>
>>> Absolutely.
>>>
>>>>
>>>> The question is 'Should we change the /proc/sys/vm/nr_hugepages (and
>>>> sysctl)
>>>> interfaces to be consistent with all the other read/show interfaces?
>>>>
>>>> The argument for changing is that consistency is good.  Why have one
>>>> interface
>>>> that is not like the others?
>>>>
>>>> The reason for not changing is that this is the oldest interface.  The
>>>> information/interfaces originally available in /proc were created in
>>>> /sys.
>>>> And, as mentioned in the documentation the /proc interfaces were kept
>>>> for backward compatibility.  Unfortunately, the meaning of nr_hugepages
>>>> was changed the /sys interfaces were created.  Sigh!!!
>>>
>>> Indeed, they were designed to be different and to just leave the /proc
>>> interface alone.
>>>
>>>>
>>>> In the thread mentioned above, I was in agreement with Hu about changing
>>>> /proc/sys/vm/nr_hugepages to be consistent with other read/show
>>>> interfaces.
>>>> Now, I am not sure.
>>>
>>> My take would be to just leave /proc/sys/vm/nr_hugepages alone. Maybe
>>> pr_warn_once() when the interface is used to guide people away from that
>>> legacy interface + clarify the docs.
>>>
>>> Your call. :)
>>>
>> Considering /proc/sys/vm/nr_hugepages may be widely used, and it not total
>> equivalent to the interfaces under /sys. What about just clarifying the
>> docs?
Thanks for your patient reply.
> 
> I believe just updating the docs with clarification may be the best approach.
> We need to say that /proc/sys/vm/nr_hugepages and sysctl (vm.nr_hugepages)
> display the number of persistent hugetlb pages in the default pool.  And, we
> should probably define 'persistent' as well.
> 
> With that, this patch should be dropped.  Without patch 1, patch 2 is not
> necessary.  However, some cleanup (possible elimination) of max_huge_pages
> could be done in the future.
A modified version will sent later.
> 
> Patch 3 is documentation updates which we agree need to be performed.  I can
> assist here if you would like.
I would greatly appreciate your assistance.
> 
> Patch 4 is most critical as it is a bug fix.  Perhaps this can/should be sent
> separately.
Ok, I'll send it separately.

Thanks,
Hu


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-08-26  2:51 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-06  7:48 [PATCH v2 0/4] mm/hugetlb: fix /sys and /proc fs dealing with persistent hugepages Xueshi Hu
2023-08-06  7:48 ` [PATCH v2 1/4] mm/hugetlb: fix the inconsistency of /proc/sys/vm/nr_huge_pages Xueshi Hu
2023-08-07 15:15   ` David Hildenbrand
2023-08-08  2:28     ` Xueshi Hu
2023-08-08  7:58       ` David Hildenbrand
2023-08-08  9:13         ` Xueshi Hu
2023-08-10  0:17           ` Mike Kravetz
2023-08-10  7:34             ` David Hildenbrand
2023-08-10 22:15               ` Mike Kravetz
2023-08-25  4:02               ` Xueshi Hu
2023-08-25 21:15                 ` Mike Kravetz
2023-08-26  2:51                   ` Xueshi Hu
2023-08-06  7:48 ` [PATCH v2 2/4] mm/hugeltb: clean up hstate::max_huge_pages Xueshi Hu
2023-08-07 15:17   ` David Hildenbrand
2023-08-06  7:48 ` [PATCH v2 3/4] mm/hugeltb: fix nodes huge page allocation when there are surplus pages Xueshi Hu
2023-08-06  7:48 ` [PATCH v2 4/4] docs: hugetlbpage.rst: make the meaning of persistent hugetlb pages clear Xueshi Hu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox