[PATCH] mm: Reduce memory bloat with THP

linux-mm.kvack.org archive mirror

* [PATCH] mm: Reduce memory bloat with THP
@ 2017-12-15  1:28 Nitin Gupta
  2017-12-15  5:55 ` Anshuman Khandual
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Nitin Gupta @ 2017-12-15  1:28 UTC (permalink / raw)
  To: linux-mm
  Cc: steven.sistare, Nitin Gupta, Andrew Morton, Ingo Molnar,
	Mel Gorman, Nadav Amit, Minchan Kim, Kirill A. Shutemov,
	Peter Zijlstra, Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Anshuman Khandual, Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

Currently, if the THP enabled policy is "always", or the mode
is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
is allocated on a page fault if the pud or pmd is empty.  This
yields the best VA translation performance, but increases memory
consumption if some small page ranges within the huge page are
never accessed.
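
To put a rough number on that overhead, the sketch below compares the
memory backing a sparse access pattern when each first touch allocates a
whole huge page versus a single base page. The 4 KiB and 2 MiB sizes are
assumed x86-64 defaults, and the helper names are made up for
illustration; none of this is kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes (x86-64 defaults); other architectures differ. */
#define BASE_PAGE_SIZE	(4UL * 1024)		/* 4 KiB base page */
#define HUGE_PAGE_SIZE	(2UL * 1024 * 1024)	/* 2 MiB PMD-level huge page */

/*
 * Bytes backing the touched offsets when the first touch in each empty
 * `unit`-aligned region allocates the whole unit.
 * `offsets` must be sorted in ascending order.
 */
static uint64_t resident_bytes(const uint64_t *offsets, size_t n,
			       uint64_t unit)
{
	uint64_t bytes = 0;
	uint64_t last = UINT64_MAX;	/* no region seen yet */

	for (size_t i = 0; i < n; i++) {
		uint64_t region = offsets[i] / unit;

		if (region != last) {
			bytes += unit;
			last = region;
		}
	}
	return bytes;
}

/* First touch allocates a huge page (THP fault path). */
static uint64_t resident_with_thp(const uint64_t *offsets, size_t n)
{
	return resident_bytes(offsets, n, HUGE_PAGE_SIZE);
}

/* First touch allocates one base page only. */
static uint64_t resident_with_base_pages(const uint64_t *offsets, size_t n)
{
	return resident_bytes(offsets, n, BASE_PAGE_SIZE);
}
```

For four touched offsets spread over three huge-page-aligned regions,
this accounts 6 MiB with first-touch huge pages against 16 KiB with base
pages.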

An alternate behavior for such page faults is to install a
hugepage only when a region is actually found to be (almost)
fully mapped and active.  This is a compromise between
translation performance and memory consumption.  Currently there
is no way for an application to choose this compromise for the
page fault conditions above.

With this change, when an application issues MADV_DONTNEED on a
memory region, the region is marked as "space-efficient". For
such regions, a hugepage is not immediately allocated on first
write.  Instead, it is left to the khugepaged thread to do
delayed hugepage promotion depending on whether the region is
actually mapped and active. When the application issues
MADV_HUGEPAGE, the region is marked again as non-space-efficient,
so a hugepage is once more allocated on first touch.
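
From the application side, the sequence described above might look
roughly like the following userspace sketch. The helper name and region
size are hypothetical. The space_efficient flag itself is not visible
from userspace, so the comments only note where the patch would set or
clear it.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE (4UL * 1024 * 1024)	/* arbitrary, for illustration */

/* Hypothetical helper: returns 0 on success, -1 on failure. */
static int dontneed_then_refault(void)
{
	unsigned char *p;
	int zeroed;

	p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xab, REGION_SIZE);	/* fault the whole region in */

	/*
	 * Free the backing pages. With the patch applied, this call would
	 * also set vma->space_efficient on the containing VMA.
	 */
	if (madvise(p, REGION_SIZE, MADV_DONTNEED) != 0) {
		munmap(p, REGION_SIZE);
		return -1;
	}

	/*
	 * The next access refaults. For private anonymous memory the pages
	 * come back zero-filled; with the patch they would be base pages,
	 * leaving any promotion to khugepaged.
	 */
	zeroed = (p[0] == 0 && p[REGION_SIZE - 1] == 0);

	/*
	 * Opt back in to first-touch huge pages; with the patch this would
	 * clear the flag. The result is ignored because kernels built
	 * without THP support reject this advice.
	 */
	(void)madvise(p, REGION_SIZE, MADV_HUGEPAGE);

	munmap(p, REGION_SIZE);
	return zeroed ? 0 : -1;
}
```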

Orabug: 26910556

Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
 include/linux/mm_types.h | 1 +
 mm/khugepaged.c          | 1 +
 mm/madvise.c             | 1 +
 mm/memory.c              | 6 ++++--
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4..6d0783a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -339,6 +339,7 @@ struct vm_area_struct {
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+	bool space_efficient;
 } __randomize_layout;
 
 struct core_thread {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ea4ff25..2f4037a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -319,6 +319,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 #endif
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
+		vma->space_efficient = false;
 		/*
 		 * If the vma become good for khugepaged to scan,
 		 * register it here without waiting a page fault that
diff --git a/mm/madvise.c b/mm/madvise.c
index 751e97a..b2ec07b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -508,6 +508,7 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 					unsigned long start, unsigned long end)
 {
 	zap_page_range(vma, start, end - start);
+	vma->space_efficient = true;
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 5eb3d25..6485014 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4001,7 +4001,8 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	vmf.pud = pud_alloc(mm, p4d, address);
 	if (!vmf.pud)
 		return VM_FAULT_OOM;
-	if (pud_none(*vmf.pud) && transparent_hugepage_enabled(vma)) {
+	if (pud_none(*vmf.pud) && transparent_hugepage_enabled(vma)
+		&& !vma->space_efficient) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
@@ -4027,7 +4028,8 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
 	if (!vmf.pmd)
 		return VM_FAULT_OOM;
-	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
+	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)
+		&& !vma->space_efficient) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
-- 
2.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm: Reduce memory bloat with THP
  2017-12-15  1:28 [PATCH] mm: Reduce memory bloat with THP Nitin Gupta
@ 2017-12-15  5:55 ` Anshuman Khandual
  2017-12-16  7:18   ` Nitin Gupta
  2017-12-15 10:00 ` Kirill A. Shutemov
  2017-12-15 10:01 ` Kirill A. Shutemov
  2 siblings, 1 reply; 8+ messages in thread
From: Anshuman Khandual @ 2017-12-15  5:55 UTC (permalink / raw)
  To: Nitin Gupta, linux-mm
  Cc: steven.sistare, Andrew Morton, Ingo Molnar, Mel Gorman,
	Nadav Amit, Minchan Kim, Kirill A. Shutemov, Peter Zijlstra,
	Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Anshuman Khandual, Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

On 12/15/2017 06:58 AM, Nitin Gupta wrote:
> Currently, if the THP enabled policy is "always", or the mode
> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
> is allocated on a page fault if the pud or pmd is empty.  This
> yields the best VA translation performance, but increases memory
> consumption if some small page ranges within the huge page are
> never accessed.

Right, that's as per design.

> 
> An alternate behavior for such page faults is to install a
> hugepage only when a region is actually found to be (almost)
> fully mapped and active.  This is a compromise between

That is the asynchronous method: khugepaged analyzes the process's
page table segments and evaluates whether a huge page can be
installed in place of the existing pages.

> translation performance and memory consumption.  Currently there
> is no way for an application to choose this compromise for the
> page fault conditions above.

Can't we set the THP enablement mode to "madvise", then switch
between MADV_HUGEPAGE/MADV_NOHUGEPAGE to implement this?

> 
> With this change, when an application issues MADV_DONTNEED on a
> memory region, the region is marked as "space-efficient". For

Isn't MADV_DONTNEED meant to be used on a region where pages are
already faulted in and the page table is populated?
Are you suggesting that MADV_DONTNEED should be called on
a region just after creation to control its fault behavior?
That's not what MADV_DONTNEED was meant for.

> such regions, a hugepage is not immediately allocated on first
> write.  Instead, it is left to the khugepaged thread to do
> delayed hugepage promotion depending on whether the region is
> actually mapped and active. When application issues
> MADV_HUGEPAGE, the region is marked again as non-space-efficient
> wherein hugepage is allocated on first touch

But a MADV_HUGEPAGE/MADV_NOHUGEPAGE combination should do the trick
as well.
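
A minimal sketch of that alternative, assuming the THP mode is set to
"madvise" and the application itself decides when a range is sparse or
dense. The enum and helper names are hypothetical; only madvise() and
the two advice values are real interfaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical application-side notion of how a range is being used. */
enum thp_phase {
	PHASE_SPARSE,	/* few pages touched: avoid huge pages */
	PHASE_DENSE,	/* mostly populated: allow huge pages */
};

/*
 * Switch a page-aligned range between MADV_NOHUGEPAGE and MADV_HUGEPAGE.
 * Returns 0 on success, -1 on failure.
 */
static int set_thp_phase(void *addr, size_t len, enum thp_phase phase)
{
	int advice = (phase == PHASE_DENSE) ? MADV_HUGEPAGE : MADV_NOHUGEPAGE;

	if (madvise(addr, len, advice) == 0)
		return 0;

	/*
	 * Assumption: EINVAL here means the kernel has no THP support, so
	 * the range simply stays on base pages. A misaligned address also
	 * yields EINVAL, so callers must pass page-aligned ranges.
	 */
	return (errno == EINVAL) ? 0 : -1;
}
```

The application would call set_thp_phase(addr, len, PHASE_SPARSE) right
after mapping a range and PHASE_DENSE once it knows the range is mostly
populated.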


* Re: [PATCH] mm: Reduce memory bloat with THP
  2017-12-15  1:28 [PATCH] mm: Reduce memory bloat with THP Nitin Gupta
  2017-12-15  5:55 ` Anshuman Khandual
@ 2017-12-15 10:00 ` Kirill A. Shutemov
  2017-12-16  7:04   ` Nitin Gupta
  2017-12-15 10:01 ` Kirill A. Shutemov
  2 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2017-12-15 10:00 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: linux-mm, steven.sistare, Andrew Morton, Ingo Molnar, Mel Gorman,
	Nadav Amit, Minchan Kim, Kirill A. Shutemov, Peter Zijlstra,
	Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Anshuman Khandual, Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

On Thu, Dec 14, 2017 at 05:28:52PM -0800, Nitin Gupta wrote:
> Currently, if the THP enabled policy is "always", or the mode
> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
> is allocated on a page fault if the pud or pmd is empty.  This
> yields the best VA translation performance, but increases memory
> consumption if some small page ranges within the huge page are
> never accessed.
> 
> An alternate behavior for such page faults is to install a
> hugepage only when a region is actually found to be (almost)
> fully mapped and active.  This is a compromise between
> translation performance and memory consumption.  Currently there
> is no way for an application to choose this compromise for the
> page fault conditions above.
> 
> With this change, when an application issues MADV_DONTNEED on a
> memory region, the region is marked as "space-efficient". For
> such regions, a hugepage is not immediately allocated on first
> write.  Instead, it is left to the khugepaged thread to do
> delayed hugepage promotion depending on whether the region is
> actually mapped and active. When application issues
> MADV_HUGEPAGE, the region is marked again as non-space-efficient
> wherein hugepage is allocated on first touch.

I think this would be NAK. At least in this form.

What performance testing have you done? Any numbers?

Making the whole vma "space_efficient" just because somebody freed one
page from it is just wrong. And there's no way back after this.

> 
> Orabug: 26910556

Wat?

-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: Reduce memory bloat with THP
  2017-12-15  1:28 [PATCH] mm: Reduce memory bloat with THP Nitin Gupta
  2017-12-15  5:55 ` Anshuman Khandual
  2017-12-15 10:00 ` Kirill A. Shutemov
@ 2017-12-15 10:01 ` Kirill A. Shutemov
  2017-12-16  7:21   ` Nitin Gupta
  2 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2017-12-15 10:01 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: linux-mm, steven.sistare, Andrew Morton, Ingo Molnar, Mel Gorman,
	Nadav Amit, Minchan Kim, Kirill A. Shutemov, Peter Zijlstra,
	Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Anshuman Khandual, Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

On Thu, Dec 14, 2017 at 05:28:52PM -0800, Nitin Gupta wrote:
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 751e97a..b2ec07b 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -508,6 +508,7 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>  					unsigned long start, unsigned long end)
>  {
>  	zap_page_range(vma, start, end - start);
> +	vma->space_efficient = true;
>  	return 0;
>  }
>  

And this modifies vma without down_write(mmap_sem).

No.

-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: Reduce memory bloat with THP
  2017-12-15 10:00 ` Kirill A. Shutemov
@ 2017-12-16  7:04   ` Nitin Gupta
  2017-12-18 13:53     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Nitin Gupta @ 2017-12-16  7:04 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, steven.sistare, Andrew Morton, Ingo Molnar, Mel Gorman,
	Nadav Amit, Minchan Kim, Kirill A. Shutemov, Peter Zijlstra,
	Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Anshuman Khandual, Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

On 12/15/17 2:00 AM, Kirill A. Shutemov wrote:
> On Thu, Dec 14, 2017 at 05:28:52PM -0800, Nitin Gupta wrote:
>> Currently, if the THP enabled policy is "always", or the mode
>> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
>> is allocated on a page fault if the pud or pmd is empty.  This
>> yields the best VA translation performance, but increases memory
>> consumption if some small page ranges within the huge page are
>> never accessed.
>>
>> An alternate behavior for such page faults is to install a
>> hugepage only when a region is actually found to be (almost)
>> fully mapped and active.  This is a compromise between
>> translation performance and memory consumption.  Currently there
>> is no way for an application to choose this compromise for the
>> page fault conditions above.
>>
>> With this change, when an application issues MADV_DONTNEED on a
>> memory region, the region is marked as "space-efficient". For
>> such regions, a hugepage is not immediately allocated on first
>> write.  Instead, it is left to the khugepaged thread to do
>> delayed hugepage promotion depending on whether the region is
>> actually mapped and active. When application issues
>> MADV_HUGEPAGE, the region is marked again as non-space-efficient
>> wherein hugepage is allocated on first touch.
> 
> I think this would be NAK. At least in this form.
> 
> What performance testing have you done? Any numbers?
> 

I wrote throw-away code which mmaps a 128G area and writes to a random
address in a loop. Interleaved with the writes, madvise(MADV_DONTNEED)
is issued at other random addresses. Writes are issued with 70%
probability and DONTNEED with 30%. With this test, I'm trying to emulate
the workload of a large in-memory hash table.
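
The loop described above might look roughly like the following. This is
an illustrative sketch and not the program from the gist linked below:
the function names, the xorshift PRNG, and the choice of a single page
as the unit for both writes and DONTNEED are assumptions.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Small deterministic PRNG (xorshift64); state must be nonzero. */
static uint64_t next_rand(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *state = x;
}

/*
 * Run `iters` operations over an anonymous mapping of `len` bytes:
 * about 70% are one-byte writes to a random page, about 30% are
 * MADV_DONTNEED on a random page.
 * Returns the number of writes issued, or -1 on error.
 */
static long run_workload(size_t len, long iters, uint64_t seed)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = len / page_size;
	uint64_t state = seed ? seed : 1;
	unsigned char *base;
	long writes = 0;

	if (npages == 0)
		return -1;

	/* MAP_NORESERVE so that very large mappings need no swap reservation. */
	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	for (long i = 0; i < iters; i++) {
		unsigned char *page =
			base + (next_rand(&state) % npages) * page_size;

		if (next_rand(&state) % 100 < 70) {
			*page = 1;	/* the write faults the page in */
			writes++;
		} else if (madvise(page, page_size, MADV_DONTNEED) != 0) {
			munmap(base, len);
			return -1;
		}
	}

	munmap(base, len);
	return writes;
}
```

Memory usage over time can then be sampled from outside the process,
for example from the Rss field in /proc/<pid>/status.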

With the patch, I see that memory bloat is much less severe.
I've uploaded the test program with the memory usage plot here:

https://gist.github.com/nitingupta910/42ddf969e17556d74a14fbd84640ddb3

THP was set to 'always' mode in both cases but the result would be the
same if madvise mode was used instead.
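
For anyone reproducing this, the active mode can be checked in
/sys/kernel/mm/transparent_hugepage/enabled, where the current setting
is shown in square brackets (for example "always [madvise] never"). A
small sketch of reading it follows; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed token of a line such as "always [madvise] never"
 * into `out`. Returns 0 on success, -1 if no token fits.
 */
static int parse_active_policy(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t n;

	if (!close)
		return -1;

	n = (size_t)(close - open - 1);
	if (n == 0 || n >= outlen)
		return -1;

	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the active THP policy from sysfs. Returns 0 on success. */
static int read_thp_policy(char *out, size_t outlen)
{
	char line[128];
	int ret = -1;
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

	if (!f)
		return -1;	/* no THP support, or sysfs not mounted */
	if (fgets(line, sizeof(line), f))
		ret = parse_active_policy(line, out, outlen);
	fclose(f);
	return ret;
}
```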

> Making the whole vma "space_efficient" just because somebody freed one
> page from it is just wrong. And there's no way back after this.
>

I'm using MADV_DONTNEED as a hint that the user wants to use
hugepages transparently but, at the same time, wants to be more
conservative with respect to memory usage. If MADV_HUGEPAGE is issued
for a VMA range after any DONTNEEDs, the space_efficient bit is
cleared again, so we revert to allocating a hugepage on a fault on an
empty pud/pmd.

>>
>> Orabug: 26910556
> 
> Wat?
> 

It's an Oracle-internal identifier used to track this work.

Thanks,
Nitin


* Re: [PATCH] mm: Reduce memory bloat with THP
  2017-12-15  5:55 ` Anshuman Khandual
@ 2017-12-16  7:18   ` Nitin Gupta
  0 siblings, 0 replies; 8+ messages in thread
From: Nitin Gupta @ 2017-12-16  7:18 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm
  Cc: steven.sistare, Andrew Morton, Ingo Molnar, Mel Gorman,
	Nadav Amit, Minchan Kim, Kirill A. Shutemov, Peter Zijlstra,
	Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

On 12/14/17 9:55 PM, Anshuman Khandual wrote:
> On 12/15/2017 06:58 AM, Nitin Gupta wrote:
>> Currently, if the THP enabled policy is "always", or the mode
>> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
>> is allocated on a page fault if the pud or pmd is empty.  This
>> yields the best VA translation performance, but increases memory
>> consumption if some small page ranges within the huge page are
>> never accessed.
> 
> Right, that's as per design.
> 
>>
>> An alternate behavior for such page faults is to install a
>> hugepage only when a region is actually found to be (almost)
>> fully mapped and active.  This is a compromise between
> 
> That is the asynchronous method: khugepaged analyzes the process's
> page table segments and evaluates whether a huge page can be
> installed in place of the existing pages.
> 
>> translation performance and memory consumption.  Currently there
>> is no way for an application to choose this compromise for the
>> page fault conditions above.
> 
> Can't we set the THP enablement mode to "madvise", then switch
> between MADV_HUGEPAGE/MADV_NOHUGEPAGE to implement this?
> 

Asking applications to issue MADV_HUGEPAGE/NOHUGEPAGE would make THP
much less 'automatic'. With such a scheme, applications would have to
track the mapping and active status of each hugepage region and manually
issue MADV_HUGEPAGE again to let khugepaged back it with a hugepage.

Compare the above with the approach used by this patch: MADV_DONTNEED is
taken as a hint that the application still wants transparent hugepages
but wants to be more conservative with memory usage. khugepaged is still
free to collapse pages as it sees fit without explicit
HUGEPAGE/NOHUGEPAGE madvise hints.

>>
>> With this change, when an application issues MADV_DONTNEED on a
>> memory region, the region is marked as "space-efficient". For
> 
> Isn't MADV_DONTNEED meant to be used on a region where pages are
> already faulted in and the page table is populated?
> Are you suggesting that MADV_DONTNEED should be called on
> a region just after creation to control its fault behavior?
> That's not what MADV_DONTNEED was meant for.
> 

No, I'm not suggesting MADV_DONTNEED be called on an empty region. The
patch just uses these calls, whenever they are made, as a hint to be
more conservative with memory usage for that vma.

>> such regions, a hugepage is not immediately allocated on first
>> write.  Instead, it is left to the khugepaged thread to do
>> delayed hugepage promotion depending on whether the region is
>> actually mapped and active. When application issues
>> MADV_HUGEPAGE, the region is marked again as non-space-efficient
>> wherein hugepage is allocated on first touch
> 
> But a MADV_HUGEPAGE/MADV_NOHUGEPAGE combination should do the trick
> as well.
> 

Thanks,
Nitin


* Re: [PATCH] mm: Reduce memory bloat with THP
  2017-12-15 10:01 ` Kirill A. Shutemov
@ 2017-12-16  7:21   ` Nitin Gupta
  0 siblings, 0 replies; 8+ messages in thread
From: Nitin Gupta @ 2017-12-16  7:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, steven.sistare, Andrew Morton, Ingo Molnar, Mel Gorman,
	Nadav Amit, Minchan Kim, Kirill A. Shutemov, Peter Zijlstra,
	Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Anshuman Khandual, Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

On 12/15/17 2:01 AM, Kirill A. Shutemov wrote:
> On Thu, Dec 14, 2017 at 05:28:52PM -0800, Nitin Gupta wrote:
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 751e97a..b2ec07b 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -508,6 +508,7 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>>  					unsigned long start, unsigned long end)
>>  {
>>  	zap_page_range(vma, start, end - start);
>> +	vma->space_efficient = true;
>>  	return 0;
>>  }
>>  
> 
> And this modifies vma without down_write(mmap_sem).
> 

I thought this function was always called with mmap_sem write locked.
I will check again.

- Nitin



* Re: [PATCH] mm: Reduce memory bloat with THP
  2017-12-16  7:04   ` Nitin Gupta
@ 2017-12-18 13:53     ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-12-18 13:53 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Kirill A. Shutemov, linux-mm, steven.sistare, Andrew Morton,
	Ingo Molnar, Mel Gorman, Nadav Amit, Minchan Kim,
	Kirill A. Shutemov, Vegard Nossum, Levin, Alexander (Sasha Levin),
	Michal Hocko, David Rientjes, Vlastimil Babka, SeongJae Park,
	Shaohua Li, Aneesh Kumar K.V, Andrea Arcangeli, Mike Rapoport,
	Anshuman Khandual, Rik van Riel, Ross Zwisler, Jan Kara,
	Dave Jiang, Jérôme Glisse, Matthew Wilcox,
	Hugh Dickins, Tobin C Harding, open list

On Fri, Dec 15, 2017 at 11:04:03PM -0800, Nitin Gupta wrote:
> >> Orabug: 26910556
> > 
> > Wat?
> > 
> 
> It's an Oracle-internal identifier used to track this work.

And as such it has no place whatsoever outside of Oracle. Do not include
junk like that in upstream patches.
