linux-mm.kvack.org archive mirror
* Re: [Bug 202089] New: transparent hugepage not compatable with madvise(MADV_DONTNEED)
       [not found] <bug-202089-27@https.bugzilla.kernel.org/>
@ 2018-12-29 20:53 ` Andrew Morton
  2018-12-29 22:48   ` Kirill A. Shutemov
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-12-29 20:53 UTC
  To: linux-mm; +Cc: bugzilla-daemon, jianpanlanyue


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 29 Dec 2018 09:00:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=202089
> 
>             Bug ID: 202089
>            Summary: transparent hugepage not compatable with
>                     madvise(MADV_DONTNEED)
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.4.0-117
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: jianpanlanyue@163.com
>         Regression: No
> 
> environment:
>   1. kernel 4.4.0 on x86_64
>   2. echo always > /sys/kernel/mm/transparent_hugepage/enabled
>      echo always > /sys/kernel/mm/transparent_hugepage/defrag
>      echo 2000000 > /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
>      (scan more pages per khugepaged pass so the problem reproduces faster)
> 
> problem:
>   1. Use mmap() to allocate 4096 bytes 1024*512 times (4096*1024*512 = 2G).
>   2. Use madvise(MADV_DONTNEED) to free most of those pages, but keep a few
> (via if(i%33==0) continue;). The process's physical memory first drops, but
> after a few seconds it rises back to 2G and never comes down again.
>   3. If I delete this condition (if(i%33==0) continue;) or disable
> transparent_hugepage by setting 'enabled' and 'defrag' to never, all goes
> well and the physical memory drops as expected.
> 
>   It seems that transparent_hugepage has problems with non-contiguous
> madvise(MADV_DONTNEED).
> 
> Below is the test code:
> 
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <errno.h>
> #include <assert.h>
> 
> #define PAGE_SIZE 4096
> #define PAGE_COUNT (1024*512)
> 
> int main(void)
> {
>   void **table = malloc(sizeof(void*) * PAGE_COUNT);
>   assert(table != NULL);
>   printf("begin mmap...\n");
> 
>   /* map and touch 2G worth of 4K pages, one mmap() call per page */
>   for (int i=0; i<PAGE_COUNT; i++) {
>     table[i] = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
>                     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>     assert(table[i] != MAP_FAILED);
>     memset(table[i], 1, PAGE_SIZE);
>   }
> 
>   printf("mmap ok, press enter to free most of them\n");
>   getchar();
> 
>   /* unexpected behaviour: after most pages are freed, THP makes the
>      physical memory rise back to 2G */
>   for (int i=0; i<PAGE_COUNT; i++) {
>     if (i%33==0) continue;  /* keep every 33rd page */
>     if (madvise(table[i], PAGE_SIZE, MADV_DONTNEED) != 0)
>       printf("madvise error, errno:%d\n", errno);
>   }
> 
>   printf("madvise finish\n");
>   free(table);
>   getchar();
>   getchar();
>   return 0;
> }
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.


* Re: [Bug 202089] New: transparent hugepage not compatable with madvise(MADV_DONTNEED)
  2018-12-29 20:53 ` [Bug 202089] New: transparent hugepage not compatable with madvise(MADV_DONTNEED) Andrew Morton
@ 2018-12-29 22:48   ` Kirill A. Shutemov
  2019-01-03  9:44     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-12-29 22:48 UTC
  To: Andrew Morton; +Cc: linux-mm, bugzilla-daemon, jianpanlanyue

On Sat, Dec 29, 2018 at 12:53:16PM -0800, Andrew Morton wrote:
> > problem:
> >   1. Use mmap() to allocate 4096 bytes 1024*512 times (4096*1024*512 = 2G).
> >   2. Use madvise(MADV_DONTNEED) to free most of those pages, but keep a few
> > (via if(i%33==0) continue;). The process's physical memory first drops, but
> > after a few seconds it rises back to 2G and never comes down again.
> >   3. If I delete this condition (if(i%33==0) continue;) or disable
> > transparent_hugepage by setting 'enabled' and 'defrag' to never, all goes
> > well and the physical memory drops as expected.
> > 
> >   It seems that transparent_hugepage has problems with non-contiguous
> > madvise(MADV_DONTNEED).

It's expected behaviour.

MADV_DONTNEED doesn't guarantee that the range will not be repopulated
(with or without direct action on the application's behalf). It's just a hint
for the kernel.

For sparse mappings, consider using MADV_NOHUGEPAGE.
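
A minimal sketch of that suggestion, not taken from the reporter's program:
the range is opted out of THP with madvise(MADV_NOHUGEPAGE) before it is
touched, and then released sparsely in the same every-33rd-page pattern as
the test case. The helper name sparse_release_demo and its parameters are
hypothetical, and a failing MADV_NOHUGEPAGE call is assumed to be harmless
(for example on a kernel built without THP).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the bug report): map npages anonymous pages
 * in one mmap() call, opt the range out of THP, touch every page, then drop
 * all pages except every keep_every-th one.
 * Returns the number of pages released, or -1 if the mapping failed.
 */
long sparse_release_demo(size_t npages, size_t keep_every)
{
    assert(npages > 0 && keep_every > 0);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /*
     * Ask the kernel not to back this range with huge pages. A failure is
     * treated as non-fatal here (assumption: e.g. EINVAL on a kernel built
     * without THP, where there is nothing to opt out of).
     */
    (void)madvise(base, len, MADV_NOHUGEPAGE);

    memset(base, 1, len);               /* fault every page in */

    long released = 0;
    for (size_t i = 0; i < npages; i++) {
        if (i % keep_every == 0)
            continue;                   /* keep this page populated */
        if (madvise(base + i * page, page, MADV_DONTNEED) == 0)
            released++;
    }

    munmap(base, len);
    return released;
}
```

Calling sparse_release_demo(1024 * 512, 33) would mirror the sizes in the
report. Note that the sketch uses a single large mapping rather than one
mmap() call per page.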

-- 
 Kirill A. Shutemov


* Re: [Bug 202089] New: transparent hugepage not compatable with madvise(MADV_DONTNEED)
  2018-12-29 22:48   ` Kirill A. Shutemov
@ 2019-01-03  9:44     ` Michal Hocko
  2019-01-03 14:35       ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2019-01-03  9:44 UTC
  To: Kirill A. Shutemov, jianpanlanyue
  Cc: Andrew Morton, linux-mm, bugzilla-daemon

On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> It's expected behaviour.
> 
> MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> (with or without direct action on the application's behalf). It's just a hint
> for the kernel.

I agree with Kirill here, but I would be interested in the underlying
use case that triggered this. The test case is clearly artificial, but is
any userspace actually relying on MADV_DONTNEED reducing the RSS
long term?
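
For anyone who wants to watch the RSS while reproducing this, here is a small
sketch that reads the resident page count of the calling process from
/proc/self/statm (second field, in pages). The function name
current_rss_bytes is hypothetical; it is only one way to sample the number,
and tools such as top report the same figure.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the resident set size of the calling process
 * in bytes, or -1 if /proc/self/statm cannot be read or parsed.
 */
long current_rss_bytes(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;

    long size_pages = 0, resident_pages = 0;
    /* statm starts with: total program size, resident set size (in pages) */
    int n = fscanf(f, "%ld %ld", &size_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;

    return resident_pages * sysconf(_SC_PAGESIZE);
}
```

Sampling this before the madvise() loop, right after it, and again some
seconds later would show whether the RSS climbs back as described.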

> For sparse mappings, consider using MADV_NOHUGEPAGE.

Yes, or use a high threshold for khugepaged collapsing.
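
The mail does not name a specific knob. One candidate, as an assumption, is
khugepaged's max_ptes_none tunable, where a lower value should make
khugepaged collapse only ranges that are already mostly populated. The sketch
below just writes and reads a decimal value in a sysfs-style file; the helper
names write_tunable and read_tunable are hypothetical, and writing the real
file needs root.

```c
#include <stdio.h>

/* Assumed tunable; the thread itself does not say which one is meant. */
#define KHUGEPAGED_MAX_PTES_NONE \
    "/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none"

/* Write a decimal value to a sysfs-style file. Returns 0 on success, -1 on error. */
int write_tunable(const char *path, unsigned long value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%lu\n", value) > 0;
    if (fclose(f) != 0)                 /* sysfs may report errors on flush */
        ok = 0;
    return ok ? 0 : -1;
}

/* Read a decimal value back. Returns 0 on success, -1 on error. */
int read_tunable(const char *path, unsigned long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = fscanf(f, "%lu", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, write_tunable(KHUGEPAGED_MAX_PTES_NONE, 0) run as root would ask
khugepaged to collapse only fully populated ranges, under the assumption
above about what the tunable controls.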
-- 
Michal Hocko
SUSE Labs


* Re: [Bug 202089] New: transparent hugepage not compatable with madvise(MADV_DONTNEED)
  2019-01-03  9:44     ` Michal Hocko
@ 2019-01-03 14:35       ` Matthew Wilcox
  2019-01-03 14:41         ` Michal Hocko
  2019-01-03 14:53         ` Kirill A. Shutemov
  0 siblings, 2 replies; 6+ messages in thread
From: Matthew Wilcox @ 2019-01-03 14:35 UTC
  To: Michal Hocko
  Cc: Kirill A. Shutemov, jianpanlanyue, Andrew Morton, linux-mm,
	bugzilla-daemon

On Thu, Jan 03, 2019 at 10:44:22AM +0100, Michal Hocko wrote:
> On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> > It's expected behaviour.
> > 
> > MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> > (with or without direct action on the application's behalf). It's just a hint
> > for the kernel.
> 
> I agree with Kirill here, but I would be interested in the underlying
> use case that triggered this. The test case is clearly artificial, but is
> any userspace actually relying on MADV_DONTNEED reducing the RSS
> long term?
> 
> > For sparse mappings, consider using MADV_NOHUGEPAGE.

Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE?  It'd prevent
coalescing elsewhere in the VMA, so that might negatively affect other
programs.


* Re: [Bug 202089] New: transparent hugepage not compatable with madvise(MADV_DONTNEED)
  2019-01-03 14:35       ` Matthew Wilcox
@ 2019-01-03 14:41         ` Michal Hocko
  2019-01-03 14:53         ` Kirill A. Shutemov
  1 sibling, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2019-01-03 14:41 UTC
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, jianpanlanyue, Andrew Morton, linux-mm,
	bugzilla-daemon

On Thu 03-01-19 06:35:02, Matthew Wilcox wrote:
> On Thu, Jan 03, 2019 at 10:44:22AM +0100, Michal Hocko wrote:
> > On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> > > It's expected behaviour.
> > > 
> > > MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> > > (with or without direct action on the application's behalf). It's just a hint
> > > for the kernel.
> > 
> > I agree with Kirill here, but I would be interested in the underlying
> > use case that triggered this. The test case is clearly artificial, but is
> > any userspace actually relying on MADV_DONTNEED reducing the RSS
> > long term?
> > 
> > > For sparse mappings, consider using MADV_NOHUGEPAGE.
> 
> Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE?  It'd prevent
> coalescing elsewhere in the VMA, so that might negatively affect other
> programs.

I really do not think this is a good idea. MADV_DONTNEED doesn't really
imply anything about future RSS. It only wipes out the current content.
In other words, do we want to stop fault-around/readahead or any other
optimistic faulting on MADV_DONTNEED?
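
A small sketch of the "wipes out the current content" part, for a private
anonymous mapping: a page is filled with a pattern, dropped with
MADV_DONTNEED, and read again. The next access faults in a fresh zero-filled
page, so the range is populated again right away. The function name
byte_after_dontneed is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill one private anonymous page with 0xAB, drop it
 * with MADV_DONTNEED, then read the first byte again.
 * Returns that byte, or -1 if mmap() or madvise() failed.
 */
int byte_after_dontneed(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, page);              /* populate the page */

    if (madvise(p, page, MADV_DONTNEED) != 0) {
        munmap(p, page);
        return -1;
    }

    /* This read repopulates the range; the old content is gone. */
    volatile unsigned char *vp = p;
    int value = vp[0];

    munmap(p, page);
    return value;
}
```

The mapping itself stays valid after MADV_DONTNEED, which is why nothing
prevents later faults (or khugepaged) from backing it with memory again.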

-- 
Michal Hocko
SUSE Labs


* Re: [Bug 202089] New: transparent hugepage not compatable with madvise(MADV_DONTNEED)
  2019-01-03 14:35       ` Matthew Wilcox
  2019-01-03 14:41         ` Michal Hocko
@ 2019-01-03 14:53         ` Kirill A. Shutemov
  1 sibling, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2019-01-03 14:53 UTC
  To: Matthew Wilcox
  Cc: Michal Hocko, jianpanlanyue, Andrew Morton, linux-mm, bugzilla-daemon

On Thu, Jan 03, 2019 at 06:35:02AM -0800, Matthew Wilcox wrote:
> On Thu, Jan 03, 2019 at 10:44:22AM +0100, Michal Hocko wrote:
> > On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> > > It's expected behaviour.
> > > 
> > > MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> > > (with or without direct action on the application's behalf). It's just a hint
> > > for the kernel.
> > 
> > I agree with Kirill here, but I would be interested in the underlying
> > use case that triggered this. The test case is clearly artificial, but is
> > any userspace actually relying on MADV_DONTNEED reducing the RSS
> > long term?
> > 
> > > For sparse mappings, consider using MADV_NOHUGEPAGE.
> 
> Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE?  It'd prevent
> coalescing elsewhere in the VMA, so that might negatively affect other
> programs.

MADV_NOHUGEPAGE often creates a new VMA (or two), which has performance
implications. Creating a new VMA would also require down_write(mmap_sem),
which is a no-go for MADV_DONTNEED.
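
The VMA split can be observed from userspace. The sketch below, with
hypothetical function names, counts the lines of /proc/self/maps (one per
VMA) before and after applying MADV_NOHUGEPAGE to the middle page of a
three-page mapping. It assumes a kernel with THP support; where the madvise()
call fails, the helper reports -1 instead of a count.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count lines (one per VMA) in /proc/self/maps without touching the heap. */
static long count_vmas(void)
{
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[4096];
    long lines = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;

    close(fd);
    return n < 0 ? -1 : lines;
}

/*
 * Hypothetical helper: return the change in VMA count caused by
 * MADV_NOHUGEPAGE on the middle page of a 3-page anonymous mapping,
 * or -1 if any step failed.
 */
long vma_delta_after_nohugepage(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    long before = count_vmas();
    int rc = madvise(base + page, page, MADV_NOHUGEPAGE);
    long after = count_vmas();

    munmap(base, 3 * page);
    if (rc != 0 || before < 0 || after < 0)
        return -1;
    return after - before;
}
```

If the middle page ends up with different VMA flags than its neighbours, the
original VMA has to be split into three, so a positive delta would be
consistent with the "new VMA (or two)" above.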

-- 
 Kirill A. Shutemov

