linux-mm.kvack.org archive mirror
From: Wu Fengguang <fengguang.wu@intel.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Elladan <elladan@eskimo.com>, Nick Piggin <npiggin@suse.de>,
	Andi Kleen <andi@firstfloor.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Minchan Kim <minchan.kim@gmail.com>
Subject: Re: oomkiller over-ambitious after "vmscan: make mapped executable pages the first class citizen" (bisected)
Date: Tue, 13 Oct 2009 10:26:50 +0800
Message-ID: <20091013022650.GB7345@localhost>
In-Reply-To: <200910122244.19666.borntraeger@de.ibm.com>

Hi Christian,

Thanks for the report!

On Tue, Oct 13, 2009 at 04:44:19AM +0800, Christian Borntraeger wrote:
> I have seen some OOM-killer action on my s390x system when using large amounts 
> of anonymous memory:
> 
> [cborntra@t63lp34 ~]$ cat memeat.c
> #include <sys/mman.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main()
> {
>         char *start;
>         char *a;
>         start = mmap(NULL, 4300000000UL,
>                     PROT_READ | PROT_WRITE,
>                     MAP_SHARED | MAP_ANONYMOUS, -1 , 0);
>
>         if (start == MAP_FAILED) {
>                 printf("cannot map guest memory\n");
>                 exit (1);
>         }
>         for (a = start; a < start + 4300000000UL; a += 4096)
>             *a='a';
>         exit(0);
> }
> [cborntra@t63lp34 ~]$ ./memeat
> Connection to t63lp34 closed.
> 
> 
> I attached the dmesg with the oom messages.
> 
> As you can see we are failing several order 0 allocations with gfpmask=0x201da. 
> 
> The application uses slightly more memory than is available. The thing is that
> there is plenty of swap space to fulfill the (non-atomic) request:
> 
> [cborntra@t63lp34 ~]$ free
>              total       used       free     shared    buffers     cached
> Mem:       4166560     127148    4039412          0       2256      19752
> -/+ buffers/cache:     105140    4061420
> Swap:      9615904       8328    9607576
> 
> Since old kernels never showed OOM, I was able to bisect the first kernel that 
> shows this behaviour:
> commit 8cab4754d24a0f2e05920170c845bd84472814c6
> Author: Wu Fengguang <fengguang.wu@intel.com>
>     vmscan: make mapped executable pages the first class citizen
> 
> In fact, applying this patch makes the problem go away:
> --- linux-2.6.orig/mm/vmscan.c
> +++ linux-2.6/mm/vmscan.c
> @@ -1345,22 +1345,8 @@ static void shrink_active_list(unsigned 
>  
>  		/* page_referenced clears PageReferenced */
>  		if (page_mapping_inuse(page) &&
> -		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
> +		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
>  			nr_rotated++;
> -			/*
> -			 * Identify referenced, file-backed active pages and
> -			 * give them one more trip around the active list. So
> -			 * that executable code get better chances to stay in
> -			 * memory under moderate memory pressure.  Anon pages
> -			 * are not likely to be evicted by use-once streaming
> -			 * IO, plus JVM can create lots of anon VM_EXEC pages,
> -			 * so we ignore them here.
> -			 */
> -			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
> -				list_add(&page->lru, &l_active);
> -				continue;
> -			}
> -		}
>  
>  		ClearPageActive(page);	/* we are de-activating */
>  		list_add(&page->lru, &l_inactive);
> 
> 
> 
> The interesting part is that s390x in the default configuration has no
> no-execute feature, resulting in the following map:
> c0000000-1c04cd000 rwxs 00000000 00:04 18517        /dev/zero (deleted)
>
> As you can see, this area looks file mapped (/dev/zero) and executable. On the 
> other hand, the !PageAnon clause should cover this case. I am lost.

Yes, I can see this map on my desktop:

        $ cat /proc/5016/smaps #smaps for Xorg

        417fe000-41800000 rwxp 00000000 00:11 1370                               /dev/zero
        Size:                  8 kB
        Rss:                   8 kB
        Pss:                   8 kB
        Shared_Clean:          0 kB
        Shared_Dirty:          0 kB
        Private_Clean:         0 kB
        Private_Dirty:         8 kB
        Referenced:            8 kB
        Swap:                  0 kB
        KernelPageSize:        4 kB
        MMUPageSize:           4 kB

        # page-types -p 5016 -a 0x417fe,0x41800 -r
                     flags      page-count       MB  symbolic-flags                     long-symbolic-flags
        0x0000000000005868               2        0  ___U_lA____Ma_b_________________   uptodate,lru,active,mmap,anonymous,swapbacked
                     total               2        0

You can see page-types reports the expected "anonymous,swapbacked".

However, for your program (modified to reduce the number of pages and
add a sleep), I see:

        root /home/wfg# cat /proc/`pidof memeat`/smaps

        7fa012722000-7fa012b3c000 rw-s 00000000 00:08 321900                     /dev/zero (deleted)
        Size:               4200 kB
        Rss:                4200 kB
        Pss:                4200 kB
        Shared_Clean:          0 kB
        Shared_Dirty:          0 kB
        Private_Clean:         0 kB
        Private_Dirty:      4200 kB
        Referenced:         4200 kB
        Swap:                  0 kB
        KernelPageSize:        4 kB
        MMUPageSize:           4 kB

        # page-types -p `pidof memeat` -a 0x7fa012722,0x7fa012b3c
                     flags      page-count       MB  symbolic-flags                     long-symbolic-flags
        0x0000000000004878            1050        4  ___UDlA____M__b_________________   uptodate,dirty,lru,active,mmap,swapbacked
                     total            1050        4

So the "(deleted)" /dev/zero has only "swapbacked" set.
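
To make the difference between the two flag words explicit, here is a
minimal sketch that decodes them. It assumes the bit numbers documented
for /proc/kpageflags in Documentation/vm/pagemap.txt; the helper names
kpf_test() and kpf_print() are made up for this illustration and are
not part of page-types.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit numbers as documented for /proc/kpageflags (assumed unchanged). */
enum {
	KPF_UPTODATE   = 3,
	KPF_DIRTY      = 4,
	KPF_LRU        = 5,
	KPF_ACTIVE     = 6,
	KPF_MMAP       = 11,
	KPF_ANON       = 12,
	KPF_SWAPBACKED = 14,
};

/* Hypothetical helper: return 1 if the given bit is set in flags. */
static int kpf_test(uint64_t flags, int bit)
{
	return (int)((flags >> bit) & 1);
}

/* Hypothetical helper: print the names of the set bits known to this sketch. */
static void kpf_print(uint64_t flags)
{
	static const struct { int bit; const char *name; } names[] = {
		{ KPF_UPTODATE,   "uptodate"   },
		{ KPF_DIRTY,      "dirty"      },
		{ KPF_LRU,        "lru"        },
		{ KPF_ACTIVE,     "active"     },
		{ KPF_MMAP,       "mmap"       },
		{ KPF_ANON,       "anonymous"  },
		{ KPF_SWAPBACKED, "swapbacked" },
	};
	size_t i;

	printf("0x%016llx:", (unsigned long long)flags);
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (kpf_test(flags, names[i].bit))
			printf(" %s", names[i].name);
	printf("\n");
}
```

Calling kpf_print(0x5868) and kpf_print(0x4878) covers the Xorg and
memeat pages respectively. The two words differ only in the dirty bit
and in KPF_ANON, which is the bit the !PageAnon() test looks at.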

In particular, the page belongs to the file initialized by shmem_zero_setup()
and populated by shmem_fault() => shmem_getpage().
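
A small userspace sketch of the same observation: it creates a
MAP_SHARED|MAP_ANONYMOUS mapping, touches it, and looks up its own line
in /proc/self/maps. The function name shared_anon_maps_line() and the
4096-byte stride are assumptions made for this example; on the kernels
discussed here the line should end in "/dev/zero (deleted)", but I have
not checked other kernels.

```c
#define _GNU_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#include <stdio.h>
#include <sys/mman.h>

/*
 * Map len bytes MAP_SHARED|MAP_ANONYMOUS, touch every 4096 bytes, then
 * copy the matching /proc/self/maps line into buf.
 * Returns 0 if the line was found, -1 otherwise.
 */
static int shared_anon_maps_line(size_t len, char *buf, size_t buflen)
{
	char line[512];
	unsigned long start, end;
	int found = -1;
	FILE *f;
	char *p, *a;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Fault the pages in, as memeat does. */
	for (a = p; a < p + len; a += 4096)
		*a = 'a';

	f = fopen("/proc/self/maps", "r");
	if (f) {
		while (fgets(line, sizeof(line), f)) {
			if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
			    start <= (unsigned long)p &&
			    (unsigned long)p < end) {
				snprintf(buf, buflen, "%s", line);
				found = 0;
				break;
			}
		}
		fclose(f);
	}

	munmap(p, len);
	return found;
}
```

For example, calling it with len = 1050 * 4096 and printing buf gives a
mapping of the same size as the 4200 kB one shown above.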

> Does anybody on the CC (taken from the original patch) have an idea what the
> problem is and how to fix this properly?

Can you try this patch? Thanks!

---
vmscan: limit VM_EXEC protection to file pages

It is possible to have !Anon but SwapBacked pages, and some apps could
create a huge number of such pages with MAP_SHARED|MAP_ANONYMOUS. These
pages go into the ANON lru list and hence shall not be protected: we
only care about mapped executable files. Failing to do so may trigger OOM.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux.orig/mm/vmscan.c	2009-10-13 09:49:05.000000000 +0800
+++ linux/mm/vmscan.c	2009-10-13 09:49:37.000000000 +0800
@@ -1356,7 +1356,7 @@ static void shrink_active_list(unsigned 
 			 * IO, plus JVM can create lots of anon VM_EXEC pages,
 			 * so we ignore them here.
 			 */
-			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
+			if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
 				list_add(&page->lru, &l_active);
 				continue;
 			}
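
To illustrate why the one-line change matters, here is a userspace
model of the two tests. It is only a sketch, not kernel code: struct
page_model, protect_old() and protect_new() are hypothetical names, and
it assumes that page_is_file_cache() is equivalent to !PageSwapBacked().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two struct page flags involved. */
struct page_model {
	const char *kind;
	bool anon;		/* models PageAnon() */
	bool swapbacked;	/* models PageSwapBacked() */
};

/* Old test: keep a referenced VM_EXEC page active unless it is PageAnon. */
static bool protect_old(const struct page_model *p, bool vm_exec)
{
	return vm_exec && !p->anon;
}

/* New test: assumes page_is_file_cache() means !PageSwapBacked(). */
static bool protect_new(const struct page_model *p, bool vm_exec)
{
	return vm_exec && !p->swapbacked;
}

static const struct page_model page_kinds[] = {
	{ "file page cache",                  false, false },
	{ "private anonymous",                true,  true  },
	{ "shmem (MAP_SHARED|MAP_ANONYMOUS)", false, true  },
};

/* Print, for each page kind, whether a VM_EXEC mapping gets protected. */
static void print_protection_table(void)
{
	size_t i;

	for (i = 0; i < sizeof(page_kinds) / sizeof(page_kinds[0]); i++)
		printf("%-34s old=%d new=%d\n", page_kinds[i].kind,
		       protect_old(&page_kinds[i], true),
		       protect_new(&page_kinds[i], true));
}
```

Only the shmem row changes: the old test keeps such pages on the active
list when they are mapped with VM_EXEC, the new test does not. File
pages and private anonymous pages are treated the same by both.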

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

