Re: [RFC][PATCH] lru_add_drain_all() don't use schedule_on_each_cpu()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	heiko.carstens@de.ibm.com, npiggin@suse.de,
	linux-kernel@vger.kernel.org, hugh@veritas.com,
	torvalds@linux-foundation.org, riel@redhat.com,
	linux-mm@kvack.org
Subject: Re: [RFC][PATCH] lru_add_drain_all() don't use schedule_on_each_cpu()
Date: Wed, 29 Oct 2008 08:40:14 -0400	[thread overview]
Message-ID: <1225284014.8257.36.camel@lts-notebook> (raw)
In-Reply-To: <2f11576a0810290017g310e4469gd27aa857866849bd@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2747 bytes --]

On Wed, 2008-10-29 at 16:17 +0900, KOSAKI Motohiro wrote:
> > I believe that we still  have the lru_drain_all() called from the fault
> > path [with mmap_sem held] in clear_page_mlock().  We call
> > clear_page_mlock() on COW of an mlocked page in a VM_LOCKED vma to
> > ensure that we don't end up with an mlocked page in some other task's
> > non-VM_LOCKED vma where we'd then fail to munlock it later.  During
> > development testing, Rik encountered scenarios where a page would
> > encounter a COW fault while it was still making its way to the LRU via
> > the pagevecs.  So, he added the 'drain_all() and that seemed to avoid
> > this scenario.
> 
> Agreed.
> 
> 
> > Now, in the current upstream version of the unevictable mlocked pages
> > patches, we just count any mlocked pages [vmstat] that make their way to
> > free*page() instead of BUGging out, as we were doing earlier during
> > development.  So, maybe we can drop the lru_drain_add()s in the
> > unevictable mlocked pages work and live with the occasional freed
> > mlocked page, or mlocked page on the active/inactive lists to be dealt
> > with by vmscan.
> 
> hm, okey.
> maybe, I was wrong.
> 
> I'll make "dropping lru_add_drain_all()" patch soon.
> I expect I need few days.
>   make the patch:                  1 day
>   confirm by stress workload:  2-3 days
> 
> because rik's original problem only happend on heavy wokload, I think.

Indeed.  It was an ad hoc test program [2 versions attached] written
specifically to beat on COW of shared pages mlocked by parent then COWed
by parent or child and unmapped explicitly or via exit.  We were trying
to find all the ways the we could end up freeing mlocked pages--and
there were several.  Most of these turned out to be genuine
coding/design defects [as difficult as that may be to believe :-)], so
tracking them down was worthwhile.  And, I think that, in general,
clearing a page's mlocked state and rescuing from the unevictable lru
list on COW--to prevent the mlocked page from ending up mapped into some
task's non-VM_LOCKED vma--is a good thing to strive for.  

Now, looking at the current code [28-rc1] in [__]clear_page_mlock():
We've already cleared the PG_mlocked flag, we've decremented the mlocked
pages stats, and we're just trying to rescue the page from the
unevictable list to the in/active list.  If we fail to isolate the page,
then either some other task has it isolated and will return it to an
appropriate lru or it resides in a pagevec heading for an in/active lru
list.  We don't use pagevec for unevictable list.  Any other cases?  If
not, then we can probably dispense with the "try harder" logic--the
lru_add_drain()--in __clear_page_mlock().

Do you agree?  Or have I missed something?

Lee   

[-- Attachment #2: rvr-mlock-oops.c --]
[-- Type: text/x-csrc, Size: 1171 bytes --]

/*
 * In the split VM code in 2.6.25-rc3-mm1 and later, we see PG_mlock
 * pages freed from the exit/exit_mmap path.  This test case creates
 * a process, forks it, mlocks, touches some memory and exits, to
 * try and trigger the bug - Rik van Riel, Mar 2008
 */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#define NUMFORKS 1000
#define MEMSIZE 1024*1024

void child(void)
{
	char * mem;
	int err;
	int i;

	err = mlockall(MCL_CURRENT|MCL_FUTURE);
	if (err < 0) {
		printf("child mlock failed\n");
		exit(1);
	}

	mem = malloc(MEMSIZE);
	if (!mem) {
		printf("child could not allocate memory\n");
		exit(2);
	}

	/* Touch the memory so the kernel allocates actual pages. */
	for (i = 0; i < MEMSIZE; i++)
		mem[i] = i;

	/* Avoids the oops?  Nope ... :( */
	munlockall();

	/* This is where we can trigger the oops. */
	exit(0);
}

int main(int argc, char *argv)
{
	int i;
	int status;

	for (i = 0; i < NUMFORKS ; i++) {
		pid_t pid = fork();

		if (!pid)	
			child(); /* does not return */
		else if (pid > 0)
			wait(&status);
		else {
			printf("fork failed\n");
			exit(1);
		}
	}
}

[-- Attachment #3: rvr-mlock-oops2.c --]
[-- Type: text/x-csrc, Size: 1176 bytes --]

/*
 * In the split VM code in 2.6.25-rc3-mm1 and later, we see PG_mlock
 * pages freed from the exit/exit_mmap path.  This test case creates
 * a process, forks it, mlocks, touches some memory and exits, to
 * try and trigger the bug - Rik van Riel, Mar 2008
 */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#define NUMFORKS 1
#define MEMSIZE 1024*1024

void child(void)
{
	char * mem;
	int i;

	mem = malloc(MEMSIZE);
	if (!mem) {
		printf("child could not allocate memory\n");
		exit(2);
	}

	/* Touch the memory so the kernel allocates actual pages. */
	for (i = 0; i < MEMSIZE; i++)
		mem[i] = i;

	/* This is where we can trigger the oops. */
	exit(0);
}

int main(int argc, char *argv)
{
	int i;
	int status;
	pid_t pid;
	int err;

	err = mlockall(MCL_CURRENT|MCL_FUTURE);
	if (err < 0) {
		printf("parent mlock failed\n");
		exit(1);
	}

	pid = getpid();

	printf("parent pid = %d\n", pid);

	for (i = 0; i < NUMFORKS ; i++) {
		pid = fork();

		if (!pid)	
			child(); /* does not return */
		else if (pid > 0)
			wait(&status);
		else {
			printf("fork failed\n");
			exit(1);
		}
	}
}

next prev parent reply	other threads:[~2008-10-29 12:40 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200810201659.m9KGxtFC016280@hera.kernel.org>
2008-10-21 15:13 ` mlock: mlocked pages are unevictable Heiko Carstens
2008-10-21 15:51   ` KOSAKI Motohiro
2008-10-21 17:18     ` KOSAKI Motohiro
2008-10-21 20:30       ` Peter Zijlstra
2008-10-21 20:48         ` Peter Zijlstra
2008-10-23 15:00       ` [RFC][PATCH] lru_add_drain_all() don't use schedule_on_each_cpu() KOSAKI Motohiro
2008-10-24  1:28         ` Nick Piggin
2008-10-24  4:54           ` KOSAKI Motohiro
2008-10-24  4:55             ` Nick Piggin
2008-10-24  5:29               ` KOSAKI Motohiro
2008-10-24  5:34                 ` Nick Piggin
2008-10-24  5:51                   ` KOSAKI Motohiro
2008-10-24 19:20         ` Heiko Carstens
2008-10-26 11:06         ` Peter Zijlstra
2008-10-26 13:37           ` KOSAKI Motohiro
2008-10-26 13:49             ` Peter Zijlstra
2008-10-26 15:51               ` KOSAKI Motohiro
2008-10-26 16:17                 ` Peter Zijlstra
2008-10-27  3:14                   ` KOSAKI Motohiro
2008-10-27  7:56                     ` Peter Zijlstra
2008-10-27  8:03                       ` KOSAKI Motohiro
2008-10-27 10:42                         ` KOSAKI Motohiro
2008-10-27 21:55         ` Andrew Morton
2008-10-28 14:25           ` Christoph Lameter
2008-10-28 20:45             ` Andrew Morton
2008-10-28 21:29               ` Lee Schermerhorn
2008-10-29  7:17                 ` KOSAKI Motohiro
2008-10-29 12:40                   ` Lee Schermerhorn [this message]
2008-11-06  0:14                     ` [PATCH] get rid of lru_add_drain_all() in munlock path KOSAKI Motohiro
2008-11-06 16:33                       ` Kamalesh Babulal
2008-10-29  7:20               ` [RFC][PATCH] lru_add_drain_all() don't use schedule_on_each_cpu() KOSAKI Motohiro
2008-10-29  8:21                 ` KAMEZAWA Hiroyuki
2008-11-05  9:51                 ` Peter Zijlstra
2008-11-05  9:55                   ` KOSAKI Motohiro
2008-10-22 15:28   ` mlock: mlocked pages are unevictable Lee Schermerhorn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1225284014.8257.36.camel@lts-notebook \
    --to=lee.schermerhorn@hp.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hugh@veritas.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox