linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Christoph Lameter <clameter@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [rfc][patch 2/2] mm: mlocked pages off LRU
Date: Wed, 7 Mar 2007 04:07:41 +0100	[thread overview]
Message-ID: <20070307030741.GB6054@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0703061022270.24461@schroedinger.engr.sgi.com>

On Tue, Mar 06, 2007 at 10:30:43AM -0800, Christoph Lameter wrote:
> On Tue, 6 Mar 2007, Nick Piggin wrote:
> 
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	unsigned long addr = start;
> > +	struct page *pages[16]; /* 16 gives a reasonable batch */
> 
> Use a pagevec instead?

That's annoying because get_user_pages doesn't do pagevec_add, so
you have to do everything the same, except manually fish out the pages
array from the pagevec every use.

> > +		/*
> > +		 * get_user_pages makes pages present if we are
> > +		 * setting mlock.
> > +		 */
> > +		ret = get_user_pages(current, mm, addr,
> > +				min_t(int, nr_pages, ARRAY_SIZE(pages)),
> > +				write, 0, pages, NULL);
> > +		if (ret < 0)
> > +			break;
> > +		if (ret == 0) {
> > +			/*
> > +			 * We know the vma is there, so the only time
> > +			 * we cannot get a single page should be an
> > +			 * error (ret < 0) case.
> > +			 */
> > +			WARN_ON(1);
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> 
> ... pages could be evicted here by reclaim?

No, get_user_pages elevates refcount.

> > +
> > +		for (i = 0; i < ret; i++) {
> > +			struct page *page = pages[i];
> > +			lock_page(page);
> > +			if (lock) {
> > +				/*
> > +				 * Anonymous pages may have already been
> > +				 * mlocked by get_user_pages->handle_mm_fault.
> > +				 * Be conservative and don't count these:
> 
> 
> > @@ -801,8 +815,21 @@ static int try_to_unmap_anon(struct page
> >  		ret = try_to_unmap_one(page, vma, migration);
> >  		if (ret == SWAP_FAIL || !page_mapped(page))
> >  			break;
> > +		if (ret == SWAP_MLOCK) {
> > +			if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> > +				if (vma->vm_flags & VM_LOCKED) {
> > +					mlock_vma_page(page);
> > +					mlocked++;
> > +				}
> > +				up_read(&vma->vm_mm->mmap_sem);
> > +			}
> > +		}
> 
> Taking mmap_sem in try_to_unmap_one? It may already have been taken by 
> page migration. Ok, trylock but still.

Migration path won't get SWAP_MLOCK, though.
 
We still have to trylock because we're inside i_mmap_lock or anon_vma_lock.
It's not pretty, but it is very fortunate that this should be an uncommon
path and it is no problem if it fails sometimes.

I believe your patches have this race (ie. vmscan lazily mlocking the
page after the last vma has just been munlocked). But if not,I would like
to know how you dealt with it.

> >  			goto out;
> > +		if (ret == SWAP_MLOCK) {
> > +			if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> > +				if (vma->vm_flags & VM_LOCKED) {
> > +					mlock_vma_page(page);
> > +					mlocked++;
> > +				}
> > +				up_read(&vma->vm_mm->mmap_sem);
> > +			}
> 
> 
> Well this piece of code seem to repeat itself. New function?

I was thinking about putting it in mlock.c...

> > @@ -2148,7 +2196,10 @@ static int do_anonymous_page(struct mm_s
> >  		if (!pte_none(*page_table))
> >  			goto release;
> >  		inc_mm_counter(mm, anon_rss);
> > -		lru_cache_add_active(page);
> > +		if (!(vma->vm_flags & VM_LOCKED))
> > +			lru_cache_add_active(page);
> > +		else
> > +			mlock_new_vma_page(page);
> >  		page_add_new_anon_rmap(page, vma, address);
> >  	} else {
> >  		/* Map the ZERO_PAGE - vm_page_prot is readonly */
> > @@ -2291,7 +2342,10 @@ static int __do_fault(struct mm_struct *
> >  		set_pte_at(mm, address, page_table, entry);
> >  		if (anon) {
> >                          inc_mm_counter(mm, anon_rss);
> > -                        lru_cache_add_active(page);
> > +			if (!(vma->vm_flags & VM_LOCKED))
> > +				lru_cache_add_active(page);
> > +			else
> > +				mlock_new_vma_page(page);
> >                          page_add_new_anon_rmap(page, vma, address);
> >  		} else {
> 
> Another repeating chunk of code?

Well there are a few of those in mm/memory.c already. Consolidation
doesn't belong in this patch.

> 
> > Index: linux-2.6/drivers/base/node.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/node.c
> > +++ linux-2.6/drivers/base/node.c
> > @@ -60,6 +60,7 @@ static ssize_t node_read_meminfo(struct 
> >  		       "Node %d FilePages:    %8lu kB\n"
> >  		       "Node %d Mapped:       %8lu kB\n"
> >  		       "Node %d AnonPages:    %8lu kB\n"
> > +		       "Node %d MLock:        %8lu kB\n"
> 
> Upper case L in MLock? Should it not be Mlock from mlock with first letter 
> capitalized?

I didn't really think about it much. We use upper case for contractions
as well (eg. MemFree, Committed_AS). I was thinking M stands for memory,
but you could argue that it is just a single word, named after the syscall
... but then what about mlockall? ;)

Shal I just rename to Locked?

> > Index: linux-2.6/include/linux/mmzone.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/mmzone.h
> > +++ linux-2.6/include/linux/mmzone.h
> > @@ -54,6 +54,7 @@ enum zone_stat_item {
> >  	NR_ANON_PAGES,	/* Mapped anonymous pages */
> >  	NR_FILE_MAPPED,	/* pagecache pages mapped into pagetables.
> >  			   only modified from process context */
> > +	NR_MLOCK,	/* MLocked pages (conservative guess) */
> 
> Discovered mlocked pages?

Yeah.

Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-03-07  3:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-05 16:17 Nick Piggin
2007-03-05 16:40 ` Nick Piggin
2007-03-05 17:12 ` Christoph Hellwig
2007-03-05 18:17   ` Christoph Lameter
2007-03-05 18:14 ` Christoph Lameter
2007-03-05 19:26   ` Rik van Riel
2007-03-06  1:05   ` Nick Piggin
2007-03-06  1:27     ` Christoph Lameter
2007-03-06  1:44       ` Nick Piggin
2007-03-06  1:55         ` Christoph Lameter
2007-03-06  2:13           ` Nick Piggin
2007-03-06  2:46             ` Christoph Lameter
2007-03-06  2:50               ` Nick Piggin
2007-03-06 14:30                 ` Nick Piggin
2007-03-06 18:30                   ` Christoph Lameter
2007-03-07  3:07                     ` Nick Piggin [this message]
2007-03-06 22:23                   ` Lee Schermerhorn
2007-03-07  3:52                     ` Nick Piggin
2007-03-06 15:59               ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070307030741.GB6054@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@engr.sgi.com \
    --cc=clameter@sgi.com \
    --cc=hch@infradead.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox