linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@osdl.org>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: clameter@sgi.com, piggin@cyberone.com.au, torvalds@osdl.org,
	ak@suse.de, rohitseth@google.com, mbligh@google.com,
	hugh@veritas.com, riel@redhat.com, andrea@suse.de,
	arjan@infradead.org, apw@shadowen.org, mel@csn.ul.ie,
	marcelo@kvack.org, anton@samba.org, paulmck@us.ibm.com,
	linux-mm@kvack.org
Subject: Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
Date: Thu, 11 May 2006 08:02:20 -0700	[thread overview]
Message-ID: <20060511080220.48688b40.akpm@osdl.org> (raw)
In-Reply-To: <1147207458.27680.19.camel@lappy>

Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> 
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> People expressed the need to track dirty pages in shared mappings.
> 
> Linus outlined the general idea of doing that through making clean
> writable pages write-protected and taking the write fault.
> 
> This patch does exactly that, it makes pages in a shared writable
> mapping write-protected. On write-fault the pages are marked dirty and
> made writable. When the pages get synced with their backing store, the
> write-protection is re-instated.
> 
> It survives a simple test and shows the dirty pages in /proc/vmstat.

It'd be nice to have more that a "simple test" done.  Bugs in this area
will be subtle and will manifest in unpleasant ways.  That goes for both
correctness and performance bugs.

> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c	2006-05-08 18:49:39.000000000 +0200
> +++ linux-2.6/mm/memory.c	2006-05-09 09:15:11.000000000 +0200
> @@ -49,6 +49,7 @@
>  #include <linux/module.h>
>  #include <linux/init.h>
>  #include <linux/mm_page_replace.h>
> +#include <linux/backing-dev.h>
>  
>  #include <asm/pgalloc.h>
>  #include <asm/uaccess.h>
> @@ -2077,6 +2078,7 @@ static int do_no_page(struct mm_struct *
>  	unsigned int sequence = 0;
>  	int ret = VM_FAULT_MINOR;
>  	int anon = 0;
> +	struct page *dirty_page = NULL;
>  
>  	pte_unmap(page_table);
>  	BUG_ON(vma->vm_flags & VM_PFNMAP);
> @@ -2150,6 +2152,11 @@ retry:
>  		entry = mk_pte(new_page, vma->vm_page_prot);
>  		if (write_access)
>  			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +		else if (VM_SharedWritable(vma)) {
> +			struct address_space *mapping = page_mapping(new_page);
> +			if (mapping && mapping_cap_account_dirty(mapping))
> +				entry = pte_wrprotect(entry);
> +		}
>  		set_pte_at(mm, address, page_table, entry);
>  		if (anon) {
>  			inc_mm_counter(mm, anon_rss);
> @@ -2159,6 +2166,10 @@ retry:
>  		} else {
>  			inc_mm_counter(mm, file_rss);
>  			page_add_file_rmap(new_page);
> +			if (write_access) {
> +				dirty_page = new_page;
> +				get_page(dirty_page);
> +			}

So let's see.  We take a write fault, we mark the page dirty then we return
to userspace which will proceed with the write and will mark the pte dirty.

Later, the VM will write the page out.

Later still, the pte will get cleaned by reclaim or by munmap or whatever
and the page will be marked dirty and the page will again be written out. 
Potentially needlessly.

How much extra IO will we be doing because of this change?

>  		return 0;
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c	2006-05-08 18:49:39.000000000 +0200
> +++ linux-2.6/mm/rmap.c	2006-05-08 18:53:34.000000000 +0200
> @@ -478,6 +478,72 @@ int page_referenced(struct page *page, i
>  	return referenced;
>  }
>  
> +static int page_wrprotect_one(struct page *page, struct vm_area_struct *vma)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	unsigned long address;
> +	pte_t *pte, entry;
> +	spinlock_t *ptl;
> +	int ret = 0;
> +
> +	address = vma_address(page, vma);
> +	if (address == -EFAULT)
> +		goto out;
> +
> +	pte = page_check_address(page, mm, address, &ptl);
> +	if (!pte)
> +		goto out;
> +
> +	if (!pte_write(*pte))
> +		goto unlock;
> +
> +	entry = pte_mkclean(pte_wrprotect(*pte));
> +	ptep_establish(vma, address, pte, entry);
> +	update_mmu_cache(vma, address, entry);
> +	lazy_mmu_prot_update(entry);
> +	ret = 1;
> +
> +unlock:
> +	pte_unmap_unlock(pte, ptl);
> +out:
> +	return ret;
> +}
> +
> +static int page_wrprotect_file(struct page *page)
> +{
> +	struct address_space *mapping = page->mapping;
> +	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> +	struct vm_area_struct *vma;
> +	struct prio_tree_iter iter;
> +	int ret = 0;
> +
> +	BUG_ON(PageAnon(page));
> +
> +	spin_lock(&mapping->i_mmap_lock);
> +
> +	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> +		if (VM_SharedWritable(vma))
> +			ret += page_wrprotect_one(page, vma);
> +	}
> +
> +	spin_unlock(&mapping->i_mmap_lock);
> +	return ret;
> +}
> +
> +int page_wrprotect(struct page *page)
> +{
> +	int ret = 0;
> +
> +	BUG_ON(!PageLocked(page));

hm.  So clear_page_dirty() and clear_page_dirty_for_io() are only ever
called against a locked page?  I guess that makes sense, but it's not a
guarantee which we had in the past.  It really _has_ to be true, because
lock_page() is the only thing which can protect the address_space from
memory reclaim in those two functions.

Oh well.  We'll find out if people's machines start to go BUG.

> +	if (page_mapped(page) && page->mapping) {

umm, afaict this function can be called for swapcache pages and Bad Things
will happen.  I think we need page_mapping(page) here?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2006-05-11 15:02 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-05 20:35 [RFC][PATCH] tracking dirty pages in shared mappings Peter Zijlstra
2006-05-06 13:18 ` Nick Piggin
2006-05-06 13:34   ` Peter Zijlstra
2006-05-06 13:47     ` Nick Piggin
2006-05-06 15:29       ` Peter Zijlstra
2006-05-07  0:40         ` Nick Piggin
2006-05-07  3:43           ` Nick Piggin
2006-05-08  6:43         ` Christoph Lameter
2006-05-08  7:23           ` Peter Zijlstra
2006-05-08 19:20           ` [RFC][PATCH 1/2] tracking dirty pages in shared mappings -V3 Peter Zijlstra
2006-05-09  5:41             ` Christoph Lameter
2006-05-09  6:06               ` Peter Zijlstra
2006-05-09 20:44               ` [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4 Peter Zijlstra
2006-05-09 20:52                 ` Peter Chubb
2006-05-09 20:55                   ` Martin Bligh
2006-05-09 22:56                     ` Brian Twichell
2006-05-10  0:24                     ` Linus Torvalds
2006-05-10  0:29                       ` Nick Piggin
2006-05-10  1:24                         ` Linus Torvalds
2006-05-11 15:02                 ` Andrew Morton [this message]
2006-05-11 16:39                   ` Andy Whitcroft
2006-05-11 22:52                   ` Christoph Lameter
2006-05-11 23:30                     ` Linus Torvalds
2006-05-11 23:44                       ` Andrew Morton
2006-05-12  0:10                         ` Linus Torvalds
2006-05-12  8:07                         ` Andy Whitcroft
2006-05-12 14:25                           ` Martin J. Bligh
2006-05-14 15:58                         ` Andy Whitcroft
2006-05-12  1:51                   ` Nick Piggin
2006-05-12  4:30                     ` Andrew Morton
2006-05-12  5:05                       ` Nick Piggin
2006-05-12  7:06                       ` Peter Zijlstra
2006-05-12  8:04                         ` Nick Piggin
2006-05-12  8:52                           ` Peter Zijlstra
2006-05-12  8:07                         ` Nick Piggin
2006-05-12  4:51                   ` Christoph Lameter
2006-05-09 20:44               ` [RFC][PATCH 2/3] throttle writers of shared mappings Peter Zijlstra
2006-05-09 22:54                 ` Nick Piggin
2006-05-09 22:55                   ` Nick Piggin
2006-05-10  6:25                     ` Peter Zijlstra
2006-05-09 20:44               ` [RFC][PATCH 3/3] optimize follow_pages() Peter Zijlstra
2006-05-10  6:30                 ` Peter Zijlstra
2006-05-08 19:24           ` [RFC][PATCH 2/2] throttle writers of shared mappings Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060511080220.48688b40.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=ak@suse.de \
    --cc=andrea@suse.de \
    --cc=anton@samba.org \
    --cc=apw@shadowen.org \
    --cc=arjan@infradead.org \
    --cc=clameter@sgi.com \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=marcelo@kvack.org \
    --cc=mbligh@google.com \
    --cc=mel@csn.ul.ie \
    --cc=paulmck@us.ibm.com \
    --cc=piggin@cyberone.com.au \
    --cc=riel@redhat.com \
    --cc=rohitseth@google.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox