linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@osdl.org>,
	David Howells <dhowells@redhat.com>,
	Christoph Lameter <christoph@lameter.com>,
	Martin Bligh <mbligh@google.com>, Nick Piggin <npiggin@suse.de>,
	Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH] mm: tracking shared dirty pages -v10
Date: Sat, 24 Jun 2006 00:00:17 +0200	[thread overview]
Message-ID: <1151100017.30819.50.camel@lappy> (raw)
In-Reply-To: <Pine.LNX.4.64.0606231933060.7524@blonde.wat.veritas.com>

On Fri, 2006-06-23 at 20:06 +0100, Hugh Dickins wrote:

> > -     if (unlikely((vma->vm_flags & (VM_SHARED|VM_WRITE)) ==
> > -                             (VM_SHARED|VM_WRITE))) {
> > +     /*
> > +      * Only catch write-faults on shared writable pages, read-only
> > +      * shared pages can get COWed by get_user_pages(.write=1, .force=1).
> > +      */
> > +     if (unlikely(((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> > +                                     (VM_WRITE|VM_SHARED)))) {
> 
> I was about to say I'm satisfied with that test now, and it's all the
> better for not mentioning VM_PFNMAP there - though odd that you've
> exchanged WRITE and SHARED, and added extra unnecessary parentheses ;)

Oops, on the extra parentheses. I exchanged the order because that is
more consistent with all the other sites.

> But grrr, sigh, damn, argh - I now realize it's right to the first
> order (normal case) and to the second order (ptrace poke), but not
> to the third order (ptrace poke anon page here to be COWed -
> perhaps can't occur without intervening mprotects).
> 
> That's not an issue for you at all (there are other places which
> are inconsistent on whether such pages are private or shared e.g.
> copy_one_pte does not wrprotect them), but could be a problem for
> David's page_mkwrite - there's a danger of passing it an anonymous
> page, which (depending on what the ->page_mkwrite actually does)
> could go seriously wrong.
> 
> I guess it ought to be restructured
>         if (PageAnon(old_page)) {
>                 ...
>         } else if (shared writable vma) {
>                 ...
>         }
> and a patch to do that should precede your dirty page patches
> (and the only change your dirty page patches need add here on top
> of that would be the dirty_page = old_page, get_page(dirty_page)).

Hmm, I'll investigate this and maybe write a follow-up patch if you or
someone else doesn't beaten me to it.

> (You've probably noticed that use of mprotect on problem vmas
> is liable to remove the cache bits.  There have been occasional
> complaints about that, and perhaps a patch to fix it.  But that's
> long-standing behaviour, so not something you need worry about:
> we've only had to worry about a new change in behaviour in mmap.)

Yeah, had noticed, but had assumed that since it was a user action
they'd know better than to call it on dubious mappings.

However if you say there is demand; I've been through all the arch's to
verify the use of that most ugly drm construct, and whipping up an arch
function like pgprot_modify() that would fold a pgprot_t and
protection_map[] argument together seems quite doable.

> > Page from remap_pfn_range() are explicitly excluded because their
> > COW semantics are already horrid enough (see vm_normal_page() in
> > do_wp_page()) and because they don't have a backing store anyway.
> > 
> > copy_one_pte() cleans child PTEs, not wrong in the current context,
> > but why?
> 
> Dates back to Linux 2.2 if not earlier.  I think the idea is that the
> parent has its pte dirty, and that's enough to mark the page as dirty;
> the child is unlikely to dirty the page further itself, and a second
> instance of dirty pte for that page may entail some overhead.  What
> overhead would vary from release to release, I've not thought through
> what the current saving would be in 2.6.17, probably very little.
> But you can imagine there might have been some release which would
> write out the page each time it found pte dirty, in which case good
> reason to clear it there.  But I'm guessing, it's from before my time.

I'll keep that in mind for a future cleanup.

> > +	if ((pgprot_val(vma->vm_page_prot) == pgprot_val(vm_page_prot) &&
> > +	     ((vm_flags & (VM_WRITE|VM_SHARED|VM_PFNMAP|VM_INSERTPAGE)) ==
> > +			  (VM_WRITE|VM_SHARED)) &&
> > +	     vma->vm_file && vma->vm_file->f_mapping &&
> > +	     mapping_cap_account_dirty(vma->vm_file->f_mapping)) ||
> > +	    (vma->vm_ops && vma->vm_ops->page_mkwrite))
> > +		vma->vm_page_prot =
> > +			protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC)];
> > +
> 
> I'm dazzled by the beauty of it!

It's a real beauty isn't it :-)

> Given the change you've made to
> mapping_cap_account_dirty, you don't have to check vma->vm_file->f_mapping
> there; but other than that, seems right to me.  May offend other tastes.

I meant to remove the mapping_cap_account_dirty() changes, so as not to
burden all call sites with the extra conditional. This one extra check
in here does not make the difference between beauty and beast.

> > Index: 2.6-mm/mm/rmap.c
> > ===================================================================
> > --- 2.6-mm.orig/mm/rmap.c	2006-06-22 17:59:07.000000000 +0200
> > +++ 2.6-mm/mm/rmap.c	2006-06-23 01:14:39.000000000 +0200
> > +static int page_mkclean_file(struct address_space *mapping, struct page *page)
> > +{
> > +	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > +	struct vm_area_struct *vma;
> > +	struct prio_tree_iter iter;
> > +	int ret = 0;
> > +
> > +	BUG_ON(PageAnon(page));
> > +
> > +	spin_lock(&mapping->i_mmap_lock);
> > +	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> > +		int protect = mapping_cap_account_dirty(mapping) &&
> > +	((vma->vm_flags & (VM_WRITE|VM_SHARED|VM_PFNMAP|VM_INSERTPAGE)) ==
> > +	 		  (VM_WRITE|VM_SHARED));
> 
> I think you can forget about testing VM_PFNMAP|VM_INSERTPAGE here,
> or else	BUG_ON or WARN_ON(vma->vm_flags & (VM_PFNMAP|VM_INSERTPAGE)):
> those just shouldn't arrive here.

Done

> But I'm still puzzled, why did you move page_mkclean out from under
> the mapping_cap_account_dirty tests in page-writeback.c: no explanation
> in your change history.  Doesn't that just add some unnecessary work,
> and this "protect" business?

You're right, added back under the wing of mapping_cap_account_dirty().

> If you're thinking ahead to other changes you want to make, you'd
> do better to add these changes then, and for now just work on the
> VM_SHARED vmas.

But I'll leave page_mkclean() as callable without, that way its more
true to its name.

Peter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-06-23 22:00 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-19 17:52 [PATCH 0/6] mm: tracking dirty pages -v9 Peter Zijlstra
2006-06-19 17:52 ` [PATCH 1/6] mm: tracking shared dirty pages Peter Zijlstra
2006-06-22  5:56   ` Andrew Morton
2006-06-22  6:07     ` Christoph Lameter
2006-06-22  6:15       ` Andrew Morton
2006-06-22 11:33     ` Peter Zijlstra
2006-06-22 13:17       ` Hugh Dickins
2006-06-22 20:52   ` Hugh Dickins
2006-06-22 23:02     ` Peter Zijlstra
2006-06-22 23:39     ` [PATCH] mm: tracking shared dirty pages -v10 Peter Zijlstra
2006-06-23  3:10       ` Jeff Dike
2006-06-23  3:31         ` Andrew Morton
2006-06-23  3:50           ` Jeff Dike
2006-06-23  4:01           ` H. Peter Anvin
2006-06-23 15:08             ` Jeff Dike
2006-06-23  6:08       ` Linus Torvalds
2006-06-23  7:27         ` Hugh Dickins
2006-06-23 17:00           ` Christoph Lameter
2006-06-23 17:22             ` Peter Zijlstra
2006-06-23 17:52               ` Christoph Lameter
2006-06-23 18:11                 ` Martin Bligh
2006-06-23 18:20                   ` Linus Torvalds
2006-06-23 17:56               ` Linus Torvalds
2006-06-23 18:03                 ` Peter Zijlstra
2006-06-23 18:23                   ` Christoph Lameter
2006-06-23 18:41                 ` Christoph Hellwig
2006-06-23 17:49           ` Linus Torvalds
2006-06-23 18:05             ` Arjan van de Ven
2006-06-23 18:08             ` Miklos Szeredi
2006-06-23 19:06       ` Hugh Dickins
2006-06-23 22:00         ` Peter Zijlstra [this message]
2006-06-23 22:35           ` Linus Torvalds
2006-06-23 22:44             ` Peter Zijlstra
2006-06-28 14:58         ` [RFC][PATCH] mm: fixup do_wp_page() Peter Zijlstra
2006-06-28 18:20           ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 2/6] mm: balance dirty pages Peter Zijlstra
2006-06-19 17:53 ` [PATCH 3/6] mm: msync() cleanup Peter Zijlstra
2006-06-22 17:02   ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 4/6] mm: optimize the new mprotect() code a bit Peter Zijlstra
2006-06-22 17:21   ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 5/6] mm: small cleanup of install_page() Peter Zijlstra
2006-06-19 17:53 ` [PATCH 6/6] mm: remove some update_mmu_cache() calls Peter Zijlstra
2006-06-22 16:29   ` Hugh Dickins
2006-06-22 16:37     ` Christoph Lameter
2006-06-22 17:35       ` Hugh Dickins
2006-06-22 18:31         ` Christoph Lameter
2006-06-23 18:27 [PATCH] mm: tracking shared dirty pages -v10 Brian D. McGrew

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1151100017.30819.50.camel@lappy \
    --to=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=christoph@lameter.com \
    --cc=dhowells@redhat.com \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mbligh@google.com \
    --cc=npiggin@suse.de \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox