linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hugh@veritas.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Andrea Arcangeli <andrea@suse.de>,
	a.p.zijlstra@chello.nl, torvalds@osdl.org, dhowells@redhat.com,
	linux-mm@kvack.org
Subject: Re: tracking dirty pages patches
Date: Tue, 23 May 2006 15:55:05 +0100 (BST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0605231454440.3700@blonde.wat.veritas.com> (raw)
In-Reply-To: <20060522132905.6e1a711c.akpm@osdl.org>

On Mon, 22 May 2006, Andrew Morton wrote:
> Hugh Dickins <hugh@veritas.com> wrote:
> > 
> > set_page_dirty has always (awkwardly) been liable to be called from
> > very low in the hierarchy; whereas you're assuming clear_page_dirty
> > is called from much higher up.  And in most cases there's no problem
> > (please cross-check to confirm that); but try_to_free_buffers in fs/
> > buffer.c calls it while holding mapping->private_lock - page_wrprotect
> > called from test_clear_page_dirty then violates the order.
> > 
> > If we're lucky and that is indeed the only violation, maybe Andrew
> > can recommend a change to try_to_free_buffers to avoid it: I have
> > no appreciation of the issues at that end myself.
> 
> I had troubles with that as well - tree_lock is a very "inner" lock.  So I
> moved test_clear_page_dirty()'s call to page_wrprotect() to be outside
> tree_lock.

Ah.  I've only been looking at versions after that good change.

> But I don't think you were referring to that

Correct.

>  - I am unable to evaluate your expression "the order".

Sorry.  The order shown in rmap.c goes:

 * mm->mmap_sem
 *   page->flags PG_locked (lock_page)
 *     mapping->i_mmap_lock
 *       anon_vma->lock
 *         mm->page_table_lock or pte_lock
 *           zone->lru_lock (in mark_page_accessed, isolate_lru_page)
 *           swap_lock (in swap_duplicate, swap_info_get)
 *             mmlist_lock (in mmput, drain_mmlist and others)
 *             mapping->private_lock (in __set_page_dirty_buffers)
 *             inode_lock (in set_page_dirty's __mark_inode_dirty)
 *               sb_lock (within inode_lock in fs/fs-writeback.c)
 *               mapping->tree_lock (widely used, in set_page_dirty,
 *                         in arch-dependent flush_dcache_mmap_lock,
 *                         within inode_lock in __sync_single_inode)

> The running of page_wrprotect_file() inside private_lock is a worry, yes. 
> We can move the clear_page_dirty() call in try_to_free_buffers() to be
> outside private_lock.

That would be great (for Peter) if you know it to be safe.

> But I don't know which particular ranking violation you've identified.

page_wrprotect and callees take i_mmap_lock and pte_lock,
neither of which should be taken while private_lock is held.

> > ...
> >
> > (Why does follow_page set_page_dirty at all?  I _think_ it's in case
> > the get_user_pages caller forgets to set_page_dirty when releasing.
> > But that's not how we usually write kernel code, to hide mistakes most
> > of the time,
> 
> Yes, that would be bad.

It would, but you've shown I was wrong in thinking that.

> > and your mods may change the balance there.  Andrew will
> > remember better whether that set_page_dirty has stronger justification.)
> 
> It was added by the below, which nobody was terribly happy with at the
> time.  (Took me 5-10 minutes to hunt this down.  Insert rote comment about
> comments).

Thanks so much for hunting it down, I should have done so myself.

I can well understand s390 looping on its always 0 pte_dirty test,
but very much agree with Nick: the real question then becomes,
what was the point of ever testing for pte_dirty there?
and why don't we just remove the whole
		if ((flags & FOLL_WRITE) &&
		    !pte_dirty(pte) && !PageDirty(page))
			set_page_dirty(page);
from follow_page?

That pte_dirty test first came in 2.4.4-final from
 - Andrea Arkangeli: raw-io fixes
I've just spent some frustrating hours thinking I'd be able to
explain why it was necessary in the old 2.4 context, but actually
I can't.  Vague memory of modifications vanishing under pressure
because of race when pte_dirty was not set; yet the page count is
raised without dropping page_table_lock, and mark_dirty_kiobuf was
introduced in the same patch, to set dirty before releasing.  I've
CC'ed Andrea, but it's a little (a lot!) unfair to expect him to
explain it now.

I do believe that dirty check now should be redundant.  But I
said "should be" not "is" because I still haven't fixed drivers/scsi
sg.c and st.c to use set_page_dirty_lock in place of SetPageDirty.

Last time I was preparing that patch, I got distracted by the over-
whelming feeling that it should be easier to do at interrupt time
(as sg.c needs), but failed to find a satisfying way.  I'd better
revisit that before trying to cut set_page_dirty from follow_page.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2006-05-23 14:55 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-22 19:31 Hugh Dickins
2006-05-22 20:29 ` Andrew Morton
2006-05-23  8:17   ` Nick Piggin
2006-05-23 14:55   ` Hugh Dickins [this message]
2006-05-23 16:24 ` Christoph Lameter
2006-05-23 19:21   ` Hugh Dickins
2006-05-23 19:31     ` Christoph Lameter
2006-05-23 20:34       ` Hugh Dickins
2006-05-23 21:16         ` Christoph Lameter
2006-05-23 21:17         ` Chen, Kenneth W
2006-05-23 21:40           ` update_mmu_cache vs. lazy_mmu_prot_update Christoph Lameter
2006-05-24 14:12             ` Hugh Dickins
2006-05-23 22:28         ` remove VM_LOCKED before remap_pfn_range and drop VM_SHM Christoph Lameter
2006-05-24 14:57           ` Hugh Dickins
2006-05-24  2:25         ` tracking dirty pages patches Arjan van de Ven
2006-05-24 15:10           ` Hugh Dickins
2006-05-25  2:26             ` Arjan van de Ven
2006-05-23 16:41 ` David Howells
2006-05-23 23:07 ` Peter Zijlstra
2006-05-24 14:20   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0605231454440.3700@blonde.wat.veritas.com \
    --to=hugh@veritas.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=andrea@suse.de \
    --cc=dhowells@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox