linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org
Subject: Re: __set_page_dirty_nobuffers superfluous check
Date: Sat, 14 Aug 2004 10:37:17 -0300	[thread overview]
Message-ID: <20040814133717.GA32755@logos.cnet> (raw)
In-Reply-To: <Pine.LNX.4.44.0408140745280.20187-100000@localhost.localdomain>

Hi Hugh!

Thanks for writing this down :)

On Sat, Aug 14, 2004 at 07:46:45AM +0100, Hugh Dickins wrote:
> On Fri, 13 Aug 2004, Marcelo Tosatti wrote:
> 
> > While wandering through mm/page-writeback.c I noticed
> > __set_page_dirty_nobuffers does:
> > 
> > int __set_page_dirty_nobuffers(struct page *page)
> > {
> >         int ret = 0;
> >                                                                                          
> >         if (!TestSetPageDirty(page)) {
> >                 struct address_space *mapping = page_mapping(page);
> >                                                                                          
> >                 if (mapping) {
> >                         spin_lock_irq(&mapping->tree_lock);
> >                         mapping = page_mapping(page);
> >                         if (page_mapping(page)) { /* Race with truncate? */
> >                                 BUG_ON(page_mapping(page) != mapping);    <------------------
> >                                 if (!mapping->backing_dev_info->memory_backed)
> >                                         inc_page_state(nr_dirty);
> >                                 radix_tree_tag_set(&mapping->page_tree,
> >                                         page_index(page), PAGECACHE_TAG_DIRTY);
> >                         }
> > 
> > How could the mapping ever change if we have tree_lock?
> > 
> > Its basically a check which assumes there might be 
> > buggy page->mapping writers who do so without the lock, yes?
> 
> Nicely observed - and four evaluations of page_mapping(page)
> within seven lines is two too many, even if optimized away.
> 
> But actually your interpretation is wrong: because this has evolved
> from a sensible check on something seriously in doubt,
> to a pointless duplication of effort.
> 
> It makes sense if you look at the original, in 2.6.6 or earlier:
> 
> 	if (!TestSetPageDirty(page)) {
> 		struct address_space *mapping = page->mapping;
> 
> 		if (mapping) {
> 			spin_lock_irq(&mapping->tree_lock);
> 			if (page->mapping) {	/* Race with truncate? */
> 				BUG_ON(page->mapping != mapping);
> 				if (!mapping->backing_dev_info->memory_backed)
> 
> What this is actually worrying about (along with truncation suddenly
> setting page->mapping to NULL, won't happen without tree_lock) is
> tmpfs swizzling page->mapping between a tmpfs struct address_space *
> and &swapper_space (move_to/from_swap_cache in swap_state.c); or
> (more distant concern) the page getting reused while we're in here,
> coming in with one page->mapping and then suddenly another.
> 
> It's not doubting that tree_lock protects against that, but _which_
> tree_lock?  If page->mapping suddenly changes underneath us, then
> the spin_lock_irq(&mapping->tree_lock) may have been done on the
> wrong mapping->tree_lock - to lock "mapping->tree_lock" you have
> to choose "mapping" first, but perhaps that's not stable without
> its tree_lock.
> 
> In most cases, the (assumed) hold on the page in question prevents
> page->mapping from changing from one non-NULL to another non-NULL
> here, even without the tree_lock.  But that's not enough to protect
> against the tmpfs swizzling: what protects against that?  Er, er,
> it's the way tmpfs pages only go to swap when not in use, and are
> brought back from swap before being used, and shmem.c insists on
> page lock in each direction; but really it needs an audit of every
> use of set_page_dirty to be sure.  Though I've never heard of the
> (earlier, useful) BUG_ON actually firing.
> 
> And I think you'll find that in practice it's just a waste for ramfs,
> tmpfs and swap to be coming through __set_page_dirty_nobuffers at all:
> since we don't do mpage operations on the "memory_backed" filesystems,
> all the radix-tree tagging and dirty-inode operations on them are just
> a waste of time?  and they'd do better to use a .set_page_dirty which
> just does SetPageDirty.  But akpm does wonder from time to time whether
> to reintroduce mpage operations on at least swap, so may resist such a
> simplification.

Makes sense, why arent tmpfs/swap using mpage operations? 

> Anyway, perhaps a suitable patch to make sense of that BUG_ON would be:
> 
> --- 2.6.8/mm/page-writeback.c	2004-08-10 05:40:21.000000000 +0100
> +++ linux/mm/page-writeback.c	2004-08-14 07:21:58.744468256 +0100
> @@ -580,12 +580,13 @@ int __set_page_dirty_nobuffers(struct pa
>  
>  	if (!TestSetPageDirty(page)) {
>  		struct address_space *mapping = page_mapping(page);
> +		struct address_space *mapping2;
>  
>  		if (mapping) {
>  			spin_lock_irq(&mapping->tree_lock);
> -			mapping = page_mapping(page);
> -			if (page_mapping(page)) { /* Race with truncate? */
> -				BUG_ON(page_mapping(page) != mapping);
> +			mapping2 = page_mapping(page);
> +			if (mapping2) { /* Race with truncate? */
> +				BUG_ON(mapping2 != mapping);
>  				if (!mapping->backing_dev_info->memory_backed)
>  					inc_page_state(nr_dirty);
>  				radix_tree_tag_set(&mapping->page_tree,

I see.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2004-08-14 13:37 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-13 18:05 Marcelo Tosatti
2004-08-14  6:46 ` Hugh Dickins
2004-08-14 13:37   ` Marcelo Tosatti [this message]
2004-08-16 12:37     ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040814133717.GA32755@logos.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox