From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org
Subject: Re: __set_page_dirty_nobuffers superfluous check
Date: Sat, 14 Aug 2004 10:37:17 -0300 [thread overview]
Message-ID: <20040814133717.GA32755@logos.cnet> (raw)
In-Reply-To: <Pine.LNX.4.44.0408140745280.20187-100000@localhost.localdomain>
Hi Hugh!
Thanks for writing this down :)
On Sat, Aug 14, 2004 at 07:46:45AM +0100, Hugh Dickins wrote:
> On Fri, 13 Aug 2004, Marcelo Tosatti wrote:
>
> > While wandering through mm/page-writeback.c I noticed
> > __set_page_dirty_nobuffers does:
> >
> > int __set_page_dirty_nobuffers(struct page *page)
> > {
> > int ret = 0;
> >
> > if (!TestSetPageDirty(page)) {
> > struct address_space *mapping = page_mapping(page);
> >
> > if (mapping) {
> > spin_lock_irq(&mapping->tree_lock);
> > mapping = page_mapping(page);
> > if (page_mapping(page)) { /* Race with truncate? */
> > BUG_ON(page_mapping(page) != mapping); <------------------
> > if (!mapping->backing_dev_info->memory_backed)
> > inc_page_state(nr_dirty);
> > radix_tree_tag_set(&mapping->page_tree,
> > page_index(page), PAGECACHE_TAG_DIRTY);
> > }
> >
> > How could the mapping ever change if we have tree_lock?
> >
> > Its basically a check which assumes there might be
> > buggy page->mapping writers who do so without the lock, yes?
>
> Nicely observed - and four evaluations of page_mapping(page)
> within seven lines is two too many, even if optimized away.
>
> But actually your interpretation is wrong: because this has evolved
> from a sensible check on something seriously in doubt,
> to a pointless duplication of effort.
>
> It makes sense if you look at the original, in 2.6.6 or earlier:
>
> if (!TestSetPageDirty(page)) {
> struct address_space *mapping = page->mapping;
>
> if (mapping) {
> spin_lock_irq(&mapping->tree_lock);
> if (page->mapping) { /* Race with truncate? */
> BUG_ON(page->mapping != mapping);
> if (!mapping->backing_dev_info->memory_backed)
>
> What this is actually worrying about (along with truncation suddenly
> setting page->mapping to NULL, won't happen without tree_lock) is
> tmpfs swizzling page->mapping between a tmpfs struct address_space *
> and &swapper_space (move_to/from_swap_cache in swap_state.c); or
> (more distant concern) the page getting reused while we're in here,
> coming in with one page->mapping and then suddenly another.
>
> It's not doubting that tree_lock protects against that, but _which_
> tree_lock? If page->mapping suddenly changes underneath us, then
> the spin_lock_irq(&mapping->tree_lock) may have been done on the
> wrong mapping->tree_lock - to lock "mapping->tree_lock" you have
> to choose "mapping" first, but perhaps that's not stable without
> its tree_lock.
>
> In most cases, the (assumed) hold on the page in question prevents
> page->mapping from changing from one non-NULL to another non-NULL
> here, even without the tree_lock. But that's not enough to protect
> against the tmpfs swizzling: what protects against that? Er, er,
> it's the way tmpfs pages only go to swap when not in use, and are
> brought back from swap before being used, and shmem.c insists on
> page lock in each direction; but really it needs an audit of every
> use of set_page_dirty to be sure. Though I've never heard of the
> (earlier, useful) BUG_ON actually firing.
>
> And I think you'll find that in practice it's just a waste for ramfs,
> tmpfs and swap to be coming through __set_page_dirty_nobuffers at all:
> since we don't do mpage operations on the "memory_backed" filesystems,
> all the radix-tree tagging and dirty-inode operations on them are just
> a waste of time? and they'd do better to use a .set_page_dirty which
> just does SetPageDirty. But akpm does wonder from time to time whether
> to reintroduce mpage operations on at least swap, so may resist such a
> simplification.
Makes sense, why arent tmpfs/swap using mpage operations?
> Anyway, perhaps a suitable patch to make sense of that BUG_ON would be:
>
> --- 2.6.8/mm/page-writeback.c 2004-08-10 05:40:21.000000000 +0100
> +++ linux/mm/page-writeback.c 2004-08-14 07:21:58.744468256 +0100
> @@ -580,12 +580,13 @@ int __set_page_dirty_nobuffers(struct pa
>
> if (!TestSetPageDirty(page)) {
> struct address_space *mapping = page_mapping(page);
> + struct address_space *mapping2;
>
> if (mapping) {
> spin_lock_irq(&mapping->tree_lock);
> - mapping = page_mapping(page);
> - if (page_mapping(page)) { /* Race with truncate? */
> - BUG_ON(page_mapping(page) != mapping);
> + mapping2 = page_mapping(page);
> + if (mapping2) { /* Race with truncate? */
> + BUG_ON(mapping2 != mapping);
> if (!mapping->backing_dev_info->memory_backed)
> inc_page_state(nr_dirty);
> radix_tree_tag_set(&mapping->page_tree,
I see.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2004-08-14 13:37 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-08-13 18:05 Marcelo Tosatti
2004-08-14 6:46 ` Hugh Dickins
2004-08-14 13:37 ` Marcelo Tosatti [this message]
2004-08-16 12:37 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20040814133717.GA32755@logos.cnet \
--to=marcelo.tosatti@cyclades.com \
--cc=hugh@veritas.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox