* __set_page_dirty_nobuffers superfluous check
@ 2004-08-13 18:05 Marcelo Tosatti
2004-08-14 6:46 ` Hugh Dickins
0 siblings, 1 reply; 4+ messages in thread
From: Marcelo Tosatti @ 2004-08-13 18:05 UTC (permalink / raw)
To: linux-mm
Hi,
While wandering through mm/page-writeback.c I noticed
__set_page_dirty_nobuffers does:
int __set_page_dirty_nobuffers(struct page *page)
{
int ret = 0;
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
if (mapping) {
spin_lock_irq(&mapping->tree_lock);
mapping = page_mapping(page);
if (page_mapping(page)) { /* Race with truncate? */
BUG_ON(page_mapping(page) != mapping); <------------------
if (!mapping->backing_dev_info->memory_backed)
inc_page_state(nr_dirty);
radix_tree_tag_set(&mapping->page_tree,
page_index(page), PAGECACHE_TAG_DIRTY);
}
How could the mapping ever change if we have tree_lock?
Its basically a check which assumes there might be
buggy page->mapping writers who do so without the lock, yes?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: __set_page_dirty_nobuffers superfluous check 2004-08-13 18:05 __set_page_dirty_nobuffers superfluous check Marcelo Tosatti @ 2004-08-14 6:46 ` Hugh Dickins 2004-08-14 13:37 ` Marcelo Tosatti 0 siblings, 1 reply; 4+ messages in thread From: Hugh Dickins @ 2004-08-14 6:46 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: linux-mm On Fri, 13 Aug 2004, Marcelo Tosatti wrote: > While wandering through mm/page-writeback.c I noticed > __set_page_dirty_nobuffers does: > > int __set_page_dirty_nobuffers(struct page *page) > { > int ret = 0; > > if (!TestSetPageDirty(page)) { > struct address_space *mapping = page_mapping(page); > > if (mapping) { > spin_lock_irq(&mapping->tree_lock); > mapping = page_mapping(page); > if (page_mapping(page)) { /* Race with truncate? */ > BUG_ON(page_mapping(page) != mapping); <------------------ > if (!mapping->backing_dev_info->memory_backed) > inc_page_state(nr_dirty); > radix_tree_tag_set(&mapping->page_tree, > page_index(page), PAGECACHE_TAG_DIRTY); > } > > How could the mapping ever change if we have tree_lock? > > Its basically a check which assumes there might be > buggy page->mapping writers who do so without the lock, yes? Nicely observed - and four evaluations of page_mapping(page) within seven lines is two too many, even if optimized away. But actually your interpretation is wrong: because this has evolved from a sensible check on something seriously in doubt, to a pointless duplication of effort. It makes sense if you look at the original, in 2.6.6 or earlier: if (!TestSetPageDirty(page)) { struct address_space *mapping = page->mapping; if (mapping) { spin_lock_irq(&mapping->tree_lock); if (page->mapping) { /* Race with truncate? */ BUG_ON(page->mapping != mapping); if (!mapping->backing_dev_info->memory_backed) What this is actually worrying about (along with truncation suddenly setting page->mapping to NULL, won't happen without tree_lock) is tmpfs swizzling page->mapping between a tmpfs struct address_space * and &swapper_space (move_to/from_swap_cache in swap_state.c); or (more distant concern) the page getting reused while we're in here, coming in with one page->mapping and then suddenly another. It's not doubting that tree_lock protects against that, but _which_ tree_lock? If page->mapping suddenly changes underneath us, then the spin_lock_irq(&mapping->tree_lock) may have been done on the wrong mapping->tree_lock - to lock "mapping->tree_lock" you have to choose "mapping" first, but perhaps that's not stable without its tree_lock. In most cases, the (assumed) hold on the page in question prevents page->mapping from changing from one non-NULL to another non-NULL here, even without the tree_lock. But that's not enough to protect against the tmpfs swizzling: what protects against that? Er, er, it's the way tmpfs pages only go to swap when not in use, and are brought back from swap before being used, and shmem.c insists on page lock in each direction; but really it needs an audit of every use of set_page_dirty to be sure. Though I've never heard of the (earlier, useful) BUG_ON actually firing. And I think you'll find that in practice it's just a waste for ramfs, tmpfs and swap to be coming through __set_page_dirty_nobuffers at all: since we don't do mpage operations on the "memory_backed" filesystems, all the radix-tree tagging and dirty-inode operations on them are just a waste of time? and they'd do better to use a .set_page_dirty which just does SetPageDirty. But akpm does wonder from time to time whether to reintroduce mpage operations on at least swap, so may resist such a simplification. Anyway, perhaps a suitable patch to make sense of that BUG_ON would be: --- 2.6.8/mm/page-writeback.c 2004-08-10 05:40:21.000000000 +0100 +++ linux/mm/page-writeback.c 2004-08-14 07:21:58.744468256 +0100 @@ -580,12 +580,13 @@ int __set_page_dirty_nobuffers(struct pa if (!TestSetPageDirty(page)) { struct address_space *mapping = page_mapping(page); + struct address_space *mapping2; if (mapping) { spin_lock_irq(&mapping->tree_lock); - mapping = page_mapping(page); - if (page_mapping(page)) { /* Race with truncate? */ - BUG_ON(page_mapping(page) != mapping); + mapping2 = page_mapping(page); + if (mapping2) { /* Race with truncate? */ + BUG_ON(mapping2 != mapping); if (!mapping->backing_dev_info->memory_backed) inc_page_state(nr_dirty); radix_tree_tag_set(&mapping->page_tree, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: __set_page_dirty_nobuffers superfluous check 2004-08-14 6:46 ` Hugh Dickins @ 2004-08-14 13:37 ` Marcelo Tosatti 2004-08-16 12:37 ` Hugh Dickins 0 siblings, 1 reply; 4+ messages in thread From: Marcelo Tosatti @ 2004-08-14 13:37 UTC (permalink / raw) To: Hugh Dickins; +Cc: linux-mm Hi Hugh! Thanks for writing this down :) On Sat, Aug 14, 2004 at 07:46:45AM +0100, Hugh Dickins wrote: > On Fri, 13 Aug 2004, Marcelo Tosatti wrote: > > > While wandering through mm/page-writeback.c I noticed > > __set_page_dirty_nobuffers does: > > > > int __set_page_dirty_nobuffers(struct page *page) > > { > > int ret = 0; > > > > if (!TestSetPageDirty(page)) { > > struct address_space *mapping = page_mapping(page); > > > > if (mapping) { > > spin_lock_irq(&mapping->tree_lock); > > mapping = page_mapping(page); > > if (page_mapping(page)) { /* Race with truncate? */ > > BUG_ON(page_mapping(page) != mapping); <------------------ > > if (!mapping->backing_dev_info->memory_backed) > > inc_page_state(nr_dirty); > > radix_tree_tag_set(&mapping->page_tree, > > page_index(page), PAGECACHE_TAG_DIRTY); > > } > > > > How could the mapping ever change if we have tree_lock? > > > > Its basically a check which assumes there might be > > buggy page->mapping writers who do so without the lock, yes? > > Nicely observed - and four evaluations of page_mapping(page) > within seven lines is two too many, even if optimized away. > > But actually your interpretation is wrong: because this has evolved > from a sensible check on something seriously in doubt, > to a pointless duplication of effort. > > It makes sense if you look at the original, in 2.6.6 or earlier: > > if (!TestSetPageDirty(page)) { > struct address_space *mapping = page->mapping; > > if (mapping) { > spin_lock_irq(&mapping->tree_lock); > if (page->mapping) { /* Race with truncate? */ > BUG_ON(page->mapping != mapping); > if (!mapping->backing_dev_info->memory_backed) > > What this is actually worrying about (along with truncation suddenly > setting page->mapping to NULL, won't happen without tree_lock) is > tmpfs swizzling page->mapping between a tmpfs struct address_space * > and &swapper_space (move_to/from_swap_cache in swap_state.c); or > (more distant concern) the page getting reused while we're in here, > coming in with one page->mapping and then suddenly another. > > It's not doubting that tree_lock protects against that, but _which_ > tree_lock? If page->mapping suddenly changes underneath us, then > the spin_lock_irq(&mapping->tree_lock) may have been done on the > wrong mapping->tree_lock - to lock "mapping->tree_lock" you have > to choose "mapping" first, but perhaps that's not stable without > its tree_lock. > > In most cases, the (assumed) hold on the page in question prevents > page->mapping from changing from one non-NULL to another non-NULL > here, even without the tree_lock. But that's not enough to protect > against the tmpfs swizzling: what protects against that? Er, er, > it's the way tmpfs pages only go to swap when not in use, and are > brought back from swap before being used, and shmem.c insists on > page lock in each direction; but really it needs an audit of every > use of set_page_dirty to be sure. Though I've never heard of the > (earlier, useful) BUG_ON actually firing. > > And I think you'll find that in practice it's just a waste for ramfs, > tmpfs and swap to be coming through __set_page_dirty_nobuffers at all: > since we don't do mpage operations on the "memory_backed" filesystems, > all the radix-tree tagging and dirty-inode operations on them are just > a waste of time? and they'd do better to use a .set_page_dirty which > just does SetPageDirty. But akpm does wonder from time to time whether > to reintroduce mpage operations on at least swap, so may resist such a > simplification. Makes sense, why arent tmpfs/swap using mpage operations? > Anyway, perhaps a suitable patch to make sense of that BUG_ON would be: > > --- 2.6.8/mm/page-writeback.c 2004-08-10 05:40:21.000000000 +0100 > +++ linux/mm/page-writeback.c 2004-08-14 07:21:58.744468256 +0100 > @@ -580,12 +580,13 @@ int __set_page_dirty_nobuffers(struct pa > > if (!TestSetPageDirty(page)) { > struct address_space *mapping = page_mapping(page); > + struct address_space *mapping2; > > if (mapping) { > spin_lock_irq(&mapping->tree_lock); > - mapping = page_mapping(page); > - if (page_mapping(page)) { /* Race with truncate? */ > - BUG_ON(page_mapping(page) != mapping); > + mapping2 = page_mapping(page); > + if (mapping2) { /* Race with truncate? */ > + BUG_ON(mapping2 != mapping); > if (!mapping->backing_dev_info->memory_backed) > inc_page_state(nr_dirty); > radix_tree_tag_set(&mapping->page_tree, I see. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: __set_page_dirty_nobuffers superfluous check 2004-08-14 13:37 ` Marcelo Tosatti @ 2004-08-16 12:37 ` Hugh Dickins 0 siblings, 0 replies; 4+ messages in thread From: Hugh Dickins @ 2004-08-16 12:37 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: linux-mm On Sat, 14 Aug 2004, Marcelo Tosatti wrote: > > Makes sense, why arent tmpfs/swap using mpage operations? They don't fit together usefully. Because the multipage operations are designed to help disk-based filesystems, gathering together readaheads and writeouts to reduce disk seeking; but tmpfs and swap are cases too special. writepages is important for guaranteeing data to disk efficiently; but tmpfs and swap don't need any such guarantee, sync'ing them is just a waste of effort (and so they're marked as "memory_backed" to avoid it). swap already had its own swapin_readahead (of limited value: swap locality is much less significant than file locality), not much point in trying to convert that over to readpages. tmpfs is mainly in memory, does overflow to swap and thence to disk, but the rules of that exchange are too peculiar to use general routines. When he originated the readpages and writepages operations, akpm did start off calling writepages from vmscan.c. I don't remember just why he dropped that in the end (certainly tmpfs had to suppress it, I forget how it fared with swap), perhaps just too much complication for too little gain. Nowadays, if vmscan is doing too many little file writeouts, the answer is usually to tweak thresholds to kick in pdflush earlier to do the more efficient writepages, rather than try to shoehorn writepages back into vmscan. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-08-16 12:37 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-08-13 18:05 __set_page_dirty_nobuffers superfluous check Marcelo Tosatti 2004-08-14 6:46 ` Hugh Dickins 2004-08-14 13:37 ` Marcelo Tosatti 2004-08-16 12:37 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox