linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Don't set/test/wait-for radix tree tags if no capability
@ 2006-09-13 19:35 Lee Schermerhorn
  2006-09-13 20:51 ` Christoph Lameter
  0 siblings, 1 reply; 6+ messages in thread
From: Lee Schermerhorn @ 2006-09-13 19:35 UTC (permalink / raw)
  To: linux-mm

While debugging a problem [in the out-of-tree migration cache], I
noticed a lot of radix-tree tag activity for address spaces that have
the BDI_CAP_NO_{ACCT_DIRTY|WRITEBACK} capability flags set--effectively
disabling these capabilities--in their backing device.  Altho'
functionally benign, I believe that this unnecessary overhead.  Seeking
contrary opinions.

Would this patch, if correct, be worthwhile?  Seems not to cause any
problem, and does eliminate the tag handling for swap space and similar
address spaces.

---

page-writeback functions are setting/testing radix-tree tags for
mappings whose backing device has the corresponding "capabilities"
turned off via the BDI_CAP_NO_{ACCT_DIRTY|WRITEBACK} flags.

This patch makes all setting/getting/waiting-on the radix-tree
tags conditional on the corresponding mapping capability, 
eliminating a lot of unnecessary radix tree traversal--e.g.,
for swap cache.

Note:  perhaps all of the tags should be conditional on the 
'_WRITEBACK capability alone, because if the mapping doesn't
support writeback, there is no need to track dirty pages in the
cache, whether or not dirty page accounting is enabled.  However,
currently, all mappings that have one of the capability flags set
also have the other flag set.  Might not always be the case, tho.

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/filemap.c        |    2 +-
 mm/page-writeback.c |   21 ++++++++++++---------
 2 files changed, 13 insertions(+), 10 deletions(-)

Index: linux-2.6.18-rc6-mm2/mm/page-writeback.c
===================================================================
--- linux-2.6.18-rc6-mm2.orig/mm/page-writeback.c	2006-09-13 11:46:50.000000000 -0400
+++ linux-2.6.18-rc6-mm2/mm/page-writeback.c	2006-09-13 14:44:57.000000000 -0400
@@ -766,11 +766,13 @@ int __set_page_dirty_nobuffers(struct pa
 			mapping2 = page_mapping(page);
 			if (mapping2) { /* Race with truncate? */
 				BUG_ON(mapping2 != mapping);
-				if (mapping_cap_account_dirty(mapping))
+				if (mapping_cap_account_dirty(mapping)) {
 					__inc_zone_page_state(page,
 								NR_FILE_DIRTY);
-				radix_tree_tag_set(&mapping->page_tree,
-					page_index(page), PAGECACHE_TAG_DIRTY);
+					radix_tree_tag_set(&mapping->page_tree,
+							page_index(page),
+							PAGECACHE_TAG_DIRTY);
+				}
 			}
 			write_unlock_irq(&mapping->tree_lock);
 			if (mapping->host) {
@@ -855,9 +857,10 @@ int test_clear_page_dirty(struct page *p
 	if (mapping) {
 		write_lock_irqsave(&mapping->tree_lock, flags);
 		if (TestClearPageDirty(page)) {
-			radix_tree_tag_clear(&mapping->page_tree,
-						page_index(page),
-						PAGECACHE_TAG_DIRTY);
+			if (mapping_cap_account_dirty(mapping))
+				radix_tree_tag_clear(&mapping->page_tree,
+							page_index(page),
+							PAGECACHE_TAG_DIRTY);
 			write_unlock_irqrestore(&mapping->tree_lock, flags);
 			/*
 			 * We can continue to use `mapping' here because the
@@ -919,7 +922,7 @@ int test_clear_page_writeback(struct pag
 
 		write_lock_irqsave(&mapping->tree_lock, flags);
 		ret = TestClearPageWriteback(page);
-		if (ret)
+		if (ret && mapping_cap_writeback_dirty(mapping))
 			radix_tree_tag_clear(&mapping->page_tree,
 						page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
@@ -940,11 +943,11 @@ int test_set_page_writeback(struct page 
 
 		write_lock_irqsave(&mapping->tree_lock, flags);
 		ret = TestSetPageWriteback(page);
-		if (!ret)
+		if (!ret && mapping_cap_writeback_dirty(mapping))
 			radix_tree_tag_set(&mapping->page_tree,
 						page_index(page),
 						PAGECACHE_TAG_WRITEBACK);
-		if (!PageDirty(page))
+		if (!PageDirty(page) && mapping_cap_account_dirty(mapping))
 			radix_tree_tag_clear(&mapping->page_tree,
 						page_index(page),
 						PAGECACHE_TAG_DIRTY);
Index: linux-2.6.18-rc6-mm2/mm/filemap.c
===================================================================
--- linux-2.6.18-rc6-mm2.orig/mm/filemap.c	2006-09-13 14:44:44.000000000 -0400
+++ linux-2.6.18-rc6-mm2/mm/filemap.c	2006-09-13 14:44:57.000000000 -0400
@@ -258,7 +258,7 @@ int wait_on_page_writeback_range(struct 
 	int ret = 0;
 	pgoff_t index;
 
-	if (end < start)
+	if (end < start || !mapping_cap_writeback_dirty(mapping))
 		return 0;
 
 	pagevec_init(&pvec, 0);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Don't set/test/wait-for radix tree tags if no capability
  2006-09-13 19:35 [RFC] Don't set/test/wait-for radix tree tags if no capability Lee Schermerhorn
@ 2006-09-13 20:51 ` Christoph Lameter
  2006-09-13 22:12   ` Lee Schermerhorn
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Lameter @ 2006-09-13 20:51 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: linux-mm

On Wed, 13 Sep 2006, Lee Schermerhorn wrote:

> While debugging a problem [in the out-of-tree migration cache], I
> noticed a lot of radix-tree tag activity for address spaces that have
> the BDI_CAP_NO_{ACCT_DIRTY|WRITEBACK} capability flags set--effectively
> disabling these capabilities--in their backing device.  Altho'
> functionally benign, I believe that this unnecessary overhead.  Seeking
> contrary opinions.

I do not think that not wanting accounting for dirty pages means that we 
should not mark those dirty. If we do this then filesystems will 
not be able to find the dirty pags for writeout.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Don't set/test/wait-for radix tree tags if no capability
  2006-09-13 20:51 ` Christoph Lameter
@ 2006-09-13 22:12   ` Lee Schermerhorn
  2006-09-14 15:23     ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Lee Schermerhorn @ 2006-09-13 22:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm

On Wed, 2006-09-13 at 13:51 -0700, Christoph Lameter wrote:
> On Wed, 13 Sep 2006, Lee Schermerhorn wrote:
> 
> > While debugging a problem [in the out-of-tree migration cache], I
> > noticed a lot of radix-tree tag activity for address spaces that have
> > the BDI_CAP_NO_{ACCT_DIRTY|WRITEBACK} capability flags set--effectively
> > disabling these capabilities--in their backing device.  Altho'
> > functionally benign, I believe that this unnecessary overhead.  Seeking
> > contrary opinions.
> 
> I do not think that not wanting accounting for dirty pages means that we 
> should not mark those dirty. If we do this then filesystems will 
> not be able to find the dirty pags for writeout.

That's why I asked, and why I noted that maybe setting the dirty tags
should be gated by the 'No writeback' capability, rather than the "No
dirty accounting" capability.  But then, maybe "no writeback" doesn't
really mean that the address space/backing device doesn't do
writeback.  

The 'no writeback' capability is set for things like:  configfs,
hugetlbfs, dlmfs, ramfs, cpuset, sysfs, shmem segs, swap, ...  And, as I
mentioned, the 'no dirty accounting' capability happens to be set for
all file systems that set 'no writeback'.  However, I agree that we
shouldn't count on this.  

So, do the file systems need to writeout dirty pages for these file
systems using the radix tree tags?  Just looking where the tags are
queried [radix_tree_gang_lookup_tag()], it appears that tags are only
used by "real" file systems, despite a call from pagevec_lookup_tag()
that resides in mm/swap.c.  And, it appears that the 'no writeback'
capability flag will prevent writeback in some cases.  Not sure if it
catches all.

If we can't gate setting the flags based on the existing capabilities,
maybe we want to define a new cap flag--e.g., BDI_CAP_NO_TAGS--for use
by file systems that don't need the tags set?  Not sure it's worth it,
but could eliminate some cache pollution.

Lee



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Don't set/test/wait-for radix tree tags if no capability
  2006-09-13 22:12   ` Lee Schermerhorn
@ 2006-09-14 15:23     ` Hugh Dickins
  2006-09-14 15:52       ` Lee Schermerhorn
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2006-09-14 15:23 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: Christoph Lameter, linux-mm

On Wed, 13 Sep 2006, Lee Schermerhorn wrote:
> On Wed, 2006-09-13 at 13:51 -0700, Christoph Lameter wrote:
> > On Wed, 13 Sep 2006, Lee Schermerhorn wrote:
> > 
> > > While debugging a problem [in the out-of-tree migration cache], I
> > > noticed a lot of radix-tree tag activity for address spaces that have
> > > the BDI_CAP_NO_{ACCT_DIRTY|WRITEBACK} capability flags set--effectively
> > > disabling these capabilities--in their backing device.  Altho'
> > > functionally benign, I believe that this unnecessary overhead.  Seeking
> > > contrary opinions.
> > 
> > I do not think that not wanting accounting for dirty pages means that we 
> > should not mark those dirty. If we do this then filesystems will 
> > not be able to find the dirty pags for writeout.
> 
> That's why I asked, and why I noted that maybe setting the dirty tags
> should be gated by the 'No writeback' capability, rather than the "No
> dirty accounting" capability.  But then, maybe "no writeback" doesn't
> really mean that the address space/backing device doesn't do
> writeback.  
> 
> The 'no writeback' capability is set for things like:  configfs,
> hugetlbfs, dlmfs, ramfs, cpuset, sysfs, shmem segs, swap, ...  And, as I
> mentioned, the 'no dirty accounting' capability happens to be set for
> all file systems that set 'no writeback'.  However, I agree that we
> shouldn't count on this.  

I agree you do need to check it out carefully, but it sounds very
reasonable to me to avoid that radix tree tag overhead on the whole
class of storageless filesystems (and swap plays by those same rules,
despite that it does have backing storage).

If it checks out right at present, I think you can "count on this",
just so long as you insert a suitable BUG_ON somewhere to alert us
if some later mod unconsciously changes the situation (e.g. we find
we do need more dirty page tracking on those currently exempt).

A related saving you can make there, I believe, is to add another
.set_page_dirty variant which does nothing(?) more than SetPageDirty -
swap and tmpfs and probably all those you mentioned above don't really
want to do any more than that there - or didn't two or three years ago,
when I had a patch for that but got diverted - the situation may have
changed significantly since, and no longer be an option.

> 
> So, do the file systems need to writeout dirty pages for these file
> systems using the radix tree tags?  Just looking where the tags are
> queried [radix_tree_gang_lookup_tag()], it appears that tags are only
> used by "real" file systems, despite a call from pagevec_lookup_tag()
> that resides in mm/swap.c.  And, it appears that the 'no writeback'
> capability flag will prevent writeback in some cases.  Not sure if it
> catches all.

You are shamelessly parading your naivete, expecting mm/swap.c to
have something to do with swap.  Perhaps it did once upon a time
(ooh, swap_setup does have something to do with swap), and it still
has a lot to do with the page LRU lists (which are pointless unless
you're swapping in the wider sense); but really it's just mm/misc.c.

Hugh

> 
> If we can't gate setting the flags based on the existing capabilities,
> maybe we want to define a new cap flag--e.g., BDI_CAP_NO_TAGS--for use
> by file systems that don't need the tags set?  Not sure it's worth it,
> but could eliminate some cache pollution.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Don't set/test/wait-for radix tree tags if no capability
  2006-09-14 15:23     ` Hugh Dickins
@ 2006-09-14 15:52       ` Lee Schermerhorn
  2006-09-14 16:20         ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Lee Schermerhorn @ 2006-09-14 15:52 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Christoph Lameter, linux-mm

On Thu, 2006-09-14 at 16:23 +0100, Hugh Dickins wrote:
> On Wed, 13 Sep 2006, Lee Schermerhorn wrote:
> > On Wed, 2006-09-13 at 13:51 -0700, Christoph Lameter wrote:
> > > On Wed, 13 Sep 2006, Lee Schermerhorn wrote:
> > > 
> > > > While debugging a problem [in the out-of-tree migration cache], I
> > > > noticed a lot of radix-tree tag activity for address spaces that have
> > > > the BDI_CAP_NO_{ACCT_DIRTY|WRITEBACK} capability flags set--effectively
> > > > disabling these capabilities--in their backing device.  Altho'
> > > > functionally benign, I believe that this unnecessary overhead.  Seeking
> > > > contrary opinions.
> > > 
> > > I do not think that not wanting accounting for dirty pages means that we 
> > > should not mark those dirty. If we do this then filesystems will 
> > > not be able to find the dirty pags for writeout.
> > 
> > That's why I asked, and why I noted that maybe setting the dirty tags
> > should be gated by the 'No writeback' capability, rather than the "No
> > dirty accounting" capability.  But then, maybe "no writeback" doesn't
> > really mean that the address space/backing device doesn't do
> > writeback.  
> > 
> > The 'no writeback' capability is set for things like:  configfs,
> > hugetlbfs, dlmfs, ramfs, cpuset, sysfs, shmem segs, swap, ...  And, as I
> > mentioned, the 'no dirty accounting' capability happens to be set for
> > all file systems that set 'no writeback'.  However, I agree that we
> > shouldn't count on this.  
> 
> I agree you do need to check it out carefully, but it sounds very
> reasonable to me to avoid that radix tree tag overhead on the whole
> class of storageless filesystems (and swap plays by those same rules,
> despite that it does have backing storage).
> 
> If it checks out right at present, I think you can "count on this",
> just so long as you insert a suitable BUG_ON somewhere to alert us
> if some later mod unconsciously changes the situation (e.g. we find
> we do need more dirty page tracking on those currently exempt).
> 
> A related saving you can make there, I believe, is to add another
> .set_page_dirty variant which does nothing(?) more than SetPageDirty -
> swap and tmpfs and probably all those you mentioned above don't really
> want to do any more than that there - or didn't two or three years ago,
> when I had a patch for that but got diverted - the situation may have
> changed significantly since, and no longer be an option.

Yes.  I considered that at first, when I saw where the tags were getting
set.   But in looking at the existing set_page_dirty variants, it
occurred to me that the BDI capability flags were telling us what file
systems needed dirty page tracking and came up with this patch.

However, a separate function for those file systems might be a bit more
efficient in the paths that call set_page_dirty().  There might still
some benefit to be had in short circuiting some of the wait-on and gang
lookups on the "other side".  But, perhaps those functions never get
called for the file systems we're discussing.

I think I'll rework the patch to gate all of the tags on the
NO_WRITEBACK_DIRTY capability instead of the ACCOUNT_DIRTY cap.  Along
the way, I'll look into which of these changes could be dropped by
adding the new set_page_dirty op function.  As time permits, of
course...

> 
> > 
> > So, do the file systems need to writeout dirty pages for these file
> > systems using the radix tree tags?  Just looking where the tags are
> > queried [radix_tree_gang_lookup_tag()], it appears that tags are only
> > used by "real" file systems, despite a call from pagevec_lookup_tag()
> > that resides in mm/swap.c.  And, it appears that the 'no writeback'
> > capability flag will prevent writeback in some cases.  Not sure if it
> > catches all.
> 
> You are shamelessly parading your naivete, expecting mm/swap.c to
> have something to do with swap.  Perhaps it did once upon a time
> (ooh, swap_setup does have something to do with swap), and it still
> has a lot to do with the page LRU lists (which are pointless unless
> you're swapping in the wider sense); but really it's just mm/misc.c.

So, I guess I shouldn't worry too much about why swapin_readahead() is
in mm/memory.c instead of one of the mm/swap*.c files, huh?

Lee
> 
> Hugh
> 
> > 
> > If we can't gate setting the flags based on the existing capabilities,
> > maybe we want to define a new cap flag--e.g., BDI_CAP_NO_TAGS--for use
> > by file systems that don't need the tags set?  Not sure it's worth it,
> > but could eliminate some cache pollution.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Don't set/test/wait-for radix tree tags if no capability
  2006-09-14 15:52       ` Lee Schermerhorn
@ 2006-09-14 16:20         ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2006-09-14 16:20 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: Christoph Lameter, linux-mm

On Thu, 14 Sep 2006, Lee Schermerhorn wrote:
> 
> So, I guess I shouldn't worry too much about why swapin_readahead() is
> in mm/memory.c instead of one of the mm/swap*.c files, huh?

Don't worry too much about it, indeed; though I agree it'd be much
nicer tucked away somewhere else, near read_swap_cache_async or
near valid_swaphandles or the three brought together.

(But if you do want to worry about swapin_readahead, just savour the
absurdity of its NUMA next_vma code: just what is the likelihood that
swap allocation will match a task's address layout?)

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-09-14 16:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-09-13 19:35 [RFC] Don't set/test/wait-for radix tree tags if no capability Lee Schermerhorn
2006-09-13 20:51 ` Christoph Lameter
2006-09-13 22:12   ` Lee Schermerhorn
2006-09-14 15:23     ` Hugh Dickins
2006-09-14 15:52       ` Lee Schermerhorn
2006-09-14 16:20         ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox