Re: [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure"

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mel@csn.ul.ie,
	riel@redhat.com, balbir@linux.vnet.ibm.com, andrea@suse.de,
	a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
Subject: Re: [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure"
Date: Mon, 17 Sep 2007 11:17:25 -0400	[thread overview]
Message-ID: <1190042245.5460.81.camel@localhost> (raw)
In-Reply-To: <Pine.LNX.4.64.0709141537180.14937@schroedinger.engr.sgi.com>

On Fri, 2007-09-14 at 15:47 -0700, Christoph Lameter wrote:
> On Fri, 14 Sep 2007, Lee Schermerhorn wrote:
> 
> > 1.  for now, use bit 30 in page flags.  This restricts the no reclaim
> >     infrastructure to 64-bit systems.  [The mlock patch, later in this
> >     series, uses another of these 64-bit-system-only flags.]
> > 
> >     Rationale:  32-bit systems have no free page flags and are less
> >     likely to have the large amounts of memory that exhibit the problems
> >     this series attempts to solve.  [I'm sure someone will disabuse me
> >     of this notion.]
> > 
> >     Thus, NORECLAIM currently depends on [CONFIG_]64BIT.
> 
> Hmmm.. Good a creative solution to the page flag dilemma.

Still not sure I can get away with it tho' :-).

> 
> > +#ifdef CONFIG_NORECLAIM
> > +static inline void
> > +add_page_to_noreclaim_list(struct zone *zone, struct page *page)
> > +{
> > +	add_page_to_lru_list(zone, page, LRU_NORECLAIM);
> > +}
> > +
> > +static inline void
> > +del_page_from_noreclaim_list(struct zone *zone, struct page *page)
> > +{
> > +	del_page_from_lru_list(zone, page, LRU_NORECLAIM);
> > +}
> > +#else
> > +static inline void
> > +add_page_to_noreclaim_list(struct zone *zone, struct page *page) { }
> > +
> > +static inline void
> > +del_page_from_noreclaim_list(struct zone *zone, struct page *page) { }
> > +#endif
> > +
> 
> Do we really need to spell these out separately? 

Well, you left the "{add|del}_page_to_[in]active_list() functions, so I
kept these separate as well.  We could make another cleanup pass and
replace all of these with calls to the "{add|del}_page_to_lru_list()"
functions with the appropriate list enum.  

Also, nothing calls del_page_from_noreclaim_list() right now, so we can
probably lose it.

> 
> > Index: Linux/mm/migrate.c
> > ===================================================================
> > --- Linux.orig/mm/migrate.c	2007-09-14 10:17:54.000000000 -0400
> > +++ Linux/mm/migrate.c	2007-09-14 10:21:48.000000000 -0400
> > @@ -52,13 +52,22 @@ int migrate_prep(void)
> >  	return 0;
> >  }
> >  
> > +/*
> > + * move_to_lru() - place @page onto appropriate lru list
> > + * based on preserved page flags:  active, noreclaim, none
> > + */
> >  static inline void move_to_lru(struct page *page)
> >  {
> > -	if (PageActive(page)) {
> > +	if (PageNoreclaim(page)) {
> > +		VM_BUG_ON(PageActive(page));
> > +		ClearPageNoreclaim(page);
> > +		lru_cache_add_noreclaim(page);
> > +	} else if (PageActive(page)) {
> >  		/*
> >  		 * lru_cache_add_active checks that
> >  		 * the PG_active bit is off.
> >  		 */
> > +		VM_BUG_ON(PageNoreclaim(page));	/* race ? */
> >  		ClearPageActive(page);
> >  		lru_cache_add_active(page);
> >  	} else {
> 
> Could this be unified with the generic LRU handling in mm_inline.h? If you 
> have a function that determines the LRU_xxx from the page flags then you 
> can target the right list by indexing.
> 
> Maybe also create a generic lru_cache_add(page, list) function?

Possibly.  When you created the migration facility, you had these as
separate.  It's still private to migrate.c.  There are a number of
different variants of this.  Here, we put the page back onto the
appropriate list based on the Active|Noreclaim|<none> flag, preserved by
isolate_lru_page().  In mm/mlock.c, I have a similar
function--putback_lru_page() that just clears the flags and calls
lru_cache_add_active_or_noreclaim() to retest page_reclaimable() and
chose the appropriate list.  It never chooses the inactive list, tho'
Maybe that's a mistake?.  

Neither of these is a particularly hot path, I think.  So, maybe we can
come up with one that serves both purposes with a steering argument.
I'd want to test it for performance regression, of course.

> 
> > +	 * Non-reclaimable pages shouldn't make it onto the inactive list,
> > +	 * so if we encounter one, we should be scanning either the active
> > +	 * list--e.g., after splicing noreclaim list to end of active list--
> > +	 * or nearby pages [lumpy reclaim].  Take it only if scanning active
> > +	 * list.
> 
> 
> One fleeting thought here: It may be useful to *not* allocate known 
> unreclaimable pages with __GFP_MOVABLE.

Sorry, I don't understand where you're coming from here.
Non-reclaimable pages should be migratable, but maybe __GFP_MOVABLE
means something else?

> 
> > @@ -670,6 +693,8 @@ int __isolate_lru_page(struct page *page
> >  		ret = 0;
> >  	}
> >  
> > +	if (TestClearPageNoreclaim(page))
> > +		SetPageActive(page);	/* will recheck in shrink_active_list */
> >  	return ret;
> >  }
> 
> Failing to do the isoilation in vmscan.c is not an option?

1) This test doesn't fail the isolation.  It just ensures that the
noreclaim flag is cleared and, if it was set, replaces it with Active.
I think this is OK because I only accept non-reclaimable pages if we're
scanning the active list.  This is in support of splicing the noreclaim
list onto the active list when we want to rescan.  As I mentioned in
mail to Peter, I'm not too happy with this approach--my current
implementation anyway.  Need to revisit/discuss this...

2) Since lumpy reclaim, __isolate_lru_page() CAN return -EBUSY and
isolate_lru_pages() will just stick the page back on the list being
scanned.  We do this if the page state [active/inactive] doesn't match
the isolation "mode"--i.e., when lumpy reclaim is looking for physically
adjacent pages.  I also do this if the page is non-reclaimable and
isolate_lru_pages() doesn't specify that it's OK to take them.  As
mentioned above, it's only OK if we're scanning the active list from
shrink_active_list().  I think this whole thing is fragile--thus my
dissatisfaction...

> 
> > @@ -843,6 +870,8 @@ int isolate_lru_page(struct page *page)
> >  			ClearPageLRU(page);
> >  			if (PageActive(page))
> >  				del_page_from_active_list(zone, page);
> > +			else if (PageNoreclaim(page))
> > +				del_page_from_noreclaim_list(zone, page);
> >  			else
> >  				del_page_from_inactive_list(zone, page);
> >  		}
> 
> Another place where an indexing function from page flags to type of LRU 
> list could simplify code.

Agreed.  Need another pass...

> 
> > @@ -933,14 +962,21 @@ static unsigned long shrink_inactive_lis
> >  			VM_BUG_ON(PageLRU(page));
> >  			SetPageLRU(page);
> >  			list_del(&page->lru);
> > -			add_page_to_lru_list(zone, page, PageActive(page));
> > +			if (PageActive(page)) {
> > +				VM_BUG_ON(PageNoreclaim(page));
> > +				add_page_to_active_list(zone, page);
> > +			} else if (PageNoreclaim(page)) {
> > +				VM_BUG_ON(PageActive(page));
> > +				add_page_to_noreclaim_list(zone, page);
> > +			} else
> > +				add_page_to_inactive_list(zone, page);
> >  			if (!pagevec_add(&pvec, page)) {
> 
> Ditto.
> 
> > +void putback_all_noreclaim_pages(void)
> > +{
> > +	struct zone *zone;
> > +
> > +	for_each_zone(zone) {
> > +		spin_lock(&zone->lru_lock);
> > +
> > +		list_splice(&zone->list[LRU_NORECLAIM],
> > +				&zone->list[LRU_ACTIVE]);
> > +		INIT_LIST_HEAD(&zone->list[LRU_NORECLAIM]);
> > +
> > +		zone_page_state_add(zone_page_state(zone, NR_NORECLAIM), zone,
> > +								NR_ACTIVE);
> > +		atomic_long_set(&zone->vm_stat[NR_NORECLAIM], 0);
> 
> Racy if multiple reclaims are ongoing. Better subtract the value via 
> mod_zone_page_state

OK.  I'll make that change.  But, again, I need to revisit this entire
concept--splicing the noreclaim list back to the active.

Thanks for the review.

Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2007-09-17 15:17 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-14 20:53 [PATCH/RFC 0/14] Page Reclaim Scalability Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 1/14] Reclaim Scalability: Convert anon_vma lock to read/write lock Lee Schermerhorn
2007-09-17 11:02   ` Mel Gorman
2007-09-18  2:41     ` KAMEZAWA Hiroyuki
2007-09-18 11:01       ` Mel Gorman
2007-09-18 14:57         ` Rik van Riel
2007-09-18 15:37       ` Lee Schermerhorn
2007-09-18 20:17     ` Lee Schermerhorn
2007-09-20 10:19       ` Mel Gorman
2007-09-14 20:54 ` [PATCH/RFC 2/14] Reclaim Scalability: convert inode i_mmap_lock to reader/writer lock Lee Schermerhorn
2007-09-17 12:53   ` Mel Gorman
2007-09-20  1:24   ` Andrea Arcangeli
2007-09-20 14:10     ` Lee Schermerhorn
2007-09-20 14:16       ` Andrea Arcangeli
2007-09-14 20:54 ` [PATCH/RFC 3/14] Reclaim Scalability: move isolate_lru_page() to vmscan.c Lee Schermerhorn
2007-09-14 21:34   ` Peter Zijlstra
2007-09-15  1:55     ` Rik van Riel
2007-09-17 14:11     ` Lee Schermerhorn
2007-09-17  9:20   ` Balbir Singh
2007-09-17 19:19     ` Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 4/14] Reclaim Scalability: Define page_anon() function Lee Schermerhorn
2007-09-15  2:00   ` Rik van Riel
2007-09-17 13:19   ` Mel Gorman
2007-09-18  1:58   ` KAMEZAWA Hiroyuki
2007-09-18  2:27     ` Rik van Riel
2007-09-18  2:40       ` KAMEZAWA Hiroyuki
2007-09-18 15:04     ` Lee Schermerhorn
2007-09-18 19:41       ` Christoph Lameter
2007-09-19  0:30       ` KAMEZAWA Hiroyuki
2007-09-19 16:58         ` Lee Schermerhorn
2007-09-20  0:56           ` KAMEZAWA Hiroyuki
2007-09-14 20:54 ` [PATCH/RFC 5/14] Reclaim Scalability: Use an indexed array for LRU variables Lee Schermerhorn
2007-09-17 13:40   ` Mel Gorman
2007-09-17 14:17     ` Lee Schermerhorn
2007-09-17 14:39       ` Lee Schermerhorn
2007-09-17 18:58   ` Balbir Singh
2007-09-17 19:12     ` Lee Schermerhorn
2007-09-17 19:36       ` Balbir Singh
2007-09-17 19:36     ` Rik van Riel
2007-09-17 20:21       ` Balbir Singh
2007-09-17 21:01         ` Rik van Riel
2007-09-14 20:54 ` [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure" Lee Schermerhorn
2007-09-14 22:47   ` Christoph Lameter
2007-09-17 15:17     ` Lee Schermerhorn [this message]
2007-09-17 18:41       ` Christoph Lameter
2007-09-18  9:54         ` Mel Gorman
2007-09-18 19:45           ` Christoph Lameter
2007-09-19 11:11             ` Mel Gorman
2007-09-19 18:03               ` Christoph Lameter
2007-09-19  6:00   ` Balbir Singh
2007-09-19 14:47     ` Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 7/14] Reclaim Scalability: Non-reclaimable page statistics Lee Schermerhorn
2007-09-17  1:56   ` Rik van Riel
2007-09-14 20:54 ` [PATCH/RFC 8/14] Reclaim Scalability: Ram Disk Pages are non-reclaimable Lee Schermerhorn
2007-09-17  1:57   ` Rik van Riel
2007-09-17 14:40     ` Lee Schermerhorn
2007-09-17 18:42       ` Christoph Lameter
2007-09-14 20:54 ` [PATCH/RFC 9/14] Reclaim Scalability: SHM_LOCKED pages are nonreclaimable Lee Schermerhorn
2007-09-17  2:18   ` Rik van Riel
2007-09-14 20:55 ` [PATCH/RFC 10/14] Reclaim Scalability: track anon_vma "related vmas" Lee Schermerhorn
2007-09-17  2:52   ` Rik van Riel
2007-09-17 15:52     ` Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 11/14] Reclaim Scalability: swap backed pages are nonreclaimable when no swap space available Lee Schermerhorn
2007-09-17  2:53   ` Rik van Riel
2007-09-18 17:46     ` Lee Schermerhorn
2007-09-18 20:01       ` Rik van Riel
2007-09-19 14:55         ` Lee Schermerhorn
2007-09-18  2:59   ` KAMEZAWA Hiroyuki
2007-09-18 15:47     ` Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 12/14] Reclaim Scalability: Non-reclaimable Mlock'ed pages Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 13/14] Reclaim Scalability: Handle Mlock'ed pages during map/unmap and truncate Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 14/14] Reclaim Scalability: cull non-reclaimable anon pages in fault path Lee Schermerhorn
2007-09-14 21:11 ` [PATCH/RFC 0/14] Page Reclaim Scalability Peter Zijlstra
2007-09-14 21:42   ` Linus Torvalds
2007-09-14 22:02     ` Peter Zijlstra
2007-09-15  0:07       ` Linus Torvalds
2007-09-17  6:44 ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1190042245.5460.81.camel@localhost \
    --to=lee.schermerhorn@hp.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@suse.de \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=clameter@sgi.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox