From: Aaron Lu <aaron.lu@intel.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 5/6] mm/page_alloc: Free pages in a single pass during bulk free
Date: Fri, 18 Feb 2022 20:13:04 +0800
Message-ID: <Yg+NUEazDNNtpVzv@ziqianlu-nuc9qn>
In-Reply-To: <20220218094716.GY3366@techsingularity.net>

On Fri, Feb 18, 2022 at 09:47:16AM +0000, Mel Gorman wrote:
> On Fri, Feb 18, 2022 at 02:07:42PM +0800, Aaron Lu wrote:
> > > @@ -1498,12 +1508,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > >  			if (bulkfree_pcp_prepare(page))
> > >  				continue;
> > >  
> > > -			/* Encode order with the migratetype */
> > > -			page->index <<= NR_PCP_ORDER_WIDTH;
> > > -			page->index |= order;
> > > -
> > > -			list_add_tail(&page->lru, &head);
> > > -
> > >  			/*
> > >  			 * We are going to put the page back to the global
> > >  			 * pool, prefetch its buddy to speed up later access
> > > @@ -1517,36 +1521,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > >  				prefetch_buddy(page, order);
> > >  				prefetch_nr--;
> > >  			}
> > 
> > The comment above 'if (prefetch_nr)' says: "We are going to put the page
> > back to the global pool, prefetch its buddy to speed up later access
> > under zone->lock...". It will have to be modified, as the prefetch is
> > now done inside the lock.
> > 
> 
> Yes, that was my understanding.
> 
> > I remember prefetch_buddy()'s original intent was to fetch the buddy
> > page's 'struct page' before acquiring the zone lock, to speed up
> > operations inside the locked region. Now that the zone lock is acquired
> > early, whether to still keep prefetch_buddy() inside the lock
> > becomes questionable.
> > 
> 
> I agree. I wanted to take it out but worried it might stall (drumroll)
> the rest of the series as evaluating prefetch is machine specific. Before

Understood.

> this series I thought it was possible that the prefetched lines would be
> flushed if the lists were large enough. Due to free_factor, it's possible
> we are freeing tens of thousands of pages and the prefetched pages would be
> evicted. It would require a fairly small cache though.

Makes sense.

> 
> There are still two reasons why I thought it should go away as a
> follow-up to the series.
> 
> 1. There is a guaranteed cost to calculating the buddy location, which
>    definitely has to be calculated again. However, as the zone lock is held and
>    there is no deferring of buddy merging, there is no guarantee that the
>    prefetch will have completed when the second buddy calculation takes
>    place and buddies are being merged.  With or without the prefetch, there
>    may be further stalls depending on how many pages get merged. In other
>    words, a stall due to merging is inevitable and at best only one stall
>    might be avoided at the cost of calculating the buddy location twice.
> 
> 2. As the zone lock is held, prefetch_nr makes less sense: once
>    prefetch_nr expires, the cache lines of interest have already been
>    merged.
> 
> It's point 1 that was my main concern. We are paying a guaranteed cost for
> a maybe win if prefetching is fast enough and it would be very difficult to
> spot what percentage of prefetches actually helped. It was more clear-cut
> when the buddy freeing was deferred as there was more time for the prefetch
> to complete.

Both points make sense to me.

I'm also thinking that since zone lock contention is much better now
(presumably due to your free_factor patchset) than before, these
techniques (picking pages to free before acquiring the lock and
prefetching the buddy on the free path) make less sense now.
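
For reference, a minimal sketch (not the actual kernel code) of the
two-pass pattern those techniques came from; pick_page_from_pcp_lists()
is a hypothetical placeholder for the real pindex/list selection logic:

/*
 * Sketch of the two-pass scheme from commit 0a5f4e5b4562: pass 1
 * picks pages and prefetches their buddies without zone->lock,
 * pass 2 does the actual freeing and merging under the lock.
 */
static void two_pass_bulk_free_sketch(struct zone *zone,
				      struct per_cpu_pages *pcp,
				      int count, unsigned int order)
{
	LIST_HEAD(head);
	struct page *page, *tmp;

	/* Pass 1: no zone->lock held; warm each buddy's struct page. */
	while (count--) {
		page = pick_page_from_pcp_lists(pcp);	/* hypothetical */
		list_add_tail(&page->lru, &head);
		prefetch_buddy(page, order);
	}

	/* Pass 2: free and merge under zone->lock. */
	spin_lock(&zone->lock);
	list_for_each_entry_safe(page, tmp, &head, lru)
		__free_one_page(page, page_to_pfn(page), zone, order,
				get_pcppage_migratetype(page), FPI_NONE);
	spin_unlock(&zone->lock);
}

The window between the two passes is what gave the prefetches time to
complete; with everything now done in a single pass under the lock,
that window is gone.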

> 
> > After the nr_task=4/16/64 tests finished, I'll also test the effect of
> > removing prefetch_buddy() here.
> > 
> 
> I'd appreciate it. I think the patch is this (build tested only):
> 

Looks good to me, thanks!

> --8<--
> mm/page_alloc: Do not prefetch buddies during bulk free
> 
> free_pcppages_bulk() has taken two passes through the pcp lists since
> commit 0a5f4e5b4562 ("mm/free_pcppages_bulk: do not hold lock when picking
> pages to free") due to deferring the cost of selecting PCP lists until
> the zone lock is held.
> 
> As the list processing now takes place under the zone lock, it's less
> clear that the prefetch is always a benefit, for two reasons.
> 
> 1. There is a guaranteed cost to calculating the buddy location, which
>    definitely has to be calculated again. However, as the zone lock is held and
>    there is no deferring of buddy merging, there is no guarantee that the
>    prefetch will have completed when the second buddy calculation takes
>    place and buddies are being merged.  With or without the prefetch, there
>    may be further stalls depending on how many pages get merged. In other
>    words, a stall due to merging is inevitable and at best only one stall
>    might be avoided at the cost of calculating the buddy location twice.
> 
> 2. As the zone lock is held, prefetch_nr makes less sense: once
>    prefetch_nr expires, the cache lines of interest have already been
>    merged.
> 
> The main concern is that there is a definite cost to calculating the
> buddy location early for the prefetch, while it is only a "maybe win"
> depending on whether the CPU prefetch logic and memory are fast enough.
> Remove the prefetch logic on the basis that fewer instructions in a
> path are always a saving, whereas the prefetch might save at most one
> memory stall depending on the CPU and memory.
> 
> Suggested-by: Aaron Lu <aaron.lu@intel.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/page_alloc.c | 24 ------------------------
>  1 file changed, 24 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index de9f072d23bd..2d5cc098136d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1432,15 +1432,6 @@ static bool bulkfree_pcp_prepare(struct page *page)
>  }
>  #endif /* CONFIG_DEBUG_VM */
>  
> -static inline void prefetch_buddy(struct page *page, unsigned int order)
> -{
> -	unsigned long pfn = page_to_pfn(page);
> -	unsigned long buddy_pfn = __find_buddy_pfn(pfn, order);
> -	struct page *buddy = page + (buddy_pfn - pfn);
> -
> -	prefetch(buddy);
> -}
> -
>  /*
>   * Frees a number of pages from the PCP lists
>   * Assumes all pages on list are in same zone.
> @@ -1453,7 +1444,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  	int min_pindex = 0;
>  	int max_pindex = NR_PCP_LISTS - 1;
>  	unsigned int order;
> -	int prefetch_nr = READ_ONCE(pcp->batch);
>  	bool isolated_pageblocks;
>  	struct page *page;
>  
> @@ -1508,20 +1498,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			if (bulkfree_pcp_prepare(page))
>  				continue;
>  
> -			/*
> -			 * We are going to put the page back to the global
> -			 * pool, prefetch its buddy to speed up later access
> -			 * under zone->lock. It is believed the overhead of
> -			 * an additional test and calculating buddy_pfn here
> -			 * can be offset by reduced memory latency later. To
> -			 * avoid excessive prefetching due to large count, only
> -			 * prefetch buddy for the first pcp->batch nr of pages.
> -			 */
> -			if (prefetch_nr) {
> -				prefetch_buddy(page, order);
> -				prefetch_nr--;
> -			}
> -
>  			/* MIGRATE_ISOLATE page should not go to pcplists */
>  			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
>  			/* Pageblock could have been isolated meanwhile */
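
For context on the "guaranteed cost" in point 1: the buddy location
calculation that the prefetch paid for up front is just an XOR on the
pfn. The helper in mm/internal.h (quoted here for reference, as found
in kernels of this era) is:

static inline unsigned long __find_buddy_pfn(unsigned long page_pfn,
					     unsigned int order)
{
	return page_pfn ^ (1 << order);
}

Cheap as that is, the argument above stands: it is a definite cost paid
for a prefetch whose benefit cannot be relied on once the buddy is
merged immediately under the lock.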


