From: Huang Ying <ying.huang@intel.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
Arjan Van De Ven <arjan@linux.intel.com>,
Huang Ying <ying.huang@intel.com>,
Mel Gorman <mgorman@techsingularity.net>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
David Hildenbrand <david@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Michal Hocko <mhocko@suse.com>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Matthew Wilcox <willy@infradead.org>,
Christoph Lameter <cl@linux.com>
Subject: [PATCH 05/10] mm, page_alloc: scale the number of pages that are batch allocated
Date: Wed, 20 Sep 2023 14:18:51 +0800
Message-ID: <20230920061856.257597-6-ying.huang@intel.com>
In-Reply-To: <20230920061856.257597-1-ying.huang@intel.com>
When a task allocates a large number of order-0 pages, it may acquire
the zone->lock multiple times, allocating the pages in batches. This
can cause unnecessary contention on the zone lock when a very large
number of pages is allocated. This patch scales the batch size for
subsequent allocations based on the recent allocation pattern.
On a 2-socket Intel server with 224 logical CPUs, we tested kbuild on
one socket with `make -j 112`. With this patch, the cycles% of
spinlock contention (mostly on the zone lock) decreases from 40.5% to
37.9% (with PCP size == 361).
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
---
include/linux/mmzone.h | 3 ++-
mm/page_alloc.c | 52 ++++++++++++++++++++++++++++++++++--------
2 files changed, 44 insertions(+), 11 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4132e7490b49..4f7420e35fbb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -685,9 +685,10 @@ struct per_cpu_pages {
int high; /* high watermark, emptying needed */
int batch; /* chunk size for buddy add/remove */
u8 flags; /* protected by pcp->lock */
+ u8 alloc_factor; /* batch scaling factor during allocate */
u8 free_factor; /* batch scaling factor during free */
#ifdef CONFIG_NUMA
- short expire; /* When 0, remote pagesets are drained */
+ u8 expire; /* When 0, remote pagesets are drained */
#endif
/* Lists of pages, one per migrate type stored on the pcp-lists */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 30554c674349..30bb05fa5353 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2376,6 +2376,12 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
int pindex;
bool free_high = false;
+ /*
+ * On freeing, reduce the number of pages that are batch allocated.
+ * See nr_pcp_alloc() where alloc_factor is increased for subsequent
+ * allocations.
+ */
+ pcp->alloc_factor >>= 1;
__count_vm_events(PGFREE, 1 << order);
pindex = order_to_pindex(migratetype, order);
list_add(&page->pcp_list, &pcp->lists[pindex]);
@@ -2682,6 +2688,41 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
return page;
}
+static int nr_pcp_alloc(struct per_cpu_pages *pcp, int order)
+{
+ int high, batch, max_nr_alloc;
+
+ high = READ_ONCE(pcp->high);
+ batch = READ_ONCE(pcp->batch);
+
+ /* Check for PCP disabled or boot pageset */
+ if (unlikely(high < batch))
+ return 1;
+
+ /*
+ * Double the number of pages allocated each time there is subsequent
+ * refilling of order-0 pages without drain.
+ */
+ if (!order) {
+ max_nr_alloc = max(high - pcp->count - batch, batch);
+ batch <<= pcp->alloc_factor;
+ if (batch <= max_nr_alloc && pcp->alloc_factor < PCP_BATCH_SCALE_MAX)
+ pcp->alloc_factor++;
+ batch = min(batch, max_nr_alloc);
+ }
+
+ /*
+ * Scale batch relative to order if batch implies free pages
+ * can be stored on the PCP. Batch can be 1 for small zones or
+ * for boot pagesets which should never store free pages as
+ * the pages may belong to arbitrary zones.
+ */
+ if (batch > 1)
+ batch = max(batch >> order, 2);
+
+ return batch;
+}
+
/* Remove page from the per-cpu list, caller must protect the list */
static inline
struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
@@ -2694,18 +2735,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
do {
if (list_empty(list)) {
- int batch = READ_ONCE(pcp->batch);
+ int batch = nr_pcp_alloc(pcp, order);
int alloced;
- /*
- * Scale batch relative to order if batch implies
- * free pages can be stored on the PCP. Batch can
- * be 1 for small zones or for boot pagesets which
- * should never store free pages as the pages may
- * belong to arbitrary zones.
- */
- if (batch > 1)
- batch = max(batch >> order, 2);
alloced = rmqueue_bulk(zone, order,
batch, list,
migratetype, alloc_flags);
--
2.39.2