From: Uladzislau Rezki <urezki@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>,
Christoph Hellwig <hch@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Nicholas Piggin <npiggin@gmail.com>,
Hillf Danton <hdanton@sina.com>, Michal Hocko <mhocko@suse.com>,
Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
Date: Wed, 19 May 2021 23:07:50 +0200 [thread overview]
Message-ID: <20210519210750.GA5615@pc638.lan> (raw)
In-Reply-To: <20210519195214.GA2343@pc638.lan>
On Wed, May 19, 2021 at 09:52:14PM +0200, Uladzislau Rezki wrote:
> > On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote:
> > > > > + /*
> > > > > + * If not enough pages were obtained to accomplish an
> > > > > + * allocation request, free them via __vfree() if any.
> > > > > + */
> > > > > + if (area->nr_pages != nr_small_pages) {
> > > > > + warn_alloc(gfp_mask, NULL,
> > > > > + "vmalloc size %lu allocation failure: "
> > > > > + "page order %u allocation failed",
> > > > > + area->nr_pages * PAGE_SIZE, page_order);
> > > > > + goto fail;
> > > > > + }
> > > >
> > > > From reading __alloc_pages_bulk not allocating all pages is something
> > > > that cn happen fairly easily. Shouldn't we try to allocate the missing
> > > > pages manually and/ore retry here?
> > > >
> > >
> > > It is a good point. The bulk-allocator, as i see, only tries to access
> > > to pcp-list and falls-back to a single allocator once it fails, so the
> > > array may not be fully populated.
> > >
> >
> > Partially correct. It does allocate via the pcp-list but the pcp-list will
> > be refilled if it's empty so if the bulk allocator returns fewer pages
> > than requested, it may be due to hitting watermarks or the local zone is
> > depleted. It does not take any special action to correct the situation
> > or stall e.g. wake kswapd, enter direct reclaim, allocate from a remote
> > node etc.
> >
> > If no pages were allocated, it'll try allocate at least one page via a
> > single allocation request in case the bulk failure would push the zone
> > over the watermark but 1 page does not. That path as a side-effect would
> > also wake kswapd.
> >
> OK. A single page allocator can enter a slow path i mean direct reclaim,
> etc to adjust watermarks.
>
> > > In that case probably it makes sense to manually populate it using
> > > single page allocator.
> > >
> > > Mel, could you please also comment on it?
> > >
> >
> > It is by design because it's unknown if callers can recover or if so,
> > how they want to recover and the primary intent behind the bulk allocator
> > was speed. In the case of network, it only wants some pages quickly so as
> > long as it gets 1, it makes progress. For the sunrpc user, it's willing
> > to wait and retry. For vmalloc, I'm unsure what a suitable recovery path
> > should be as I do not have a good handle on workloads that are sensitive
> > to vmalloc performance. The obvious option would be to loop and allocate
> > single pages with alloc_pages_node understanding that the additional
> > pages may take longer to allocate.
> >
> I got it. At least we should fall-back for a single allocator, that is how
> we used to allocate before(now it is for high-order pages). If it also fails
> to obtain a page we are done.
>
> Basically a single-page allocator is more permissive so it is a higher
> chance to success. Therefore a fallback to it makes sense.
>
Hello, Christoph.
See below the patch. Does it sound good for you? It is about moving
page allocation part into separate function:
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b2a0cbfa37c1..18773a4ad5fa 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2756,6 +2756,53 @@ void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot)
EXPORT_SYMBOL_GPL(vmap_pfn);
#endif /* CONFIG_VMAP_PFN */
+static inline unsigned int
+__vmalloc_area_node_get_pages(gfp_t gfp, int nid, unsigned int page_order,
+ unsigned long nr_small_pages, struct page **pages)
+{
+ unsigned int nr_allocated = 0;
+
+ /*
+ * For order-0 pages we make use of bulk allocator, if
+ * the page array is partly or not at all populated due
+ * to fails, fallback to a single page allocator that is
+ * more permissive.
+ */
+ if (!page_order)
+ nr_allocated = alloc_pages_bulk_array_node(
+ gfp, nid, nr_small_pages, pages);
+
+ /* High-order pages or fallback path if "bulk" fails. */
+ while (nr_allocated < nr_small_pages) {
+ struct page *page;
+ int i;
+
+ /*
+ * Compound pages required for remap_vmalloc_page if
+ * high-order pages. For the order-0 the __GFP_COMP
+ * is ignored.
+ */
+ page = alloc_pages_node(nid, gfp | __GFP_COMP, page_order);
+ if (unlikely(!page))
+ break;
+
+ /*
+ * Careful, we allocate and map page_order pages, but
+ * tracking is done per PAGE_SIZE page so as to keep the
+ * vm_struct APIs independent of the physical/mapped size.
+ */
+ for (i = 0; i < (1U << page_order); i++)
+ pages[nr_allocated + i] = page + i;
+
+ if (gfpflags_allow_blocking(gfp))
+ cond_resched();
+
+ nr_allocated += 1U << page_order;
+ }
+
+ return nr_allocated;
+}
+
static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
pgprot_t prot, unsigned int page_shift,
int node)
@@ -2789,37 +2836,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
return NULL;
}
- area->nr_pages = 0;
set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
page_order = vm_area_page_order(area);
- if (!page_order) {
- area->nr_pages = alloc_pages_bulk_array_node(
- gfp_mask, node, nr_small_pages, area->pages);
- } else {
- /*
- * Careful, we allocate and map page_order pages, but tracking is done
- * per PAGE_SIZE page so as to keep the vm_struct APIs independent of
- * the physical/mapped size.
- */
- while (area->nr_pages < nr_small_pages) {
- struct page *page;
- int i;
-
- /* Compound pages required for remap_vmalloc_page */
- page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
- if (unlikely(!page))
- break;
-
- for (i = 0; i < (1U << page_order); i++)
- area->pages[area->nr_pages + i] = page + i;
-
- if (gfpflags_allow_blocking(gfp_mask))
- cond_resched();
-
- area->nr_pages += 1U << page_order;
- }
- }
+ area->nr_pages = __vmalloc_area_node_get_pages(gfp_mask,
+ node, page_order, nr_small_pages, area->pages);
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
<snip>
--
Vlad Rezki
next prev parent reply other threads:[~2021-05-19 21:07 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-16 20:20 [PATCH 0/3] vmalloc() vs bulk allocator v2 Uladzislau Rezki (Sony)
2021-05-16 20:20 ` [PATCH 1/3] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper Uladzislau Rezki (Sony)
2021-05-16 20:20 ` [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node() Uladzislau Rezki (Sony)
2021-05-17 8:24 ` Mel Gorman
2021-05-17 11:51 ` Uladzislau Rezki
2021-05-19 13:44 ` Christoph Hellwig
2021-05-19 14:39 ` Uladzislau Rezki
2021-05-19 15:56 ` Mel Gorman
2021-05-19 19:52 ` Uladzislau Rezki
2021-05-19 21:07 ` Uladzislau Rezki [this message]
2021-06-28 23:00 ` Andrew Morton
2021-05-16 20:20 ` [PATCH 3/3] mm/vmalloc: Print a warning message first on failure Uladzislau Rezki (Sony)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210519210750.GA5615@pc638.lan \
--to=urezki@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=hch@infradead.org \
--cc=hdanton@sina.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.com \
--cc=npiggin@gmail.com \
--cc=oleksiy.avramchenko@sonymobile.com \
--cc=rostedt@goodmis.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox