Re: [RFC][PATCH 3/3] a big contig memory allocator

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Bob Liu <lliubbo@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"minchan.kim@gmail.com" <minchan.kim@gmail.com>,
	andi.kleen@intel.com, KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	fujita.tomonori@lab.ntt.co.jp, felipe.contreras@gmail.com
Subject: Re: [RFC][PATCH 3/3] a big contig memory allocator
Date: Fri, 29 Oct 2010 13:02:51 +0900
Message-ID: <20101029130251.f82f6925.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <AANLkTik-d4-6xN6BFYNcAOyR3P7uJDB-0ucr6Uks3AXv@mail.gmail.com>

On Fri, 29 Oct 2010 11:55:10 +0800
Bob Liu <lliubbo@gmail.com> wrote:

> On Tue, Oct 26, 2010 at 6:08 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Add a function to allocate contiguous memory larger than MAX_ORDER.
> > The main difference from the usual page allocator is that this uses
> > the memory offline technique (isolate pages and migrate the remaining pages).
> >
> > I think this is not a 100% solution because we can't avoid fragmentation,
> > but we have the kernelcore= boot option and can create a MOVABLE zone. That
> > helps us allocate a contiguous range on demand.
> >
> > The new function is
> >
> >  alloc_contig_pages(base, end, nr_pages, alignment)
> >
> > This function will allocate nr_pages contiguous pages from the range
> > [base, end). If [base, end) is bigger than nr_pages, some pfn which
> > meets the alignment will be allocated. If alignment is smaller than
> > MAX_ORDER, it will be raised to MAX_ORDER.
> >
> > __alloc_contig_pages() has many more arguments.
> >
> > Some drivers allocate contig pages by bootmem or by hiding some memory
> > from the kernel at boot. But if contig pages are necessary only in some
> > situations, the kernelcore= boot option plus page migration is a choice.
> >
> > Note: I'm not 100% sure whether the __GFP_HARDWALL check is required or not.
> >
> >
> > Changelog: 2010-10-26
> >  - support gfp_t
> >  - support zonelist/nodemask
> >  - support [base, end)
> >  - support alignment
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  include/linux/page-isolation.h |   15 ++
> >  mm/page_alloc.c                |   29 ++++
> >  mm/page_isolation.c            |  239 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 283 insertions(+)
> >
> > Index: mmotm-1024/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-1024.orig/mm/page_isolation.c
> > +++ mmotm-1024/mm/page_isolation.c
> > @@ -5,6 +5,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/page-isolation.h>
> >  #include <linux/pageblock-flags.h>
> > +#include <linux/swap.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/migrate.h>
> >  #include <linux/memory_hotplug.h>
> > @@ -398,3 +399,241 @@ retry:
> >  	}
> >  	return 0;
> >  }
> > +
> > +/*
> > + * Comparing user specified [user_start, user_end) with physical memory layout
> > + * [phys_start, phys_end). If no intersection of length nr_pages, return 1.
> > + * If there is an intersection, return 0 and fill range in [*start, *end)
> > + */
> > +static int
> > +__calc_search_range(unsigned long user_start, unsigned long user_end,
> > +		unsigned long nr_pages,
> > +		unsigned long phys_start, unsigned long phys_end,
> > +		unsigned long *start, unsigned long *end)
> > +{
> > +	if ((user_start >= phys_end) || (user_end <= phys_start))
> > +		return 1;
> > +	if (user_start <= phys_start) {
> > +		*start = phys_start;
> > +		*end = min(user_end, phys_end);
> > +	} else {
> > +		*start = user_start;
> > +		*end = min(user_end, phys_end);
> > +	}
> > +	if (*end - *start < nr_pages)
> > +		return 1;
> > +	return 0;
> > +}
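
The clamping done by __calc_search_range() above can be tried out in
userspace with something like the sketch below. This is only an
illustration under assumptions: calc_search_range_sketch(), min_ul() and
max_ul() are hypothetical names that are not part of the patch, and the
two branches on user_start are folded into a single max(), which should
give the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers standing in for the kernel's min()/max() macros. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Userspace sketch (an assumption, not the patch itself): intersect the
 * caller's [user_start, user_end) with the zone's [phys_start, phys_end).
 * Returns 0 and fills [*start, *end) when the overlap can hold nr_pages,
 * otherwise returns 1.
 */
static int calc_search_range_sketch(unsigned long user_start,
				    unsigned long user_end,
				    unsigned long nr_pages,
				    unsigned long phys_start,
				    unsigned long phys_end,
				    unsigned long *start, unsigned long *end)
{
	assert(start != NULL && end != NULL);

	/* No overlap at all. */
	if (user_start >= phys_end || user_end <= phys_start)
		return 1;

	/* Clamp both ends to the physical range. */
	*start = max_ul(user_start, phys_start);
	*end = min_ul(user_end, phys_end);

	/* Overlap exists but is too short for the request. */
	if (*end - *start < nr_pages)
		return 1;
	return 0;
}
```

For example, a request for 2048 pages in [100, 5000) against a zone
spanning [1024, 4096) would be clamped to [1024, 4096), while the same
request in [2000, 3000) would be rejected because only 1000 pfns overlap.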
> > +
> > +
> > +/**
> > + * __alloc_contig_pages - allocate physically contiguous pages
> > + * @base: the lowest pfn which caller wants.
> > + * @end:  the highest pfn which caller wants.
> > + * @nr_pages: the length of a chunk of pages to be allocated.
> > + * @align_order: alignment of start address of returned chunk in order.
> > + *   Returned page's order will be aligned to (1 << align_order). If smaller
> > + *   than MAX_ORDER, it's raised to MAX_ORDER.
> > + * @node: allocate memory near this node. If -1, the current node is used.
> > + * @gfpflag: used to specify what zone the memory should be from.
> > + * @nodemask: allocate memory within the nodemask.
> > + *
> > + * Search a memory range [base, end) and allocate physically contiguous
> > + * pages. If end - base is larger than nr_pages, a chunk in [base, end) will
> > + * be allocated.
> > + *
> > + * This returns the first page of the contiguous block. On failure, NULL
> > + * is returned.
> > + *
> > + * Limitation: at allocation, nr_pages may be increased to be aligned to
> > + * MAX_ORDER before searching a range. So, even if there is a large enough
> > + * chunk for nr_pages, it may not be possible to allocate it. Extra tail
> > + * pages of the allocated chunk are returned to the buddy allocator before
> > + * returning to the caller.
> > + */
> > +
> > +#define MIGRATION_RETRY	(5)
> > +struct page *__alloc_contig_pages(unsigned long base, unsigned long end,
> > +			unsigned long nr_pages, int align_order,
> > +			int node, gfp_t gfpflag, nodemask_t *mask)
> > +{
> > +	unsigned long found, aligned_pages, start;
> > +	struct page *ret = NULL;
> > +	int migration_failed;
> > +	bool no_search = false;
> > +	unsigned long align_mask;
> > +	struct zoneref *z;
> > +	struct zone *zone;
> > +	struct zonelist *zonelist;
> > +	enum zone_type highzone_idx = gfp_zone(gfpflag);
> > +	unsigned long zone_start, zone_end, rs, re, pos;
> > +
> > +	if (node == -1)
> > +		node = numa_node_id();
> > +
> > +	/* check unsupported flags */
> > +	if (gfpflag & __GFP_NORETRY)
> > +		return NULL;
> > +	if ((gfpflag & (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)) !=
> > +		(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL))
> > +		return NULL;
> > +
> > +	if (gfpflag & __GFP_THISNODE)
> > +		zonelist = &NODE_DATA(node)->node_zonelists[1];
> > +	else
> > +		zonelist = &NODE_DATA(node)->node_zonelists[0];
> > +	/*
> > +	 * Base/nr_page/end should be aligned to MAX_ORDER
> > +	 */
> > +	found = 0;
> > +
> > +	if (align_order < MAX_ORDER)
> > +		align_order = MAX_ORDER;
> > +
> > +	align_mask = (1 << align_order) - 1;
> > +	if (end - base == nr_pages)
> > +		no_search = true;
> 
> no_search is not used?
> 
Ah, yes. I meant to remove this and missed this one.
But I have to check again whether the no_search check is required or not.
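
For reference, the alignment arithmetic and the end - base == nr_pages
case in question can be sketched in userspace roughly as below. This is
only a sketch under assumptions: SKETCH_MAX_ORDER is an illustrative
value (the real MAX_ORDER depends on the kernel configuration), the
sketch_* names are hypothetical, and 1UL is used for the shift here
although the quoted hunk uses a plain int constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration only; not taken from the patch. */
#define SKETCH_MAX_ORDER 11

/* Raise align_order to at least SKETCH_MAX_ORDER and build the pfn mask. */
static unsigned long sketch_align_mask(int align_order)
{
	assert(align_order >= 0 &&
	       align_order < (int)(8 * sizeof(unsigned long)));

	if (align_order < SKETCH_MAX_ORDER)
		align_order = SKETCH_MAX_ORDER;
	/* 1UL keeps the shift in unsigned long for large orders. */
	return (1UL << align_order) - 1;
}

/* Round a pfn up to the next boundary described by mask. */
static unsigned long sketch_align_up(unsigned long pfn, unsigned long mask)
{
	return (pfn + mask) & ~mask;
}

/*
 * True when [base, end) is exactly nr_pages long, i.e. there is only one
 * candidate range and nothing to search for.
 */
static bool sketch_no_search(unsigned long base, unsigned long end,
			     unsigned long nr_pages)
{
	return end - base == nr_pages;
}
```

With these assumptions, an align_order of 0 is raised to 11, giving a
mask of 2047, and a pfn of 1 would be rounded up to 2048.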

Thanks,
-Kame

