From: Christoph Lameter <cl@linux.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Linux-MM <linux-mm@kvack.org>,
Johannes Weiner <hannes@cmpxchg.org>, Dave Hansen <dave@sr71.net>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/22] mm: page allocator: Allocate/free order-0 pages from a per-zone magazine
Date: Thu, 9 May 2013 18:08:35 +0000 [thread overview]
Message-ID: <0000013e8a7afa80-34067330-12df-4b7e-a2b7-d298c78d3630-000000@email.amazonses.com> (raw)
In-Reply-To: <20130509172721.GG11497@suse.de>
On Thu, 9 May 2013, Mel Gorman wrote:
> > I would be useful if the allocator would hand out pages from the
> > same physical area first. This would reduce fragmentation as well and
> > since it is likely that numerous pages are allocated for some purpose
> > (given that that the page sizes of 4k are rather tiny compared to the data
> > needs these day) would reduce TLB pressure.
> >
>
> It already does this via the buddy allocator and the treatment of
> migratetypes.
Well it only does if it breaks larger sized pages from the buddy
allocator. If large page lists aggregate in the per cpu lists then we have
a LIFO order.
> > Yes. But we have lots of memory in machines these days. Why would that be
> > an issue?
> Because the embedded people will have a fit if the page allocator needs
> an additional 1K+ of memory just to turn on.
Why enlarge the existing per cpu areas? The size could
be restricted if we reduce the types of pages supported and/or if we do
not use double linked lists but single linked lists within say a PMD area.
> With this approach the lock can be made more fine or coarse based on the
> number of CPUs, the queues can be made arbitrarily large and if necessary,
> per-process magazines for heavily contended workloads could be added.
Arbitrarily large queues cause references to pointers all over memory. No
good.
> A fixed-size array like you propose would be only marginally better than
> what is implemented today as far as I can see because it still smacks into
> the irq-safe zone->lock and pages can be pinned in inaccessible per-cpu
> queues unless a global IPI is sent.
We do not send global IPIs but IPIs only to processors that have something
cached.
The fixed size array or constrained single linked list would be better
since it caches better and it is possible to avoid spin lock operations.
> > The problem with the page allocator is that it can serve various types of
> > pages. If one wants to setup caches for all of those then these caches are
> > replicated for each processor or whatever higher unit we decide to use. I
> > think one of the first moves need to be to identify which types of pages
> > are actually useful to serve in a fast way. Higher order pages are already
> > out but what about the different zone types, migration types etc?
> >
>
> What tpyes of pages are useful to serve in a fast way is workload
> dependenat and besides the per-cpu allocator as it exists today already
> has separate queues for migration types.
>
> I strongly suspect that your proposal would end up performing roughly the
> same as what exists today except that it'll be more complex because it'll
> have to deal with the race-prone array accesses.
The problems of the current scheme are the proliferation of page types,
the serving of pages in a random mix from all over memory, the heavy high
latency processing in the "fast" paths (these paths seem to accumulate
more and more procesing in each kernel version) and the disabling of
interrupts (which may be the least latency causing issue).
A solution without using locks cannot simply be a modification of the
existing scheme that you envison. The amount of processing in the fastpaths must be
significantly reduced and the data layout needs to be more cache friendly.
Only with these changes will make the use of fast cpu local instructions
sense.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-05-09 18:08 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-08 16:02 [RFC PATCH 00/22] Per-cpu page allocator replacement prototype Mel Gorman
2013-05-08 16:02 ` [PATCH 01/22] mm: page allocator: Lookup pageblock migratetype with IRQs enabled during free Mel Gorman
2013-05-08 16:02 ` [PATCH 02/22] mm: page allocator: Push down where IRQs are disabled during page free Mel Gorman
2013-05-08 16:02 ` [PATCH 03/22] mm: page allocator: Use unsigned int for order in more places Mel Gorman
2013-05-08 16:02 ` [PATCH 04/22] mm: page allocator: Only check migratetype of pages being drained while CMA active Mel Gorman
2013-05-08 16:02 ` [PATCH 05/22] oom: Use number of online nodes when deciding whether to suppress messages Mel Gorman
2013-05-08 16:02 ` [PATCH 06/22] mm: page allocator: Convert hot/cold parameter and immediate callers to bool Mel Gorman
2013-05-08 16:02 ` [PATCH 07/22] mm: page allocator: Do not lookup the pageblock migratetype during allocation Mel Gorman
2013-05-08 16:02 ` [PATCH 08/22] mm: page allocator: Remove the per-cpu page allocator Mel Gorman
2013-05-08 16:02 ` [PATCH 09/22] mm: page allocator: Allocate/free order-0 pages from a per-zone magazine Mel Gorman
2013-05-08 18:41 ` Christoph Lameter
2013-05-09 15:23 ` Mel Gorman
2013-05-09 16:21 ` Christoph Lameter
2013-05-09 17:27 ` Mel Gorman
2013-05-09 18:08 ` Christoph Lameter [this message]
2013-05-08 16:02 ` [PATCH 10/22] mm: page allocator: Allocate and free pages from magazine in batches Mel Gorman
2013-05-08 16:02 ` [PATCH 11/22] mm: page allocator: Shrink the magazine to the migratetypes in use Mel Gorman
2013-05-08 16:02 ` [PATCH 12/22] mm: page allocator: Remove knowledge of hot/cold from page allocator Mel Gorman
2013-05-08 16:02 ` [PATCH 13/22] mm: page allocator: Use list_splice to refill the magazine Mel Gorman
2013-05-08 16:02 ` [PATCH 14/22] mm: page allocator: Do not disable IRQs just to update stats Mel Gorman
2013-05-08 16:03 ` [PATCH 15/22] mm: page allocator: Check if interrupts are enabled only once per allocation attempt Mel Gorman
2013-05-08 16:03 ` [PATCH 16/22] mm: page allocator: Remove coalescing improvement heuristic during page free Mel Gorman
2013-05-08 16:03 ` [PATCH 17/22] mm: page allocator: Move magazine access behind accessors Mel Gorman
2013-05-08 16:03 ` [PATCH 18/22] mm: page allocator: Split magazine lock in two to reduce contention Mel Gorman
2013-05-09 15:21 ` Dave Hansen
2013-05-15 19:44 ` Andi Kleen
2013-05-08 16:03 ` [PATCH 19/22] mm: page allocator: Watch for magazine and zone lock contention Mel Gorman
2013-05-08 16:03 ` [PATCH 20/22] mm: page allocator: Hold magazine lock for a batch of pages Mel Gorman
2013-05-08 16:03 ` [PATCH 21/22] mm: compaction: Release free page list under a batched magazine lock Mel Gorman
2013-05-08 16:03 ` [PATCH 22/22] mm: page allocator: Drain magazines for direct compact failures Mel Gorman
2013-05-09 15:41 ` [RFC PATCH 00/22] Per-cpu page allocator replacement prototype Dave Hansen
2013-05-09 16:25 ` Christoph Lameter
2013-05-09 17:33 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0000013e8a7afa80-34067330-12df-4b7e-a2b7-d298c78d3630-000000@email.amazonses.com \
--to=cl@linux.com \
--cc=dave@sr71.net \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox