From: Mel Gorman <mel@csn.ul.ie>
To: Joel Schopp <jschopp@austin.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH] Avoiding external fragmentation with a placement policy Version 10
Date: Wed, 4 May 2005 09:20:01 +0100 (IST)
Message-ID: <Pine.LNX.4.58.0505040908001.7962@skynet>
In-Reply-To: <4277D13F.3050804@austin.ibm.com>
On Tue, 3 May 2005, Joel Schopp wrote:
> Comments inline below.
>
> > o Tightened what pools are used for fallbacks, less likely to fragment
> > o Many micro-optimisations to have the same performance as the standard
> > allocator. Modified allocator now faster than standard allocator using
> > gcc 3.3.5
>
> Nice.
>
> > o Increased the size of reserve for fallbacks from 10% to 12.5%.
>
> This just screams out for a tunable. Systems with different workloads and
> different amounts of memory will behave better with different values. It
> would be even better if it would self tune, but that might prove difficult.
>
The 12.5% was chosen as it's slightly above what the low-level watermarks
are. I don't think it needs to be a tunable, although I agree that
calculating it based on zone watermarks would be a better idea than me
guessing 12.5%.
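As a rough illustration of the alternative being agreed on here, the reserve could be derived from the zone's high watermark with the current 12.5% as a floor. This is only a sketch; the function and field names (`fallback_reserve`, `zone_pages`, `pages_high`) and the extra margin are assumptions, not code from the patch:

```c
#include <assert.h>

/* Hypothetical sketch: size the fallback reserve from the zone's high
 * watermark instead of a hard-coded fraction. Names and the 25% margin
 * above the watermark are illustrative assumptions. */
static unsigned long fallback_reserve(unsigned long zone_pages,
				      unsigned long pages_high)
{
	/* Keep slightly more than the high watermark, with a floor of
	 * 1/8th (12.5%) of the zone so small zones still have slack. */
	unsigned long from_watermark = pages_high + pages_high / 4;
	unsigned long floor = zone_pages / 8;

	return from_watermark > floor ? from_watermark : floor;
}
```

The floor preserves today's behaviour on zones with small watermarks while letting large zones scale the reserve with their watermark.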
> > Difference in performance operations report generated by diff-aim9.sh from
> > VMRegress 0.14
> >  N Test          Standard    MBuddy V10      Diff  % diff  Test description
> >                   Ops/sec       Ops/sec   Ops/sec
> > -- -----------  ----------  ----------  --------  ------  ----------------
> >  1 add_double    460569.72   465222.46   4652.74   1.01%  Thousand Double Precision Additions/second
> >  2 add_float     460523.25   465322.45   4799.20   1.04%  Thousand Single Precision Additions/second
> >  3 add_long     1421763.04  1436042.64  14279.60   1.00%  Thousand Long Integer Additions/second
> >  4 add_int      1421763.04  1436042.64  14279.60   1.00%  Thousand Integer Additions/second
> >  5 add_short    1421363.11  1435760.71  14397.60   1.01%  Thousand Short Integer Additions/second
> >  7 page_test     121048.16   123059.49   2011.33   1.66%  System Allocations & Pages/second
> >  8 brk_test      445743.79   452407.93   6664.14   1.50%  System Memory Allocations/second
> >  9 jmp_test     4158416.67  4232083.33  73666.66   1.77%  Non-local gotos/second
> > 10 signal_test    94417.60    94584.24    166.64   0.18%  Signal Traps/second
> > 11 exec_test          65.04       66.69      1.65   2.54%  Program Loads/second
> > 12 fork_test        1537.82     1730.51    192.69  12.53%  Task Creations/second
> > 13 link_test        6411.28     6477.45     66.17   1.03%  Link/Unlink Pairs/second
> >
> > The aim9 results show that there are consistent improvements for common
> > page-related operations. The results are compiler dependent and there are
> > variances of 1-2% between versions.
>
> Any explanation for why fork_test shows markedly better improvement compared
> to the others?
>
Not a clue.
> > -#define __GFP_BITS_SHIFT 16 /* Room for 16 __GFP_FOO bits */
> > +#define __GFP_BITS_SHIFT 18 /* Room for 16 __GFP_FOO bits */
>
> Comment should have the new 18, not the old 16.
>
Oops, correct.
> > +#ifdef CONFIG_ALLOCSTATS
> > + /*
> > + * These are beancounters that track how the placement policy
> > + * of the buddy allocator is performing
> > + */
> > + unsigned long fallback_count[ALLOC_TYPES];
> > + unsigned long alloc_count[ALLOC_TYPES];
> > + unsigned long reserve_count[ALLOC_TYPES];
> > + unsigned long kernnorclm_full_steal;
> > + unsigned long kernnorclm_partial_steal;
> > + unsigned long bulk_requests[MAX_ORDER];
> > + unsigned long bulk_alloced[MAX_ORDER];
> > +#endif
>
> It would be nice if all of the CONFIG_ALLOCSTATS stuff was broken out as a
> second patch. It would make this patch much smaller and more readable.
>
I can do that. They were kept as one patch because I almost always collect
the statistics, as they make it easier to figure out what is happening.
I've also found that benchmark results tend to be better when I collect
statistics (which makes no sense but was consistently true).
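To make the role of those beancounters concrete, here is a sketch of how the allocation paths might bump them. The struct fields mirror the quoted `CONFIG_ALLOCSTATS` block, but the `record_alloc` helper and its callers are illustrative, not from the patch:

```c
#include <assert.h>

/* Sketch of the allocation-statistics beancounters from the quoted
 * struct zone addition. Each allocation bumps one counter per pool
 * decision, so the numbers can later show how often each allocation
 * type was served from its own pool versus a fallback pool. The
 * record_alloc() helper is a hypothetical stand-in for the real
 * call sites in the allocator. */
#define ALLOC_TYPES 4

struct alloc_stats {
	unsigned long fallback_count[ALLOC_TYPES];
	unsigned long alloc_count[ALLOC_TYPES];
	unsigned long reserve_count[ALLOC_TYPES];
};

static void record_alloc(struct alloc_stats *stats, int type, int fellback)
{
	stats->alloc_count[type]++;	/* every allocation is counted */
	if (fellback)
		stats->fallback_count[type]++;	/* crossed into another pool */
}
```

A high `fallback_count` relative to `alloc_count` for a type is exactly the fragmentation signal the placement policy is trying to drive down.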
> > +int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = {
> > + {ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM, ALLOC_USERRCLM},
> > + {ALLOC_KERNRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_USERRCLM},
>
> I would have thought that KernRclm would want to choose USERRCLM over
> KERNNOCRLM.
>
No, because UserRclm is by far the easiest type to free pages in, whereas
KernRclm pages need a lot more work, specifically the ability to reclaim a
slab page on demand. From the point of view of fragmentation, KernNoRclm
is already lost, so I prefer to fall back there rather than anywhere else.
The problem is that it can accelerate fragmentation when a KernNoRclm
allocation needs to fall back. I have not found it to be a problem in the
benchmarks I've run, but there may be cases in the future where we need to
be a lot stricter about fallbacks.
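The ordering being debated can be sketched as a lookup table, matching the two quoted rows of `fallback_allocs`. The enum values and the rows for UserRclm and Fallback follow the discussion, but treat the exact values as illustrative rather than the patch's definitions:

```c
#include <assert.h>

/* Sketch of the fallback ordering under discussion: each allocation
 * type tries its own pool first, then the global fallback pool, then
 * KernNoRclm (already "lost" to fragmentation), and only then the
 * easier-to-reclaim pools. Enum values are illustrative; the patch
 * defines its own constants. */
enum {
	ALLOC_KERNNORCLM,	/* kernel, non-reclaimable */
	ALLOC_KERNRCLM,		/* kernel, reclaimable (e.g. slab) */
	ALLOC_USERRCLM,		/* userspace, easiest to reclaim */
	ALLOC_FALLBACK,		/* global reserve pool */
	ALLOC_TYPES
};

static const int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = {
	{ ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM,   ALLOC_USERRCLM },
	{ ALLOC_KERNRCLM,   ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_USERRCLM },
	{ ALLOC_USERRCLM,   ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_KERNRCLM },
	{ ALLOC_FALLBACK,   ALLOC_KERNNORCLM, ALLOC_KERNRCLM, ALLOC_USERRCLM },
};

/* Which pool does this allocation type try on its Nth attempt? */
static int next_pool(int alloctype, int retry_count)
{
	return fallback_allocs[alloctype][retry_count];
}
```

Note how every row places `ALLOC_KERNNORCLM` ahead of the reclaimable pools after the global reserve, which is exactly the "prefer to pollute what is already lost" argument above.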
> > + {ALLOC_USERRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_KERNRCLM},
>
> I'm almost certain the UserRclm type should prefer KERNRCLM over KERNNORCLM.
>
Again no, for reasons similar to why KernRclm falls back to KernNoRclm.
>
> > + * Here, the alloc type lists has been depleted as well as the global
> > + * pool, so fallback. When falling back, the largest possible block
> > + * will be taken to keep the fallbacks clustered if possible
> > + */
>
> I was curious if you had tried taking the smallest possible block. I would
> think that it would reduce the amount of fallback needed, and thus increase
> the amount available for the 3 allocation types. I would expect a net win,
> despite not clustering fallbacks particularly well.
>
I found it to be a net loss; tests showed increased fallbacks. If we
fall back with an order-0 block, we'll also fall back on the next
allocation. However, if we "steal" a large block of pages, fallbacks will
be delayed until that large block is consumed.
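A toy model makes the trade-off above easy to see: stealing one large block pays the fallback cost once per block, while stealing the smallest block pays it on every allocation. The function and its numbers are purely illustrative:

```c
#include <assert.h>

/* Toy illustration of why stealing the largest possible block delays
 * fallbacks: count how many cross-pool steals are needed to satisfy a
 * stream of order-0 allocations when each steal grabs a block of the
 * given size. Entirely hypothetical, not allocator code. */
static int fallbacks_needed(int stolen_block_pages, int allocations)
{
	int fallbacks = 0;
	int local_free = 0;	/* pages left from the last stolen block */

	for (int i = 0; i < allocations; i++) {
		if (local_free == 0) {
			fallbacks++;			/* cross pools */
			local_free = stolen_block_pages; /* take a block */
		}
		local_free--;				/* serve one page */
	}
	return fallbacks;
}
```

Stealing order-0 blocks (1 page) for 16 allocations costs 16 fallbacks; stealing an order-3 block (8 pages) costs only 2, keeping the foreign allocations clustered in the stolen blocks.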
> > + alloctype = fallback_list[retry_count];
> > +
> > + /* Find a block to allocate */
> > + area = zone->free_area_lists[alloctype] + (MAX_ORDER-1);
> > + current_order=MAX_ORDER;
> > + do {
> > + current_order--;
> > + if (list_empty(&area->free_list)) {
> > + area--;
> > + continue;
> > + }
> > +
> > + goto remove_page;
> > + } while (current_order != order);
> > + }
>
> This loop is a bit hard to understand. I think it would be easier to
> understand if it looked something like this (totally untested):
>
> + current_order=MAX_ORDER - 1 ;
> + do {
> + if (!list_empty(&area->free_list)) {
> + goto remove_page;
> + }
> +
> + area--;
> + current_order--;
> + } while (current_order >= order);
>
Will try it out, thanks.
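For reference, Joel's restructured loop can be tried standalone with a simple array of free counts standing in for the per-order `free_area` lists. The harness below is a sketch under that assumption; `MAX_ORDER` is set to the kernel's historical value of 11:

```c
#include <assert.h>

/* Standalone sketch of the suggested loop: scan the free areas from
 * the largest order down to the requested order and return the first
 * order with a non-empty free list (the "goto remove_page" case), or
 * -1 if nothing can be stolen. An int array of free counts stands in
 * for struct free_area and its free_list. */
#define MAX_ORDER 11

static int find_largest_free_order(const int free_count[MAX_ORDER], int order)
{
	int current_order = MAX_ORDER - 1;

	do {
		if (free_count[current_order] != 0)
			return current_order;	/* found a block to take */
		current_order--;
	} while (current_order >= order);

	return -1;				/* fallback pool exhausted */
}
```

Walking the index downwards with the emptiness check first, as Joel suggests, avoids the original's decrement-then-test shape while visiting exactly the orders from MAX_ORDER-1 down to `order`.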
--
Mel Gorman
Part-time PhD Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org