linux-mm.kvack.org archive mirror
From: "Martin J. Bligh" <mbligh@aracnet.com>
To: Mel Gorman <mel@csn.ul.ie>, William Lee Irwin III <wli@holomorphy.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: What to expect with the 2.6 VM
Date: Tue, 01 Jul 2003 15:06:52 -0700
Message-ID: <4860000.1057097211@[10.10.2.4]>
In-Reply-To: <Pine.LNX.4.53.0307012243030.16265@skynet>

> On delayed coalescing, I was seeing things that weren't there. I've this
> section removed and changed to;

Heh. I was wondering about that ...
 
> --Begin Extract--
>    Per-CPU Page Lists
>    ==================
> 
>    The most frequent type of allocation or free is an order-0 (i.e. one page)
>    allocation or free. In 2.4, each page allocation or free requires the
>    acquisition of an interrupt safe spinlock to protect the lists of free
>    pages which is an expensive operation. To reduce lock contention, kernel
>    2.6 has per-cpu page lists of order-0 pages called pagesets.
> 
>    These pagesets contain two lists for hot and cold pages where hot pages
>    have been recently used and can still be expected to be present in the CPU
>    cache. For an allocation, the pageset for the running CPU will be first
>    checked and if pages are available, they will be allocated. To determine
>    when the pageset should be emptied or filled, two watermarks are in place.
>    When the low watermark is reached, a batch of pages will be allocated and
>    placed on the list. When the high watermark is reached, a batch of pages
>    will be freed at the same time. Higher order allocations still require the
>    interrupt safe spinlock to be held and there is no delay in the splits or
>    coalescing.
> 
>    While coalescing of order-0 pages is delayed, this is not a lazy buddy
>    algorithm [BL89]. While pagesets introduce a merging delay for order-0
>    allocations, it is a side-effect rather than an intended feature and there
>    is no method available to drain the pagesets and merge the buddies. In
>    other words, despite the per-cpu and new accounting code bulking up the
>    amount of code in mm/page_alloc.c, the core of the buddy algorithm remains
>    the same as it was in 2.4.
> 
>    The implication of this change is straightforward: the number of times
>    the spinlock protecting the buddy lists must be acquired is reduced.
>    Higher order allocations are relatively rare in Linux so the optimisation
>    is for the common case. This change will be noticeable on machines with a
>    large number of CPUs but will make little difference to single CPUs. There
>    are some issues with the idea, although they are not considered a serious
>    problem. The first item of note is that high order allocations may fail if
>    many of the pagesets are just below the high watermark. The second is that
>    when memory is low and the current CPU pageset is empty, an allocation may
>    fail as there is no means of draining remote pagesets. The last problem is
>    that buddies of newly freed pages may exist in other pagesets leading to
>    possible fragmentation problems.
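
For anyone who wants to see the watermark behaviour from the extract in
action, here is a toy model. It is Python rather than kernel code, and the
names (Zone, Pageset, take, give) and the low/high/batch values are all made
up for illustration; the real logic lives in mm/page_alloc.c and differs in
detail. All it does is count how often the shared lock would be taken.

```python
class Zone:
    """Stand-in for the global buddy free lists, guarded by a single lock."""

    def __init__(self, nr_pages):
        self.free = list(range(nr_pages))
        self.lock_acquisitions = 0  # how often the "spinlock" was taken

    def take(self, count):
        """Remove up to `count` pages under one lock acquisition."""
        self.lock_acquisitions += 1
        taken, self.free = self.free[:count], self.free[count:]
        return taken

    def give(self, pages):
        """Return a batch of pages under one lock acquisition."""
        self.lock_acquisitions += 1
        self.free.extend(pages)


class Pageset:
    """Per-CPU list of order-0 pages with low/high watermarks (values assumed)."""

    def __init__(self, zone, low=0, high=6, batch=4):
        self.zone, self.low, self.high, self.batch = zone, low, high, batch
        self.pages = []

    def alloc(self):
        # At or below the low watermark: refill with one batch from the zone.
        if len(self.pages) <= self.low:
            self.pages.extend(self.zone.take(self.batch))
        return self.pages.pop() if self.pages else None

    def free(self, page):
        self.pages.append(page)
        # At the high watermark: hand one batch back to the zone.
        if len(self.pages) >= self.high:
            drained = self.pages[:self.batch]
            self.pages = self.pages[self.batch:]
            self.zone.give(drained)
```

With these made-up numbers, 16 order-0 allocations from a fresh Pageset take
the zone lock 4 times rather than 16, which is the whole point of the batching.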


Looks good. Might be useful to distinguish more carefully between the hot
and cold lists - what you've described is basically just the cold list.

The hot list is similar, except it's also used as a LIFO stack, so the
most recently freed page is assumed to be cache-warm, and is reallocated
first. This reduces the overall number of cacheline misses in the system,
by reusing cachelines that are already present in that CPU's cache.
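
A quick way to convince yourself of the LIFO point is a toy simulation. This
is only a sketch: the "cache" is modelled as the last couple of freed pages,
and the workload (allocate two pages, free them, repeat) and the function
name warm_allocations are assumptions made up for the example, not anything
measured on real hardware.

```python
from collections import deque


def warm_allocations(policy, cache_slots=2, rounds=8):
    """Count allocations that return a page still in a toy per-CPU cache.

    policy is "lifo" (stack, reuse the most recently freed page) or
    "fifo" (queue, reuse the page that has been free the longest).
    """
    free_list = deque(range(8))                    # eight pages, all cold at start
    recently_freed = deque(maxlen=cache_slots)     # toy model of cache contents
    warm = 0
    for _ in range(rounds):
        held = []
        for _ in range(2):
            if policy == "lifo":
                page = free_list.pop()             # take from the top of the stack
            else:
                page = free_list.popleft()         # take the oldest free page
            if page in recently_freed:
                warm += 1
            held.append(page)
        for page in held:
            recently_freed.append(page)            # freed page is assumed cache-warm
            free_list.append(page)
    return warm
```

In this model the LIFO policy hands back a warm page on every allocation
after the first round, while the FIFO policy cycles through all eight pages
and never does.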

Moreover, the cold list tries to use pages that are NOT in another CPU's
cache. The main users that allocate from the cold list are DMA operations,
and the main thing that populates it is page-reclaim. Other things are
generally assumed to be hot (this is one of the areas where more work
could probably be done ...)
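
Roughly, the routing between the two lists looks like the sketch below. The
class and method names are hypothetical and there is no refill from the buddy
lists here; it only shows who puts pages where, under the assumption that the
caller says whether it wants (or is freeing) a cold page.

```python
class HotColdPageset:
    """Toy per-CPU pageset with separate hot and cold lists (names assumed)."""

    def __init__(self):
        self.hot = []    # recently used pages, treated as a LIFO stack
        self.cold = []   # pages not expected to be in any CPU's cache

    def free_page(self, page, cold=False):
        # Page-reclaim would free with cold=True; everything else is assumed hot.
        target = self.cold if cold else self.hot
        target.append(page)

    def alloc_page(self, cold=False):
        # A DMA-style caller would ask for cold=True, since the device, not
        # the CPU, touches the page first.
        source = self.cold if cold else self.hot
        # An empty list would mean a refill from the buddy lists (not modelled).
        return source.pop() if source else None
```

So a page freed by reclaim sits on the cold list until something like a DMA
read asks for it, and never displaces the cache-warm pages on the hot list.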

M.


