From: David Rientjes <rientjes@google.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Christoph Lameter <cl@linux-foundation.org>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/7] Export unusable free space index via /proc/pagetypeinfo
Date: Thu, 28 Jan 2010 14:27:39 -0800 (PST) [thread overview]
Message-ID: <alpine.DEB.2.00.1001281411290.30252@chino.kir.corp.google.com> (raw)
In-Reply-To: <1262795169-9095-3-git-send-email-mel@csn.ul.ie>
On Wed, 6 Jan 2010, Mel Gorman wrote:
> Unusuable free space index is a measure of external fragmentation that
> takes the allocation size into account. For the most part, the huge page
> size will be the size of interest but not necessarily so it is exported
> on a per-order and per-zone basis via /proc/pagetypeinfo.
>
> The index is normally calculated as a value between 0 and 1 which is
> obviously unsuitable within the kernel. Instead, the first three decimal
> places are used as a value between 0 and 1000 for an integer approximation.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
> mm/vmstat.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 99 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 6051fba..e1ea2d5 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -451,6 +451,104 @@ static int frag_show(struct seq_file *m, void *arg)
> return 0;
> }
>
> +
> +struct config_page_info {
> + unsigned long free_pages;
> + unsigned long free_blocks_total;
> + unsigned long free_blocks_suitable;
> +};
> +
> +/*
> + * Calculate the number of free pages in a zone, how many contiguous
> + * pages are free and how many are large enough to satisfy an allocation of
> + * the target size. Note that this function makes to attempt to estimate
> + * how many suitable free blocks there *might* be if MOVABLE pages were
> + * migrated. Calculating that is possible, but expensive and can be
> + * figured out from userspace
> + */
> +static void fill_contig_page_info(struct zone *zone,
> + unsigned int suitable_order,
> + struct config_page_info *info)
There's a descrepency between the name of the function and the name of the
struct, I think they were probably both meant to be contig_page_info.
> +{
> + unsigned int order;
> +
> + info->free_pages = 0;
> + info->free_blocks_total = 0;
> + info->free_blocks_suitable = 0;
> +
> + for (order = 0; order < MAX_ORDER; order++) {
> + unsigned long blocks;
> +
> + /* Count number of free blocks */
> + blocks = zone->free_area[order].nr_free;
> + info->free_blocks_total += blocks;
> +
> + /* Count free base pages */
> + info->free_pages += blocks << order;
> +
> + /* Count the suitable free blocks */
> + if (order >= suitable_order)
> + info->free_blocks_suitable += blocks <<
> + (order - suitable_order);
> + }
> +}
> +
> +/*
> + * Return an index indicating how much of the available free memory is
> + * unusable for an allocation of the requested size.
> + */
> +int unusable_free_index(struct zone *zone,
> + unsigned int order,
> + struct config_page_info *info)
Should be static?
> +{
> + /* No free memory is interpreted as all free memory is unusable */
> + if (info->free_pages == 0)
> + return 100;
> +
> + /*
> + * Index should be a value between 0 and 1. Return a value to 3
> + * decimal places.
> + *
> + * 0 => no fragmentation
> + * 1 => high fragmentation
> + */
> + return ((info->free_pages - (info->free_blocks_suitable << order)) * 1000) / info->free_pages;
> +
This value is only for userspace consumption via /proc/pagetypeinfo, so
I'm wondering why it needs to be exported as an index. Other than a loss
of precision, wouldn't this be easier to understand (especially when
coupled with the free page counts already exported) if it were multipled
by 100 rather than 1000 and shown as a percent of _usable_ free memory at
each order? Otherwise, we're left doing this "free - unusuable"
calculation while the number of unusuable pages at an order isn't
necessarily of great interest as a vanilla value.
> +}
> +
> +static void pagetypeinfo_showunusable_print(struct seq_file *m,
> + pg_data_t *pgdat, struct zone *zone)
> +{
> + unsigned int order;
> +
> + /* Alloc on stack as interrupts are disabled for zone walk */
> + struct config_page_info info;
> +
> + seq_printf(m, "Node %4d, zone %8s %19s",
> + pgdat->node_id,
> + zone->name, " ");
> + for (order = 0; order < MAX_ORDER; ++order) {
> + fill_contig_page_info(zone, order, &info);
It's a shame we can't keep this data for the fragmentation index exported
subsequently in patch 3.
> + seq_printf(m, "%6d ", unusable_free_index(zone, order, &info));
> + }
> +
> + seq_putc(m, '\n');
> +}
> +
> +/*
> + * Display unusable free space index
> + * XXX: Could be a lot more efficient, but it's not a critical path
> + */
> +static int pagetypeinfo_showunusable(struct seq_file *m, void *arg)
> +{
> + pg_data_t *pgdat = (pg_data_t *)arg;
> +
> + seq_printf(m, "\nUnusable free space index at order\n");
> + walk_zones_in_node(m, pgdat, pagetypeinfo_showunusable_print);
> +
> + return 0;
> +}
> +
> static void pagetypeinfo_showfree_print(struct seq_file *m,
> pg_data_t *pgdat, struct zone *zone)
> {
> @@ -558,6 +656,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
> seq_printf(m, "Pages per block: %lu\n", pageblock_nr_pages);
> seq_putc(m, '\n');
> pagetypeinfo_showfree(m, pgdat);
> + pagetypeinfo_showunusable(m, pgdat);
> pagetypeinfo_showblockcount(m, pgdat);
>
> return 0;
/proc/pagetypeinfo isn't documented, but that's been fine until now
because all of the fields dealing with "free pages" and "number of blocks"
are easily understood. That changes now because there is no clear
understanding of "fragmentation index" in userspace, so we'll probably
need some kind of memory compaction documentation eventually.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-01-28 22:27 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-06 16:26 [RFC-PATCH 0/7] Memory Compaction v1 Mel Gorman
2010-01-06 16:26 ` [PATCH 1/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA Mel Gorman
2010-01-07 21:46 ` David Rientjes
2010-01-07 22:04 ` Christoph Lameter
2010-01-19 13:00 ` Mel Gorman
2010-01-06 16:26 ` [PATCH 2/7] Export unusable free space index via /proc/pagetypeinfo Mel Gorman
2010-01-06 17:10 ` Adam Litke
2010-01-06 17:29 ` Mel Gorman
2010-01-06 23:21 ` Tim Pepper
2010-01-28 22:27 ` David Rientjes [this message]
2010-02-05 10:23 ` Mel Gorman
2010-02-05 21:40 ` David Rientjes
2010-02-08 12:10 ` Mel Gorman
2010-01-06 16:26 ` [PATCH 3/7] Export fragmentation " Mel Gorman
2010-01-06 16:26 ` [PATCH 4/7] Memory compaction core Mel Gorman
2010-01-06 17:50 ` Mel Gorman
2010-01-06 18:22 ` Mel Gorman
2010-01-06 21:37 ` Andi Kleen
2010-01-06 22:07 ` Mel Gorman
2010-01-06 16:26 ` [PATCH 5/7] Add /proc trigger for memory compaction Mel Gorman
2010-01-07 22:00 ` David Rientjes
2010-01-13 23:23 ` David Rientjes
2010-01-20 9:48 ` Mel Gorman
2010-01-20 9:48 ` Mel Gorman
2010-01-20 18:12 ` Christoph Lameter
2010-01-20 20:53 ` Mel Gorman
2010-01-20 20:48 ` David Rientjes
2010-01-21 14:09 ` Mel Gorman
2010-01-21 23:34 ` David Rientjes
2010-01-06 16:26 ` [PATCH 6/7] Direct compact when a high-order allocation fails Mel Gorman
2010-01-06 16:26 ` [PATCH 7/7] Do not compact within a preferred zone after a compaction failure Mel Gorman
2010-01-13 23:28 ` David Rientjes
2010-01-20 9:51 ` Mel Gorman
2010-01-21 3:12 ` [RFC-PATCH 0/7] Memory Compaction v1 KOSAKI Motohiro
2010-01-21 10:11 ` Mel Gorman
2010-01-22 0:16 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.00.1001281411290.30252@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=aarcange@redhat.com \
--cc=agl@us.ibm.com \
--cc=avi@redhat.com \
--cc=cl@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox