From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org,
Hugh Dickins <hugh@veritas.com>,
Nick Piggin <nickpiggin@yahoo.com.au>,
linux-mm@kvack.org, Andi Kleen <ak@suse.de>
Subject: Re: [RFC3 02/14] Basic counter functionality
Date: Sat, 17 Dec 2005 02:01:15 -0200
Message-ID: <20051217040115.GA6975@dmt.cnet>
In-Reply-To: <20051215001425.31405.74009.sendpatchset@schroedinger.engr.sgi.com>
Hi Christoph,
On Wed, Dec 14, 2005 at 04:14:25PM -0800, Christoph Lameter wrote:
> Currently we have various vm counters for the pages in a zone that are split
> per cpu. This arrangement does not allow access to per zone statistics that
> are important to optimize VM behavior for NUMA architectures. All one can say
> from the per cpu differential variables is how much a certain variable was
> changed by this cpu without being able to deduce how many pages in each zone
> are of a certain type.
>
> This framework here implements differential counters for each processor
> in struct zone. The differential counters are consolidated when a threshold
> is exceeded (like done in the current implementation for nr_pagecache), when
> slab reaping occurs or when a consolidation function is called.
> Consolidation uses atomic operations and accumulates counters per zone in
> the zone structure and also globally in the vm_stat array. VM functions can
> access the counts by simply indexing a global or zone specific array.
>
> The arrangement of counters in an array simplifies processing when output
> has to be generated for /proc/*.
>
> Counter updates can be triggered by calling *_zone_page_state or
> __*_zone_page_state. The second function can be called if it is known that
> interrupts are disabled.
>
> Specially optimized increment and decrement functions are provided. These
> can avoid certain checks and use increment or decrement instructions that
> an architecture may provide.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-12 15:07:45.000000000 -0800
> +++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 14:57:22.000000000 -0800
> @@ -596,7 +596,281 @@ static int rmqueue_bulk(struct zone *zon
> return i;
> }
>
> +/*
> + * Manage combined zone based / global counters
> + */
> +#define STAT_THRESHOLD 32
> +
> +atomic_long_t vm_stat[NR_STAT_ITEMS];
> +
> +static inline void zone_page_state_consolidate(long x, struct zone *zone, enum zone_stat_item item)
> +{
> +	atomic_long_add(x, &zone->vm_stat[item]);
> +	atomic_long_add(x, &vm_stat[item]);
> +}
> +
> +#ifdef CONFIG_SMP
> +/*
> + * Determine pointer to currently valid differential byte given a zone and
> + * the item number.
> + *
> + * Preemption must be off
> + */
> +static inline s8 *diff_pointer(struct zone *zone, enum zone_stat_item item)
> +{
> +	return &zone_pcp(zone, raw_smp_processor_id())->vm_stat_diff[item];
> +}
> +
> +/*
> + * For use when we know that interrupts are disabled.
> + */
> +void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, int delta)
> +{
> +	s8 *p;
> +	long x;
> +
> +	p = diff_pointer(zone, item);
> +	x = delta + *p;
> +
> +	if (unlikely(x > STAT_THRESHOLD || x < -STAT_THRESHOLD)) {
> +		zone_page_state_consolidate(x, zone, item);
> +		x = 0;
> +	}
> +
> +	*p = x;
> +}

AFAICS there is no need to disable interrupts here, only preemption
(which could cause the problem your comment above describes). I suppose
these counters are not accessed from interrupt context and are not meant
to be, right?

If so, an interrupt arriving at any point in this code leaves the state
consistent: once the IRQ handler(s) finish, execution resumes where it
was interrupted.

Why not use preempt_disable()/preempt_enable()? They compile away
if !CONFIG_PREEMPT, and could be faster than disabling/enabling
interrupts (no need to save "flags" on the stack, just an increment of
the preempt count, which has a good chance of being in cache, I guess).
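
A minimal userspace sketch of this suggestion, not kernel code. It assumes
the counters are never touched from interrupt context, which is the premise
of the question above. preempt_disable()/preempt_enable() are stubbed as a
plain counter, and struct zone_model, vm_stat_global, NR_STAT_ITEMS and
raw_mod_zone_page_state() are hypothetical stand-ins for the names in the
patch; a single CPU is modelled and the consolidation is a plain add rather
than atomic_long_add().

```c
#include <assert.h>
#include <stdint.h>

#define STAT_THRESHOLD 32
#define NR_STAT_ITEMS 2		/* hypothetical item count for this model */

/* Stand-in for the preempt count; the real primitives live in the kernel. */
static int preempt_count_model;
#define preempt_disable()	(preempt_count_model++)
#define preempt_enable()	(preempt_count_model--)

/* Hypothetical, single-CPU model of the per-zone counter state. */
struct zone_model {
	long vm_stat[NR_STAT_ITEMS];		/* consolidated zone counters */
	int8_t vm_stat_diff[NR_STAT_ITEMS];	/* this CPU's differentials */
};

static long vm_stat_global[NR_STAT_ITEMS];	/* models the global vm_stat[] */

/* Caller must have preemption disabled (checked by the assert). */
static void raw_mod_zone_page_state(struct zone_model *zone, int item, int delta)
{
	int8_t *p;
	long x;

	assert(preempt_count_model > 0);

	p = &zone->vm_stat_diff[item];
	x = delta + *p;

	/* Fold the differential into the zone and global counters. */
	if (x > STAT_THRESHOLD || x < -STAT_THRESHOLD) {
		zone->vm_stat[item] += x;
		vm_stat_global[item] += x;
		x = 0;
	}

	*p = (int8_t)x;		/* x is within [-32, 32] at this point */
}

/* Wrapper that only disables preemption, not interrupts. */
static void mod_zone_page_state(struct zone_model *zone, int item, int delta)
{
	preempt_disable();
	raw_mod_zone_page_state(zone, item, delta);
	preempt_enable();
}
```

With this model, 32 increments stay in the differential and the 33rd folds
all 33 into the zone and global counters. If an interrupt handler could
update the same differential, the read-modify-write of *p would no longer be
safe with preemption disabled alone.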

It would also be nice to make all code related to debugging-only
counters selectable at compile time, since that data is not interesting
in some scenarios and is just unnecessary bloat there. That seems to
have been Andrew's original intent, as you noted.
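
One way such a compile-time switch could look, as a rough sketch only. The
option name CONFIG_VM_DEBUG_COUNTERS, the enum items and both helper
functions are hypothetical; nothing in the patch set defines them. With the
option off, the helpers are empty inlines, so callers need no #ifdefs and
the counter array is not built at all.

```c
#include <assert.h>

/* Hypothetical option; uncomment to build the debugging counters in. */
/* #define CONFIG_VM_DEBUG_COUNTERS 1 */

/* Hypothetical debugging-only items, for illustration. */
enum debug_item { DBG_PGFAULT, DBG_PGSCAN, NR_DEBUG_ITEMS };

#ifdef CONFIG_VM_DEBUG_COUNTERS

static unsigned long debug_stat[NR_DEBUG_ITEMS];

static inline void count_debug_event(enum debug_item item)
{
	assert(item < NR_DEBUG_ITEMS);
	debug_stat[item]++;
}

static inline unsigned long read_debug_event(enum debug_item item)
{
	assert(item < NR_DEBUG_ITEMS);
	return debug_stat[item];
}

#else /* !CONFIG_VM_DEBUG_COUNTERS */

/* Stubs: no storage, no code at the call sites. */
static inline void count_debug_event(enum debug_item item)
{
	(void)item;
}

static inline unsigned long read_debug_event(enum debug_item item)
{
	(void)item;
	return 0;
}

#endif /* CONFIG_VM_DEBUG_COUNTERS */
```

Call sites would use count_debug_event(DBG_PGFAULT) unconditionally; with
the option disabled the call is an empty inline and reads return 0.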