From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
Zhang Yanfei <zhangyanfei@cn.fujitsu.com>,
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
Tang Chen <tangchen@cn.fujitsu.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
Wen Congyang <wency@cn.fujitsu.com>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Michal Nazarewicz <mina86@mina86.com>,
Laura Abbott <lauraa@codeaurora.org>,
Heesub Shin <heesub.shin@samsung.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Ritesh Harjani <ritesh.list@gmail.com>,
t.stanislaws@samsung.com, Gioh Kim <gioh.kim@lge.com>,
linux-mm@kvack.org, tglx@linutronix.de, cody@linux.vnet.ibm.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management
Date: Wed, 13 Aug 2014 17:13:46 +0900
Message-ID: <20140813081346.GB30451@js1304-P5Q-DELUXE>
In-Reply-To: <20140812012408.GA23418@gmail.com>

On Tue, Aug 12, 2014 at 01:24:09AM +0000, Minchan Kim wrote:
> Hey Joonsoo,
>
> On Wed, Aug 06, 2014 at 04:18:28PM +0900, Joonsoo Kim wrote:
> > The per-cpu pages structure, aka pcp, has high and batch values to
> > control how many pages we cache. These values can be updated
> > asynchronously, and the updater must make sure the update doesn't
> > cause any problem. For this purpose, pageset_update() is implemented
> > and does some memory synchronization. But it turns out to be wrong
> > when I implemented a new feature using it: there is no corresponding
> > smp_rmb() on the read-side, so it can't guarantee anything. Without a
> > correct update, the system could hang in free_pcppages_bulk() due to
> > a batch value larger than high. To update these values properly we
> > would need synchronization primitives on the read-side, but that
> > hurts the allocator's fastpath.
> >
> > There is another choice for synchronization, that is, sending an IPI.
> > This is somewhat expensive, but this is a really rare case so I guess
> > it is no problem here. However, reducing the number of IPIs is still
> > very helpful. The current logic handles each CPU's pcp update one by
> > one; to reduce the IPIs, we need to re-organize the code to handle
> > all CPUs' pcp updates in one go. This patch implements these
> > requirements.
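To make the problem concrete, this is the kind of interleaving I am
worried about (simplified pseudocode, not the exact allocator code):

	updater (pageset_update)            reader (free path)
	pcp->batch = 1;
	smp_wmb();
	pcp->high = new_high;  /* smaller */
	                                    reads pcp->high  -> sees the new, small high
	                                    reads pcp->batch -> may still see the stale, large batch
	smp_wmb();
	pcp->batch = new_batch;

Without an smp_rmb() between the two reads, the reader can observe the
new high together with the old batch, which is exactly the "batch larger
than high" situation mentioned above.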
>
> Let's add the right reviewers for the patch.
> Cc'ed Cody and Thomas.
Okay. I will do it next time.
>
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> > mm/page_alloc.c | 139 ++++++++++++++++++++++++++++++++-----------------------
> > 1 file changed, 80 insertions(+), 59 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b99643d4..44672dc 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3797,7 +3797,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
> > * not check if the processor is online before following the pageset pointer.
> > * Other parts of the kernel may not check if the zone is available.
> > */
> > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> > +static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
> > static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> > static void setup_zone_pageset(struct zone *zone);
> >
> > @@ -3843,9 +3843,9 @@ static int __build_all_zonelists(void *data)
> > * needs the percpu allocator in order to allocate its pagesets
> > * (a chicken-egg dilemma).
> > */
> > - for_each_possible_cpu(cpu) {
> > - setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> > + setup_pageset(&boot_pageset);
> >
> > + for_each_possible_cpu(cpu) {
> > #ifdef CONFIG_HAVE_MEMORYLESS_NODES
> > /*
> > * We now know the "local memory node" for each node--
> > @@ -4227,24 +4227,59 @@ static int zone_batchsize(struct zone *zone)
> > * outside of boot time (or some other assurance that no concurrent updaters
> > * exist).
> > */
> > -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
> > - unsigned long batch)
> > +static void pageset_update(struct zone *zone, int high, int batch)
> > {
> > - /* start with a fail safe value for batch */
> > - pcp->batch = 1;
> > - smp_wmb();
> > + int cpu;
> > + struct per_cpu_pages *pcp;
> > +
> > + /* start with a fail safe value for batch */
> > + for_each_possible_cpu(cpu) {
> > + pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > + pcp->batch = 1;
> > + }
> > + kick_all_cpus_sync();
> > +
> > + /* Update high, then batch, in order */
> > + for_each_possible_cpu(cpu) {
> > + pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > + pcp->high = high;
> > + }
> > + kick_all_cpus_sync();
> >
> > - /* Update high, then batch, in order */
> > - pcp->high = high;
> > - smp_wmb();
> > + for_each_possible_cpu(cpu) {
> > + pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > + pcp->batch = batch;
> > + }
> > +}
> > +
> > +/*
> > + * pageset_get_values_by_high() gets the high water mark for
> > + * hot per_cpu_pagelist to the value high for the pageset p.
> > + */
> > +static void pageset_get_values_by_high(int input_high,
> > + int *output_high, int *output_batch)
>
> You don't use output_high, so we could make it as follows:
Yeah, it could be. But I want to make the pageset_get_values_xxx variants
consistent. I will remove output_high and leave output_batch as is.
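For the by_high variant that would be roughly (just a sketch of the
direction, not tested):

	static void pageset_get_values_by_high(int input_high, int *output_batch)
	{
		*output_batch = max(1, input_high / 4);
		if ((input_high / 4) > (PAGE_SHIFT * 8))
			*output_batch = PAGE_SHIFT * 8;
	}

with pageset_get_values() setting *high directly in the
percpu_pagelist_fraction case.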
>
> int pageset_batch(int high);
>
> > +{
> > + *output_batch = max(1, input_high / 4);
> > + if ((input_high / 4) > (PAGE_SHIFT * 8))
> > + *output_batch = PAGE_SHIFT * 8;
> > +}
> >
> > - pcp->batch = batch;
> > +/* a companion to pageset_get_values_by_high() */
> > +static void pageset_get_values_by_batch(int input_batch,
> > + int *output_high, int *output_batch)
> > +{
> > + *output_high = 6 * input_batch;
> > + *output_batch = max(1, 1 * input_batch);
> > }
> >
> > -/* a companion to pageset_set_high() */
> > -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> > +static void pageset_get_values(struct zone *zone, int *high, int *batch)
> > {
> > - pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> > + if (percpu_pagelist_fraction) {
> > + pageset_get_values_by_high(
> > + (zone->managed_pages / percpu_pagelist_fraction),
> > + high, batch);
> > + } else
> > + pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
> > }
> >
> > static void pageset_init(struct per_cpu_pageset *p)
> > @@ -4260,51 +4295,38 @@ static void pageset_init(struct per_cpu_pageset *p)
> > INIT_LIST_HEAD(&pcp->lists[migratetype]);
> > }
> >
> > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> > +/* Use this only in boot time, because it doesn't do any synchronization */
> > +static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
>
> If it can only be used with boot_pageset at boot time, let's make that more clear.
>
> static void boot_setup_pageset(void)
> {
> boot_pageset;
> XXX;
> }
Okay.
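Something like this, I guess (untested sketch; the high = 0 / batch = 1
values just mirror what the old setup_pageset(&per_cpu(boot_pageset, cpu), 0)
path ended up with):

	static void boot_setup_pageset(void)
	{
		struct per_cpu_pageset *p;
		int cpu;

		for_each_possible_cpu(cpu) {
			p = &per_cpu(boot_pageset, cpu);
			pageset_init(p);
			/* fail-safe values until the real pagesets are set up */
			p->pcp.high = 0;
			p->pcp.batch = 1;
		}
	}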
Thanks.