From: Hillf Danton <hdanton@sina.com>
To: "Ahmed S. Darwish"
Cc: LKML, Konstantin Khlebnikov, linux-mm@kvack.org, Hillf Danton
Subject: Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
Date: Wed, 20 May 2020 19:24:30 +0800
Message-Id: <20200520112430.9044-1-hdanton@sina.com>
In-Reply-To: <20200519214547.352050-3-a.darwish@linutronix.de>
References: <20200519214547.352050-1-a.darwish@linutronix.de>

On Tue, 19 May 2020 23:45:24 +0200 "Ahmed S. Darwish" wrote:
>
> Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
> implemented an optimization mechanism to exit the to-be-started LRU
> drain operation (name it A) if another drain operation *started and
> finished* while (A) was blocked on the LRU draining mutex.
>
> This was done through a seqcount latch, which is an abuse of its
> semantics:
>
> 1. Seqcount latching should be used for the purpose of switching
>    between two storage places with sequence protection to allow
>    interruptible, preemptible writer sections. The optimization
>    mechanism has absolutely nothing to do with that.
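Purely for illustration, not part of the patch: the latching that point 1
refers to keeps two copies of the data and uses the low bit of the sequence
count to steer readers to the copy that is not currently being rewritten.
A rough sketch along the lines of the kerneldoc example for
raw_write_seqcount_latch(); the type and function names below are made up:

#include <linux/seqlock.h>
#include <linux/types.h>

struct payload {
	u64 a;
	u64 b;
};

struct latched {
	seqcount_t	seq;		/* initialised with SEQCNT_ZERO() */
	struct payload	copies[2];
};

/* Writer: preemptible, never blocks readers; each copy is rewritten in turn. */
static void latched_update(struct latched *l, const struct payload *new)
{
	raw_write_seqcount_latch(&l->seq);	/* seq odd: readers now use copies[1] */
	l->copies[0] = *new;
	raw_write_seqcount_latch(&l->seq);	/* seq even: readers now use copies[0] */
	l->copies[1] = *new;
}

/* Reader: pick the copy not being rewritten; retry if it changed under us. */
static struct payload latched_read(struct latched *l)
{
	struct payload snap;
	unsigned int seq;

	do {
		seq = raw_read_seqcount_latch(&l->seq);
		snap = l->copies[seq & 1];
	} while (read_seqcount_retry(&l->seq, seq));

	return snap;
}

None of that two-copy machinery is needed by the draining optimization,
which is the point the changelog makes.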
>
> 2. The used raw_write_seqcount_latch() has two smp write memory
>    barriers to always insure one consistent storage place out of the
>    two storage places available. This extra smp_wmb() is redundant for
>    the optimization use case.
>
> Beside the API abuse, the semantics of a latch sequence counter was
> force fitted into the optimization. What was actually meant is to track
> generations of LRU draining operations, where "current lru draining
> generation = x" implies that all generations 0 < n <= x are already
> *scheduled* for draining.
>
> Remove the conceptually-inappropriate seqcount latch usage and manually
> implement the optimization using a counter and SMP memory barriers.
>
> Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@mail.gmail.com
> Signed-off-by: Ahmed S. Darwish
> ---
>  mm/swap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 47 insertions(+), 10 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index bf9a79fed62d..d6910eeed43d 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>   */
>  void lru_add_drain_all(void)
>  {
> -	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
> -	static DEFINE_MUTEX(lock);
> +	/*
> +	 * lru_drain_gen - Current generation of pages that could be in vectors
> +	 *
> +	 * (A) Definition: lru_drain_gen = x implies that all generations
> +	 *     0 < n <= x are already scheduled for draining.
> +	 *
> +	 * This is an optimization for the highly-contended use case where a
> +	 * user space workload keeps constantly generating a flow of pages
> +	 * for each CPU.
> +	 */
> +	static unsigned int lru_drain_gen;
>  	static struct cpumask has_work;
> -	int cpu, seq;
> +	static DEFINE_MUTEX(lock);
> +	int cpu, this_gen;
>
>  	/*
>  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>  	if (WARN_ON(!mm_percpu_wq))
>  		return;
>
> -	seq = raw_read_seqcount_latch(&seqcount);
> +	/*
> +	 * (B) Cache the LRU draining generation number
> +	 *
> +	 * smp_rmb() ensures that the counter is loaded before the mutex is
> +	 * taken. It pairs with the smp_wmb() inside the mutex critical section
> +	 * at (D).
> +	 */
> +	this_gen = READ_ONCE(lru_drain_gen);
> +	smp_rmb();
>
>  	mutex_lock(&lock);
>
>  	/*
> -	 * Piggyback on drain started and finished while we waited for lock:
> -	 * all pages pended at the time of our enter were drained from vectors.
> +	 * (C) Exit the draining operation if a newer generation, from another
> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>  	 */
> -	if (__read_seqcount_retry(&seqcount, seq))
> +	if (unlikely(this_gen != lru_drain_gen))
>  		goto done;
>
> -	raw_write_seqcount_latch(&seqcount);
> +	/*
> +	 * (D) Increment generation number
> +	 *
> +	 * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
> +	 * section.
> +	 *
> +	 * This pairing must be done here, before the for_each_online_cpu loop
> +	 * below which drains the page vectors.
> +	 *
> +	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
> +	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
> +	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
> +	 * along, adds some pages to its per-cpu vectors, then calls
> +	 * lru_add_drain_all().
> +	 *
> +	 * If the paired smp_wmb() below is done at any later step, e.g. after
> +	 * the loop, CPU #x will just exit at (C) and miss flushing out all of
> +	 * its added pages.
> +	 */
> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> +	smp_wmb();

Writer is inside mutex.
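For reference, stripped of the per-CPU work queueing, the scheme above boils
down to the shape below. This is only an illustrative condensation of the
quoted hunk; queue_and_flush_drain_works() is a made-up stand-in for the
for_each_online_cpu()/flush_work() loop:

#include <linux/compiler.h>
#include <linux/mutex.h>
#include <asm/barrier.h>

static unsigned int gen;		/* plays the role of lru_drain_gen */
static DEFINE_MUTEX(drain_lock);

/* Stand-in for queueing lru_add_drain_per_cpu() work and flushing it. */
static void queue_and_flush_drain_works(void)
{
}

static void drain_all_sketch(void)
{
	unsigned int this_gen;

	/* (B) sample the generation before taking the mutex */
	this_gen = READ_ONCE(gen);
	smp_rmb();

	mutex_lock(&drain_lock);

	/* (C) a newer generation already scheduled the drains we need */
	if (this_gen != gen) {
		mutex_unlock(&drain_lock);
		return;
	}

	/* (D) open a new generation before any vector is drained */
	WRITE_ONCE(gen, gen + 1);
	smp_wmb();

	queue_and_flush_drain_works();

	mutex_unlock(&drain_lock);
}

The reader side (B) runs before the mutex is acquired, while the writer side
(D) runs with the mutex held, which is what the remark above points at.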
>
>  	cpumask_clear(&has_work);
> -
>  	for_each_online_cpu(cpu) {
>  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>
> @@ -766,7 +803,7 @@
>  {
>  	lru_add_drain();
>  }
> -#endif
> +#endif /* CONFIG_SMP */
>
>  /**
>   * release_pages - batched put_page()
> --
> 2.20.1

The tiny diff below, inspired by this work, tries to take the mutex with
mutex_trylock() by switching the sequence counter's readers and writer
across the mutex boundary.

--- a/mm/swap.c
+++ b/mm/swap.c
@@ -714,10 +714,12 @@ static void lru_add_drain_per_cpu(struct
  */
 void lru_add_drain_all(void)
 {
-	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
+	static unsigned int lru_drain_gen;
 	static DEFINE_MUTEX(lock);
 	static struct cpumask has_work;
-	int cpu, seq;
+	int cpu;
+	int loop = 0;
+	unsigned int seq;

 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -726,18 +728,12 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;

-	seq = raw_read_seqcount_latch(&seqcount);
+	seq = lru_drain_gen++;
+	smp_mb();

-	mutex_lock(&lock);
-
-	/*
-	 * Piggyback on drain started and finished while we waited for lock:
-	 * all pages pended at the time of our enter were drained from vectors.
-	 */
-	if (__read_seqcount_retry(&seqcount, seq))
-		goto done;
-
-	raw_write_seqcount_latch(&seqcount);
+	if (!mutex_trylock(&lock))
+		return;
+more_work:

 	cpumask_clear(&has_work);

@@ -759,7 +755,11 @@ void lru_add_drain_all(void)
 	for_each_cpu(cpu, &has_work)
 		flush_work(&per_cpu(lru_add_drain_work, cpu));

-done:
+	smp_mb();
+	if (++seq != lru_drain_gen && loop++ < 128)
+		goto more_work;
+
+	smp_mb();
 	mutex_unlock(&lock);
 }
 #else
--