From: Kairui Song <ryncsn@gmail.com>
Date: Tue, 16 Jan 2024 01:09:11 +0800
Subject: Re: [PATCH v2 1/3] mm, lru_gen: batch update counters on aging
To: Wei Xu
Cc: linux-mm@kvack.org, Andrew Morton, Yu Zhao, Chris Li, Matthew Wilcox,
 linux-kernel@vger.kernel.org, Greg Thelen
References: <20240111183321.19984-1-ryncsn@gmail.com>
 <20240111183321.19984-2-ryncsn@gmail.com>

Kairui Song wrote on Mon, 15 Jan 2024 at 01:42:
>
> Wei Xu wrote on Sat, 13 Jan 2024 at 05:01:
> >
> > On Thu, Jan 11, 2024 at 10:33 AM Kairui Song wrote:
> > >
> > > From: Kairui Song
> > >
> > > When lru_gen is aging, it updates mm counters page by page, which
> > > causes higher overhead if aging happens frequently or there are
> > > a lot of pages in one generation getting moved. Optimize this by
> > > doing the counter updates in batches.
> > >
> > > Although most __mod_*_state functions have their own caches, the
> > > overhead is still observable.
> > >
> > > Tested in a 4G memcg on an EPYC 7K62 with:
> > >
> > >   memcached -u nobody -m 16384 -s /tmp/memcached.socket \
> > >     -a 0766 -t 16 -B binary &
> > >
> > >   memtier_benchmark -S /tmp/memcached.socket \
> > >     -P memcache_binary -n allkeys \
> > >     --key-minimum=1 --key-maximum=16000000 -d 1024 \
> > >     --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6
> > >
> > > Average result of 18 test runs:
> > >
> > > Before: 44017.78 Ops/sec
> > > After:  44687.08 Ops/sec (+1.5%)
> >
> > I see the same performance numbers quoted in all 3 patches. How much
> > performance improvement does this particular patch provide (the same
> > question for the other 2 patches)? If, as the cover letter says, most
> > of the performance benefit comes from patch 3 (prefetching), can we
> > just take that patch alone to avoid the extra complexity?
>
> Hi Wei,
>
> Indeed, these are two different optimization techniques. I can reorder
> the series so that prefetch comes first, since it should be more
> acceptable, and the other optimizations can come later, along with
> standalone numbers for the improvement from the batched operations.
>
> > > Signed-off-by: Kairui Song
> > > ---
> > >  mm/vmscan.c | 64 +++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 55 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 4f9c854ce6cc..185d53607c7e 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -3113,9 +3113,47 @@ static int folio_update_gen(struct folio *folio, int gen)
> > >         return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
> > >  }
> > >
> > > +/*
> > > + * Update LRU gen in batch for each lru_gen LRU list. The batch is limited to
> > > + * each gen / type / zone level LRU. The batch is applied after scanning of
> > > + * one LRU list finishes or is aborted.
> > > + */
> > > +struct gen_update_batch {
> > > +       int delta[MAX_NR_GENS];
> > > +};
> > > +
> > > +static void lru_gen_update_batch(struct lruvec *lruvec, int type, int zone,
> > > +                                struct gen_update_batch *batch)
> > > +{
> > > +       int gen;
> > > +       int promoted = 0;
> > > +       struct lru_gen_folio *lrugen = &lruvec->lrugen;
> > > +       enum lru_list lru = type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
> > > +
> > > +       for (gen = 0; gen < MAX_NR_GENS; gen++) {
> > > +               int delta = batch->delta[gen];
> > > +
> > > +               if (!delta)
> > > +                       continue;
> > > +
> > > +               WRITE_ONCE(lrugen->nr_pages[gen][type][zone],
> > > +                          lrugen->nr_pages[gen][type][zone] + delta);
> > > +
> > > +               if (lru_gen_is_active(lruvec, gen))
> > > +                       promoted += delta;
> > > +       }
> > > +
> > > +       if (promoted) {
> > > +               __update_lru_size(lruvec, lru, zone, -promoted);
> > > +               __update_lru_size(lruvec, lru + LRU_ACTIVE, zone, promoted);
> > > +       }
> > > +}
> > > +
> > >  /* protect pages accessed multiple times through file descriptors */
> > > -static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> > > +static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio,
> > > +                        bool reclaiming, struct gen_update_batch *batch)
> > >  {
> > > +       int delta = folio_nr_pages(folio);
> > >         int type = folio_is_file_lru(folio);
> > >         struct lru_gen_folio *lrugen = &lruvec->lrugen;
> > >         int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
> > > @@ -3138,7 +3176,8 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclai
> > >                 new_flags |= BIT(PG_reclaim);
> > >         } while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
> > >
> > > -       lru_gen_update_size(lruvec, folio, old_gen, new_gen);
> > > +       batch->delta[old_gen] -= delta;
> > > +       batch->delta[new_gen] += delta;
> > >
> > >         return new_gen;
> > >  }
> > > @@ -3672,6 +3711,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
> > >  {
> > >         int zone;
> > >         int remaining = MAX_LRU_BATCH;
> > > +       struct gen_update_batch batch = { };
> >
> > Can this batch variable be moved off the stack? We (Google) use a
> > much larger value for MAX_NR_GENS internally, and this large stack
> > allocation from "struct gen_update_batch batch" can significantly
> > increase the risk of stack overflow for our use cases.
>
> Thanks for the info.
>
> Do you have any suggestions about where we should put the batch info?
> I thought about merging it into lru_gen_mm_walk, but that depends on
> kzalloc and is not usable in the slow allocation path, so the overhead
> could be larger than the benefit in many cases.
>
> Not sure if we can use something like a preallocated per-CPU cache
> here to avoid all these issues.

Hi Wei,

On second thought, the batch is mostly used together with
folio_inc_gen, which means most pages are only being moved between two
gens (being protected or unreclaimable), so I think only one counter
int is needed in the batch. I'll update this patch and do some tests
based on this.
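For illustration, a rough sketch of that single-counter idea (untested,
not posted code; it assumes every folio batched during one list scan
moves between the same old/new gen pair, which still needs to be
verified against all folio_inc_gen() callers):

  /* One signed counter replaces the MAX_NR_GENS-sized array, so the
   * stack footprint stays constant no matter how large MAX_NR_GENS is. */
  struct gen_update_batch {
          int delta;      /* net pages moved from the old gen to the new gen */
  };

  /* In folio_inc_gen(), instead of updating a per-gen array: */
  batch->delta += folio_nr_pages(folio);

  /* When applying the batch after scanning one list, with old_gen and
   * new_gen fixed for the whole scan (plus the same active/inactive
   * size adjustment as in the posted patch): */
  WRITE_ONCE(lrugen->nr_pages[old_gen][type][zone],
             lrugen->nr_pages[old_gen][type][zone] - batch->delta);
  WRITE_ONCE(lrugen->nr_pages[new_gen][type][zone],
             lrugen->nr_pages[new_gen][type][zone] + batch->delta);

This shape would also address the stack concern above, since the batch
no longer grows with MAX_NR_GENS.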