From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Li <chrisl@kernel.org>
Date: Sun, 15 Dec 2024 07:04:55 -0800
Subject: Re: [PATCH v2 3/3] mm, swap_cgroup: remove global swap cgroup lock
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Hugh Dickins, "Huang, Ying",
 Yosry Ahmed, Roman Gushchin, Shakeel Butt, Johannes Weiner, Barry Song,
 Michal Hocko, linux-kernel@vger.kernel.org
References: <20241210092805.87281-1-ryncsn@gmail.com> <20241210092805.87281-4-ryncsn@gmail.com>
On Sat, Dec 14, 2024 at 11:48 AM Kairui Song wrote:
>
> On Sun, Dec 15, 2024 at 12:07 AM Chris Li wrote:
> >
> > On Tue, Dec 10, 2024 at 1:29 AM Kairui Song wrote:
> > >
> > > From: Kairui Song
> > >
> > > commit e9e58a4ec3b1 ("memcg: avoid use cmpxchg in swap cgroup maintenance")
> > > replaced the cmpxchg/xchg with a global irq spinlock because some archs
> > > don't support 2-byte cmpxchg/xchg. Clearly this won't scale well.
> > >
> > > And as commented in swap_cgroup.c, this lock is not needed for map
> > > synchronization.
> > >
> > > Emulating a 2-byte xchg with atomic cmpxchg isn't hard, so implement
> > > it to get rid of this lock. Two helpers are introduced for doing so,
> > > and they can easily be dropped once a generic 2-byte xchg is supported.
> > >
> > > Testing using a 64G brd and building the kernel with make -j96 in a
> > > 1.5G memory cgroup using 4k folios showed the improvement below
> > > (10 test runs):
> > >
> > > Before this series:
> > > Sys time: 10809.46 (stdev 80.831491)
> > > Real time: 171.41 (stdev 1.239894)
> > >
> > > After this commit:
> > > Sys time: 9621.26 (stdev 34.620000), -10.42%
> > > Real time: 160.00 (stdev 0.497814), -6.57%
> > >
> > > With 64k folios and a 2G memcg:
> > > Before this series:
> > > Sys time: 8231.99 (stdev 30.030994)
> > > Real time: 143.57 (stdev 0.577394)
> > >
> > > After this commit:
> > > Sys time: 7403.47 (stdev 6.270000), -10.06%
> > > Real time: 135.18 (stdev 0.605000), -5.84%
> > >
> > > Sequential swapout of 8G 64k zero folios with madvise (24 test runs):
> > > Before this series:
> > > 5461409.12 us (stdev 183957.827084)
> > >
> > > After this commit:
> > > 5420447.26 us (stdev 196419.240317)
> > >
> > > Sequential swapin of 8G 4k zero folios (24 test runs):
> > > Before this series:
> > > 19736958.916667 us (stdev 189027.246676)
> > >
> > > After this commit:
> > > 19662182.629630 us (stdev 172717.640614)
> > >
> > > Performance is better, or at least not worse, for all tests above.
> > >
> > > Signed-off-by: Kairui Song
> > > ---
> > >  mm/swap_cgroup.c | 73 +++++++++++++++++++++++++++++--------------------
> > >  1 file changed, 45 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
> > > index 1770b076f6b7..a0a8547dc85d 100644
> > > --- a/mm/swap_cgroup.c
> > > +++ b/mm/swap_cgroup.c
> > > @@ -7,19 +7,20 @@
> > >
> > >  static DEFINE_MUTEX(swap_cgroup_mutex);
> > >
> > > +/* Pack two cgroup id (short) of two entries in one swap_cgroup (atomic_t) */
> >
> > Might not be two shorts if the atomic_t is more than 4 bytes. The
> > assumption here is that short is 2 bytes and atomic_t is 4 bytes. It
> > is hard to conclude that this is the case for all architectures.
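[Editor's note: the 2-byte xchg emulation the commit message describes can
be sketched with C11 atomics in userspace. This is a minimal sketch under
stated assumptions, not the kernel code: `xchg_short`, the 32-bit word, and
the half-word layout are illustrative; the actual patch operates on an
atomic_t via the kernel's atomic cmpxchg primitives.]

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Emulate a 16-bit xchg on one half of a 32-bit atomic word using a
 * compare-exchange loop: rebuild the word with the target half replaced
 * and retry until no concurrent writer raced with us.
 */
static uint16_t xchg_short(_Atomic uint32_t *word, unsigned idx,
                           uint16_t new_id)
{
    unsigned shift = (idx & 1) * 16;        /* which 16-bit half: 0 or 16 */
    uint32_t old = atomic_load(word);
    uint32_t desired;

    do {
        desired = (old & ~((uint32_t)0xffff << shift)) |
                  ((uint32_t)new_id << shift);
        /* on failure, 'old' is refreshed with the current word value */
    } while (!atomic_compare_exchange_weak(word, &old, desired));

    return (old >> shift) & 0xffff;         /* previous id in that half */
}
```

The loop only succeeds when the whole 32-bit word is unchanged, so a racing
update to the *other* half simply forces one retry rather than being lost.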
> >
> > > +#define ID_PER_SC (sizeof(atomic_t) / sizeof(unsigned short))
> >
> > You should use "sizeof(struct swap_cgroup) / sizeof(unsigned short)",
> > or get rid of struct swap_cgroup and directly use atomic_t.
> >
> > > +#define ID_SHIFT (BITS_PER_TYPE(unsigned short))
> > > +#define ID_MASK (BIT(ID_SHIFT) - 1)
> > >  struct swap_cgroup {
> > > -       unsigned short id;
> > > +       atomic_t ids;
> >
> > You use struct swap_cgroup and atomic_t, which assumes no padding is
> > added to the struct. You might want to build an assert on
> > sizeof(atomic_t) == sizeof(struct swap_cgroup).
>
> Good idea, asserting that struct swap_cgroup is an atomic_t ensures no
> unexpected behaviour.
>
> > >  };
> > >
> > >  struct swap_cgroup_ctrl {
> > >         struct swap_cgroup *map;
> > > -       spinlock_t lock;
> > >  };
> > >
> > >  static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
> > >
> > > -#define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
> > > -
> > >  /*
> > >   * SwapCgroup implements "lookup" and "exchange" operations.
> > >   * In typical usage, this swap_cgroup is accessed via memcg's charge/uncharge
> > > @@ -30,19 +31,32 @@ static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
> > >   *   SwapCache(and its swp_entry) is under lock.
> > >   * - When called via swap_free(), there is no user of this entry and no race.
> > >   *   Then, we don't need lock around "exchange".
> > > - *
> > > - * TODO: we can push these buffers out to HIGHMEM.
> > >   */
> > > -static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
> > > -                                       struct swap_cgroup_ctrl **ctrlp)
> > > +static unsigned short __swap_cgroup_id_lookup(struct swap_cgroup *map,
> > > +                                             pgoff_t offset)
> > >  {
> > > -       pgoff_t offset = swp_offset(ent);
> > > -       struct swap_cgroup_ctrl *ctrl;
> > > +       unsigned int shift = (offset & 1) ? 0 : ID_SHIFT;
> >
> > Might not want to assume ID_PER_SC is two. If some architecture's
> > atomic_t is 64 bits then that code will break.
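[Editor's note: the size assertion and the generic index math suggested in
the review above could look like the sketch below in plain C. It is an
illustrative userspace analog: `id_shift`/`id_lookup` and the uint32_t map
are assumptions, and in the kernel BUILD_BUG_ON would play the role of
_Static_assert. Note this sketch puts the id for an even offset in the low
bits, the opposite of the patch's mapping; either choice works as long as
lookup and xchg agree.]

```c
#include <stdint.h>
#include <stddef.h>

#define ID_PER_SC (sizeof(uint32_t) / sizeof(uint16_t)) /* ids per word */
#define ID_SHIFT  16                                    /* bits per id  */
#define ID_MASK   ((1u << ID_SHIFT) - 1)

/* The assert the review asks for: the packed word must have no padding. */
_Static_assert(sizeof(uint32_t) == ID_PER_SC * sizeof(uint16_t),
               "ids must pack exactly into one word");

/* Generic shift computation that does not hard-code ID_PER_SC == 2. */
static unsigned id_shift(size_t offset)
{
    return (offset % ID_PER_SC) * ID_SHIFT;
}

static uint16_t id_lookup(const uint32_t *map, size_t offset)
{
    return (map[offset / ID_PER_SC] >> id_shift(offset)) & ID_MASK;
}
```

With `offset % ID_PER_SC` the code keeps working even if the word ever
holds more than two ids; for a power-of-two ID_PER_SC the compiler lowers
the modulo to the same masking instruction as `offset & 1`.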
>
> Good idea, atomic_t is defined as an int; not sure if there are any
> strange archs that change the size though, more robust code is always
> better.
>
> Can change this to (offset % ID_PER_SC) instead; the generated machine
> code should still be the same for most archs.
>
> > > +       unsigned int old_ids = atomic_read(&map[offset / ID_PER_SC].ids);
> >
> > Here it assumes sizeof(unsigned int) == sizeof(atomic_t). Again, some
> > strange architecture might break it. Better use the unsigned version
> > of atomic_t;
> >
> > > -       ctrl = &swap_cgroup_ctrl[swp_type(ent)];
> > > -       if (ctrlp)
> > > -               *ctrlp = ctrl;
> > > -       return &ctrl->map[offset];
> > > +       return (old_ids & (ID_MASK << shift)) >> shift;
> >
> > Can be simplified as (old_ids >> shift) & ID_MASK. You might want to
> > double check that.
> >
> > > +}
> > > +
> > > +static unsigned short __swap_cgroup_id_xchg(struct swap_cgroup *map,
> > > +                                           pgoff_t offset,
> > > +                                           unsigned short new_id)
> > > +{
> > > +       unsigned short old_id;
> > > +       unsigned int shift = (offset & 1) ? 0 : ID_SHIFT;
> >
> > Same here, it assumes ID_PER_SC is 2.
> >
> > > +       struct swap_cgroup *sc = &map[offset / ID_PER_SC];
> > > +       unsigned int new_ids, old_ids = atomic_read(&sc->ids);
> >
> > Again it assumes sizeof(unsigned int) == sizeof(atomic_t).
>
> I think this should be OK? The document says "atomic_t, atomic_long_t
> and atomic64_t use int, long and s64 respectively".
>
> Could change this with some wrapper but I think it's unnecessary.

Ack.

> > > +
> > > +       do {
> > > +               old_id = (old_ids & (ID_MASK << shift)) >> shift;
> >
> > Can be simplified:
> >
> > old_id = (old_ids >> shift) & ID_MASK;
> >
> > > +               new_ids = (old_ids & ~(ID_MASK << shift));
> > > +               new_ids |= ((unsigned int)new_id) << shift;
> >
> > new_ids |= (atomic_t) new_id << shift;
>
> atomic_t doesn't work with bit operations, it must be an arithmetic
> type, so here I think it has to stay like this.

Ack

Chris
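[Editor's note: the simplification suggested in the review — masking after
shifting instead of before — is indeed equivalent for unsigned values,
since shifting an unsigned value right fills with zeros. A throwaway check
with illustrative helper names:]

```c
#include <stdint.h>

#define ID_SHIFT 16
#define ID_MASK  ((1u << ID_SHIFT) - 1)

/* The form used in the patch: mask the half in place, then shift down. */
static uint16_t extract_mask_first(uint32_t ids, unsigned shift)
{
    return (ids & (ID_MASK << shift)) >> shift;
}

/* The suggested form: shift down, then mask. Same result for unsigned
 * arithmetic, with one fewer shifted-mask constant to materialize. */
static uint16_t extract_shift_first(uint32_t ids, unsigned shift)
{
    return (ids >> shift) & ID_MASK;
}
```

Both forms isolate the same 16 bits; the shift-first form also mirrors the
lookup helper, so the patch could use one expression in both places.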