From: Kairui Song <ryncsn@gmail.com>
Date: Tue, 10 Dec 2024 15:05:31 +0800
Subject: Re: [PATCH 4/4] mm, swap_cgroup: remove global swap cgroup lock
To: Chris Li
Cc: linux-mm@kvack.org, Andrew Morton, Hugh Dickins, "Huang, Ying",
 Yosry Ahmed, Roman Gushchin, Shakeel Butt, Johannes Weiner, Barry Song,
 Michal Hocko, linux-kernel@vger.kernel.org
References: <20241202184154.19321-1-ryncsn@gmail.com>
 <20241202184154.19321-5-ryncsn@gmail.com>
On Thu, Dec 5, 2024 at 3:40 AM Chris Li wrote:
>
> On Mon, Dec 2, 2024 at 10:42 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <ryncsn@gmail.com>
> >
> > commit e9e58a4ec3b1 ("memcg: avoid use cmpxchg in swap cgroup maintainance")
> > replaced the cmpxchg/xchg with a global irq spinlock because some archs
> > don't support 2-byte cmpxchg/xchg. Clearly this won't scale well.
> >
> > And as commented in swap_cgroup.c, this lock is not needed for map
> > synchronization.
> >
> > Emulating 2-byte cmpxchg/xchg with atomics isn't hard, so implement
> > it to get rid of this lock.
> >
> > Testing with a 64G brd device, building the kernel with make -j96 in a
> > 1.5G memory cgroup using 4k folios, showed the improvement below
> > (10 test runs):
> >
> > Before this series:
> > Sys time: 10730.08 (stdev 49.030728)
> > Real time: 171.03 (stdev 0.850355)
> >
> > After this commit:
> > Sys time: 9612.24 (stdev 66.310789), -10.42%
> > Real time: 159.78 (stdev 0.577193), -6.57%
> >
> > With 64k folios and a 2G memcg:
> > Before this series:
> > Sys time: 7626.77 (stdev 43.545517)
> > Real time: 136.22 (stdev 1.265544)
> >
> > After this commit:
> > Sys time: 6936.03 (stdev 39.996280), -9.06%
> > Real time: 129.65 (stdev 0.880039), -4.82%
> >
> > Sequential swapout of 8G 4k zero folios (24 test runs):
> > Before this series:
> > 5461409.12 us (stdev 183957.827084)
> >
> > After this commit:
> > 5420447.26 us (stdev 196419.240317)
> >
> > Sequential swapin of 8G 4k zero folios (24 test runs):
> > Before this series:
> > 19736958.916667 us (stdev 189027.246676)
> >
> > After this commit:
> > 19662182.629630 us (stdev 172717.640614)
> >
> > Performance is better, or at least not worse, for all tests above.
> >
> > Signed-off-by: Kairui Song <ryncsn@gmail.com>
> > ---
> >  mm/swap_cgroup.c | 56 +++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 41 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
> > index a76afdc3666a..028f5e6be3f0 100644
> > --- a/mm/swap_cgroup.c
> > +++ b/mm/swap_cgroup.c
> > @@ -5,6 +5,15 @@
> >
> >  #include <linux/swapops.h> /* depends on mm.h include */
> >
> > +#define ID_PER_UNIT (sizeof(atomic_t) / sizeof(unsigned short))
>
> You might want to add a compile-time assert that (sizeof(atomic_t)
> % sizeof(unsigned short)) is zero.

Could not hurt.

> > +struct swap_cgroup_unit {
> > +        union {
> > +                int raw;
> > +                atomic_t val;
> > +                unsigned short __id[ID_PER_UNIT];
> > +        };
> > +};
>
> I suggest just getting rid of this complicated struct/union and using
> bit shifts and masks to get the u16 out of the atomic_t.

Good suggestion.
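Something like this, I think (an untested sketch only, against an
atomic_t map as you suggest; __swap_cgroup_id_xchg is a placeholder
name, not what this patch uses, and BITS_PER_TYPE / BUILD_BUG_ON are
the generic helpers from <linux/bits.h> and <linux/build_bug.h>):

static unsigned short __swap_cgroup_id_xchg(atomic_t *map, pgoff_t offset,
                                            unsigned short new_id)
{
        atomic_t *unit = &map[offset / ID_PER_UNIT];
        unsigned int shift = (offset % ID_PER_UNIT) *
                             BITS_PER_TYPE(unsigned short);
        int new, old = atomic_read(unit);

        /* The compile-time check suggested above. */
        BUILD_BUG_ON(sizeof(atomic_t) % sizeof(unsigned short));

        do {
                /* Replace only our 16 bits, keep the neighbour's ID intact. */
                unsigned int tmp = (unsigned int)old;

                tmp &= ~(0xffffu << shift);
                tmp |= (unsigned int)new_id << shift;
                new = (int)tmp;
        } while (!atomic_try_cmpxchg(unit, &old, new));

        return ((unsigned int)old >> shift) & 0xffff;
}

All shifts happen on the loaded value in registers, so no loads and
stores through a temporary, and no dependence on how the u16s are laid
out in memory.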
> > +
> >  static DEFINE_MUTEX(swap_cgroup_mutex);
> >
> >  struct swap_cgroup {
> > @@ -12,8 +21,10 @@ struct swap_cgroup {
> >  };
> >
> >  struct swap_cgroup_ctrl {
> > -        unsigned short *map;
> > -        spinlock_t lock;
> > +        union {
> > +                struct swap_cgroup_unit *units;
> > +                unsigned short *map;
>
> You really shouldn't access the map as an "unsigned short" array;
> therefore, I suggest changing the array pointer to "atomic_t".
>
> > +        };
> >  };
> >
> >  static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
> > @@ -31,6 +42,24 @@ static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
> >   *
> >   * TODO: we can push these buffers out to HIGHMEM.
> >   */
> > +static unsigned short __swap_cgroup_xchg(void *map,
> > +                                         pgoff_t offset,
> > +                                         unsigned int new_id)
> > +{
> > +        unsigned int old_id;
> > +        struct swap_cgroup_unit *units = map;
> > +        struct swap_cgroup_unit *unit = &units[offset / ID_PER_UNIT];
> > +        struct swap_cgroup_unit new, old = { .raw = atomic_read(&unit->val) };
> > +
> > +        do {
> > +                new.raw = old.raw;
> > +                old_id = old.__id[offset % ID_PER_UNIT];
> > +                new.__id[offset % ID_PER_UNIT] = new_id;
> > +        } while (!atomic_try_cmpxchg(&unit->val, &old.raw, new.raw));
>
> I suggest just calculating the atomic_t offset (offset / ID_PER_UNIT)
> and getting the address of the atomic_t, then using the mask and shift
> to construct the new atomic_t value. It is likely to generate better
> code: you don't want the compiler emitting memory loads and stores to
> construct the temporary new value. I haven't checked the generated
> machine code, but I suspect the compiler is not smart enough to turn
> those into register shifts here, which is what you really want.
>
> > +
> > +        return old_id;
> > +}
> > +
> >  /**
> >   * swap_cgroup_record - record mem_cgroup for a set of swap entries
> >   * @ent: the first swap entry to be recorded into
> > @@ -44,22 +73,19 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
> >                                    unsigned int nr_ents)
> >  {
> >          struct swap_cgroup_ctrl *ctrl;
> > -        unsigned short *map;
> > -        unsigned short old;
> > -        unsigned long flags;
> >          pgoff_t offset = swp_offset(ent);
> >          pgoff_t end = offset + nr_ents;
> > +        unsigned short old, iter;
> > +        unsigned short *map;
>
> Make it an atomic_t pointer here as well.
>
> >
> >          ctrl = &swap_cgroup_ctrl[swp_type(ent)];
> >          map = ctrl->map;
> >
> > -        spin_lock_irqsave(&ctrl->lock, flags);
> > -        old = map[offset];
> > +        old = READ_ONCE(map[offset]);
>
> Ah, you shouldn't read the u16 directly. That runs into the endianness
> problem of how the u16 is arranged inside the atomic_t. You should do
> an atomic read and then shift the bits out, so you don't have the
> endian problem. It is a bad idea to mix atomic updates with reads from
> the middle of the atomic location.

Good suggestion. Converting the whole map into atomic_t and doing the
access/xchg with bit shifts is also OK; mixing atomic with other types
may indeed lead to misuse.

> Chris
>
> >          do {
> > -                VM_BUG_ON(map[offset] != old);
> > -                map[offset] = id;
> > +                iter = __swap_cgroup_xchg(map, offset, id);
> > +                VM_BUG_ON(iter != old);
> >          } while (++offset != end);
> > -        spin_unlock_irqrestore(&ctrl->lock, flags);
> >
> >          return old;
> >  }
> > @@ -85,20 +111,20 @@ unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
> >
> >  int swap_cgroup_swapon(int type, unsigned long max_pages)
> >  {
> > -        void *map;
> > +        struct swap_cgroup_unit *units;
> >          struct swap_cgroup_ctrl *ctrl;
> >
> >          if (mem_cgroup_disabled())
> >                  return 0;
> >
> > -        map = vzalloc(max_pages * sizeof(unsigned short));
> > -        if (!map)
> > +        units = vzalloc(DIV_ROUND_UP(max_pages, ID_PER_UNIT) *
> > +                        sizeof(struct swap_cgroup_unit));
> > +        if (!units)
> >                  goto nomem;
> >
> >          ctrl = &swap_cgroup_ctrl[type];
> >          mutex_lock(&swap_cgroup_mutex);
> > -        ctrl->map = map;
> > -        spin_lock_init(&ctrl->lock);
> > +        ctrl->units = units;
> >          mutex_unlock(&swap_cgroup_mutex);
> >
> >          return 0;
> > --
> > 2.47.0
> >
>
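PS: to spell out the endian point on the read side -- with the whole
map converted to atomic_t, a lookup is just an atomic read plus a value
shift (again an untested sketch; __swap_cgroup_id_lookup is a
placeholder name, not what this patch uses):

static unsigned short __swap_cgroup_id_lookup(atomic_t *map, pgoff_t offset)
{
        unsigned int shift = (offset % ID_PER_UNIT) *
                             BITS_PER_TYPE(unsigned short);
        /* Read the whole unit atomically, then pick our 16 bits out. */
        unsigned int ids = atomic_read(&map[offset / ID_PER_UNIT]);

        return (ids >> shift) & 0xffff;
}

The shift happens on the loaded value rather than through a u16 pointer
into the middle of the atomic_t, so it behaves the same on little- and
big-endian machines.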