From: Yosry Ahmed
Date: Mon, 2 Dec 2024 11:28:10 -0800
Subject: Re: [PATCH 4/4] mm, swap_cgroup: remove global swap cgroup lock
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Chris Li, Hugh Dickins,
    "Huang, Ying", Roman Gushchin, Shakeel Butt, Johannes Weiner,
    Barry Song, Michal Hocko, linux-kernel@vger.kernel.org
In-Reply-To: <20241202184154.19321-5-ryncsn@gmail.com>
References: <20241202184154.19321-1-ryncsn@gmail.com> <20241202184154.19321-5-ryncsn@gmail.com>
On Mon, Dec 2, 2024 at 10:42 AM Kairui Song wrote:
>
> From: Kairui Song
>
> commit e9e58a4ec3b1 ("memcg: avoid use cmpxchg in swap cgroup maintainance")
> replaced the cmpxchg/xchg with a global irq spinlock because some archs
> don't support 2-byte cmpxchg/xchg. Clearly this won't scale well.
>
> And as commented in swap_cgroup.c, this lock is not needed for map
> synchronization.
>
> Emulating 2-byte cmpxchg/xchg with atomic operations isn't hard, so
> implement it to get rid of this lock.
>
> Testing with a 64G brd device, building the kernel with make -j96 in a
> 1.5G memory cgroup using 4k folios, showed the improvement below
> (10 test runs):
>
> Before this series:
> Sys time: 10730.08 (stdev 49.030728)
> Real time: 171.03 (stdev 0.850355)
>
> After this commit:
> Sys time: 9612.24 (stdev 66.310789), -10.42%
> Real time: 159.78 (stdev 0.577193), -6.57%
>
> With 64k folios and a 2G memcg:
> Before this series:
> Sys time: 7626.77 (stdev 43.545517)
> Real time: 136.22 (stdev 1.265544)
>
> After this commit:
> Sys time: 6936.03 (stdev 39.996280), -9.06%
> Real time: 129.65 (stdev 0.880039), -4.82%
>
> Sequential swapout of 8G of 4k zero folios (24 test runs):
> Before this series:
> 5461409.12 us (stdev 183957.827084)
>
> After this commit:
> 5420447.26 us (stdev 196419.240317)
>
> Sequential swapin of 8G of 4k zero folios (24 test runs):
> Before this series:
> 19736958.916667 us (stdev 189027.246676)
>
> After this commit:
> 19662182.629630 us (stdev 172717.640614)
>
> Performance is better, or at least no worse, for all tests above.
>
> Signed-off-by: Kairui Song
> ---
>  mm/swap_cgroup.c | 56 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 41 insertions(+), 15 deletions(-)
>
> diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
> index a76afdc3666a..028f5e6be3f0 100644
> --- a/mm/swap_cgroup.c
> +++ b/mm/swap_cgroup.c
> @@ -5,6 +5,15 @@
>
>  #include /* depends on mm.h include */
>
> +#define ID_PER_UNIT (sizeof(atomic_t) / sizeof(unsigned short))
> +struct swap_cgroup_unit {
> +        union {
> +                int raw;
> +                atomic_t val;
> +                unsigned short __id[ID_PER_UNIT];
> +        };
> +};

This doubles the size of the per-entry data, right? Why do we need this?
I thought cmpxchg() supports multiple sizes and will already do the
emulation for us.

> +
>  static DEFINE_MUTEX(swap_cgroup_mutex);
>
>  struct swap_cgroup {
> @@ -12,8 +21,10 @@ struct swap_cgroup {
>  };
>
>  struct swap_cgroup_ctrl {
> -        unsigned short *map;
> -        spinlock_t lock;
> +        union {
> +                struct swap_cgroup_unit *units;
> +                unsigned short *map;
> +        };
>  };
>
>  static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
> @@ -31,6 +42,24 @@ static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
>   *
>   * TODO: we can push these buffers out to HIGHMEM.
>   */
> +static unsigned short __swap_cgroup_xchg(void *map,
> +                                         pgoff_t offset,
> +                                         unsigned int new_id)
> +{
> +        unsigned int old_id;
> +        struct swap_cgroup_unit *units = map;
> +        struct swap_cgroup_unit *unit = &units[offset / ID_PER_UNIT];
> +        struct swap_cgroup_unit new, old = { .raw = atomic_read(&unit->val) };
> +
> +        do {
> +                new.raw = old.raw;
> +                old_id = old.__id[offset % ID_PER_UNIT];
> +                new.__id[offset % ID_PER_UNIT] = new_id;
> +        } while (!atomic_try_cmpxchg(&unit->val, &old.raw, new.raw));
> +
> +        return old_id;
> +}
> +
>  /**
>   * swap_cgroup_record - record mem_cgroup for a set of swap entries
>   * @ent: the first swap entry to be recorded into
> @@ -44,22 +73,19 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
>                            unsigned int nr_ents)
>  {
>          struct swap_cgroup_ctrl *ctrl;
> -        unsigned short *map;
> -        unsigned short old;
> -        unsigned long flags;
>          pgoff_t offset = swp_offset(ent);
>          pgoff_t end = offset + nr_ents;
> +        unsigned short old, iter;
> +        unsigned short *map;
>
>          ctrl = &swap_cgroup_ctrl[swp_type(ent)];
>          map = ctrl->map;
>
> -        spin_lock_irqsave(&ctrl->lock, flags);
> -        old = map[offset];
> +        old = READ_ONCE(map[offset]);
>          do {
> -                VM_BUG_ON(map[offset] != old);
> -                map[offset] = id;
> +                iter = __swap_cgroup_xchg(map, offset, id);
> +                VM_BUG_ON(iter != old);
>          } while (++offset != end);
> -        spin_unlock_irqrestore(&ctrl->lock, flags);
>
>          return old;
>  }
> @@ -85,20 +111,20 @@ unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
>
>  int swap_cgroup_swapon(int type, unsigned long max_pages)
>  {
> -        void *map;
> +        struct swap_cgroup_unit *units;
>          struct swap_cgroup_ctrl *ctrl;
>
>          if (mem_cgroup_disabled())
>                  return 0;
>
> -        map = vzalloc(max_pages * sizeof(unsigned short));
> -        if (!map)
> +        units = vzalloc(DIV_ROUND_UP(max_pages, ID_PER_UNIT) *
> +                        sizeof(struct swap_cgroup_unit));
> +        if (!units)
>                  goto nomem;
>
>          ctrl = &swap_cgroup_ctrl[type];
>          mutex_lock(&swap_cgroup_mutex);
> -        ctrl->map = map;
> -        spin_lock_init(&ctrl->lock);
> +        ctrl->units = units;
>          mutex_unlock(&swap_cgroup_mutex);
>
>          return 0;
> --
> 2.47.0
>
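
A minimal standalone sketch of the 2-byte xchg emulation being discussed,
assuming userspace C11 atomics rather than the kernel's atomic_t API; the
names here (union unit, xchg_id, IDS_PER_UNIT) are invented for the example
and this is not the patch's code. The idea is to exchange a 2-byte slot by
looping a compare-and-swap over the 4-byte word that contains it, so a
racing update to the neighbouring slot only costs a retry.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define IDS_PER_UNIT (sizeof(uint32_t) / sizeof(uint16_t))

/* One 4-byte CAS target holding two 2-byte id slots. */
union unit {
        _Atomic uint32_t val;       /* what the CAS operates on       */
        uint32_t raw;               /* plain view of the same 4 bytes */
        uint16_t id[IDS_PER_UNIT];  /* the packed 2-byte slots        */
};

/* Atomically replace the 2-byte id at @offset, returning the old id. */
static uint16_t xchg_id(union unit *units, size_t offset, uint16_t new_id)
{
        union unit *unit = &units[offset / IDS_PER_UNIT];
        union unit old, new;
        uint16_t old_id;

        old.raw = atomic_load(&unit->val);
        do {
                new.raw = old.raw;
                old_id = old.id[offset % IDS_PER_UNIT];
                new.id[offset % IDS_PER_UNIT] = new_id;
                /* On failure, old.raw is refreshed with the current
                 * word, so a concurrent change to the neighbouring
                 * slot is preserved and only costs a retry. */
        } while (!atomic_compare_exchange_weak(&unit->val, &old.raw, new.raw));

        return old_id;
}

int main(void)
{
        union unit map[2] = { { .raw = 0 } };

        printf("%u\n", (unsigned)xchg_id(map, 1, 42)); /* prints 0  */
        printf("%u\n", (unsigned)xchg_id(map, 1, 7));  /* prints 42 */
        return 0;
}

Built with any C11 compiler (e.g. cc -std=c11), the two printf calls show
the previous id being returned on each exchange.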