From: Nhat Pham
Date: Tue, 8 Apr 2025 08:20:49 -0700
Subject: Re: [RFC PATCH 00/14] Virtual Swap Space
To: Usama Arif
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hannes@cmpxchg.org,
 hughd@google.com, yosry.ahmed@linux.dev, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
 len.brown@intel.com, chengming.zhou@linux.dev, kasong@tencent.com,
 chrisl@kernel.org, huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
 viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
 lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu, pavel@kernel.org,
 kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-pm@vger.kernel.org
References: <20250407234223.1059191-1-nphamcs@gmail.com>
 <983965b6-2262-4f72-a672-39085dcdaa3c@gmail.com>
In-Reply-To: <983965b6-2262-4f72-a672-39085dcdaa3c@gmail.com>

On Tue, Apr 8, 2025 at 6:04 AM Usama Arif wrote:
>
> On 08/04/2025 00:42, Nhat Pham wrote:
> >
> > V. Benchmarking
> >
> > As a proof of concept, I ran the prototype through some simple
> > benchmarks:
> >
> > 1. usemem: 16 threads, 2G each, memory.max = 16G
> >
> > I benchmarked the following usemem command:
> >
> > time usemem --init-time -w -O -s 10 -n 16 2g
> >
> > Baseline:
> > real: 33.96s
> > user: 25.31s
> > sys: 341.09s
> > average throughput: 111295.45 KB/s
> > average free time: 2079258.68 usecs
> >
> > New Design:
> > real: 35.87s
> > user: 25.15s
> > sys: 373.01s
> > average throughput: 106965.46 KB/s
> > average free time: 3192465.62 usecs
> >
> > To root cause this regression, I ran perf on the usemem program, as
> > well as on the following stress-ng program:
> >
> > perf record -ag -e cycles -G perf_cg -- ./stress-ng/stress-ng --pageswap $(nproc) --pageswap-ops 100000
> >
> > and observed the (predicted) increase in lock contention on swap cache
> > accesses. This regression is alleviated if I put together the
> > following hack: limit the virtual swap space to a sufficient size for
> > the benchmark, range partition the swap-related data structures (swap
> > cache, zswap tree, etc.) based on the limit, and distribute the
> > allocation of virtual swap slots among these partitions (on a per-CPU
> > basis):
> >
> > real: 34.94s
> > user: 25.28s
> > sys: 360.25s
> > average throughput: 108181.15 KB/s
> > average free time: 2680890.24 usecs
> >
> > As mentioned above, I will implement proper dynamic swap range
> > partitioning in a follow-up work.
> >
> > 2. Kernel building: zswap enabled, 52 workers (one per processor),
> > memory.max = 3G.
> >
> > Baseline:
> > real: 183.55s
> > user: 5119.01s
> > sys: 655.16s
> >
> > New Design:
> > real: mean: 184.5s
> > user: mean: 5117.4s
> > sys: mean: 695.23s
> >
> > New Design (Static Partition)
> > real: 183.95s
> > user: 5119.29s
> > sys: 664.24s
> >
>
> Hi Nhat,
>
> Thanks for the patches! I have glanced over a couple of them, but this was the main question that came to my mind.
>
> Just wanted to check if you had a look at the memory regression during these benchmarks?
>
> Also what is sizeof(swp_desc)? Maybe we can calculate the memory overhead as sizeof(swp_desc) * swap size/PAGE_SIZE?

Yeah, it's pretty big right now (120 bytes). I haven't done any space
optimization yet - I basically listed out all the required
information, and added one field for each of them. A couple of
optimizations I have in mind:

1. Merge swap_count and in_swapcache (suggested by Yosry).
2. Unionize the rcu field with other fields, because the rcu head is only
needed for the free paths (suggested by Shakeel for a different
context, but should be applicable here). Or maybe just remove it and
free the swap descriptors in-context.
3. The type field is really only 2 bits - might be able to squeeze it
into one of the other fields as well.
4. The lock field might not be needed. I think the in_swapcache bit is
already used as a form of "backing storage pinning" mechanism, which
should allow pinners exclusive rights to the backing state.

etc. etc.

The code will get uglier though, so I wanna at least send out one
version with everything separate for clarity's sake, before optimizing
them away :)

>
> For a 64G swap that is filled with private anon pages, the overhead in MB might be (sizeof(swp_desc) in bytes * 16M) - 16M (zerobitmap) - 16M*8 (swap map)?

That is true. I will note, however, that in the past the overhead was
static (i.e. it is incurred no matter how much swap space you are
using). In fact, you often have to overprovision for swap, so the
overhead goes beyond what you will (ever) need. Now the overhead is
(mostly) dynamic - only incurred on demand, and reduced when you don't
need it.

>
> This looks like a sizeable memory regression?
>
> Thanks,
> Usama
>
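
To put rough numbers on the estimate discussed above, here is a minimal sketch of the arithmetic. It assumes 4 KiB pages and the current 120-byte descriptor. The 1-byte and 8-byte per-slot costs for the structures being replaced are a literal reading of the quoted formula, not figures checked against the kernel. The function names are hypothetical and exist only for this illustration.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
MIB = 1 << 20
GIB = 1 << 30


def swap_slots(swap_bytes, page_size=PAGE_SIZE):
    """Number of page-sized swap slots in a swap space of swap_bytes."""
    return swap_bytes // page_size


def descriptor_overhead_mib(slots, desc_size, replaced_bytes_per_slot=0):
    """Overhead in MiB for `slots` descriptors of `desc_size` bytes each,
    net of any per-slot metadata the descriptor replaces."""
    return slots * (desc_size - replaced_bytes_per_slot) / MIB


slots = swap_slots(64 * GIB)                    # 16 Mi slots for a 64G swap
gross = descriptor_overhead_mib(slots, 120)     # descriptors only
# Per-slot costs taken literally from the quoted formula (assumption):
# 1 byte for the zero bitmap term, 8 bytes for the swap map term.
net = descriptor_overhead_mib(slots, 120, replaced_bytes_per_slot=1 + 8)
# Dynamic case: only a quarter of the slots actually in use.
quarter = descriptor_overhead_mib(slots // 4, 120)

print(f"slots={slots} gross={gross} MiB net={net} MiB quarter-used={quarter} MiB")
```

Under these assumptions a fully used 64G swap costs about 1920 MiB in descriptors (about 1776 MiB net of the subtracted terms). Because descriptors are allocated on demand, a swap space that is only a quarter full would cost about 480 MiB.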