Date: Tue, 8 Apr 2025 11:45:47 -0400
From: Johannes Weiner
To: Usama Arif
Cc: Nhat Pham, linux-mm@kvack.org, akpm@linux-foundation.org,
	hughd@google.com, yosry.ahmed@linux.dev, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
	len.brown@intel.com, chengming.zhou@linux.dev, kasong@tencent.com,
	chrisl@kernel.org, huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
	viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
	lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu, pavel@kernel.org,
	kernel-team@meta.com, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [RFC PATCH 00/14] Virtual Swap Space
Message-ID: <20250408154547.GC816@cmpxchg.org>
In-Reply-To: <983965b6-2262-4f72-a672-39085dcdaa3c@gmail.com>

On Tue, Apr 08, 2025 at 02:04:06PM +0100, Usama Arif wrote:
> On 08/04/2025 00:42, Nhat Pham wrote:
> >
> > V. Benchmarking
> >
> > As a proof of concept, I run the prototype through some simple
> > benchmarks:
> >
> > 1. usemem: 16 threads, 2G each, memory.max = 16G
> >
> > I benchmarked the following usemem commands:
> >
> > time usemem --init-time -w -O -s 10 -n 16 2g
> >
> > Baseline:
> > real: 33.96s
> > user: 25.31s
> > sys: 341.09s
> > average throughput: 111295.45 KB/s
> > average free time: 2079258.68 usecs
> >
> > New Design:
> > real: 35.87s
> > user: 25.15s
> > sys: 373.01s
> > average throughput: 106965.46 KB/s
> > average free time: 3192465.62 usecs
> >
> > To root cause this regression, I ran perf on the usemem program, as
> > well as on the following stress-ng program:
> >
> > perf record -ag -e cycles -G perf_cg -- ./stress-ng/stress-ng --pageswap $(nproc) --pageswap-ops 100000
> >
> > and observed the (predicted) increase in lock contention on swap cache
> > accesses. This regression is alleviated if I put together the
> > following hack: limit the virtual swap space to a sufficient size for
> > the benchmark, range partition the swap-related data structures (swap
> > cache, zswap tree, etc.)
> > based on the limit, and distribute the
> > allocation of virtual swap slots among these partitions (on a per-CPU
> > basis):
> >
> > real: 34.94s
> > user: 25.28s
> > sys: 360.25s
> > average throughput: 108181.15 KB/s
> > average free time: 2680890.24 usecs
> >
> > As mentioned above, I will implement proper dynamic swap range
> > partitioning in a follow up work.
> >
> > 2. Kernel building: zswap enabled, 52 workers (one per processor),
> > memory.max = 3G.
> >
> > Baseline:
> > real: 183.55s
> > user: 5119.01s
> > sys: 655.16s
> >
> > New Design:
> > real: mean: 184.5s
> > user: mean: 5117.4s
> > sys: mean: 695.23s
> >
> > New Design (Static Partition)
> > real: 183.95s
> > user: 5119.29s
> > sys: 664.24s
> >
>
> Hi Nhat,
>
> Thanks for the patches! I have glanced over a couple of them, but this was the main question that came to my mind.
>
> Just wanted to check if you had a look at the memory regression during these benchmarks?
>
> Also what is sizeof(swp_desc)? Maybe we can calculate the memory overhead as sizeof(swp_desc) * swap size/PAGE_SIZE?
>
> For a 64G swap that is filled with private anon pages, the overhead in MB might be (sizeof(swp_desc) in bytes * 16M) - 16M (zerobitmap) - 16M*8 (swap map)?
>
> This looks like a sizeable memory regression?

One thing to keep in mind is that the swap descriptor is currently
blatantly explicit, and many conversions and optimizations have not
been done yet. There are some tradeoffs made here regarding code
reviewability, but I agree it makes it hard to see what this would
look like fully realized.

I think what's really missing is an analysis of what the goal is and
what the overhead will be then.

The swapin path currently consults the swapcache, then the zeromap,
then zswap, and finally the backend. The external swap_cgroup array is
consulted to determine who to charge for the new page.

With vswap, the descriptor is looked up and resolves to a type, a
location, cgroup ownership, and a refcount.
This means it replaces the swapcache, the zeromap, the cgroup map, and
largely the swap_map.

Nhat was not quite sure yet if the swap_map can be a single bit per
entry or two bits to represent bad slots. In any case, it's a large
reduction in static swap space overhead, and eliminates the tricky
swap count continuation code.