From: Yosry Ahmed
Date: Tue, 24 Sep 2024 14:38:00 -0700
Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com> <20240924011709.7037-7-kanchana.p.sridhar@intel.com>
To: Nhat Pham
Cc: Kanchana P Sridhar, linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, chengming.zhou@linux.dev, usamaarif642@gmail.com, shakeel.butt@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org, nanhai.zou@intel.com, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com

On Tue, Sep 24, 2024 at 1:51 PM Nhat Pham wrote:
>
> On Tue, Sep 24, 2024 at 12:39 PM Yosry Ahmed wrote:
> >
> > On Mon, Sep 23, 2024 at 6:17 PM Kanchana P Sridhar wrote:
> > > + * The cgroup zswap limit check is done once at the beginning of an
> > > + * mTHP store, and not within zswap_store_page() for each page
> > > + * in the mTHP. We do however check the zswap pool limits at the
> > > + * start of zswap_store_page(). What this means is, the cgroup
> > > + * could go over the limits by at most (HPAGE_PMD_NR - 1) pages.
> > > + * However, the per-store-page zswap pool limits check should
> > > + * hopefully trigger the cgroup aware and zswap LRU aware global
> > > + * reclaim implemented in the shrinker. If this assumption holds,
> > > + * the cgroup exceeding the zswap limits could potentially be
> > > + * resolved before the next zswap_store, and if it is not, the next
> > > + * zswap_store would fail the cgroup zswap limit check at the start.
> > > + */
> >
> > I do not really like this. Allowing going one page above the limit is
> > one thing, but one THP above the limit seems too much. I also don't
>
> Hmm what if you have multiple concurrent zswap stores, from different
> tasks but the same cgroup? If none of them has charged, they would all
> get greenlit, and charge towards the cgroup...
>
> So technically the zswap limit checking is already best-effort only.
> But now, instead of one page per violation, it's 512 pages per
> violation :)

Yeah, good point about concurrent operations: we can go above the limit
by 512 pages * the number of concurrent swapouts. That can be a lot of
memory.

>
> Yeah this can be bad. I think this is only safe if you only use
> zswap.max as a binary knob (0 or max)...
>
> > like relying on the repeated limit checking in zswap_store_page(), if
> > anything I think that should be batched too.
> >
> > Is it too unreasonable to maintain the average compression ratio and
> > use that to estimate limit checking for both memcg and global limits?
> > Johannes, Nhat, any thoughts on this?
>
> I remember asking about this, but past Nhat might have relented :)
>
> https://lore.kernel.org/linux-mm/CAKEwX=PfAMZ2qJtwKwJsVx3TZWxV5z2ZaU1Epk1UD=DBdMsjFA@mail.gmail.com/
>
> We can do limit checking and charging after compression is done, but
> that's a lot of code change (might not even be possible)... It will,
> however, allow us to do charging + checking in one go (rather than
> doing it 8, 16, or 512 times)
>
> Another thing we can do is to register a zswap writeback after the
> zswap store attempts to clean up excess capacity. Not sure what will
> happen if zswap writeback is disabled for the cgroup though :)
>
> If it's too hard, the average estimate could be a decent compromise,
> until we figure something smarter.

We can also do what we discussed before about double charging. The
pages that are being reclaimed are already charged, so technically we
don't need to charge them again. We can uncharge the difference
between compressed and uncompressed sizes after compression and call
it a day. This fixes the limit checking and the double charging in one
go.

I am a little bit nervous, though, about zswap uncharging the pages
from under reclaim; there are likely further accesses of the page's
memcg after zswap. Maybe we can plumb the info back to reclaim, or set
a flag on the page to avoid uncharging it when it's freed.
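
To make the average-compression-ratio idea a bit more concrete, here is a
rough user-space sketch of what a single, batched limit check could look
like. Everything in it is an assumption made for illustration: the struct
ratio_stats bookkeeping, the helper names, and the 4 KiB page size are all
hypothetical and are not existing zswap or memcg code. It only models the
arithmetic: estimate the compressed size of the whole folio from a running
average, then compare once against the remaining headroom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4 KiB base pages */

/* Hypothetical running totals; this is not actual zswap state. */
struct ratio_stats {
	uint64_t stored_pages;     /* pages compressed so far */
	uint64_t compressed_bytes; /* bytes those pages occupy once compressed */
};

/* Record the compressed length of one stored page. */
static void ratio_stats_record(struct ratio_stats *s, uint64_t compressed_len)
{
	assert(compressed_len <= SKETCH_PAGE_SIZE);
	s->stored_pages += 1;
	s->compressed_bytes += compressed_len;
}

/*
 * Estimate the compressed size of nr_pages from the running average.
 * With no history yet, assume the pages are incompressible.
 */
static uint64_t estimate_compressed_bytes(const struct ratio_stats *s,
					  uint64_t nr_pages)
{
	if (s->stored_pages == 0)
		return nr_pages * SKETCH_PAGE_SIZE;
	/* Round up so the estimate errs toward rejecting the store. */
	return (nr_pages * s->compressed_bytes + s->stored_pages - 1) /
	       s->stored_pages;
}

/*
 * One check for the whole folio: accept only if the estimated compressed
 * size fits in the headroom left under the limit.
 */
static bool batched_limit_check(const struct ratio_stats *s,
				uint64_t usage_bytes, uint64_t limit_bytes,
				uint64_t nr_pages)
{
	if (usage_bytes > limit_bytes)
		return false;
	return estimate_compressed_bytes(s, nr_pages) <=
	       limit_bytes - usage_bytes;
}
```

The same helper could in principle serve both the memcg limit and the
global pool limit by passing different usage and limit values. Since the
estimate is only an average, a real version would still be best-effort
and would still need to cope with concurrent stores racing past the check.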