From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Wed, 25 Sep 2024 12:39:02 -0700
Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
To: Johannes Weiner
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, shakeel.butt@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org, nanhai.zou@intel.com, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com> <20240924011709.7037-7-kanchana.p.sridhar@intel.com> <20240925134008.GA875661@cmpxchg.org> <20240925192006.GB876370@cmpxchg.org>
In-Reply-To: <20240925192006.GB876370@cmpxchg.org>
Content-Type: text/plain; charset="UTF-8"
On Wed, Sep 25, 2024 at 12:20 PM Johannes Weiner wrote:
>
> On Wed, Sep 25, 2024 at 11:30:34AM -0700, Yosry Ahmed wrote:
> > Johannes wrote:
> > > If this ever becomes an issue, we can handle it in a fastpath-slowpath
> > > scheme: check the limit up front for fast-path failure if we're
> > > already maxed out, just like now; then make obj_cgroup_charge_zswap()
> > > atomically charge against zswap.max and unwind the store if we raced.
> > >
> > > For now, I would just keep the simple version we currently have: check
> > > once in zswap_store() and then just go ahead for the whole folio.
> >
> > I am not totally against this, but I feel like this is too optimistic.
> > I think we can keep it simple-ish by maintaining an EWMA for the
> > compression ratio; we already have primitives for this (see
> > DECLARE_EWMA).
> >
> > Then in zswap_store(), we can use the EWMA to estimate the compressed
> > size and use it to do the memcg and global limit checks once, like we
> > do today. Instead of just checking if we are below the limits, we
> > check if we have enough headroom for the estimated compressed size.
> > Then we call zswap_store_page() to do the per-page stuff, then do
> > batched charging and stats updates.
>
> I'm not sure what you gain from making a non-atomic check precise. You
> can get a hundred threads determining precisely that *their* store
> will fit exactly into the last 800kB before the limit.

We just get to avoid overshooting in cases where we know we probably
can't fit the folio anyway, for example when we have 4KB of headroom
left and are trying to compress a 2MB THP. It makes the upfront check
that avoids pointless compression a little more meaningful.

> > If you think that's overkill we can keep doing the limit checks as
> > we do today,
>
> I just don't see how it would make a practical difference.
> What would make a difference is atomic transactional charging of the
> compressed size, and unwinding on failure - with the upfront check to
> avoid pointlessly compressing (outside of race conditions).
>
> And I'm not against doing that in general, I am just against doing it
> by default.
>
> It's a lot of complexity, and like I said, the practical usecase for
> limiting zswap memory to begin with is quite unclear to me. Zswap is
> not a limited resource. It's just memory. And you already had the
> memory for the uncompressed copy. So it's a bit strange to me to say
> "you have compressed your memory enough, so now you get sent to disk
> (or we declare OOM)". What would be a reason to limit it?

Technically speaking, if we have a global zswap limit, zswap becomes a
limited resource, and distributing it across workloads can make sense.
That being said, I am not aware of any existing use cases for that.

The other use case is controlling when writeback kicks in for
different workloads. It may not make sense for limit-based reclaim,
because, as you mentioned, the memory is limited anyway and workloads
should be free to compress their own memory within their limit as they
please. But it may make sense for proactive reclaim, controlling how
much memory we compress vs. how much memory we evict to disk entirely.
Again, I am not aware of any existing use cases for this either.

> It sort of makes sense as a binary switch, but I don't get the usecase
> for a granular limit. (And I blame my own cowardice for making the
> cgroup knob a limit, to keep options open, instead of a switch.)
>
> All that to say, this would be better in a follow-up patch. We allow
> overshooting now, it's not clear how overshooting by a larger amount
> makes a categorical difference.

I am not against making this a follow-up, if/when the need arises.
My whole point was that using an EWMA (or similar) we can make the
upfront check a little more meaningful than "we have 1 byte of
headroom, let's go compress a 2MB THP!". I think it's not a lot of
complexity to check for headroom based on an estimated compressed
size, but I didn't try to code it, so maybe I am wrong :)

> > but I would still like to see batching of all the limit checks,
> > charging, and stats updates. It makes little sense otherwise.
>
> Definitely. One check, one charge, one stat update per folio.