From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Wed, 25 Sep 2024 12:49:13 -0700
Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
To: "Sridhar, Kanchana P"
Cc: Johannes Weiner, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com,
	shakeel.butt@linux.dev, ryan.roberts@arm.com, "Huang, Ying",
	21cnbao@gmail.com, akpm@linux-foundation.org, "Zou, Nanhai",
	"Feghali, Wajdi K", "Gopal, Vinodh"
References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com>
	<20240924011709.7037-7-kanchana.p.sridhar@intel.com>
	<20240925134008.GA875661@cmpxchg.org>
Content-Type: text/plain; charset="UTF-8"
[..]

> > > > > + /*
> > > > > + * Check cgroup limits:
> > > > > + *
> > > > > + * The cgroup zswap limit check is done once at the beginning of an
> > > > > + * mTHP store, and not within zswap_store_page() for each page
> > > > > + * in the mTHP. We do however check the zswap pool limits at the
> > > > > + * start of zswap_store_page(). What this means is, the cgroup
> > > > > + * could go over the limits by at most (HPAGE_PMD_NR - 1) pages.
> > > > > + * However, the per-store-page zswap pool limits check should
> > > > > + * hopefully trigger the cgroup aware and zswap LRU aware global
> > > > > + * reclaim implemented in the shrinker. If this assumption holds,
> > > > > + * the cgroup exceeding the zswap limits could potentially be
> > > > > + * resolved before the next zswap_store, and if it is not, the next
> > > > > + * zswap_store would fail the cgroup zswap limit check at the start.
> > > > > + */
> > > >
> > > > I do not really like this. Allowing going one page above the limit is
> > > > one thing, but one THP above the limit seems too much. I also don't
> > > > like relying on the repeated limit checking in zswap_store_page(); if
> > > > anything, I think that should be batched too.
> > > >
> > > > Is it too unreasonable to maintain the average compression ratio and
> > > > use that to estimate limit checking for both memcg and global limits?
> > > > Johannes, Nhat, any thoughts on this?
> > >
> > > I honestly don't think it's much of an issue. The global limit is
> > > huge, and the cgroup limit is to the best of my knowledge only used as
> > > a binary switch. Setting a non-binary limit - global or cgroup - seems
> > > like a bit of an obscure usecase to me, because in the vast majority
> > > of cases it's preferable to keep compressing over declaring OOM.
> > >
> > > And even if you do have some granular limit, the workload size scales
> > > with it.
> > > It's not like you have a thousand THPs in a 10M cgroup.
> >
> > The memcg limit and zswap limit can be disproportionate, although that
> > shouldn't be common.
> >
> > > If this ever becomes an issue, we can handle it in a
> > > fastpath-slowpath scheme: check the limit up front for fast-path
> > > failure if we're already maxed out, just like now; then make
> > > obj_cgroup_charge_zswap() atomically charge against zswap.max and
> > > unwind the store if we raced.
> > >
> > > For now, I would just keep the simple version we currently have:
> > > check once in zswap_store() and then just go ahead for the whole
> > > folio.
> >
> > I am not totally against this but I feel like this is too optimistic.
> > I think we can keep it simple-ish by maintaining an ewma for the
> > compression ratio; we already have primitives for this (see
> > DECLARE_EWMA).
> >
> > Then in zswap_store(), we can use the ewma to estimate the compressed
> > size and use it to do the memcg and global limit checks once, like we
> > do today. Instead of just checking if we are below the limits, we
> > check if we have enough headroom for the estimated compressed size.
> > Then we call zswap_store_page() to do the per-page stuff, then do
> > batched charging and stats updates.
> >
> > If you think that's an overkill we can keep doing the limit checks as
> > we do today, but I would still like to see batching of all the limit
> > checks, charging, and stats updates. It makes little sense otherwise.
>
> Thanks Johannes and Yosry for these suggestions and pointers.
> I think there is general agreement about the batch charging and
> zswap_stored_pages/stats updates. Yosry, does "batching of limit
> checks" imply the same as a simple check for being over the cgroup
> limit at the start of zswap_store and not doing this check in
> zswap_store_page? Does this also imply a zswap_pool_get_many()?
> Would appreciate it if you can help clarify.
Yes, I think we should batch as much as possible in zswap_store() (the
limit checks, stats updates, zswap_pool refs, charging, etc.) and only
do the things that are truly per-page in zswap_store_page(). Batching
all of these should be a clear win.

> The main question in my mind about using the EWMA checks is, will it
> add overhead to the normal zswap reclaim path; and if so, would a
> simple limit check at the start of zswap_store as suggested by
> Johannes suffice. I can run a few experiments to quantify this
> overhead, and maybe we can revisit this?

If you look at ewma_##name##_add() in include/linux/average.h, it's
really just a bunch of bit shifts, so I am not concerned about runtime
overhead. My discussion with Johannes is more about whether the
complexity is justified; I'd wait for that discussion to settle. Either
way, we should check the limits once in zswap_store().