From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0023CCF9E7 for ; Wed, 25 Sep 2024 18:31:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1A4F76B00A9; Wed, 25 Sep 2024 14:31:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 154FA6B00AF; Wed, 25 Sep 2024 14:31:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EE90C6B00B4; Wed, 25 Sep 2024 14:31:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id CDF696B00A9 for ; Wed, 25 Sep 2024 14:31:16 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id E681F160534 for ; Wed, 25 Sep 2024 18:31:15 +0000 (UTC) X-FDA: 82604102910.29.80E18A7 Received: from mail-ej1-f50.google.com (mail-ej1-f50.google.com [209.85.218.50]) by imf29.hostedemail.com (Postfix) with ESMTP id CAE21120014 for ; Wed, 25 Sep 2024 18:31:12 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=IBAd2lyV; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1727289037; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=U6+i/7mcDfuzEJiw3xSsDAEL4FTrEvTwh7sTExEl5Jg=; b=8HwqoevWKC+D0lFo+KAq8j0JMalfsw/kk+b70iG+GXl3eUK1SBd+zW5h/iRjx8iTJ2DEPU s1W2tad7/KIFlO76ddKhwkIFcmPEYdEhbLY7s2S/pgppln90xuBDiM1XLazqnUZYuBXPL2 OUyYI9DikEmHlsHT2WJCNYsnQGAeH7g= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=IBAd2lyV; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1727289037; a=rsa-sha256; cv=none; b=4ZS+jv1MSPKnfo9hwBgFmAiTgneE4zmVRnigtcrq1ZDS6qlUzrrDbHtKqacGhXu6y06bZ2 AQInd9QMp7w2z3syO8KP1brL/Ny/QAtA0+NlP+ci2ZiSi0+Eb/6SUqfPCbewGt8yZ90d0r E8g8XfXOez0DCO8D+aJxu4VDo/6Th9I= Received: by mail-ej1-f50.google.com with SMTP id a640c23a62f3a-a8d3cde1103so22299966b.2 for ; Wed, 25 Sep 2024 11:31:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727289071; x=1727893871; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=U6+i/7mcDfuzEJiw3xSsDAEL4FTrEvTwh7sTExEl5Jg=; b=IBAd2lyVXmncBcGROi4yX0uFRMdPK0VZkr9sOJU+X7WXnlN21t9YibEF3sm9tFISSq WEyJ0rJxtryBxAaIGYqGXBMkih3erHpGBxLOE4DgMpii2MzsR3u1NE5naLDWe6iitE+9 lTkc+03gCs8aOc+mxtMwFIUtmsDmI3wIpqsz3DfPG7B8iU53yrxPokS4aQ1JStL2ftkx jHIOVK7Yon1TlmjyKQTgsyKMZpvIvq+PJ8iaf5s/c+whs3SJAAZBW3FlRvPuZjZYJrAm bwF6mUKtPFh5N41SZLqIhPfN5YgvW8kwxvIUn975dQemub6rqXt7BiKRrHi9E3fnBY7t VUHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727289071; x=1727893871; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=U6+i/7mcDfuzEJiw3xSsDAEL4FTrEvTwh7sTExEl5Jg=; b=lsnxRZ5BsLjD9DWsRJFDGMupG7ecmRr/zpB6NQLEuofU1rzXB7nDxTh+MBxluf1TxV k5V24VXqsBO90d03vfCV+YN7BRi2+xHPH+6VXNItU5Io2IyO/KOrg7OSj+yzxXpKjygK lCoF9NRwTZvemHOZsMAwLDJ9CM5qSyEBlI5dfc95ihr2CLs1PXTYHvYHFG6khP8BGii0 Bl43R+IIQW0fHQal3FVm4nIyDuUQ3qtfAVQgHCBIMfpIpHpSyJxumuuqsDCuZMaiRwnB q+vQ/KUZnNvCfdfySJuW05KW9VCIep2aKr6Vpwcn4hUG//FtOAYMuArUzNsYBf+ZRaF2 3nzQ== X-Forwarded-Encrypted: i=1; AJvYcCV5CMYXhVDa5HUXzDp4Xlv95+7HODZHKCRO4ZRMH7n8RBycUCnTny3eH+R9dlBgZ647QqAXo00pIw==@kvack.org X-Gm-Message-State: AOJu0YzfZNMMdREDe+CekGG2Wp2kr7CDzYaY1uZXWTD6Mpp/OfCAn4Dw 798VqHFN5LlfmsRaWaTLFGoD013wmcImh+gw3qJBOpixcnaWRoWwpPKxAB7NwmRp8XCd74cwGYI Qml07C3nfpJhjH5EzeJGpgylQ2EkGTMgPo/h5 X-Google-Smtp-Source: AGHT+IH0U/Sj/v3RDFLKDFGHHDOUO1yFKD7gU9F+YCdLZqdw2Vcxx9QjbEcC92mesiGInr9j/gAI39Tn6vVe50TjVNA= X-Received: by 2002:a17:907:3d8d:b0:a91:158a:a900 with SMTP id a640c23a62f3a-a93a069bf67mr391239166b.58.1727289070910; Wed, 25 Sep 2024 11:31:10 -0700 (PDT) MIME-Version: 1.0 References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com> <20240924011709.7037-7-kanchana.p.sridhar@intel.com> <20240925134008.GA875661@cmpxchg.org> In-Reply-To: <20240925134008.GA875661@cmpxchg.org> From: Yosry Ahmed Date: Wed, 25 Sep 2024 11:30:34 -0700 Message-ID: Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store(). To: Johannes Weiner Cc: Kanchana P Sridhar , linux-kernel@vger.kernel.org, linux-mm@kvack.org, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, shakeel.butt@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org, nanhai.zou@intel.com, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com Content-Type: text/plain; charset="UTF-8" X-Rspam-User: X-Stat-Signature: r4n6njtr7t3uwz8613thcgbpf3n8p7cm X-Rspamd-Queue-Id: CAE21120014 X-Rspamd-Server: rspam11 X-HE-Tag: 1727289072-614578 X-HE-Meta: U2FsdGVkX19aCBW/g7K6w81P9f9RD6y9k1SxBjyewfesKWgGR1IGYsulX4bLGCWJyvzzNOnBqGQZZw3FqWDASo6epFCYPTNIz8Dk/WFELRy1J+NRiHXYewZhsJd6/Lg4+JtZdnstF5RXy9DUj2wW5x6VnIunop0LCPuLE2V7L7ws2Ooi/gJ6Jx0h4B+l8S72bGl/ymgd3GEsjt7Oh9ykZUTF54yy5gJOjBYPB1mDs+Vu9Y6oKaV6/YjmImORI+qlINkK8HojrYrCH24ZRIoCFFDy6wkjP2mZuZAE1/Z84cpD26gU1vxRr0uHIX1jORcSkl4w/W1i6XujNR3FHQILJ7jaQ1DLvCdVAtiicruvHkQtzoo20UAKxL59lPBomhTqqaVWtPHNVnkHpz588/IWwpK0y+u332nB8JHIqR6/+0mwgxNPh2CH0zh8bdjIFjSrKArUBvyU4Y+yHUPPt///N2QFUQsnrrrxbUpNJtTVbA2UB5SScxfJIn2k25RIsLbgQltQHbbfezXib6nHOc+Pu7EMBxNL/tjRjp5Oo8136zkljit38+KmK46qavXfWo5ShdHGHYGoyhiqHYrlyYhOjUn8Phze9x+CD41jAqL23WvFhPDEAwThyYmkBLckkoOXQKmH1WbgC/30wL5vkMMAduCuZ1UYNocfLhPYUGlZuH2Et3kRH+rMVeuzD2lcRXj1MSd9mEzKCppXuzEF+TSZugNzuFxwh8F7Aewk31nQcrXRnLe0f/KvFv7OZb6TU0HiF+trFxUuPV3q29wK7fG4QPYN1LrhsayDUoT3Eq3XCseqXWHxqxgJR6g4+OJdXKC32nQxG/AgL0ycJEDLTBVD7aIC1Ks+WMWKCx8k4xNoEZ0AafhPiOKL+99OLzKQmYhG3vaieiv2vlO0aY2GeeK0eAKqEHGxOCQZ11d3G3QcQ5Ur0zfLZnsV+vvD1fXyte63A53AjgSWyaLIRo0EzYE stpMwnsN 7eAeNGqUjg3a0ASw+bQ37zIFFRiCBPVcNX+97z+GqAGJm72Rn7MgeH3xzOqfzpDFz+sCMbR8KuRKgnyUYLbWoEc48cVdpGCmNEFsldhqtGK93mzjZsQTEOxUr6uKqsdeuzp2wQf8DfmGMyNpFRyZkc57F+pcOYEGuFmHICa2fNx4hPSzJhjYp8UXXFfDBV7Dq46Sy8epRAflVA4QnAKm7WDO5YwHmsfgykUzDfHuMkE+MKzIH9dmmzsYRww== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: [..] > > > + /* > > > + * Check cgroup limits: > > > + * > > > + * The cgroup zswap limit check is done once at the beginning of an > > > + * mTHP store, and not within zswap_store_page() for each page > > > + * in the mTHP. We do however check the zswap pool limits at the > > > + * start of zswap_store_page(). What this means is, the cgroup > > > + * could go over the limits by at most (HPAGE_PMD_NR - 1) pages. > > > + * However, the per-store-page zswap pool limits check should > > > + * hopefully trigger the cgroup aware and zswap LRU aware global > > > + * reclaim implemented in the shrinker. If this assumption holds, > > > + * the cgroup exceeding the zswap limits could potentially be > > > + * resolved before the next zswap_store, and if it is not, the next > > > + * zswap_store would fail the cgroup zswap limit check at the start. > > > + */ > > > > I do not really like this. Allowing going one page above the limit is > > one thing, but one THP above the limit seems too much. I also don't > > like relying on the repeated limit checking in zswap_store_page(), if > > anything I think that should be batched too. > > > > Is it too unreasonable to maintain the average compression ratio and > > use that to estimate limit checking for both memcg and global limits? > > Johannes, Nhat, any thoughts on this? > > I honestly don't think it's much of an issue. The global limit is > huge, and the cgroup limit is to the best of my knowledge only used as > a binary switch. Setting a non-binary limit - global or cgroup - seems > like a bit of an obscure usecase to me, because in the vast majority > of cases it's preferable to keep compresing over declaring OOM. > > And even if you do have some granular limit, the workload size scales > with it. It's not like you have a thousand THPs in a 10M cgroup. The memcg limit and zswap limit can be disproportionate, although that shouldn't be common. > > If this ever becomes an issue, we can handle it in a fastpath-slowpath > scheme: check the limit up front for fast-path failure if we're > already maxed out, just like now; then make obj_cgroup_charge_zswap() > atomically charge against zswap.max and unwind the store if we raced. > > For now, I would just keep the simple version we currently have: check > once in zswap_store() and then just go ahead for the whole folio. I am not totally against this but I feel like this is too optimistic. I think we can keep it simple-ish by maintaining an ewma for the compression ratio, we already have primitives for this (see DECLARE_EWMA). Then in zswap_store(), we can use the ewma to estimate the compressed size and use it to do the memcg and global limit checks once, like we do today. Instead of just checking if we are below the limits, we check if we have enough headroom for the estimated compressed size. Then we call zswap_store_page() to do the per-page stuff, then do batched charging and stats updates. If you think that's an overkill we can keep doing the limit checks as we do today, but I would still like to see batching of all the limit checks, charging, and stats updates. It makes little sense otherwise.