From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1171AC5ACB3 for ; Tue, 21 Nov 2023 03:35:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7B64F6B0248; Mon, 20 Nov 2023 22:35:04 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7628D6B024C; Mon, 20 Nov 2023 22:35:04 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5DD2D6B0250; Mon, 20 Nov 2023 22:35:04 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 49FDB6B0248 for ; Mon, 20 Nov 2023 22:35:04 -0500 (EST) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 26F5C806A9 for ; Tue, 21 Nov 2023 03:35:04 +0000 (UTC) X-FDA: 81480545328.27.486BBD9 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by imf08.hostedemail.com (Postfix) with ESMTP id 7B522160004 for ; Tue, 21 Nov 2023 03:35:01 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=HKpXDycG; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of ying.huang@intel.com designates 192.55.52.120 as permitted sender) smtp.mailfrom=ying.huang@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700537702; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TW5Y1PGU61acO0Ys5tuhfUcH3Plxyjuvd5N7WVyM5PE=; b=sqB4JWrSKHNKeH+a3pJOv6Owrpo0FldKM6MjOCxCKfsDMdPM3z5HTohcwruhwuBRvId7K1 zAPjqZgS+EzsPDcSiFjujqh+RMUmJEQkEdiUHD7eOZwEJwxcOioIR9Gwh/p5c90qoMgVF4 4Kr3tleDMZ18AGOVPbO/v3Xg9Q+m1mA= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=HKpXDycG; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of ying.huang@intel.com designates 192.55.52.120 as permitted sender) smtp.mailfrom=ying.huang@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700537702; a=rsa-sha256; cv=none; b=iRzGMr2EkVDoPpxbvc6kxoW6ErULDTl3V6Kk4+m9RBXFgeE5f//Zo0gKxTfBkSmkKQ1dud mx3wUku/5Z48AfovpOWw5sk+J2iWkGNtsMYCHbjmY5KbCPYQNrVBMH+Qybg2LEkQnu8/SH UjTb2SxwSM1he7NhdPtqAD62ct21bss= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700537701; x=1732073701; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version:content-transfer-encoding; bh=n0R9KcRqMiNMDOv9C7sCQJS2G5NNQeFlnHxrrCEZSuA=; b=HKpXDycG/NE6TryaXsNNkGdA5WqlmFZiS7VpyIGbe+87VUZUFiRkCjF7 Knp0lRTEOfpEO54eo6tunWfvEMlT0d/L3siUQQUfOij1r9+nP7vkKyrzB 45601kE5zkJ8t3gSTYKBCTLTIkXU5Sx8YDrCmCqLHnlT7qPd710TZ0JjW n+aSvFr3T07ekWmORxGpp8syrVMcFSjplDbiZ1U+Ueku/oTnBO98wBAYM Nln+5c1FDG43YH8QR5bokHCrp9pgqd+LSdhV18g6IvAlwOu4p87El4J1+ yQ65IBX6SXJWVgzlyR3rO8DotdgpOCewo09u+ucRhJ6cdwkt4kmPnB3tB Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="390613821" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="390613821" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 19:34:59 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="836924044" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="836924044" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 19:34:56 -0800 From: "Huang, Ying" To: Yosry Ahmed Cc: Chris Li , Zhongkun He , Andrew Morton , Johannes Weiner , Nhat Pham , Seth Jennings , Dan Streetman , Vitaly Wool , linux-mm , LKML Subject: Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios In-Reply-To: (Yosry Ahmed's message of "Mon, 20 Nov 2023 18:46:13 -0800") References: <20231113130601.3350915-1-hezhongkun.hzk@bytedance.com> <8734x1cdtr.fsf@yhuang6-desk2.ccr.corp.intel.com> <87edgkapsz.fsf@yhuang6-desk2.ccr.corp.intel.com> <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Tue, 21 Nov 2023 11:32:55 +0800 Message-ID: <871qcjbx20.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 7B522160004 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: td6cj45xfx1ue6pkqxjd811aek9hcxbn X-HE-Tag: 1700537701-593732 X-HE-Meta: U2FsdGVkX1+oF7thFohal6QmIdVU4Q9mgqS5aRBTG4GXxrQ89CbkOq3wCtFO0s0UXqJtaAZCCg0nTy/5s/FZOSmlT1bw7Xdaf74Df7Mc1tpqaKW2oMKbY9fnSMeLNC/nIL0AZiUXYAU3zkhQAOElQLDQF/Fm8dySpI6tur/Ohuu2Z8KT1FytlWEuFZUgRQDM79DzGTlfEU7sEIiGHA9Iv/MZQywCA8d302uxkFHilUfSVsjCIdBYoYdY1xSDC6PUO8WfNGQqsMOsKzl7MyQnCvKYsqqoXVG42XDWHf7FwX6HAzWMdnjSFdSS4X4rNL4O+i4P4kN0xQ8SgvBgY5Hkoi0vshhHAK5ImZkD6NEs+XdouDTLwrae4Tgl0dh/KktSiCyeJzK+qu7w7g/yAx6CCszhJIans1r6XIECA8PrRucYGirFRg22RC64W0sY6WlNKRhgeekO8MBm7xVYPNccQ3IG1sliYcRZKIi+nT+jMveYfIyoB6SKEkYUJBgPCJdbd5SaRsqmKwSkMLZR7ryr4jnZOO9mf/kJoyFZXzAnYGGGMZafDM5FgSTkdI1ybSViv/VBy62Hi+xmC1cQcW0b1G6F8oWXYXhvRqwKae2ZSqHqpysfIYlyzqUSD1bNW92zLzmtVg5sh4FiT6sxslGS3mp7QcRvnI9mV64HBTzBPd2Z9HW+GATqjSRgBnWscpFm9sjyQcY0xP3hF3FU/NsBAy7EYPSz5nrkCqqA+5JtidfCXzp/Knu3LBsq8wIyOrtrmX0r9NiyW7gcKqGDiGnML1wnm7d/IcgrsVDbe/37X5w60QKLGPUHxnHZFJXldNzLuWV0ahtcMuKuyAF3WQ9jMetXAyqmkFMZqNwQJiK3S/Ys5is4k6NqJxCtmNlIbyTK28UWaZFFyeTyQR6LnaHw5mRJkCUgAjEsyR3Wc9Evt1PPOUOOOMTtMmrtUiquIje8Y0Gykq11Esej8A4WBwo Vmdc7NbP T58Rp8hecYxlHb8DgD1E5lzsS1ILgwqjEVQFbpRMYx4MMmj1Xj0EbG79GMbwyIv6h8XMQwHrRn54PxCEZMsuQN7Kxljb+37O6hkQk/mUuH2CEh+8Tjj6lbtE7fLqd0qcCUeM00nNNkK3ZA+Epv8SvsgKL/RkA8wskKoinYO+i37IYxiXOunLa7JBeHR5MPajps6K8 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Yosry Ahmed writes: > On Mon, Nov 20, 2023 at 5:55=E2=80=AFPM Huang, Ying wrote: >> >> Yosry Ahmed writes: >> >> > On Mon, Nov 20, 2023 at 4:57=E2=80=AFPM Huang, Ying wrote: >> >> >> >> Yosry Ahmed writes: >> >> >> >> > On Sun, Nov 19, 2023 at 7:20=E2=80=AFPM Huang, Ying wrote: >> >> >> >> >> >> Chris Li writes: >> >> >> >> >> >> > On Thu, Nov 16, 2023 at 12:19=E2=80=AFPM Yosry Ahmed wrote: >> >> >> >> >> >> >> >> Not bypassing the swap slot cache, just make the callbacks to >> >> >> >> invalidate the zswap entry, do memg uncharging, etc when the sl= ot is >> >> >> >> no longer used and is entering the swap slot cache (i.e. when >> >> >> >> free_swap_slot() is called), instead of when draining the swap = slot >> >> >> >> cache (i.e. when swap_range_free() is called). For all parts of= MM >> >> >> >> outside of swap, the swap entry is freed when free_swap_slot() = is >> >> >> >> called. We don't free it immediately because of caching, but th= is >> >> >> >> should be transparent to other parts of MM (e.g. zswap, memcg, = etc). >> >> >> > >> >> >> > That will cancel the batching effect on the swap slot free, maki= ng the >> >> >> > common case for swapping faults take longer to complete, righ? >> >> >> > If I recall correctly, the uncharge is the expensive part of the= swap >> >> >> > slot free operation. >> >> >> > I just want to figure out what we are trading off against. This = is not >> >> >> > one side wins all situations. >> >> >> >> >> >> Per my understanding, we don't batch memcg uncharging in >> >> >> swap_entry_free() now. Although it's possible and may improve >> >> >> performance. >> >> > >> >> > Yes. It actually causes a long tail in swapin fault latency as Chris >> >> > discovered in our prod. I am wondering if doing the memcg uncharging >> >> > outside the slots cache will actually amortize the cost instead. >> >> > >> >> > Regardless of memcg charging, which is more complicated, I think we >> >> > should at least move the call to zswap_invalidate() before the slots >> >> > cache. I would prefer that we move everything non-swapfile specific >> >> > outside the slots cache layer (zswap_invalidate(), >> >> > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), >> >> > mem_cgroup_uncharge_swap(), ..). However, if some of those are >> >> > controversial, we can move some of them for now. >> >> >> >> That makes sense for me. >> >> >> >> > When draining free swap slots from the cache, swap_range_free() is >> >> > called with nr_entries =3D=3D 1 anyway, so I can't see how any batc= hing is >> >> > going on. If anything it should help amortize the cost. >> >> >> >> In swapcache_free_entries(), the sis->lock will be held to free multi= ple >> >> swap slots via swap_info_get_cont() if possible. This can reduce >> >> sis->lock contention. >> > >> > Ah yes that's a good point. Since most of these callbacks don't >> > actually access sis, but use the swap entry value itself, I am >> > guessing the reason we need to hold the lock for all these callbacks >> > is to prevent swapoff and swapon reusing the same swap entry on a >> > different swap device, right? >> >> In, >> >> swapcache_free_entries() >> swap_entry_free() >> swap_range_free() >> >> Quite some sis fields will be accessed. > > I wasn't referring to this code. I was what's preventing us from > moving the callbacks I mentioned outside the lock (zswap_invalidate(), > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), > mem_cgroup_uncharge_swap(), ..). I think most or all of them don't > really access sis, but perhaps they need the lock to ensure the swap > entry value does not get reused? In fact, the swap entries to be freed by swapcache_free_entries() is in a state that can not be freed by other path (including swapoff()). It's swap_map value is SWAP_HAS_CACHE, but we can not find folio in swap_address_space(). To be honest, I don't know whether there are dependencies on sis->lock in these callbacks. You need to investigate them one by one. -- Best Regards, Huang, Ying