From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF3D3C5AD4C for ; Tue, 21 Nov 2023 02:46:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 514E16B0347; Mon, 20 Nov 2023 21:46:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 49E486B0349; Mon, 20 Nov 2023 21:46:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 317406B0357; Mon, 20 Nov 2023 21:46:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 1CDF26B0347 for ; Mon, 20 Nov 2023 21:46:56 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id D5D001209F4 for ; Tue, 21 Nov 2023 02:46:55 +0000 (UTC) X-FDA: 81480423990.04.8FE6F8E Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) by imf12.hostedemail.com (Postfix) with ESMTP id 0D01B40003 for ; Tue, 21 Nov 2023 02:46:53 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=AEnw6Xq0; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700534814; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=IT8XHAvcEhECV25AWmpJElN+FUmGpxVcHQIxuxXBKgE=; b=UZ3In7Igd9QwMiKqmuYZr/mlwvt9a4cnMRyTk3SluoDnjJ8+iWbOjduW8TR0o2ig2spWgF jhRbMjO/ngHF+jV2QeYwalaCUuHjhk3E1HsvMaWx6qW7ph3xp/ws+16guY1FxgUcKb6Jfn D1OVjzUtA84d4lKNOuqDbjxzh87pd4I= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=AEnw6Xq0; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700534814; a=rsa-sha256; cv=none; b=MR+xCfcjzKTknnP1pO4nku72ojNHXMoIyJvJo/5GDrgN78/pmh4lFSqSrW7FRrTsa+yjBH j6UpvkfLIEmbGWfpp7iglohPIWlI1Vb4Cn/u4u+PyCqLsDKUNf2BzWwWpxBk50pvEO85ha IoUawIthO7WQriLmWY2YkY4wrpLQ30I= Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-a013d22effcso62020766b.2 for ; Mon, 20 Nov 2023 18:46:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1700534812; x=1701139612; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=IT8XHAvcEhECV25AWmpJElN+FUmGpxVcHQIxuxXBKgE=; b=AEnw6Xq0fafGZe8X86U1xpMicbdgr34hL/xYLBeMebWJp6BononluL8Xv2F6jDRQwf Yr6Oci3hfTxlXmbIsUENvuIv+HtCHsGYhEBZ0NnXedD5MYzm7vq8Zbbgyg/ixBOmmQgm EnGbYXQNmfaBaaqtWTNjAhqeSpqKqMv4R/rsRn6cia1eWfb9cIZ7Im/9c/2lcDgCoLKW tvKwVQJ1qdYnM7X138HrT34Fsiaqcujyo++DRFh2pgR+mb6Prpaet5Ey8y6QS6j/W0f4 g5+HFEBAcb0jC6GHDAOSZw+SXSxYrkBLZaJMT3Ep4mABZQf3isWbQscyRXNNCVyGuw++ Kl2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700534812; x=1701139612; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IT8XHAvcEhECV25AWmpJElN+FUmGpxVcHQIxuxXBKgE=; b=c+XtUy1fQ0EwkCdM7InCDOftAgrXAXFqk1pLP16u6o3007Vdb2aOI6ciVhr4VVcze+ pw2glejowBQwT8IVmU9ZpxmqoUZhGqUAxptdlsyaLWYMJ3MJwLpPhlN4ZjNOdVDcj/yW ZBeHzbqK1zU5xgmAa9+IRUpQLnHeYssFiJOGfUKTIFxTWjGCfyZivBS2oyyKy4e8+QBS KDwBKhxvgcHUFh5BFDs+M8nElH/pNIx7QUtfERkvJGNSXlsASydFUfM+tuGSbHl4EWJA NUmbKOIv7JKn5RTGxWRvoq0uf8QKLbqkeNWeIVKufJFISL488ao0g9YtjnzeOn2UoW9h yHDg== X-Gm-Message-State: AOJu0YyddnTd3iOmNefLmsa9dX7OzcVMEKTJpiTIfj4ni3DcV0olkpx3 oS26snb9VtDAVkxZXJfh7yQGEeHNHlefe3b9ZEXtxg== X-Google-Smtp-Source: AGHT+IH87VC6gbnVo/3R0Xr0pw1M8eld0MmABD+ImDPfoq9nyWWCLUItjAGmYW4djLZJlaVZXvoObpIeJMOMTTOnIrc= X-Received: by 2002:a17:906:414a:b0:9dd:9380:bb15 with SMTP id l10-20020a170906414a00b009dd9380bb15mr6688701ejk.50.1700534812389; Mon, 20 Nov 2023 18:46:52 -0800 (PST) MIME-Version: 1.0 References: <20231113130601.3350915-1-hezhongkun.hzk@bytedance.com> <8734x1cdtr.fsf@yhuang6-desk2.ccr.corp.intel.com> <87edgkapsz.fsf@yhuang6-desk2.ccr.corp.intel.com> <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Yosry Ahmed Date: Mon, 20 Nov 2023 18:46:13 -0800 Message-ID: Subject: Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios To: "Huang, Ying" Cc: Chris Li , Zhongkun He , Andrew Morton , Johannes Weiner , Nhat Pham , Seth Jennings , Dan Streetman , Vitaly Wool , linux-mm , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 0D01B40003 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: awa6rmtjzatektz4peameq4xz1k51npp X-HE-Tag: 1700534813-52664 X-HE-Meta: U2FsdGVkX18410h45FppyiEi9BH83sweoaIoDOhbDkIqL7TCcNp65hsoKbpwGAzb1nIwIjEp5ehR7cSzEwSwqmaIVlEpqe5wk89BcZvOHIXwjwhbtK8RRAQDgysHB5xm7oh0owRJj45XOTNwb41dl2V4zlD9h3EzYkoc0h3YSNb2dMtvcAMhSHaPS01O7JzF3uiO3gVozsXbYgF/CCm8sFJvqbOGTI63aOH0bqzRsqFU1aIVlEcyP4uwWWDwJnx3hAGL6EUYCGdEhmAhgYdSP68RgsSTg4UgAKVi1uxWtJ3aSrimhTpBllbFAKxMzAR+Wdcsjjc4KGbcoCVUmv7Vsq49Ck75T8sbDPFIQGkXsZ3J3G3384rN7jqdJxxvCiGVlM1BkTVLUs+TyKKOGmr8a+0ALbx/p98qTv1+sina9a7yex3fgWLAXf8Tp1p+s4HHmie7fLnhAe5RV2aPwqkiR9t78Q7Gr7CoVoMA5OeVpGk5WhQ4WiFzwIRTuDZvuX8uUWXnYTwC0WrwCfqXMimO4LG42HWcJ9AyZpHir5yNzEPnhIopbai5MZqa00tAt6fshjFCQh8oOkjeYfSex9evVNfO666iPnaL/SMkS+fTRrWteHSThMc0WjEy1hZyfl9FBsqazuk9+ZQRBtz/YZhgFIeTgotWBWL8201WQAzkWm7swhyC2yNgb1JuAkIzeXKjeKqN9AQQgmhkZ5uHvRab4zDCipPgjBEOFcCcH5Pd6gewz8u+nh9wI3zjYDL8sqNgy1zRqM7xPK7GyydEZJHDeXPucJtoyFyK8nAt+7xKeuiiIVk83zkVcj+mNvskGgj0eYMBidMDSuYCnqNdzxNpLD7LxmHpdyKKxN3CR5CouBf9mtGWnjqWI03quFX3s4MbRFCdmsEdWbz5XV3Y4UDT+MlKKGu94gfAm6xXVQ0WJhUqIiWchrDzp+xq0engn13FT9MH7+BMVSYOzLZpfpK ceR0SQaP vKqqjig7CBddsn6vyffd6vROmlQ6F2Zv13zMmgHFnBf7J7nN0Zy14wmI9pw8NvAlBBtWUH8tuR/JCsWZ4Tox4rGxrUeGd8Kgshyw2o2yb9livY8TLHxXEcQIbEstqRXwj6Sh3mtiu1SSlSYGqnozNXhBMtN9jiS12YUN0n93hLkJf2ESGbEQihLt2kMHjtUblA4HUdi5vp6YA3VfuMpdrysa12/auQZe3Dn90WGKjJmy4JCkb2nnpIlHNk3mITZLKUIWamA9/cpVZgr8Jl0aLflu/ug== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Nov 20, 2023 at 5:55=E2=80=AFPM Huang, Ying = wrote: > > Yosry Ahmed writes: > > > On Mon, Nov 20, 2023 at 4:57=E2=80=AFPM Huang, Ying wrote: > >> > >> Yosry Ahmed writes: > >> > >> > On Sun, Nov 19, 2023 at 7:20=E2=80=AFPM Huang, Ying wrote: > >> >> > >> >> Chris Li writes: > >> >> > >> >> > On Thu, Nov 16, 2023 at 12:19=E2=80=AFPM Yosry Ahmed wrote: > >> >> >> > >> >> >> Not bypassing the swap slot cache, just make the callbacks to > >> >> >> invalidate the zswap entry, do memg uncharging, etc when the slo= t is > >> >> >> no longer used and is entering the swap slot cache (i.e. when > >> >> >> free_swap_slot() is called), instead of when draining the swap s= lot > >> >> >> cache (i.e. when swap_range_free() is called). For all parts of = MM > >> >> >> outside of swap, the swap entry is freed when free_swap_slot() i= s > >> >> >> called. We don't free it immediately because of caching, but thi= s > >> >> >> should be transparent to other parts of MM (e.g. zswap, memcg, e= tc). > >> >> > > >> >> > That will cancel the batching effect on the swap slot free, makin= g the > >> >> > common case for swapping faults take longer to complete, righ? > >> >> > If I recall correctly, the uncharge is the expensive part of the = swap > >> >> > slot free operation. > >> >> > I just want to figure out what we are trading off against. This i= s not > >> >> > one side wins all situations. > >> >> > >> >> Per my understanding, we don't batch memcg uncharging in > >> >> swap_entry_free() now. Although it's possible and may improve > >> >> performance. > >> > > >> > Yes. It actually causes a long tail in swapin fault latency as Chris > >> > discovered in our prod. I am wondering if doing the memcg uncharging > >> > outside the slots cache will actually amortize the cost instead. > >> > > >> > Regardless of memcg charging, which is more complicated, I think we > >> > should at least move the call to zswap_invalidate() before the slots > >> > cache. I would prefer that we move everything non-swapfile specific > >> > outside the slots cache layer (zswap_invalidate(), > >> > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), > >> > mem_cgroup_uncharge_swap(), ..). However, if some of those are > >> > controversial, we can move some of them for now. > >> > >> That makes sense for me. > >> > >> > When draining free swap slots from the cache, swap_range_free() is > >> > called with nr_entries =3D=3D 1 anyway, so I can't see how any batch= ing is > >> > going on. If anything it should help amortize the cost. > >> > >> In swapcache_free_entries(), the sis->lock will be held to free multip= le > >> swap slots via swap_info_get_cont() if possible. This can reduce > >> sis->lock contention. > > > > Ah yes that's a good point. Since most of these callbacks don't > > actually access sis, but use the swap entry value itself, I am > > guessing the reason we need to hold the lock for all these callbacks > > is to prevent swapoff and swapon reusing the same swap entry on a > > different swap device, right? > > In, > > swapcache_free_entries() > swap_entry_free() > swap_range_free() > > Quite some sis fields will be accessed. I wasn't referring to this code. I was what's preventing us from moving the callbacks I mentioned outside the lock (zswap_invalidate(), arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), mem_cgroup_uncharge_swap(), ..). I think most or all of them don't really access sis, but perhaps they need the lock to ensure the swap entry value does not get reused? > > -- > Best Regards, > Huang, Ying