From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49ADAC54FB9 for ; Tue, 21 Nov 2023 03:37:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C6E476B0358; Mon, 20 Nov 2023 22:37:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C1E3A6B035B; Mon, 20 Nov 2023 22:37:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AE59A6B035C; Mon, 20 Nov 2023 22:37:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id A05A46B0358 for ; Mon, 20 Nov 2023 22:37:45 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 6C72A40A9A for ; Tue, 21 Nov 2023 03:37:45 +0000 (UTC) X-FDA: 81480552090.02.00240C9 Received: from mail-ej1-f51.google.com (mail-ej1-f51.google.com [209.85.218.51]) by imf29.hostedemail.com (Postfix) with ESMTP id 917DA120006 for ; Tue, 21 Nov 2023 03:37:43 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=S7YMDQI1; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.51 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700537863; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=D6JP77IJAFcT7qYQ8fIPmy5AeO6uJhz8/ThBZXngqfQ=; b=ddaDAA0IuQJNh3Bo/82aC1M0wPqc84m3478Aq4Uof8aDzZcAQlWgVrCibD5PxgAx7tSqOe uroprTlLzzMh7q4zPj7Lyk4opEv7liVwxIgBYEp5rY+Edz+pxqT+paG5B7p8twj48hWe5+ bgV/8/XjRYlGsKXK7iNUe0S6+e0TVsU= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=S7YMDQI1; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.51 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700537863; a=rsa-sha256; cv=none; b=qsFebb7p1lwzBjIKCQYkgASqjIxZX05A+K+TklaIBdNmWaFz5AQrTbr6etCXvQKbr6/rNY o8/ciKgS60/29xBs6VqIa8DN+zgbuM050a/gitqFnX5cD1qdzloaZtDQ//8dubx0GKjqUF JEm5YDKl0LJXh3aEG655y2c8SSAyxhY= Received: by mail-ej1-f51.google.com with SMTP id a640c23a62f3a-9f27af23443so689858966b.0 for ; Mon, 20 Nov 2023 19:37:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1700537862; x=1701142662; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=D6JP77IJAFcT7qYQ8fIPmy5AeO6uJhz8/ThBZXngqfQ=; b=S7YMDQI1ykWYCiLdcp3qk5NmcaM3M5Bo5kZeQaxeVA8dZXHU5fMXx5SSxqV0Mwxcin N3/F+rB0DrhbNs6uYNAJLfelgYWTuX3oC4G/1eWYqVNwHgXm+VKmXKbm8IrhFSAEMSqt oUYJnt3LwlAPjeolwvCuGYrYRLYMfZ62q152ijyz2zhEWolvIo7Pf+r596fDj8gZp8pi k0KHlwRcxpvZlHbbTwzUP0Xn2sKDDJyNBo8aAk+/PqJeHW7ZgSkzuQowDYzoAfPg/yHB IFH4IXDLh9Id3q61CAtIDBimUJuQtM7mSnrvZx0bqJBYKSiqdrXqjWqrPxkSIXBQGvPd efww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700537862; x=1701142662; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=D6JP77IJAFcT7qYQ8fIPmy5AeO6uJhz8/ThBZXngqfQ=; b=wTyR9Ef68NAw64CFcH1Oy5ud/9eLogHN8cjq7DiGkWghx94GdikttiNlqY8jH2WJNJ SLdpDrtu9904aqsrgubRmZ+QDJ7IYP/qGVjuuGzJzwX6GyqDLjCwvNbvQxkVbk9iogWR DvKjBRV/2CF6Ft7VGSUAyWCOWqQpA0HXcy7Q3KlWxZnQs7s6lcN5zVBUfbpgWNC0gUni +6jZvVcF/tQE/TPLCPkr90iD7OQgQF61+bZ1uMyhOZDFI/Om3RGI9dwEPF65DDH/bU2G Qhso16Eb5kYiHJjGuauXhJuUoYQnMzew4fNjCQXkgDyyMgiYtbfGnhXUnw8bRj0OTEVo fpyQ== X-Gm-Message-State: AOJu0Yw9BDrt65ePBRpM5snk2DgpnaBGxasbyg8hhwWIafXgjrHoA2DU QLAE/IZt6fqYW42n07KhZR7hf4p2DAVxY5SscrcCkg== X-Google-Smtp-Source: AGHT+IFSVsiy9K7Vb1LpBknGoBKKIsCYVMduksOP0PyaPxHgTyui5kGNuohqFfHP60cDN3HGY1EfEQ2m/n2viXtGiLE= X-Received: by 2002:a17:906:29ca:b0:a00:231a:92f8 with SMTP id y10-20020a17090629ca00b00a00231a92f8mr2231136eje.15.1700537861827; Mon, 20 Nov 2023 19:37:41 -0800 (PST) MIME-Version: 1.0 References: <20231113130601.3350915-1-hezhongkun.hzk@bytedance.com> <8734x1cdtr.fsf@yhuang6-desk2.ccr.corp.intel.com> <87edgkapsz.fsf@yhuang6-desk2.ccr.corp.intel.com> <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com> <871qcjbx20.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <871qcjbx20.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Yosry Ahmed Date: Mon, 20 Nov 2023 19:37:05 -0800 Message-ID: Subject: Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios To: "Huang, Ying" Cc: Chris Li , Zhongkun He , Andrew Morton , Johannes Weiner , Nhat Pham , Seth Jennings , Dan Streetman , Vitaly Wool , linux-mm , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 917DA120006 X-Rspam-User: X-Stat-Signature: 3p9mfnqu1huwdn83ee9p3uy95zqt79ia X-Rspamd-Server: rspam01 X-HE-Tag: 1700537863-805845 X-HE-Meta: U2FsdGVkX18TclKyXgsKF5v/KNmJOW1wTPOPguO0GWyihcrzITcZQ4bKG7jMTl28BLWMVvOAezvTVXJmcUKARgNC3423DCRD8KwXSi9bCtZtXHjnN9+77arvep4tQnHcfjMh0cev7V+kcavJdAY9mvz2XNe14HzwB4ZWnpbLUMLaesKOUDahz8iv4eooNwU9ZkYV55ilO6NW6//EXa9equeg2IXMMah3onYByeUM9hTq3Zwkxgni3PIrP7O+CaZSaW0XXjvXqFiue/P3te2kmta9Sd/sGJTfcPWMGH/TdaGfuiyt0xTe5zzxvcRrsZ2BkmarAc3LDI7xhlPC2EnoZNXwoxf3ReccilIdUuhBpK0xN9ns2DvdqXZEVkYmY8u4KYIPb5p6woj1U9NFRAA1IKve2udv7EkvAsT3e1/Tw83pawGrITUKLzBlJUlgCjqE6rcEdAWcAl+GMJKo3TjJpFBo6hfwBawOzK0w5Egj3TfhaqczdzYwbE2vjK4zrAFnzpVjLwBWZgQYDdkhD2TLCW82JzNiB7xrAeABhxpPcAlMrV/8sDyTWa7/JcZGQ1ceRY2Dg6qKeJQOKtVkaBtmWe/Nioc/eRoydY/13TQ6zxZLBC+SRUBmcvOIfdwHL8q7rUmk6qvh96zjDEZgAvB+mD92yegH7e4q8NYjZQ5HavIJdwRT42x7gDoE4IbFR+WvZzmWYkVT+OQzLHOgXGRiIfCLlHzg+8+BVIWnXSdmTBrsf8nE/p+B+SVhVuJ/UEqAGKFKN9V9mMbmAcREEWdiMf5Q2QPpXOe5SMng5ccmZ3Z3SeRA/hXqwwuZLsCinJy/4OD0GV4r5CcCA5G7kOsfUlVB5kbZA7rXZQTOOodak3xPAwE/UsL4qy3pSdNRO9H4zGP9Vw+lXww4F1ABi8POHSLxL7X8awjX3KxCFhVozdGPqtNI13kdgTTXATKx3wRJxNCr6YyvTS4TONczG75 /D7j9mOM DC31xfdmIJtDW4/G9B3kO37iYUmOSqXF0lIEVDRJ/cAOyfIvQkAPITxp/lUPdaCxsoLJvxvU1L8dtuaOqcH7BXiGg7EE+Cf1qsG0ct07iKmwoa5cIY2McMn3mRiWRjbOaIu/8KN1Mvym994x9Fe2UBRMRGJrubNYXPm3jD4WlT5VipviUiAYRxSO6sQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Nov 20, 2023 at 7:35=E2=80=AFPM Huang, Ying = wrote: > > Yosry Ahmed writes: > > > On Mon, Nov 20, 2023 at 5:55=E2=80=AFPM Huang, Ying wrote: > >> > >> Yosry Ahmed writes: > >> > >> > On Mon, Nov 20, 2023 at 4:57=E2=80=AFPM Huang, Ying wrote: > >> >> > >> >> Yosry Ahmed writes: > >> >> > >> >> > On Sun, Nov 19, 2023 at 7:20=E2=80=AFPM Huang, Ying wrote: > >> >> >> > >> >> >> Chris Li writes: > >> >> >> > >> >> >> > On Thu, Nov 16, 2023 at 12:19=E2=80=AFPM Yosry Ahmed wrote: > >> >> >> >> > >> >> >> >> Not bypassing the swap slot cache, just make the callbacks to > >> >> >> >> invalidate the zswap entry, do memg uncharging, etc when the = slot is > >> >> >> >> no longer used and is entering the swap slot cache (i.e. when > >> >> >> >> free_swap_slot() is called), instead of when draining the swa= p slot > >> >> >> >> cache (i.e. when swap_range_free() is called). For all parts = of MM > >> >> >> >> outside of swap, the swap entry is freed when free_swap_slot(= ) is > >> >> >> >> called. We don't free it immediately because of caching, but = this > >> >> >> >> should be transparent to other parts of MM (e.g. zswap, memcg= , etc). > >> >> >> > > >> >> >> > That will cancel the batching effect on the swap slot free, ma= king the > >> >> >> > common case for swapping faults take longer to complete, rig= h? > >> >> >> > If I recall correctly, the uncharge is the expensive part of t= he swap > >> >> >> > slot free operation. > >> >> >> > I just want to figure out what we are trading off against. Thi= s is not > >> >> >> > one side wins all situations. > >> >> >> > >> >> >> Per my understanding, we don't batch memcg uncharging in > >> >> >> swap_entry_free() now. Although it's possible and may improve > >> >> >> performance. > >> >> > > >> >> > Yes. It actually causes a long tail in swapin fault latency as Ch= ris > >> >> > discovered in our prod. I am wondering if doing the memcg uncharg= ing > >> >> > outside the slots cache will actually amortize the cost instead. > >> >> > > >> >> > Regardless of memcg charging, which is more complicated, I think = we > >> >> > should at least move the call to zswap_invalidate() before the sl= ots > >> >> > cache. I would prefer that we move everything non-swapfile specif= ic > >> >> > outside the slots cache layer (zswap_invalidate(), > >> >> > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), > >> >> > mem_cgroup_uncharge_swap(), ..). However, if some of those are > >> >> > controversial, we can move some of them for now. > >> >> > >> >> That makes sense for me. > >> >> > >> >> > When draining free swap slots from the cache, swap_range_free() i= s > >> >> > called with nr_entries =3D=3D 1 anyway, so I can't see how any ba= tching is > >> >> > going on. If anything it should help amortize the cost. > >> >> > >> >> In swapcache_free_entries(), the sis->lock will be held to free mul= tiple > >> >> swap slots via swap_info_get_cont() if possible. This can reduce > >> >> sis->lock contention. > >> > > >> > Ah yes that's a good point. Since most of these callbacks don't > >> > actually access sis, but use the swap entry value itself, I am > >> > guessing the reason we need to hold the lock for all these callbacks > >> > is to prevent swapoff and swapon reusing the same swap entry on a > >> > different swap device, right? > >> > >> In, > >> > >> swapcache_free_entries() > >> swap_entry_free() > >> swap_range_free() > >> > >> Quite some sis fields will be accessed. > > > > I wasn't referring to this code. I was what's preventing us from > > moving the callbacks I mentioned outside the lock (zswap_invalidate(), > > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), > > mem_cgroup_uncharge_swap(), ..). I think most or all of them don't > > really access sis, but perhaps they need the lock to ensure the swap > > entry value does not get reused? > > In fact, the swap entries to be freed by swapcache_free_entries() is in > a state that can not be freed by other path (including swapoff()). It's > swap_map value is SWAP_HAS_CACHE, but we can not find folio in > swap_address_space(). Interesting, it would be even nicer if we can move them outside the lock. > > To be honest, I don't know whether there are dependencies on sis->lock > in these callbacks. You need to investigate them one by one. Yeah moving these callbacks outside batching and the lock is very intriguing but needs to be done carefully. We don't need to do it all at once, we can start with zswap_invalidate() and move them as we see fit. It would be nice if the code is refactored such that it's clear what callbacks are made immediately when the entry is no longer used and what callbacks are made when the swap slot is being freed. > > -- > Best Regards, > Huang, Ying