From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Nhat Pham, Chris Li, Chengming Zhou,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: swap: update inuse_pages after all cleanups are done
In-Reply-To: (Yosry Ahmed's message of "Tue, 23 Jan 2024 01:40:31 -0800")
References: <20240120024007.2850671-1-yosryahmed@google.com>
 <20240120024007.2850671-2-yosryahmed@google.com>
 <87wms0toh4.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 24 Jan 2024 11:13:26 +0800
Message-ID: <878r4ftodl.fsf@yhuang6-desk2.ccr.corp.intel.com>

Yosry Ahmed writes:

> On Tue, Jan 23, 2024 at 1:01 AM Huang, Ying wrote:
>>
>> Yosry Ahmed writes:
>>
>> > In swap_range_free(), we update inuse_pages then do some cleanups (arch
>> > invalidation, zswap invalidation, swap cache cleanups, etc). During
>> > swapoff, try_to_unuse() uses inuse_pages to make sure all swap entries
>> > are freed. Make sure we only update inuse_pages after we are done with
>> > the cleanups.
>> >
>> > In practice, this shouldn't matter, because swap_range_free() is called
>> > with the swap info lock held, and the swapoff code will spin for that
>> > lock after try_to_unuse() anyway.
>> >
>> > The goal is to make it obvious and more future proof that once
>> > try_to_unuse() returns, all cleanups are done.
>>
>> Define "all cleanups".  Apparently, some other operations are still
>> to be done after try_to_unuse() in swapoff().
>
> I am referring to the cleanups in swap_range_free() that I mentioned above.
>
> How about s/all the cleanups/all the cleanups in swap_range_free()/?

Sounds good to me.

>>
>> > This also facilitates a
>> > following zswap cleanup patch which uses this fact to simplify
>> > zswap_swapoff().
>> >
>> > Signed-off-by: Yosry Ahmed
>> > ---
>> >  mm/swapfile.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 556ff7347d5f0..2fedb148b9404 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>> >  		if (was_full && (si->flags & SWP_WRITEOK))
>> >  			add_to_avail_list(si);
>> >  	}
>> > -	atomic_long_add(nr_entries, &nr_swap_pages);
>> > -	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>> >  	if (si->flags & SWP_BLKDEV)
>> >  		swap_slot_free_notify =
>> >  			si->bdev->bd_disk->fops->swap_slot_free_notify;
>> > @@ -752,6 +750,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>> >  		offset++;
>> >  	}
>> >  	clear_shadow_from_swap_cache(si->type, begin, end);
>> > +	atomic_long_add(nr_entries, &nr_swap_pages);
>> > +	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>>
>> This isn't enough.  You need to use smp_wmb() here and smp_rmb()
>> somewhere that reads si->inuse_pages.
>
> Hmm, good point. Although as I mentioned in the commit message, this
> shouldn't matter today as swap_range_free() executes with the lock
> held, and we spin on the lock after try_to_unuse() returns.

Yes.  IIUC, this patch isn't strictly needed either, because we hold
the spinlock already.

> It may still be more future-proof to add the memory barriers.

Yes.  Without memory barriers, moving the code doesn't guarantee the
memory ordering.
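The needed pairing is the usual message-passing pattern.  In LKMM
litmus form it would look something like the following hand-written
sketch (not actually run through herd7; "cleanups_done" is a stand-in
for the zswap/swap cache state, not a real variable):

C swapoff-cleanup-mp

{}

P0(int *inuse_pages, int *cleanups_done)
{
	WRITE_ONCE(*cleanups_done, 1);	/* cleanups in swap_range_free() */
	smp_wmb();
	WRITE_ONCE(*inuse_pages, 0);	/* the count update */
}

P1(int *inuse_pages, int *cleanups_done)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*inuse_pages);	/* try_to_unuse() sees 0 */
	smp_mb();
	r1 = READ_ONCE(*cleanups_done);	/* cleanups after try_to_unuse() */
}

exists (1:r0=0 /\ 1:r1=0)

The "exists" clause (observing the count reach 0 while missing the
cleanups) must be forbidden, which is what the smp_wmb()/smp_mb()
pairing discussed below provides.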
> In swap_range_free(), we want to make sure that the write to
> si->inuse_pages happens *after* the cleanups (specifically
> zswap_invalidate() in this case).
> In swapoff, we want to make sure that the cleanups following
> try_to_unuse() (e.g. zswap_swapoff()) happen *after* reading
> si->inuse_pages == 0 in try_to_unuse().
>
> So I think we want smp_wmb() in swap_range_free() and smp_mb() in
> try_to_unuse(). Does the below look correct to you?
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2fedb148b9404..a2fa2f65a8ddd 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -750,6 +750,12 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>  		offset++;
>  	}
>  	clear_shadow_from_swap_cache(si->type, begin, end);
> +
> +	/*
> +	 * Make sure that try_to_unuse() observes si->inuse_pages reaching 0
> +	 * only after the above cleanups are done.
> +	 */
> +	smp_wmb();
>  	atomic_long_add(nr_entries, &nr_swap_pages);
>  	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>  }
> @@ -2130,6 +2136,11 @@ static int try_to_unuse(unsigned int type)
>  		return -EINTR;
>  	}
>
> +	/*
> +	 * Make sure that further cleanups after try_to_unuse() returns happen
> +	 * after swap_range_free() reduces si->inuse_pages to 0.
> +	 */
> +	smp_mb();
>  	return 0;
>  }

We need to take care of the "si->inuse_pages" check at the beginning
of try_to_unuse() too.  Otherwise, it looks good to me.
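For example (an untested sketch paraphrased from memory, and the
"success" label is made up for illustration), the early check could
branch to the same barrier instead of returning directly:

	static int try_to_unuse(unsigned int type)
	{
		...
		if (!READ_ONCE(si->inuse_pages))
			goto success;	/* instead of an early return */
	...
	success:
		/*
		 * Pairs with smp_wmb() in swap_range_free(), so that
		 * cleanups after try_to_unuse() returns happen after
		 * si->inuse_pages was reduced to 0.
		 */
		smp_mb();
		return 0;
	}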
> Alternatively, we may just hold the spinlock in try_to_unuse() when we
> check si->inuse_pages at the end. This will also ensure that any calls
> to swap_range_free() have completed. Let me know what you prefer.

Personally, I prefer memory barriers here.

--
Best Regards,
Huang, Ying