From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song
Cc: linux-mm <linux-mm@kvack.org>
Subject: Re: cluster_alloc_swap_entry() may change per-cpu cluster of another CPU randomly
In-Reply-To: (Kairui Song's message of "Thu, 24 Oct 2024 11:50:59 +0800")
References: <871q0641xd.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 25 Oct 2024 13:52:28 +0800
Message-ID: <87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com>

Kairui Song writes:

> On Thu, Oct 24, 2024 at 11:29 AM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Hi, Kairui,
>>
>> When reading the new swap cluster allocation code (sorry, I didn't
>> find time to review your patch in the first place), I found that in
>> cluster_alloc_swap_entry() the following code path is possible:
>>
>>   cluster = this_cpu_ptr(si->percpu_cluster);
>>   offset = alloc_swap_scan_cluster();
>>   ...
>>   spin_unlock(&ci->lock);
>>   spin_unlock(&si->lock);
>>   /* migrate to another cpu */
>>   spin_lock(&ci->lock);
>>   spin_lock(&si->lock);
>>   cluster->next[order] = offset;
>>
>> That is, the per-CPU cluster of one CPU may be changed on another
>> CPU.  I guess that this will not cause a functionality issue.
>> However, it makes the code harder to reason about.  Is it possible
>> to avoid unlocking before changing the per-CPU cluster?  Or to give
>> up updating the per-CPU cluster if we need to unlock?
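For illustration, here is a minimal sketch of the second option (give up
the update): capture the CPU id while si->lock is still held (the held
spinlock keeps the task from migrating), and skip the per-CPU update when
the unlock/lock cycle comes back on a different CPU.  This is a sketch
against the snippet quoted above, not the actual mm/swapfile.c code; the
guard itself is the assumption here:

  int cpu = smp_processor_id();       /* stable: si->lock is held */

  cluster = this_cpu_ptr(si->percpu_cluster);
  offset = alloc_swap_scan_cluster(); /* arguments elided, as above */
  ...
  spin_unlock(&ci->lock);
  spin_unlock(&si->lock);
  /* the task may migrate to another CPU here */
  spin_lock(&ci->lock);
  spin_lock(&si->lock);
  if (cpu == smp_processor_id())      /* still on the CPU we started on? */
          cluster->next[order] = offset;  /* safe: this is our own cluster */
  /* otherwise give up the update rather than touch another CPU's cluster */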
>
> Hi Ying,
>
> Yes, I've noticed this when working on the new si->lock series.  It
> may cause fragmentation, but only if the CPU that the current
> allocation migrated from requests another swap allocation; in that
> case it may waste one per-CPU cluster.
>
> No functionality or leak issue, but it does make things harder to
> reason about, as you mentioned.
>
> In the new series I used a local lock to prevent migration, so the
> CPU is stable during the whole process.  However, that is not doable
> unless si->lock is removed from the allocation path; otherwise there
> are lock dependency issues.
>
> Do we need to fix this separately in the current tree?  The
> fragmentation is overall better with these patches, so I don't think
> this is causing any issues.  Before these patches, a per-CPU cluster
> could also be stolen by other CPUs, so it was never guaranteed to be
> accurate.
>
> One easy thing we can do is set cluster->next[order] to 0 before
> scanning for a new cluster to use, and check whether it is still 0
> before setting next[order] again.
>
> That may still cause fragmentation, because migration will cause
> extra clusters to be scanned and used, which is hard to avoid without
> changing too much code (like the things being done in the new
> si->lock series).

I found another, possibly more serious, problem.  After an unlock/lock
cycle, the state of the swap device may have changed (for example, it
may have been swapoff'ed).  In the previous code,

  si->flags += SWP_SCANNING;
  ...
  if (!(si->flags & SWP_WRITEOK))

were used to deal with that.  I don't find them used after the
unlock/lock cycles in the new code.  Can you check it?

--
Best Regards,
Huang, Ying
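For reference, the guard referred to above looked roughly like this in
the old scan_swap_map_slots() (a condensed sketch, not the verbatim
kernel code; the label name and the placement of the recheck are
simplified):

  si->flags += SWP_SCANNING;        /* counted flag: a scan is in flight */
  ...
  spin_unlock(&si->lock);
  /* the device may be swapoff'ed while the lock is dropped */
  spin_lock(&si->lock);
  if (!(si->flags & SWP_WRITEOK))   /* recheck device state after relock */
          goto no_page;
  ...
no_page:
  si->flags -= SWP_SCANNING;        /* let a pending swapoff proceed */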