From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song
Cc: linux-mm <linux-mm@kvack.org>
Subject: Re: cluster_alloc_swap_entry() may change per-cpu cluster of another CPU randomly
In-Reply-To: (Kairui Song's message of "Thu, 24 Oct 2024 11:50:59 +0800")
References: <871q0641xd.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 25 Oct 2024 13:52:28 +0800
Message-ID: <87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com>

Kairui Song writes:

> On Thu, Oct 24, 2024 at 11:29 AM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Hi, Kairui,
>>
>> When reading the new swap cluster allocation code (sorry, I didn't
>> find time to review your patch in the first place), I found that in
>> cluster_alloc_swap_entry() the following code path is possible:
>>
>>   cluster = this_cpu_ptr(si->percpu_cluster);
>>   offset = alloc_swap_scan_cluster();
>>   ...
>>   spin_unlock(&ci->lock);
>>   spin_unlock(&si->lock);
>>   /* migrate to another cpu */
>>   spin_lock(&ci->lock);
>>   spin_lock(&si->lock);
>>   cluster->next[order] = offset;
>>
>> That is, the per-CPU cluster of one CPU may be changed on another
>> CPU.  I guess that this will not cause a functionality issue.
>> However, it makes the code harder to reason about.  Is it possible
>> to avoid unlocking before changing the per-CPU cluster?  Or to give
>> up updating the per-CPU cluster if we need to unlock?
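For illustration, here is a minimal sketch of the second option (give up
the update): capture the CPU id while si->lock is still held (the held
spinlock keeps the task from migrating), and skip the per-CPU update when
the unlock/lock cycle comes back on a different CPU.  This is a sketch
against the snippet quoted above, not the actual mm/swapfile.c code; the
guard itself is the assumption here:

  int cpu = smp_processor_id();       /* stable: si->lock is held */

  cluster = this_cpu_ptr(si->percpu_cluster);
  offset = alloc_swap_scan_cluster(); /* arguments elided, as above */
  ...
  spin_unlock(&ci->lock);
  spin_unlock(&si->lock);
  /* the task may migrate to another CPU here */
  spin_lock(&ci->lock);
  spin_lock(&si->lock);
  if (cpu == smp_processor_id())      /* still on the CPU we started on? */
          cluster->next[order] = offset;  /* safe: this is our own cluster */
  /* otherwise give up the update rather than touch another CPU's cluster */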
>
> Hi Ying,
>
> Yes, I've noticed this when working on the new si->lock series.  It
> may cause fragmentation, but only if the CPU that the current
> allocation migrated from requests another swap allocation; in that
> case it may waste one per-CPU cluster.
>
> No functionality or leak issue, but it does make things harder to
> reason about, as you mentioned.
>
> In the new series I used a local lock to prevent migration, so the
> CPU is stable during the whole process.  However, that is not doable
> unless si->lock is removed from the allocation path; otherwise there
> are lock dependency issues.
>
> Do we need to fix this separately in the current tree?  The
> fragmentation is overall better with these patches, so I don't think
> this is causing any issues.  Before these patches, a per-CPU cluster
> could also be stolen by other CPUs, so it was never guaranteed to be
> accurate.
>
> One easy thing we can do is set cluster->next[order] to 0 before
> scanning for a new cluster to use, and check whether it is still 0
> before setting next[order] again.
>
> That may still cause fragmentation, because migration will cause
> extra clusters to be scanned and used, which is hard to avoid without
> changing too much code (like the things being done in the new
> si->lock series).

I found another, possibly more serious, problem.  After an unlock/lock
cycle, the state of the swap device may have changed (for example, it
may have been swapoff'ed).  In the previous code,

  si->flags += SWP_SCANNING;
  ...
  if (!(si->flags & SWP_WRITEOK))

were used to deal with that.  I don't find them used after the
unlock/lock cycles in the new code.  Can you check it?

--
Best Regards,
Huang, Ying
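For reference, the guard referred to above looked roughly like this in
the old scan_swap_map_slots() (a condensed sketch, not the verbatim
kernel code; the label name and the placement of the recheck are
simplified):

  si->flags += SWP_SCANNING;        /* counted flag: a scan is in flight */
  ...
  spin_unlock(&si->lock);
  /* the device may be swapoff'ed while the lock is dropped */
  spin_lock(&si->lock);
  if (!(si->flags & SWP_WRITEOK))   /* recheck device state after relock */
          goto no_page;
  ...
no_page:
  si->flags -= SWP_SCANNING;        /* let a pending swapoff proceed */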