From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song
Cc: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, liyangouwen1@oppo.com, mhocko@suse.com, minchan@kernel.org, sj@kernel.org, stable@vger.kernel.org, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com
Subject: Re: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails
In-Reply-To: (Kairui Song's message of "Thu, 3 Oct 2024 02:30:29 +0800")
References: <87y137nxqs.fsf@yhuang6-desk2.ccr.corp.intel.com> <20241002015754.969-1-21cnbao@gmail.com>
Date: Thu, 03 Oct 2024 08:38:46 +0800
Message-ID: <87ed4ynhpl.fsf@yhuang6-desk2.ccr.corp.intel.com>
Hi, Kairui,

Kairui Song writes:

> On Wed, Oct 2, 2024 at 10:02 AM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Wed, Oct 2, 2024 at 8:43 AM Huang, Ying wrote:
>> >
>> > Barry Song <21cnbao@gmail.com> writes:
>> >
>> > > On Tue, Oct 1, 2024 at 7:43 AM Huang, Ying wrote:
>> > >>
>> > >> Barry Song <21cnbao@gmail.com> writes:
>> > >>
>> > >> > On Sun, Sep 29, 2024 at 3:43 PM Huang, Ying wrote:
>> > >> >>
>> > >> >> Hi, Barry,
>> > >> >>
>> > >> >> Barry Song <21cnbao@gmail.com> writes:
>> > >> >>
>> > >> >> > From: Barry Song
>> > >> >> >
>> > >> >> > Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
>> > >> >> > introduced an unconditional one-tick sleep when `swapcache_prepare()`
>> > >> >> > fails, which has led to reports of UI stuttering on latency-sensitive
>> > >> >> > Android devices. To address this, we can use a waitqueue to wake up
>> > >> >> > tasks that fail `swapcache_prepare()` sooner, instead of always
>> > >> >> > sleeping for a full tick. While tasks may occasionally be woken by an
>> > >> >> > unrelated `do_swap_page()`, this method is preferable to two scenarios:
>> > >> >> > rapid re-entry into page faults, which can cause livelocks, and
>> > >> >> > multiple millisecond sleeps, which visibly degrade user experience.
>> > >> >>
>> > >> >> In general, I think that this works.  Why not extend the solution to
>> > >> >> cover schedule_timeout_uninterruptible() in __read_swap_cache_async()
>> > >> >> too?  We can call wake_up() when we clear SWAP_HAS_CACHE.  To avoid
>> > >> >
>> > >> > Hi Ying,
>> > >> > Thanks for your comments.
>> > >> > I feel extending the solution to __read_swap_cache_async() should be done
>> > >> > in a separate patch. On phones, I've never encountered any issues reported
>> > >> > on that path, so it might be better suited for an optimization rather than a
>> > >> > hotfix?
>
> Hi Barry and Ying,
>
> For the __read_swap_cache_async case, I'm not really against adding a
> similar waitqueue, but if no one is really suffering from it, and if
> the waitqueue does cause extra overhead, maybe we can ignore it for the
> __read_swap_cache_async case for now, and I plan to resend the following
> patch:
> https://lore.kernel.org/linux-mm/20240326185032.72159-9-ryncsn@gmail.com/#r
>
> It removes all the schedule_timeout_uninterruptible workarounds and other
> similar things, and the performance will go even higher.

Sounds good to me.  Please resend it.  It's more complex than Barry's
fix, so I suggest merging Barry's version first.

>> > >>
>> > >> Yes.  It's fine to do that in another patch as optimization.
>> > >
>> > > Ok. I'll prepare a separate patch for optimizing that path.
>> >
>> > Thanks!
>> >
>> > >>
>> > >> >> overhead to call wake_up() when there's no task waiting, we can use an
>> > >> >> atomic to count waiting tasks.
>> > >> >
>> > >> > I'm not sure it's worth adding the complexity, as wake_up() on an empty
>> > >> > waitqueue should have a very low cost on its own?
>> > >>
>> > >> wake_up() needs to call spin_lock_irqsave() unconditionally on a global
>> > >> shared lock.  On systems with many CPUs (such as servers), this may cause
>> > >> severe lock contention.  Even the cache ping-pong may hurt performance
>> > >> much.
>> > >
>> > > I understand that cache synchronization was a significant issue before
>> > > qspinlock, but it seems to be less of a concern after its implementation.
>> >
>> > Unfortunately, qspinlock cannot eliminate the cache ping-pong issue, as
>> > discussed in the following thread.
>> >
>> > https://lore.kernel.org/lkml/20220510192708.GQ76023@worktop.programming.kicks-ass.net/
>> >
>> > > However, using a global atomic variable would still trigger cache broadcasts,
>> > > correct?
>> >
>> > We can only change the atomic variable to non-zero when
>> > swapcache_prepare() returns non-zero, and call wake_up() when the atomic
>> > variable is non-zero.  Because swapcache_prepare() returns 0 most of the
>> > time, the atomic variable is 0 most of the time.  If we don't change the
>> > value of the atomic variable, cache ping-pong will not be triggered.
>>
>> Yes, this can be implemented by adding another atomic variable.
>>
>> >
>> > Hi, Kairui,
>> >
>> > Do you have some test cases to test parallel zram swap-in?  If so, they
>> > can be used to verify whether cache ping-pong is an issue and whether it
>> > can be fixed via a global atomic variable.
>> >
>>
>> Yes, Kairui, please run a test on your machine with lots of cores before
>> and after adding a global atomic variable as suggested by Ying. I am
>> sorry I don't have a server machine.
>
> I just had a try with the kernel build test which I used for the
> allocator patch series, with -j64 and a 1G memcg, on my local branch:
>
> Without the patch:
> 2677.63user 9100.43system 3:33.15elapsed 5452%CPU (0avgtext+0avgdata 863284maxresident)k
> 2671.40user 8969.07system 3:33.67elapsed 5447%CPU (0avgtext+0avgdata 863316maxresident)k
> 2673.66user 8973.90system 3:33.18elapsed 5463%CPU (0avgtext+0avgdata 863284maxresident)k
>
> With the patch:
> 2655.05user 9134.21system 3:35.63elapsed 5467%CPU (0avgtext+0avgdata 863288maxresident)k
> 2652.57user 9104.87system 3:35.07elapsed 5466%CPU (0avgtext+0avgdata 863272maxresident)k
> 2665.44user 9155.97system 3:35.92elapsed 5474%CPU (0avgtext+0avgdata 863316maxresident)k
>
> Only three test runs; the main bottleneck for the test is still some
> other locks (the list_lru lock, swap cgroup lock, etc.), but it does show
> that the performance seems a bit lower. It could be considered a trivial
> amount of overhead, so I think it's acceptable for the SYNC_IO path.

Thanks!  The difference appears measurable although small.
And, in some use cases, multiple memcgs may be used, so the list_lru and
swap cgroup locks will be less contended.

--
Best Regards,
Huang, Ying