From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song
Cc: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, liyangouwen1@oppo.com, mhocko@suse.com, minchan@kernel.org, sj@kernel.org, stable@vger.kernel.org, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com
Subject: Re: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails
In-Reply-To: (Kairui Song's message of "Thu, 3 Oct 2024 02:30:29 +0800")
References: <87y137nxqs.fsf@yhuang6-desk2.ccr.corp.intel.com> <20241002015754.969-1-21cnbao@gmail.com>
Date: Thu, 03 Oct 2024 08:38:46 +0800
Message-ID: <87ed4ynhpl.fsf@yhuang6-desk2.ccr.corp.intel.com>
Hi, Kairui,

Kairui Song writes:

> On Wed, Oct 2, 2024 at 10:02 AM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Wed, Oct 2, 2024 at 8:43 AM Huang, Ying wrote:
>> >
>> > Barry Song <21cnbao@gmail.com> writes:
>> >
>> > > On Tue, Oct 1, 2024 at 7:43 AM Huang, Ying wrote:
>> > >>
>> > >> Barry Song <21cnbao@gmail.com> writes:
>> > >>
>> > >> > On Sun, Sep 29, 2024 at 3:43 PM Huang, Ying wrote:
>> > >> >>
>> > >> >> Hi, Barry,
>> > >> >>
>> > >> >> Barry Song <21cnbao@gmail.com> writes:
>> > >> >>
>> > >> >> > From: Barry Song
>> > >> >> >
>> > >> >> > Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
>> > >> >> > introduced an unconditional one-tick sleep when `swapcache_prepare()`
>> > >> >> > fails, which has led to reports of UI stuttering on latency-sensitive
>> > >> >> > Android devices. To address this, we can use a waitqueue to wake up
>> > >> >> > tasks that fail `swapcache_prepare()` sooner, instead of always
>> > >> >> > sleeping for a full tick. While tasks may occasionally be woken by an
>> > >> >> > unrelated `do_swap_page()`, this method is preferable to two scenarios:
>> > >> >> > rapid re-entry into page faults, which can cause livelocks, and
>> > >> >> > multiple millisecond sleeps, which visibly degrade user experience.
>> > >> >>
>> > >> >> In general, I think that this works.  Why not extend the solution to
>> > >> >> cover schedule_timeout_uninterruptible() in __read_swap_cache_async()
>> > >> >> too?  We can call wake_up() when we clear SWAP_HAS_CACHE.  To avoid
>> > >> >
>> > >> > Hi Ying,
>> > >> > Thanks for your comments.
>> > >> > I feel extending the solution to __read_swap_cache_async() should be done
>> > >> > in a separate patch. On phones, I've never encountered any issues reported
>> > >> > on that path, so it might be better suited for an optimization rather than a
>> > >> > hotfix?
>
> Hi Barry and Ying,
>
> For the __read_swap_cache_async case, I'm not really against adding a
> similar waitqueue, but if no one is really suffering from it, and if
> the waitqueue does cause extra overhead, maybe we can ignore it for the
> __read_swap_cache_async case for now, and I plan to resend the following
> patch:
> https://lore.kernel.org/linux-mm/20240326185032.72159-9-ryncsn@gmail.com/#r
>
> It removes all the schedule_timeout_uninterruptible workarounds and other
> similar things, and the performance will go even higher.

Sounds good to me.  Please resend it.  It's more complex than Barry's
fix, so I suggest merging Barry's version first.

>> > >>
>> > >> Yes.  It's fine to do that in another patch as optimization.
>> > >
>> > > Ok. I'll prepare a separate patch for optimizing that path.
>> >
>> > Thanks!
>> >
>> > >>
>> > >> >> overhead to call wake_up() when there's no task waiting, we can use an
>> > >> >> atomic to count waiting tasks.
>> > >> >
>> > >> > I'm not sure it's worth adding the complexity, as wake_up() on an empty
>> > >> > waitqueue should have a very low cost on its own?
>> > >>
>> > >> wake_up() needs to call spin_lock_irqsave() unconditionally on a global
>> > >> shared lock.  On systems with many CPUs (such as servers), this may cause
>> > >> severe lock contention.  Even the cache ping-pong may hurt performance
>> > >> much.
>> > >
>> > > I understand that cache synchronization was a significant issue before
>> > > qspinlock, but it seems to be less of a concern after its implementation.
>> >
>> > Unfortunately, qspinlock cannot eliminate the cache ping-pong issue, as
>> > discussed in the following thread.
>> >
>> > https://lore.kernel.org/lkml/20220510192708.GQ76023@worktop.programming.kicks-ass.net/
>> >
>> > > However, using a global atomic variable would still trigger cache broadcasts,
>> > > correct?
>> >
>> > We can only change the atomic variable to non-zero when
>> > swapcache_prepare() returns non-zero, and call wake_up() when the atomic
>> > variable is non-zero.  Because swapcache_prepare() returns 0 most of the
>> > time, the atomic variable is 0 most of the time.  If we don't change the
>> > value of the atomic variable, cache ping-pong will not be triggered.
>>
>> Yes, this can be implemented by adding another atomic variable.
>>
>> >
>> > Hi, Kairui,
>> >
>> > Do you have some test cases to test parallel zram swap-in?  If so, they
>> > can be used to verify whether cache ping-pong is an issue and whether it
>> > can be fixed via a global atomic variable.
>> >
>>
>> Yes, Kairui, please run a test on your machine with lots of cores before
>> and after adding a global atomic variable as suggested by Ying. I am
>> sorry I don't have a server machine.
>
> I just had a try with the kernel build test which I used for the
> allocator patch series, with -j64 and a 1G memcg, on my local branch:
>
> Without the patch:
> 2677.63user 9100.43system 3:33.15elapsed 5452%CPU (0avgtext+0avgdata 863284maxresident)k
> 2671.40user 8969.07system 3:33.67elapsed 5447%CPU (0avgtext+0avgdata 863316maxresident)k
> 2673.66user 8973.90system 3:33.18elapsed 5463%CPU (0avgtext+0avgdata 863284maxresident)k
>
> With the patch:
> 2655.05user 9134.21system 3:35.63elapsed 5467%CPU (0avgtext+0avgdata 863288maxresident)k
> 2652.57user 9104.87system 3:35.07elapsed 5466%CPU (0avgtext+0avgdata 863272maxresident)k
> 2665.44user 9155.97system 3:35.92elapsed 5474%CPU (0avgtext+0avgdata 863316maxresident)k
>
> Only three test runs; the main bottleneck for the test is still some
> other locks (the list_lru lock, swap cgroup lock, etc.), but it does show
> that the performance seems a bit lower. It could be considered a trivial
> amount of overhead, so I think it's acceptable for the SYNC_IO path.

Thanks!  The difference appears measurable although small.
And, in some use cases, multiple memcgs may be used, so the list_lru and
swap cgroup locks will be less contended.

--
Best Regards,
Huang, Ying