From: "Huang, Ying" <ying.huang@intel.com>
To: Chris Li <chrisl@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
  linux-kernel@vger.kernel.org,  linux-mm@kvack.org,  Wei Xu
 <weixugc@google.com>,  Yu Zhao <yuzhao@google.com>,  Greg Thelen
 <gthelen@google.com>,  Chun-Tse Shao <ctshao@google.com>,  Suren
 Baghdasaryan <surenb@google.com>,  Yosry Ahmed <yosryahmed@google.com>,
  Brain Geffon <bgeffon@google.com>,  Minchan Kim <minchan@kernel.org>,
  Michal Hocko <mhocko@suse.com>,  Mel Gorman
 <mgorman@techsingularity.net>,  Nhat Pham <nphamcs@gmail.com>,  Johannes
 Weiner <hannes@cmpxchg.org>,  Kairui Song <kasong@tencent.com>,  Zhongkun
 He <hezhongkun.hzk@bytedance.com>,  Kemeng Shi
 <shikemeng@huaweicloud.com>,  Barry Song <v-songbaohua@oppo.com>,  Hugh
 Dickins <hughd@google.com>, Tim Chen <tim.c.chen@linux.intel.com>
Subject: Re: [PATCH] mm: swap: async free swap slot cache entries
In-Reply-To: <ZYYY1VBKdLHH-Kl3@google.com> (Chris Li's message of "Fri, 22 Dec
	2023 15:16:37 -0800")
References: <20231221-async-free-v1-1-94b277992cb0@kernel.org>
	<20231222115208.ab4d2aeacdafa4158b14e532@linux-foundation.org>
	<ZYYY1VBKdLHH-Kl3@google.com>
Date: Mon, 25 Dec 2023 15:07:59 +0800
Message-ID: <87o7eeg3ow.fsf@yhuang6-desk2.ccr.corp.intel.com>

Chris Li <chrisl@kernel.org> writes:

> On Fri, Dec 22, 2023 at 11:52:08AM -0800, Andrew Morton wrote:
>> On Thu, 21 Dec 2023 22:25:39 -0800 Chris Li <chrisl@kernel.org> wrote:
>> 
>> > We discovered that 1% of swap page faults take 100us+ while 50% of
>> > swap faults take under 20us.
>> > 
>> > Further investigation shows that, in the long tail case, a large
>> > portion of the time is spent in the free_swap_slots() function.
>> > 
>> > The percpu cache of swap slots is freed in a batch of 64 entries
>> > inside free_swap_slots(). These cache entries are accumulated
>> > from previous page faults, which may not be related to the current
>> > process.
>> > 
>> > Doing the batch free in the page fault handler causes longer
>> > tail latencies and penalizes the current process.
>> > 
>> > Move free_swap_slots() outside of the swapin page fault handler into an
>> > async work queue to avoid such long tail latencies.
>> 
>> This will require a larger amount of total work than the current
>
> Yes, there will be a small amount of extra overhead to schedule the job
> onto another work queue.
>
>> scheme.  So we're trading that off against better latency.
>> 
>> Why is this a good tradeoff?
>
> That is a very good question. Both Hugh and Wei have asked me similar
> questions before. +Hugh.
>
> The TL;DR is that it makes swap more parallelizable.
>
> Modern computers typically have more than one CPU, and CPU utilization
> rarely reaches 100%. So we are not really trading latency for making
> someone else run slower. Most of the time, the real impact is that the
> current swapin page fault can return sooner, so more work can be
> submitted to the kernel earlier, while an otherwise idle CPU picks up
> the non-latency-critical work of freeing the swap slot cache entries.
> The net effect is that we speed things up and increase overall system
> utilization rather than slow things down.

Your solution depends on there being enough idle time in the system.
This isn't always true.

In general, all async solutions have two possible issues.

a) Unrelated applications may be penalized, because they may have to
wait for the CPU that is running the async operations.  In the original
solution, only the application that swaps more is penalized.

b) The CPU time cannot be charged to the appropriate applications.  The
original behavior isn't perfect either, but it's better than an async
worker.

Given that the worker's runtime is at the 100us level, these issues may
not be severe.  But I think you should at least explain them.

Also, when swap slot freeing batching was introduced, it was mainly to
reduce lock contention on sis->lock (via swap_info_get_cont()).  So we
could move some operations (e.g., mem_cgroup_uncharge_swap(),
clear_shadow_from_swap_cache(), etc.) out of the batched operation, doing
them before calling free_swap_slot(), to reduce the latency impact.
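
A minimal userspace sketch of the suggested split follows. It assumes only hypothetical names. `per_entry_work()` stands in for lock-free per-entry operations such as mem_cgroup_uncharge_swap() and clear_shadow_from_swap_cache(). A counter-based `dev_lock()`/`dev_unlock()` pair stands in for sis->lock. `free_slot()` and `flush_batch()` are illustrative, not the kernel's functions. Per-entry work runs eagerly on the freeing path. Only the lock-protected part stays batched.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_BATCH 64

/* Hypothetical slot handle; not a kernel type. */
typedef unsigned long slot_t;

/* Simulated swap device: a flag instead of a real sis->lock. */
struct swap_dev {
	int locked;
	size_t lock_acquires;	/* how often the "lock" was taken */
	size_t freed;		/* slots returned while holding it */
};

struct slot_batch {
	slot_t entries[SLOT_BATCH];
	size_t nr;
	size_t uncharged;	/* per-entry work already done */
};

static void dev_lock(struct swap_dev *d)
{
	assert(!d->locked);
	d->locked = 1;
	d->lock_acquires++;
}

static void dev_unlock(struct swap_dev *d)
{
	assert(d->locked);
	d->locked = 0;
}

/* Per-entry work that needs no device lock (uncharge, shadow clear). */
static void per_entry_work(struct slot_batch *b, slot_t s)
{
	(void)s;
	b->uncharged++;
}

/* Only the lock-protected part is batched: one acquisition per flush. */
static void flush_batch(struct swap_dev *d, struct slot_batch *b)
{
	dev_lock(d);
	d->freed += b->nr;	/* return all queued slots under one lock hold */
	dev_unlock(d);
	b->nr = 0;
}

/* Do per-entry work eagerly, then queue the slot for batched freeing. */
static void free_slot(struct swap_dev *d, struct slot_batch *b, slot_t s)
{
	per_entry_work(b, s);
	if (b->nr == SLOT_BATCH)
		flush_batch(d, b);
	b->entries[b->nr++] = s;
}
```

With 129 frees, the per-entry work runs 129 times, spread across the callers. The lock is taken only twice, so batching still amortizes the contended lock. The batch flush no longer carries the per-entry work, which shortens whichever path ends up doing it.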

> The test results from Chromebooks and Google production servers should
> show that this is beneficial to both laptop and server workloads, making
> them more responsive when swapping.

--
Best Regards,
Huang, Ying