Date: Fri, 22 Dec 2023 20:41:01 -0800
From: Chris Li
To: Nhat Pham
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao, Suren Baghdasaryan, Yosry Ahmed, Brain Geffon, Minchan Kim, Michal Hocko, Mel Gorman, Huang Ying, Johannes Weiner, Kairui Song, Zhongkun He, Kemeng Shi, Barry Song
Subject: Re: [PATCH] mm: swap: async free swap slot cache entries
References: <20231221-async-free-v1-1-94b277992cb0@kernel.org>

On Fri, Dec 22, 2023 at 05:44:19PM -0800, Nhat Pham wrote:
> On Thu, Dec 21, 2023 at 10:25 PM Chris Li wrote:
> >
> > We discovered that 1% swap page fault is 100us+ while 50% of
> > the swap fault is under 20us.
> >
> > Further investigation show that a large portion of the time
> > spent in the free_swap_slots() function for the long tail case.
> >
> > The percpu cache of swap slots is freed in a batch of 64 entries
> > inside free_swap_slots().
> > These cache entries are accumulated
> > from previous page faults, which may not be related to the current
> > process.
> >
> > Doing the batch free in the page fault handler causes longer
> > tail latencies and penalizes the current process.
> >
> > Move free_swap_slots() outside of the swapin page fault handler into an
> > async work queue to avoid such long tail latencies.
> >
> > Testing:
> >
> > Chun-Tse did some benchmark in chromebook, showing that
> > zram_wait_metrics improve about 15% with 80% and 95% confidence.

This benchmark result is from zram. There are three micro-benchmarks, all
showing about 15% improvement with slightly different confidence levels.
That is where the 80%-90% comes from.

> >
> > I recently ran some experiments on about 1000 Google production
> > machines. It shows swapin latency drops in the long tail
> > 100us - 500us bucket dramatically.
> >
> > platform   (100-500us)        (0-100us)
> > A          1.12% -> 0.36%     98.47% -> 99.22%
> > B          0.65% -> 0.15%     98.96% -> 99.46%
> > C          0.61% -> 0.23%     98.96% -> 99.38%
>
> Nice! Are these values for zram as well, or ordinary (SSD?) swap? I
> imagine it will matter less for swap, right?

Those production servers only use zswap; there is no zram there.

For ordinary SSD swap, the latency reduction is also there in terms of
absolute microseconds. However, the raw savings are overshadowed by the
SSD IO latency, which is typically in the 100us range. In percentage
terms, the effect is not as dramatic as it is for memory-compression-based
swapping (zswap and zram).

> > @@ -348,3 +362,10 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
> >  	}
> >  	return entry;
> >  }
> > +
> > +static int __init async_queue_init(void)
> > +{
> > +	swap_free_queue = create_workqueue("async swap cache");
>
> nit(?): isn't create_workqueue() deprecated? from:
>
> https://www.kernel.org/doc/html/latest/core-api/workqueue.html#application-programming-interface-api
>
> I think there's a zswap patch proposing fixing that on the zswap side.
>

Yes, I recall seeing that patch, and I might have acked it as well. Very
good catch. I will fix it in the V2 spin. Meanwhile, I will wait a bit to
collect other review feedback. Thanks for catching that.

Chris
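To make the mechanism under discussion concrete, here is a minimal user-space sketch, assuming POSIX threads as a stand-in for a kernel workqueue: freed slots accumulate in a 64-entry cache, and once it fills, the whole batch is handed to a background thread instead of being released inline on the latency-sensitive path. The names slot_cache, cache_free_slot(), free_worker(), release_batch() and stop_free_worker() are hypothetical and are not the patch's code. In the kernel itself, the workqueue documentation linked above describes create_workqueue() as deprecated in favour of alloc_workqueue().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Batch size mirrors the 64-entry batch described in the patch. */
#define SLOT_BATCH 64

/* Hypothetical stand-in for one CPU's swap slot cache. */
struct slot_cache {
	unsigned long slots[SLOT_BATCH];
	size_t nr;
};

/* Single-batch hand-off between the fast path and the worker thread. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static struct slot_cache pending;	/* batch waiting for the worker */
static bool have_pending;
static bool stop_worker;
static unsigned long freed_total;	/* written only by the worker thread */

/* Stand-in for the expensive batched release; here it only counts slots. */
static void release_batch(const struct slot_cache *c)
{
	freed_total += c->nr;
}

/* Background thread playing the role of the async work item. */
static void *free_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&q_lock);
	for (;;) {
		while (!have_pending && !stop_worker)
			pthread_cond_wait(&q_cond, &q_lock);
		if (!have_pending)
			break;	/* stop requested and nothing left to drain */
		struct slot_cache batch = pending;
		have_pending = false;
		pthread_cond_broadcast(&q_cond);	/* hand-off slot is free again */
		pthread_mutex_unlock(&q_lock);
		release_batch(&batch);	/* slow part runs off the fast path */
		pthread_mutex_lock(&q_lock);
	}
	pthread_mutex_unlock(&q_lock);
	return NULL;
}

/* Fast path: cache one freed slot; when the cache fills, queue the whole
 * batch for the worker instead of releasing it inline. */
static void cache_free_slot(struct slot_cache *c, unsigned long slot)
{
	assert(c->nr < SLOT_BATCH);
	c->slots[c->nr++] = slot;
	if (c->nr < SLOT_BATCH)
		return;
	pthread_mutex_lock(&q_lock);
	while (have_pending)	/* simplification: block if the worker lags */
		pthread_cond_wait(&q_cond, &q_lock);
	pending = *c;
	have_pending = true;
	pthread_cond_broadcast(&q_cond);
	pthread_mutex_unlock(&q_lock);
	c->nr = 0;
}

/* Ask the worker to drain any queued batch, then wait for it to exit. */
static void stop_free_worker(pthread_t worker)
{
	pthread_mutex_lock(&q_lock);
	stop_worker = true;
	pthread_cond_broadcast(&q_cond);
	pthread_mutex_unlock(&q_lock);
	pthread_join(worker, NULL);
}
```

Usage: start free_worker() with pthread_create(), call cache_free_slot() from the latency-sensitive path, and call stop_free_worker() on shutdown. Slots left in a partially filled cache are not released by this sketch, and because it has only one hand-off slot, the fast path can still block if the worker falls behind; it only illustrates where the expensive batch work moves.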