From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FAADD1CDC7 for ; Tue, 22 Oct 2024 09:21:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4F73F6B007B; Tue, 22 Oct 2024 05:21:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4B1256B0082; Tue, 22 Oct 2024 05:21:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 320AF6B0083; Tue, 22 Oct 2024 05:21:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 098CB6B007B for ; Tue, 22 Oct 2024 05:21:29 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 221061C6881 for ; Tue, 22 Oct 2024 09:21:10 +0000 (UTC) X-FDA: 82700694846.02.CB6D7E1 Received: from mail-lj1-f177.google.com (mail-lj1-f177.google.com [209.85.208.177]) by imf20.hostedemail.com (Postfix) with ESMTP id 257331C0008 for ; Tue, 22 Oct 2024 09:21:06 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=VWpFbefl; spf=pass (imf20.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.177 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729588735; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2MnIFIX+6xraPlAS1iyzSKgBzODyPlZ10mxzOTitsig=; b=Dy7SKUK/I+0JVezJJno4oe/aNqNzbvtdvLjke6ZRhnXq4DvTJK8OY1a4w23VyeAGXI/N7f cJN7o3a1N6jWlNA8YYN+jprkb6uBIKe3l95Cr4Ld1qObO7Gp4LPbFhtf8C5rC2iCBn75/C 659npu0bpbfbfSuTvPpWg9jOf7Uz9QU= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729588735; a=rsa-sha256; cv=none; b=JEqCzHnL5B2D6t5b3sODX47UdednkH1CJhJKeti84lbubI/Nlk58+B3+wI8sjFLfSZCU/k jURCjvW5K9cxJvLpSeKeDDQ9fyYWj/XdeCGg99nNkrp/+2Y09ZXnyGa+0JNaAhfHQJ2bxE ivl5TqF8DfB1XlppH3jNHUWbf937Zto= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=VWpFbefl; spf=pass (imf20.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.177 as permitted sender) smtp.mailfrom=ryncsn@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-lj1-f177.google.com with SMTP id 38308e7fff4ca-2fb5fa911aaso81908551fa.2 for ; Tue, 22 Oct 2024 02:21:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729588885; x=1730193685; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=2MnIFIX+6xraPlAS1iyzSKgBzODyPlZ10mxzOTitsig=; b=VWpFbeflj1UaZPR9oBvX7JG2ZhgNG5uS9chDhItqLUR4ObjwRjOh3SUOxLeMeHl6dJ hK4l2j0DoVO+wX+gGojqQIisG9oVPyK6OYuYoIbj3jOA1uq3CSI5o+IdDPj6IlD6a/Tt iLTRkHggo/J3dhS+TEzIiqAQ0k5L9Uo9YAa0OwunVTzpr0WDtJGsxn5vKzzIFGW4MDd5 Sa5O0x6l61jQTr1cTMpjhf1VjLodo83uHiTTMVXhz39VUGxjl0Yz6wRZNJacildD0Lds bHu6Wqh8vALs122DPOk+psXAx9XFRnLPuCIfl4LNWEQXhTSQ7J61RTYW1qqXKMp8yotf wl3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729588885; x=1730193685; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2MnIFIX+6xraPlAS1iyzSKgBzODyPlZ10mxzOTitsig=; b=mYSlFfWXDD7rxy2cdMyHpGuIQCWeMvCC5Dcbfoib+dJc3oVsL7MDvN1WlasmNm2+Ln WT+tI+GKMmk9JTghWhwGYkQerQ7riCWU2kQ1xBeH50TKKbGolnHyshPDRAaUt+o0oF4J Q2cCzbb09BXaX+YltyV1AmZ//v6krAEq9BQ16VsJt8hVitFTv54W91ASoyfcgSC1qTHj 7PGdB95MdotZ7T1nfc6qblo1sYxejFrBzQh62cBe/yVWKkwLk7nNQC/ogH6Y6cDclsDW o0HVhUHNwGBNi/krCggih8z5y9GwO5z0CSCtg/KXq/E2AwKNY0tu4OeBir+HdPRI982r bCbg== X-Forwarded-Encrypted: i=1; AJvYcCUWsDeTJwVzV3q/8z/mOtS2y6XrrkWQgtUxT2Vf5XyVyO93vvLBcf6KcF1aZaKG4Aoj3wU9qAaAxQ==@kvack.org X-Gm-Message-State: AOJu0YzrqoK071xsMNxpDRbKZfxoR2F9W5q3/j6vBgtuMNU01mE9nFUl BUlt/ml1VqtbrUExUty9n2lERNPjkbzfsG0CP/o64eS+pJGyo7mEsOX7IbX6g/BmeNPRFVFjZdn 0nv6nNiEgaZX9b5CwnSfpn+pyJrA= X-Google-Smtp-Source: AGHT+IEpRJOnGVCrLd3D1lgbc5zQRynFC+x2WBiNT3XYfjMNbJXyJtpREncJbqfP/fAYIk1eF9Tmxp1CFLlb149Ah50= X-Received: by 2002:a05:651c:2124:b0:2fa:d7ea:a219 with SMTP id 38308e7fff4ca-2fb8320f101mr106955081fa.37.1729588884388; Tue, 22 Oct 2024 02:21:24 -0700 (PDT) MIME-Version: 1.0 References: <87ikuani1f.fsf@yhuang6-desk2.ccr.corp.intel.com> <20241008130807.40833-1-21cnbao@gmail.com> <87set6m73u.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <87set6m73u.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Kairui Song Date: Tue, 22 Oct 2024 17:21:07 +0800 Message-ID: Subject: Re: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails To: "Huang, Ying" Cc: akpm@linux-foundation.org, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, liyangouwen1@oppo.com, mhocko@suse.com, minchan@kernel.org, sj@kernel.org, stable@vger.kernel.org, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com, Barry Song <21cnbao@gmail.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: 8yf7wy75qq5g91dg7gdwjf8ek9yuztrz X-Rspamd-Queue-Id: 257331C0008 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1729588866-611109 X-HE-Meta: U2FsdGVkX1+nwveGtzRKX6BvdlV3gq6SdXzjTGtcnbUygMxOFsoevC7DQ51Y0wOWBma6HrdQtLFqUy71mFeTYbi/CqCYiWF01fiMvLA+JYzeMvwUJwdsZvhl3ahSU1STPExUgKbDVK4dt9tiZf/448ipgs8++2sa3zrTEHcJlKkccw5eYN7nKlmT/1Zz8ruqkGvC7TK8Ww0lyebadTrDNLMdra3DrlAGz+vG2Pzf0SFy1rw5A9q1kWRfeVr+rDr1oRagi+Bm4Fn0ZuHGR6c04PqtohJiNTu3fhcqtydk9CWf9FFtFBRwC/sJBxtJiqsjYe/71wkoF/Wzt+wM3lUHcPiQFSGZKIqRqs1lwU7EtSGxnc+fVIvhG7FnsXPeF75bAL9bqpJGki9Md1jf232RUY2zxUh7yWpe76FiNCcBb0bRmjTW7IoBlBa5jKdzRjhp/OwOA2CSo6w3vQ7q2KpBKET7PZEBSv8nMnNLUJer/Kl3eVWxJXv9wJ0fErNl9mYI1fG9gXVv1oggXZUq05/ELszA8w6smdtWPomI99e7BhOfVM2JMjOjDRcmM2yKaHtuczcj0dH78UiQTt2Oet570TRPcAxLl4MAgpxGNQBWkrCA2TYmxkjrr9m6CdRRbGQa5yIaADgl6mZU9zOQKPgmZuHS2+vJ4F+5KmZdOPI51HAZ6ceJL7yfI4sSjqlkwqrQ4IaM3AyFdVbSnwTR2IllQixYXnTdqYrvlCkhKL2wenudpcAVhaCJvk25Fb+ZBAxabSnVNJ/Mp8TJ195lkEl9OyotB3PAJEuu/EVJsMcy+C7ZMW5mnRqYmsTgHmCAF7R+6iiiI7S6OcrMAeL6AFpu3GujP7ASEAH2C4mCAfOzZfjxApzrxeb1UpoMhLNDup1YWPaa+tvK/mg8o7aaPVC3HJKNdMAejrAKtII1BcK7etu4/9Ni+9JhCnzYC5YxL1ezKY4wz6Il+JO59yM2iOA vs8DvHrN 7OYhE1fzjE/V/MFj30kZzK3EOTssIRHSFU320Xcbhyn1JJ2WNzEwZcY2mm0bGGK79ORfMOCximetB47rLdj6DB5S3mFONBeGOuuTPmgcL0zFKfAOarPiAYykaqGU1rOPKDUmVPJTjCQvxBG/L4C/ljoNcQq0Q1si51jtXnXwQwT9yV0HHn6jaDsWKHke55UO9ApZQSS44cLG2dvcaWGiZoiUPeOLro2xwCBTp4u17MbeaOVTery+a7ZryHpYO00vRy4vN8ejfjzP+xWd56o9R9A5KTxdC7Fx1xvJcv8rM5fq0sl/lthfvsTz7S8j5KCgrqe4M4qX6DIgS7bwbBZnyUi3rljWwHVw8sdymqTY9l1DrM2/8ZKibD6D7DqA1gYkNtPfAetdoN6vtpoBHxK3D0XT9TlfJDN5kuS3MG4uSu3bv4egEubWnZbAOkk7l0LxoNYTRCVk8Mu0GAhr5v+wgDyL8OWY4GOzX3pnASe5a5WvxJiHfGriq1XvolkiN2cqVgd17QRItqHtGOrU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Oct 9, 2024 at 8:55=E2=80=AFAM Huang, Ying w= rote: > > Barry Song <21cnbao@gmail.com> writes: > > > On Thu, Oct 3, 2024 at 8:35=E2=80=AFAM Huang, Ying wrote: > >> > >> Barry Song <21cnbao@gmail.com> writes: > >> > >> > On Wed, Oct 2, 2024 at 8:43=E2=80=AFAM Huang, Ying wrote: > >> >> > >> >> Barry Song <21cnbao@gmail.com> writes: > >> >> > >> >> > On Tue, Oct 1, 2024 at 7:43=E2=80=AFAM Huang, Ying wrote: > >> >> >> > >> >> >> Barry Song <21cnbao@gmail.com> writes: > >> >> >> > >> >> >> > On Sun, Sep 29, 2024 at 3:43=E2=80=AFPM Huang, Ying wrote: > >> >> >> >> > >> >> >> >> Hi, Barry, > >> >> >> >> > >> >> >> >> Barry Song <21cnbao@gmail.com> writes: > >> >> >> >> > >> >> >> >> > From: Barry Song > >> >> >> >> > > >> >> >> >> > Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapc= ache") > >> >> >> >> > introduced an unconditional one-tick sleep when `swapcache_= prepare()` > >> >> >> >> > fails, which has led to reports of UI stuttering on latency= -sensitive > >> >> >> >> > Android devices. To address this, we can use a waitqueue to= wake up > >> >> >> >> > tasks that fail `swapcache_prepare()` sooner, instead of al= ways > >> >> >> >> > sleeping for a full tick. While tasks may occasionally be w= oken by an > >> >> >> >> > unrelated `do_swap_page()`, this method is preferable to tw= o scenarios: > >> >> >> >> > rapid re-entry into page faults, which can cause livelocks,= and > >> >> >> >> > multiple millisecond sleeps, which visibly degrade user exp= erience. > >> >> >> >> > >> >> >> >> In general, I think that this works. Why not extend the solu= tion to > >> >> >> >> cover schedule_timeout_uninterruptible() in __read_swap_cache= _async() > >> >> >> >> too? We can call wake_up() when we clear SWAP_HAS_CACHE. To= avoid > >> >> >> > > >> >> >> > Hi Ying, > >> >> >> > Thanks for your comments. > >> >> >> > I feel extending the solution to __read_swap_cache_async() sho= uld be done > >> >> >> > in a separate patch. On phones, I've never encountered any iss= ues reported > >> >> >> > on that path, so it might be better suited for an optimization= rather than a > >> >> >> > hotfix? > >> >> >> > >> >> >> Yes. It's fine to do that in another patch as optimization. > >> >> > > >> >> > Ok. I'll prepare a separate patch for optimizing that path. > >> >> > >> >> Thanks! > >> >> > >> >> >> > >> >> >> >> overhead to call wake_up() when there's no task waiting, we c= an use an > >> >> >> >> atomic to count waiting tasks. > >> >> >> > > >> >> >> > I'm not sure it's worth adding the complexity, as wake_up() on= an empty > >> >> >> > waitqueue should have a very low cost on its own? > >> >> >> > >> >> >> wake_up() needs to call spin_lock_irqsave() unconditionally on a= global > >> >> >> shared lock. On systems with many CPUs (such servers), this may= cause > >> >> >> severe lock contention. Even the cache ping-pong may hurt perfo= rmance > >> >> >> much. > >> >> > > >> >> > I understand that cache synchronization was a significant issue b= efore > >> >> > qspinlock, but it seems to be less of a concern after its impleme= ntation. > >> >> > >> >> Unfortunately, qspinlock cannot eliminate cache ping-pong issue, as > >> >> discussed in the following thread. > >> >> > >> >> https://lore.kernel.org/lkml/20220510192708.GQ76023@worktop.program= ming.kicks-ass.net/ > >> >> > >> >> > However, using a global atomic variable would still trigger cache= broadcasts, > >> >> > correct? > >> >> > >> >> We can only change the atomic variable to non-zero when > >> >> swapcache_prepare() returns non-zero, and call wake_up() when the a= tomic > >> >> variable is non-zero. Because swapcache_prepare() returns 0 most t= imes, > >> >> the atomic variable is 0 most times. If we don't change the value = of > >> >> atomic variable, cache ping-pong will not be triggered. > >> > > >> > yes. this can be implemented by adding another atomic variable. > >> > >> Just realized that we don't need another atomic variable for this, jus= t > >> use waitqueue_active() before wake_up() should be enough. > >> > >> >> > >> >> Hi, Kairui, > >> >> > >> >> Do you have some test cases to test parallel zram swap-in? If so, = that > >> >> can be used to verify whether cache ping-pong is an issue and wheth= er it > >> >> can be fixed via a global atomic variable. > >> >> > >> > > >> > Yes, Kairui please run a test on your machine with lots of cores bef= ore > >> > and after adding a global atomic variable as suggested by Ying. I am > >> > sorry I don't have a server machine. > >> > > >> > if it turns out you find cache ping-pong can be an issue, another > >> > approach would be a waitqueue hash: > >> > >> Yes. waitqueue hash may help reduce lock contention. And, we can hav= e > >> both waitqueue_active() and waitqueue hash if necessary. As the first > >> step, waitqueue_active() appears simpler. > > > > Hi Andrew, > > If there are no objections, can you please squash the below change? Ove= n > > has already tested the change and the original issue was still fixed wi= th > > it. If you want me to send v2 instead, please let me know. > > > > From a5ca401da89f3b628c3a0147e54541d0968654b2 Mon Sep 17 00:00:00 2001 > > From: Barry Song > > Date: Tue, 8 Oct 2024 20:18:27 +0800 > > Subject: [PATCH] mm: wake_up only when swapcache_wq waitqueue is active > > > > wake_up() will acquire spinlock even waitqueue is empty. This might > > involve cache sync overhead. Let's only call wake_up() when waitqueue > > is active. > > > > Suggested-by: "Huang, Ying" > > Signed-off-by: Barry Song > > --- > > mm/memory.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/mm/memory.c b/mm/memory.c > > index fe21bd3beff5..4adb2d0bcc7a 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -4623,7 +4623,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > > /* Clear the swap cache pin for direct swapin after PTL unlock */ > > if (need_clear_cache) { > > swapcache_clear(si, entry, nr_pages); > > - wake_up(&swapcache_wq); > > + if (waitqueue_active(&swapcache_wq)) > > + wake_up(&swapcache_wq); > > } > > if (si) > > put_swap_device(si); > > @@ -4641,7 +4642,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > > } > > if (need_clear_cache) { > > swapcache_clear(si, entry, nr_pages); > > - wake_up(&swapcache_wq); > > + if (waitqueue_active(&swapcache_wq)) > > + wake_up(&swapcache_wq); > > } > > if (si) > > put_swap_device(si); > > Hi, Kairui, > > Do you have time to give this patch (combined with the previous patch > from Barry) a test to check whether the overhead introduced in the > previous patch has been eliminated? Hi Ying, Barry I did a rebase on mm tree and run more tests with the latest patch: Before the two patches: make -j96 (64k): 33814.45 35061.25 35667.54 36618.30 37381.60 37678.75 make -j96: 20456.03 20460.36 20511.55 20584.76 20751.07 20780.79 make -j64:7490.83 7515.55 7535.30 7544.81 7564.77 7583.41 After adding workqueue: make -j96 (64k): 33190.60 35049.57 35732.01 36263.81 37154.05 37815.50 make -j96: 20373.27 20382.96 20428.78 20459.73 20534.59 20548.48 make -j64: 7469.18 7522.57 7527.38 7532.69 7543.36 7546.28 After adding workqueue with workqueue_active() check: make -j96 (64k): 33321.03 35039.68 35552.86 36474.95 37502.76 37549.04 make -j96: 20601.39 20639.08 20692.81 20693.91 20701.35 20740.71 make -j64: 7538.63 7542.27 7564.86 7567.36 7594.14 7600.96 So I think it's just noise level performance change, it should be OK in either way. > > -- > Best Regards, > Huang, Ying >