References: <20240618232648.4090299-1-ryan.roberts@arm.com>
In-Reply-To: <20240618232648.4090299-1-ryan.roberts@arm.com>
From: Barry Song
Date: Wed, 19 Jun 2024 21:11:39 +1200
Subject: Re: [RFC PATCH v1 0/5] Alternative mTHP swap allocator improvements
To: Ryan Roberts
Cc: Andrew Morton, Chris Li, Kairui Song, "Huang, Ying", Kalesh Singh,
 Hugh Dickins, David Hildenbrand, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Shuai Yuan

On Wed, Jun 19, 2024 at 11:27 AM Ryan Roberts wrote:
>
> Hi All,
>
> Chris has been doing great work at [1] to clean up my mess in the mTHP swap
> entry allocator. But Barry posted a test program and results at [2] showing that
> even with Chris's changes, there are still some fallbacks (around 5% - 25% in
> some cases). I was interested in why that might be and ended up putting this PoC
> patch set together to try to get a better understanding. This series ends up
> achieving 0% fallback, even with small folios ("-s") enabled. I haven't done
> much testing beyond that (yet) but thought it was worth posting on the strength
> of that result alone.
>
> At a high level this works in a similar way to Chris's series; it marks a
> cluster as being for a particular order and if a new cluster cannot be allocated
> then it scans through the existing non-full clusters. But it does it by scanning
> through the clusters rather than assembling them into a list. Cluster flags are
> used to mark clusters that have been scanned and are known not to have enough
> contiguous space, so the efficiency should be similar in practice.
>
> Because it's not based around a linked list, there is less churn and I'm
> wondering if this is perhaps easier to review and potentially even get into
> v6.10-rcX to fix up what's already there, rather than having to wait until v6.11
> for Chris's series? I know Chris has a larger roadmap of improvements, so at
> best I see this as a tactical fix that will ultimately be superseded by Chris's
> work.
>
> There are a few differences to note vs Chris's series:
>
> - order-0 fallback scanning is still allowed in any cluster; the argument in the
>   past was that swap should always use all the swap space, so I've left this
>   mechanism in. It is only a fallback though; first the new per-order
>   scanner is invoked, even for order-0, so if there are free slots in clusters
>   already assigned for order-0, then the allocation will go there.
>
> - CPUs can steal slots from other CPUs' current clusters; those clusters remain
>   scannable while they are current for a CPU and are only made unscannable when
>   no more CPUs are scanning that particular cluster.
>
> - I'm preferring to allocate a free cluster ahead of per-order scanning, since,
>   as I understand it, the original intent of a per-cpu current cluster was to
>   get pages for an application adjacent in the swap to speed up IO.
>
> I'd be keen to hear if you think we could get something like this into v6.10 to
> fix the mess - I'm willing to work quickly to address comments and do more
> testing. If not, then this is probably just a distraction and we should
> concentrate on Chris's series.

Ryan, thank you very much for accomplishing this.

I am getting Shuai Yuan's (CC'd) help to collect the latency histogram of
add_to_swap() for both your approach and Chris's. I will update you with
the results ASAP.

I am also anticipating Chris's V3, as V1 seems quite stable, but V2 has
caused a couple of crashes.

>
> This applies on top of v6.10-rc4.
>
> [1] https://lore.kernel.org/linux-mm/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org/
> [2] https://lore.kernel.org/linux-mm/20240615084714.37499-1-21cnbao@gmail.com/
>
> Thanks,
> Ryan
>
> Ryan Roberts (5):
>   mm: swap: Simplify end-of-cluster calculation
>   mm: swap: Change SWAP_NEXT_INVALID to highest value
>   mm: swap: Track allocation order for clusters
>   mm: swap: Scan for free swap entries in allocated clusters
>   mm: swap: Optimize per-order cluster scanning
>
>  include/linux/swap.h |  18 +++--
>  mm/swapfile.c        | 164 ++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 157 insertions(+), 25 deletions(-)
>
> --
> 2.43.0
>

Thanks
Barry
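
For anyone following the thread, the scheme described in the quoted cover letter (tag each cluster with an order, prefer a free cluster, otherwise scan clusters already assigned to that order, and flag clusters whose scan failed) can be modelled roughly as below. This is only a userspace sketch under stated assumptions, not code from the patch set: every name here (`struct cluster`, `no_space`, `current_cluster`, `alloc_entries()`, `free_entries()`) and the toy sizes are hypothetical, a single array stands in for the per-CPU current clusters, and the order-0 fallback scan across all clusters is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CLUSTERS   4     /* toy sizes, not the kernel's */
#define CLUSTER_SLOTS 16
#define NR_ORDERS     5     /* orders 0..4 fit in a 16-slot cluster */
#define ORDER_NONE    (-1)

struct cluster {
    int  order;                 /* order this cluster serves, or ORDER_NONE if free */
    bool no_space;              /* hypothetical flag: last scan found no aligned free run */
    bool used[CLUSTER_SLOTS];
};

static struct cluster clusters[NR_CLUSTERS];
static int current_cluster[NR_ORDERS];  /* single-CPU stand-in for per-CPU state */

static void init_clusters(void)
{
    memset(clusters, 0, sizeof(clusters));
    for (int i = 0; i < NR_CLUSTERS; i++)
        clusters[i].order = ORDER_NONE;
    for (int o = 0; o < NR_ORDERS; o++)
        current_cluster[o] = -1;
}

/* Claim the first naturally aligned free run of 1 << order slots, else -1. */
static int scan_cluster(struct cluster *c, int order)
{
    int n = 1 << order;

    for (int base = 0; base + n <= CLUSTER_SLOTS; base += n) {
        int i = 0;

        while (i < n && !c->used[base + i])
            i++;
        if (i == n) {
            for (i = 0; i < n; i++)
                c->used[base + i] = true;
            return base;
        }
    }
    c->no_space = true;         /* remember the failed scan so we skip it next time */
    return -1;
}

/* Try one cluster; skip it if it serves another order or is flagged as full. */
static int try_cluster(int ci, int order)
{
    struct cluster *c = &clusters[ci];
    int base;

    if (c->order != order || c->no_space)
        return -1;
    base = scan_cluster(c, order);
    if (base < 0)
        return -1;
    current_cluster[order] = ci;
    return ci * CLUSTER_SLOTS + base;
}

/* Returns the global slot offset, or -1 where a real allocator would fall back. */
static int alloc_entries(int order)
{
    int off;

    /* 1. Keep filling the current cluster for this order. */
    if (current_cluster[order] >= 0) {
        off = try_cluster(current_cluster[order], order);
        if (off >= 0)
            return off;
    }
    /* 2. Prefer a completely free cluster ahead of scanning. */
    for (int ci = 0; ci < NR_CLUSTERS; ci++) {
        if (clusters[ci].order == ORDER_NONE) {
            clusters[ci].order = order;
            return try_cluster(ci, order);
        }
    }
    /* 3. Scan clusters already assigned to this order and not flagged. */
    for (int ci = 0; ci < NR_CLUSTERS; ci++) {
        off = try_cluster(ci, order);
        if (off >= 0)
            return off;
    }
    return -1;
}

/* Free a run and make its cluster scannable again (it keeps its order here). */
static void free_entries(int offset, int order)
{
    struct cluster *c = &clusters[offset / CLUSTER_SLOTS];
    int base = offset % CLUSTER_SLOTS;

    for (int i = 0; i < (1 << order); i++) {
        assert(c->used[base + i]);
        c->used[base + i] = false;
    }
    c->no_space = false;
}
```

In this model, after `init_clusters()`, repeated `alloc_entries(2)` calls fill one cluster before taking the next free one, and a later `free_entries()` clears the flag so the per-order scan can reuse the hole once no free clusters remain. How closely the flag handling matches the actual series is an assumption on my part.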