References: <20240618232648.4090299-1-ryan.roberts@arm.com> <48859779-45ba-445d-8ce0-486575a3fd7b@arm.com>
In-Reply-To: <48859779-45ba-445d-8ce0-486575a3fd7b@arm.com>
From: Barry Song
Date: Fri, 21 Jun 2024 20:48:07 +1200
Subject: Re: [RFC PATCH v1 0/5] Alternative mTHP swap allocator improvements
To: Ryan Roberts
Cc: Andrew Morton, Chris Li, Kairui Song, "Huang, Ying", Kalesh Singh, Hugh Dickins, David Hildenbrand, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Shuai Yuan

On Wed, Jun 19, 2024 at 9:18 PM Ryan Roberts wrote:
>
> On 19/06/2024 10:11, Barry Song wrote:
> > On Wed, Jun 19, 2024 at 11:27 AM Ryan Roberts wrote:
> >>
> >> Hi All,
> >>
> >> Chris has been doing great work at [1] to clean up my mess in the mTHP swap
> >> entry allocator. But Barry posted a test program and results at [2] showing that
> >> even with Chris's changes, there are still some fallbacks (around 5% - 25% in
> >> some cases). I was interested in why that might be and ended up putting this PoC
> >> patch set together to try to get a better understanding. This series ends up
> >> achieving 0% fallback, even with small folios ("-s") enabled. I haven't done
> >> much testing beyond that (yet) but thought it was worth posting on the strength
> >> of that result alone.
> >>
> >> At a high level this works in a similar way to Chris's series; it marks a
> >> cluster as being for a particular order and if a new cluster cannot be allocated
> >> then it scans through the existing non-full clusters. But it does it by scanning
> >> through the clusters rather than assembling them into a list. Cluster flags are
> >> used to mark clusters that have been scanned and are known not to have enough
> >> contiguous space, so the efficiency should be similar in practice.
> >>
> >> Because it's not based around a linked list, there is less churn and I'm
> >> wondering if this is perhaps easier to review and potentially even get into
> >> v6.10-rcX to fix up what's already there, rather than having to wait until v6.11
> >> for Chris's series? I know Chris has a larger roadmap of improvements, so at
> >> best I see this as a tactical fix that will ultimately be superseded by Chris's
> >> work.
> >>
> >> There are a few differences to note vs Chris's series:
> >>
> >> - order-0 fallback scanning is still allowed in any cluster; the argument in the
> >>   past was that swap should always use all the swap space, so I've left this
> >>   mechanism in. It is only a fallback though; first the new per-order
> >>   scanner is invoked, even for order-0, so if there are free slots in clusters
> >>   already assigned for order-0, then the allocation will go there.
> >>
> >> - CPUs can steal slots from other CPUs' current clusters; those clusters remain
> >>   scannable while they are current for a CPU and are only made unscannable when
> >>   no more CPUs are scanning that particular cluster.
> >>
> >> - I'm preferring to allocate a free cluster ahead of per-order scanning, since,
> >>   as I understand it, the original intent of a per-cpu current cluster was to
> >>   get pages for an application adjacent in the swap to speed up IO.
> >>
> >> I'd be keen to hear if you think we could get something like this into v6.10 to
> >> fix the mess - I'm willing to work quickly to address comments and do more
> >> testing. If not, then this is probably just a distraction and we should
> >> concentrate on Chris's series.
> >
> > Ryan, thank you very much for accomplishing this.
> >
> > I am getting Shuai Yuan's (CC'd) help to collect the latency histogram of
> > add_to_swap() for both your approach and Chris's. I will update you with
> > the results ASAP.
>
> Ahh great - look forward to the results!

Essentially, we are measuring two types of latency:
* Small folio swap allocation
* Large folio swap allocation

The concept code looks like this:

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 994723cef821..a608b916ed2f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -185,10 +185,18 @@ bool add_to_swap(struct folio *folio)
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio);
 
+	start_time = ktime_get();
+
 	entry = folio_alloc_swap(folio);
 	if (!entry.val)
 		return false;
 
+	end_time = ktime_get();
+	if (folio_test_large(folio))
+		trace_large_swap_allocation_latency(ktime_sub(end_time, start_time));
+	else
+		trace_small_swap_allocation_latency(ktime_sub(end_time, start_time));
+
 	/*
 	 * XArray node allocations from PF_MEMALLOC contexts could
 	 * completely exhaust the page allocator. __GFP_NOMEMALLOC

Then, we'll generate histograms for both large and small allocation
latency. We're currently encountering some setup issues. Once we have
the data, I'll provide updates to you and Chris.

Additionally, I noticed some comments suggesting that Chris's patch
might negatively impact the swap allocation latency of small folios.
Perhaps the data can help clarify this.

>
> >
> > I am also anticipating Chris's V3, as V1 seems quite stable, but V2 has
> > caused a couple of crashes.
> >
> >>
> >> This applies on top of v6.10-rc4.
> >>
> >> [1] https://lore.kernel.org/linux-mm/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org/
> >> [2] https://lore.kernel.org/linux-mm/20240615084714.37499-1-21cnbao@gmail.com/
> >>
> >> Thanks,
> >> Ryan
> >>
> >> Ryan Roberts (5):
> >>   mm: swap: Simplify end-of-cluster calculation
> >>   mm: swap: Change SWAP_NEXT_INVALID to highest value
> >>   mm: swap: Track allocation order for clusters
> >>   mm: swap: Scan for free swap entries in allocated clusters
> >>   mm: swap: Optimize per-order cluster scanning
> >>
> >>  include/linux/swap.h |  18 +++--
> >>  mm/swapfile.c        | 164 ++++++++++++++++++++++++++++++++++-----
> >>  2 files changed, 157 insertions(+), 25 deletions(-)
> >>
> >> --
> >> 2.43.0
> >>
>

Thanks
Barry
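
P.S. For the histogram step mentioned above, a rough userspace sketch of
the bucketing might look like the following. This is only an
illustration: it assumes the latencies have already been pulled out of
the trace output as nanosecond values, and the power-of-two bucket
layout, NR_BUCKETS and all function names here are hypothetical, not
part of either patch set.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NR_BUCKETS 32	/* assumption: 32 power-of-two buckets are enough */

struct latency_hist {
	uint64_t count[NR_BUCKETS];
	uint64_t total;
};

/*
 * Map a latency in nanoseconds to a power-of-two bucket:
 * bucket 0 holds 0..1 ns, bucket b holds [2^b, 2^(b+1)) ns,
 * and the last bucket absorbs everything larger.
 */
unsigned int latency_bucket(uint64_t ns)
{
	unsigned int b = 0;

	while (ns > 1 && b < NR_BUCKETS - 1) {
		ns >>= 1;
		b++;
	}
	return b;
}

/* Account one sample, e.g. one small- or large-folio allocation. */
void hist_add(struct latency_hist *h, uint64_t ns)
{
	assert(h != NULL);
	h->count[latency_bucket(ns)]++;
	h->total++;
}

/* Print only the non-empty buckets with their lower bound in ns. */
void hist_print(const struct latency_hist *h, const char *name)
{
	unsigned int b;

	printf("%s: %llu samples\n", name, (unsigned long long)h->total);
	for (b = 0; b < NR_BUCKETS; b++) {
		if (!h->count[b])
			continue;
		printf("  >= %10llu ns : %llu\n",
		       (unsigned long long)(b ? 1ULL << b : 0),
		       (unsigned long long)h->count[b]);
	}
}
```

One struct latency_hist would be kept per case (small folio, large
folio) and per kernel under test, so the distributions can be compared
side by side rather than just the averages.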