From: Ryan Roberts
To: Barry Song
Cc: Andrew Morton, Chris Li, Kairui Song, "Huang, Ying", Kalesh Singh,
    Hugh Dickins, David Hildenbrand, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Shuai Yuan
Subject: Re: [RFC PATCH v1 0/5] Alternative mTHP swap allocator improvements
Date: Wed, 19 Jun 2024 10:17:56 +0100
Message-ID: <48859779-45ba-445d-8ce0-486575a3fd7b@arm.com>
References: <20240618232648.4090299-1-ryan.roberts@arm.com>

On 19/06/2024 10:11, Barry Song wrote:
> On Wed, Jun 19, 2024 at 11:27 AM Ryan Roberts wrote:
>>
>> Hi All,
>>
>> Chris has been doing great work at [1] to clean up my mess in the mTHP swap
>> entry allocator. But Barry posted a test program and results at [2] showing that
>> even with Chris's changes, there are still some fallbacks (around 5% - 25% in
>> some cases).
>> I was interested in why that might be and ended up putting this PoC
>> patch set together to try to get a better understanding. This series ends up
>> achieving 0% fallback, even with small folios ("-s") enabled. I haven't done
>> much testing beyond that (yet) but thought it was worth posting on the strength
>> of that result alone.
>>
>> At a high level this works in a similar way to Chris's series; it marks a
>> cluster as being for a particular order and if a new cluster cannot be allocated
>> then it scans through the existing non-full clusters. But it does it by scanning
>> through the clusters rather than assembling them into a list. Cluster flags are
>> used to mark clusters that have been scanned and are known not to have enough
>> contiguous space, so the efficiency should be similar in practice.
>>
>> Because it's not based around a linked list, there is less churn and I'm
>> wondering if this is perhaps easier to review and potentially even get into
>> v6.10-rcX to fix up what's already there, rather than having to wait until v6.11
>> for Chris's series? I know Chris has a larger roadmap of improvements, so at
>> best I see this as a tactical fix that will ultimately be superseded by Chris's
>> work.
>>
>> There are a few differences to note vs Chris's series:
>>
>> - order-0 fallback scanning is still allowed in any cluster; the argument in the
>>   past was that swap should always use all the swap space, so I've left this
>>   mechanism in. It is only a fallback though; first the new per-order
>>   scanner is invoked, even for order-0, so if there are free slots in clusters
>>   already assigned for order-0, then the allocation will go there.
>>
>> - CPUs can steal slots from other CPUs' current clusters; those clusters remain
>>   scannable while they are current for a CPU and are only made unscannable when
>>   no more CPUs are scanning that particular cluster.
>>
>> - I'm preferring to allocate a free cluster ahead of per-order scanning, since,
>>   as I understand it, the original intent of a per-cpu current cluster was to
>>   get pages for an application adjacent in the swap to speed up IO.
>>
>> I'd be keen to hear if you think we could get something like this into v6.10 to
>> fix the mess - I'm willing to work quickly to address comments and do more
>> testing. If not, then this is probably just a distraction and we should
>> concentrate on Chris's series.
>
> Ryan, thank you very much for accomplishing this.
>
> I am getting Shuai Yuan's (CC'd) help to collect the latency histogram of
> add_to_swap() for both your approach and Chris's. I will update you with
> the results ASAP.

Ahh great - look forward to the results!

>
> I am also anticipating Chris's V3, as V1 seems quite stable, but V2 has
> caused a couple of crashes.
>
>>
>> This applies on top of v6.10-rc4.
>>
>> [1] https://lore.kernel.org/linux-mm/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org/
>> [2] https://lore.kernel.org/linux-mm/20240615084714.37499-1-21cnbao@gmail.com/
>>
>> Thanks,
>> Ryan
>>
>> Ryan Roberts (5):
>>   mm: swap: Simplify end-of-cluster calculation
>>   mm: swap: Change SWAP_NEXT_INVALID to highest value
>>   mm: swap: Track allocation order for clusters
>>   mm: swap: Scan for free swap entries in allocated clusters
>>   mm: swap: Optimize per-order cluster scanning
>>
>>  include/linux/swap.h |  18 +++--
>>  mm/swapfile.c        | 164 ++++++++++++++++++++++++++++++++++++++-----
>>  2 files changed, 157 insertions(+), 25 deletions(-)
>>
>> --
>> 2.43.0
>>
>
> Thanks
> Barry
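
For anyone following the thread without the patches to hand, the allocation
order described in the quoted cover letter can be modelled roughly as below.
This is only a toy, single-CPU sketch in Python, not the code from the series:
Cluster, scan_failed, _take(), alloc() and the 8-slot cluster size are all
hypothetical names and values chosen for illustration. It also does not model
freeing, where the "scanned, no space" flag would presumably need clearing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

SLOTS_PER_CLUSTER = 8  # toy size, for illustration only


@dataclass
class Cluster:
    order: Optional[int] = None  # None means the cluster is entirely free
    used: List[bool] = field(default_factory=lambda: [False] * SLOTS_PER_CLUSTER)
    scan_failed: bool = False    # flag: scanned, no suitable contiguous run found


def _take(cluster: Cluster, nr: int) -> Optional[int]:
    """Claim the first naturally aligned run of nr free slots; return its offset."""
    for off in range(0, SLOTS_PER_CLUSTER - nr + 1, nr):
        if not any(cluster.used[off:off + nr]):
            cluster.used[off:off + nr] = [True] * nr
            return off
    return None


def alloc(clusters: List[Cluster], current: Dict[int, int],
          order: int) -> Optional[Tuple[int, int]]:
    """Return (cluster index, offset) for 2**order slots, or None on fallback."""
    nr = 1 << order

    # 1. Keep filling the current cluster for this order, if there is one.
    idx = current.get(order)
    if idx is not None:
        off = _take(clusters[idx], nr)
        if off is not None:
            return idx, off
        clusters[idx].scan_failed = True

    # 2. Prefer a completely free cluster over scanning existing ones.
    for idx, c in enumerate(clusters):
        if c.order is None:
            c.order = order
            current[order] = idx
            return idx, _take(c, nr)

    # 3. Scan clusters already assigned to this order, skipping flagged ones.
    for idx, c in enumerate(clusters):
        if c.order != order or c.scan_failed:
            continue
        off = _take(c, nr)
        if off is not None:
            current[order] = idx
            return idx, off
        c.scan_failed = True

    # 4. order-0 only: last-resort scan of any cluster, whatever its order.
    if order == 0:
        for idx, c in enumerate(clusters):
            off = _take(c, 1)
            if off is not None:
                return idx, off
    return None
```

With two clusters, two order-2 requests fill cluster 0, an order-0 request
claims cluster 1, and a third order-2 request then returns None (the fallback
case) while flagging cluster 0 so that later order-2 scans skip it.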