Message-ID: <968fec1a-9a54-4b2d-a54c-653d84393c82@arm.com>
Date: Fri, 7 Jun 2024 10:43:15 +0100
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
To: Chris Li, Andrew Morton
Cc: Kairui Song, "Huang, Ying", linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
References: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org>
From: Ryan Roberts
In-Reply-To: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org>

Sorry I'm late to the discussion - I've been out for the last 3.5 weeks and am
just getting through my mail now...

On 24/05/2024 18:17, Chris Li wrote:
> This is the short term solution "swap cluster order" listed
> in my "Swap Abstraction" discussion slide 8 in the recent
> LSF/MM conference.

I've read the article on LWN and look forward to watching the video once
available. The longer term plans look interesting.

>
> When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> orders" was introduced, it only allocated the mTHP swap entries
> from the new empty cluster list. That works well for PMD size THP,
> but it has a serious fragmentation issue reported by Barry.

Yes, that was a deliberate initial approach to be conservative, just like the
original PMD-size THP support. I'm glad to see work to improve the situation!

>
> https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
>
> The mTHP allocation failure rate rises to almost 100% after a few
> hours in Barry's test run.
>
> The reason is that all the empty clusters have been exhausted while
> there are plenty of free swap entries in the clusters that are
> not 100% free.
>
> Address this by remembering the swap allocation order in the cluster.
> Keep track of the per order non full cluster list for later allocation.

I don't immediately see how this helps because memory is swapped back in
per-page (currently), so just because a given cluster was initially filled with
entries of a given order, doesn't mean that those entries are freed in atomic
units; only specific pages could have been swapped back in, meaning the holes
are not of the required order. Additionally, scanning could lead to order-0
pages being populated in random places.

My naive assumption was that the obvious way to solve this problem in the short
term would be to extend the scanning logic to be able to scan for an arbitrary
order. That way you could find an allocation of the required order in any of the
clusters, even a cluster that was not originally allocated for the required
order.

I guess I should read your patches to understand exactly what you are doing
rather than making assumptions...

Thanks,
Ryan

>
> This greatly improves the success rate of the mTHP swap allocation.
> While I am still waiting for Barry's test result.
> I paste Kairui's test result here:
>
> I'm able to reproduce such an issue with a simple script (enabling all orders of mTHP):
>
> modprobe brd rd_nr=1 rd_size=$(( 10 * 1024 * 1024))
> swapoff -a
> mkswap /dev/ram0
> swapon /dev/ram0
>
> rmdir /sys/fs/cgroup/benchmark
> mkdir -p /sys/fs/cgroup/benchmark
> cd /sys/fs/cgroup/benchmark
> echo 8G > memory.max
> echo $$ > cgroup.procs
>
> memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 -t 32 -B binary &
>
> /usr/local/bin/memtier_benchmark -S /tmp/memcached.socket \
> -P memcache_binary -n allkeys --key-minimum=1 \
> --key-maximum=18000000 --key-pattern=P:P -c 1 -t 32 \
> --ratio 1:0 --pipeline 8 -d 1024
>
> Before:
> Totals 48805.63 0.00 0.00 5.26045 1.19100 38.91100 59.64700 51063.98
> After:
> Totals 71098.84 0.00 0.00 3.60585 0.71100 26.36700 39.16700 74388.74
>
> And the fallback ratio dropped by a lot:
> Before:
> hugepages-32kB/stats/anon_swpout_fallback:15997
> hugepages-32kB/stats/anon_swpout:18712
> hugepages-512kB/stats/anon_swpout_fallback:192
> hugepages-512kB/stats/anon_swpout:0
> hugepages-2048kB/stats/anon_swpout_fallback:2
> hugepages-2048kB/stats/anon_swpout:0
> hugepages-1024kB/stats/anon_swpout_fallback:0
> hugepages-1024kB/stats/anon_swpout:0
> hugepages-64kB/stats/anon_swpout_fallback:18246
> hugepages-64kB/stats/anon_swpout:17644
> hugepages-16kB/stats/anon_swpout_fallback:13701
> hugepages-16kB/stats/anon_swpout:18234
> hugepages-256kB/stats/anon_swpout_fallback:8642
> hugepages-256kB/stats/anon_swpout:93
> hugepages-128kB/stats/anon_swpout_fallback:21497
> hugepages-128kB/stats/anon_swpout:7596
>
> (Still collecting more data. The successful swpouts mostly happened early, then the fallbacks began to increase, reaching a nearly 100% failure rate.)
>
> After:
> hugepages-32kB/stats/swpout:34445
> hugepages-32kB/stats/swpout_fallback:0
> hugepages-512kB/stats/swpout:1
> hugepages-512kB/stats/swpout_fallback:134
> hugepages-2048kB/stats/swpout:1
> hugepages-2048kB/stats/swpout_fallback:1
> hugepages-1024kB/stats/swpout:6
> hugepages-1024kB/stats/swpout_fallback:0
> hugepages-64kB/stats/swpout:35495
> hugepages-64kB/stats/swpout_fallback:0
> hugepages-16kB/stats/swpout:32441
> hugepages-16kB/stats/swpout_fallback:0
> hugepages-256kB/stats/swpout:2223
> hugepages-256kB/stats/swpout_fallback:6278
> hugepages-128kB/stats/swpout:29136
> hugepages-128kB/stats/swpout_fallback:52
>
> Reported-by: Barry Song <21cnbao@gmail.com>
> Tested-by: Kairui Song
> Signed-off-by: Chris Li
> ---
> Chris Li (2):
>       mm: swap: swap cluster switch to double link list
>       mm: swap: mTHP allocate swap entries from nonfull list
>
>  include/linux/swap.h |  18 ++--
>  mm/swapfile.c        | 252 +++++++++++++++++---------------------------------
>  2 files changed, 93 insertions(+), 177 deletions(-)
> ---
> base-commit: c65920c76a977c2b73c3a8b03b4c0c00cc1285ed
> change-id: 20240523-swap-allocator-1534c480ece4
>
> Best regards,
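
To make the "scan for an arbitrary order" idea above concrete, here is a
minimal userspace sketch. It is not taken from the patches or from
mm/swapfile.c: CLUSTER_SLOTS, find_free_run() and set_run() are hypothetical
names, the cluster is modelled as a plain array of in-use flags, and the scan
assumes that an entry of a given order must sit at a naturally aligned offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cluster size, chosen only for illustration. */
#define CLUSTER_SLOTS 512

/*
 * Return the index of the first naturally aligned run of (1 << order)
 * free slots in the cluster, or -1 if no such run exists.
 * in_use[i] is true when slot i currently holds a swap entry.
 */
static long find_free_run(const bool in_use[CLUSTER_SLOTS], unsigned int order)
{
	size_t nr = (size_t)1 << order;

	assert(nr <= CLUSTER_SLOTS);

	/* Step by nr so every candidate base is aligned to the order. */
	for (size_t base = 0; base + nr <= CLUSTER_SLOTS; base += nr) {
		size_t i;

		for (i = 0; i < nr; i++)
			if (in_use[base + i])
				break;
		if (i == nr)
			return (long)base;
	}
	return -1;
}

/* Mark nr slots starting at base as allocated (true) or free (false). */
static void set_run(bool in_use[CLUSTER_SLOTS], size_t base, size_t nr,
		    bool used)
{
	assert(base + nr <= CLUSTER_SLOTS);

	for (size_t i = 0; i < nr; i++)
		in_use[base + i] = used;
}
```

With a full cluster in which only slots 5 and 10 have been freed by per-page
swap-in, find_free_run() finds an order-0 slot but no order-2 run, which is the
"holes are not of the required order" concern. Once the whole aligned range
8..11 is free, an order-2 scan succeeds regardless of which order the cluster
was first filled with.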