From: Chris Li
Date: Tue, 28 May 2024 14:04:34 -0700
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
To: Andrew Morton
Cc: Kairui Song, Ryan Roberts, "Huang, Ying", linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
In-Reply-To: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org>
References: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org>

I am spinning a new version of this series to address two issues
found in it:

1) Oppo discovered a bug in the following line:

+       ci = si->cluster_info + tmp;

It should be "tmp / SWAPFILE_CLUSTER" instead of "tmp".
That is a serious bug but trivial to fix.

2) Order 0 allocation currently blindly scans swap_map, disregarding
the cluster->order.
Given enough order 0 swap allocations (close to the swap file size),
the order 0 allocation head will eventually sweep across the whole
swapfile and destroy other cluster order allocations.

The short term fix is to skip clusters that are already assigned to
higher orders.

In the long term, I want to unify the non-SSD case to use clusters for
locking and allocation as well, and just try to follow the last
allocation (less seeking) as much as possible.

Chris

On Fri, May 24, 2024 at 10:17 AM Chris Li wrote:
>
> This is the short term solution "swap cluster order" listed
> in my "Swap Abstraction" discussion slide 8 in the recent
> LSF/MM conference.
>
> When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> orders" was introduced, it only allocated the mTHP swap entries
> from the new empty cluster list. That works well for PMD size THP,
> but it has a serious fragmentation issue reported by Barry.
>
> https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
>
> The mTHP allocation failure rate rises to almost 100% after a few
> hours in Barry's test run.
>
> The reason is that all the empty clusters have been exhausted while
> there are plenty of free swap entries in clusters that are
> not 100% free.
>
> Address this by remembering the swap allocation order in the cluster.
> Keep track of the per order non full cluster list for later allocation.
>
> This greatly improves the success rate of the mTHP swap allocation.
> I am still waiting for Barry's test results.
> Meanwhile, here are Kairui's test results:
>
> I'm able to reproduce such an issue with a simple script (enabling all orders of mTHP):
>
> modprobe brd rd_nr=1 rd_size=$(( 10 * 1024 * 1024))
> swapoff -a
> mkswap /dev/ram0
> swapon /dev/ram0
>
> rmdir /sys/fs/cgroup/benchmark
> mkdir -p /sys/fs/cgroup/benchmark
> cd /sys/fs/cgroup/benchmark
> echo 8G > memory.max
> echo $$ > cgroup.procs
>
> memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 -t 32 -B binary &
>
> /usr/local/bin/memtier_benchmark -S /tmp/memcached.socket \
>   -P memcache_binary -n allkeys --key-minimum=1 \
>   --key-maximum=18000000 --key-pattern=P:P -c 1 -t 32 \
>   --ratio 1:0 --pipeline 8 -d 1024
>
> Before:
> Totals  48805.63  0.00  0.00  5.26045  1.19100  38.91100  59.64700  51063.98
> After:
> Totals  71098.84  0.00  0.00  3.60585  0.71100  26.36700  39.16700  74388.74
>
> And the fallback ratio dropped by a lot:
> Before:
> hugepages-32kB/stats/anon_swpout_fallback:15997
> hugepages-32kB/stats/anon_swpout:18712
> hugepages-512kB/stats/anon_swpout_fallback:192
> hugepages-512kB/stats/anon_swpout:0
> hugepages-2048kB/stats/anon_swpout_fallback:2
> hugepages-2048kB/stats/anon_swpout:0
> hugepages-1024kB/stats/anon_swpout_fallback:0
> hugepages-1024kB/stats/anon_swpout:0
> hugepages-64kB/stats/anon_swpout_fallback:18246
> hugepages-64kB/stats/anon_swpout:17644
> hugepages-16kB/stats/anon_swpout_fallback:13701
> hugepages-16kB/stats/anon_swpout:18234
> hugepages-256kB/stats/anon_swpout_fallback:8642
> hugepages-256kB/stats/anon_swpout:93
> hugepages-128kB/stats/anon_swpout_fallback:21497
> hugepages-128kB/stats/anon_swpout:7596
>
> (Still collecting more data; the successful swpouts mostly happened early, then the fallback began to increase, reaching a nearly 100% failure rate)
>
> After:
> hugepages-32kB/stats/swpout:34445
> hugepages-32kB/stats/swpout_fallback:0
> hugepages-512kB/stats/swpout:1
> hugepages-512kB/stats/swpout_fallback:134
> hugepages-2048kB/stats/swpout:1
> hugepages-2048kB/stats/swpout_fallback:1
> hugepages-1024kB/stats/swpout:6
> hugepages-1024kB/stats/swpout_fallback:0
> hugepages-64kB/stats/swpout:35495
> hugepages-64kB/stats/swpout_fallback:0
> hugepages-16kB/stats/swpout:32441
> hugepages-16kB/stats/swpout_fallback:0
> hugepages-256kB/stats/swpout:2223
> hugepages-256kB/stats/swpout_fallback:6278
> hugepages-128kB/stats/swpout:29136
> hugepages-128kB/stats/swpout_fallback:52
>
> Reported-by: Barry Song <21cnbao@gmail.com>
> Tested-by: Kairui Song
> Signed-off-by: Chris Li
> ---
> Chris Li (2):
>       mm: swap: swap cluster switch to double link list
>       mm: swap: mTHP allocate swap entries from nonfull list
>
>  include/linux/swap.h |  18 ++--
>  mm/swapfile.c        | 252 +++++++++++++++++----------------------------------
>  2 files changed, 93 insertions(+), 177 deletions(-)
> ---
> base-commit: c65920c76a977c2b73c3a8b03b4c0c00cc1285ed
> change-id: 20240523-swap-allocator-1534c480ece4
>
> Best regards,
> --
> Chris Li
>
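The two issues raised at the top of this mail can be illustrated with a small user-space sketch. This is not the kernel code and not the actual fix. Every name prefixed with sketch_ or SKETCH_ is hypothetical, the cluster size of 512 slots is an assumed stand-in for SWAPFILE_CLUSTER, and the structures are heavily simplified. It shows (1) why a slot offset has to be divided by the cluster size before it indexes the cluster array, and (2) an order 0 scan that jumps over clusters already assigned to a higher order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel's real definitions differ. */
#define SKETCH_SLOTS_PER_CLUSTER 512u  /* assumed role of SWAPFILE_CLUSTER */
#define SKETCH_ORDER_UNASSIGNED  (-1)  /* cluster not tied to any order yet */

struct sketch_cluster {
	int order;                     /* order this cluster serves, or UNASSIGNED */
};

struct sketch_swap {
	struct sketch_cluster *cluster_info; /* one entry per cluster */
	unsigned char *swap_map;             /* one count per slot, 0 means free */
	size_t nr_slots;                     /* total number of slots */
};

/*
 * Issue 1: cluster_info has one entry per cluster, not per slot, so a
 * slot offset must be divided by the cluster size to find its cluster.
 */
static struct sketch_cluster *
sketch_offset_to_cluster(struct sketch_swap *si, size_t offset)
{
	assert(offset < si->nr_slots);
	return si->cluster_info + offset / SKETCH_SLOTS_PER_CLUSTER;
}

/*
 * Issue 2: an order 0 scan that skips clusters assigned to a higher
 * order instead of sweeping through them. Returns the first free slot
 * at or after 'start', or si->nr_slots when nothing suitable is found.
 */
static size_t sketch_scan_order0(struct sketch_swap *si, size_t start)
{
	size_t offset = start;

	while (offset < si->nr_slots) {
		struct sketch_cluster *ci = sketch_offset_to_cluster(si, offset);

		if (ci->order > 0) {
			/* Jump to the first slot of the next cluster. */
			offset = (offset / SKETCH_SLOTS_PER_CLUSTER + 1) *
				 SKETCH_SLOTS_PER_CLUSTER;
			continue;
		}
		if (si->swap_map[offset] == 0)
			return offset;
		offset++;
	}
	return si->nr_slots;
}
```

With 512 slots per cluster, slot offset 600 belongs to cluster 1. Adding 600 directly to the cluster array pointer would instead address entry 600, which is the kind of mistake described in issue 1.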