From: Chris Li <chrisl@kernel.org>
Date: Mon, 19 Aug 2024 14:27:11 -0700
Subject: Re: [PATCH v5 0/9] mm: swap: mTHP swap allocator base on swap cluster order
To: Kairui Song
Cc: "Huang, Ying", Hugh Dickins, Andrew Morton, Ryan Roberts, Kalesh Singh, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song

Hi Kairui,

On Mon, Aug 19, 2024 at 1:48 AM Kairui Song wrote:
>
> On Mon, Aug 19, 2024 at 4:31 PM Huang, Ying wrote:
> >
> > Kairui Song writes:
> >
> > > On Fri, Aug 16, 2024 at 3:53 PM Chris Li wrote:
> > >>
> > >> On Thu, Aug 8, 2024 at 1:38 AM Huang, Ying wrote:
> > >> >
> > >> > Chris Li writes:
> > >> >
> > >> > > On Wed, Aug 7, 2024 at 12:59 AM Huang, Ying wrote:
> > >> > >>
> > >> > >> Hi, Chris,
> > >> > >>
> > >> > >> Chris Li writes:
> > >> > >>
> > >> > >> > This is the short-term solution "swap cluster order" listed on
> > >> > >> > slide 8 of my "Swap Abstraction" discussion at the recent
> > >> > >> > LSF/MM conference.
> > >> > >> >
> > >> > >> > When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> > >> > >> > orders" was introduced, it only allocated mTHP swap entries from
> > >> > >> > the new empty cluster list, which led to the fragmentation issue
> > >> > >> > reported by Barry:
> > >> > >> >
> > >> > >> > https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
> > >> > >> >
> > >> > >> > The reason is that all the empty clusters can be exhausted while
> > >> > >> > plenty of free swap entries remain in clusters that are not 100%
> > >> > >> > free.
> > >> > >> >
> > >> > >> > Remember the swap allocation order in the cluster, and keep
> > >> > >> > track of a per-order nonfull cluster list for later allocations.
> > >> > >> >
> > >> > >> > This series gives the SSD swap allocation a code path separate
> > >> > >> > from the HDD allocation. The new allocator uses the cluster
> > >> > >> > lists only and no longer scans swap_map[] globally without a
> > >> > >> > lock.
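To make the quoted description concrete, here is a minimal sketch of the
per-order nonfull lists. This is illustrative kernel-style C, not the code
in the series; the type names, field layout, and order count are
assumptions made up for the example:

#include <linux/list.h>

/*
 * Sketch only: each cluster remembers the order it serves, and
 * partially free clusters stay on a per-order list, so a high-order
 * allocation can reuse them instead of requiring a fully empty cluster.
 */
#define NR_ORDERS_SKETCH 10     /* illustrative; one list per mTHP order */

struct cluster_sketch {
        struct list_head list;  /* links into the free or nonfull list */
        unsigned int count;     /* entries already allocated here */
        unsigned int order;     /* allocation order this cluster serves */
};

struct swap_device_sketch {
        struct list_head free_clusters;  /* 100% empty clusters */
        /* partially free clusters, indexed by allocation order */
        struct list_head nonfull_clusters[NR_ORDERS_SKETCH];
};

/*
 * Prefer a nonfull cluster of the requested order, then fall back to an
 * empty cluster; no swap_map[] scan either way. Assumes
 * order < NR_ORDERS_SKETCH.
 */
static struct cluster_sketch *pick_cluster(struct swap_device_sketch *si,
                                           unsigned int order)
{
        if (!list_empty(&si->nonfull_clusters[order]))
                return list_first_entry(&si->nonfull_clusters[order],
                                        struct cluster_sketch, list);
        if (!list_empty(&si->free_clusters))
                return list_first_entry(&si->free_clusters,
                                        struct cluster_sketch, list);
        return NULL;
}

The point is that a partially free cluster stays findable for its order,
so high-order allocations stop depending on wholly empty clusters.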
> > >> > >>
> > >> > >> This sounds good. Can we use the SSD allocation method for HDD
> > >> > >> too? We may not need a swap entry allocator optimized for HDD.
> > >> > >
> > >> > > Yes, that is the plan as well. That way we can completely get rid
> > >> > > of the old scan_swap_map_slots() code.
> > >> >
> > >> > Good!
> > >> >
> > >> > > However, considering the size of the series, let's focus on the
> > >> > > cluster allocation path first and get it tested and reviewed.
> > >> >
> > >> > OK.
> > >> >
> > >> > > For HDD optimization, mostly just the new block allocation portion
> > >> > > needs a code path separate from the new cluster allocator, so that
> > >> > > it skips the per-CPU allocation. Allocating from the nonfull list
> > >> > > doesn't need to change much.
> > >> >
> > >> > I suggest not considering HDD optimization at all. Just use the SSD
> > >> > algorithm, to simplify.
> > >>
> > >> Adding a global next-allocating CI rather than the per-CPU next CI
> > >> pointer is pretty trivial as well. It is just a different way to fetch
> > >> the next cluster pointer.
> > >
> > > Yes, if we enable the new cluster-based allocator for HDD, we can
> > > enable THP and mTHP for HDD too, and use a global cluster_next instead
> > > of a per-CPU one for it.
> > > It's easy to do with minimal changes, and it should actually boost
> > > performance for HDD swap. I am currently testing this locally.
> >
> > I think that it's better to start with the SSD algorithm. Then you can
> > add HDD-specific optimization on top of it with supporting data.
>
> Yes, we have the same idea.
>
> > BTW, I don't know why HDD shouldn't use a per-CPU cluster. Sequential
> > writing is more important for HDD.
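To illustrate the next-cluster-pointer point: a rough sketch of the
difference between the per-CPU cursor (SSD) and one possible global cursor
(HDD). All names here and the rotational check are assumptions for
illustration, not the series code:

#include <linux/percpu.h>
#include <linux/types.h>

struct cluster_sketch;          /* as in the earlier sketch */

struct percpu_cluster_sketch {
        struct cluster_sketch *next;    /* this CPU's current cluster */
};

struct swap_device_sketch2 {
        /* SSD path: one cursor per CPU to avoid contention */
        struct percpu_cluster_sketch __percpu *percpu_cluster;
        /* possible HDD path: one shared cursor */
        struct cluster_sketch *global_next;
        bool rotational;
};

/* Sketch only; caller is assumed to hold the device lock. */
static struct cluster_sketch *next_cluster(struct swap_device_sketch2 *si)
{
        if (si->rotational)
                /* A single shared cursor keeps allocations, and hence
                 * writeout, roughly sequential on spinning disks. */
                return si->global_next;
        /* Per-CPU cursor: CPUs don't fight over one cluster on SSD,
         * where sequentiality matters much less. */
        return this_cpu_ptr(si->percpu_cluster)->next;
}

Either way, fetching the next cluster is a single pointer read, which is
why switching HDD over to a global cursor is a small change.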
> > >> > >>
> > >> > >> Hi, Hugh,
> > >> > >>
> > >> > >> What do you think about this?
> > >> > >>
> > >> > >> > This streamlines the swap allocation for SSD. The code matches
> > >> > >> > the execution flow much better.
> > >> > >> >
> > >> > >> > User impact: for users that allocate and free mixed-order mTHP
> > >> > >> > swap entries, it greatly improves the success rate of mTHP swap
> > >> > >> > allocation after the initial phase.
> > >> > >> >
> > >> > >> > It also performs faster when the swapfile is close to full,
> > >> > >> > because the allocator can take a nonfull cluster from a list
> > >> > >> > rather than scanning a lot of swap_map entries.
> > >> > >>
> > >> > >> Do you have some test results to prove this? Or which test below
> > >> > >> can prove this?
> > >> > >
> > >> > > The two zram tests are already proving this. The system time
> > >> > > improvement is about 2% on my low-CPU-count machine.
> > >> > > Kairui has a machine with a higher core count and the difference
> > >> > > is larger there. The theory is that a higher CPU count means
> > >> > > higher contention.
> > >> >
> > >> > I will interpret this as "the performance is better in theory". But
> > >> > there are almost no measurable results so far.
> > >>
> > >> I am trying to understand why you don't see the performance
> > >> improvement in the zram setup in my cover letter as a measurable
> > >> result.
> > >
> > > Hi Ying, you can check the test with the 32-core AMD machine in the
> > > cover letter; as Chris pointed out, the performance gain grows with
> > > the core count. The performance gain is still not much (yet: based on
> > > this design, things can go much faster once the HDD code is dropped,
> > > which enables many other optimizations; this series is mainly focused
> > > on the fragmentation issue), but I think a stable ~4-8% improvement in
> > > a kernel build test could be considered measurable?
> >
> > Is this the test result for "when the swapfile is close to full"?
>
> Yes, it's about 60% to 90% full during the whole test process. If ZRAM
> is completely full the workload will go OOM, but testing with madvise
> showed no performance drop.

BTW, one trick to avoid a completely full ZRAM causing an OOM is to set up
two zram devices and assign them different priorities. Let the first zram
get 100% full, then the swap overflows to the second ZRAM device, which
has more swap entries, avoiding the OOM (see the sketch below).

Chris
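For anyone who wants to try that two-device setup, here is a sketch using
swapon(2) with SWAP_FLAG_PREFER from <sys/swap.h>. The device paths and
priority values are assumptions, and both zram devices must already be
sized and formatted with mkswap:

#include <stdio.h>
#include <sys/swap.h>

/*
 * Needs CAP_SYS_ADMIN. The higher-priority device is used first, so
 * zram0 fills to 100% before anything spills over to zram1.
 */
int main(void)
{
        if (swapon("/dev/zram0", SWAP_FLAG_PREFER |
                                 (10 << SWAP_FLAG_PRIO_SHIFT)))
                perror("swapon /dev/zram0");

        /* Lower priority: the overflow device that prevents the OOM. */
        if (swapon("/dev/zram1", SWAP_FLAG_PREFER |
                                 (5 << SWAP_FLAG_PRIO_SHIFT)))
                perror("swapon /dev/zram1");

        return 0;
}

The same effect is available from the shell with swapon -p, giving the
first device the higher priority.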