From: Kairui Song
Date: Mon, 19 Aug 2024 00:59:41 +0800
Subject: Re: [PATCH v5 0/9] mm: swap: mTHP swap allocator base on swap cluster order
To: Chris Li, "Huang, Ying"
Cc: Hugh Dickins, Andrew Morton, Ryan Roberts, Kalesh Singh, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org> <87h6bw3gxl.fsf@yhuang6-desk2.ccr.corp.intel.com> <87sevfza3w.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Fri, Aug 16, 2024 at 3:53 PM Chris Li wrote:
>
> On Thu, Aug 8, 2024 at 1:38 AM Huang, Ying wrote:
> >
> > Chris Li writes:
> >
> > > On Wed, Aug 7, 2024 at 12:59 AM Huang, Ying wrote:
> > >>
> > >> Hi, Chris,
> > >>
> > >> Chris Li writes:
> > >>
> > >> > This is the short-term solution "swap cluster order" listed
> > >> > in my "Swap Abstraction" discussion, slide 8 in the recent
> > >> > LSF/MM conference.
> > >> >
> > >> > When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> > >> > orders" was introduced, it only allocated the mTHP swap entries
> > >> > from the new empty cluster list. That has a fragmentation issue
> > >> > reported by Barry.
> > >> >
> > >> > https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
> > >> >
> > >> > The reason is that all the empty clusters have been exhausted while
> > >> > there are plenty of free swap entries in clusters that are
> > >> > not 100% free.
> > >> >
> > >> > Remember the swap allocation order in the cluster.
> > >> > Keep track of the per-order non-full cluster list for later allocation.
> > >> >
> > >> > This series gives the swap SSD allocation a new separate code path
> > >> > from the HDD allocation. The new allocator uses the cluster list only
> > >> > and no longer does a global scan of swap_map[] without a lock.
> > >>
> > >> This sounds good. Can we use the SSD allocation method for HDD too?
> > >> We may not need a swap entry allocator optimized for HDD.
> > >
> > > Yes, that is the plan as well. That way we can completely get rid of
> > > the old scan_swap_map_slots() code.
> >
> > Good!
> >
> > > However, considering the size of the series, let's focus on the
> > > cluster allocation path first, and get it tested and reviewed.
> >
> > OK.
> >
> > > For HDD optimization, mostly just the new block allocation portion
> > > needs some separate code path from the new cluster allocator, to not do
> > > the per-CPU allocation. Allocating from the non-full list doesn't
> > > need to change too much.
> >
> > I suggest not considering HDD optimization at all. Just use the SSD
> > algorithm to simplify.
>
> Adding a global next allocating CI rather than the per-CPU next CI
> pointer is pretty trivial as well. It is just a different way to fetch
> the next cluster pointer.
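For anyone skimming the thread: the per-order non-full cluster list described in the cover letter above can be sketched roughly as below. This is plain userspace C; every structure and function name here is hypothetical and is not the real mm/swapfile.c code. Allocation first tries a partially used cluster already tagged with the requested order, and only falls back to a fully empty cluster when none exists, which is what avoids exhausting the empty-cluster list while fragmented clusters still have room.

```c
#include <stddef.h>

#define NR_ORDERS 4            /* hypothetical: orders 0..3 */

struct cluster {
    struct cluster *next;      /* singly linked, enough for the sketch */
    int order;                 /* allocation order this cluster serves */
    int free_slots;            /* remaining free entries */
};

/* hypothetical per-device state: one empty list, one non-full list per order */
struct swap_dev {
    struct cluster *free_list;             /* fully empty clusters */
    struct cluster *nonfull[NR_ORDERS];    /* partially used, per order */
};

static struct cluster *pop(struct cluster **head)
{
    struct cluster *c = *head;
    if (c)
        *head = c->next;
    return c;
}

/*
 * Allocate one entry of the given order: prefer a non-full cluster that
 * already holds entries of the same order, fall back to an empty cluster.
 * Returns the cluster used, or NULL if the device is too fragmented/full.
 */
static struct cluster *alloc_cluster(struct swap_dev *dev, int order)
{
    struct cluster *c = pop(&dev->nonfull[order]);
    if (!c) {
        c = pop(&dev->free_list);
        if (!c)
            return NULL;       /* no empty cluster left either */
        c->order = order;      /* tag the cluster with its order */
    }
    c->free_slots -= 1 << order;
    if (c->free_slots > 0) {   /* still room: back onto the non-full list */
        c->next = dev->nonfull[order];
        dev->nonfull[order] = c;
    }
    return c;
}
```

The point of keeping the lists per order is that a cluster never mixes orders, so a later high-order allocation can always be satisfied from a matching non-full cluster instead of failing once the empty list runs dry, which is the fragmentation Barry reported.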
Yes, if we enable the new cluster-based allocator for HDD, we can enable
THP and mTHP for HDD too, and use a global cluster_next instead of a
per-CPU one for it. It's easy to do with minimal changes, and it should
actually boost performance for HDD swap. I'm currently testing this
locally.

> > >>
> > >> Hi, Hugh,
> > >>
> > >> What do you think about this?
> > >>
> > >> > This streamlines the swap allocation for SSD. The code matches the
> > >> > execution flow much better.
> > >> >
> > >> > User impact: for users that allocate and free mixed-order mTHP swapping,
> > >> > it greatly improves the success rate of the mTHP swap allocation after the
> > >> > initial phase.
> > >> >
> > >> > It also performs faster when the swapfile is close to full, because the
> > >> > allocator can get a non-full cluster from a list rather than scanning
> > >> > a lot of swap_map entries.
> > >>
> > >> Do you have some test results to prove this? Or which test below can
> > >> prove this?
> > >
> > > The two zram tests are already proving this. The system time
> > > improvement is about 2% on my low-CPU-count machine.
> > > Kairui has a machine with a higher core count and the difference is
> > > larger there. The theory is that a higher CPU count means more
> > > contention.
> >
> > I will interpret this as "the performance is better in theory". But
> > there are almost no measurable results so far.
>
> I am trying to understand why you don't see the performance improvement in
> the zram setup in my cover letter as a measurable result.

Hi Ying, you can check the test with the 32-core AMD machine in the cover
letter; as Chris pointed out, the performance gain is higher as the core
count grows. The performance gain is still not much (yet: based on this
design, things can go much faster once the HDD code is dropped, which
enables many other optimizations; this series mainly focuses on the
fragmentation issue), but I think a stable ~4-8% improvement in a Linux
kernel build test could be considered measurable?
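To make the global-versus-per-CPU cursor point above concrete, here is a rough userspace C illustration (again hypothetical names, not the kernel's actual percpu_cluster code): SSD gets a per-CPU next-cluster cursor to cut lock contention, while HDD shares one global cluster_next so the on-disk access pattern stays roughly sequential.

```c
#include <stdbool.h>

#define NR_CPUS     4
#define NR_CLUSTERS 16

/* hypothetical device state; not the real kernel structures */
struct swap_dev {
    bool rotational;                 /* true for HDD */
    int  cluster_next;               /* single global cursor, HDD case */
    int  percpu_next[NR_CPUS];       /* per-CPU cursors, SSD case */
};

/*
 * Pick the next cluster index to try.  SSD: every CPU advances its own
 * cursor, trading sequentiality for less contention.  HDD: all CPUs share
 * one cursor so writes stay close to sequential on the rotating disk.
 */
static int next_cluster(struct swap_dev *dev, int cpu)
{
    int *cursor = dev->rotational ? &dev->cluster_next
                                  : &dev->percpu_next[cpu];
    int ci = *cursor;
    *cursor = (ci + 1) % NR_CLUSTERS;
    return ci;
}
```

The only difference between the two paths is which cursor is fetched, which is why switching HDD to the cluster allocator with a global cluster_next is a small change on top of the series.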