From: Chris Li
Date: Fri, 16 Aug 2024 00:47:37 -0700
Subject: Re: [PATCH v5 0/9] mm: swap: mTHP swap allocator base on swap cluster order
To: "Huang, Ying"
Cc: Hugh Dickins, Andrew Morton, Kairui Song, Ryan Roberts, Kalesh Singh, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org> <87h6bw3gxl.fsf@yhuang6-desk2.ccr.corp.intel.com> <87sevfza3w.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87sevfza3w.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Aug 8, 2024 at 1:38 AM Huang, Ying wrote:
>
> Chris Li writes:
>
> > On Wed, Aug 7, 2024 at 12:59 AM Huang, Ying wrote:
> >>
> >> Hi, Chris,
> >>
> >> Chris Li writes:
> >>
> >> > This is the short term solution "swap cluster order" listed
> >> > in my "Swap Abstraction" discussion, slide 8 in the recent
> >> > LSF/MM conference.
> >> >
> >> > When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> >> > orders" was introduced, it only allocated the mTHP swap entries
> >> > from the new empty cluster list. It has a fragmentation issue
> >> > reported by Barry.
> >> >
> >> > https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
> >> >
> >> > The reason is that all the empty clusters have been exhausted while
> >> > there are plenty of free swap entries in the clusters that are
> >> > not 100% free.
> >> >
> >> > Remember the swap allocation order in the cluster.
> >> > Keep track of the per-order nonfull cluster list for later allocation.
> >> >
> >> > This series gives the swap SSD allocation a new separate code path
> >> > from the HDD allocation. The new allocator uses cluster lists only
> >> > and no longer does a global scan of swap_map[] without a lock.
> >>
> >> This sounds good. Can we use the SSD allocation method for HDD too?
> >> We may not need a swap entry allocator optimized for HDD.
> >
> > Yes, that is the plan as well. That way we can completely get rid of
> > the old scan_swap_map_slots() code.
>
> Good!
>
> > However, considering the size of the series, let's focus on the
> > cluster allocation path first, and get it tested and reviewed.
>
> OK.
>
> > For HDD optimization, mostly just the new block allocation portion
> > needs a code path separate from the new cluster allocator, so that
> > it does not do the per-CPU allocation. Allocating from the nonfull
> > list doesn't need to change much.
>
> I suggest not considering HDD optimization at all. Just use the SSD
> algorithm to simplify.

Adding a global next-allocating CI rather than the per-CPU next CI
pointer is pretty trivial as well. It is just a different way to fetch
the next cluster pointer.

>
> >>
> >> Hi, Hugh,
> >>
> >> What do you think about this?
> >>
> >> > This streamlines the swap allocation for SSD. The code matches the
> >> > execution flow much better.
> >> >
> >> > User impact: For users that allocate and free mixed-order mTHP
> >> > swapping, it greatly improves the success rate of the mTHP swap
> >> > allocation after the initial phase.
> >> >
> >> > It also performs faster when the swapfile is close to full, because
> >> > the allocator can get the nonfull clusters from a list rather than
> >> > scanning a lot of swap_map entries.
> >>
> >> Do you have some test results to prove this? Or which test below can
> >> prove this?
> >
> > The two zram tests are already proving this. The system time
> > improvement is about 2% on my low CPU count machine.
> > Kairui has a higher core count machine and the difference is higher
> > there. The theory is that a higher CPU count has higher contention.
>
> I will interpret this as the performance is better in theory. But
> there's almost no measurable results so far.

I am trying to understand why the performance improvement in the zram
setup in my cover letter does not count as a measurable result.

>
> > The 2% system time number does not sound like much. But consider these
> > two factors:
> > 1) The swap allocator only takes a small percentage of the overall workload.
> > 2) The new allocator does more work.
> > The old allocator has a time tick budget. It will abort and fail to
> > find an entry when it runs out of time budget, even though there are
> > still some free entries on the swapfile.
>
> What is the time tick budget you mentioned?

I was under the impression that the previous swap entry allocation code
will not scan 100% of the swapfile if there is only one entry left.
Please let me know if my understanding is not correct.

		/* time to take a break? */
		if (unlikely(--latency_ration < 0)) {
			if (n_ret)
				goto done;
			spin_unlock(&si->lock);
			cond_resched();
			spin_lock(&si->lock);
			latency_ration = LATENCY_LIMIT;
		}

> > The new allocator can get to the last few free swap entries if it is
> > available.
> > If not, then the new swap allocator will work harder on
> > swap cache reclaim.
> >
> > From the swap cache reclaim aspect, it is very hard to optimize the
> > swap cache reclaim in the old allocation path because the scan
> > position is randomized.
> > The full list and frag list are both designed to help reduce
> > repeated reclaim attempts on the swap cache.
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying