From: Chris Li <chrisl@kernel.org>
Date: Wed, 5 Jun 2024 00:30:38 -0700
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
To: Kairui Song
Cc: "Huang, Ying", Andrew Morton, Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
References: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org>
 <87cyp5575y.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <875xuw1062.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o78mzp24.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Fri, May 31, 2024 at 5:40 AM Kairui Song wrote:
>
> On Fri, May 31, 2024 at 10:37 AM Huang, Ying wrote:
> >
> > For a specific configuration, I believe that we can get a reasonable
> > success rate for high-order swap entry allocation in specific use
> > cases. For example, if we only allow a limited maximum number of
> > order-0 swap entry allocations, can we keep the high-order clusters?
>
> Doesn't limiting order-0 allocation break the bottom line that order-0
> allocation is the first-class citizen, which should not fail if there
> is space?

We need both high-order and low-order swap allocation to keep working,
and to be able to recover from the swapfile-full case.

>
> Just my two cents...
>
> I had a try locally based on Chris's work, allowing order 0 to use
> nonfull_clusters as Ying has suggested, starting with the low order
> and increasing the order until nonfull_clusters[order] is not empty.
> That way, the higher orders are just better protected: unless we run
> out of both free_clusters and nonfull_clusters, the direct scan won't
> happen.

That does not help the Android test case Barry is running, because
Android tries to keep the swapfile full, so it will hit the case where
both the empty and nonfull lists are used up. When it performs the
low-memory kill, there will be a big change in the ratio of low- vs
high-order swap, and allocating high-order swap entries should be able
to recover from that.

>
> More concretely, I applied the following changes, which didn't change
> the code much:
> - In scan_swap_map_try_ssd_cluster, check nonfull_clusters first, then
>   free_clusters, then discard_clusters.

I did consider trying the nonfull list before the empty list. The
current allocation tries to make a HAS_CACHE-only swap entry stay on
disk for a longer time before recycling it: if the folio is still in
the swap cache and not dirty, reclaim can skip the writeout and
directly reuse the swap slot. I am not sure this code path is important
now; it seems that when the swap slot is freed, it removes HAS_CACHE as
well.

BTW, I noticed that discarding a cluster doesn't check whether the swap
cache still has a folio pointing into it; after the discard it just
sets the swap_map to 0. I wonder whether a folio left in the swap cache
for a discarded slot would hit the skip-writeback logic. If that is
triggerable, it would be a corruption bug.

The current SSD allocation code also has some comments saying that old
SSDs can benefit from not writing to the same block too many times, to
help wear leveling. I don't think that is a big deal now; even cheap SD
cards have wear leveling nowadays.

> - If it's order 0, also check every nonfull_clusters[i] list, with
>   "for (int i = 0; i < SWAP_NR_ORDERS; ++i)", before
>   scan_swap_map_try_ssd_cluster returns false.

Ideally we would have some option to reserve some high-order swap space
so that order 0 can't pollute the high-order clusters.
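Something like the sketch below is roughly what I have in mind. To be
clear, this is an untested sketch, not a patch against the real code:
struct swap_lists, cluster_pop() and the hi_order_reserve knob are
made-up stand-ins for the actual cluster list machinery, only meant to
show the allocation ordering and the reservation idea.

#define SWAP_NR_ORDERS 9

/* Stand-in for struct swap_cluster_info; only the list link matters. */
struct swap_cluster {
	struct swap_cluster *next;
};

struct swap_lists {
	struct swap_cluster *nonfull[SWAP_NR_ORDERS];
	struct swap_cluster *free_list;
	long nr_free;		/* clusters currently on free_list */
	long hi_order_reserve;	/* free clusters kept back from order 0 */
};

static struct swap_cluster *cluster_pop(struct swap_cluster **list)
{
	struct swap_cluster *ci = *list;

	if (ci)
		*list = ci->next;
	return ci;
}

static struct swap_cluster *alloc_cluster(struct swap_lists *sl, int order)
{
	struct swap_cluster *ci;
	int i;

	/* Prefer a nonfull cluster already dedicated to this order. */
	ci = cluster_pop(&sl->nonfull[order]);
	if (ci)
		return ci;

	/*
	 * Then take an empty cluster, but keep the last
	 * hi_order_reserve empty clusters away from order 0.
	 */
	if (order > 0 || sl->nr_free > sl->hi_order_reserve) {
		ci = cluster_pop(&sl->free_list);
		if (ci) {
			sl->nr_free--;
			return ci;
		}
	}

	/*
	 * Last resort for order 0 only: scavenge a nonfull cluster of
	 * a higher order.  This pollutes that cluster, so it comes
	 * after everything else.
	 */
	if (!order)
		for (i = 1; i < SWAP_NR_ORDERS; i++) {
			ci = cluster_pop(&sl->nonfull[i]);
			if (ci)
				return ci;
		}

	/* NULL here means: try discard, then fall back to slot scan. */
	return NULL;
}

The point of the reserve is that order 0 stops consuming the very last
empty clusters, so high-order allocation still has something to rebuild
from after the swapfile has been full.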
Chris

>
> A quick test, still using the memtier test, but with the swap device
> size decreased from 10G to 8G for higher pressure.
>
> Before:
> hugepages-32kB/stats/swpout:34013
> hugepages-32kB/stats/swpout_fallback:266
> hugepages-512kB/stats/swpout:0
> hugepages-512kB/stats/swpout_fallback:77
> hugepages-2048kB/stats/swpout:0
> hugepages-2048kB/stats/swpout_fallback:1
> hugepages-1024kB/stats/swpout:0
> hugepages-1024kB/stats/swpout_fallback:0
> hugepages-64kB/stats/swpout:35088
> hugepages-64kB/stats/swpout_fallback:66
> hugepages-16kB/stats/swpout:31848
> hugepages-16kB/stats/swpout_fallback:402
> hugepages-256kB/stats/swpout:390
> hugepages-256kB/stats/swpout_fallback:7244
> hugepages-128kB/stats/swpout:28573
> hugepages-128kB/stats/swpout_fallback:474
>
> After:
> hugepages-32kB/stats/swpout:31448
> hugepages-32kB/stats/swpout_fallback:3354
> hugepages-512kB/stats/swpout:30
> hugepages-512kB/stats/swpout_fallback:33
> hugepages-2048kB/stats/swpout:2
> hugepages-2048kB/stats/swpout_fallback:0
> hugepages-1024kB/stats/swpout:0
> hugepages-1024kB/stats/swpout_fallback:0
> hugepages-64kB/stats/swpout:31255
> hugepages-64kB/stats/swpout_fallback:3112
> hugepages-16kB/stats/swpout:29931
> hugepages-16kB/stats/swpout_fallback:3397
> hugepages-256kB/stats/swpout:5223
> hugepages-256kB/stats/swpout_fallback:2351
> hugepages-128kB/stats/swpout:25600
> hugepages-128kB/stats/swpout_fallback:2194
>
> The high-order (256kB) swapout rate is significantly higher, and 512kB
> is now possible, which indicates that the high orders are better
> protected. The lower orders are sacrificed, but that seems worth it.
>
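P.S. For anyone reproducing the numbers above: they come from the
per-size mTHP counters under
/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/. A small
throwaway reader like the one below (illustrative only, not part of any
patch) turns them into fallback ratios:

/* mthp-swpout-ratio.c: print mTHP swpout fallback ratios from sysfs. */
#include <glob.h>
#include <stdio.h>
#include <string.h>

static long read_counter(const char *dir, const char *name)
{
	char path[512];
	long v = -1;
	FILE *f;

	snprintf(path, sizeof(path), "%s/stats/%s", dir, name);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%ld", &v) != 1)
			v = -1;
		fclose(f);
	}
	return v;
}

int main(void)
{
	glob_t g;
	size_t i;

	if (glob("/sys/kernel/mm/transparent_hugepage/hugepages-*kB",
		 0, NULL, &g))
		return 1;

	for (i = 0; i < g.gl_pathc; i++) {
		long out = read_counter(g.gl_pathv[i], "swpout");
		long fb = read_counter(g.gl_pathv[i], "swpout_fallback");

		if (out < 0 || fb < 0 || out + fb == 0)
			continue;
		printf("%s: %ld fallback / %ld total = %.1f%%\n",
		       strrchr(g.gl_pathv[i], '/') + 1,
		       fb, out + fb, 100.0 * fb / (double)(out + fb));
	}
	globfree(&g);
	return 0;
}

On the numbers quoted above, that works out to the 256kB fallback ratio
dropping from about 95% (7244 of 7634) before to about 31% (2351 of
7574) after.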