From: Chris Li <chrisl@kernel.org>
Date: Wed, 5 Jun 2024 00:40:31 -0700
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
To: "Huang, Ying"
Cc: Kairui Song, Andrew Morton, Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
In-Reply-To: <87frttcgmc.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org> <87cyp5575y.fsf@yhuang6-desk2.ccr.corp.intel.com> <875xuw1062.fsf@yhuang6-desk2.ccr.corp.intel.com> <87o78mzp24.fsf@yhuang6-desk2.ccr.corp.intel.com> <87frttcgmc.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Tue, Jun 4, 2024 at 12:29 AM Huang, Ying wrote:
>
> Kairui Song writes:
>
> > On Fri, May 31, 2024 at 10:37 AM Huang, Ying wrote:
> > Isn't limiting order-0 allocation breaking the bottom line that
> > order-0 allocation is a first-class citizen and should not fail if
> > there is space?
>
> Sorry for the confusing words.
> I mean limiting the maximum number of order-0 swap entries allocated
> in workloads, instead of limiting that in the kernel.

What interface does it use to limit the order-0 swap entries? I was
thinking the kernel would enforce the high-order swap space reservation,
just like hugetlbfs does for huge pages. We will need to introduce some
interface to specify the reservation.

>
> > Just my two cents...
> >
> > I had a try locally based on Chris's work, allowing order 0 to use
> > nonfull_clusters as Ying has suggested, starting with a low order and
> > increasing the order until nonfull_clusters[order] is not empty. That
> > way higher orders are better protected, because unless we run out of
> > free_clusters and nonfull_clusters, the direct scan won't happen.
> >
> > More concretely, I applied the following changes, which didn't change
> > the code much:
> > - In scan_swap_map_try_ssd_cluster, check nonfull_clusters first,
> >   then free_clusters, then discard_clusters.
> > - If it's order 0, also check each nonfull_clusters[i]
> >   (for i = 0; i < SWAP_NR_ORDERS; ++i) before
> >   scan_swap_map_try_ssd_cluster returns false.
> >
> > A quick test, still using the memtier test, but with the swap device
> > size decreased from 10G to 8G for higher pressure.
> >
> > Before:
> > hugepages-32kB/stats/swpout:34013
> > hugepages-32kB/stats/swpout_fallback:266
> > hugepages-512kB/stats/swpout:0
> > hugepages-512kB/stats/swpout_fallback:77
> > hugepages-2048kB/stats/swpout:0
> > hugepages-2048kB/stats/swpout_fallback:1
> > hugepages-1024kB/stats/swpout:0
> > hugepages-1024kB/stats/swpout_fallback:0
> > hugepages-64kB/stats/swpout:35088
> > hugepages-64kB/stats/swpout_fallback:66
> > hugepages-16kB/stats/swpout:31848
> > hugepages-16kB/stats/swpout_fallback:402
> > hugepages-256kB/stats/swpout:390
> > hugepages-256kB/stats/swpout_fallback:7244
> > hugepages-128kB/stats/swpout:28573
> > hugepages-128kB/stats/swpout_fallback:474
> >
> > After:
> > hugepages-32kB/stats/swpout:31448
> > hugepages-32kB/stats/swpout_fallback:3354
> > hugepages-512kB/stats/swpout:30
> > hugepages-512kB/stats/swpout_fallback:33
> > hugepages-2048kB/stats/swpout:2
> > hugepages-2048kB/stats/swpout_fallback:0
> > hugepages-1024kB/stats/swpout:0
> > hugepages-1024kB/stats/swpout_fallback:0
> > hugepages-64kB/stats/swpout:31255
> > hugepages-64kB/stats/swpout_fallback:3112
> > hugepages-16kB/stats/swpout:29931
> > hugepages-16kB/stats/swpout_fallback:3397
> > hugepages-256kB/stats/swpout:5223
> > hugepages-256kB/stats/swpout_fallback:2351
> > hugepages-128kB/stats/swpout:25600
> > hugepages-128kB/stats/swpout_fallback:2194
> >
> > The high-order (256kB) swapout rate is significantly higher, and
> > 512kB swapout is now possible, which indicates the high orders are
> > better protected; the lower orders are sacrificed, but that seems
> > worth it.
>
> Yes. I think that this reflects another aspect of the problem. In some
> situations, it's better to steal one high-order cluster and use it for
> order-0 allocations than to scatter order-0 allocations across random
> high-order clusters.

Agreed, the scan loop over swap_map[] causes the worst pollution.

Chris
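
The fallback order Kairui describes above is easier to see as code. The
following is a minimal, self-contained userspace model of that policy,
not the actual mm/swapfile.c implementation: the SWAP_NR_ORDERS value of
10 is assumed, and the struct and the try_ssd_cluster() helper are
simplified stand-ins for swap_info_struct's nonfull_clusters,
free_clusters and discard lists and for scan_swap_map_try_ssd_cluster(),
with plain counters replacing the kernel's linked lists.

#include <stdbool.h>
#include <stdio.h>

#define SWAP_NR_ORDERS 10  /* assumed: orders 0..9, i.e. 4kB..2MB with 4kB pages */

/* Counts stand in for the kernel's cluster lists; in this model, taking
 * a cluster from a list simply decrements its count. */
struct cluster_lists {
	int nonfull[SWAP_NR_ORDERS];  /* partially used clusters, per order */
	int free;                     /* completely free clusters */
	int discarding;               /* clusters whose discard can be completed */
};

/*
 * Pick a cluster for an allocation of @order, preferring:
 *   1. a nonfull cluster already assigned to this order,
 *   2. a free cluster,
 *   3. a cluster reclaimed from the discard list.
 * For order 0 only, fall back to stealing from a higher-order nonfull
 * list before giving up, so the slow swap_map[] scan is reached only
 * once every cluster list is empty.
 */
static bool try_ssd_cluster(struct cluster_lists *cl, int order)
{
	if (cl->nonfull[order] > 0) {
		cl->nonfull[order]--;
		return true;
	}
	if (cl->free > 0) {
		cl->free--;
		return true;
	}
	if (cl->discarding > 0) {
		cl->discarding--;
		return true;
	}
	if (order == 0) {
		/* Lowest order first, so the largest clusters are polluted last. */
		for (int i = 1; i < SWAP_NR_ORDERS; i++) {
			if (cl->nonfull[i] > 0) {
				cl->nonfull[i]--;
				return true;
			}
		}
	}
	return false;  /* caller falls back to scanning swap_map[] */
}

int main(void)
{
	struct cluster_lists cl = { .nonfull = { [4] = 1 } };

	/* The order-4 request is served from its own nonfull list... */
	printf("order 4: %s\n", try_ssd_cluster(&cl, 4) ? "cluster" : "scan fallback");
	/* ...while a later order-0 request finds every list empty and falls back. */
	printf("order 0: %s\n", try_ssd_cluster(&cl, 0) ? "cluster" : "scan fallback");
	return 0;
}

The ascending loop in the order-0 branch is the trade-off both replies
touch on: an order-0 request dips into higher-order nonfull clusters only
as a last resort, preferring the lowest order available, so the largest
clusters stay protected while order-0 allocations still cannot fail as
long as any cluster space remains; only when every list is empty does the
caller drop to the slow swap_map[] scan.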