From: Barry Song <21cnbao@gmail.com>
To: ryan.roberts@arm.com
Cc: akpm@linux-foundation.org, david@redhat.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com,
    shy828301@gmail.com, wangkefeng.wang@huawei.com, willy@infradead.org,
    xiang@kernel.org, ying.huang@intel.com, yuzhao@google.com,
    hanchuanhua@oppo.com
Subject: Re: [PATCH v3 0/4] Swap-out small-sized THP without splitting
Date: Wed, 29 Nov 2023 20:47:41 +1300
Message-Id: <20231129074741.15682-1-v-songbaohua@oppo.com>
In-Reply-To: <20231025144546.577640-1-ryan.roberts@arm.com>
References: <20231025144546.577640-1-ryan.roberts@arm.com>

> Hi All,
>
> This is v3 of a series to add support for swapping out small-sized THP without
> needing to first split the large folio via
> __split_huge_page(). It closely
> follows the approach already used by PMD-sized THP.
>
> "Small-sized THP" is an upcoming feature that enables performance improvements
> by allocating large folios for anonymous memory, where the large folio size is
> smaller than the traditional PMD-size. See [3].
>
> In some circumstances I've observed a performance regression (see patch 2 for
> details), and this series is an attempt to fix the regression in advance of
> merging small-sized THP support.
>
> I've done what I thought was the smallest change possible, and as a result, this
> approach is only employed when the swap is backed by a non-rotating block device
> (just as PMD-sized THP is supported today). Discussion against the RFC concluded
> that this is probably sufficient.
>
> The series applies against mm-unstable (1a3c85fa684a)
>
>
> Changes since v2 [2]
> ====================
>
>  - Reuse scan_swap_map_try_ssd_cluster() between order-0 and order > 0
>    allocation. This required some refactoring to make everything work nicely
>    (new patches 2 and 3).
>  - Fix bug where nr_swap_pages would say there are pages available but the
>    scanner would not be able to allocate them because they were reserved for the
>    per-cpu allocator. We now allow stealing of order-0 entries from the high
>    order per-cpu clusters (in addition to exisiting stealing from order-0
>    per-cpu clusters).
>
> Thanks to Huang, Ying for the review feedback and suggestions!
>
>
> Changes since v1 [1]
> ====================
>
>  - patch 1:
>     - Use cluster_set_count() instead of cluster_set_count_flag() in
>       swap_alloc_cluster() since we no longer have any flag to set. I was unable
>       to kill cluster_set_count_flag() as proposed against v1 as other call
>       sites depend explicitly setting flags to 0.
>  - patch 2:
>     - Moved large_next[] array into percpu_cluster to make it per-cpu
>       (recommended by Huang, Ying).
>     - large_next[] array is dynamically allocated because PMD_ORDER is not
>       compile-time constant for powerpc (fixes build error).
>
>
> Thanks,
> Ryan
> P.S. I know we agreed this is not a prerequisite for merging small-sized THP,
> but given Huang Ying had provided some review feedback, I wanted to progress it.
> All the actual prerequisites are either complete or being worked on by others.
>

Hi Ryan,

This is quite important for phones and a must-have component; so is
large-folio swap-in, as I explained to you in another email. Luckily,
Chuanhua Han (Cc-ed) is preparing a large-folio swap-in patchset on top
of this patchset of yours, probably a port and cleanup of our
do_swap_page() [1] against yours.

Another concern is that swap slots can become fragmented if we place
both small and large folios in the same swap device. Since large folios
always require contiguous swap slots, allocation can fail even when we
still have many free slots, because those free slots are not contiguous.
To avoid this, our dynamic hugepage solution [2] uses two swap devices:
one for base pages, the other for CONTPTE. We have modified the
priority-based selection of swap devices to choose a device based on
whether the folio is small or large. I realize this approach is super
ugly and it might be very hard to find a way to upstream it; it does not
seem universal, especially if you are a Linux server (-_-)

Two devices are not a nice approach, though it works well for a real
product. We might still need some decent way to address this problem,
but the problem is for sure not a blocker for your patchset.
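To make the fragmentation concern concrete, here is a toy userspace
sketch in C. It is not kernel code and it is not what our tree does: the
names (toy_swap_dev, toy_alloc_slots, toy_pick_dev), the 64-slot device
size and the natural-alignment rule for large allocations are all
assumptions made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 64	/* toy device size in swap slots (assumption) */

/* Hypothetical model of one swap device: used[i] is true if slot i is taken. */
struct toy_swap_dev {
	bool used[NR_SLOTS];
};

/* Count all free slots, whether or not they are contiguous. */
static size_t toy_free_slots(const struct toy_swap_dev *dev)
{
	size_t i, nr = 0;

	for (i = 0; i < NR_SLOTS; i++)
		if (!dev->used[i])
			nr++;
	return nr;
}

/*
 * Allocate 2^order contiguous slots. This toy model also assumes the run
 * must be naturally aligned. Returns the first slot index, or -1 when no
 * such run exists, however many scattered free slots remain.
 */
static long toy_alloc_slots(struct toy_swap_dev *dev, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	size_t base, i;

	for (base = 0; base + nr <= NR_SLOTS; base += nr) {
		for (i = 0; i < nr; i++)
			if (dev->used[base + i])
				break;
		if (i < nr)
			continue;	/* this aligned run has a used slot */
		for (i = 0; i < nr; i++)
			dev->used[base + i] = true;
		return (long)base;
	}
	return -1;
}

/*
 * Two-device policy in miniature: order-0 folios go to one device and
 * large folios to the other, so small entries never punch holes into
 * the space that large folios need.
 */
static struct toy_swap_dev *toy_pick_dev(struct toy_swap_dev *small_dev,
					 struct toy_swap_dev *large_dev,
					 unsigned int order)
{
	return order == 0 ? small_dev : large_dev;
}
```

In this model, if order-0 entries occupy every fourth slot of a single
shared device, toy_free_slots() still reports 48 free slots, yet
toy_alloc_slots(dev, 2) returns -1 because no aligned run of four slots
is free. Routing through toy_pick_dev() keeps the large-folio device
free of such holes, at the cost of needing two devices.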
[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/blob/oneplus/sm8550_u_14.0.0_oneplus11/mm/memory.c#L4648
[2] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/blob/oneplus/sm8550_u_14.0.0_oneplus11/mm/swapfile.c#L1129

>
> [1] https://lore.kernel.org/linux-mm/20231010142111.3997780-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/linux-mm/20231017161302.2518826-1-ryan.roberts@arm.com/
> [3] https://lore.kernel.org/linux-mm/15a52c3d-9584-449b-8228-1335e0753b04@arm.com/
>
>
> Ryan Roberts (4):
>   mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
>   mm: swap: Remove struct percpu_cluster
>   mm: swap: Simplify ssd behavior when scanner steals entry
>   mm: swap: Swap-out small-sized THP without splitting
>
>  include/linux/swap.h |  31 +++---
>  mm/huge_memory.c     |   3 -
>  mm/swapfile.c        | 232 ++++++++++++++++++++++++-------------------
>  mm/vmscan.c          |  10 +-
>  4 files changed, 149 insertions(+), 127 deletions(-)

Thanks
Barry