From: "Huang, Ying"
To: Ryan Roberts
Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox, Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko
Subject: Re: [RFC PATCH v1 0/2] Swap-out small-sized THP without splitting
Date: Wed, 11 Oct 2023 14:37:58 +0800
In-Reply-To: <20231010142111.3997780-1-ryan.roberts@arm.com> (Ryan Roberts's message of "Tue, 10 Oct 2023 15:21:09 +0100")
References: <20231010142111.3997780-1-ryan.roberts@arm.com>
Message-ID: <87zg0pfyux.fsf@yhuang6-desk2.ccr.corp.intel.com>

Ryan Roberts writes:

> Hi All,
>
> This is an RFC for a small series to add support for swapping out small-sized
> THP without needing to first split the large folio via __split_huge_page(). It
> closely follows the approach already used by PMD-sized THP.
>
> "Small-sized THP" is an upcoming feature that enables performance improvements
> by allocating large folios for anonymous memory, where the large folio size is
> smaller than the traditional PMD-size. See [1].
>
> In some circumstances I've observed a performance regression (see patch 2 for
> details), and this series is an attempt to fix the regression in advance of
> merging small-sized THP support.
>
> I've done what I thought was the smallest change possible, and as a result, this
> approach is only employed when the swap is backed by a non-rotating block device
> (just as PMD-sized THP is supported today). However, I have a few questions on
> whether we should consider relaxing those requirements in certain circumstances:
>
>
> 1) block-backed vs file-backed
> ==============================
>
> The code only attempts to allocate a contiguous set of entries if swap is backed
> by a block device (i.e. not file-backed). The original commit, f0eea189e8e9
> ("mm, THP, swap: don't allocate huge cluster for file backed swap device"),
> stated "It's hard to write a whole transparent huge page (THP) to a file backed
> swap device". But didn't state why. Does this imply there is a size limit at
> which it becomes hard?
> And does that therefore imply that for "small enough"
> sizes we should now allow use with file-back swap?
>
> This original commit was subsequently fixed with commit 41663430588c ("mm, THP,
> swap: fix allocating cluster for swapfile by mistake"), which said the original
> commit was using the wrong flag to determine if it was a block device and
> therefore in some cases was actually doing large allocations for a file-backed
> swap device, and this was causing file-system corruption. But that implies some
> sort of correctness issue to me, rather than the performance issue I inferred
> from the original commit.
>
> If anyone can offer an explanation, that would be helpful in determining if we
> should allow some large sizes for file-backed swap.

Swap uses 'swap extents' (swap_info_struct.swap_extent_root) to map from
a swap offset to a storage block number.  For block-backed swap, the
mapping is purely linear, so you can use an arbitrarily large page size.
But for file-backed swap, only PAGE_SIZE alignment is guaranteed, so a
contiguous range of swap offsets is not guaranteed to map to contiguous
storage blocks.

> 2) rotating vs non-rotating
> ===========================
>
> I notice that the clustered approach is only used for non-rotating swap. That
> implies that for rotating media, we will always fail a large allocation, and
> fall back to splitting THPs to single pages. Which implies that the regression
> I'm fixing here may still be present on rotating media? Or perhaps rotating disk
> is so slow that the cost of writing the data out dominates the cost of
> splitting?
>
> I considered that potentially the free swap entry search algorithm that is used
> in this case could be modified to look for (small) contiguous runs of entries;
> Up to ~16 pages (order-4) could be done by doing 2x 64bit reads from map instead
> of single byte.
>
> I haven't looked into this idea in detail, but wonder if anybody thinks it is
> worth the effort? Or perhaps it would end up causing bad fragmentation.

I doubt anybody uses rotating storage to back swap nowadays.
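
Going back to the swap extent point in the reply to 1), here is a minimal
userspace sketch (Python, not kernel code) of how the offset-to-block
mapping behaves.  The field names are meant to mirror struct swap_extent
as I understand it; the helper names offset_to_block() and
range_is_contiguous() and both extent layouts are hypothetical, made up
only for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class SwapExtent(NamedTuple):
    start_page: int   # first swap offset covered by this extent
    nr_pages: int     # number of PAGE_SIZE slots in this extent
    start_block: int  # storage block (PAGE_SIZE units) backing start_page


def offset_to_block(extents: List[SwapExtent], offset: int) -> int:
    """Map a swap offset to a storage block. Extents are sorted by start_page."""
    starts = [e.start_page for e in extents]
    idx = bisect_right(starts, offset) - 1
    if idx < 0:
        raise ValueError(f"offset {offset} is below the first extent")
    ext = extents[idx]
    if offset >= ext.start_page + ext.nr_pages:
        raise ValueError(f"offset {offset} is not covered by any extent")
    return ext.start_block + (offset - ext.start_page)


def range_is_contiguous(extents: List[SwapExtent], offset: int, nr_pages: int) -> bool:
    """True if nr_pages swap offsets starting at offset map to consecutive blocks."""
    first = offset_to_block(extents, offset)
    return all(
        offset_to_block(extents, offset + i) == first + i
        for i in range(nr_pages)
    )


# Block-backed swap: a single linear extent covers the whole device.
block_backed = [SwapExtent(start_page=0, nr_pages=1024, start_block=0)]

# File-backed swap: hypothetical layout for a file that is fragmented on disk.
# Each extent is page aligned, but extents need not be adjacent on storage.
file_backed = [
    SwapExtent(start_page=0, nr_pages=6, start_block=5000),
    SwapExtent(start_page=6, nr_pages=10, start_block=9000),
    SwapExtent(start_page=16, nr_pages=1008, start_block=200),
]
```

With these layouts, any 16-page range in block_backed maps to 16
consecutive blocks, while the range starting at offset 0 in file_backed
crosses an extent boundary at offset 6 and so cannot be written as one
contiguous I/O.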

> Finally on testing, I've run the mm selftests and see no regressions, but I
> don't think there is anything in there specifically aimed towards swap? Are
> there any functional or performance tests that I should run? It would certainly
> be good to confirm I haven't regressed PMD-size THP swap performance.

I have used the swap sub-test cases of vm-scalability for testing.

https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/

--
Best Regards,
Huang, Ying