From: Ryan Roberts
Date: Tue, 30 Jul 2024 09:36:39 +0100
Subject: Re: [PATCH v5 4/4] mm: Introduce per-thpsize swapin control policy
To: Matthew Wilcox, Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, ying.huang@intel.com,
 baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com,
 kasong@tencent.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 minchan@kernel.org, nphamcs@gmail.com, senozhatsky@chromium.org,
 shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com,
 v-songbaohua@oppo.com, xiang@kernel.org, yosryahmed@google.com
References: <20240726094618.401593-1-21cnbao@gmail.com>
 <20240726094618.401593-5-21cnbao@gmail.com>

On 29/07/2024 04:52, Matthew Wilcox wrote:
> On Fri, Jul 26, 2024 at 09:46:18PM +1200, Barry Song wrote:
>> A user space interface can be implemented to select different swap-in
>> order policies, similar to the mTHP allocation order policy. We need
>> a distinct policy because the performance characteristics of memory
>> allocation differ significantly from those of swap-in. For example,
>> SSD read speeds can be much slower than memory allocation. With
>> policy selection, I believe we can implement mTHP swap-in for
>> non-SWAP_SYNCHRONOUS scenarios as well.
>> However, users need to understand
>> the implications of their choices. I think that it's better to start
>> with at least always never. I believe that we will add auto in the
>> future to tune automatically, which can be used as default finally.
>
> I strongly disagree. Use the same sysctl as the other anonymous memory
> allocations.

I vaguely recall arguing in the past that just because the user has requested
2M THP, that doesn't mean it's the right thing to do for performance to swap in
the whole 2M in one go. That's potentially a pretty huge latency, depending on
where the backend is, and it could be a waste of IO if the application never
touches most of the 2M. That said, the fact that the application hinted for a
2M THP in the first place hopefully means that it is storing objects that need
to be accessed at similar times. Today it will be swapped in page-by-page and
then eventually collapsed by khugepaged.

But I think those arguments become weaker as the THP size gets smaller. 16K/64K
swap-in will likely yield significant performance improvements, and I think
Barry has numbers for this?

So I guess we have a few options:

 - Just use the same sysfs interface as for anon allocation, and see if anyone
   reports performance regressions. Investigate one of the options below if an
   issue is raised. That's the simplest and cleanest approach, I think.

 - New sysfs interface as Barry has implemented; nobody really wants more
   controls if it can be helped.

 - Hardcode a size limit (e.g. 64K); I've tried this in a few different
   contexts and never got any traction.

 - Secret option 4: Can we allocate a full-size folio but only choose to swap
   in to it bit by bit? You would need a way to mark which pages of the folio
   are valid (e.g. a per-page flag), but I guess that's a non-starter given the
   strategy to remove per-page flags?

Thanks,
Ryan
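
To make the IO trade-off behind "secret option 4" concrete, here is a toy
userspace model of a full-size folio that is populated one subpage at a time.
It is only a sketch under stated assumptions: 4K base pages, a 2M folio, and a
plain integer bitmap standing in for whatever "valid" marking would really be
needed. None of the names below (LazyFolio, touch, eager_swapin_bytes) are
kernel APIs; they are hypothetical and exist purely for illustration.

```python
# Toy userspace model, not kernel code: every name here is hypothetical.
PAGE_SIZE = 4096                            # assumed 4K base page
FOLIO_SIZE = 2 * 1024 * 1024                # a 2M THP
PAGES_PER_FOLIO = FOLIO_SIZE // PAGE_SIZE   # 512 subpages


class LazyFolio:
    """A full-size folio whose subpages are swapped in on first touch."""

    def __init__(self, nr_pages=PAGES_PER_FOLIO):
        self.nr_pages = nr_pages
        self.valid = 0        # bitmap: bit i set => subpage i holds data
        self.bytes_read = 0   # swap IO issued so far, in bytes

    def touch(self, index):
        """Fault on subpage `index`; return True if swap IO was needed."""
        if not 0 <= index < self.nr_pages:
            raise IndexError(index)
        bit = 1 << index
        if self.valid & bit:
            return False      # already swapped in, no IO needed
        self.valid |= bit
        self.bytes_read += PAGE_SIZE
        return True

    def nr_valid(self):
        """Number of subpages that currently hold valid data."""
        return bin(self.valid).count("1")


def eager_swapin_bytes(nr_pages=PAGES_PER_FOLIO):
    """IO cost of swapping in the whole folio in one go."""
    return nr_pages * PAGE_SIZE


folio = LazyFolio()
for i in (0, 1, 1, 7):        # the application touches 3 distinct subpages
    folio.touch(i)

# Subpages valid, bytes read lazily, bytes an eager 2M swap-in would read.
print(folio.nr_valid(), folio.bytes_read, eager_swapin_bytes())
```

In this model, an application that touches only three subpages reads 12K from
swap, where an eager 2M swap-in would read the full 2M. The model says nothing
about latency, readahead, or how the valid state would be stored without
per-page flags, which is exactly the open question raised above.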