From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Ryan Roberts
Cc: Barry Song
<21cnbao@gmail.com>, Matthew Wilcox, "Chuanhua Han", Barry Song
Subject: Re: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole
In-Reply-To: <9ec62266-26f1-46b6-8bb7-9917d04ed04e@arm.com>
	(Ryan Roberts's message of "Mon, 18 Mar 2024 16:45:24 +0000")
References: <20240304081348.197341-1-21cnbao@gmail.com>
	<20240304081348.197341-6-21cnbao@gmail.com>
	<87wmq3yji6.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<87sf0rx3d6.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<87jzm0wblq.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<9ec62266-26f1-46b6-8bb7-9917d04ed04e@arm.com>
Date: Tue, 19 Mar 2024 17:20:16 +0800
Message-ID: <87jzlyvar3.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Ryan Roberts writes:

>>>> I agree phones are not the only platform. But Rome wasn't built in a
>>>> day. I can only get started on a hardware which I can easily reach and
>>>> have enough hardware/test resources on it. So we may take the first
>>>> step, which can be applied on a real product and improve its
>>>> performance, and step by step, we broaden it and make it widely useful
>>>> to various areas which I can't reach :-)
>>>
>>> We must guarantee the normal swap path runs correctly and has no
>>> performance regression when developing the SWP_SYNCHRONOUS_IO
>>> optimization. So we have to put some effort into testing the normal
>>> path anyway.
>>>
>>>> so probably we can have a sysfs "enable" entry with default "n" or
>>>> have a maximum swap-in order as Ryan's suggestion [1] at the beginning,
>>>>
>>>> "
>>>> So in the common case, swap-in will pull in the same size of folio as
>>>> was swapped-out. Is that definitely the right policy for all folio
>>>> sizes? Certainly it makes sense for "small" large folios (e.g. up to
>>>> 64K IMHO). But I'm not sure it makes sense for 2M THP; as the size
>>>> increases, the chance of actually needing all of the folio reduces, so
>>>> chances are we are wasting IO. There are similar arguments for CoW,
>>>> where we currently copy 1 page per fault - it probably makes sense to
>>>> copy the whole folio up to a certain size.
>>>> "
>
> I thought about this a bit more.
> No clear conclusions, but hoped this might help the discussion around
> policy:
>
> The decision about the size of the THP is made at first fault, with some
> help from user space, and in future we might make decisions to split based
> on munmap/mremap/etc. hints. In an ideal world, the fact that we have had
> to swap the THP out at some point in its lifetime should not impact its
> size. It's just being moved around in the system, and the reason for our
> original decision should still hold.
>
> So from that PoV, it would be good to swap-in to the same size that was
> swapped-out.

Sorry, I don't agree with this. It's better to swap in and swap out at the
smallest size if the page is only seldom accessed, to avoid wasting memory.

> But we only kind-of keep that information around, via the swap entry
> contiguity and alignment. With that scheme it is possible that multiple
> virtually adjacent but not physically contiguous folios get swapped-out
> to adjacent swap slot ranges, and then they would be swapped-in to a
> single, larger folio. This is not ideal, and I think it would be valuable
> to try to maintain the original folio size information with the swap
> slot. One way to do this would be to store, in the cluster, the original
> order for which the cluster was allocated. Then we at least know that a
> given swap slot is either for a folio of that order or for an order-0
> folio (due to cluster exhaustion/scanning). Can we steal a bit from
> swap_map to determine which case it is? Or are there better approaches?

[snip]

--
Best Regards,
Huang, Ying