From: "Huang, Ying"
To: Zi Yan
Cc: David Hildenbrand, Bharata B Rao, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Jonathan.Cameron@huawei.com, dave.hansen@intel.com,
 gourry@gourry.net, hannes@cmpxchg.org, mgorman@techsingularity.net,
 mingo@redhat.com, peterz@infradead.org, raghavendra.kt@amd.com,
 riel@surriel.com, rientjes@google.com, sj@kernel.org, weixugc@google.com,
 willy@infradead.org, dave@stgolabs.net, nifan.cxl@gmail.com,
 joshua.hahnjy@gmail.com, xuezhengchu@huawei.com, yiannis@zptcorp.com,
 akpm@linux-foundation.org
Subject: Re: [RFC PATCH v0 2/2] mm: sched: Batch-migrate misplaced pages
In-Reply-To: <94BF4806-ABCD-4D01-8577-9E138A634815@nvidia.com> (Zi Yan's message of "Mon, 26 May 2025 10:20:39 -0400")
References: <20250521080238.209678-1-bharata@amd.com>
 <20250521080238.209678-3-bharata@amd.com>
 <62cef618-123c-4ffa-b45a-c38b65d2a5a3@redhat.com>
 <5d6b92d8-251f-463b-adde-724ea25b2d89@redhat.com>
 <996B013E-4143-4182-959F-356241BE609A@nvidia.com>
 <382839fc-ea63-421a-8397-72cb35dd8052@redhat.com>
 <94BF4806-ABCD-4D01-8577-9E138A634815@nvidia.com>
Date: Tue, 27 May 2025 09:18:12 +0800
Message-ID: <87a56yc0mj.fsf@DESKTOP-5N7EMDA>

Zi Yan writes:

> On 26 May 2025, at 5:29, David Hildenbrand wrote:
>
>> On 22.05.25 19:30, Zi Yan wrote:
>>> On 22 May 2025, at 13:21, David Hildenbrand wrote:
>>>
>>>> On 22.05.25 18:38, Zi Yan wrote:
>>>>> On 22 May 2025, at 12:26, David Hildenbrand wrote:
>>>>>
>>>>>> On 22.05.25 18:24, Zi Yan wrote:
>>>>>>> On 22 May 2025, at 12:11, David Hildenbrand wrote:
>>>>>>>
>>>>>>>> On 21.05.25 10:02, Bharata B Rao wrote:
>>>>>>>>> Currently the folios identified as misplaced by the NUMA
>>>>>>>>> balancing sub-system are migrated one by one from the NUMA
>>>>>>>>> hint fault handler as and when they are identified as
>>>>>>>>> misplaced.
>>>>>>>>>
>>>>>>>>> Instead of such single folio migrations, batch them and
>>>>>>>>> migrate them at once.
>>>>>>>>>
>>>>>>>>> Identified misplaced folios are isolated and stored in
>>>>>>>>> a per-task list. A new task_work is queued from task tick
>>>>>>>>> handler to migrate them in batches. Migration is done
>>>>>>>>> periodically or if pending number of isolated folios exceeds
>>>>>>>>> a threshold.
>>>>>>>>
>>>>>>>> That means that these pages are effectively unmovable for
>>>>>>>> other purposes (CMA, compaction, long-term pinning, whatever)
>>>>>>>> until that list was drained.
>>>>>>>>
>>>>>>>> Bad.
>>>>>>>
>>>>>>> Probably we can mark these pages and when others want to migrate the page,
>>>>>>> get_new_page() just looks at the page's target node and gets a new page from
>>>>>>> the target node.
>>>>>>
>>>>>> How do you envision that working when CMA needs to migrate this exact
>>>>>> page to a different location?
>>>>>>
>>>>>> It cannot isolate it for migration because ... it's already isolated ...
>>>>>> so it will give up.
>>>>>>
>>>>>> Marking might not be easy I assume ...
>>>>>
>>>>> I guess you mean we do not have any extra bit to indicate this page is isolated,
>>>>> but it can be migrated. My point is that if this page is going to be migrated
>>>>> due to other reasons, like CMA, compaction, why not migrate it to the target
>>>>> node instead of moving it around within the same node.
>>>>
>>>> I think we'd have to identify that
>>>>
>>>> a) This page is isolated for migration (could be isolated for other
>>>> reasons)
>>>>
>>>> b) The one responsible for the isolation is numa code (could be someone
>>>> else)
>>>>
>>>> c) We're allowed to grab that page from that list (IOW sync against
>>>> others, and especially also against), to essentially "steal" the
>>>> isolated page.
>>>
>>> Right. c) sounds like adding more contention to the candidate list.
>>> I wonder if we can just mark the page as migration candidate (using
>>> a page flag or something else), then migrate it whenever CMA,
>>> compaction, long-term pinning and more look at the page.
>>
>> I mean, all these will migrate the page either way, no need to add
>> another flag for that.
>>
>> I guess what you mean, indicating that the migration destination
>> should be on a different node than the current one.
>
> Yes.
>
>>
>> Well, and for the NUMA scanner (below) to find which pages to migrate.
>>
>> ... to me this raises some questions: like, if we don't migrate
>> immediately, could that information ("migrate this page") actually
>> now be wrong? I guess a way to
>
> Could be. So it is better to evaluate the page before the actual migration, in
> case the page is no longer needed in a remote node.
>
>> obtain the destination node would suffice: if the destination node
>> matches, no need to migrate from that NUMA scanner.
>
> Right. The destination node could be calculated by certain metric like most recent
> accesses or last remote node access time.

Do we have the necessary information available?  last_cpupid holds either
the last accessing CPU or the last scanning timestamp, not both.  Is there
any other information source?

---
Best Regards,
Huang, Ying

> If most recent accesses are still coming
> from a remote node and/or last remote node access time is within a short time frame,
> the page should be migrated. Since it is possible that the page is frequently accessed
> by a remote node but when it comes to migration, it is no longer needed by a remote
> node and the access pattern would look like 1) a lot of remote node accesses, but
> 2) the last remote node access is long time ago.
>
>>
>> In addition,
>>> periodically, the migration task would do a PFN scanning and migrate
>>> any migration candidate. I remember Willy did some experiments showing
>>> that PFN scanning is very fast.
>>
>> PFN scanning can be faster than walking lists, but I suspect it
>> depends on how many pages there really are to be migrated ... and
>> some other factors :)
>
> Yes. LRU list is good since it restricts the scanning range, but PFN scanning
> itself does not have it. PFN scanning with some filter mechanism might work
> and that filter mechanism is a way of marking to-be-migrated pages. Of course,
> a quick re-evaluation of the to-be-migrated pages right before a migration
> would avoid unnecessary work like we discussed above.
>
> --
> Best Regards,
> Yan, Zi