Date: Sun, 11 May 2025 19:08:39 -0700 (PDT)
From: David Rientjes
To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron, Joshua Hahn, Raghavendra K T, "Rao, Bharata Bhasker", SeongJae Park, Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan
cc: linux-mm@kvack.org
Subject: [Linux Memory Hotness and Promotion] Notes from May 8, 2025

Hi everybody,

Here are the notes from the inaugural Linux Memory Hotness and Promotion call that happened on Thursday, May 8. Thanks to everybody who was involved!
These notes are intended to bring people up to speed who could not attend the call, as well as to keep the conversation going in between meetings.

----->o-----
Bharata referred to the "Kernel daemon for detecting and promoting hot pages" patch series[1] and the previous industry-wide MM alignment meeting on it. Based on feedback from that session, Bharata is working on separating migration out from the existing NUMA Balancing code. An early prototype is available that does this asynchronously and with batching. After a folio is isolated, it is added to a list that is part of the task structure. This requires tracking the target node id, which the prototype currently stores in the last_cpupid field. The migration is done in task work context with task_numa_work(), which fetches the target node id and then migrates in batch, similar to Gregory Price's unmapped folio promotion patch series[2].

One obvious problem is that the last_cpupid field is carried over from the old folio to the new folio as part of migration. Carrying it over may not be important, since this is a fresh start for the folio on the new node.

Wei Xu asked whether we really want to use last_cpupid for this, since NUMA Balancing will not be the only way to do this migration in the future. Additionally, regarding isolation, Google noted that too many pages can become isolated with an approach like this. Gregory Price noted that task->migrate_list was originally implemented the way it was in order to limit the number of folios isolated at any given time; if every task isolates every folio, that causes other issues. We will likely need a limit on this in his patch series[2].

Wei expressed a concern about the number of bits that can be used from struct page. Davidlohr noted that we no longer need the information once the folio has been queued up for promotion.
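To make the isolation-limit and storage points above concrete, here is a minimal userspace model in C. It assumes a per-task list with a fixed cap. Everything in it is hypothetical, including fake_folio, fake_task, MAX_ISOLATED, queue_for_promotion() and migrate_batch(). It is not the prototype's code and not a kernel API. The model only shows three ideas:

- The target node id travels with the queue entry instead of being borrowed from struct page bits.
- A cap limits how many folios one task may hold isolated.
- A single batched drain resets last_cpupid rather than carrying it over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these types or helpers exist in the kernel. */
#define MAX_ISOLATED 4 /* hypothetical per-task cap on isolated folios */

struct fake_folio {
	int nid;         /* node the folio currently lives on */
	int last_cpupid; /* stands in for the NUMA Balancing field */
};

/* One queued promotion: the target node is stored in the entry itself,
 * so nothing needs to be stashed in the folio while it waits. */
struct promo_entry {
	struct fake_folio *folio;
	int target_nid;
};

struct fake_task {
	struct promo_entry migrate_list[MAX_ISOLATED];
	size_t nr_isolated;
};

/* Scan/fault side: isolate and queue, but refuse once the cap is hit so
 * a single task cannot keep an unbounded number of folios isolated. */
static bool queue_for_promotion(struct fake_task *t, struct fake_folio *f,
				int target_nid)
{
	if (t->nr_isolated >= MAX_ISOLATED)
		return false; /* folio is left on its current node */
	t->migrate_list[t->nr_isolated++] = (struct promo_entry){ f, target_nid };
	return true;
}

/* Task-work side: drain the whole list in one batch. A real implementation
 * would call into the migration core; this model only updates fields. */
static size_t migrate_batch(struct fake_task *t)
{
	size_t moved = 0;

	for (size_t i = 0; i < t->nr_isolated; i++) {
		struct promo_entry *e = &t->migrate_list[i];

		assert(e->target_nid >= 0);
		e->folio->nid = e->target_nid;
		/* Fresh start on the new node: hypothetical "unset" value. */
		e->folio->last_cpupid = -1;
		moved++;
	}
	t->nr_isolated = 0;
	return moved;
}
```

Refusing to queue when the cap is hit leaves the folio where it is, which is one way to bound isolation. Whether the cap should be per task, per CPU or global is the open question from the call.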
last_cpupid is only needed for NUMA Balancing; the asynchronous migration context can store the target node id anywhere. Gregory floated the possibility of a per-CPU migrate list instead of task->migrate_list, which would naturally capture which CPU is accessing the folio.

----->o-----
I asked whether these asynchronous promotion kthreads would still be single-threaded or per NUMA node. Bharata clarified that his current design uses one thread per node. If the length of the promotion list is limited, a full list is treated the same as a migration failure from the kthread: the folio gets left behind.

Gregory noted that with his implementation of task->migrate_list, this is done in task_work. Davidlohr said this sounds very expensive. Gregory agreed and said that this opens us up to long-running isolations, depending on how long the kthread takes to do the migration.

I asked whether this has to be a single kthread per NUMA node or whether it could just be a kworker. Raghavendra said that it is better off as a kthread so that throttling is centrally managed. Davidlohr said the per-node approach should work, based on precedents like kswapd, kcompactd, etc.

----->o-----
Zi Yan asked how the work of handling the migration would be charged if it is done by the kernel in a kthread; previously, for NUMA Balancing, this was done in process context. Wei asked how this was different from kswapd, where the kernel does this work transparently. I said it was done on behalf of the system as a whole, just like reclaim. The kernel has to be the source of truth for memory placement, and there are examples like khugepaged that optimize for individual process performance.

For Zi's work, he wants the ability to charge the cost of the migration back to the process itself. If userspace decides to call move_pages(), then the cost would be charged to the process instead of the kernel doing the migration through kpromoted.
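The per-node kthread design and the charging question can be modelled together in a small sketch. This is not Bharata's implementation. NR_NODES, PROMO_LIST_LEN, proc_acct, promo_enqueue() and promo_drain() are invented names. The sketch makes three assumptions:

- Each node's promotion list is a fixed-size array.
- The per-process charge is simply a count of folios that were successfully queued.
- promo_drain() stands in for one pass of a node's kthread.

A full list makes promo_enqueue() return false. The caller treats that like a migration failure and leaves the folio where it is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: names and sizes are illustrative, not kernel code. */
#define NR_NODES 2       /* hypothetical topology */
#define PROMO_LIST_LEN 3 /* hypothetical per-node promotion list limit */

/* Per-process accounting used to charge promotion work back to the owner. */
struct proc_acct {
	unsigned long nr_queued; /* folios this process queued for promotion */
};

/* One bounded promotion list per target node, drained by that node's kthread. */
struct promo_list {
	int folio_ids[PROMO_LIST_LEN];
	size_t len;
};

static struct promo_list node_lists[NR_NODES];

/* Returns false when the target node's list is full; as with a migration
 * failure, the folio is simply left behind on its current node. */
static bool promo_enqueue(int target_nid, int folio_id, struct proc_acct *owner)
{
	assert(target_nid >= 0 && target_nid < NR_NODES);
	struct promo_list *l = &node_lists[target_nid];

	if (l->len == PROMO_LIST_LEN)
		return false;
	l->folio_ids[l->len++] = folio_id;
	owner->nr_queued++; /* cheap charge: count only successful queueing */
	return true;
}

/* One pass of the per-node kthread: take everything queued for this node.
 * A real implementation would migrate folio_ids[0..n) here. */
static size_t promo_drain(int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	struct promo_list *l = &node_lists[nid];
	size_t n = l->len;

	l->len = 0;
	return n;
}
```

Charging at enqueue time is cheap because it needs no information from the kthread. The trade-off is that it charges for queued folios, not for the migration work actually performed.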
Bharata noted that one cheap way to do this would be to track, and charge for, the number of folios a particular process queues for promotion.

----->o-----
The next meeting will be on Thursday, May 22 at 8:30am PDT (UTC-7), and everybody is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:

 - go through the cover letter and shared drive; if you are not included, send me your email address. Email addresses must be registered as Google accounts, for example by using your corporate email to sign up for a gmail account or by providing a personal email account that will not be shared publicly
 - update from Bharata on the patch series separating migration out from NUMA Balancing
   + Bharata is looking to have the next patch series posted before the next meeting
 - discussion on limiting the number of folios that can be isolated at any given time, so as not to interfere with other parts of the system
 - follow up with Raghavendra on fixing the issues identified by Davidlohr in the earlier series[3]
 - enlightening migrate_pages() for hardware assists, and how this work will be charged to userspace

Please let me know if you'd like to propose additional topics for discussion, thank you!

[1] https://lore.kernel.org/lkml/20250306054532.221138-1-bharata@amd.com/
[2] https://lkml.iu.edu/hypermail/linux/kernel/2504.1/08111.html
[3] https://lore.kernel.org/linux-mm/ff53d70a-7d59-4f0d-aad0-03628f9d8b67@amd.com/