Date: Tue, 17 Jun 2025 20:42:19 -0700 (PDT)
From: David Rientjes
To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron, Joshua Hahn, Raghavendra K T, "Rao, Bharata Bhasker", SeongJae Park, Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan
cc: linux-mm@kvack.org
Subject: [Linux Memory Hotness and Promotion] Notes from June 5, 2025

Hi everybody,

Here are the notes from the last Linux Memory Hotness and Promotion call that happened on Thursday, June 5. Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend the call as well as keep the conversation going in between meetings.

----->o-----
I recapped the previous instance and discussion around asynchronous page promotion driven through a kthread. I also discussed the previous chat about trade-offs around isolated folio lists vs pfn based tracking of memory to migrate.

Bharata said that tracking pfns was much easier since they are stateless. We don't need to worry about refcounts or about isolating too many pages. In the latest iteration of the patch series, he converted this to use pfn based tracking because we don't want to keep memory isolated for too long.

Previous attempts isolated these folios at the time of page fault and then batch migrated them later. Now, during page fault, we grab the pfn of the folio that has been misplaced and push that into a migrator subsystem -- a very simple subsystem that stores pfns pushed to it (NUMA Balancing is currently the first/only source) in a per-node list for the target node. This is coupled with a per-node kthread that routinely scans the list and migrates the folios to the target node. Work was underway to address a contention issue that arises if there are multiple sources of these pfns to migrate (moving away from a mutex).

----->o-----
Wei Xu asked about the pfn based tracking and how this would handle multiple sources of memory hotness with additional information without a lot of overhead. Bharata noted that for locality based migration there was no need for additional information. For hot page tracking, like with page table scanning or with CHMU, we'd need more information later.

Wei suggested generalizing this now so that it is easily extensible later while still acknowledging that the metadata associated with the previous kpromoted patch series was very large. He suggested a virtual map, indexed by pfn, that could be sparsely populated. Bharata acked that the information stored here should be concise and precise. This would be a future extension, however.
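
For readers who could not attend, the pfn based flow described above can be modeled in userspace roughly as follows. This is only a sketch and not code from the patch series: the names `PfnMigrator`, `push`, `drain` and `migrate_batch` are hypothetical, the per-node lock is an assumption (the notes only say the series is moving away from a mutex), and the re-validation step is an assumption that follows from pfns holding no reference on the folio.

```python
import threading
from collections import deque


class PfnMigrator:
    """Userspace model: one queue of pending pfns per target node."""

    def __init__(self, nr_nodes):
        # Assumption: one queue and one lock per target node, so that
        # sources feeding different nodes do not share a global lock.
        self.queues = [deque() for _ in range(nr_nodes)]
        self.locks = [threading.Lock() for _ in range(nr_nodes)]

    def push(self, pfn, target_nid):
        """Hotness source side (e.g. the fault path): record only the pfn.

        Nothing is isolated and no reference is taken here.
        """
        with self.locks[target_nid]:
            self.queues[target_nid].append(pfn)

    def drain(self, nid, batch=32):
        """Per-node migrator thread side: take up to `batch` pfns, in order."""
        with self.locks[nid]:
            count = min(batch, len(self.queues[nid]))
            return [self.queues[nid].popleft() for _ in range(count)]


def migrate_batch(pfns, nid, is_still_misplaced):
    """Stand-in for batch migration to node `nid`.

    Assumption: because a queued pfn pins nothing, each entry is
    re-checked with the caller-supplied predicate before it is acted on.
    Returns the pfns that would actually be migrated.
    """
    return [pfn for pfn in pfns if is_still_misplaced(pfn, nid)]
```

A source would call `push(pfn, target_nid)` and the thread for node N would loop over `drain(N)` followed by `migrate_batch(...)`. Extra per-pfn hotness metadata, as in the sparse map Wei suggested, is deliberately left out of this model.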

Raghavendra brought up the PTE based method that he is pursuing, which stores a per-mm list of folios, and said that this could be converted to use pfn based tracking as well. Starting with the mm, he is storing a simple list of folios to be migrated which can also use batch migration.

Wei asked about the per-mm tracking. He suggested standardizing on the tracking for folios that need to be migrated, whether this is per node or per mm. Raghavendra said that scanning is done in a separate thread from the migration thread. He said that the per-mm information being stored includes a timestamp for the whole mm so the next scan can determine if it was still hot.

Bharata suggested one use case for storing a per-mm list of folios is to ensure that everything is tracked on a per-process list so that if the task exits the list can simply be purged. Raghavendra noted that this also prevents system-wide lock contention.

----->o-----
I asked, as I normally do (:D), about timelines for the next version of the patch series to be proposed. Bharata said this would be sent out to the mailing list the following week. He wanted to optimize the locking first. Raghavendra said his series would be on the same track.

----->o-----
I pivoted the discussion to the testing story for this approach, including on systems that are not memory tiered. I asked about workloads of interest such as redis, mysql, specint, etc, as well as metrics of interest to collect.

Yiannis Nikolakopoulos asked about the baseline demotion strategies that we could rely on. I noted that Google is primarily looking at proactive demotion using working set extensions on top of MGLRU. It would be interesting to discuss with the group extensions to memory.reclaim to support proactive demotion triggered by userspace. Yiannis noted it would be useful to establish the baseline for the demotion strategy because that would be key to testing infrastructure.
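
As context for the memory.reclaim discussion, here is a rough sketch of how userspace drives the existing cgroup v2 file today, by writing a byte count to it. The notes do not define any demotion-specific key, so none is shown; the helper name `request_reclaim` and the cgroup path in the usage note are assumptions for illustration.

```python
import os


def request_reclaim(cgroup_dir, nbytes):
    """Write a size-based reclaim request to <cgroup_dir>/memory.reclaim.

    Only the existing size-based form is used. Any proactive demotion
    extension discussed on the call is not modeled here.
    Returns the payload that was written.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    payload = str(nbytes)
    path = os.path.join(cgroup_dir, "memory.reclaim")
    # On a real cgroup the write can fail with OSError if the kernel
    # could not reclaim the requested amount; callers should expect that.
    with open(path, "w") as f:
        f.write(payload)
    return payload
```

For example, `request_reclaim("/sys/fs/cgroup/mygroup", 64 << 20)` would request 64 MiB from a hypothetical group named `mygroup`, which needs sufficient privileges and cgroup v2.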

I asked if there were specific workloads that were interesting for our evaluation of these approaches. Bharata noted that his primary evaluation was being done with microbenchmarks and redis.

----->o-----
Next meeting will be on Thursday, July 17 at 8:30am PDT (UTC-7), everybody is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:

 - update on latest series that leverages pfn tracked folios with a per-node kpromoted thread
   + including optimizations that avoid system-wide mutex contention
   + reconcile how this will overlap and interact with pte based scanning
 - discuss proactive demotion interface as an extension to memory.reclaim
   + possibly leveraging working set extensions on top of MGLRU
 - discuss overall testing and benchmarking methodology for various approaches as we go along
   + minimal viable infrastructure, testing workloads, and metrics of interest to collect
 - enlightening migrate_pages() for hardware assists and how this work will be charged to userspace

Please let me know if you'd like to propose additional topics for discussion, thank you!