Date: Tue, 18 Feb 2025 20:10:10 -0500
From: Gregory Price
To: David Hildenbrand
Cc: Yang Shi, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: CXL Boot to Bash - Section 3: Memory (block) Hotplug
References: <1b4c6442-a2b0-4290-8b89-c7b82a66d358@redhat.com>

On Tue, Feb 18, 2025 at 09:57:06PM +0100, David Hildenbrand wrote:
> >
> > 2) if memmap_on_memory is on, and hotplug capacity (node1) is
> >    zone_movable - then each memory block (256MB) should appear
> >    as 252MB (-4MB of 64-byte page structs).  For 256GB (my system)
> >    I should see a total of 252GB of onlined memory (-4GB of page struct)
>
> In memory_block_online(), we have:
>
>	/*
>	 * Account once onlining succeeded. If the zone was unpopulated, it is
>	 * now already properly populated.
>	 */
>	if (nr_vmemmap_pages)
>		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
>					  nr_vmemmap_pages);
>

I've validated the behavior on my system; I just misread my results.
memmap_on_memory works as suggested.

What's mildly confusing is that pages used for the altmap are accounted
for as if they were an allocation in vmstat, while that capacity is also
chopped out of the memory block (it "makes sense", it's just subtly
misleading).

I thought the system was saying I'd allocated memory (from the 'free'
capacity) instead of just reducing capacity.

Thank you for clearing this up.

> >
> > stupid question - it sorta seems like you'd want this as the default
> > setting for driver-managed hotplug memory blocks, but I suppose for
> > very small blocks there's problems (as described in the docs).
>
> The issue is that it is per-memblock.
> So you'll never have 1 GiB ranges of consecutive usable memory
> (e.g., 1 GiB hugetlb page).
>

That makes sense, I had not considered this.  Although it only applies to
small blocks - which is basically an indictment of this suggestion:

https://lore.kernel.org/linux-mm/20250127153405.3379117-1-gourry@gourry.net/

So I'll have to consider this and whether this should be a default.
This is probably enough to NAK it entirely.

... that said ....

Interestingly, when I tried allocating 1GiB hugetlb pages on a dax device
in ZONE_MOVABLE (without memmap_on_memory) - the allocation fails silently
regardless of block size (tried both 2GB and 256MB).  I can't find a reason
why this would be the case in the existing documentation.

(note: hugepage migration is enabled in build config, so it's not that)

If I online one block (256MB) into ZONE_NORMAL, and the remainder into
ZONE_MOVABLE (with memmap_on_memory=n), the allocation still fails, and:

   nr_slab_unreclaimable 43

shows up in node1/vmstat - where previously there was nothing.

Onlining the dax devices into ZONE_NORMAL successfully allowed 1GiB huge
pages to allocate.

These tests used the /sys/bus/node/devices/node1/hugepages/* interfaces.

Using /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages with an
interleave mempolicy - all hugepages end up on ZONE_NORMAL.

(v6.13 base kernel)

This behavior is *curious* to say the least.  Not sure if it's a bug, or
some nuance missing from the documentation - but certainly glad I caught it.

> I thought we had that? See MHP_MEMMAP_ON_MEMORY set by dax/kmem.
>
> IIRC, the global toggle must be enabled for the driver option to be considered.

Oh, well, that's an extra layer I missed.  So there's:

build:
  CONFIG_MHP_MEMMAP_ON_MEMORY=y
  CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y
global:
  /sys/module/memory_hotplug/parameters/memmap_on_memory
device:
  /sys/bus/dax/devices/dax0.0/memmap_on_memory

And looking at it - this does seem to be the default for dax.

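For anyone wanting to check the two runtime layers on their own box, here
is a minimal sketch, not a definitive tool.  It only reads the global and
per-device files named above and prints their raw contents without
interpreting them.  The helper names and the `root` parameter are
hypothetical (added so the function can be pointed at a fake tree), and
"dax0.0" is just the example device from this mail.  The build-time
CONFIG_* options are not checked, since where the kernel config lives
varies by distro.

```python
from pathlib import Path

# Paths as listed above, relative to the filesystem root.
GLOBAL_PARAM = "sys/module/memory_hotplug/parameters/memmap_on_memory"
DEVICE_ATTR = "sys/bus/dax/devices/{dev}/memmap_on_memory"


def read_attr(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def memmap_layers(dev="dax0.0", root="/"):
    """Collect the raw global and per-device memmap_on_memory settings.

    `root` is a hypothetical knob for testing against a fake tree;
    on a real system leave it as "/".
    """
    base = Path(root)
    return {
        "global": read_attr(base / GLOBAL_PARAM),
        "device": read_attr(base / DEVICE_ATTR.format(dev=dev)),
    }


if __name__ == "__main__":
    for layer, value in memmap_layers().items():
        print(f"{layer}: {value if value is not None else '(not present)'}")
```

A missing file is reported as "(not present)" rather than treated as
"off", since the attribute may simply not exist on a given kernel or
device.
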
So I can drop the existing `nuance movable/memmap` section and just
replace it with the hugetlb subtleties x_x.

I appreciate the clarifications here, sorry for the incorrect info and
the increasing confusion.

~Gregory