Date: Mon, 21 Oct 2024 10:51:28 -0400
From: Gregory Price
To: David Hildenbrand
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, dan.j.williams@intel.com, ira.weiny@intel.com,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	rafael@kernel.org, lenb@kernel.org, rppt@kernel.org,
	akpm@linux-foundation.org, alison.schofield@intel.com,
	Jonathan.Cameron@huawei.com, rrichter@amd.com, ytcoode@gmail.com,
	haibo1.xu@intel.com, dave.jiang@intel.com
Subject: Re: [PATCH v2 0/3] mm/memblock,x86,acpi: hotplug memory alignment advisement
References: <20241016192445.3118-1-gourry@gourry.net>

On Mon, Oct 21, 2024 at 11:51:38AM +0200, David Hildenbrand wrote:
>
>
> On 16.10.24 at 21:24, Gregory Price wrote:
> > When physical address regions are not aligned to memory block size,
> > the misaligned portion is lost (stranded capacity).
> >
> > Block size (min/max/selected) is architecture defined. Most architectures
> > tend to use the minimum block size or some simplistic heuristic. On x86,
> > memory block size increases up to 2GB, and is otherwise fitted to the
> > alignment of non-hotplug (special purpose memory).
> >
> > CXL exposes its memory for management through the ACPI CEDT (CXL Early
> > Discovery Table) in a field called the CXL Fixed Memory Window. Per
> > the CXL specification, this memory must be aligned to at least 256MB.
> >
> > When a CFMW aligns on a size less than the block size, this causes a
> > loss of up to 2GB per CFMW on x86. It is not uncommon for CFMW to be
> > allocated per-device - though this behavior is BIOS defined.
> >
> > This patch set provides 3 things:
> > 1) implement advise/probe functions in mm/memblock.c to report/probe
> >    architecture agnostic hotplug memory alignment advice.
> > 2) update x86 memblock size logic to consider the hotplug advice
> > 3) add code in acpi/numa/srat.c to report CFMW alignment advice
> >
> > The advisement interfaces are designed to be called during arch_init
> > code prior to allocator and smp_init. start_kernel will call these
> > through setup_arch() (via acpi and mm/init_64.c on x86), which occurs
> > prior to mm_core_init and smp_init - so no need for atomics.
> >
> > There's an attempt to signal callers to advise() that probe has already
> > occurred, but this is predicated on the notion that probe() actually
> > occurs (which presently only happens on x86). This is to assist debugging
> > future users who may mistakenly call this after allocator or smp init.
> >
> > Likewise, if probe() occurs more than once, we return -EBUSY to prevent
> > inconsistent values from being reported - i.e. this interaction should
> > happen exactly once, and all other behavior is an error / the probed
> > value should be acquired via memory_block_size_bytes() instead.
> >
> > Suggested-by: Ira Weiny
> > Suggested-by: David Hildenbrand
> > Suggested-by: Dan Williams
> > Signed-off-by: Gregory Price
>
> Just as a side note, a while ago there was a discussion about variable-sized
> memory blocks -- essentially removing memory_block_size_bytes().
>

If you have any links, I'm happy to do some reading up on it. I was going to
look into some more memblock behavior in the future, so it's worth looking at.

>
> The main issue is that this would change /sys/devices/system/memory/ in ways
> that could break existing user space. I believe there are other corner cases
> that are a bit nasty to handle (e.g., removing parts of a larger memory
> block), but likely it could be handled.
>

This is why I wanted to avoid a new interface in the first place and just
piggyback on set_memory_block_size_order - now there are two interfaces to do
the same thing and more hurdles. But I suppose the suggestive nature of this
one makes it far less offensive, since it can be completely ignored.

~Gregory
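
To make the stranded-capacity arithmetic from the quoted cover letter
concrete, here is a minimal userspace sketch. It is not code from the patch
set: stranded_bytes() and best_block_size() are hypothetical helpers, and the
128MB lower bound used in the note below is an assumption for illustration.
The sketch only shows how a window aligned to 256MB loses capacity under a
2GB block size, and how a smaller block size avoids the loss.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/*
 * Hypothetical helper: number of bytes in [base, base + size) that cannot
 * be covered by whole, naturally aligned blocks of block_size bytes.
 * block_size must be a power of two.
 */
static uint64_t stranded_bytes(uint64_t base, uint64_t size,
			       uint64_t block_size)
{
	uint64_t start, end;

	assert(block_size && !(block_size & (block_size - 1)));

	start = (base + block_size - 1) & ~(block_size - 1); /* round up */
	end = (base + size) & ~(block_size - 1);             /* round down */

	if (end <= start)
		return size; /* not even one whole block fits */
	return size - (end - start);
}

/*
 * Hypothetical helper: largest power-of-two block size in [min_bs, max_bs]
 * to which both the base and the size of the window are aligned. Falls
 * back to min_bs when no larger size fits.
 */
static uint64_t best_block_size(uint64_t base, uint64_t size,
				uint64_t min_bs, uint64_t max_bs)
{
	uint64_t bs;

	for (bs = max_bs; bs > min_bs; bs >>= 1)
		if (!((base | size) & (bs - 1)))
			return bs;
	return min_bs;
}
```

For a 4GB window based at 4GB + 256MB, stranded_bytes() with a 2GB block size
reports 2GB lost (1.75GB before the first aligned block and 0.25GB after the
last), while best_block_size() with bounds of 128MB and 2GB picks 256MB, at
which nothing is stranded.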