Date: Fri, 20 Sep 2024 11:06:09 +0200
From: Gregory Price
To: David Hildenbrand
Cc: Jonathan Cameron, linux-mm@kvack.org, linux-cxl@vger.kernel.org,
    Davidlohr Bueso, Ira Weiny, John Groves,
    virtualization@lists.linux.dev, Oscar Salvador, qemu-devel@nongnu.org,
    Dave Jiang, Dan Williams, linuxarm@huawei.com,
    wangkefeng.wang@huawei.com, John Groves, Fan Ni, Navneet Singh,
    Michael S. Tsirkin, Igor Mammedov, Philippe Mathieu-Daudé
Subject: Re: [RFC] Virtualizing tagged disaggregated memory capacity
    (app specific, multi host shared)
References: <20240815172223.00001ca7@Huawei.com>

> > 2. Coarse grained memory increases for 'normal' memory.
> >    Can use memory hot-plug. Recovery of capacity likely to only be possible on
> >    VM shutdown.
>
> Is there are reason "movable" (ZONE_MOVABLE) is not an option, at least in
> some setups? If not, why?
>

This seems like a bit of a muddied conversation.

"'normal' memory" has no defined meaning - so let's clear this up a bit.

There is:
 * System-RAM (memory managed by kernel allocators)
 * Special Purpose Memory (generally presented as DAX)

System-RAM is managed as zones - the relevant ones are:
 * ZONE_NORMAL allows both movable and non-movable allocations
 * ZONE_MOVABLE only allows movable allocations
   (Caveat: this generally only applies to allocation, you can
    violate this with stuff like pinning)

Hotplug can be thought of as two discrete mechanisms:
 * Exposing capacity to the kernel (CXL DCD Transactions)
 * Exposing capacity to allocators (mm/memory_hotplug.c)

1) If the intent is to primarily utilize dynamic capacity for VMs, then
   the host does not need (read: should not need) to map the memory as
   System-RAM in the host. The VMM should be made to consume it directly
   via DAX or otherwise.

   That capacity is almost by definition "Capital G Guaranteed" to be
   reclaimable regardless of what the guest does. A VMM can force a guest
   to let go of resources - that's its job.

2) If the intent is to provide dynamic capacity to a host as System-RAM,
   then recoverability is dictated by system usage of that capacity. If
   onlined into ZONE_MOVABLE, then if the system has avoided doing things
   like pinning those pages it should *generally* be recoverable (but not
   guaranteed).

For the virtualization discussion:

Hotplug and recoverability are a non-issue. The capacity should never be
exposed to system allocators and the VMM should be made to consume
special purpose memory directly. That's on the VMM/orchestration software
to get right.

For the host System-RAM discussion:

Auto-onlined hotplug capacity presently defaults to ZONE_NORMAL, but we
discussed (yesterday, at Plumbers) changing this default to ZONE_MOVABLE.

The only concern is when insufficient ZONE_NORMAL exists to support
ZONE_MOVABLE capacity - but this is unlikely to be the general scenario
AND can be mitigated w/ existing mechanisms.

Manually onlined capacity defaults to ZONE_MOVABLE. It would be nice to
make this behavior consistent, since the general opinion appears to be
that this capacity should default to ZONE_MOVABLE.

~Gregory
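
P.S. For anyone who wants to poke at the host System-RAM case from
userspace, here is a rough sketch in Python. It assumes the standard
memory-hotplug sysfs layout under /sys/devices/system/memory
(auto_online_blocks, memoryN/valid_zones, memoryN/state). The helper
names and the `root` parameter are hypothetical, made up for illustration
so the logic can be exercised against a scratch directory. Writing to the
real `state` file needs root, and whether the block can later be offlined
still depends on what ends up pinned there.

```python
from pathlib import Path

# Standard location of the memory-hotplug sysfs interface.
SYSFS_MEMORY = Path("/sys/devices/system/memory")


def auto_online_policy(root=SYSFS_MEMORY):
    """Return the auto-online policy string, e.g. 'offline' or 'online_movable'."""
    return (Path(root) / "auto_online_blocks").read_text().strip()


def default_zone(block, root=SYSFS_MEMORY):
    """Return the first zone listed in valid_zones for a block such as 'memory42'.

    For an offline block, the first entry is the zone the kernel would pick
    if the block were onlined without naming a zone.
    """
    zones = (Path(root) / block / "valid_zones").read_text().split()
    return zones[0] if zones else None


def online_movable(block, root=SYSFS_MEMORY):
    """Request ZONE_MOVABLE for an offline block; return True if a write was made."""
    state = Path(root) / block / "state"
    if state.read_text().strip() != "offline":
        return False  # already online (or in transition): leave it alone
    state.write_text("online_movable")
    return True
```

Something like `online_movable("memory42")` would then do the manual
online for one block (the block name here is only an example).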