From: "Ho-Ren (Jack) Chuang"
To: "Huang, Ying", "Gregory Price", aneesh.kumar@linux.ibm.com,
    mhocko@suse.com, tj@kernel.org, john@jagalactic.com,
    "Eishan Mirakhur", "Vinicius Tavares Petrucci", "Ravis OpenSrc",
    "Alistair Popple", "Srinivasulu Thanneeru", Dan Williams,
    Vishal Verma, Dave Jiang, Andrew Morton, nvdimm@lists.linux.dev,
    linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Cc: "Ho-Ren (Jack) Chuang", "Ho-Ren (Jack) Chuang",
    "Ho-Ren (Jack) Chuang", qemu-devel@nongnu.org
Subject: [PATCH v3 0/2] Improved Memory Tier Creation for CPUless NUMA Nodes
Date: Wed, 20 Mar 2024 06:10:38 +0000
Message-Id: <20240320061041.3246828-1-horenchuang@bytedance.com>

When a memory device, such as CXL1.1 type3 memory, is emulated as normal
memory (E820_TYPE_RAM), the memory device is indistinguishable from
normal DRAM in terms of memory tiering with the current
implementation. The current memory tiering assigns all detected normal
memory nodes to the same DRAM tier. As a result, normal memory devices
with different performance attributes cannot be assigned to the correct
memory tier, which in turn prevents page migration between different
types of memory.

https://lore.kernel.org/linux-mm/PH0PR08MB7955E9F08CCB64F23963B5C3A860A@PH0PR08MB7955.namprd08.prod.outlook.com/T/

This patchset resolves the issue automatically. It delays the
initialization of memory tiers for CPUless NUMA nodes until they have
obtained HMAT information and all devices have been initialized at boot
time, eliminating the need for user intervention. If no HMAT is
specified, it falls back to using `default_dram_type`.

Example use case:
We have CXL memory on the host, and we create VMs with a new system
memory device backed by host CXL memory. We inject CXL memory
performance attributes through QEMU, and the guest now sees memory nodes
with performance attributes in HMAT. With this change, we enable the
guest kernel to construct the correct memory tiering for the memory
nodes.
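
To make the intended ordering concrete, here is a minimal userspace
sketch of the deferred assignment described above. It is not the code in
the patches: every `sketch_` name, the numeric distances, and the chunk
size are assumptions made up for illustration, and the early placement of
nodes with CPUs is my reading of the description rather than something
stated explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for illustration only; larger means slower memory. */
#define SKETCH_ADIST_DRAM  576    /* stand-in for the default_dram_type distance */
#define SKETCH_ADIST_UNSET (-1)   /* node has not been placed in a tier yet */
#define SKETCH_CHUNK_SIZE  128    /* assumed width of one tier */

struct sketch_node {
	bool has_cpu;        /* node has at least one CPU */
	bool has_hmat_perf;  /* HMAT performance data was provided */
	int  hmat_adist;     /* distance derived from HMAT, valid if has_hmat_perf */
	int  adist;          /* assigned distance, or SKETCH_ADIST_UNSET */
};

/* Early boot: place only nodes with CPUs; CPUless nodes are deferred. */
void sketch_early_init(struct sketch_node *nodes, size_t n)
{
	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		nodes[i].adist = nodes[i].has_cpu ? SKETCH_ADIST_DRAM
						  : SKETCH_ADIST_UNSET;
}

/*
 * Late boot, after devices are initialized: place the deferred nodes,
 * using HMAT data when present and the DRAM default otherwise.
 */
void sketch_late_init(struct sketch_node *nodes, size_t n)
{
	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].adist != SKETCH_ADIST_UNSET)
			continue;
		nodes[i].adist = nodes[i].has_hmat_perf ? nodes[i].hmat_adist
							: SKETCH_ADIST_DRAM;
	}
}

/* Assumed grouping: nodes whose distances share a chunk share a tier. */
int sketch_tier_of(const struct sketch_node *node)
{
	assert(node != NULL && node->adist != SKETCH_ADIST_UNSET);
	return node->adist / SKETCH_CHUNK_SIZE;
}
```

In this sketch, a CPUless node that reports HMAT data with a larger
distance ends up in a different tier from the CPU node, while a CPUless
node without HMAT data lands in the same tier as DRAM, which matches the
fallback described above.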

-v3:
 Thanks to Ying's comments,
 * Make the newly added code independent of HMAT
 * Upgrade set_node_memory_tier to support more cases
 * Put all non-driver-initialized memory types into default_memory_types
   instead of using hmat_memory_types
 * find_alloc_memory_type -> mt_find_alloc_memory_type

-v2:
 Thanks to Ying's comments,
 * Rewrite cover letter & patch description
 * Rename functions, don't use _hmat
 * Abstract common functions into find_alloc_memory_type()
 * Use the expected way to use set_node_memory_tier instead of modifying it
 * https://lore.kernel.org/lkml/20240312061729.1997111-1-horenchuang@bytedance.com/T/#u

-v1:
 * https://lore.kernel.org/lkml/20240301082248.3456086-1-horenchuang@bytedance.com/T/#u

Ho-Ren (Jack) Chuang (2):
  memory tier: dax/kmem: create CPUless memory tiers after obtaining HMAT info
  memory tier: dax/kmem: abstract memory types put

 drivers/dax/kmem.c           |  20 +------
 include/linux/memory-tiers.h |  13 +++++
 mm/memory-tiers.c            | 106 ++++++++++++++++++++++++++++++++---
 3 files changed, 114 insertions(+), 25 deletions(-)

-- 
Ho-Ren (Jack) Chuang