From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75436C54E67 for ; Tue, 26 Mar 2024 06:26:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 057026B0089; Tue, 26 Mar 2024 02:26:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0076E6B0092; Tue, 26 Mar 2024 02:26:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DEB526B0096; Tue, 26 Mar 2024 02:26:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id CCBFC6B0089 for ; Tue, 26 Mar 2024 02:26:52 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 9B588160652 for ; Tue, 26 Mar 2024 06:26:52 +0000 (UTC) X-FDA: 81938207064.12.B7AF9B8 Received: from mail-yw1-f179.google.com (mail-yw1-f179.google.com [209.85.128.179]) by imf04.hostedemail.com (Postfix) with ESMTP id 83D9C40010 for ; Tue, 26 Mar 2024 06:26:49 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=jszZbbdT; spf=pass (imf04.hostedemail.com: domain of horenchuang@bytedance.com designates 209.85.128.179 as permitted sender) smtp.mailfrom=horenchuang@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711434410; a=rsa-sha256; cv=none; b=F6rD39UxJzWs7WMifNQP4A8WCbw98MrxeFeBbpECaa0NDH47BFqb7UKINvgKJL1NCrVkxB zaXk5YLPGu8Qn3aJzBckRgrL3yJSpWLVJ2zwJS4k3V+PQUID5wQvFVV0tuICAxM87n+E3d STU9zu8HVfox/g+KNVexCB7nPo9QLDg= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=jszZbbdT; spf=pass (imf04.hostedemail.com: domain of horenchuang@bytedance.com designates 209.85.128.179 as permitted sender) smtp.mailfrom=horenchuang@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711434410; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=OrDj/ohWqSayAYgN1y4y/00hTnt9coGJh8fl1WRQsk8=; b=saHJFd9gAnos+KLLitXwk8gjLp/W0ucCXjlglRdW1UJqNtpYh/Jde74oB6Ln71ayQBuvpH vfyEe88EbY+C16xSa7iemtMAi7NBqkXi+yDjxnxZuAG/jC1s5dpcG36VmnRliVpSDbR+QH 516klhWWXN5AT2jVYSmGskTcSHZnPkI= Received: by mail-yw1-f179.google.com with SMTP id 00721157ae682-60a0579a968so55033837b3.3 for ; Mon, 25 Mar 2024 23:26:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1711434408; x=1712039208; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=OrDj/ohWqSayAYgN1y4y/00hTnt9coGJh8fl1WRQsk8=; b=jszZbbdTwN2NBJiM30tjSndehBrk4hb5Ub/4Www5B0eibPEH/bgtP7/3xKwqkBd42w WKXNA1iyRK7TKSd58y6gG1P9zyyOCqkA6Y3c6bskzZLOruwUmBMhSGH6WiZNOr+5gjsn 2LEBDwjRmsGOmtZ5chCY6zGS6hAxBfK3tjEJnuk1REav40NMHu8Nw6jbqgLX3eAxKule zHG9M9Z2qT/WwbfS1mBNWT+dhcQ4GfBmV23uQm8D1Ce82uaMIJmcXY50bzMPBcbsJW7A CtISPqZbX45lz5HtuiHvbIcoD5gwo9YBmF2Evcx/sclO+uOxCzLDtrKquQyq/KYTpeQG ANBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711434408; x=1712039208; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OrDj/ohWqSayAYgN1y4y/00hTnt9coGJh8fl1WRQsk8=; b=Z2snngdmbY2ytB5rKdn/EoyW7VWl83NbN76AwdgzQ2W+g7lMp60K4y3VBpphveofMD quQffYaGwuhkSfUW+O0A+75ULRy57HRxuSkgmTW60g44xtCGbgoesgLvhS1xiaTw1/7p afxCItjZmYBj8efp7vmDKrcksfycqroxY8iVQ26n0Eq4lGnJY3dRNrJ9oGbBMjY0X4Zd W6N0HLum4WVtTT1JQjwZAKlZkAkMAqH2gvOM6BLFtlSKWrPw5Pck50zgwORcEc9jd7wB kvgKXW413iPESdMN0gJPDlm+yT/z6XZuf9NEJEguk4S8Ha8gknurZoLjmDVgHp7eH5LE Xq9w== X-Forwarded-Encrypted: i=1; AJvYcCW4m7XZuq/elZUqAc7ZlgkuI3olIeNEsWoS+jsH5ZXnI/8Nak26dKyW5MGPNYPqwoXWyZ9flA6NqT/1or3JcsLps3I= X-Gm-Message-State: AOJu0Yy6wslACUenst4ocBiQTU9qNcOr30H/kCRfGkA67Sp6tzymgZDm eTpfUgn5jmRJeEQtS90N4tVZ37p6N53YteS6GAyUVkjOoHUjC1qKm6qJGU7hP+gP8L4TJ9nperP wPCd5DnR3/dfqUsT1yWRnRE///B6VU0YWupcR+A== X-Google-Smtp-Source: AGHT+IFqYJubvpJrV33NsKYE5RdyKpfAzoF/y4eVN226q4jsK0CwHN6aACNahCgxSadd6QR1OxZuc4Vlo+dxRXUA7L4= X-Received: by 2002:a5b:20b:0:b0:dcc:744d:b486 with SMTP id z11-20020a5b020b000000b00dcc744db486mr7439625ybl.39.1711434408216; Mon, 25 Mar 2024 23:26:48 -0700 (PDT) MIME-Version: 1.0 References: <20240322070356.315922-1-horenchuang@bytedance.com> <20240322070356.315922-3-horenchuang@bytedance.com> <87cyrmr773.fsf@yhuang6-desk2.ccr.corp.intel.com> <87ttktofoa.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <87ttktofoa.fsf@yhuang6-desk2.ccr.corp.intel.com> From: "Ho-Ren (Jack) Chuang" Date: Mon, 25 Mar 2024 23:26:37 -0700 Message-ID: Subject: Re: [External] Re: [PATCH v4 2/2] memory tier: create CPUless memory tiers after obtaining HMAT info To: "Huang, Ying" Cc: Gregory Price , aneesh.kumar@linux.ibm.com, mhocko@suse.com, tj@kernel.org, john@jagalactic.com, Eishan Mirakhur , Vinicius Tavares Petrucci , Ravis OpenSrc , Alistair Popple , Srinivasulu Thanneeru , Dan Williams , Vishal Verma , Dave Jiang , Andrew Morton , nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Ho-Ren (Jack) Chuang" , "Ho-Ren (Jack) Chuang" , qemu-devel@nongnu.org, Hao Xiang Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 83D9C40010 X-Stat-Signature: q9mssxnnwdekac3isw6w96zwjdbq8r19 X-Rspam-User: X-HE-Tag: 1711434409-323842 X-HE-Meta: U2FsdGVkX19yIXTO3w4RhDzGRTfCo3ONfcwGN+TJgo4tLKCfLWkJDqw+E+J/umCKuuOpnvUDxi9z2DF3SgA3goz5fj/QMYDqkdbWNL7bMIjucTNpHwYPTF0iXy+yPytGjbx4u+/HccOHm5HZf0A2LfJh2JLxw5zLbpNV9KSK+mrFe4LUGzSW6l3iIbdpALKKI2NjiRXzQtOt3h2kxjOqDn/BdbHBaI8/v+Gu0NHcLrexNWIOIWv9aaaHJkF27MwNpkByn7f28qa+FWD5R6jYin0V9AyEojAJpUCOQsjNQgLJtECreQrTik7WefiwxctxLb2HmszjI713z+fJpVNz10LoYB9IWDeP09ncytV43rZHYljsNYyWB7dPJJupZoSy3crj6PsuKmOdahXlGMhy+C1Oj2eIp5jocfkb6wwx60SDZSgFOLU3lziv1SzVkJFQjSbKVWbFUDpyIc4FIeHN91ak3QBmIdF4Nl9lkKaNo8VKMmpUy/CXQGoh9BH+UInr449KZ0wAwsTtNgz/ECu6EbkIY0neNzfIUFvMIKDQ0UQz7Yw28KwZQJWdX0YE/eKnk8Sit61Z2CSlVHnWapAaCJ8hdaCntCOxhbk/VItWVmLaYMHULJWh2+MKgrQ+Fi8W91gXA0ZrSeoSNjzaxd6Q2mDZcXeLAO7ejMasxeCx9ASGMqid5tJGI1x0kQ6a+IMliV8kaMyA9lGGKEOy+w0xzL7Kmsxs19ZTD8is5/SGIoWjwabhbmjMhN68WfnN+wD1PLp0E+KcJKqN5gV4jKZ9srxUlOd3PCQMJB1c6KF1DwbN/CZxnLdeL5TxlnlEqQc46E5SIQwifQ4qmIlLM2CbokfuuJK1fYaM3ME6D2H7qZ2biIZLhVUKijsc1+E3Hjz/+HE+Nfagyl6Smbqe1SLZfz4i8n4twEaVKa73dyD7uVs5Mgipc6oKIDcuvX7nIr5NQeHa0p3CKUizeBgnxbG 9bIjhSn0 bx73B9gJLtiyf1U4UBZweSooQiO6Q8o+neLab+A5UwkqMd8N4XxJtdEHQy7t/MJQu839AEZT3CpSSVqlMqryWpbpalHX7Kd7O6cfNxR/z2SL2k218XtDvzZlcFOuUxuszBfruuIOWLgkSpbQB9Rf/ww7CRkVEJ2NCaBG/sjre12FQJ84Vx0QXExi2Lp3IlSHg3jCH4gZTB4E29+kBmcr4kuGiQjOF8u/6rIuNO64w+Mox80e0trwd5n4CfDaH84ft1yLzGHYLDVjQhHzZEadY8zQ5tvedvzxeBpr5rUyO0qjGn9S40qxFpFqUwC7Gu8JMSrGB7frAPlxfRouTYTIMapfT47WbMUorS9ke6Gthn7Nfy+JsJ3X8xZa4rz5sB7nOH601/I2uzGm6dXQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Mar 25, 2024 at 8:08=E2=80=AFPM Huang, Ying = wrote: > > "Ho-Ren (Jack) Chuang" writes: > > > On Fri, Mar 22, 2024 at 1:41=E2=80=AFAM Huang, Ying wrote: > >> > >> "Ho-Ren (Jack) Chuang" writes: > >> > >> > The current implementation treats emulated memory devices, such as > >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal= memory > >> > (E820_TYPE_RAM). However, these emulated devices have different > >> > characteristics than traditional DRAM, making it important to > >> > distinguish them. Thus, we modify the tiered memory initialization p= rocess > >> > to introduce a delay specifically for CPUless NUMA nodes. This delay > >> > ensures that the memory tier initialization for these nodes is defer= red > >> > until HMAT information is obtained during the boot process. Finally, > >> > demotion tables are recalculated at the end. > >> > > >> > * late_initcall(memory_tier_late_init); > >> > Some device drivers may have initialized memory tiers between > >> > `memory_tier_init()` and `memory_tier_late_init()`, potentially brin= ging > >> > online memory nodes and configuring memory tiers. They should be exc= luded > >> > in the late init. > >> > > >> > * Handle cases where there is no HMAT when creating memory tiers > >> > There is a scenario where a CPUless node does not provide HMAT infor= mation. > >> > If no HMAT is specified, it falls back to using the default DRAM tie= r. > >> > > >> > * Introduce another new lock `default_dram_perf_lock` for adist calc= ulation > >> > In the current implementation, iterating through CPUlist nodes requi= res > >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will = end up > >> > trying to acquire the same lock, leading to a potential deadlock. > >> > Therefore, we propose introducing a standalone `default_dram_perf_lo= ck` to > >> > protect `default_dram_perf_*`. This approach not only avoids deadloc= k > >> > but also prevents holding a large lock simultaneously. > >> > > >> > * Upgrade `set_node_memory_tier` to support additional cases, includ= ing > >> > default DRAM, late CPUless, and hot-plugged initializations. > >> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and > >> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()= ` to > >> > handle cases where memtype is not initialized and where HMAT informa= tion is > >> > available. > >> > > >> > * Introduce `default_memory_types` for those memory types that are n= ot > >> > initialized by device drivers. > >> > Because late initialized memory and default DRAM memory need to be m= anaged, > >> > a default memory type is created for storing all memory types that a= re > >> > not initialized by device drivers and as a fallback. > >> > > >> > Signed-off-by: Ho-Ren (Jack) Chuang > >> > Signed-off-by: Hao Xiang > >> > --- > >> > mm/memory-tiers.c | 73 ++++++++++++++++++++++++++++++++++++++++----= --- > >> > 1 file changed, 63 insertions(+), 10 deletions(-) > >> > > >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c > >> > index 974af10cfdd8..9396330fa162 100644 > >> > --- a/mm/memory-tiers.c > >> > +++ b/mm/memory-tiers.c > >> > @@ -36,6 +36,11 @@ struct node_memory_type_map { > >> > > >> > static DEFINE_MUTEX(memory_tier_lock); > >> > static LIST_HEAD(memory_tiers); > >> > +/* > >> > + * The list is used to store all memory types that are not created > >> > + * by a device driver. > >> > + */ > >> > +static LIST_HEAD(default_memory_types); > >> > static struct node_memory_type_map node_memory_types[MAX_NUMNODES]; > >> > struct memory_dev_type *default_dram_type; > >> > > >> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion __re= ad_mostly; > >> > > >> > static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms); > >> > > >> > +static DEFINE_MUTEX(default_dram_perf_lock); > >> > >> Better to add comments about what is protected by this lock. > >> > > > > Thank you. I will add a comment like this: > > + /* The lock is used to protect `default_dram_perf*` info and nid. */ > > +static DEFINE_MUTEX(default_dram_perf_lock); > > > > I also found an error path was not handled and > > found the lock could be put closer to what it protects. > > I will have them fixed in V5. > > > >> > static bool default_dram_perf_error; > >> > static struct access_coordinate default_dram_perf; > >> > static int default_dram_perf_ref_nid =3D NUMA_NO_NODE; > >> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int n= ode, struct memory_dev_type *mem > >> > static struct memory_tier *set_node_memory_tier(int node) > >> > { > >> > struct memory_tier *memtier; > >> > - struct memory_dev_type *memtype; > >> > + struct memory_dev_type *mtype; > >> > >> mtype may be referenced without initialization now below. > >> > > > > Good catch! Thank you. > > > > Please check below. > > I may found a potential NULL pointer dereference. > > > >> > + int adist =3D MEMTIER_ADISTANCE_DRAM; > >> > pg_data_t *pgdat =3D NODE_DATA(node); > >> > > >> > > >> > @@ -514,11 +521,20 @@ static struct memory_tier *set_node_memory_tie= r(int node) > >> > if (!node_state(node, N_MEMORY)) > >> > return ERR_PTR(-EINVAL); > >> > > >> > - __init_node_memory_type(node, default_dram_type); > >> > + mt_calc_adistance(node, &adist); > >> > + if (node_memory_types[node].memtype =3D=3D NULL) { > >> > + mtype =3D mt_find_alloc_memory_type(adist, &default_me= mory_types); > >> > + if (IS_ERR(mtype)) { > >> > + mtype =3D default_dram_type; > >> > + pr_info("Failed to allocate a memory type. Fal= l back.\n"); > >> > + } > >> > + } > >> > > >> > - memtype =3D node_memory_types[node].memtype; > >> > - node_set(node, memtype->nodes); > >> > - memtier =3D find_create_memory_tier(memtype); > >> > + __init_node_memory_type(node, mtype); > >> > + > >> > + mtype =3D node_memory_types[node].memtype; > >> > + node_set(node, mtype->nodes); > > > > If the `mtype` could be NULL, would there be a potential > > NULL pointer dereference. Do you have a preferred idea > > to fix this if needed? > > Initialize mtype with default_dram_type? > Sounds like a plan. Thank you. I'm working on V5. > > >> > + memtier =3D find_create_memory_tier(mtype); > >> > if (!IS_ERR(memtier)) > >> > rcu_assign_pointer(pgdat->memtier, memtier); > >> > return memtier; > >> > @@ -655,6 +671,34 @@ void mt_put_memory_types(struct list_head *memo= ry_types) > >> > } > >> > EXPORT_SYMBOL_GPL(mt_put_memory_types); > >> > > > [snip] > > -- > Best Regards, > Huang, Ying --=20 Best regards, Ho-Ren (Jack) Chuang =E8=8E=8A=E8=B3=80=E4=BB=BB