Date: Tue, 30 Jul 2024 02:12:27 -0400
From: Gregory Price
To: "Huang, Ying"
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, dave.jiang@intel.com,
    Jonathan.Cameron@huawei.com, horenchuang@bytedance.com,
    linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
    dan.j.williams@intel.com, lenb@kernel.org, "Aneesh Kumar K.V"
Subject: Re: [PATCH] acpi/hmat,mm/memtier: always register hmat adist calculation callback
References: <20240726215548.10653-1-gourry@gourry.net>
    <87ttg91046.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <877cd3u1go.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Tue, Jul 30, 2024 at 01:19:53AM -0400, Gregory Price wrote:
> On Tue, Jul 30, 2024 at 09:12:55AM +0800, Huang, Ying wrote:
> > > Right now HMAT appears to be used prescriptively, this despite the fact
> > > that there was a clear intent to separate CPU-nodes and non-CPU-nodes in
> > > the memory-tier code. So this patch simply realizes this intent when the
> > > hints are not very reasonable.
> >
> > If HMAT isn't available, it's hard to put memory devices to
> > appropriate memory tiers without other information.
> > In commit
> > 992bf77591cb ("mm/demotion: add support for explicit memory tiers"),
> > Aneesh pointed out that it doesn't work for his system to put
> > non-CPU-nodes in lower tier.
> >
>
> Per Aneesh in 992bf77591cb - The code explicitly states the intent is
> to put non-CPU-nodes in a lower tier by default.
>
>
>    The current implementation puts all nodes with CPU into the highest
>    tier, and builds the tier hierarchy by establishing the per-node
>    demotion targets based on the distances between nodes.
>
> This is accurate for the current code
>
>
>    The current tier initialization code always initializes each
>    memory-only NUMA node into a lower tier.
>
> This is *broken* for the currently upstream code.
>
> This appears to be the result of the hmat adistance callback introduction
> (though it may have been broken before that).
>
> ~Gregory

Digging into the history further for the sake of completeness:

6c542ab ("mm/demotion: build demotion targets based on ...")

    mm/demotion: build demotion targets based on explicit memory tiers

    This patch switch the demotion target building logic to use memory
    tiers instead of NUMA distance. All N_MEMORY NUMA nodes will be placed
    in the default memory tier and additional memory tiers will be added by
    drivers like dax kmem.

The decision made in this patch breaks memory-tiers.c for all
BIOS-configured CXL devices that generate a DRAM node during early boot,
but for which HMAT is absent or otherwise broken - the new HMAT code
addresses the situation where HMAT is present.

Hardware supporting this style of configuration has been around for at
least a few years now. I think we should at the very least consider adding
an option to restore this (!N_CPU)=Lower Tier behavior - if not defaulting
to that behavior when HMAT data is not present.

~Gregory
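
To make the proposed fallback concrete, here is a minimal userspace sketch of
the policy, not kernel code. The struct, the function, the
lower_tier_fallback switch and the MODEL_ADIST_* values are all hypothetical
names and numbers chosen for illustration; they are not the kernel's
identifiers or its abstract-distance constants. The sketch assumes CPU nodes
always land in the default tier, as the quoted commit text describes.

```c
#include <stdbool.h>

/* Illustrative abstract-distance values; NOT the kernel's constants. */
enum {
	MODEL_ADIST_DEFAULT = 100,	/* default (top) tier */
	MODEL_ADIST_LOWER   = 200,	/* one tier below the default */
};

/* Hypothetical per-node description used only by this model. */
struct model_node {
	bool has_cpu;		/* node has CPUs (N_CPU in the discussion) */
	bool hmat_valid;	/* usable HMAT performance data exists */
	int  hmat_adist;	/* abstract distance derived from HMAT, if valid */
};

/*
 * Pick an abstract distance for one node.
 *
 * lower_tier_fallback models the proposed option: when set, a
 * memory-only node without usable HMAT data is placed in a lower tier
 * instead of the default tier.
 */
static int model_node_adist(const struct model_node *n, bool lower_tier_fallback)
{
	if (n->has_cpu)
		return MODEL_ADIST_DEFAULT;
	if (n->hmat_valid)
		return n->hmat_adist;
	return lower_tier_fallback ? MODEL_ADIST_LOWER : MODEL_ADIST_DEFAULT;
}
```

With lower_tier_fallback unset, a memory-only node with no HMAT data gets
MODEL_ADIST_DEFAULT, which is the situation described above for
BIOS-configured CXL nodes. With it set, the same node gets MODEL_ADIST_LOWER,
i.e. the (!N_CPU)=Lower Tier behavior.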