From: Ying Huang <ying.huang@intel.com>
To: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
Cc: Greg Thelen <gthelen@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Brice Goglin <brice.goglin@gmail.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alistair Popple <apopple@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Feng Tang <feng.tang@intel.com>,
Jagdish Gediya <jvgediya@linux.ibm.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory tiers
Date: Wed, 08 Jun 2022 09:24:25 +0800
Message-ID: <8e826a0ae730f6f6e43e82a26a9e22059a5a1682.camel@intel.com>
In-Reply-To: <1301311f-12f0-0fda-1245-82bb4c3f5e93@linux.ibm.com>
On Mon, 2022-06-06 at 14:32 +0530, Aneesh Kumar K V wrote:
> On 6/6/22 2:22 PM, Ying Huang wrote:
> ....
> > > > > I can move the patch "mm/demotion/dax/kmem: Set node's memory tier to
> > > > > MEMORY_TIER_PMEM" before switching the demotion logic so that on systems
> > > > > with two memory tiers (DRAM and pmem) the demotion continues to work
> > > > > as expected after patch 3 ("mm/demotion: Build demotion targets based on
> > > > > explicit memory tiers"). With that, there will not be any regression in
> > > > > between the patch series.
> > > > >
> > > >
> > > > Thanks! Please do that. And I think you can add sysfs interface after
> > > > that patch too. That is, in [1/7]
> > > >
> > >
> > > I am not sure why you insist on moving sysfs interfaces later. They are
> > > introduced based on the helper added. It make patch review easier to
> > > look at both the helpers and the user of the helper together in a patch.
> >
> > Yes. We should introduce a function and its user in one patch for
> > review. But this doesn't mean that we should introduce the user space
> > interface as the first step. I think the user space interface should
> > output correct information when we expose it.
> >
>
> If you look at this patchset we are not exposing any wrong information.
>
> patch 1 -> adds ability to register the memory tiers and expose details
> of registered memory tier. At this point the patchset only support DRAM
> tier and hence only one tier is shown

But inside the kernel, we actually work with 2 tiers and demote/promote
pages between them. With the information from your interface, users
would think that there is no demotion/promotion in the kernel because
there's only 1 tier.

> patch 2 -> adds per node memtier attribute. So only DRAM nodes shows the
> details, because the patchset yet has not introduced a slower memory
> tier like PMEM.
>
> patch 4 -> introducing demotion. Will make that patch 5
>
> patch 5 -> add dax kmem numa nodes as slower memory tier. Now this
> becomes patch 4 at which point we will correctly show two memory tiers
> in the system.
>
>
> > > > +struct memory_tier {
> > > > + nodemask_t nodelist;
> > > > +};
> > > >
> > > > And struct device can be added after the kernel has switched the
> > > > implementation based on explicit memory tiers.
> > > >
> > > > +struct memory_tier {
> > > > + struct device dev;
> > > > + nodemask_t nodelist;
> > > > +};
> > > >
> > >
> > >
> > > Can you elaborate on this? or possibly review the v5 series indicating
> > > what change you are suggesting here?
> > >
> > >
> > > > But I don't think it's a good idea to have "struct device" embedded in
> > > > "struct memory_tier". We don't have "struct device" embedded in "struct
> > > > pgdata_list"...
> > > >
> > >
> > > I avoided creating an array for memory_tier (memory_tier[]) so that we
> > > can keep it dynamic. Keeping dev embedded in struct memory_tier simplify
> > > the life cycle management of that dynamic list. We free the struct
> > > memory_tier allocation via device release function (memtier->dev.release
> > > = memory_tier_device_release )
> > >
> > > Why do you think it is not a good idea?
> >
> > I think that we shouldn't bind our kernel internal implementation with
> > user space interface too much. Yes. We can expose kernel internal
> > implementation to user space in a direct way. I suggest you to follow
> > the style of "struct pglist_data" and "struct node". If we decouple
> > "struct memory_tier" and "struct memory_tier_dev" (or some other name),
> > we can refer to "struct memory_tier" without depending on all device
> > core. Memory tier should be accessible inside the kernel even without a
> > user interface. And memory tier isn't a device in concept.
> >
>
> memory_tiers are different from pglist_data and struct node in that we
> also allow the creation of them from userspace.

I don't think that there's much difference. struct pglist_data and
struct node can be created/destroyed dynamically too. Please take a
look at

	__try_online_node()
	register_one_node()
	try_offline_node()
	unregister_one_node()

> That is the life time of
> a memory tier is driven from userspace and it is much easier to manage
> them via sysfs file lifetime mechanism rather than inventing an
> independent and more complex way of doing the same.

You need to manage the lifetime of struct memory_tier in the kernel
too, because there are kernel users. And even if you use the device core
lifetime mechanism, you don't need to embed struct device in struct
memory_tier either; you can free a "separate" struct memory_tier in the
"release" callback of struct device.
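
To make the last point concrete, here is a minimal userspace sketch (not
kernel code) of freeing a separately allocated struct memory_tier from a
device release callback. struct fake_device, fake_get_device(),
fake_put_device(), memory_tier_dev_create() and the tiers_freed counter
are stand-ins invented for illustration, and the unsigned long nodelist
is a simplification of nodemask_t. Only the struct memory_tier / struct
memory_tier_dev split and the memory_tier_device_release name come from
this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct device: only refcount + release. */
struct fake_device {
	int refcount;
	void (*release)(struct fake_device *dev);
};

/* Core object: no device embedded, so kernel users need no device core. */
struct memory_tier {
	unsigned long nodelist;	/* simplified stand-in for nodemask_t */
};

/* Hypothetical sysfs-facing wrapper that only points at the tier. */
struct memory_tier_dev {
	struct fake_device dev;
	struct memory_tier *memtier;
};

static int tiers_freed;	/* demonstration only: counts release calls */

/* Release callback: frees the wrapper and the separate tier together. */
static void memory_tier_device_release(struct fake_device *dev)
{
	/* Open-coded equivalent of container_of(). */
	struct memory_tier_dev *mtdev = (struct memory_tier_dev *)
		((char *)dev - offsetof(struct memory_tier_dev, dev));

	free(mtdev->memtier);
	free(mtdev);
	tiers_freed++;
}

struct memory_tier_dev *memory_tier_dev_create(unsigned long nodelist)
{
	struct memory_tier *memtier = calloc(1, sizeof(*memtier));
	struct memory_tier_dev *mtdev = calloc(1, sizeof(*mtdev));

	if (!memtier || !mtdev) {
		free(memtier);
		free(mtdev);
		return NULL;
	}
	memtier->nodelist = nodelist;
	mtdev->memtier = memtier;
	mtdev->dev.refcount = 1;	/* initial reference held by creator */
	mtdev->dev.release = memory_tier_device_release;
	return mtdev;
}

void fake_get_device(struct fake_device *dev)
{
	dev->refcount++;
}

/* Drops one reference; the last put triggers the release callback. */
void fake_put_device(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

In this sketch the tier lives exactly as long as the last device
reference, yet code that only needs struct memory_tier never has to see
struct fake_device.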
> > For life cycle management, I think that we can do that without sysfs
> > too.
> >
>
> unless there are specific details that you think will be broken by
> embedding struct device inside struct memory_tier, IMHO I still consider
> the embedded implementation much simpler and in accordance with other
> kernel design patterns.

Conceptually, struct memory_tier isn't a device, although we expose it
as a device in sysfs; that's just an implementation detail. So I think
it's better to make struct memory_tier independent of struct device if
possible.

By not embedding struct device in struct memory_tier, it's much easier
to dereference struct memory_tier directly in inline functions in ".h"
files. We don't need to introduce one accessor function for each field
of struct memory_tier for that.
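
A rough sketch of the header-side benefit, again in plain userspace C.
It assumes a 64-bit mask in place of nodemask_t; SKETCH_MAX_NODES,
memory_tier_has_node() and memory_tier_empty() are hypothetical names
used only for illustration, not existing kernel helpers.

```c
#include <stdbool.h>

#define SKETCH_MAX_NODES 64	/* assumption: one bit per node in the mask */

/* Full definition is visible in the header; no device core include needed. */
struct memory_tier {
	unsigned long long nodelist;	/* simplified stand-in for nodemask_t */
};

/* Inline helper that dereferences the struct directly. */
static inline bool memory_tier_has_node(const struct memory_tier *memtier,
					int nid)
{
	if (nid < 0 || nid >= SKETCH_MAX_NODES)
		return false;
	return (memtier->nodelist >> nid) & 1ULL;
}

/* Another direct field access, with no out-of-line accessor function. */
static inline bool memory_tier_empty(const struct memory_tier *memtier)
{
	return memtier->nodelist == 0;
}
```

With struct device embedded, a header exposing the struct layout like
this would also have to pull in the device core definitions, or else
hide the struct and provide out-of-line accessors.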
Best Regards,
Huang, Ying