From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
Wei Xu <weixugc@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alistair Popple <apopple@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
jvgediya.oss@gmail.com
Subject: Re: [PATCH v11 4/8] mm/demotion/dax/kmem: Set node's abstract distance to MEMTIER_ADISTANCE_PMEM
Date: Mon, 1 Aug 2022 10:10:39 +0530
Message-ID: <e5545c90-9595-d08c-8a1c-1c15e3b94999@linux.ibm.com>
In-Reply-To: <87k07slnt7.fsf@yhuang6-desk2.ccr.corp.intel.com>
On 8/1/22 7:36 AM, Huang, Ying wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>
>> "Huang, Ying" <ying.huang@intel.com> writes:
>>
>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> By default, all nodes are assigned to the default memory tier, which
>>>> is the memory tier designated for nodes with DRAM.
>>>>
>>>> Set the dax kmem device node's tier to a slower memory tier by setting
>>>> its abstract distance to MEMTIER_ADISTANCE_PMEM. The PMEM tier
>>>> appears below the default memory tier in demotion order.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>> drivers/dax/kmem.c | 9 +++++++++
>>>> include/linux/memory-tiers.h | 19 ++++++++++++++++++-
>>>> mm/memory-tiers.c | 28 ++++++++++++++++------------
>>>> 3 files changed, 43 insertions(+), 13 deletions(-)
>>>>
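To illustrate the ordering described in the commit message above, here is a minimal userspace sketch of how an abstract distance could map to a tier. It assumes that distances are grouped into fixed-size chunks and that a larger distance means slower memory, and therefore a lower tier in demotion order. The EX_* constants and both helpers are hypothetical and illustrative only; they are not the values or functions defined in include/linux/memory-tiers.h.

```c
#include <assert.h>

/*
 * Illustrative values only (assumption): nodes whose abstract distances
 * fall into the same chunk share a tier; a larger abstract distance
 * means slower memory and therefore a lower tier in demotion order.
 */
#define EX_CHUNK_BITS		10
#define EX_CHUNK_SIZE		(1 << EX_CHUNK_BITS)
#define EX_ADISTANCE_DRAM	(4 * EX_CHUNK_SIZE)
#define EX_ADISTANCE_PMEM	(5 * EX_CHUNK_SIZE)

/* Hypothetical helper: map an abstract distance to a tier index. */
static int adistance_to_tier(int adistance)
{
	assert(adistance >= 0);
	return adistance >> EX_CHUNK_BITS;
}

/* Hypothetical helper: nonzero if pages may be demoted from src to dst. */
static int can_demote(int src_adistance, int dst_adistance)
{
	return adistance_to_tier(dst_adistance) >
	       adistance_to_tier(src_adistance);
}
```

With these values, DRAM lands in tier 4 and PMEM in tier 5, so demotion is allowed only in the DRAM to PMEM direction, which matches "PMEM tier appears below the default memory tier".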
>>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>>>> index a37622060fff..6b0d5de9a3e9 100644
>>>> --- a/drivers/dax/kmem.c
>>>> +++ b/drivers/dax/kmem.c
>>>> @@ -11,6 +11,7 @@
>>>> #include <linux/fs.h>
>>>> #include <linux/mm.h>
>>>> #include <linux/mman.h>
>>>> +#include <linux/memory-tiers.h>
>>>> #include "dax-private.h"
>>>> #include "bus.h"
>>>>
>>>> @@ -41,6 +42,12 @@ struct dax_kmem_data {
>>>> struct resource *res[];
>>>> };
>>>>
>>>> +static struct memory_dev_type default_pmem_type = {
>>>
>>> Why is this named default_pmem_type? We usually will not change the
>>> memory type of a node.
>>>
>>
>> Any other suggestion? pmem_dev_type?
>
> Or dax_pmem_type?
>
> DAX is used to enumerate the memory device.
>
>>
>>>> + .adistance = MEMTIER_ADISTANCE_PMEM,
>>>> + .tier_sibiling = LIST_HEAD_INIT(default_pmem_type.tier_sibiling),
>>>> + .nodes = NODE_MASK_NONE,
>>>> +};
>>>> +
>>>> static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>> {
>>>> struct device *dev = &dev_dax->dev;
>>>> @@ -62,6 +69,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>> return -EINVAL;
>>>> }
>>>>
>>>> + init_node_memory_type(numa_node, &default_pmem_type);
>>>> +
>>>
>>> The memory hot-add below may fail. So the error handling needs to be
>>> added.
>>>
>>> And, it appears that the memory type and memory tier of a node may be
>>> fully initialized here, before NUMA hot-add has started. So I suggest
>>> setting only node_memory_types[] here, and setting memory_dev_type->nodes
>>> in the node hot-add callback. I think that is the proper place to
>>> complete the initialization.
>>>
>>> And, in theory dax/kmem.c can be unloaded. So we need to clear
>>> node_memory_types[] for nodes somewhere.
>>>
>>
>> I guess by module exit we can be sure that all the memory managed
>> by dax/kmem is hotplugged out. How about something like below?
>
> Because we set node_memory_types[] in dev_dax_kmem_probe(), it's
> natural to clear it in dev_dax_kmem_remove().
>
Most of the required reset/clear work is done as part of memory hot-unplug.
So if we manage to unplug the memory successfully, everything except
node_memory_types[node] should already be reset. That reduces
clear_node_memory_type() to the following:
void clear_node_memory_type(int node, struct memory_dev_type *memtype)
{
	mutex_lock(&memory_tier_lock);
	/*
	 * Memory unplug has already cleared the node from memtype->nodes,
	 * and dax/kmem is the one that initialized this node's memory type.
	 */
	if (!node_isset(node, memtype->nodes) &&
	    node_memory_types[node] == memtype)
		node_memory_types[node] = NULL;
	mutex_unlock(&memory_tier_lock);
}
Unloading the module effectively force-removes every user of that specific
memtype. Given that module unload removes the memtype's usage from the rest
of the kernel, and that memory hot-unplug already does all the required
reset, do we still need the clear_node_memory_type() above?
-aneesh