From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
Wei Xu <weixugc@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alistair Popple <apopple@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
jvgediya.oss@gmail.com
Subject: Re: [PATCH v11 4/8] mm/demotion/dax/kmem: Set node's abstract distance to MEMTIER_ADISTANCE_PMEM
Date: Mon, 1 Aug 2022 13:11:11 +0530
Message-ID: <394c0599-2dc0-0303-cd86-bdd2d265d1ee@linux.ibm.com>
In-Reply-To: <87h72wjv27.fsf@yhuang6-desk2.ccr.corp.intel.com>
On 8/1/22 12:43 PM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>
>> On 8/1/22 12:07 PM, Huang, Ying wrote:
>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> On 8/1/22 10:40 AM, Huang, Ying wrote:
>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> On 8/1/22 7:36 AM, Huang, Ying wrote:
>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>
>>>>>>>> "Huang, Ying" <ying.huang@intel.com> writes:
>>>>>>>>
>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>
>> ....
>>
>>>>>>
>>>>>> With module unload, we are effectively force-removing the usage of the specific memtype.
>>>>>> Considering that module unload will remove the usage of the specific memtype from other parts
>>>>>> of the kernel, and that we already do all the required reset in memory hot unplug, do we
>>>>>> need to do the clear_node_memory_type above?
>>>>>
>>>>> Per my understanding, we need to call clear_node_memory_type() in
>>>>> dev_dax_kmem_remove(). After that, we have nothing to do in
>>>>> dax_kmem_exit().
>>>>>
>>>>
>>>> OK, I guess you are suggesting we call clear_node_memory_type() even if memory removal fails.
>>>
>>> Can we use node_memory_types[] to indicate whether a node is managed by
>>> a driver?
>>>
>>> Regardless of whether it succeeds or fails, dev_dax_kmem_remove() will set
>>> node_memory_types[] = NULL. But until the node is offlined, we will still
>>> keep the node in the memory_dev_type (dax_pmem_type).
>>>
>>> And we will prevent dax/kmem from unloading via try_module_get() and add
>>> "struct module *" to struct memory_dev_type.
>>>
>>
>> The current dax/kmem driver does not hold any module reference and allows the module to be unloaded
>> at any time, even if the memory onlined by the driver fails to be unplugged. Adding a "struct module *"
>> to memory_dev_type, as you suggest, would change that behavior. Page demotion can continue to work
>> without the support of dax_pmem_type as long as we keep the older demotion order. Any new demotion
>> order rebuild will remove the memory node that was not hot-unplugged from the demotion order. Isn't
>> that a much simpler implementation?
>
> Per my understanding, unbinding/binding the dax/kmem driver means
> changing the memory type of a memory device. For example, unbinding
> the dax/kmem driver may mean changing the memory type from dax_pmem_type to
> default_memory_type (or default_dram_type). That appears strange. But
> if we force the NUMA node to be offlined for unbinding, we can avoid
> changing the memory type to default_memory_type.
>
If we are able to unplug all the memory, the node is removed from N_MEMORY.
If we fail to unplug the memory, we have two options:

1) Keep the same demotion order.
2) Rebuild the demotion order, which results in the memory NUMA node not
   participating in demotion.

I agree with you that we should not switch to the default memory type.
The code below demonstrates how this can be done. If we want to keep
the same demotion order, we can remove establish_demotion_targets() from
the code below.

void clear_node_memory_type(int node, struct memory_dev_type *memtype)
{
	struct memory_tier *memtier;
	pg_data_t *pgdat = NODE_DATA(node);

	mutex_lock(&memory_tier_lock);
	/*
	 * Even if we fail to unplug memory, clear the association of
	 * this node to this specific memory type.
	 */
	if (node_isset(node, memtype->nodes) && node_memory_types[node] == memtype) {
		memtier = __node_get_memory_tier(node);
		if (memtier) {
			/* Detach the tier from the node and wait for RCU readers. */
			rcu_assign_pointer(pgdat->memtier, NULL);
			synchronize_rcu();
		}
		node_clear(node, memtype->nodes);
		if (nodes_empty(memtype->nodes)) {
			/* Last node of this type: unlink the type from its tier. */
			list_del(&memtype->tier_sibiling);
			memtype->memtier = NULL;
			if (memtier && list_empty(&memtier->memory_types))
				destroy_memory_tier(memtier);
		}
		/* Option 2. Drop this call to keep the existing demotion order. */
		establish_demotion_targets();
	}
	node_memory_types[node] = NULL;
	mutex_unlock(&memory_tier_lock);
}

If we agree that we want to keep the current behavior (that is, allowing the kmem driver
to unload even when memory unplug fails), we can go with the above change. If we are
suggesting that we should prevent driver unload, then IMHO that should be independent of
memory_dev_type (and of this patch series). We should make sure we take a module reference
on successful memory online and drop it only on successful offline.
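A minimal sketch of that rule, again as a userspace model: kmem_module_refs is a hypothetical counter standing in for the module refcount that try_module_get() and module_put() manage, and kmem_after_online()/kmem_after_offline() are made-up hook names, not existing dax/kmem functions. Only a successful online pins the module, and only a successful offline releases the pin, so a failed unplug keeps the driver loaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the module refcount that try_module_get()
 * and module_put() manipulate in the kernel. */
static int kmem_module_refs;

/* Called after an attempt to online a dax/kmem range. Only a successful
 * online pins the module. */
static void kmem_after_online(bool online_ok)
{
	if (online_ok)
		kmem_module_refs++;
}

/* Called after an attempt to offline/unplug the same range. A failed
 * offline keeps the reference, so the module stays loaded while its
 * memory is still in use. */
static void kmem_after_offline(bool offline_ok)
{
	assert(kmem_module_refs > 0);
	if (offline_ok)
		kmem_module_refs--;
}

/* The module may be unloaded only when no onlined range is outstanding. */
static bool kmem_can_unload(void)
{
	return kmem_module_refs == 0;
}
```

In the kernel the same shape would presumably be a try_module_get(THIS_MODULE) after a successful add_memory_driver_managed() and a module_put(THIS_MODULE) after a successful remove_memory(); that mapping is an assumption, not something this series implements.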
-aneesh