From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Linux MM" <linux-mm@kvack.org>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
"Pasha Tatashin" <pavel.tatashin@microsoft.com>,
"Michal Hocko" <mhocko@suse.com>,
"Dave Jiang" <dave.jiang@intel.com>,
"Ingo Molnar" <mingo@kernel.org>,
"Dave Hansen" <dave.hansen@intel.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Logan Gunthorpe" <logang@deltatee.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v4 5/5] nvdimm: Schedule device registration on node local to the device
Date: Fri, 21 Sep 2018 07:46:46 -0700
Message-ID: <6e17294f-4847-9e7a-2396-6fffaf8a8f4a@linux.intel.com>
In-Reply-To: <CAPcyv4hqERm3YbgsE19M=8SRfrhyEo__LrLdcEj_YsLr2bLviA@mail.gmail.com>

On 9/20/2018 7:46 PM, Dan Williams wrote:
> On Thu, Sep 20, 2018 at 6:34 PM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
>>
>>
>>
>> On 9/20/2018 5:36 PM, Dan Williams wrote:
>>> On Thu, Sep 20, 2018 at 5:26 PM Alexander Duyck
>>> <alexander.h.duyck@linux.intel.com> wrote:
>>>>
>>>> On 9/20/2018 3:59 PM, Dan Williams wrote:
>>>>> On Thu, Sep 20, 2018 at 3:31 PM Alexander Duyck
>>>>> <alexander.h.duyck@linux.intel.com> wrote:
>>>>>>
>>>>>> This patch is meant to force the device registration for nvdimm devices to
>>>>>> be closer to the actual device. This is achieved by using either the NUMA
>>>>>> node ID of the region, or of the parent. By doing this we can have
>>>>>> everything above the region based on the region, and everything below the
>>>>>> region based on the nvdimm bus.
>>>>>>
>>>>>> One additional change I made is that we hold onto a reference to the parent
>>>>>> while we are going through registration. By doing this we can guarantee we
>>>>>> can complete the registration before we have the parent device removed.
>>>>>>
>>>>>> By guaranteeing NUMA locality I see an improvement of as high as 25% for
>>>>>> per-node init of a system with 12TB of persistent memory.
>>>>>>
>>>>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>>>>> ---
>>>>>> drivers/nvdimm/bus.c | 19 +++++++++++++++++--
>>>>>> 1 file changed, 17 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
>>>>>> index 8aae6dcc839f..ca935296d55e 100644
>>>>>> --- a/drivers/nvdimm/bus.c
>>>>>> +++ b/drivers/nvdimm/bus.c
>>>>>> @@ -487,7 +487,9 @@ static void nd_async_device_register(void *d, async_cookie_t cookie)
>>>>>> dev_err(dev, "%s: failed\n", __func__);
>>>>>> put_device(dev);
>>>>>> }
>>>>>> +
>>>>>> put_device(dev);
>>>>>> + put_device(dev->parent);
>>>>>
>>>>> Good catch. The child does not pin the parent until registration, but
>>>>> we need to make sure the parent isn't gone while we're waiting for the
>>>>> registration work to run.
>>>>>
>>>>> Let's break this reference count fix out into its own separate patch,
>>>>> because this looks to be covering a gap that may need to be
>>>>> recommended for -stable.
>>>>
>>>> Okay, I guess I can do that.
>>>>
>>>>>
>>>>>>
>>>>>> static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>>>> @@ -504,12 +506,25 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>>>>
>>>>>> void __nd_device_register(struct device *dev)
>>>>>> {
>>>>>> + int node;
>>>>>> +
>>>>>> if (!dev)
>>>>>> return;
>>>>>> +
>>>>>> dev->bus = &nvdimm_bus_type;
>>>>>> + get_device(dev->parent);
>>>>>> get_device(dev);
>>>>>> - async_schedule_domain(nd_async_device_register, dev,
>>>>>> - &nd_async_domain);
>>>>>> +
>>>>>> + /*
>>>>>> + * For a region we can break away from the parent node,
>>>>>> + * otherwise for all other devices we just inherit the node from
>>>>>> + * the parent.
>>>>>> + */
>>>>>> + node = is_nd_region(dev) ? to_nd_region(dev)->numa_node :
>>>>>> + dev_to_node(dev->parent);
>>>>>
>>>>> Devices already automatically inherit the node of their parent, so I'm
>>>>> not understanding why this is needed?
>>>>
>>>> That doesn't happen until you call device_add, which you don't call
>>>> until nd_async_device_register. All that has been called on the device
>>>> up to now is device_initialize which leaves the node at NUMA_NO_NODE.
>>>
>>> Ooh, yeah, missed that. I think I'd prefer this policy to be moved out to
>>> where we set the dev->parent before calling __nd_device_register, or
>>> at least a comment here about *why* we know region devices are special
>>> (i.e. because the nd_region_desc specified the node at region creation
>>> time).
>>>
>>
>> Are you talking about pulling the scheduling out or just adding a node
>> value to the nd_device_register call so it can be set directly from the
>> caller?
>
> I was thinking everywhere we set dev->parent before registering, also
> set the node...

That will not work unless we move the call to device_initialize to
somewhere before you are setting the node. That is why I was thinking it
might work to put the node assignment in nd_device_register itself since
it looks like the regions don't call __nd_device_register directly.

I guess we could get rid of nd_device_register if we wanted to go that
route.

>> If you wanted what I could do is pull the set_dev_node call from
>> nvdimm_bus_uevent and place it in nd_device_register. That should stick
>> as the node doesn't get overwritten by the parent if it is set after
>> device_initialize. If I did that along with the parent bit I was already
>> doing then all that would be left to do is just use the dev_to_node
>> call on the device itself.
>
> ...but this is even better.
>
I'm not sure it adds that much. Basically my thought was we just need to
make sure to set the device node after the call to device_initialize but
before the call to device_add. This just seems like a bunch more work to
spread the device_initialize calls all over and introduce possible
regressions.
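
To make the ordering constraint concrete, here is a minimal user-space
sketch of it. It is not kernel code: struct mock_device and the mock_*
functions are hypothetical stand-ins that only model the behavior
described above, namely that device_initialize leaves the node at
NUMA_NO_NODE and that device_add only inherits the parent's node when
no node has been set yet.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for struct device; only the fields this sketch needs. */
struct mock_device {
	struct mock_device *parent;
	int numa_node;
};

/* Models device_initialize(): the node is reset to NUMA_NO_NODE. */
static void mock_device_initialize(struct mock_device *dev)
{
	assert(dev != NULL);
	dev->numa_node = NUMA_NO_NODE;
}

/*
 * Models the node handling in device_add(): the parent's node is
 * inherited only if the device has no node of its own.
 */
static void mock_device_add(struct mock_device *dev)
{
	assert(dev != NULL);
	if (dev->parent && dev->numa_node == NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
}

/*
 * Hypothetical helper: pick the node to schedule registration on
 * before mock_device_add() has run. A node set after initialize wins,
 * otherwise fall back to the parent's node.
 */
static int mock_registration_node(const struct mock_device *dev)
{
	assert(dev != NULL);
	if (dev->numa_node != NUMA_NO_NODE)
		return dev->numa_node;
	return dev->parent ? dev->parent->numa_node : NUMA_NO_NODE;
}
```

In this model a region that has its node set between
mock_device_initialize() and mock_device_add() keeps that node, while
any other device reports its parent's node both before and after the
add. Setting the node before mock_device_initialize() would be lost,
which is the reason the assignment cannot simply move to wherever
dev->parent is set.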