From: Dan Williams
Date: Tue, 11 Sep 2018 22:48:40 -0700
Subject: Re: [PATCH 4/4] nvdimm: Trigger the device probe on a cpu local to the device
In-Reply-To: <20180910234400.4068.15541.stgit@localhost.localdomain>
References: <20180910232615.4068.29155.stgit@localhost.localdomain>
 <20180910234400.4068.15541.stgit@localhost.localdomain>
To: Alexander Duyck
Cc: Linux MM, Linux Kernel Mailing List, linux-nvdimm, pavel.tatashin@microsoft.com,
 Michal Hocko, Dave Jiang, Ingo Molnar, Dave Hansen, Jérôme Glisse, Andrew Morton,
 Logan Gunthorpe, "Kirill A. Shutemov"

On Mon, Sep 10, 2018 at 4:44 PM, Alexander Duyck wrote:
> From: Alexander Duyck
>
> This patch is based on the pci_call_probe function used to initialize
> PCI devices. The general idea here is to move the probe call to a location
> that is local to the memory being initialized. By doing this we can shave
> significant time off of the total time needed for initialization.
>
> With this patch applied I see a significant reduction in overall init time,
> as without it the init varied between 23 and 37 seconds to initialize a 3GB
> node. With this patch applied the variance is only between 23 and 26
> seconds to initialize each node.
>
> I hope to refine this further in the future by combining this logic into
> the async_schedule_domain code that is already in use. Doing that would
> likely make this functionality redundant.

Yeah, it is a bit sad that we schedule an async thread only to move it
back somewhere else. Could we trivially achieve the same with an
async_schedule_domain_on_cpu() variant? It seems we can, and the
workqueue core will "Do the right thing".

I now notice that async uses system_unbound_wq and work_on_cpu() uses
system_wq. I don't think we want long-running nvdimm work on system_wq.
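
For reference, a rough, untested sketch of the pci_call_probe()-style
pattern under discussion: pick a CPU on the device's NUMA node and run the
probe there via work_on_cpu(), falling back to a direct call when the node
is unknown or has no online CPUs. struct probe_ctx, local_probe() and
probe_on_local_cpu() are placeholder names for illustration, not code from
the patch:

/*
 * Sketch only: run a probe callback on a CPU local to the device's node.
 * Mirrors the approach of pci_call_probe(); names are placeholders.
 */
#include <linux/cpumask.h>
#include <linux/device.h>
#include <linux/nodemask.h>
#include <linux/topology.h>
#include <linux/workqueue.h>

struct probe_ctx {
	struct device *dev;
	int (*probe)(struct device *dev);
};

/* Runs in process context on whatever CPU work_on_cpu() was given. */
static long local_probe(void *data)
{
	struct probe_ctx *ctx = data;

	return ctx->probe(ctx->dev);
}

static int probe_on_local_cpu(struct device *dev,
			      int (*probe)(struct device *dev))
{
	struct probe_ctx ctx = { .dev = dev, .probe = probe };
	int node = dev_to_node(dev);
	unsigned int cpu = nr_cpu_ids;

	if (node >= 0 && node < MAX_NUMNODES && node_online(node))
		cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);

	/*
	 * Note: work_on_cpu() queues on system_wq, which is the concern
	 * raised above about long-running nvdimm work.
	 */
	if (cpu < nr_cpu_ids)
		return work_on_cpu(cpu, local_probe, &ctx);

	return probe(dev);
}

An async_schedule_domain_on_cpu() variant would presumably fold this into
kernel/async.c instead, plumbing a cpu/node hint down to the point where
the async entry is queued on system_unbound_wq, so the probe never has to
bounce from one CPU to another.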