linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, Borislav Petkov <bp@suse.de>,
	Toshi Kani <toshi.kani@hpe.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH 3/3] resource: Introduce resource cache
Date: Wed, 19 Jun 2019 20:35:14 +0000	[thread overview]
Message-ID: <9175AC33-BB3A-4D1F-B7FC-D0B1A8F971B6@vmware.com> (raw)
In-Reply-To: <CAErSpo5eiweMk2rfT81Kwnpd=MZsOa01prPo_rAFp-MZ9F2xdQ@mail.gmail.com>

> On Jun 19, 2019, at 6:00 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> 
> On Tue, Jun 18, 2019 at 12:40 AM Nadav Amit <namit@vmware.com> wrote:
>>> On Jun 17, 2019, at 10:33 PM, Nadav Amit <namit@vmware.com> wrote:
>>> 
>>>> On Jun 17, 2019, at 9:57 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>>>> 
>>>> On Wed, 12 Jun 2019 21:59:03 -0700 Nadav Amit <namit@vmware.com> wrote:
>>>> 
>>>>> For efficient search of resources, as needed to determine the memory
>>>>> type for dax page-faults, introduce a cache of the most recently used
>>>>> top-level resource. Caching the top-level should be safe as ranges in
>>>>> that level do not overlap (unlike those of lower levels).
>>>>> 
>>>>> Keep the cache per-cpu to avoid possible contention. Whenever a resource
>>>>> is added, removed or changed, invalidate all the resources. The
>>>>> invalidation takes place when the resource_lock is taken for write,
>>>>> preventing possible races.
>>>>> 
>>>>> This patch provides relatively small performance improvements over the
>>>>> previous patch (~0.5% on sysbench), but can benefit systems with many
>>>>> resources.
>>>> 
>>>>> --- a/kernel/resource.c
>>>>> +++ b/kernel/resource.c
>>>>> @@ -53,6 +53,12 @@ struct resource_constraint {
>>>>> 
>>>>> static DEFINE_RWLOCK(resource_lock);
>>>>> 
>>>>> +/*
>>>>> + * Cache of the top-level resource that was most recently use by
>>>>> + * find_next_iomem_res().
>>>>> + */
>>>>> +static DEFINE_PER_CPU(struct resource *, resource_cache);
>>>> 
>>>> A per-cpu cache which is accessed under a kernel-wide read_lock looks a
>>>> bit odd - the latency getting at that rwlock will swamp the benefit of
>>>> isolating the CPUs from each other when accessing resource_cache.
>>>> 
>>>> On the other hand, if we have multiple CPUs running
>>>> find_next_iomem_res() concurrently then yes, I see the benefit.  Has
>>>> the benefit of using a per-cpu cache (rather than a kernel-wide one)
>>>> been quantified?
>>> 
>>> No. I am not sure how easy it would be to measure it. On the other hander
>>> the lock is not supposed to be contended (at most cases). At the time I saw
>>> numbers that showed that stores to “exclusive" cache lines can be as
>>> expensive as atomic operations [1]. I am not sure how up to date these
>>> numbers are though. In the benchmark I ran, multiple CPUs ran
>>> find_next_iomem_res() concurrently.
>>> 
>>> [1] https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fsigops.org%2Fs%2Fconferences%2Fsosp%2F2013%2Fpapers%2Fp33-david.pdf&amp;data=02%7C01%7Cnamit%40vmware.com%7Ca2706c5ab2c544283f3b08d6f4b6152b%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C636965460234022371&amp;sdata=cD7Nhs4jcJGMD7Lav6D%2BC6E5Sei0DiWhKXL7vz2cVHA%3D&amp;reserved=0
>> 
>> Just to clarify - the main motivation behind the per-cpu variable is not
>> about contention, but about the fact the different processes/threads that
>> run concurrently might use different resources.
> 
> IIUC, the underlying problem is that dax relies heavily on ioremap(),
> and ioremap() on x86 takes too long because it relies on
> find_next_iomem_res() via the __ioremap_caller() ->
> __ioremap_check_mem() -> walk_mem_res() path.

I don’t know much about this path and whether it is painful. The path I was
regarding is during page-fault handling:

   - handle_mm_fault
      - __handle_mm_fault
         - do_wp_page
            - ext4_dax_fault
               - ext4_dax_huge_fault
                  - dax_iomap_fault
                     - dax_iomap_pte_fault
                        - vmf_insert_mixed_mkwrite
                           - __vm_insert_mixed
                              - track_pfn_insert
                                 - lookup_memtype
                                    - pat_pagerange_is_ram

But indeed track_pfn_insert() in x86 specific. I guess the differences are
due to the page-table controlling the cachability in x86 (PAT), but I don’t
know much about other architectures and whether they have similar
cachability controls in the page-tables.


  reply	other threads:[~2019-06-19 20:35 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-13  4:59 [PATCH 0/3] resource: find_next_iomem_res() improvements Nadav Amit
     [not found] ` <20190613045903.4922-2-namit@vmware.com>
2019-06-15 22:15   ` [PATCH 1/3] resource: Fix locking in find_next_iomem_res() Sasha Levin
2019-06-17 19:14     ` Nadav Amit
2019-06-18  0:55       ` Sasha Levin
2019-06-18  1:32         ` Nadav Amit
2019-06-18  4:26   ` Andrew Morton
     [not found] ` <20190613045903.4922-4-namit@vmware.com>
2019-06-15 22:16   ` [PATCH 3/3] resource: Introduce resource cache Sasha Levin
2019-06-17 17:20     ` Nadav Amit
2019-06-18  4:57   ` Andrew Morton
2019-06-18  5:33     ` Nadav Amit
2019-06-18  5:40       ` Nadav Amit
2019-06-19 13:00         ` Bjorn Helgaas
2019-06-19 20:35           ` Nadav Amit [this message]
2019-06-19 21:53           ` Dan Williams
2019-06-20 21:31             ` Andi Kleen
2019-06-20 23:13               ` Dan Williams
2019-06-18  6:44 ` [PATCH 0/3] resource: find_next_iomem_res() improvements Dan Williams
2019-06-18 17:42   ` Nadav Amit
2019-06-18 18:30     ` Dan Williams
2019-06-18 21:56       ` Nadav Amit
2019-07-16 22:00         ` Andrew Morton
2019-07-16 22:06           ` Nadav Amit
2019-07-16 22:07           ` Dan Williams
2019-07-16 22:13             ` Nadav Amit
2019-07-16 22:20               ` Dan Williams
2019-07-16 22:28                 ` Nadav Amit
2019-07-16 22:45                   ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9175AC33-BB3A-4D1F-B7FC-D0B1A8F971B6@vmware.com \
    --to=namit@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhelgaas@google.com \
    --cc=bp@suse.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=toshi.kani@hpe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox