From: Nadav Amit <namit@vmware.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>, Borislav Petkov <bp@suse.de>,
Toshi Kani <toshi.kani@hpe.com>,
Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH 3/3] resource: Introduce resource cache
Date: Wed, 19 Jun 2019 20:35:14 +0000
Message-ID: <9175AC33-BB3A-4D1F-B7FC-D0B1A8F971B6@vmware.com>
In-Reply-To: <CAErSpo5eiweMk2rfT81Kwnpd=MZsOa01prPo_rAFp-MZ9F2xdQ@mail.gmail.com>
> On Jun 19, 2019, at 6:00 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> On Tue, Jun 18, 2019 at 12:40 AM Nadav Amit <namit@vmware.com> wrote:
>>> On Jun 17, 2019, at 10:33 PM, Nadav Amit <namit@vmware.com> wrote:
>>>
>>>> On Jun 17, 2019, at 9:57 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>>>>
>>>> On Wed, 12 Jun 2019 21:59:03 -0700 Nadav Amit <namit@vmware.com> wrote:
>>>>
>>>>> For efficient search of resources, as needed to determine the memory
>>>>> type for dax page-faults, introduce a cache of the most recently used
>>>>> top-level resource. Caching the top-level should be safe as ranges in
>>>>> that level do not overlap (unlike those of lower levels).
>>>>>
>>>>> Keep the cache per-cpu to avoid possible contention. Whenever a resource
>>>>> is added, removed or changed, invalidate all the resources. The
>>>>> invalidation takes place when the resource_lock is taken for write,
>>>>> preventing possible races.
>>>>>
>>>>> This patch provides relatively small performance improvements over the
>>>>> previous patch (~0.5% on sysbench), but can benefit systems with many
>>>>> resources.
>>>>
>>>>> --- a/kernel/resource.c
>>>>> +++ b/kernel/resource.c
>>>>> @@ -53,6 +53,12 @@ struct resource_constraint {
>>>>>
>>>>> static DEFINE_RWLOCK(resource_lock);
>>>>>
>>>>> +/*
>>>>> + * Cache of the top-level resource that was most recently used by
>>>>> + * find_next_iomem_res().
>>>>> + */
>>>>> +static DEFINE_PER_CPU(struct resource *, resource_cache);
>>>>
>>>> A per-cpu cache which is accessed under a kernel-wide read_lock looks a
>>>> bit odd - the latency getting at that rwlock will swamp the benefit of
>>>> isolating the CPUs from each other when accessing resource_cache.
>>>>
>>>> On the other hand, if we have multiple CPUs running
>>>> find_next_iomem_res() concurrently then yes, I see the benefit. Has
>>>> the benefit of using a per-cpu cache (rather than a kernel-wide one)
>>>> been quantified?
>>>
>>> No. I am not sure how easy it would be to measure it. On the other hand,
>>> the lock is not supposed to be contended (in most cases). At the time I saw
>>> numbers that showed that stores to "exclusive" cache lines can be as
>>> expensive as atomic operations [1]. I am not sure how up to date these
>>> numbers are though. In the benchmark I ran, multiple CPUs ran
>>> find_next_iomem_res() concurrently.
>>>
>>> [1] http://sigops.org/s/conferences/sosp/2013/papers/p33-david.pdf
>>
>> Just to clarify - the main motivation behind the per-cpu variable is not
>> about contention, but about the fact that different processes/threads that
>> run concurrently might use different resources.
>
> IIUC, the underlying problem is that dax relies heavily on ioremap(),
> and ioremap() on x86 takes too long because it relies on
> find_next_iomem_res() via the __ioremap_caller() ->
> __ioremap_check_mem() -> walk_mem_res() path.
I don't know much about this path or whether it is painful. The path I was
referring to is the one taken during page-fault handling:
- handle_mm_fault
- __handle_mm_fault
- do_wp_page
- ext4_dax_fault
- ext4_dax_huge_fault
- dax_iomap_fault
- dax_iomap_pte_fault
- vmf_insert_mixed_mkwrite
- __vm_insert_mixed
- track_pfn_insert
- lookup_memtype
- pat_pagerange_is_ram
But indeed track_pfn_insert() is x86 specific. I guess the differences are
due to the page-tables controlling cacheability on x86 (PAT), but I don't
know much about other architectures and whether they have similar
cacheability controls in the page-tables.
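
For readers following the thread, the caching idea from the patch
description quoted above can be sketched in userspace C. This is only an
illustration under stated assumptions, not the code from the patch:
struct range, ranges[], find_range(), invalidate_range_cache() and
slow_lookups are hypothetical names, a thread-local variable stands in for
the per-cpu variable, and the locking around resource_lock is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a top-level resource: inclusive [start, end]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Top-level ranges do not overlap, so at most one can contain an address. */
static struct range ranges[] = {
	{ 0x0000, 0x0fff },
	{ 0x1000, 0x7fff },
	{ 0x9000, 0xffff },
};
#define NR_RANGES (sizeof(ranges) / sizeof(ranges[0]))

/* One cache slot per thread, standing in for the per-cpu variable. */
static _Thread_local struct range *range_cache;

/* Counts full walks so that the effect of the cache can be observed. */
static _Thread_local unsigned long slow_lookups;

static struct range *find_range(uint64_t addr)
{
	struct range *r = range_cache;
	size_t i;

	/* Fast path: the most recently used range still covers addr. */
	if (r && addr >= r->start && addr <= r->end)
		return r;

	/* Slow path: walk all top-level ranges and remember the hit. */
	slow_lookups++;
	for (i = 0; i < NR_RANGES; i++) {
		if (addr >= ranges[i].start && addr <= ranges[i].end) {
			range_cache = &ranges[i];
			return &ranges[i];
		}
	}
	return NULL;
}

/*
 * To be called whenever a range is added, removed or changed. This sketch
 * only clears the calling thread's slot; a real implementation would have
 * to clear every slot while holding the lock for write.
 */
static void invalidate_range_cache(void)
{
	range_cache = NULL;
}
```

Two consecutive find_range() calls that land in the same range cost one
walk; a call after invalidate_range_cache() walks again. Keeping a slot
per thread means concurrent users that touch different ranges do not evict
each other's entry.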