From: Boaz Harrosh <boazh@netapp.com>
To: Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Jeff Moyer <jmoyer@redhat.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <mawilcox@microsoft.com>,
Amit Golander <Amit.Golander@netapp.com>
Subject: Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU
Date: Tue, 15 May 2018 14:54:29 +0300 [thread overview]
Message-ID: <cff721c3-65e8-c1e8-9f6d-c37ce6e56416@netapp.com> (raw)
In-Reply-To: <20180515004406.GB5168@bombadil.infradead.org>
On 15/05/18 03:44, Matthew Wilcox wrote:
> On Mon, May 14, 2018 at 02:49:01PM -0700, Andrew Morton wrote:
>> On Mon, 14 May 2018 20:28:01 +0300 Boaz Harrosh <boazh@netapp.com> wrote:
>>> In this project we utilize a per-core server thread so everything
>>> is kept local. If we use the regular zap_ptes() API All CPU's
>>> are scheduled for the unmap, though in our case we know that we
>>> have only used a single core. The regular zap_ptes adds a very big
>>> latency on every operation and mostly kills the concurrency of the
>>> over all system. Because it imposes a serialization between all cores
>>
>> I'd have thought that in this situation, only the local CPU's bit is
>> set in the vma's mm_cpumask() and the remote invalidations are not
>> performed. Is that a misunderstanding, or is all that stuff not working
>> correctly?
>
> I think you misunderstand Boaz's architecture. He has one thread per CPU,
> so every bit will be set in the mm's (not vma's) mm_cpumask.
>
Hi Andrew, Matthew
Yes I have been trying to investigate and trace this for days.
Please see the code below:
> #define flush_tlb_range(vma, start, end) \
> flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
The mm_struct @mm below comes from here vma->vm_mm
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index e055d1a..1d398a0 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -611,39 +611,40 @@ static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> unsigned long end, unsigned long vmflag)
> {
> int cpu;
>
> struct flush_tlb_info info __aligned(SMP_CACHE_BYTES) = {
> .mm = mm,
> };
>
> cpu = get_cpu();
>
> /* This is also a barrier that synchronizes with switch_mm(). */
> info.new_tlb_gen = inc_mm_tlb_gen(mm);
>
> /* Should we flush just the requested range? */
> if ((end != TLB_FLUSH_ALL) &&
> !(vmflag & VM_HUGETLB) &&
> ((end - start) >> PAGE_SHIFT) <= tlb_single_page_flush_ceiling) {
> info.start = start;
> info.end = end;
> } else {
> info.start = 0UL;
> info.end = TLB_FLUSH_ALL;
> }
>
> if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
> VM_WARN_ON(irqs_disabled());
> local_irq_disable();
> flush_tlb_func_local(&info, TLB_LOCAL_MM_SHOOTDOWN);
> local_irq_enable();
> }
>
> - if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
> + if (!(vmflag & VM_LOCAL_CPU) &&
> + cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
> flush_tlb_others(mm_cpumask(mm), &info);
>
I have been tracing the mm_cpumask(vma->vm_mm) at my driver at
different points. At vma creation (file_operations->mmap()),
and before the call to insert_pfn (which calls here).
At the beginning I was wishful thinking that the mm_cpumask(vma->vm_mm)
should have a single bit set just as the affinity of the thread on
creation of that thread. But then I saw that at %80 of the times some
other random bits are also set.
Yes Random. Always the thread affinity (single) bit was set but
then zero one or two more bits were set as well. Never seen more then
two though, which baffles me a lot.
If it was like Matthew said .i.e the cpumask of the all process
then I would expect all the bits to be set. Because I have a thread
on each core. And also I would even expect that all vma->vm_mm
or maybe mm_cpumask(vma->vm_mm) to point to the same global object.
But it was not so. it was pointing to some thread unique object but
still those phantom bits were set all over. (And I am almost sure
same vma had those bits change over time)
So I would love some mm guy to explain where are those bits collected?
But I do not think this is a Kernel bug because as Matthew showed.
that vma above can and is allowed to leak memory addresses to other
threads / cores in the same process. So it appears that the Kernel
has some correct logic behind its madness.
Which brings me to another question. How can I find from
within a thread Say at the file_operations->mmap() call that the thread
is indeed core-pinned. What mm_cpumask should I inspect?
> put_cpu();
> }
Thanks
Boaz
next prev parent reply other threads:[~2018-05-15 11:55 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-14 17:28 Boaz Harrosh
2018-05-14 18:26 ` Boaz Harrosh
2018-05-15 7:08 ` Christoph Hellwig
2018-05-15 10:45 ` Boaz Harrosh
2018-05-14 19:15 ` Matthew Wilcox
2018-05-14 19:37 ` Boaz Harrosh
2018-05-15 0:41 ` Matthew Wilcox
2018-05-15 10:43 ` Boaz Harrosh
2018-05-15 11:11 ` Matthew Wilcox
2018-05-15 11:41 ` Boaz Harrosh
2018-05-15 12:03 ` Matthew Wilcox
2018-05-15 13:29 ` Boaz Harrosh
2018-05-15 13:50 ` Matthew Wilcox
2018-05-15 14:10 ` Boaz Harrosh
2018-05-15 14:18 ` Matthew Wilcox
2018-05-15 14:30 ` Boaz Harrosh
2018-05-15 12:09 ` Peter Zijlstra
2018-05-15 12:31 ` Boaz Harrosh
2018-05-15 11:47 ` Peter Zijlstra
2018-05-15 12:01 ` Boaz Harrosh
2018-05-15 12:07 ` Mark Rutland
2018-05-15 12:35 ` Peter Zijlstra
2018-05-15 13:19 ` Boaz Harrosh
2018-05-18 14:14 ` Christopher Lameter
2018-05-22 16:05 ` Boaz Harrosh
2018-05-22 16:18 ` Dave Hansen
2018-05-22 16:46 ` Christopher Lameter
2018-05-22 16:56 ` Peter Zijlstra
2018-05-22 17:03 ` Dave Hansen
2018-05-22 17:35 ` Christopher Lameter
2018-05-22 17:51 ` Matthew Wilcox
2018-05-23 17:30 ` Dave Hansen
2018-05-23 17:46 ` Nadav Amit
2018-05-23 18:10 ` Mark Rutland
2018-05-14 21:49 ` Andrew Morton
2018-05-15 0:44 ` Matthew Wilcox
2018-05-15 11:54 ` Boaz Harrosh [this message]
2018-05-15 13:24 ` Boaz Harrosh
2018-05-15 14:17 ` Peter Zijlstra
2018-05-15 14:36 ` Boaz Harrosh
2018-05-15 14:19 ` Dave Hansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cff721c3-65e8-c1e8-9f6d-c37ce6e56416@netapp.com \
--to=boazh@netapp.com \
--cc=Amit.Golander@netapp.com \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jack@suse.cz \
--cc=jmoyer@redhat.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mawilcox@microsoft.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox