From: Boaz Harrosh <boazh@netapp.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Jeff Moyer <jmoyer@redhat.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <mawilcox@microsoft.com>,
Amit Golander <Amit.Golander@netapp.com>
Subject: Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU
Date: Tue, 15 May 2018 16:24:05 +0300
Message-ID: <6698c486-56e7-1fc2-567c-69cc446a58be@netapp.com>
In-Reply-To: <cff721c3-65e8-c1e8-9f6d-c37ce6e56416@netapp.com>
On 15/05/18 14:54, Boaz Harrosh wrote:
> On 15/05/18 03:44, Matthew Wilcox wrote:
>> On Mon, May 14, 2018 at 02:49:01PM -0700, Andrew Morton wrote:
>>> On Mon, 14 May 2018 20:28:01 +0300 Boaz Harrosh <boazh@netapp.com> wrote:
>>>> In this project we utilize a per-core server thread so everything
>>>> is kept local. If we use the regular zap_ptes() API, all CPUs
>>>> are scheduled for the unmap, though in our case we know that we
>>>> have only used a single core. The regular zap_ptes adds a very big
>>>> latency on every operation and mostly kills the concurrency of the
>>>> overall system, because it imposes a serialization between all cores.
>>>
>>> I'd have thought that in this situation, only the local CPU's bit is
>>> set in the vma's mm_cpumask() and the remote invalidations are not
>>> performed. Is that a misunderstanding, or is all that stuff not working
>>> correctly?
>>
>> I think you misunderstand Boaz's architecture. He has one thread per CPU,
>> so every bit will be set in the mm's (not vma's) mm_cpumask.
>>
>
> Hi Andrew, Matthew
>
> Yes, I have been trying to investigate and trace this for days.
> Please see the code below:
>
>> #define flush_tlb_range(vma, start, end) \
>> flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
>
> The mm_struct @mm below comes from here vma->vm_mm
>
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index e055d1a..1d398a0 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -611,39 +611,40 @@ static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
>> void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>> unsigned long end, unsigned long vmflag)
>> {
>> int cpu;
>>
>> struct flush_tlb_info info __aligned(SMP_CACHE_BYTES) = {
>> .mm = mm,
>> };
>>
>> cpu = get_cpu();
>>
>> /* This is also a barrier that synchronizes with switch_mm(). */
>> info.new_tlb_gen = inc_mm_tlb_gen(mm);
>>
>> /* Should we flush just the requested range? */
>> if ((end != TLB_FLUSH_ALL) &&
>> !(vmflag & VM_HUGETLB) &&
>> ((end - start) >> PAGE_SHIFT) <= tlb_single_page_flush_ceiling) {
>> info.start = start;
>> info.end = end;
>> } else {
>> info.start = 0UL;
>> info.end = TLB_FLUSH_ALL;
>> }
>>
>> if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
>> VM_WARN_ON(irqs_disabled());
>> local_irq_disable();
>> flush_tlb_func_local(&info, TLB_LOCAL_MM_SHOOTDOWN);
>> local_irq_enable();
>> }
>>
>> - if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
>> + if (!(vmflag & VM_LOCAL_CPU) &&
>> + cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
>> flush_tlb_others(mm_cpumask(mm), &info);
>>
>
> I have been tracing mm_cpumask(vma->vm_mm) in my driver at
> different points: at vma creation (file_operations->mmap()),
> and before the call to insert_pfn (which calls here).
>
> At first I was hoping that mm_cpumask(vma->vm_mm) would have a
> single bit set, matching the affinity the thread was given at
> creation. But then I saw that about 80% of the time some
> other random bits are also set.
>
> Yes, random. The thread's (single) affinity bit was always set, but
> then zero, one, or two more bits were set as well. I have never seen
> more than two, though, which baffles me a lot.
>
> If it were as Matthew said, i.e. the cpumask of the whole process,
> then I would expect all the bits to be set, because I have a thread
> on each core. I would also expect every vma->vm_mm, or at least
> mm_cpumask(vma->vm_mm), to point to the same global object. But it
> was not so: each pointed to a per-thread unique object, yet those
> phantom bits were still set all over. (And I am almost sure the
> same vma had those bits change over time.)
>
> So I would love an mm guy to explain where those bits are collected.
> But I do not think this is a kernel bug, because, as Matthew showed,
> that vma can, and is allowed to, leak memory addresses to other
> threads / cores in the same process. So it appears that the kernel
> has some correct logic behind its madness.
>
Hi Mark

So you see that 20% of the time mm_cpumask(vma->vm_mm) has only a
single bit set, and a natural call to flush_tlb_range will then only
invalidate the local CPU. Are you familiar with this logic?
> Which brings me to another question: how can I find, from within
> a thread, say at the file_operations->mmap() call, that the thread
> is indeed core-pinned? What mm_cpumask should I inspect?
>
Mark, Peter do you know how I can check the above?
Thanks
Boaz
>> put_cpu();
>> }
>
> Thanks
> Boaz
>