linux-mm.kvack.org archive mirror
From: Boaz Harrosh <boazh@netapp.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jeff Moyer <jmoyer@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Amit Golander <Amit.Golander@netapp.com>
Subject: Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU
Date: Tue, 15 May 2018 16:24:05 +0300
Message-ID: <6698c486-56e7-1fc2-567c-69cc446a58be@netapp.com>
In-Reply-To: <cff721c3-65e8-c1e8-9f6d-c37ce6e56416@netapp.com>

On 15/05/18 14:54, Boaz Harrosh wrote:
> On 15/05/18 03:44, Matthew Wilcox wrote:
>> On Mon, May 14, 2018 at 02:49:01PM -0700, Andrew Morton wrote:
>>> On Mon, 14 May 2018 20:28:01 +0300 Boaz Harrosh <boazh@netapp.com> wrote:
>>>> In this project we utilize a per-core server thread so everything
>>>> is kept local. If we use the regular zap_ptes() API, all CPUs
>>>> are scheduled for the unmap, though in our case we know that we
>>>> have only used a single core. The regular zap_ptes adds a very large
>>>> latency to every operation and mostly kills the concurrency of the
>>>> overall system, because it imposes a serialization between all cores.
>>>
>>> I'd have thought that in this situation, only the local CPU's bit is
>>> set in the vma's mm_cpumask() and the remote invalidations are not
>>> performed.  Is that a misunderstanding, or is all that stuff not working
>>> correctly?
>>
>> I think you misunderstand Boaz's architecture.  He has one thread per CPU,
>> so every bit will be set in the mm's (not vma's) mm_cpumask.
>>
> 
> Hi Andrew, Matthew
> 
> Yes I have been trying to investigate and trace this for days.
> Please see the code below:
> 
>> #define flush_tlb_range(vma, start, end)	\
>> 		flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
> 
> The mm_struct @mm below comes from here: vma->vm_mm
> 
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index e055d1a..1d398a0 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -611,39 +611,40 @@ static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
>>  void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>>  				unsigned long end, unsigned long vmflag)
>>  {
>>  	int cpu;
>>  
>>  	struct flush_tlb_info info __aligned(SMP_CACHE_BYTES) = {
>>  		.mm = mm,
>>  	};
>>  
>>  	cpu = get_cpu();
>>  
>>  	/* This is also a barrier that synchronizes with switch_mm(). */
>>  	info.new_tlb_gen = inc_mm_tlb_gen(mm);
>>  
>>  	/* Should we flush just the requested range? */
>>  	if ((end != TLB_FLUSH_ALL) &&
>>  	    !(vmflag & VM_HUGETLB) &&
>>  	    ((end - start) >> PAGE_SHIFT) <= tlb_single_page_flush_ceiling) {
>>  		info.start = start;
>>  		info.end = end;
>>  	} else {
>>  		info.start = 0UL;
>>  		info.end = TLB_FLUSH_ALL;
>>  	}
>>  
>>  	if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
>>  		VM_WARN_ON(irqs_disabled());
>>  		local_irq_disable();
>>  		flush_tlb_func_local(&info, TLB_LOCAL_MM_SHOOTDOWN);
>>  		local_irq_enable();
>>  	}
>>  
>> -	if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
>> +	if (!(vmflag & VM_LOCAL_CPU) &&
>> +	    cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
>>  		flush_tlb_others(mm_cpumask(mm), &info);
>>  
> 
> I have been tracing mm_cpumask(vma->vm_mm) in my driver at
> different points: at vma creation (file_operations->mmap()),
> and before the call to insert_pfn (which ends up calling here).
> 
> At first I was hoping that mm_cpumask(vma->vm_mm) would have a
> single bit set, matching the affinity given to the thread when it
> was created. But then I saw that 80% of the time some other random
> bits are also set.
> 
> Yes, random. The thread-affinity (single) bit was always set, but
> then zero, one or two more bits were set as well. I have never seen
> more than two though, which baffles me a lot.
> 
> If it were as Matthew said, i.e. the cpumask of the whole process,
> then I would expect all the bits to be set, because I have a thread
> on each core. I would even expect all vma->vm_mm, or maybe
> mm_cpumask(vma->vm_mm), to point to the same global object.
> But it was not so. It was pointing to some thread-unique object, yet
> those phantom bits were still set all over. (And I am almost sure
> the same vma had those bits change over time.)
> 
> So I would love for some mm person to explain where those bits are
> collected. But I do not think this is a kernel bug because, as Matthew
> showed, the vma above can and is allowed to leak memory addresses to
> other threads / cores in the same process. So it appears that the kernel
> has some correct logic behind its madness.
> 
Hi Mark

So you see, 20% of the time mm_cpumask(vma->vm_mm) has only a single
bit set, and a plain call to flush_tlb_range will only invalidate
the local CPU. Are you familiar with this logic?

> Which brings me to another question: how can I find out from
> within a thread, say at the file_operations->mmap() call, that the
> thread is indeed core-pinned? What mm_cpumask should I inspect?
> 

Mark, Peter, do you know how I can check the above?

Thanks
Boaz

>>  	put_cpu();
>>  }
> 
> Thanks
> Boaz
> 
