From: Boaz Harrosh <boazh@netapp.com>
To: Boaz Harrosh <boazh@netapp.com>, Jeff Moyer <jmoyer@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <mawilcox@microsoft.com>,
Amit Golander <Amit.Golander@netapp.com>
Subject: Re: [PATCH] mm: Add new vma flag VM_LOCAL_CPU
Date: Mon, 14 May 2018 21:26:13 +0300
Message-ID: <1d5f676f-b5d1-3ad3-c7a5-25b390c0e44e@netapp.com>
In-Reply-To: <0efb5547-9250-6b6c-fe8e-cf4f44aaa5eb@netapp.com>
On 14/05/18 20:28, Boaz Harrosh wrote:
>
> On a call to mmap an mmap provider (like an FS) can put
> this flag on vma->vm_flags.
>
> The VM_LOCAL_CPU flag tells the Kernel that the vma will be used
> from a single core only, and therefore invalidation (flush_tlb) of
> PTE(s) need not be scheduled on all CPUs.
>
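As a rough user-space illustration of the condition this flag changes in
flush_tlb_mm_range() (a sketch only, not Kernel code): the model_* names,
MODEL_NR_CPUS and the 64-bit mask standing in for mm_cpumask() are all
assumptions made for this example. Only the BIT(37) value and the shape of
the test come from the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the Kernel. */
#define MODEL_VM_LOCAL_CPU	(UINT64_C(1) << 37)	/* BIT(37), as in the patch */
#define MODEL_NR_CPUS		64			/* one bit per CPU in the mask */

/*
 * True if any CPU other than `cpu` is set in `mask`.
 * Plays the role of "cpumask_any_but(mask, cpu) < nr_cpu_ids".
 */
bool model_any_cpu_but(uint64_t mask, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return (mask & ~(UINT64_C(1) << cpu)) != 0;
}

/*
 * Models the patched condition guarding flush_tlb_others():
 * remote CPUs are only interrupted when the vma is NOT marked
 * local-CPU and some other CPU has used the mm.
 */
bool model_needs_remote_flush(uint64_t vmflag, uint64_t mm_cpumask,
			      unsigned int cpu)
{
	return !(vmflag & MODEL_VM_LOCAL_CPU) &&
	       model_any_cpu_but(mm_cpumask, cpu);
}
```

With CPUs 0 and 2 in the mask (0x5) and the caller on CPU 0, the model asks
for a remote flush without the flag and skips it with the flag set. Where
VM_LOCAL_CPU is defined as 0 the flag test never matches, so the model always
falls back to the regular behaviour.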
> The motivation for this flag is the ZUFS project, where we want
> to optimally map user-application buffers into a user-mode server,
> execute the operation, and efficiently unmap.
>
I am pushing for this patch ahead of the ZUFS submission because it is
the only patch we need on top of an otherwise standard Kernel.
We are partnering with Distro(s) to push ZUFS out-of-tree to beta clients,
to try to stabilize such a big project before final submission and
an ABI / on-disk freeze.
By itself this patch carries no risk and cannot break anything.
Thanks
Boaz
> In this project we utilize a per-core server thread so everything
> is kept local. If we use the regular zap_ptes() API, all CPUs
> are scheduled for the unmap, though in our case we know that we
> have only used a single core. The regular zap_ptes adds a very big
> latency on every operation and mostly kills the concurrency of the
> overall system, because it imposes a serialization between all cores.
>
> Some preliminary measurements on a 40-core machine:
>
>               unpatched            patched
> Threads      Op/s  Lat [us]      Op/s  Lat [us]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>       1    185391       4.9    200799       4.6
>       2    197993       9.6    314321       5.9
>       4    310597      12.1    565574       6.6
>       8    546702      13.8   1113138       6.6
>      12    641728      17.2   1598451       6.8
>      18    744750      22.2   1648689       7.8
>      24    790805      28.3   1702285       8.0
>      36    849763      38.9   1783346      13.4
>      48    792000      44.6   1741873      17.4
>
> We can clearly see that on an unpatched Kernel we do not scale
> and the threads are interfering with each other. This is because
> flush-tlb is scheduled on all (other) CPUs.
>
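To put numbers on the scaling difference, the ratios can be computed directly
from the table above. This is only a small sketch: the struct and function
names are made up for illustration, and the values are copied from the
measurements as posted.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the measurement table quoted above. */
struct bench_row {
	int threads;
	double unpatched_ops;	/* Op/s, unpatched Kernel */
	double unpatched_lat;	/* latency in us, unpatched Kernel */
	double patched_ops;	/* Op/s, patched Kernel */
	double patched_lat;	/* latency in us, patched Kernel */
};

const struct bench_row bench_rows[] = {
	{  1, 185391,  4.9,  200799,  4.6 },
	{  2, 197993,  9.6,  314321,  5.9 },
	{  4, 310597, 12.1,  565574,  6.6 },
	{  8, 546702, 13.8, 1113138,  6.6 },
	{ 12, 641728, 17.2, 1598451,  6.8 },
	{ 18, 744750, 22.2, 1648689,  7.8 },
	{ 24, 790805, 28.3, 1702285,  8.0 },
	{ 36, 849763, 38.9, 1783346, 13.4 },
	{ 48, 792000, 44.6, 1741873, 17.4 },
};

const size_t bench_row_count = sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Patched Op/s divided by unpatched Op/s for row i (above 1.0 means faster). */
double throughput_gain(size_t i)
{
	assert(i < bench_row_count);
	return bench_rows[i].patched_ops / bench_rows[i].unpatched_ops;
}

/* Patched latency divided by unpatched latency for row i (below 1.0 means lower). */
double latency_ratio(size_t i)
{
	assert(i < bench_row_count);
	return bench_rows[i].patched_lat / bench_rows[i].unpatched_lat;
}
```

For the 8-thread row (index 3), throughput_gain() comes out slightly above 2
and latency_ratio() slightly below 0.5, while the single-thread row gains
less than 10%.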
> NOTE: This vma (VM_LOCAL_CPU) is never used during a page_fault. It is
> always used in a synchronous way from a thread pinned to a single core.
>
> Signed-off-by: Boaz Harrosh <boazh@netapp.com>
> ---
>  arch/x86/mm/tlb.c  |  3 ++-
>  fs/proc/task_mmu.c |  3 +++
>  include/linux/mm.h |  3 +++
>  mm/memory.c        | 13 +++++++++++--
>  4 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index e055d1a..1d398a0 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -640,7 +640,8 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>  		local_irq_enable();
>  	}
>
> -	if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
> +	if (!(vmflag & VM_LOCAL_CPU) &&
> +	    cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
>  		flush_tlb_others(mm_cpumask(mm), &info);
>
>  	put_cpu();
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index c486ad4..305d6e4 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -680,6 +680,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  		[ilog2(VM_PKEY_BIT2)] = "",
>  		[ilog2(VM_PKEY_BIT3)] = "",
>  #endif
> +#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> +		[ilog2(VM_LOCAL_CPU)] = "lc",
> +#endif
>  	};
>  	size_t i;
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ac1f06..3d14107 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -226,6 +226,9 @@ extern unsigned int kobjsize(const void *objp);
>  #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
>  #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
>  #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
> +#define VM_LOCAL_CPU	BIT(37)	/* FIXME: Needs to move from here */
> +#else /* ! CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
> +#define VM_LOCAL_CPU	0	/* FIXME: Needs to move from here */
>  #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
>
>  #if defined(CONFIG_X86)
> diff --git a/mm/memory.c b/mm/memory.c
> index 01f5464..6236f5e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1788,6 +1788,7 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>  	int retval;
>  	pte_t *pte, entry;
>  	spinlock_t *ptl;
> +	bool need_flush = false;
>
>  	retval = -ENOMEM;
>  	pte = get_locked_pte(mm, addr, &ptl);
> @@ -1795,7 +1796,12 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>  		goto out;
>  	retval = -EBUSY;
>  	if (!pte_none(*pte)) {
> -		if (mkwrite) {
> +		if ((vma->vm_flags & VM_LOCAL_CPU)) {
> +			/* VM_LOCAL_CPU is set, A single CPU is allowed to not
> +			 * go through zap_vma_ptes before changing a pte
> +			 */
> +			need_flush = true;
> +		} else if (mkwrite) {
>  			/*
>  			 * For read faults on private mappings the PFN passed
>  			 * in may not match the PFN we have mapped if the
> @@ -1807,8 +1813,9 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>  				goto out_unlock;
>  			entry = *pte;
>  			goto out_mkwrite;
> -		} else
> +		} else {
>  			goto out_unlock;
> +		}
>  	}
>
>  	/* Ok, finally just insert the thing.. */
> @@ -1824,6 +1831,8 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>  	}
>
>  	set_pte_at(mm, addr, pte, entry);
> +	if (need_flush)
> +		flush_tlb_range(vma, addr, addr + PAGE_SIZE);
>  	update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
>
>  	retval = 0;
>
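For readers who want the insert_pfn() change in isolation, here is a
user-space model of just the branch logic in the hunks above. It is a sketch
under assumptions: model_insert_pfn() and its boolean parameters are
hypothetical, and the mkwrite case assumes the PFN passed in matches the one
already mapped (the hunk elides that check, so it is not modelled here).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the patched decision in insert_pfn().
 * Returns 0 when the call would go on to succeed, -EBUSY when it would
 * bail out because a pte is already present. *need_flush reports whether
 * the patched code would call flush_tlb_range() after set_pte_at().
 */
int model_insert_pfn(bool pte_present, bool vma_local_cpu, bool mkwrite,
		     bool *need_flush)
{
	assert(need_flush != NULL);
	*need_flush = false;

	if (pte_present) {
		if (vma_local_cpu) {
			/* New in the patch: overwrite the pte, then flush. */
			*need_flush = true;
		} else if (mkwrite) {
			/* Existing path (out_mkwrite); assumes the PFN matches. */
			return 0;
		} else {
			/* Existing path: retval stays -EBUSY. */
			return -EBUSY;
		}
	}

	/* "Ok, finally just insert the thing.." */
	return 0;
}
```

In this model the only case that sets *need_flush is a present pte on a
VM_LOCAL_CPU vma. Without the flag, a present pte still ends in -EBUSY
unless mkwrite is set.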