From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: from psmtp.com (na3sys010amx152.postini.com [74.125.245.152])
	by kanga.kvack.org (Postfix) with SMTP id D61936B002B
	for ; Thu, 11 Oct 2012 07:15:49 -0400 (EDT)
Date: Thu, 11 Oct 2012 12:15:45 +0100
From: Mel Gorman 
Subject: Re: [PATCH 05/33] autonuma: pte_numa() and pmd_numa()
Message-ID: <20121011111545.GR3317@csn.ul.ie>
References: <1349308275-2174-1-git-send-email-aarcange@redhat.com>
 <1349308275-2174-6-git-send-email-aarcange@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <1349308275-2174-6-git-send-email-aarcange@redhat.com>
Sender: owner-linux-mm@kvack.org
List-ID: 
To: Andrea Arcangeli 
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Linus Torvalds , Andrew Morton , Peter Zijlstra , Ingo Molnar ,
 Hugh Dickins , Rik van Riel , Johannes Weiner , Hillf Danton ,
 Andrew Jones , Dan Smith , Thomas Gleixner , Paul Turner ,
 Christoph Lameter , Suresh Siddha , Mike Galbraith ,
 "Paul E. McKenney" , Lai Jiangshan , Bharata B Rao ,
 Lee Schermerhorn , Srivatsa Vaddagiri , Alex Shi ,
 Mauricio Faria de Oliveira , Konrad Rzeszutek Wilk , Don Morris ,
 Benjamin Herrenschmidt 

On Thu, Oct 04, 2012 at 01:50:47AM +0200, Andrea Arcangeli wrote:
> Implement pte_numa and pmd_numa.
> 
> We must atomically set the numa bit and clear the present bit to
> define a pte_numa or pmd_numa.
> 

Or I could just have kept reading :/

> Once a pte or pmd has been set as pte_numa or pmd_numa, the next time
> a thread touches a virtual address in the corresponding virtual range,
> a NUMA hinting page fault will trigger. The NUMA hinting page fault
> will clear the NUMA bit and set the present bit again to resolve the
> page fault.
> 
> NUMA hinting page faults are used:
> 
> 1) to fill in the per-thread NUMA statistic stored for each thread in
>    a current->task_autonuma data structure
> 
> 2) to track the per-node last_nid information in the page structure to
>    detect false sharing
> 
> 3) to migrate the page with Migrate On Fault if there have been enough
>    NUMA hinting page faults on the page coming from remote CPUs
>    (autonuma_last_nid heuristic)
> 
> NUMA hinting page faults collect information and possibly add pages to
> migrate queues. They are extremely quick, and they try to be

They better be :D

They are certainly a contributor to the high System CPU usage I saw in
the basic tests, but I expect a relatively small one, with the bulk of
the time actually being consumed by the various scanners.

> non-blocking also when Migrate On Fault is invoked as result.
> 
> The generic implementation is used when CONFIG_AUTONUMA=n.
> 
> Acked-by: Rik van Riel 
> Signed-off-by: Andrea Arcangeli 
> ---
>  arch/x86/include/asm/pgtable.h |   65 ++++++++++++++++++++++++++++++++++++++-
>  include/asm-generic/pgtable.h  |   12 +++++++
>  2 files changed, 75 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index c3520d7..6c14b40 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -404,7 +404,8 @@ static inline int pte_same(pte_t a, pte_t b)
>  
>  static inline int pte_present(pte_t a)
>  {
> -	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
> +	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
> +			       _PAGE_NUMA);
>  }
>  

huh?

On x86

	#define _PAGE_NUMA	_PAGE_PROTNONE

so this is effectively

	_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PROTNONE

I suspect you are doing this because there is no requirement that
_PAGE_NUMA == _PAGE_PROTNONE on other architectures and you wanted the
check to describe your intent. Is that really the case or did I miss
something stupid?
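To illustrate what I mean, here is a purely hypothetical sketch of an
architecture that has a spare software bit available for this. The
_PAGE_SWNUMA name and the bit position are made up by me and appear
nowhere in this series:

	/*
	 * Hypothetical arch where _PAGE_NUMA is a dedicated software
	 * bit instead of aliasing _PAGE_PROTNONE. Here the extra term
	 * in pte_present() is not redundant the way it is on x86.
	 */
	#define _PAGE_SWNUMA	(_AT(pteval_t, 1) << 11)  /* made-up spare bit */
	#define _PAGE_NUMA	_PAGE_SWNUMA

	static inline int pte_present(pte_t a)
	{
		return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
				       _PAGE_NUMA);
	}

If that portability is the intent then the check is fine as it is; it
just reads as redundant on x86 without a comment.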
>  static inline int pte_hidden(pte_t pte)
> @@ -420,7 +421,63 @@ static inline int pmd_present(pmd_t pmd)
>  	 * the _PAGE_PSE flag will remain set at all times while the
>  	 * _PAGE_PRESENT bit is clear).
>  	 */
> -	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
> +	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
> +				 _PAGE_NUMA);
> +}
> +
> +#ifdef CONFIG_AUTONUMA
> +/*
> + * _PAGE_NUMA works identical to _PAGE_PROTNONE (it's actually the
> + * same bit too). It's set only when _PAGE_PRESET is not set and it's

same bit on x86, not necessarily anywhere else.

_PAGE_PRESENT?

> + * never set if _PAGE_PRESENT is set.
> + *
> + * pte/pmd_present() returns true if pte/pmd_numa returns true. Page
> + * fault triggers on those regions if pte/pmd_numa returns true
> + * (because _PAGE_PRESENT is not set).
> + */
> +static inline int pte_numa(pte_t pte)
> +{
> +	return (pte_flags(pte) &
> +		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
> +}
> +
> +static inline int pmd_numa(pmd_t pmd)
> +{
> +	return (pmd_flags(pmd) &
> +		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
> +}
> +#endif
> +
> +/*
> + * pte/pmd_mknuma sets the _PAGE_ACCESSED bitflag automatically
> + * because they're called by the NUMA hinting minor page fault.

automatically or atomically? I assume you meant atomically, but what
stops two threads faulting at the same time and doing the same update?
mmap_sem will be insufficient in that case, so what is guaranteeing the
atomicity? The PTL?
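For what it's worth, this is roughly the fault-side sequence I expect
is needed for that to be safe. It is only a sketch to illustrate the
question; the function name and signature are my invention and not
something this series implements:

	/*
	 * Sketch of a NUMA hinting fault fixup. The page table lock
	 * (PTL) would be what serialises two threads faulting on the
	 * same pte at the same time: the loser of the race sees the
	 * pte has already changed and backs off.
	 */
	static int numa_hinting_fault_fixup(struct mm_struct *mm,
					    struct vm_area_struct *vma,
					    unsigned long address,
					    pte_t *ptep, pte_t orig_pte,
					    spinlock_t *ptl)
	{
		spin_lock(ptl);
		if (unlikely(!pte_same(*ptep, orig_pte))) {
			/* Another thread already resolved the fault */
			spin_unlock(ptl);
			return 0;
		}
		/*
		 * PTL held: clear _PAGE_NUMA and set
		 * _PAGE_PRESENT|_PAGE_ACCESSED in a single update.
		 */
		set_pte_at(mm, address, ptep, pte_mknonnuma(orig_pte));
		update_mmu_cache(vma, address, ptep);
		spin_unlock(ptl);
		return 0;
	}

If it is the PTL that is doing the work then a comment in the patch
saying so would help.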
> If we
> + * wouldn't set the _PAGE_ACCESSED bitflag here, the TLB miss handler
> + * would be forced to set it later while filling the TLB after we
> + * return to userland. That would trigger a second write to memory
> + * that we optimize away by setting _PAGE_ACCESSED here.
> + */
> +static inline pte_t pte_mknonnuma(pte_t pte)
> +{
> +	pte = pte_clear_flags(pte, _PAGE_NUMA);
> +	return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED);
> +}
> +
> +static inline pmd_t pmd_mknonnuma(pmd_t pmd)
> +{
> +	pmd = pmd_clear_flags(pmd, _PAGE_NUMA);
> +	return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED);
> +}
> +
> +static inline pte_t pte_mknuma(pte_t pte)
> +{
> +	pte = pte_set_flags(pte, _PAGE_NUMA);
> +	return pte_clear_flags(pte, _PAGE_PRESENT);
> +}
> +
> +static inline pmd_t pmd_mknuma(pmd_t pmd)
> +{
> +	pmd = pmd_set_flags(pmd, _PAGE_NUMA);
> +	return pmd_clear_flags(pmd, _PAGE_PRESENT);
>  }
>  
>  static inline int pmd_none(pmd_t pmd)
> @@ -479,6 +536,10 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
>  
>  static inline int pmd_bad(pmd_t pmd)
>  {
> +#ifdef CONFIG_AUTONUMA
> +	if (pmd_numa(pmd))
> +		return 0;
> +#endif
>  	return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE;
>  }
>  
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index ff4947b..0ff87ec 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -530,6 +530,18 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
>  #endif
>  }
>  
> +#ifndef CONFIG_AUTONUMA
> +static inline int pte_numa(pte_t pte)
> +{
> +	return 0;
> +}
> +
> +static inline int pmd_numa(pmd_t pmd)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_AUTONUMA */
> +
>  #endif /* CONFIG_MMU */
>  
>  #endif /* !__ASSEMBLY__ */
> --

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org