From: Barry Song <21cnbao@gmail.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	 Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
	Marc Zyngier <maz@kernel.org>,
	 Oliver Upton <oliver.upton@linux.dev>,
	James Morse <james.morse@arm.com>,
	 Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	 Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	 Andrey Konovalov <andreyknvl@gmail.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	 Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Anshuman Khandual <anshuman.khandual@arm.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Yu Zhao <yuzhao@google.com>, Mark Rutland <mark.rutland@arm.com>,
	 David Hildenbrand <david@redhat.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	 John Hubbard <jhubbard@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 14/14] arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown
Date: Thu, 30 Nov 2023 13:57:45 +0800	[thread overview]
Message-ID: <CAGsJ_4x=R4YT68gY99Y5BVBTn+rd4=iVvCoU-UJ_+Uh+D4uc7g@mail.gmail.com> (raw)
In-Reply-To: <87leafg768.fsf@nvdebian.thelocal>

On Thu, Nov 30, 2023 at 1:08 PM Alistair Popple <apopple@nvidia.com> wrote:
>
>
> Ryan Roberts <ryan.roberts@arm.com> writes:
>
> >>>> So if we do need to deal with racing HW, I'm pretty sure my v1 implementation is
> >>>> buggy because it iterated through the PTEs, getting and accumulating. Then
> >>>> iterated again, writing that final set of bits to all the PTEs. And the HW could
> >>>> have modified the bits during those loops. I think it would be possible to fix
> >>>> the race, but intuition says it would be expensive.
> >>>
> >>> So the issue as I understand it is subsequent iterations would see a
> >>> clean PTE after the first iteration returned a dirty PTE. In
> >>> ptep_get_and_clear_full() why couldn't you just copy the dirty/accessed
> >>> bit (if set) from the PTE being cleared to an adjacent PTE rather than
> >>> all the PTEs?
> >>
> >> The raciness I'm describing is the race between reading access/dirty from one
> >> pte and applying it to another. But yes, I like your suggestion. If we do:
> >>
> >> pte = __ptep_get_and_clear_full(ptep)
> >>
> >> on the target pte, then we have grabbed access/dirty from it in a race-free
> >> manner. We can then loop from the current pte up towards the top of the block
> >> until we find a valid entry (and I guess wrap at the top to make us robust
> >> against future callers clearing in an arbitrary order). Then atomically
> >> accumulate the
> >> access/dirty bits we have just saved into that new entry. I guess that's just a
> >> cmpxchg loop - there are already examples of how to do that correctly when
> >> racing the TLB.
> >>
> >> For most entries, we will just be copying up to the next pte. For the last pte,
> >> we would end up reading all the ptes and determining that we are the last one.
> >>
> >> What do you think?
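
As an aside, the stash-and-accumulate scheme described above can be sketched
in plain C. This is only a userspace model: the bit encodings, CONT_PTES
value, and helper names below are invented for illustration, and real arm64
code would use the kernel's pte helpers and cmpxchg_relaxed() rather than
C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Invented software PTE bits for this model (not the arm64 encodings). */
#define PTE_VALID (1ull << 0)
#define PTE_YOUNG (1ull << 1)
#define PTE_DIRTY (1ull << 2)
#define CONT_PTES 16

/*
 * Atomically OR the saved access/dirty bits into a pte, retrying if a
 * concurrent update (e.g. the hardware walker) changed the entry under
 * us - the same cmpxchg-loop pattern used when racing the TLB.
 */
static void stash_bits(_Atomic uint64_t *ptep, uint64_t bits)
{
	uint64_t old = atomic_load(ptep);

	while (!atomic_compare_exchange_weak(ptep, &old, old | bits))
		; /* 'old' is reloaded by the failed CAS; just retry */
}

/*
 * Clear one entry; if it carried access/dirty state, move those bits
 * into the next still-valid entry above it, wrapping at the top of the
 * block so callers may clear in an arbitrary order.
 */
static uint64_t clear_and_stash(_Atomic uint64_t *block, int idx)
{
	uint64_t orig = atomic_exchange(&block[idx], 0);
	uint64_t bits = orig & (PTE_YOUNG | PTE_DIRTY);
	int i;

	for (i = 1; bits && i < CONT_PTES; i++) {
		int j = (idx + i) % CONT_PTES;

		if (atomic_load(&block[j]) & PTE_VALID) {
			stash_bits(&block[j], bits);
			break;
		}
	}
	return orig;
}
```

For most entries this copies the bits to the immediately adjacent pte; only
when clearing the last valid entry does the scan walk the whole block.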
> >
> > OK here is an attempt at something which solves the fragility. I think this is
> > now robust and will always return the correct access/dirty state from
> > ptep_get_and_clear_full() and ptep_get().
> >
> > But I'm not sure about performance; each call to ptep_get_and_clear_full() for
> > each pte in a contpte block will cause a ptep_get() to gather the access/dirty
> > bits from across the contpte block - which requires reading each pte in the
> > contpte block. So it's O(n^2) in that sense. I'll benchmark it and report back.
> >
> > Was this the type of thing you were thinking of, Alistair?
>
> Yes, that is along the lines of what I was thinking. However I have
> added a couple of comments inline.
>
> > --8<--
> >  arch/arm64/include/asm/pgtable.h | 23 ++++++++-
> >  arch/arm64/mm/contpte.c          | 81 ++++++++++++++++++++++++++++++++
> >  arch/arm64/mm/fault.c            | 38 +++++++++------
> >  3 files changed, 125 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 9bd2f57a9e11..6c295d277784 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -851,6 +851,7 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> >       return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
> >  }
> >
> > +extern int __ptep_set_access_flags_notlbi(pte_t *ptep, pte_t entry);
> >  extern int __ptep_set_access_flags(struct vm_area_struct *vma,
> >                                unsigned long address, pte_t *ptep,
> >                                pte_t entry, int dirty);
> > @@ -1145,6 +1146,8 @@ extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte);
> >  extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep);
> >  extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
> >                               pte_t *ptep, pte_t pte, unsigned int nr);
> > +extern pte_t contpte_ptep_get_and_clear_full(struct mm_struct *mm,
> > +                             unsigned long addr, pte_t *ptep);
> >  extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
> >                               unsigned long addr, pte_t *ptep);
> >  extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
> > @@ -1270,12 +1273,28 @@ static inline void pte_clear(struct mm_struct *mm,
> >       __pte_clear(mm, addr, ptep);
> >  }
> >
> > +#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
> > +static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
> > +                             unsigned long addr, pte_t *ptep, int full)
> > +{
> > +     pte_t orig_pte = __ptep_get(ptep);
> > +
> > +     if (!pte_valid_cont(orig_pte))
> > +             return __ptep_get_and_clear(mm, addr, ptep);
> > +
> > +     if (!full) {
> > +             contpte_try_unfold(mm, addr, ptep, orig_pte);
> > +             return __ptep_get_and_clear(mm, addr, ptep);
> > +     }
> > +
> > +     return contpte_ptep_get_and_clear_full(mm, addr, ptep);
> > +}
> > +
> >  #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
> >  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> >                               unsigned long addr, pte_t *ptep)
> >  {
> > -     contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
> > -     return __ptep_get_and_clear(mm, addr, ptep);
> > +     return ptep_get_and_clear_full(mm, addr, ptep, 0);
> >  }
> >
> >  #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > index 2a57df16bf58..99b211118d93 100644
> > --- a/arch/arm64/mm/contpte.c
> > +++ b/arch/arm64/mm/contpte.c
> > @@ -145,6 +145,14 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
> >       for (i = 0; i < CONT_PTES; i++, ptep++) {
> >               pte = __ptep_get(ptep);
> >
> > +             /*
> > +              * Deal with the partial contpte_ptep_get_and_clear_full() case,
> > +              * where some of the ptes in the range may be cleared but others
> > +              * are still to do. See contpte_ptep_get_and_clear_full().
> > +              */
> > +             if (!pte_valid(pte))
> > +                     continue;
> > +
> >               if (pte_dirty(pte))
> >                       orig_pte = pte_mkdirty(orig_pte);
> >
> > @@ -257,6 +265,79 @@ void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
> >  }
> >  EXPORT_SYMBOL(contpte_set_ptes);
> >
> > +pte_t contpte_ptep_get_and_clear_full(struct mm_struct *mm,
> > +                                     unsigned long addr, pte_t *ptep)
> > +{
> > +     /*
> > +      * When doing a full address space teardown, we can avoid unfolding the
> > +      * contiguous range, and therefore avoid the associated tlbi. Instead,
> > +      * just get and clear the pte. The caller is promising to call us for
> > +      * every pte, so every pte in the range will be cleared by the time the
> > +      * final tlbi is issued.
> > +      *
> > +      * This approach requires some complex hoop jumping though, as for the
> > +      * duration between returning from the first call to
> > +      * ptep_get_and_clear_full() and making the final call, the contpte
> > +      * block is in an intermediate state, where some ptes are cleared and
> > +      * others are still set with the PTE_CONT bit. If any other APIs are
> > +      * called for the ptes in the contpte block during that time, we have to
> > +      * be very careful. The core code currently interleaves calls to
> > +      * ptep_get_and_clear_full() with ptep_get() and so ptep_get() must be
> > +      * careful to ignore the cleared entries when accumulating the access
> > +      * and dirty bits - the same goes for ptep_get_lockless(). The only
> > +      * other calls we might reasonably expect are to set markers in the
> > +      * previously cleared ptes. (We shouldn't see valid entries being set
> > +      * until after the tlbi, at which point we are no longer in the
> > +      * intermediate state). Since markers are not valid, this is safe;
> > +      * set_ptes() will see the old, invalid entry and will not attempt to
> > +      * unfold. And the new pte is also invalid so it won't attempt to fold.
> > +      * We shouldn't see pte markers being set for the 'full' case anyway
> > +      * since the address space is being torn down.
> > +      *
> > +      * The last remaining issue is returning the access/dirty bits. That
> > +      * info could be present in any of the ptes in the contpte block.
> > +      * ptep_get() will gather those bits from across the contpte block (for
> > +      * the remaining valid entries). So below, if the pte we are clearing
> > +      * has dirty or young set, we need to stash it into a pte that we are
> > +      * yet to clear. This allows future calls to return the correct state
> > +      * even when the info was stored in a different pte. Since the core-mm
> > +      * calls from low to high address, we prefer to stash in the last pte of
> > +      * the contpte block - this means we are not "dragging" the bits up
> > +      * through all ptes and increases the chances that we can exit early
> > +      * because a given pte will have neither dirty nor young set.
> > +      */
> > +
> > +     pte_t orig_pte = __ptep_get_and_clear(mm, addr, ptep);
> > +     bool dirty = pte_dirty(orig_pte);
> > +     bool young = pte_young(orig_pte);
> > +     pte_t *start;
> > +
> > +     if (!dirty && !young)
> > +             return contpte_ptep_get(ptep, orig_pte);
>
> I don't think we need to do this. If the PTE is !dirty && !young we can
> just return it. As you say we have to assume HW can set those flags at
> any time anyway so it doesn't get us much. This means in the common case
> we should only run through the loop setting the dirty/young flags once
> which should allay the performance concerns.
>
> However I am now wondering if we're doing the wrong thing trying to hide
> this down in the arch layer anyway. Perhaps it would be better to deal
> with this in the core-mm code after all.
>
> So how about having ptep_get_and_clear_full() clear the PTEs for the
> entire cont block? We know by definition all PTEs should be pointing to

I truly believe we should clear all PTEs for the entire folio block. However,
if the existing API ptep_get_and_clear_full() always handles a single PTE, we
might keep its behaviour as is. On the other hand, clearing the whole block
isn't only required in the fullmm case; it is also a requirement for normal
zap_pte_range() cases coming from madvise(MADV_DONTNEED) etc.

I do think we need a folio-level variant. As we now support pte-level large
folios, we need some new API that handles all of a folio's PTEs at once,
since we always need to drop the whole folio rather than one PTE at a time
when it is compound.
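
To make the idea concrete, here is a rough userspace model of what such a
folio-level helper could look like. The clear_ptes() name, the bit layout,
and the accumulation policy are all invented for illustration; a real
implementation would live in the arch layer and use the kernel's pte
accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented software PTE bits for this model (not the arm64 encodings). */
#define PTE_VALID (1ull << 0)
#define PTE_YOUNG (1ull << 1)
#define PTE_DIRTY (1ull << 2)

/*
 * Hypothetical folio-level clear: zap nr contiguous entries in one call
 * and fold each entry's access/dirty state into the returned value, so
 * the caller gets the folio's aggregate state without per-pte calls.
 */
static uint64_t clear_ptes(uint64_t *ptep, size_t nr)
{
	uint64_t orig = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t pte = ptep[i];

		ptep[i] = 0;		/* models __pte_clear() */
		if (i == 0)
			orig = pte;	/* keep the head entry's contents */
		orig |= pte & (PTE_YOUNG | PTE_DIRTY);
	}
	return orig;
}
```

With something along these lines, zap_pte_range() could make one call per
folio and skip the remaining entries, instead of CONT_PTES individual trips
through ptep_get_and_clear_full().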

> the same folio anyway, and it seems at least zap_pte_range() would cope
> with this just fine because subsequent iterations would just see
> pte_none() and continue the loop. I haven't checked the other call sites
> though, but in principal I don't see why we couldn't define
> ptep_get_and_clear_full() as being something that clears all PTEs
> mapping a given folio (although it might need renaming).
>
> This does assume you don't need to partially unmap a page in
> zap_pte_range (ie. end >= folio), but we're already making that
> assumption.
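
A toy model of that zap-style control flow shows why a batched clear is
tolerated: entries already cleared by the batch simply look like pte_none()
and are skipped. The names below are illustrative, not the real
zap_pte_range().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int pte_none(uint64_t pte)
{
	return pte == 0;
}

/*
 * Toy zap loop: if a batched helper cleared a whole cont block ahead of
 * us, later iterations just see pte_none() and continue, so clearing
 * all PTEs of a folio in one call does not confuse the loop.
 */
static size_t zap_range(uint64_t *ptes, size_t nr)
{
	size_t zapped = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pte_none(ptes[i]))
			continue;	/* already torn down */
		ptes[i] = 0;
		zapped++;
	}
	return zapped;
}
```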
>
> > +
> > +     start = contpte_align_down(ptep);
> > +     ptep = start + CONT_PTES - 1;
> > +
> > +     for (; ptep >= start; ptep--) {
> > +             pte_t pte = __ptep_get(ptep);
> > +
> > +             if (!pte_valid(pte))
> > +                     continue;
> > +
> > +             if (dirty)
> > +                     pte = pte_mkdirty(pte);
> > +
> > +             if (young)
> > +                     pte = pte_mkyoung(pte);
> > +
> > +             __ptep_set_access_flags_notlbi(ptep, pte);
> > +             return contpte_ptep_get(ptep, orig_pte);
> > +     }
> > +
> > +     return orig_pte;
> > +}
> > +EXPORT_SYMBOL(contpte_ptep_get_and_clear_full);
> > +
> >  int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
> >                                       unsigned long addr, pte_t *ptep)
> >  {
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index d63f3a0a7251..b22216a8153c 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -199,19 +199,7 @@ static void show_pte(unsigned long addr)
> >       pr_cont("\n");
> >  }
> >
> > -/*
> > - * This function sets the access flags (dirty, accessed), as well as write
> > - * permission, and only to a more permissive setting.
> > - *
> > - * It needs to cope with hardware update of the accessed/dirty state by other
> > - * agents in the system and can safely skip the __sync_icache_dcache() call as,
> > - * like __set_ptes(), the PTE is never changed from no-exec to exec here.
> > - *
> > - * Returns whether or not the PTE actually changed.
> > - */
> > -int __ptep_set_access_flags(struct vm_area_struct *vma,
> > -                         unsigned long address, pte_t *ptep,
> > -                         pte_t entry, int dirty)
> > +int __ptep_set_access_flags_notlbi(pte_t *ptep, pte_t entry)
> >  {
> >       pteval_t old_pteval, pteval;
> >       pte_t pte = __ptep_get(ptep);
> > @@ -238,10 +226,30 @@ int __ptep_set_access_flags(struct vm_area_struct *vma,
> >               pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
> >       } while (pteval != old_pteval);
> >
> > +     return 1;
> > +}
> > +
> > +/*
> > + * This function sets the access flags (dirty, accessed), as well as write
> > + * permission, and only to a more permissive setting.
> > + *
> > + * It needs to cope with hardware update of the accessed/dirty state by other
> > + * agents in the system and can safely skip the __sync_icache_dcache() call as,
> > + * like __set_ptes(), the PTE is never changed from no-exec to exec here.
> > + *
> > + * Returns whether or not the PTE actually changed.
> > + */
> > +int __ptep_set_access_flags(struct vm_area_struct *vma,
> > +                         unsigned long address, pte_t *ptep,
> > +                         pte_t entry, int dirty)
> > +{
> > +     int changed = __ptep_set_access_flags_notlbi(ptep, entry);
> > +
> >       /* Invalidate a stale read-only entry */
> > -     if (dirty)
> > +     if (changed && dirty)
> >               flush_tlb_page(vma, address);
> > -     return 1;
> > +
> > +     return changed;
> >  }
> >
> >  static bool is_el1_instruction_abort(unsigned long esr)
> > --8<--
>

Thanks
Barry

