Re: [External] Re: [PATCH v4] mm/hugetlb: add mempolicy check in the reservation routine

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Muchun Song <songmuchun@bytedance.com>
To: Baoquan He <bhe@redhat.com>
Cc: mike.kravetz@oracle.com,
	Andrew Morton <akpm@linux-foundation.org>,
	 Michal Hocko <mhocko@kernel.org>,
	David Rientjes <rientjes@google.com>,
	mgorman@suse.de,  Michel Lespinasse <walken@google.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	Jianchao Guo <guojianchao@bytedance.com>
Subject: Re: [External] Re: [PATCH v4] mm/hugetlb: add mempolicy check in the reservation routine
Date: Tue, 28 Jul 2020 22:16:29 +0800	[thread overview]
Message-ID: <CAMZfGtUSJN5KVpTb-cSe=76w6EX-7Z01nXFF+mqWGDvubCUj7w@mail.gmail.com> (raw)
In-Reply-To: <20200728132453.GB14854@MiWiFi-R3L-srv>

On Tue, Jul 28, 2020 at 9:25 PM Baoquan He <bhe@redhat.com> wrote:
>
> Hi Muchun,
>
> On 07/28/20 at 11:49am, Muchun Song wrote:
> > In the reservation routine, we only check whether the cpuset meets
> > the memory allocation requirements. But we ignore the mempolicy of
> > MPOL_BIND case. If someone mmap hugetlb succeeds, but the subsequent
> > memory allocation may fail due to mempolicy restrictions and receives
> > the SIGBUS signal. This can be reproduced by the follow steps.
> >
> >  1) Compile the test case.
> >     cd tools/testing/selftests/vm/
> >     gcc map_hugetlb.c -o map_hugetlb
> >
> >  2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the
> >     system. Each node will pre-allocate one huge page.
> >     echo 2 > /proc/sys/vm/nr_hugepages
> >
> >  3) Run test case(mmap 4MB). We receive the SIGBUS signal.
> >     numactl --membind=0 ./map_hugetlb 4
>
> I think supporting the  mempolicy of MPOL_BIND case is a good idea.
> I am wondering what about the other mempolicy cases, e.g MPOL_INTERLEAVE,
> MPOL_PREFERRED. Asking these because we already have similar handling in
> sysfs, proc nr_hugepages_mempolicy writting. Please see
> __nr_hugepages_store_common() for detail.

Yeah, I know the nr_hugepages_mempolicy. But this new code will
help produce a quick failure as described in the commit message
instead of waiting until the page fault routine(and receive a SIGBUG
signal).

>
> Thanks
> Baoquan
>
> >
> > With this patch applied, the mmap will fail in the step 3) and throw
> > "mmap: Cannot allocate memory".
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Reported-by: Jianchao Guo <guojianchao@bytedance.com>
> > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > ---
> > changelog in v4:
> >  1) Fix compilation errors with !CONFIG_NUMA.
> >
> > changelog in v3:
> >  1) Do not allocate nodemask on the stack.
> >  2) Update comment.
> >
> > changelog in v2:
> >  1) Reuse policy_nodemask().
> >
> >  include/linux/mempolicy.h | 14 ++++++++++++++
> >  mm/hugetlb.c              | 22 ++++++++++++++++++----
> >  mm/mempolicy.c            |  2 +-
> >  3 files changed, 33 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> > index ea9c15b60a96..0656ece1ccf1 100644
> > --- a/include/linux/mempolicy.h
> > +++ b/include/linux/mempolicy.h
> > @@ -152,6 +152,15 @@ extern int huge_node(struct vm_area_struct *vma,
> >  extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
> >  extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
> >                               const nodemask_t *mask);
> > +extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
> > +
> > +static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
> > +{
> > +     struct mempolicy *mpol = get_task_policy(current);
> > +
> > +     return policy_nodemask(gfp, mpol);
> > +}
> > +
> >  extern unsigned int mempolicy_slab_node(void);
> >
> >  extern enum zone_type policy_zone;
> > @@ -281,5 +290,10 @@ static inline int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
> >  static inline void mpol_put_task_policy(struct task_struct *task)
> >  {
> >  }
> > +
> > +static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
> > +{
> > +     return NULL;
> > +}
> >  #endif /* CONFIG_NUMA */
> >  #endif
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 589c330df4db..a34458f6a475 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -3463,13 +3463,21 @@ static int __init default_hugepagesz_setup(char *s)
> >  }
> >  __setup("default_hugepagesz=", default_hugepagesz_setup);
> >
> > -static unsigned int cpuset_mems_nr(unsigned int *array)
> > +static unsigned int allowed_mems_nr(struct hstate *h)
> >  {
> >       int node;
> >       unsigned int nr = 0;
> > +     nodemask_t *mpol_allowed;
> > +     unsigned int *array = h->free_huge_pages_node;
> > +     gfp_t gfp_mask = htlb_alloc_mask(h);
> > +
> > +     mpol_allowed = policy_nodemask_current(gfp_mask);
> >
> > -     for_each_node_mask(node, cpuset_current_mems_allowed)
> > -             nr += array[node];
> > +     for_each_node_mask(node, cpuset_current_mems_allowed) {
> > +             if (!mpol_allowed ||
> > +                 (mpol_allowed && node_isset(node, *mpol_allowed)))
> > +                     nr += array[node];
> > +     }
> >
> >       return nr;
> >  }
> > @@ -3648,12 +3656,18 @@ static int hugetlb_acct_memory(struct hstate *h, long delta)
> >        * we fall back to check against current free page availability as
> >        * a best attempt and hopefully to minimize the impact of changing
> >        * semantics that cpuset has.
> > +      *
> > +      * Apart from cpuset, we also have memory policy mechanism that
> > +      * also determines from which node the kernel will allocate memory
> > +      * in a NUMA system. So similar to cpuset, we also should consider
> > +      * the memory policy of the current task. Similar to the description
> > +      * above.
> >        */
> >       if (delta > 0) {
> >               if (gather_surplus_pages(h, delta) < 0)
> >                       goto out;
> >
> > -             if (delta > cpuset_mems_nr(h->free_huge_pages_node)) {
> > +             if (delta > allowed_mems_nr(h)) {
> >                       return_unused_surplus_pages(h, delta);
> >                       goto out;
> >               }
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 93fcfc1f2fa2..fce14c3f4f38 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1873,7 +1873,7 @@ static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
> >   * Return a nodemask representing a mempolicy for filtering nodes for
> >   * page allocation
> >   */
> > -static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> > +nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> >  {
> >       /* Lower zones don't get a nodemask applied for MPOL_BIND */
> >       if (unlikely(policy->mode == MPOL_BIND) &&
> > --
> > 2.11.0
> >
> >
>


--
Yours,
Muchun

next prev parent reply	other threads:[~2020-07-28 14:17 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  3:49 Muchun Song
2020-07-28 13:24 ` Baoquan He
2020-07-28 14:16   ` Muchun Song [this message]
2020-07-28 16:46   ` Mike Kravetz
2020-07-29 10:33     ` Baoquan He
2020-08-06  7:45 ` Muchun Song
2020-08-07  1:22   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMZfGtUSJN5KVpTb-cSe=76w6EX-7Z01nXFF+mqWGDvubCUj7w@mail.gmail.com' \
    --to=songmuchun@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=guojianchao@bytedance.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=rientjes@google.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox