Re: [PATCH v2] mm/memcontrol: fix a data race in scan count

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Marco Elver <elver@google.com>
To: Qian Cai <cai@lca.pw>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	vdavydov.dev@gmail.com,  Cgroups <cgroups@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] mm/memcontrol: fix a data race in scan count
Date: Fri, 7 Feb 2020 19:19:33 +0100	[thread overview]
Message-ID: <CANpmjNMk5zw+nbLa4Ko7zUdWOY8pFR6EuQ6WbRECmX=8o8PLUw@mail.gmail.com> (raw)
In-Reply-To: <1581096119-13593-1-git-send-email-cai@lca.pw>

On Fri, 7 Feb 2020 at 18:22, Qian Cai <cai@lca.pw> wrote:
>
> struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
> accessed concurrently as noticed by KCSAN,
>
>  BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size
>
>  write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
>   mem_cgroup_update_lru_size+0x11c/0x1d0
>   mem_cgroup_update_lru_size at mm/memcontrol.c:1266
>   isolate_lru_pages+0x6a9/0xf30
>   shrink_active_list+0x123/0xcc0
>   shrink_lruvec+0x8fd/0x1380
>   shrink_node+0x317/0xd80
>   do_try_to_free_pages+0x1f7/0xa10
>   try_to_free_pages+0x26c/0x5e0
>   __alloc_pages_slowpath+0x458/0x1290
>   __alloc_pages_nodemask+0x3bb/0x450
>   alloc_pages_vma+0x8a/0x2c0
>   do_anonymous_page+0x170/0x700
>   __handle_mm_fault+0xc9f/0xd00
>   handle_mm_fault+0xfc/0x2f0
>   do_page_fault+0x263/0x6f9
>   page_fault+0x34/0x40
>
>  read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
>   lruvec_lru_size+0xbb/0x270
>   mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
>   (inlined by) lruvec_lru_size at mm/vmscan.c:326
>   shrink_lruvec+0x1d0/0x1380
>   shrink_node+0x317/0xd80
>   do_try_to_free_pages+0x1f7/0xa10
>   try_to_free_pages+0x26c/0x5e0
>   __alloc_pages_slowpath+0x458/0x1290
>   __alloc_pages_nodemask+0x3bb/0x450
>   alloc_pages_current+0xa6/0x120
>   alloc_slab_page+0x3b1/0x540
>   allocate_slab+0x70/0x660
>   new_slab+0x46/0x70
>   ___slab_alloc+0x4ad/0x7d0
>   __slab_alloc+0x43/0x70
>   kmem_cache_alloc+0x2c3/0x420
>   getname_flags+0x4c/0x230
>   getname+0x22/0x30
>   do_sys_openat2+0x205/0x3b0
>   do_sys_open+0x9a/0xf0
>   __x64_sys_openat+0x62/0x80
>   do_syscall_64+0x91/0xb47
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
>  Reported by Kernel Concurrency Sanitizer on:
>  CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
>  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
>
> The write is under lru_lock, but the read is done as lockless. The scan
> count is used to determine how aggressively the anon and file LRU lists
> should be scanned. Load tearing could generate an inefficient heuristic,
> so fix it by adding READ_ONCE() for the read and WRITE_ONCE() for the
> writes.
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>
> v2: also have WRITE_ONCE() in the writer which is necessary.

Again, note that KCSAN will *not* complain if you omitted the
WRITE_ONCE and only had the READ_ONCE, as long as the write aligned
and up to word-size. Because we still don't have a nice way to deal
with read-modify-writes, like 'var +=', '++', I don't know if we want
to do the WRITE_ONCE right now.

I think the kernel might need a primitive that avoids the readability
issues of writing 'WRITE_ONCE(var, var + val)'. I don't have strong
opinions on this, so it's up to maintainers.

Thanks,
-- Marco

>  include/linux/memcontrol.h | 2 +-
>  mm/memcontrol.c            | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index a7a0a1a5c8d5..e8734dabbc61 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -533,7 +533,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
>         struct mem_cgroup_per_node *mz;
>
>         mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> -       return mz->lru_zone_size[zone_idx][lru];
> +       return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
>  }
>
>  void mem_cgroup_handle_over_high(void);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6f6dc8712e39..daf375cc312c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1263,7 +1263,7 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
>         lru_size = &mz->lru_zone_size[zid][lru];
>
>         if (nr_pages < 0)
> -               *lru_size += nr_pages;
> +               WRITE_ONCE(*lru_size, *lru_size + nr_pages);
>
>         size = *lru_size;
>         if (WARN_ONCE(size < 0,
> @@ -1274,7 +1274,7 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
>         }
>
>         if (nr_pages > 0)
> -               *lru_size += nr_pages;
> +               WRITE_ONCE(*lru_size, *lru_size + nr_pages);
>  }
>
>  /**
> --
> 1.8.3.1
>

next prev parent reply	other threads:[~2020-02-07 18:19 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-07 17:21 Qian Cai
2020-02-07 18:19 ` Marco Elver [this message]
2020-02-07 19:30   ` Qian Cai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANpmjNMk5zw+nbLa4Ko7zUdWOY8pFR6EuQ6WbRECmX=8o8PLUw@mail.gmail.com' \
    --to=elver@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox