From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.com>,
Alex Shi <alex.shi@linux.alibaba.com>,
Roman Gushchin <guro@fb.com>, Linux MM <linux-mm@kvack.org>,
Cgroups <cgroups@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH] mm: fix unsafe page -> lruvec lookups with cgroup charge migration
Date: Wed, 20 Nov 2019 12:31:06 -0800
Message-ID: <CALvZod50AanTCNkTVSptU+Hg--69j6OuKdc04UPs4Vf64DkGiw@mail.gmail.com>
In-Reply-To: <20191120165847.423540-1-hannes@cmpxchg.org>
On Wed, Nov 20, 2019 at 8:58 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> While reviewing the "per lruvec lru_lock for memcg" series, Hugh and I
> noticed two places in the existing code where the page -> memcg ->
> lruvec lookup can result in a use-after-free bug. This affects cgroup1
> setups that have charge migration enabled.
>
> To pin page->mem_cgroup, callers need to either have the page locked,
> an exclusive refcount (0), or hold the lru_lock and "own" PageLRU
> (either ensure it's set, or be the one to hold the page in isolation)
> to make cgroup migration fail the isolation step.
I think we should add the above paragraph as a code comment for better visibility.
> Failure to follow
> this can result in the page moving out of the memcg and freeing it,
> along with its lruvecs, while the observer is dereferencing them.
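To illustrate the pinning rule quoted above, here is a minimal userspace sketch of the safe ordering: take the lock, confirm the page is on the LRU, and only then follow the page -> memcg -> lruvec chain. Everything in it (struct page_model, model_lock(), page_lruvec_model(), isolate_page_model(), the lookups counter) is a hypothetical stand-in for illustration, not kernel code, and the "lock" is a single-threaded flag rather than a real spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's types. */
struct lruvec_model { int nr_pages; };
struct memcg_model  { struct lruvec_model lruvec; };
struct page_model {
	bool on_lru;               /* models PageLRU */
	struct memcg_model *memcg; /* models page->mem_cgroup */
};

static bool lock_held; /* single-threaded model of lru_lock */
static int lookups;    /* counts lruvec lookups so the ordering is observable */

static void model_lock(void)   { assert(!lock_held); lock_held = true; }
static void model_unlock(void) { assert(lock_held);  lock_held = false; }

/* Models the page -> memcg -> lruvec lookup; it touches the memcg. */
static struct lruvec_model *page_lruvec_model(struct page_model *page)
{
	assert(lock_held);
	lookups++;
	return &page->memcg->lruvec;
}

/* Returns 0 if the page was isolated, -1 if it was not on the LRU. */
static int isolate_page_model(struct page_model *page)
{
	int ret = -1;

	model_lock();
	if (page->on_lru) {
		/* Lookup only after the on-LRU test succeeded under the lock. */
		struct lruvec_model *lruvec = page_lruvec_model(page);

		page->on_lru = false;
		lruvec->nr_pages--;
		ret = 0;
	}
	model_unlock();
	return ret;
}
```

In this model, a page that is not on the LRU never has its memcg pointer followed, which is the property the rule is meant to guarantee.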
>
> 1. isolate_lru_page() calls mem_cgroup_page_lruvec() with the lru_lock
> held but before testing PageLRU. It doesn't dereference the returned
> lruvec before testing PageLRU, giving the impression that it might
> just be safe ordering after all - but mem_cgroup_page_lruvec() itself
> touches the lruvec to lazily initialize the pgdat back pointer. This
> one is easy to fix, just move the lookup into the PageLRU branch.
>
> 2. pagevec_lru_move_fn() conveniently looks up the lruvec for all the
> callbacks it might get invoked on. Unfortunately, it's the callbacks
> that first check PageLRU under the lru_lock, which makes this order
> equally unsafe as isolate_lru_page(). Remove the lruvec argument from
> the move callbacks and let them do it inside their PageLRU branches.
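The callback change described in point 2 can be sketched the same way. In this hypothetical userspace model the batch walker no longer resolves the lruvec for its callbacks; each callback performs the lookup itself, inside its own on-LRU branch. The names batch_move(), move_tail_model(), lookup_lruvec_model() and the lruvec_lookups counter are assumptions made for illustration, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's types. */
struct lruvec_model { int rotated; };
struct page_model {
	bool on_lru;                 /* models PageLRU */
	bool unevictable;            /* models PageUnevictable */
	struct lruvec_model *lruvec; /* stands in for page -> memcg -> lruvec */
};

static int lruvec_lookups; /* counts lookups so the test can observe them */

static struct lruvec_model *lookup_lruvec_model(struct page_model *page)
{
	lruvec_lookups++;
	assert(page->lruvec != NULL);
	return page->lruvec;
}

typedef void (*move_fn_t)(struct page_model *page, void *arg);

/* The walker only iterates; it never looks up an lruvec on its own. */
static void batch_move(struct page_model **pages, size_t n,
		       move_fn_t move_fn, void *arg)
{
	for (size_t i = 0; i < n; i++)
		move_fn(pages[i], arg);
}

/* The callback checks the page state first and looks up the lruvec after. */
static void move_tail_model(struct page_model *page, void *arg)
{
	int *moved = arg;

	if (page->on_lru && !page->unevictable) {
		struct lruvec_model *lruvec = lookup_lruvec_model(page);

		lruvec->rotated++;
		(*moved)++;
	}
}
```

A page that fails the on-LRU test is skipped without any lookup, so a stale or missing lruvec pointer on such a page is never dereferenced.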
>
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
> ---
> mm/swap.c | 48 +++++++++++++++++++++++++++++-------------------
> mm/vmscan.c | 8 ++++----
> 2 files changed, 33 insertions(+), 23 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 5341ae93861f..6b015e9532fb 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -188,12 +188,11 @@ int get_kernel_page(unsigned long start, int write, struct page **pages)
> EXPORT_SYMBOL_GPL(get_kernel_page);
>
> static void pagevec_lru_move_fn(struct pagevec *pvec,
> - void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg),
> + void (*move_fn)(struct page *page, void *arg),
> void *arg)
> {
> int i;
> struct pglist_data *pgdat = NULL;
> - struct lruvec *lruvec;
> unsigned long flags = 0;
>
> for (i = 0; i < pagevec_count(pvec); i++) {
> @@ -207,8 +206,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
> spin_lock_irqsave(&pgdat->lru_lock, flags);
> }
>
> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
> - (*move_fn)(page, lruvec, arg);
> + (*move_fn)(page, arg);
> }
> if (pgdat)
> spin_unlock_irqrestore(&pgdat->lru_lock, flags);
> @@ -216,12 +214,14 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
> pagevec_reinit(pvec);
> }
>
> -static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec,
> - void *arg)
> +static void pagevec_move_tail_fn(struct page *page, void *arg)
> {
> int *pgmoved = arg;
>
> if (PageLRU(page) && !PageUnevictable(page)) {
> + struct lruvec *lruvec;
> +
> + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> del_page_from_lru_list(page, lruvec, page_lru(page));
> ClearPageActive(page);
> add_page_to_lru_list_tail(page, lruvec, page_lru(page));
> @@ -272,12 +272,14 @@ static void update_page_reclaim_stat(struct lruvec *lruvec,
> reclaim_stat->recent_rotated[file]++;
> }
>
> -static void __activate_page(struct page *page, struct lruvec *lruvec,
> - void *arg)
> +static void __activate_page(struct page *page, void *arg)
> {
> if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
> int file = page_is_file_cache(page);
> int lru = page_lru_base_type(page);
> + struct lruvec *lruvec;
> +
> + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
>
> del_page_from_lru_list(page, lruvec, lru);
> SetPageActive(page);
> @@ -328,7 +330,7 @@ void activate_page(struct page *page)
>
> page = compound_head(page);
> spin_lock_irq(&pgdat->lru_lock);
> - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL);
> + __activate_page(page, NULL);
> spin_unlock_irq(&pgdat->lru_lock);
> }
> #endif
> @@ -498,9 +500,9 @@ void lru_cache_add_active_or_unevictable(struct page *page,
> * be write it out by flusher threads as this is much more effective
> * than the single-page writeout from reclaim.
> */
> -static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
> - void *arg)
> +static void lru_deactivate_file_fn(struct page *page, void *arg)
> {
> + struct lruvec *lruvec;
> int lru, file;
> bool active;
>
> @@ -518,6 +520,8 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
> file = page_is_file_cache(page);
> lru = page_lru_base_type(page);
>
> + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> +
> del_page_from_lru_list(page, lruvec, lru + active);
> ClearPageActive(page);
> ClearPageReferenced(page);
> @@ -544,12 +548,14 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
> update_page_reclaim_stat(lruvec, file, 0);
> }
>
> -static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
> - void *arg)
> +static void lru_deactivate_fn(struct page *page, void *arg)
> {
> if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
> int file = page_is_file_cache(page);
> int lru = page_lru_base_type(page);
> + struct lruvec *lruvec;
> +
> + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
>
> del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
> ClearPageActive(page);
> @@ -561,12 +567,14 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
> }
> }
>
> -static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
> - void *arg)
> +static void lru_lazyfree_fn(struct page *page, void *arg)
> {
> if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> !PageSwapCache(page) && !PageUnevictable(page)) {
> bool active = PageActive(page);
> + struct lruvec *lruvec;
> +
> + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
>
> del_page_from_lru_list(page, lruvec,
> LRU_INACTIVE_ANON + active);
> @@ -921,15 +929,17 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> -static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> - void *arg)
> +static void __pagevec_lru_add_fn(struct page *page, void *arg)
> {
> - enum lru_list lru;
> int was_unevictable = TestClearPageUnevictable(page);
> + struct lruvec *lruvec;
> + enum lru_list lru;
>
> VM_BUG_ON_PAGE(PageLRU(page), page);
> -
> SetPageLRU(page);
> +
> + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> +
> /*
> * Page becomes evictable in two ways:
> * 1) Within LRU lock [munlock_vma_page() and __munlock_pagevec()].
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index df859b1d583c..3c8b81990146 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1767,15 +1767,15 @@ int isolate_lru_page(struct page *page)
>
> if (PageLRU(page)) {
> pg_data_t *pgdat = page_pgdat(page);
> - struct lruvec *lruvec;
>
> spin_lock_irq(&pgdat->lru_lock);
> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
> if (PageLRU(page)) {
> - int lru = page_lru(page);
> + struct lruvec *lruvec;
> +
> + lruvec = mem_cgroup_page_lruvec(page, pgdat);
> get_page(page);
> ClearPageLRU(page);
> - del_page_from_lru_list(page, lruvec, lru);
> + del_page_from_lru_list(page, lruvec, page_lru(page));
> ret = 0;
> }
> spin_unlock_irq(&pgdat->lru_lock);
> --
> 2.24.0
>