From: Chris Li <chrisl@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>,
Kemeng Shi <shikemeng@huaweicloud.com>
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
mgorman@techsingularity.net, Michal Hocko <mhocko@suse.com>,
david@redhat.com, willy@infradead.org, linux-mm@kvack.org,
Namhyung Kim <namhyung@google.com>,
Greg Thelen <gthelen@google.com>,
linux-kernel@vger.kernel.org, Chris Li <chrisl@kernel.org>,
John Sperbeck <jsperbeck@google.com>
Subject: [PATCH RFC 1/2] mm/page_alloc: safeguard free_pcppages_bulk
Date: Thu, 17 Aug 2023 23:05:23 -0700 [thread overview]
Message-ID: <20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org> (raw)
In-Reply-To: <20230817-free_pcppages_bulk-v1-0-c14574a9f80c@kernel.org>
The current free_pcppages_bulk() can panic when
pcp->count is changed outside of this function by
the BPF program injected in ftrace function entry.
Commit c66a36af7ba3a628 was to fix on the BPF program side
to not allocate memory inside the spinlock.
But the kernel can still panic loading similar BPF without the fix.
Here is the step to reproduce it:
$ git checkout 19030564ab116757e32
$ cd tools/perf
$ make perf
$ ./perf lock con -ab -- ./perf bench sched messaging
You should be able to see the kernel panic within 20 seconds.
Here is what happened in the panic:
count = min(pcp->count, count);
free_pcppages_bulk() assumes count and pcp->count are in sync.
There are no pcp->count changes outside of this function.
That assumption gets broken when BPF lock contention code
allocates memory inside spinlock. pcp->count is one less than
"count". The loop only checks against "count" and runs into
a deadloop because pcp->count drops to zero and all lists
are empty. In a deadloop pindex_min can grow bigger than pindex_max
and pindex_max can lower to negative. The kernel panic is happening
on the pindex trying to access outside of pcp->lists ranges.
Notice that this is just one of the (buggy) BPF programs that
can break it. Other than the spin lock, there are other function
tracepoints under this function can be hooked up to the BPF program
which can allocate memory and change the pcp->count.
One argument is that BPF should not allocate memory under the
spinlock. On the other hand, the kernel can just check pcp->count
inside the loop to avoid the kernel panic.
Signed-off-by: Chris Li <chrisl@kernel.org>
Reported-by: John Sperbeck<jsperbeck@google.com>
---
mm/page_alloc.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1eb3864e1dbc7..347cb93081a02 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1215,12 +1215,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
bool isolated_pageblocks;
struct page *page;
- /*
- * Ensure proper count is passed which otherwise would stuck in the
- * below while (list_empty(list)) loop.
- */
- count = min(pcp->count, count);
-
/* Ensure requested pindex is drained first. */
pindex = pindex - 1;
@@ -1266,7 +1260,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
__free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
trace_mm_page_pcpu_drain(page, order, mt);
- } while (count > 0 && !list_empty(list));
+ } while (count > 0 && pcp->count > 0 && !list_empty(list));
}
spin_unlock_irqrestore(&zone->lock, flags);
--
2.42.0.rc1.204.g551eb34607-goog
next prev parent reply other threads:[~2023-08-18 6:06 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-18 6:05 [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Chris Li
2023-08-18 6:05 ` Chris Li [this message]
2023-08-18 6:05 ` [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up Chris Li
2023-08-24 6:28 ` kernel test robot
2023-08-24 15:25 ` Chris Li
2023-08-21 10:32 ` [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Mel Gorman
2023-08-22 1:27 ` Kemeng Shi
2023-08-22 21:14 ` Chris Li
2023-08-22 21:19 ` Alexei Starovoitov
2023-08-22 21:29 ` Chris Li
2023-08-22 21:35 ` Alexei Starovoitov
2023-08-22 21:46 ` Chris Li
2023-08-22 17:48 ` Chris Li
2023-08-22 18:28 ` Matthew Wilcox
2023-08-22 18:57 ` Alexei Starovoitov
2023-08-22 21:34 ` Chris Li
2023-08-22 21:37 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org \
--to=chrisl@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=gthelen@google.com \
--cc=jsperbeck@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=namhyung@google.com \
--cc=shikemeng@huaweicloud.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox