From: Charan Teja Kalla <quic_charante@quicinc.com>
To: Michal Hocko <mhocko@suse.com>
Cc: <akpm@linux-foundation.org>, <david@redhat.com>,
<pasha.tatashin@soleen.com>, <sieberf@amazon.com>,
<shakeelb@google.com>, <sjpark@amazon.de>, <dhowells@redhat.com>,
<willy@infradead.org>, <quic_pkondeti@quicinc.com>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH V3] mm: fix use-after free of page_ext after race with memory-offline
Date: Tue, 16 Aug 2022 15:04:01 +0530 [thread overview]
Message-ID: <286e47e7-3d63-133c-aa6c-05100b557d42@quicinc.com> (raw)
In-Reply-To: <Yvpg6odyDsXrjw5i@dhcp22.suse.cz>
Thanks Michal!!
On 8/15/2022 8:36 PM, Michal Hocko wrote:
> On Tue 09-08-22 20:16:43, Charan Teja Kalla wrote:
> [...]
>> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
>> index fabb2e1..0e259da 100644
>> --- a/include/linux/page_ext.h
>> +++ b/include/linux/page_ext.h
> [...]
>> @@ -87,5 +83,14 @@ static inline void page_ext_init_flatmem_late(void)
>> static inline void page_ext_init_flatmem(void)
>> {
>> }
>> +
>> +static inline struct page *page_ext_get(struct page *page)
> struct page_ext *
>
oops!! It didn't get caught as this is in !CONFIG_PAGE_EXTENSION.
>> +{
>> + return NULL;
>> +}
>> +
>> +static inline void page_ext_put(void)
>> +{
>> +}
>> #endif /* CONFIG_PAGE_EXTENSION */
>> #endif /* __LINUX_PAGE_EXT_H */
> [...]
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 3dc715d..91d7bd2 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -9,6 +9,7 @@
>> #include <linux/page_owner.h>
>> #include <linux/page_idle.h>
>> #include <linux/page_table_check.h>
>> +#include <linux/rcupdate.h>
>>
>> /*
>> * struct page extension
>> @@ -59,6 +60,10 @@
>> * can utilize this callback to initialize the state of it correctly.
>> */
>>
>> +#ifdef CONFIG_SPARSEMEM
>> +#define PAGE_EXT_INVALID (0x1)
>> +#endif
>> +
>> #if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
>> static bool need_page_idle(void)
>> {
>> @@ -84,6 +89,7 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
>> unsigned long page_ext_size = sizeof(struct page_ext);
>>
>> static unsigned long total_usage;
>> +static struct page_ext *lookup_page_ext(const struct page *page);
>>
>> static bool __init invoke_need_callbacks(void)
>> {
>> @@ -125,6 +131,37 @@ static inline struct page_ext *get_entry(void *base, unsigned long index)
>> return base + page_ext_size * index;
>> }
>>
>> +/*
>> + * This function gives proper page_ext of a memory section
>> + * during race with the offline operation on a memory block
>> + * this section falls into. Not using this function to get
>> + * page_ext of a page, in code paths where extra refcount
>> + * is not taken on that page eg: pfn walking, can lead to
>> + * use-after-free access of page_ext.
> I do not think this is really useful comment, it goes into way too much
> detail about memory hotplug yet not enough to actually understand the
> interaction because there are no references to the actual
> synchronization scheme. I would go with something like:
>
> /*
> * Get a page_ext associated with the given page. Returns NULL if
> * no such page_ext exists otherwise ensures that the page_ext will
> * stay alive until page_ext_put is called.
> * This implies a non-sleeping context.
> */
Will update as per the Matthew input @
https://lore.kernel.org/all/YvplthTjM8Ez5DIq@casper.infradead.org/
>> + */
>> +struct page_ext *page_ext_get(struct page *page)
>> +{
>> + struct page_ext *page_ext;
>> +
>> + rcu_read_lock();
>> + page_ext = lookup_page_ext(page);
>> + if (!page_ext) {
>> + rcu_read_unlock();
>> + return NULL;
>> + }
>> +
>> + return page_ext;
>> +}
>> +
>> +/*
>> + * Must be called after work is done with the page_ext received
>> + * with page_ext_get().
>> + */
>> +
>> +void page_ext_put(void)
>> +{
>> + rcu_read_unlock();
>> +}
> Thinking about this some more I am not sure this is a good interface. It
> doesn't have any reference to the actual object this is called for. This
> is nicely visible in __folio_copy_owner which just calles page_ext_put()
> twice because there are 2 page_exts and I can already see how somebody
> might get confused this is just an error and send a patch to drop one of
> them.
>
> I do understand why you went this way because having a parameter which
> is not used will likely lead to the same situation. On the other hand it
> could be annotated to not raise warnings. One potential way to
> workaround that would be
>
> void page_ext_put(struct page_ext *page_ext)
> {
> if (unlikely(!page_ext))
> return;
>
> rcu_read_unlock();
> }
>
> which would help to make the api slightly more robust in case somebody
> does page_ext_put in a branch where page_ext_get returns NULL.
>
Looks better. Will change this accordingly.
> No strong opinion on that though. WDYI?
>
>> #ifndef CONFIG_SPARSEMEM
>>
>>
> [...]
>> @@ -183,19 +184,26 @@ static inline void __set_page_owner_handle(struct page_ext *page_ext,
>> noinline void __set_page_owner(struct page *page, unsigned short order,
>> gfp_t gfp_mask)
>> {
>> - struct page_ext *page_ext = lookup_page_ext(page);
>> + struct page_ext *page_ext = page_ext_get(page);
>> depot_stack_handle_t handle;
>>
>> if (unlikely(!page_ext))
>> return;
> Either add a comment like this
> /* save_stack can sleep in general so we have to page_ext_put */
Vlastimil suggested to go for save stack first since !page_ext is mostly
unlikely. Snip from his comments:
Why not simply do the save_stack() first and then page_ext_get() just
once? It should be really rare that it's NULL, so I don't think we save
much by avoiding an unnecessary save_stack(), while the overhead of
doing two get/put instead of one will affect every call.
https://lore.kernel.org/all/f5fd4942-b03e-1d1c-213b-9cd5283ced91@suse.cz/
>> + page_ext_put();
>>
>> handle = save_stack(gfp_mask);
> or just drop the initial page_ext_get altogether. This function is
> called only when page_ext is supposed to be initialized and !page_ext
> case above should be very unlikely. Or is there any reason to keep this?
>
>> +
>> + /* Ensure page_ext is valid after page_ext_put() above */
>> + page_ext = page_ext_get(page);
>> + if (unlikely(!page_ext))
>> + return;
>> __set_page_owner_handle(page_ext, handle, order, gfp_mask);
>> + page_ext_put();
>> }
>>
> [...]
>> @@ -558,13 +590,17 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>> */
>> handle = READ_ONCE(page_owner->handle);
>> if (!handle)
>> - continue;
>> + goto loop;
>>
>> /* Record the next PFN to read in the file offset */
>> *ppos = (pfn - min_low_pfn) + 1;
>>
>> + memcpy(&page_owner_tmp, page_owner, sizeof(struct page_owner));
>> + page_ext_put();
> why not
> page_owner_tmp = *page_owner;
Done!!
>
>> return print_page_owner(buf, count, pfn, page,
>> - page_owner, handle);
>> + &page_owner_tmp, handle);
>> +loop:
>> + page_ext_put();
>> }
>>
>> return 0;
next prev parent reply other threads:[~2022-08-16 9:56 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-09 14:46 Charan Teja Kalla
2022-08-10 1:57 ` Andrew Morton
2022-08-10 7:23 ` Michal Hocko
2022-08-10 8:27 ` Charan Teja Kalla
2022-08-10 11:40 ` Vlastimil Babka
2022-08-10 14:31 ` Charan Teja Kalla
[not found] ` <Yvpg6odyDsXrjw5i@dhcp22.suse.cz>
2022-08-15 15:26 ` Matthew Wilcox
2022-08-15 15:31 ` Michal Hocko
2022-08-16 9:34 ` Charan Teja Kalla [this message]
[not found] ` <YvvCpW0ep9N8CbDr@dhcp22.suse.cz>
2022-08-18 14:01 ` Charan Teja Kalla
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=286e47e7-3d63-133c-aa6c-05100b557d42@quicinc.com \
--to=quic_charante@quicinc.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=pasha.tatashin@soleen.com \
--cc=quic_pkondeti@quicinc.com \
--cc=shakeelb@google.com \
--cc=sieberf@amazon.com \
--cc=sjpark@amazon.de \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox