linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Solar Designer <solar@openwall.com>,
	Yunsheng Lin <linyunsheng@huawei.com>
Cc: Yunsheng Lin <yunshenglin0825@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Simon Horman <horms@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mina Almasry <almasrymina@google.com>,
	Yonglong Liu <liuyonglong@huawei.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Robin Murphy <robin.murphy@arm.com>,
	IOMMU <iommu@lists.linux.dev>,
	segoon@openwall.com, kernel-hardening@lists.openwall.com,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-mm@kvack.org,
	Qiuling Ren <qren@redhat.com>, Yuying Ma <yuma@redhat.com>,
	sultan@kerneltoast.com
Subject: Re: [PATCH net-next 3/3] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
Date: Thu, 20 Mar 2025 15:37:29 +0100	[thread overview]
Message-ID: <87ldszix92.fsf@toke.dk> (raw)
In-Reply-To: <20250320023202.GA25514@openwall.com>

Solar Designer <solar@openwall.com> writes:

> On Wed, Mar 19, 2025 at 07:06:57PM +0800, Yunsheng Lin wrote:
>> On 2025/3/19 4:55, Toke Høiland-Jørgensen wrote:
>> > Yunsheng Lin <linyunsheng@huawei.com> writes:
>> >> On 2025/3/17 23:16, Toke Høiland-Jørgensen wrote:
>> >>> Yunsheng Lin <yunshenglin0825@gmail.com> writes:
>> >>>> On 3/14/2025 6:10 PM, Toke Høiland-Jørgensen wrote:
>> >>>>
>> >>>> ...
>> >>>>
>> >>>>> To avoid having to walk the entire xarray on unmap to find the page
>> >>>>> reference, we stash the ID assigned by xa_alloc() into the page
>> >>>>> structure itself, using the upper bits of the pp_magic field. This
>> >>>>> requires a couple of defines to avoid conflicting with the
>> >>>>> POINTER_POISON_DELTA define, but this is all evaluated at compile-time,
>> >>>>> so does not affect run-time performance. The bitmap calculations in this
>> >>>>> patch gives the following number of bits for different architectures:
>> >>>>>
>> >>>>> - 24 bits on 32-bit architectures
>> >>>>> - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
>> >>>>> - 32 bits on other 64-bit architectures
>> >>>>
>> >>>>  From commit c07aea3ef4d4 ("mm: add a signature in struct page"):
>> >>>> "The page->signature field is aliased to page->lru.next and
>> >>>> page->compound_head, but it can't be set by mistake because the
>> >>>> signature value is a bad pointer, and can't trigger a false positive
>> >>>> in PageTail() because the last bit is 0."
>> >>>>
>> >>>> And commit 8a5e5e02fc83 ("include/linux/poison.h: fix LIST_POISON{1,2} 
>> >>>> offset"):
>> >>>> "Poison pointer values should be small enough to find a room in
>> >>>> non-mmap'able/hardly-mmap'able space."
>> >>>>
>> >>>> So the question seems to be:
>> >>>> 1. Is stashing the ID causing page->pp_magic to be in the mmap'able/
>> >>>>     easier-mmap'able space? If yes, how can we make sure this will not
>> >>>>     cause any security problem?
>> >>>> 2. Is the masking the page->pp_magic causing a valid pionter for
>> >>>>     page->lru.next or page->compound_head to be treated as a vaild
>> >>>>     PP_SIGNATURE? which might cause page_pool to recycle a page not
>> >>>>     allocated via page_pool.
>> >>>
>> >>> Right, so my reasoning for why the defines in this patch works for this
>> >>> is as follows: in both cases we need to make sure that the ID stashed in
>> >>> that field never looks like a valid kernel pointer. For 64-bit arches
>> >>> (where CONFIG_ILLEGAL_POINTER_VALUE), we make sure of this by never
>> >>> writing to any bits that overlap with the illegal value (so that the
>> >>> PP_SIGNATURE written to the field keeps it as an illegal pointer value).
>> >>> For 32-bit arches, we make sure of this by making sure the top-most bit
>> >>> is always 0 (the -1 in the define for _PP_DMA_INDEX_BITS) in the patch,
>> >>> which puts it outside the range used for kernel pointers (AFAICT).
>> >>
>> >> Is there any season you think only kernel pointer is relevant here?
>> > 
>> > Yes. Any pointer stored in the same space as pp_magic by other users of
>> > the page will be kernel pointers (as they come from page->lru.next). The
>> > goal of PP_SIGNATURE is to be able to distinguish pages allocated by
>> > page_pool, so we don't accidentally recycle a page from somewhere else.
>> > That's the goal of the check in page_pool_page_is_pp():
>> > 
>> > (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE
>> > 
>> > To achieve this, we must ensure that the check above never returns true
>> > for any value another page user could have written into the same field
>> > (i.e., into page->lru.next). For 64-bit arches, POISON_POINTER_DELTA
>> 
>> POISON_POINTER_DELTA is defined according to CONFIG_ILLEGAL_POINTER_VALUE,
>> if CONFIG_ILLEGAL_POINTER_VALUE is not defined yet, POISON_POINTER_DELTA
>> is defined to zero.
>> 
>> It seems only the below 64-bit arches define CONFIG_ILLEGAL_POINTER_VALUE
>> through grepping:
>> a29815a333c6 core, x86: make LIST_POISON less deadly
>> 5c178472af24 riscv: define ILLEGAL_POINTER_VALUE for 64bit
>> f6853eb561fb powerpc/64: Define ILLEGAL_POINTER_VALUE for 64-bit
>> bf0c4e047324 arm64: kconfig: Move LIST_POISON to a safe value
>> 
>> The below 64-bit arches don't seems to define the above config yet:
>> MIPS64, SPARC64, System z(S390X),loongarch
>> 
>> Does ID stashing cause problem for the above arches?
>> 
>> > serves this purpose. For 32-bit arches, we can leave the top-most bits
>> > out of PP_MAGIC_MASK, to make sure that any valid pointer value will
>> > fail the check above.
>> 
>> The above mainly explained how to ensure page_pool_page_is_pp() will
>> not return false positive result from the page_pool perspective.
>> 
>> From MM/security perspective, most of the commits quoted above seem
>> to suggest that poison pointer should be in the non-mmap'able or
>> hardly-mmap'able space, otherwise userspace can arrange for those
>> pointers to actually be dereferencable, potentially turning an oops
>> to an expolit, more detailed example in the below paper, which explains
>> how to exploit a vulnerability which hardened by the 8a5e5e02fc83 commit:
>> https://www.usenix.org/system/files/conference/woot15/woot15-paper-xu.pdf
>> 
>> ID stashing seems to cause page->lru.next (aliased to page->pp_magic) to
>> be in the mmap'able space for some arches.
>
> ...
>
>> To be honest, I am not that familiar with the pointer poison mechanism.
>> But through some researching and analyzing, it makes sense to overstate
>> it a little as it seems to be security-related.
>> Cc'ed some security-related experts and ML to see if there is some
>> clarifying from them.
>
> You're correct that the pointer poison values should be in areas not
> mmap'able by userspace (at least with reasonable mmap_min_addr values).
>
> Looking at the union inside "struct page", I see pp_magic is aliased
> against multiple pointers in the union'ed anonymous structs.
>
> I'm not familiar with the uses of page->pp_magic and how likely or not
> we are to have a bug where its aliasing with pointers would be exposed
> as an attack vector, but this does look like a serious security concern.
> It looks like we would be seriously weakening the poisoning, except on
> archs where the new values with ID stashing are still not mmap'able.
>
> I just discussed the matter with my colleague at CIQ, Sultan Alsawaf,
> and he thinks the added risk is not that bad.  He wrote:
>
>> Toke's response here is fair:
>> 
>> > Right, okay, I see what you mean. So the risk is basically the
>> > following:
>> > 
>> > If some other part of the kernel ends up dereferencing the
>> > page->lru.next pointer of a page that is owned by page_pool, and which
>> > has an ID stashed into page->pp_magic, that dereference can end up being
>> > to a valid userspace mapping, which can lead to Bad Things(tm), cf the
>> > paper above.
>> > 
>> > This is mitigated by the fact that it can only happen on architectures
>> > that don't set ILLEGAL_POINTER_VALUE (which includes 32-bit arches, and
>> > the ones you listed above). In addition, this has to happen while the
>> > page is owned by page_pool, and while it is DMA-mapped - we already
>> > clear the pp_magic field when releasing the page from page_pool.
>> > 
>> > I am not sure to what extent the above is a risk we should take pains to
>> > avoid, TBH. It seems to me that for this to become a real problem, lots
>> > of other things will already have gone wrong. But happy to defer to the
>> > mm/security folks here.
>> 
>> For this to be a problem, there already needs to be a use-after-free on
>> a page, which arguably creates many other vectors for attack.
>> 
>> The lru field of struct page is already used as a generic list pointer
>> in several places in the kernel once ownership of the page is obtained.
>> Any risk of dereferencing lru.next in a use-after-free scenario would
>> technically apply to a bunch of other places in the kernel (grep for
>> page->lru).
>
> We also tried searching for existing exploitation techniques for "struct
> page" use-after-free.  We couldn't find any.  The closest (non-)match I
> found is this fine research (the same project presented differently):
>
> https://i.blackhat.com/BH-US-24/Presentations/US24-Qian-PageJack-A-Powerful-Exploit-Technique-With-Page-Level-UAF-Thursday.pdf page 33+
> https://arxiv.org/html/2401.17618v2#S4
> https://phrack.org/issues/71/13
>
> The arxiv paper includes this sentence: "To create a page-level UAF, the
> key is to cause a UAF of the struct page objects."  However, we do not
> see them actually do that, and this statement is not found in the slides
> nor in the Phrack article.  Confused.

Thank you for weighing in! I will post an updated version of this patch
with a reference to this discussion and try to summarise the above.

> Thank you for CC'ing me and the kernel-hardening list.  However, please
> do not CC the oss-security list like that, where it's against content
> guidelines.  Only properly focused new postings/threads are acceptable
> there (not CC'ing from/to other lists where only part of the content is
> on-topic, and follow-ups might not be on-topic at all).  See:
>
> https://oss-security.openwall.org/wiki/mailing-lists/oss-security#list-content-guidelines
>
> As a moderator for oss-security, I'm going to remove these messages from
> the queue now.  Please drop Cc: oss-security from any further replies.
>
> If desired, we may bring these topics to oss-security separately, with a
> proper Subject line and clear description of what we're talking about.

Noted, thanks!

-Toke



  reply	other threads:[~2025-03-20 14:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-14 10:10 [PATCH net-next 0/3] Fix late DMA unmap crash for page pool Toke Høiland-Jørgensen
2025-03-14 10:10 ` [PATCH net-next 1/3] page_pool: Move pp_magic check into helper functions Toke Høiland-Jørgensen
2025-03-21 17:16   ` Mina Almasry
2025-03-14 10:10 ` [PATCH net-next 2/3] page_pool: Turn dma_sync and dma_sync_cpu fields into a bitmap Toke Høiland-Jørgensen
2025-03-15  9:31   ` Yunsheng Lin
2025-03-17 14:55     ` Toke Høiland-Jørgensen
2025-03-21 23:13   ` Mina Almasry
2025-03-14 10:10 ` [PATCH net-next 3/3] page_pool: Track DMA-mapped pages and unmap them when destroying the pool Toke Høiland-Jørgensen
2025-03-15  9:46   ` Yunsheng Lin
2025-03-17 15:16     ` Toke Høiland-Jørgensen
2025-03-18 12:39       ` Yunsheng Lin
2025-03-18 20:55         ` Toke Høiland-Jørgensen
2025-03-19 11:06           ` Yunsheng Lin
2025-03-19 12:18             ` Toke Høiland-Jørgensen
2025-03-20 11:17               ` Yunsheng Lin
2025-03-20 14:34                 ` Toke Høiland-Jørgensen
2025-03-20  2:32             ` Solar Designer
2025-03-20 14:37               ` Toke Høiland-Jørgensen [this message]
2025-03-17  2:02 ` [PATCH net-next 0/3] Fix late DMA unmap crash for page pool Yonglong Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ldszix92.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=asml.silence@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=horms@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=iommu@lists.linux.dev \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=liuyonglong@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=qren@redhat.com \
    --cc=robin.murphy@arm.com \
    --cc=saeedm@nvidia.com \
    --cc=segoon@openwall.com \
    --cc=solar@openwall.com \
    --cc=sultan@kerneltoast.com \
    --cc=tariqt@nvidia.com \
    --cc=willy@infradead.org \
    --cc=yuma@redhat.com \
    --cc=yunshenglin0825@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox