linux-mm.kvack.org archive mirror
* [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
@ 2026-02-09 19:36 Michael Fara
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Fara @ 2026-02-09 19:36 UTC (permalink / raw)
  To: senozhatsky; +Cc: linux-mm, linux-kernel, akpm, mjfara

get_next_zpdesc() calls get_zspage() which unconditionally dereferences
zpdesc->zspage without a NULL check. This causes a kernel oops when
zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
between zspage destruction and page compaction/migration.

The race window is documented in a TODO comment in zs_page_migrate():

    "nothing prevents a zspage from getting destroyed while it is
    isolated for migration, as the page lock is temporarily dropped
    after zs_page_isolate() succeeded"

The sequence is:
  1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
     page lock.
  2. Concurrently, async_free_zspage() or free_zspage() destroys the
     zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
  3. A subsequent zs_free() path calls trylock_zspage(), which iterates
     zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
     NULL backpointer, causing:

       BUG: kernel NULL pointer dereference, address: 0000000000000000
       RIP: 0010:free_zspage+0x26/0x100
       Call Trace:
        zs_free+0xf4/0x110
        zswap_entry_free+0x7e/0x160

The migration side already has a NULL guard (zs_page_migrate line 1675:
"if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
protection.

Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
instead of going through get_zspage(), and returning NULL when the
backpointer is NULL. This stops iteration safely — the caller treats
it as the end of the page chain.

Signed-off-by: Michael Fara <mjfara@gmail.com>
---
 mm/zsmalloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)

 static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
 {
-	struct zspage *zspage = get_zspage(zpdesc);
+	struct zspage *zspage = zpdesc->zspage;
+
+	/*
+	 * If the backpointer is NULL, this zpdesc was already freed via
+	 * reset_zpdesc() by a racing async_free_zspage() while isolated
+	 * for compaction. See the TODO comment in zs_page_migrate().
+	 */
+	if (unlikely(!zspage)) {
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	BUG_ON(zspage->magic != ZSPAGE_MAGIC);

 	if (unlikely(ZsHugePage(zspage)))
 		return NULL;
--
2.39.0



* Re: [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
  2026-02-09 22:50 ` Joshua Hahn
  2026-02-09 23:16   ` Joshua Hahn
@ 2026-02-18  5:56   ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2026-02-18  5:56 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Michael Fara, senozhatsky, linux-mm, linux-kernel, akpm,
	Minchan Kim, Brian Geffon

* Recovered from SPAM folder *

On (26/02/09 14:50), Joshua Hahn wrote:
> On Mon,  9 Feb 2026 19:32:57 +0000 Michael Fara <mjfara@gmail.com> wrote:
> 
> Hello Michael,
> 
> I hope you are doing well! Thank you for the patch. I'm not entirely sure if
> the race condition that you note here is correct, and also if this is the
> right fix.

Thanks for taking a look!

> > get_next_zpdesc() calls get_zspage() which unconditionally dereferences
> > zpdesc->zspage without a NULL check. This causes a kernel oops when
> > zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
> > between zspage destruction and page compaction/migration.
> > 
> > The race window is documented in a TODO comment in zs_page_migrate():
> > 
> >     "nothing prevents a zspage from getting destroyed while it is
> >     isolated for migration, as the page lock is temporarily dropped
> >     after zs_page_isolate() succeeded"
> 
> I'm taking a look at zsmalloc these days, and I remember being confused by
> this comment and looking into what was going on here as well :-) My thoughts
> were that it should be safe in every other case, though.
> 
> > The sequence is:
> >   1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
> >      page lock.
> >   2. Concurrently, async_free_zspage() or free_zspage() destroys the
> >      zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
> 
> async_free_zspage isolates the zspage first by taking the class lock and
> then splicing the zspage out. free_zspage is always called with the
> class lock held, and it likewise removes the zspage from the fullness list.
>
> The zs_free -> free_zspage -> trylock_zspage path holds the class->lock.
> In zs_page_migrate, we call replace_sub_page to remove the zpdesc from the
> chain before calling reset_zpdesc, so I don't think there would be a race there
> either.

Right, I'm having the same thoughts.  I need to look closer, but I sort
of expect that class->lock should have kept us safe here.

> >   3. A subsequent zs_free() path calls trylock_zspage(), which iterates
> >      zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
> >      NULL backpointer, causing:
> 
> So I don't see how a subsequent zs_free could race with reset_zpdesc on
> the same zpdesc; maybe I'm missing something? :-)
> 
> >        BUG: kernel NULL pointer dereference, address: 0000000000000000
> >        RIP: 0010:free_zspage+0x26/0x100
> >        Call Trace:
> >         zs_free+0xf4/0x110
> >         zswap_entry_free+0x7e/0x160
> 
> Is this from a real crash? A more detailed crash dump would be helpful to
> see what else was happening on the other threads!
> 
> > The migration side already has a NULL guard (zs_page_migrate line 1675:
> > "if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
> > protection.
> > 
> > Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
> > instead of going through get_zspage(), and returning NULL when the
> > backpointer is NULL. This stops iteration safely — the caller treats
> > it as the end of the page chain.
> 
> If we return NULL early, what happens to the remaining zpdescs in the
> zspage? In trylock_zspage for instance, we might exit early and think we
> successfully locked all zpdescs, when that isn't the case. It would eventually
> lead to a VM_BUG_ON_PAGE when trying to free the zspage later anyway.

Agreed.



* Re: [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
  2026-02-09 19:37 Michael Fara
  2026-02-18  5:01 ` Sergey Senozhatsky
@ 2026-02-18  5:46 ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2026-02-18  5:46 UTC (permalink / raw)
  To: Michael Fara
  Cc: senozhatsky, linux-mm, linux-kernel, akpm, mjfara, Minchan Kim,
	Brian Geffon

On (26/02/09 19:37), Michael Fara wrote:
[..]
> The sequence is:
>   1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
>      page lock.
>   2. Concurrently, async_free_zspage() or free_zspage() destroys the
>      zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
>   3. A subsequent zs_free() path calls trylock_zspage(), which iterates
>      zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
>      NULL backpointer, causing:
> 
>        BUG: kernel NULL pointer dereference, address: 0000000000000000
>        RIP: 0010:free_zspage+0x26/0x100
>        Call Trace:
>         zs_free+0xf4/0x110
>         zswap_entry_free+0x7e/0x160
> 
> The migration side already has a NULL guard (zs_page_migrate line 1675:
> "if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
> protection.
> 
> Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
> instead of going through get_zspage(), and returning NULL when the
> backpointer is NULL. This stops iteration safely — the caller treats
> it as the end of the page chain.
> 
> Signed-off-by: Michael Fara <mjfara@gmail.com>
> ---
>  mm/zsmalloc.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)
> 
>  static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
>  {
> -	struct zspage *zspage = get_zspage(zpdesc);
> +	struct zspage *zspage = zpdesc->zspage;
> +
> +	/*
> +	 * If the backpointer is NULL, this zpdesc was already freed via
> +	 * reset_zpdesc() by a racing async_free_zspage() while isolated
> +	 * for compaction. See the TODO comment in zs_page_migrate().
> +	 */
> +	if (unlikely(!zspage)) {
> +		WARN_ON_ONCE(1);
> +		return NULL;
> +	}

I need to look closer, but a quick glance suggests that this is a
problematic approach.  We can't just return NULL from get_next_zpdesc()
because this can potentially cause issues in the callers.  E.g.
trylock_zspage() will treat NULL as the end of the page chain and
return success, which is clearly wrong.  We also have a bunch of
other callers that never expect NULL from get_next_zpdesc().
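
To illustrate, the caller-side walk is basically the following (a
simplified sketch, not the exact upstream code; handle_zpdesc() is just
a stand-in for the per-page work and the helper names are approximate):

    static void walk_zspage_sketch(struct zspage *zspage)
    {
            struct zpdesc *zpdesc;

            for (zpdesc = get_first_zpdesc(zspage); zpdesc;
                 zpdesc = get_next_zpdesc(zpdesc)) {
                    /*
                     * A NULL from get_next_zpdesc() simply terminates
                     * the loop, so callers read it as "end of the page
                     * chain", not as an error; any zpdescs past the
                     * cleared backpointer are silently skipped.
                     */
                    handle_zpdesc(zpdesc);
            }
    }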



* Re: [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
  2026-02-09 19:37 Michael Fara
@ 2026-02-18  5:01 ` Sergey Senozhatsky
  2026-02-18  5:46 ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2026-02-18  5:01 UTC (permalink / raw)
  To: Michael Fara
  Cc: senozhatsky, linux-mm, linux-kernel, akpm, mjfara, Minchan Kim,
	Brian Geffon

Cc-ing Minchan and Brian.

On (26/02/09 19:37), Michael Fara wrote:
> get_next_zpdesc() calls get_zspage() which unconditionally dereferences
> zpdesc->zspage without a NULL check. This causes a kernel oops when
> zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
> between zspage destruction and page compaction/migration.
> 
> The race window is documented in a TODO comment in zs_page_migrate():
> 
>     "nothing prevents a zspage from getting destroyed while it is
>     isolated for migration, as the page lock is temporarily dropped
>     after zs_page_isolate() succeeded"
> 
> The sequence is:
>   1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
>      page lock.
>   2. Concurrently, async_free_zspage() or free_zspage() destroys the
>      zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
>   3. A subsequent zs_free() path calls trylock_zspage(), which iterates
>      zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
>      NULL backpointer, causing:
> 
>        BUG: kernel NULL pointer dereference, address: 0000000000000000
>        RIP: 0010:free_zspage+0x26/0x100
>        Call Trace:
>         zs_free+0xf4/0x110
>         zswap_entry_free+0x7e/0x160
> 
> The migration side already has a NULL guard (zs_page_migrate line 1675:
> "if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
> protection.
> 
> Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
> instead of going through get_zspage(), and returning NULL when the
> backpointer is NULL. This stops iteration safely — the caller treats
> it as the end of the page chain.
> 
> Signed-off-by: Michael Fara <mjfara@gmail.com>

JFI: all of your emails ended up in the SPAM folder, somehow.
Recovered.

[..]
> @@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)
> 
>  static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
>  {
> -	struct zspage *zspage = get_zspage(zpdesc);
> +	struct zspage *zspage = zpdesc->zspage;
> +
> +	/*
> +	 * If the backpointer is NULL, this zpdesc was already freed via
> +	 * reset_zpdesc() by a racing async_free_zspage() while isolated
> +	 * for compaction. See the TODO comment in zs_page_migrate().
> +	 */
> +	if (unlikely(!zspage)) {
> +		WARN_ON_ONCE(1);

What is the purpose of this WARN_ON_ONCE()?

> +		return NULL;
> +	}
> +
> +	BUG_ON(zspage->magic != ZSPAGE_MAGIC);

We can't add new BUG_ON().

>  	if (unlikely(ZsHugePage(zspage)))
>  		return NULL;
> --
> 2.39.0



* Re: [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
  2026-02-09 22:50 ` Joshua Hahn
@ 2026-02-09 23:16   ` Joshua Hahn
  2026-02-18  5:56   ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Joshua Hahn @ 2026-02-09 23:16 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Michael Fara, senozhatsky, linux-mm, linux-kernel, akpm

I just noticed that Minchan wasn't Cc-ed in the patch. Adding him to the
conversation here.

On Mon,  9 Feb 2026 14:50:16 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> On Mon,  9 Feb 2026 19:32:57 +0000 Michael Fara <mjfara@gmail.com> wrote:
> 
> Hello Michael,
> 
> I hope you are doing well! Thank you for the patch. I'm not entirely sure if
> the race condition that you note here is correct, and also if this is the
> right fix.
> 
> > get_next_zpdesc() calls get_zspage() which unconditionally dereferences
> > zpdesc->zspage without a NULL check. This causes a kernel oops when
> > zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
> > between zspage destruction and page compaction/migration.

[...snip...]

Should we add a Fixes tag here as well?
> > Signed-off-by: Michael Fara <mjfara@gmail.com>
> > ---
> >  mm/zsmalloc.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)
> > 
> >  static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
> >  {
> > -	struct zspage *zspage = get_zspage(zpdesc);
> > +	struct zspage *zspage = zpdesc->zspage;
> > +
> > +	/*
> > +	 * If the backpointer is NULL, this zpdesc was already freed via
> > +	 * reset_zpdesc() by a racing async_free_zspage() while isolated
> > +	 * for compaction. See the TODO comment in zs_page_migrate().
> > +	 */
> > +	if (unlikely(!zspage)) {
> > +		WARN_ON_ONCE(1);
> > +		return NULL;
> > +	}
> > +
> > +	BUG_ON(zspage->magic != ZSPAGE_MAGIC);
> > 
> >  	if (unlikely(ZsHugePage(zspage)))
> >  		return NULL;
> > --
> > 2.39.0



* Re: [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
  2026-02-09 19:32 Michael Fara
@ 2026-02-09 22:50 ` Joshua Hahn
  2026-02-09 23:16   ` Joshua Hahn
  2026-02-18  5:56   ` Sergey Senozhatsky
  0 siblings, 2 replies; 8+ messages in thread
From: Joshua Hahn @ 2026-02-09 22:50 UTC (permalink / raw)
  To: Michael Fara; +Cc: senozhatsky, linux-mm, linux-kernel, akpm

On Mon,  9 Feb 2026 19:32:57 +0000 Michael Fara <mjfara@gmail.com> wrote:

Hello Michael,

I hope you are doing well! Thank you for the patch. I'm not entirely sure if
the race condition that you note here is correct, and also if this is the
right fix.

> get_next_zpdesc() calls get_zspage() which unconditionally dereferences
> zpdesc->zspage without a NULL check. This causes a kernel oops when
> zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
> between zspage destruction and page compaction/migration.
> 
> The race window is documented in a TODO comment in zs_page_migrate():
> 
>     "nothing prevents a zspage from getting destroyed while it is
>     isolated for migration, as the page lock is temporarily dropped
>     after zs_page_isolate() succeeded"

I'm taking a look at zsmalloc these days, and I remember being confused by
this comment and looking into what was going on here as well :-) My thoughts
were that it should be safe in every other case, though.

> The sequence is:
>   1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
>      page lock.
>   2. Concurrently, async_free_zspage() or free_zspage() destroys the
>      zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.

async_free_zspage isolates the zspage first by taking the class lock and
then splicing the zspage out. free_zspage is always called with the
class lock held, and it likewise removes the zspage from the fullness list.

The zs_free -> free_zspage -> trylock_zspage path holds the class->lock.
In zs_page_migrate, we call replace_sub_page to remove the zpdesc from the
chain before calling reset_zpdesc, so I don't think there would be a race there
either.
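
In rough, condensed form (a sketch of the ordering described above, not
the exact upstream code, and the argument names are approximate):

    /* zs_free() side: the whole walk runs under class->lock */
    spin_lock(&class->lock);
    if (fix_fullness_group(class, zspage) == ZS_INUSE_RATIO_0)
            free_zspage(pool, class, zspage);  /* -> trylock_zspage() */
    spin_unlock(&class->lock);

    /*
     * zs_page_migrate() side: the old zpdesc is unlinked from the
     * chain via replace_sub_page() before reset_zpdesc() clears its
     * backpointer, so a walker that saw the chain under class->lock
     * should never step onto a zpdesc whose backpointer is NULL.
     */
    replace_sub_page(class, zspage, new_zpdesc, old_zpdesc);
    reset_zpdesc(old_zpdesc);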

>   3. A subsequent zs_free() path calls trylock_zspage(), which iterates
>      zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
>      NULL backpointer, causing:

So I don't see how a subsequent zs_free could race with reset_zpdesc on
the same zpdesc; maybe I'm missing something? :-)

>        BUG: kernel NULL pointer dereference, address: 0000000000000000
>        RIP: 0010:free_zspage+0x26/0x100
>        Call Trace:
>         zs_free+0xf4/0x110
>         zswap_entry_free+0x7e/0x160

Is this from a real crash? A more detailed crash dump would be helpful to
see what else was happening on the other threads!

> The migration side already has a NULL guard (zs_page_migrate line 1675:
> "if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
> protection.
> 
> Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
> instead of going through get_zspage(), and returning NULL when the
> backpointer is NULL. This stops iteration safely — the caller treats
> it as the end of the page chain.

If we return NULL early, what happens to the remaining zpdescs in the
zspage? In trylock_zspage for instance, we might exit early and think we
successfully locked all zpdescs, when that isn't the case. It would eventually
lead to a VM_BUG_ON_PAGE when trying to free the zspage later anyway.
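
Concretely, the trylock pattern looks something like this (a simplified
sketch, not the exact upstream code; zpdesc_trylock() stands in for the
real per-page trylock helper, and the failure path that unlocks the
already-locked zpdescs is omitted):

    static bool trylock_zspage_sketch(struct zspage *zspage)
    {
            struct zpdesc *zpdesc;

            for (zpdesc = get_first_zpdesc(zspage); zpdesc;
                 zpdesc = get_next_zpdesc(zpdesc)) {
                    if (!zpdesc_trylock(zpdesc))
                            return false;
            }

            /*
             * If get_next_zpdesc() returned NULL mid-chain because of a
             * cleared backpointer, we fall through here and report
             * success even though the remaining zpdescs were never
             * locked.
             */
            return true;
    }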

If you have a more detailed crash trace, that would be really helpful.
Thank you again for this patch, I hope you have a great day!
Joshua

> Signed-off-by: Michael Fara <mjfara@gmail.com>
> ---
>  mm/zsmalloc.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)
> 
>  static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
>  {
> -	struct zspage *zspage = get_zspage(zpdesc);
> +	struct zspage *zspage = zpdesc->zspage;
> +
> +	/*
> +	 * If the backpointer is NULL, this zpdesc was already freed via
> +	 * reset_zpdesc() by a racing async_free_zspage() while isolated
> +	 * for compaction. See the TODO comment in zs_page_migrate().
> +	 */
> +	if (unlikely(!zspage)) {
> +		WARN_ON_ONCE(1);
> +		return NULL;
> +	}
> +
> +	BUG_ON(zspage->magic != ZSPAGE_MAGIC);
> 
>  	if (unlikely(ZsHugePage(zspage)))
>  		return NULL;
> --
> 2.39.0



* [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
@ 2026-02-09 19:37 Michael Fara
  2026-02-18  5:01 ` Sergey Senozhatsky
  2026-02-18  5:46 ` Sergey Senozhatsky
  0 siblings, 2 replies; 8+ messages in thread
From: Michael Fara @ 2026-02-09 19:37 UTC (permalink / raw)
  To: senozhatsky; +Cc: linux-mm, linux-kernel, akpm, mjfara

get_next_zpdesc() calls get_zspage() which unconditionally dereferences
zpdesc->zspage without a NULL check. This causes a kernel oops when
zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
between zspage destruction and page compaction/migration.

The race window is documented in a TODO comment in zs_page_migrate():

    "nothing prevents a zspage from getting destroyed while it is
    isolated for migration, as the page lock is temporarily dropped
    after zs_page_isolate() succeeded"

The sequence is:
  1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
     page lock.
  2. Concurrently, async_free_zspage() or free_zspage() destroys the
     zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
  3. A subsequent zs_free() path calls trylock_zspage(), which iterates
     zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
     NULL backpointer, causing:

       BUG: kernel NULL pointer dereference, address: 0000000000000000
       RIP: 0010:free_zspage+0x26/0x100
       Call Trace:
        zs_free+0xf4/0x110
        zswap_entry_free+0x7e/0x160

The migration side already has a NULL guard (zs_page_migrate line 1675:
"if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
protection.

Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
instead of going through get_zspage(), and returning NULL when the
backpointer is NULL. This stops iteration safely — the caller treats
it as the end of the page chain.

Signed-off-by: Michael Fara <mjfara@gmail.com>
---
 mm/zsmalloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)

 static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
 {
-	struct zspage *zspage = get_zspage(zpdesc);
+	struct zspage *zspage = zpdesc->zspage;
+
+	/*
+	 * If the backpointer is NULL, this zpdesc was already freed via
+	 * reset_zpdesc() by a racing async_free_zspage() while isolated
+	 * for compaction. See the TODO comment in zs_page_migrate().
+	 */
+	if (unlikely(!zspage)) {
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	BUG_ON(zspage->magic != ZSPAGE_MAGIC);

 	if (unlikely(ZsHugePage(zspage)))
 		return NULL;
--
2.39.0



* [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc
@ 2026-02-09 19:32 Michael Fara
  2026-02-09 22:50 ` Joshua Hahn
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Fara @ 2026-02-09 19:32 UTC (permalink / raw)
  To: senozhatsky; +Cc: linux-mm, linux-kernel, akpm, mjfara

get_next_zpdesc() calls get_zspage() which unconditionally dereferences
zpdesc->zspage without a NULL check. This causes a kernel oops when
zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
between zspage destruction and page compaction/migration.

The race window is documented in a TODO comment in zs_page_migrate():

    "nothing prevents a zspage from getting destroyed while it is
    isolated for migration, as the page lock is temporarily dropped
    after zs_page_isolate() succeeded"

The sequence is:
  1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
     page lock.
  2. Concurrently, async_free_zspage() or free_zspage() destroys the
     zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
  3. A subsequent zs_free() path calls trylock_zspage(), which iterates
     zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
     NULL backpointer, causing:

       BUG: kernel NULL pointer dereference, address: 0000000000000000
       RIP: 0010:free_zspage+0x26/0x100
       Call Trace:
        zs_free+0xf4/0x110
        zswap_entry_free+0x7e/0x160

The migration side already has a NULL guard (zs_page_migrate line 1675:
"if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
protection.

Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
instead of going through get_zspage(), and returning NULL when the
backpointer is NULL. This stops iteration safely — the caller treats
it as the end of the page chain.

Signed-off-by: Michael Fara <mjfara@gmail.com>
---
 mm/zsmalloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)

 static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
 {
-	struct zspage *zspage = get_zspage(zpdesc);
+	struct zspage *zspage = zpdesc->zspage;
+
+	/*
+	 * If the backpointer is NULL, this zpdesc was already freed via
+	 * reset_zpdesc() by a racing async_free_zspage() while isolated
+	 * for compaction. See the TODO comment in zs_page_migrate().
+	 */
+	if (unlikely(!zspage)) {
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	BUG_ON(zspage->magic != ZSPAGE_MAGIC);

 	if (unlikely(ZsHugePage(zspage)))
 		return NULL;
--
2.39.0



Thread overview: 8+ messages
2026-02-09 19:36 [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc Michael Fara
  -- strict thread matches above, loose matches on Subject: below --
2026-02-09 19:37 Michael Fara
2026-02-18  5:01 ` Sergey Senozhatsky
2026-02-18  5:46 ` Sergey Senozhatsky
2026-02-09 19:32 Michael Fara
2026-02-09 22:50 ` Joshua Hahn
2026-02-09 23:16   ` Joshua Hahn
2026-02-18  5:56   ` Sergey Senozhatsky
