linux-mm.kvack.org archive mirror
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Nhat Pham <nphamcs@gmail.com>, Minchan Kim <minchan@kernel.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Brian Geffon <bgeffon@google.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH] zsmalloc: use actual object size to detect spans
Date: Wed, 7 Jan 2026 05:19:16 +0000
Message-ID: <u6dy6oa6ztghy7ozficimubhb2mwppcq6gosupepnn63uu6oq7@qyph3nyq7las>
In-Reply-To: <tzppylvpaq7lk4re33h4niofgfwiziibp3jdoos2ofeepxx5yh@sh2doyzze25z>

On Wed, Jan 07, 2026 at 11:20:20AM +0900, Sergey Senozhatsky wrote:
> On (26/01/07 02:10), Yosry Ahmed wrote:
> > On Wed, Jan 07, 2026 at 11:06:09AM +0900, Sergey Senozhatsky wrote:
> > > On (26/01/07 01:56), Yosry Ahmed wrote:
> > > > > I recall us having exactly this idea when we first introduced
> > > > > zs_obj_{read,write}_end() functions, and I do recall that it
> > > > > did not work.  Somehow this panics in __memcpy+0xc/0x44.  Let
> > > > > me dig into it again.
> > > > 
> > > > Maybe because at this point we are trying to memcpy() class->size, which
> > > > already includes ZS_HANDLE_SIZE. So reading after increasing the offset
> > > > reads ZS_HANDLE_SIZE after class->size.
> > > 
> > > Yeah, I guess that falsely hits the spanning path because of extra
> > > sizeof(unsigned long).
> > 
> > Or the object could be spanning two pages indeed, but we're copying
> > extra sizeof(unsigned long), that shouldn't crash tho.
> 
> It seems there is no second page, it's a pow-of-two size class.  So
> we mis-detect spanning.
> 
> [   51.406310] zsmalloc: :: size class 48, orig offt 16336, page size 16384, memcpy sizes 40, 8
> [   51.407571] Unable to handle kernel paging request at virtual address ffffc04000000000
> [   51.420816] pc : __memcpy+0xc/0x44
> 
> Second memcpy() of sizeof(unsigned long) traps.

I think this is exactly the case you expected earlier (I'm not sure what
you meant by the pow-of-2 reply). We increase the offset by 8 bytes
(ZS_HANDLE_SIZE) but still copy 48 bytes, even though those 48 bytes
already include both the object and ZS_HANDLE_SIZE. So we end up copying
8 bytes past the end of the object, which lands in the next page, a page
we should not be touching.
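
To make the arithmetic concrete, here is a small user-space model of the
failing check, plugged with the numbers from the log. It is only a sketch:
the EX_* constants and ex_buggy_split() are made-up names for illustration,
not kernel symbols, and the handle size assumes a 64-bit build.

```c
#include <assert.h>

/* Illustrative constants taken from the log above, not kernel definitions. */
#define EX_PAGE_SIZE   16384UL  /* "page size 16384" */
#define EX_HANDLE_SIZE 8UL      /* sizeof(unsigned long), assuming 64-bit */
#define EX_CLASS_SIZE  48UL     /* "size class 48"; already includes the handle */

struct ex_split {
	int spans;              /* 1 if the check decides the object spans pages */
	unsigned long sizes[2]; /* bytes copied from the first and second page */
};

/*
 * Model of the failing check: the offset is advanced past the handle,
 * but the full class size is still used for detection and copying.
 */
static struct ex_split ex_buggy_split(unsigned long off)
{
	struct ex_split s = { 0, { 0, 0 } };

	assert(off + EX_HANDLE_SIZE <= EX_PAGE_SIZE);
	off += EX_HANDLE_SIZE;
	if (off + EX_CLASS_SIZE > EX_PAGE_SIZE) {
		s.spans = 1;
		s.sizes[0] = EX_PAGE_SIZE - off;
		s.sizes[1] = EX_CLASS_SIZE - s.sizes[0];
	}
	return s;
}
```

ex_buggy_split(16336) reports a span with sizes 40 and 8, which matches
"memcpy sizes 40, 8" in the log, even though 16336 + 48 == 16384, so the
object ends exactly at the page boundary.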

I think to fix the bug at this point we need to subtract ZS_HANDLE_SIZE
from class->size before we use it for copying or spanning detection.

Something like (untested):

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5bf832f9c05c..894783d2526c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1072,6 +1072,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
        unsigned long obj, off;
        unsigned int obj_idx;
        struct size_class *class;
+       unsigned long size;
        void *addr;

        /* Guarantee we can get zspage from handle safely */
@@ -1087,7 +1088,13 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
        class = zspage_class(pool, zspage);
        off = offset_in_page(class->size * obj_idx);

-       if (off + class->size <= PAGE_SIZE) {
+       size = class->size;
+       if (!ZsHugePage(zspage)) {
+               off += ZS_HANDLE_SIZE;
+               size -= ZS_HANDLE_SIZE;
+       }
+
+       if (off + size <= PAGE_SIZE) {
                /* this object is contained entirely within a page */
                addr = kmap_local_zpdesc(zpdesc);
                addr += off;
@@ -1096,7 +1103,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,

                /* this object spans two pages */
                sizes[0] = PAGE_SIZE - off;
-               sizes[1] = class->size - sizes[0];
+               sizes[1] = size - sizes[0];
                addr = local_copy;

                memcpy_from_page(addr, zpdesc_page(zpdesc),
@@ -1107,9 +1114,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
                                 0, sizes[1]);
        }

-       if (!ZsHugePage(zspage))
-               addr += ZS_HANDLE_SIZE;
-
        return addr;
 }
 EXPORT_SYMBOL_GPL(zs_obj_read_begin);
@@ -1121,6 +1125,7 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
        struct zpdesc *zpdesc;
        unsigned long obj, off;
        unsigned int obj_idx;
+       unsigned long size;
        struct size_class *class;

        obj = handle_to_obj(handle);
@@ -1129,9 +1134,13 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
        class = zspage_class(pool, zspage);
        off = offset_in_page(class->size * obj_idx);

-       if (off + class->size <= PAGE_SIZE) {
-               if (!ZsHugePage(zspage))
-                       off += ZS_HANDLE_SIZE;
+       size = class->size;
+       if (!ZsHugePage(zspage)) {
+               off += ZS_HANDLE_SIZE;
+               size -= ZS_HANDLE_SIZE;
+       }
+
+       if (off + size <= PAGE_SIZE) {
                handle_mem -= off;
                kunmap_local(handle_mem);
        }
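
For a quick sanity check of the adjusted condition without booting a kernel,
the same logic can be modelled in user space. This is a sketch under
assumptions: the FX_* constants, struct fx_result and fx_compute() are
hypothetical names, the page and handle sizes are the ones from the log, and
the `huge` flag merely stands in for ZsHugePage().

```c
#include <assert.h>

#define FX_PAGE_SIZE   16384UL /* page size from the log */
#define FX_HANDLE_SIZE 8UL     /* sizeof(unsigned long), assuming 64-bit */

struct fx_result {
	int spans;              /* 1 if the payload crosses into the next page */
	unsigned long off;      /* payload offset within the first page */
	unsigned long sizes[2]; /* bytes to copy from each page when spanning */
};

/*
 * Model of the proposed check: skip the handle in the offset AND drop it
 * from the size before testing for a span.
 */
static struct fx_result fx_compute(unsigned long off, unsigned long class_size,
				   int huge)
{
	struct fx_result r = { 0, 0, { 0, 0 } };
	unsigned long size = class_size;

	assert(off < FX_PAGE_SIZE && class_size > FX_HANDLE_SIZE);
	if (!huge) {
		off += FX_HANDLE_SIZE;
		size -= FX_HANDLE_SIZE;
	}
	r.off = off;
	if (off + size > FX_PAGE_SIZE) {
		r.spans = 1;
		r.sizes[0] = FX_PAGE_SIZE - off;
		r.sizes[1] = size - r.sizes[0];
	}
	return r;
}
```

With the values from the log, fx_compute(16336, 48, 0) gives off 16344 and
no span, since 16344 + 40 == 16384. A hypothetical offset of 16368 still
spans, with sizes 8 and 32.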

> 
> > I think the changes need to be shuffled around to avoid this, or just
> > have a combined patch, which would be less pretty.
> 
> I think I prefer a shuffle.
> 
> There is another possible improvement point (UNTESTED): if the first
> page holds only ZS_HANDLE bytes, then we can avoid memcpy() path and
> instead just kmap the second page + offset.

Yeah good point.
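
One way to read that suggestion, as a user-space sketch only: once the
offset has been moved past the handle, an offset equal to the page size
means the whole payload sits at the start of the second page, so it could be
mapped directly instead of copied. The SK_* names and sk_pick_path() are
hypothetical, nothing here is from the posted patch, and it only models the
non-huge case.

```c
#include <assert.h>

#define SK_PAGE_SIZE   16384UL /* page size from the log */
#define SK_HANDLE_SIZE 8UL     /* sizeof(unsigned long), assuming 64-bit */

enum sk_path {
	SK_MAP_FIRST_PAGE,  /* payload fits in the first page: map it */
	SK_MAP_SECOND_PAGE, /* first page holds only the handle: map page two */
	SK_COPY_BOTH,       /* payload really spans: copy from both pages */
};

/* Decide how a non-huge object's payload would be accessed. */
static enum sk_path sk_pick_path(unsigned long off, unsigned long class_size)
{
	unsigned long size;

	assert(class_size > SK_HANDLE_SIZE);
	assert(off + SK_HANDLE_SIZE <= SK_PAGE_SIZE);

	size = class_size - SK_HANDLE_SIZE;
	off += SK_HANDLE_SIZE;

	if (off + size <= SK_PAGE_SIZE)
		return SK_MAP_FIRST_PAGE;
	if (off == SK_PAGE_SIZE)
		return SK_MAP_SECOND_PAGE;
	return SK_COPY_BOTH;
}
```

In this model the second-page case is hit only when the object starts
exactly SK_HANDLE_SIZE bytes before the end of the page; every other
spanning object still takes the copy path.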

