Date: Wed, 29 Jan 2025 17:31:20 +0000
From: Yosry Ahmed
To: Sergey Senozhatsky
Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Nhat Pham,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv1 4/6] zsmalloc: introduce new object mapping API
Message-ID: 
References: <20250129064853.2210753-1-senozhatsky@chromium.org>
	<20250129064853.2210753-5-senozhatsky@chromium.org>
In-Reply-To: <20250129064853.2210753-5-senozhatsky@chromium.org>

On Wed, Jan 29, 2025 at 03:43:50PM +0900, Sergey Senozhatsky wrote:
> Current object mapping API is a little cumbersome. First, it's
> inconsistent, sometimes it returns with page-faults disabled and
> sometimes with page-faults enabled. Second, and most importantly,
> it enforces atomicity restrictions on its users. zs_map_object()
> has to return a linear object address which is not always possible
> because some objects span multiple physical (non-contiguous)
> pages. For such objects zsmalloc uses a per-CPU buffer to which
> object's data is copied before a pointer to that per-CPU buffer
> is returned back to the caller. This leads to another, final,
> issue - extra memcpy(). Since the caller gets a pointer to
> per-CPU buffer it can memcpy() data only to that buffer, and
> during zs_unmap_object() zsmalloc will memcpy() from that per-CPU
> buffer to physical pages that object in question spans across.
> 
> New API splits functions by access mode:
> - zs_obj_read_begin(handle, local_copy)
>   Returns a pointer to handle memory. For objects that span two
>   physical pages a local_copy buffer is used to store object's
>   data before the address is returned to the caller. Otherwise
>   the object's page is kmap_local mapped directly.
> 
> - zs_obj_read_end(handle, buf)
>   Unmaps the page if it was kmap_local mapped by zs_obj_read_begin().
> 
> - zs_obj_write(handle, buf, len)
>   Copies len-bytes from compression buffer to handle memory
>   (takes care of objects that span two pages). This does not
>   need any additional (e.g. per-CPU) buffers and writes the data
>   directly to zsmalloc pool pages.
> 
> The old API will stay around until the remaining users switch
> to the new one. After that we'll also remove zsmalloc per-CPU
> buffer and CPU hotplug handling.

I will propose removing zbud (in addition to z3fold) soon. If that
gets in then we'd only need to update zpool and zswap code to use the
new API. I can take care of that if you want.

> 
> Signed-off-by: Sergey Senozhatsky 

I have a couple of questions below, but generally LGTM:

Reviewed-by: Yosry Ahmed

> ---
>  include/linux/zsmalloc.h |   8 +++
>  mm/zsmalloc.c            | 129 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 137 insertions(+)
> 
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index a48cd0ffe57d..625adae8e547 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -58,4 +58,12 @@ unsigned long zs_compact(struct zs_pool *pool);
>  unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
>  
>  void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
> +
> +void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
> +                     void *handle_mem);
> +void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> +                        void *local_copy);

Nit: Any reason to put 'end' before 'begin'? Same for the function
definitions.

> +void zs_obj_write(struct zs_pool *pool, unsigned long handle,
> +                  void *handle_mem, size_t mem_len);
> +
>  #endif
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 8f4011713bc8..0e21bc57470b 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1371,6 +1371,135 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>  }
>  EXPORT_SYMBOL_GPL(zs_unmap_object);
>  
> +void zs_obj_write(struct zs_pool *pool, unsigned long handle,
> +                  void *handle_mem, size_t mem_len)
> +{
> +        struct zspage *zspage;
> +        struct zpdesc *zpdesc;
> +        unsigned long obj, off;
> +        unsigned int obj_idx;
> +        struct size_class *class;
> +
> +        WARN_ON(in_interrupt());
> +
> +        /* Guarantee we can get zspage from handle safely */
> +        pool_read_lock(pool);
> +        obj = handle_to_obj(handle);
> +        obj_to_location(obj, &zpdesc, &obj_idx);
> +        zspage = get_zspage(zpdesc);
> +
> +        /* Make sure migration doesn't move any pages in this zspage */
> +        zspage_read_lock(zspage);
> +        pool_read_unlock(pool);
> +
> +        class = zspage_class(pool, zspage);
> +        off = offset_in_page(class->size * obj_idx);
> +
> +        if (off + class->size <= PAGE_SIZE) {
> +                /* this object is contained entirely within a page */
> +                void *dst = kmap_local_zpdesc(zpdesc);
> +
> +                if (!ZsHugePage(zspage))
> +                        off += ZS_HANDLE_SIZE;
> +                memcpy(dst + off, handle_mem, mem_len);
> +                kunmap_local(dst);
> +        } else {
> +                size_t sizes[2];
> +
> +                /* this object spans two pages */
> +                off += ZS_HANDLE_SIZE;

Are huge pages always stored in a single page?
If yes, can we just do this before the if block for both cases:

	if (!ZsHugePage(zspage))
		off += ZS_HANDLE_SIZE;

> +                sizes[0] = PAGE_SIZE - off;
> +                sizes[1] = mem_len - sizes[0];
> +
> +                memcpy_to_page(zpdesc_page(zpdesc), off,
> +                               handle_mem, sizes[0]);
> +                zpdesc = get_next_zpdesc(zpdesc);
> +                memcpy_to_page(zpdesc_page(zpdesc), 0,
> +                               handle_mem + sizes[0], sizes[1]);
> +        }
> +
> +        zspage_read_unlock(zspage);
> +}
> +EXPORT_SYMBOL_GPL(zs_obj_write);
> +
> +void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
> +                     void *handle_mem)
> +{
> +        struct zspage *zspage;
> +        struct zpdesc *zpdesc;
> +        unsigned long obj, off;
> +        unsigned int obj_idx;
> +        struct size_class *class;
> +
> +        obj = handle_to_obj(handle);
> +        obj_to_location(obj, &zpdesc, &obj_idx);
> +        zspage = get_zspage(zpdesc);
> +        class = zspage_class(pool, zspage);
> +        off = offset_in_page(class->size * obj_idx);
> +
> +        if (off + class->size <= PAGE_SIZE) {
> +                if (!ZsHugePage(zspage))
> +                        off += ZS_HANDLE_SIZE;
> +                handle_mem -= off;
> +                kunmap_local(handle_mem);
> +        }
> +
> +        zspage_read_unlock(zspage);
> +}
> +EXPORT_SYMBOL_GPL(zs_obj_read_end);
> +
> +void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> +                        void *local_copy)
> +{
> +        struct zspage *zspage;
> +        struct zpdesc *zpdesc;
> +        unsigned long obj, off;
> +        unsigned int obj_idx;
> +        struct size_class *class;
> +        void *addr;
> +
> +        WARN_ON(in_interrupt());
> +
> +        /* Guarantee we can get zspage from handle safely */
> +        pool_read_lock(pool);
> +        obj = handle_to_obj(handle);
> +        obj_to_location(obj, &zpdesc, &obj_idx);
> +        zspage = get_zspage(zpdesc);
> +
> +        /* Make sure migration doesn't move any pages in this zspage */
> +        zspage_read_lock(zspage);
> +        pool_read_unlock(pool);
> +
> +        class = zspage_class(pool, zspage);
> +        off = offset_in_page(class->size * obj_idx);
> +
> +        if (off + class->size <= PAGE_SIZE) {
> +                /* this object is contained entirely within a page */
> +                addr = kmap_local_zpdesc(zpdesc);
> +                addr += off;
> +        } else {
> +                size_t sizes[2];
> +
> +                /* this object spans two pages */
> +                sizes[0] = PAGE_SIZE - off;
> +                sizes[1] = class->size - sizes[0];
> +                addr = local_copy;
> +
> +                memcpy_from_page(addr, zpdesc_page(zpdesc),
> +                                 off, sizes[0]);
> +                zpdesc = get_next_zpdesc(zpdesc);
> +                memcpy_from_page(addr + sizes[0],
> +                                 zpdesc_page(zpdesc),
> +                                 0, sizes[1]);
> +        }
> +
> +        if (!ZsHugePage(zspage))
> +                addr += ZS_HANDLE_SIZE;
> +
> +        return addr;
> +}
> +EXPORT_SYMBOL_GPL(zs_obj_read_begin);
> +
>  /**
>   * zs_huge_class_size() - Returns the size (in bytes) of the first huge
>   * zsmalloc &size_class.
> -- 
> 2.48.1.262.g85cc9f2d1e-goog
> 
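For completeness, a rough sketch of how a zram/zswap-style caller could
drive the new API. Only the zs_obj_read_begin()/zs_obj_read_end()/
zs_obj_write() calls come from this patch; the toy_* helpers and the
caller-supplied local_copy buffer are made up for illustration:

	#include <linux/zsmalloc.h>
	#include <linux/string.h>

	/*
	 * Illustrative only. local_copy must be large enough to hold
	 * the object (e.g. a per-CPU PAGE_SIZE buffer); read_begin()
	 * falls back to it only when the object spans two pages.
	 */
	static void toy_read_obj(struct zs_pool *pool, unsigned long handle,
				 void *dst, size_t obj_len, void *local_copy)
	{
		/* Either a kmap_local address or local_copy. */
		void *obj = zs_obj_read_begin(pool, handle, local_copy);

		memcpy(dst, obj, obj_len);
		zs_obj_read_end(pool, handle, obj);
	}

	static void toy_write_obj(struct zs_pool *pool, unsigned long handle,
				  void *src, size_t obj_len)
	{
		/* Single call, copies straight into the pool pages. */
		zs_obj_write(pool, handle, src, obj_len);
	}

Compared to zs_map_object()/zs_unmap_object(), the write path needs no
per-CPU bounce buffer at all, and the read path only copies into
local_copy for objects that cross a page boundary.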