Date: Wed, 12 Feb 2025 15:35:59 +0000
From: Yosry Ahmed
To: Sergey Senozhatsky
Cc: Andrew Morton, Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kairui Song
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
References: <6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv> <6uhsj4bckhursiblkxe54azfgyqal6tq2de3lpkxw6omkised6@uylodcjruuei>

On Wed, Feb 12, 2025 at 02:00:26PM +0900, Sergey Senozhatsky wrote:
> On (25/02/07 21:09), Yosry Ahmed wrote:
> > Can we do some perf testing to make sure this custom locking is not
> > regressing performance (selfishly I'd like some zswap testing too)?
>
> So for zsmalloc I (usually) write some simple testing code which is
> triggered via sysfs (device attr) and that is completely reproducible,
> so that I compares apples to apples.  In this particular case I just
> have a loop that creates objects (we don't need to compress or decompress
> anything, zsmalloc doesn't really care)
>
> - echo 1 > /sys/ ... / test_prepare
>
> 	for (sz = 32; sz < PAGE_SIZE; sz += 64) {
> 		for (i = 0; i < 4096; i++) {
> 			ent->handle = zs_malloc(zram->mem_pool, sz)
> 			list_add(ent)
> 		}
> 	}
>
>
> And now I just `perf stat` writes:
>
> - perf stat echo 1 > /sys/ ... / test_exec_old
>
> 	list_for_each_entry
> 		zs_map_object(ent->handle, ZS_MM_RO);
> 		zs_unmap_object(ent->handle)
>
> 	list_for_each_entry
> 		dst = zs_map_object(ent->handle, ZS_MM_WO);
> 		memcpy(dst, tmpbuf, ent->sz)
> 		zs_unmap_object(ent->handle)
>
>
>
> - perf stat echo 1 > /sys/ ... / test_exec_new
>
> 	list_for_each_entry
> 		dst = zs_obj_read_begin(ent->handle, loc);
> 		zs_obj_read_end(ent->handle, dst);
>
> 	list_for_each_entry
> 		zs_obj_write(ent->handle, tmpbuf, ent->sz);
>
>
> - echo 1 > /sys/ ... / test_finish
>
> 	free all handles and ent-s
>
>
> The nice part is that we don't depend on any of the upper layers, we
> don't even need to compress/decompress anything; we allocate objects
> of required sizes and memcpy static data there (zsmalloc doesn't have
> any opinion on that) and that's pretty much it.
>
>
> OLD API
> =======
>
> 10 runs
>
>        369,205,778      instructions                     #    0.80  insn per cycle
>         40,467,926      branches                         #  113.732 M/sec
>
>        369,002,122      instructions                     #    0.62  insn per cycle
>         40,426,145      branches                         #  189.361 M/sec
>
>        369,051,170      instructions                     #    0.45  insn per cycle
>         40,434,677      branches                         #  157.574 M/sec
>
>        369,014,522      instructions                     #    0.63  insn per cycle
>         40,427,754      branches                         #  201.464 M/sec
>
>        369,019,179      instructions                     #    0.64  insn per cycle
>         40,429,327      branches                         #  198.321 M/sec
>
>        368,973,095      instructions                     #    0.64  insn per cycle
>         40,419,245      branches                         #  234.210 M/sec
>
>        368,950,705      instructions                     #    0.64  insn per cycle
>         40,414,305      branches                         #  231.460 M/sec
>
>        369,041,288      instructions                     #    0.46  insn per cycle
>         40,432,599      branches                         #  155.576 M/sec
>
>        368,964,080      instructions                     #    0.67  insn per cycle
>         40,417,025      branches                         #  245.665 M/sec
>
>        369,036,706      instructions                     #    0.63  insn per cycle
>         40,430,860      branches                         #  204.105 M/sec
>
>
> NEW API
> =======
>
> 10 runs
>
>        265,799,293      instructions                     #    0.51  insn per cycle
>         29,834,567      branches                         #  170.281 M/sec
>
>        265,765,970      instructions                     #    0.55  insn per cycle
>         29,829,019      branches                         #  161.602 M/sec
>
>        265,764,702      instructions                     #    0.51  insn per cycle
>         29,828,015      branches                         #  189.677 M/sec
>
>        265,836,506      instructions                     #    0.38  insn per cycle
>         29,840,650      branches                         #  124.237 M/sec
>
>        265,836,061      instructions                     #    0.36  insn per cycle
>         29,842,285      branches                         #  137.670 M/sec
>
>        265,887,080      instructions                     #    0.37  insn per cycle
>         29,852,881      branches                         #  126.060 M/sec
>
>        265,769,869      instructions                     #    0.57  insn per cycle
>         29,829,873      branches                         #  210.157 M/sec
>
>        265,803,732      instructions                     #    0.58  insn per cycle
>         29,835,391      branches                         #  186.940 M/sec
>
>        265,766,624      instructions                     #    0.58  insn per cycle
>         29,827,537      branches                         #  212.609 M/sec
>
>        265,843,597      instructions                     #    0.57  insn per cycle
>         29,843,650      branches                         #  171.877 M/sec
>
>
> x old-api-insn
> + new-api-insn
> +-------------------------------------------------------------------------------------+
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |A                                                                                   A|
> +-------------------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x  10  3.689507e+08 3.6920578e+08 3.6901918e+08 3.6902586e+08     71765.519
> +  10  2.657647e+08 2.6588708e+08 2.6580373e+08 2.6580734e+08     42187.024
> Difference at 95.0% confidence
> 	-1.03219e+08 +/- 55308.7
> 	-27.9705% +/- 0.0149878%
> 	(Student's t, pooled s = 58864.4)

Thanks for sharing these results, but I wonder if this will capture
regressions from locking changes (e.g. a lock being preemptible)? IIUC
this is counting the instructions executed in these paths, and that
won't change if the task gets preempted. Lock contention may be captured
as extra instructions, but I am not sure we'll directly see its effect
in terms of serialization and delays.

I think we also need some high-level testing (e.g. concurrent
swapins/swapouts) to find that out. I think that's what Kairui's testing
covers.
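
To illustrate the concern about instruction counts, here is a minimal
userspace sketch. It is not kernel code and not part of the patch series.
It assumes a waiter that sleeps on the lock (Python's threading.Lock
stands in for a sleeping lock), and measure_blocked_waiter is a
hypothetical helper name chosen for this example. The main thread holds
the lock while sleeping, which roughly models a holder that got preempted;
the waiter then records both its wall-clock time and its own CPU time.

```python
import threading
import time
from typing import Tuple


def measure_blocked_waiter(hold_seconds: float) -> Tuple[float, float]:
    """Return (wall_seconds, cpu_seconds) a waiter spends acquiring a held lock.

    Hypothetical helper for illustration only; it models a sleeping waiter,
    not a spinning one.
    """
    lock = threading.Lock()
    result = {}

    def waiter() -> None:
        wall_start = time.perf_counter()
        cpu_start = time.thread_time()  # CPU time consumed by this thread only
        with lock:  # blocks (sleeps) until the holder releases the lock
            pass
        result["wall"] = time.perf_counter() - wall_start
        result["cpu"] = time.thread_time() - cpu_start

    lock.acquire()  # the main thread plays the lock holder
    thread = threading.Thread(target=waiter)
    thread.start()
    time.sleep(hold_seconds)  # holder is off-CPU while still owning the lock
    lock.release()
    thread.join()
    return result["wall"], result["cpu"]


if __name__ == "__main__":
    wall, cpu = measure_blocked_waiter(0.2)
    print(f"waiter wall time: {wall * 1e3:.1f} ms, CPU time: {cpu * 1e3:.3f} ms")
```

In this sketch the waiter's wall time tracks how long the lock was held,
while its CPU time stays close to zero, so a counter tied to executed work
would likely miss the delay. A spinning waiter would behave differently
and burn CPU while waiting, which is the "extra instructions" case.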