From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 22 Oct 2024 17:56:58 -0700
Subject: Re: [RFC PATCH v1 00/13] zswap IAA compress batching
To: Kanchana P Sridhar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com, zanussi@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mcgrof@kernel.org, kees@kernel.org, joel.granados@kernel.org, bfoster@redhat.com, willy@infradead.org, linux-fsdevel@vger.kernel.org, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
In-Reply-To: <20241018064101.336232-1-kanchana.p.sridhar@intel.com>
References: <20241018064101.336232-1-kanchana.p.sridhar@intel.com>
On Thu, Oct 17, 2024 at 11:41 PM Kanchana P Sridhar wrote:
>
>
> IAA Compression Batching:
> =========================
>
> This RFC patch-series introduces the use of the Intel Analytics Accelerator
> (IAA) for parallel compression of pages in a folio, and for batched reclaim
> of hybrid any-order batches of folios in shrink_folio_list().
>
> The patch-series is organized as follows:
>
> 1) iaa_crypto driver enablers for batching: Relevant patches are tagged
>    with "crypto:" in the subject:
>
>    a) async poll crypto_acomp interface without interrupts.
>    b) crypto testmgr acomp poll support.
>    c) Modifying the default sync_mode to "async" and disabling
>       verify_compress by default, to facilitate users to run IAA easily for
>       comparison with software compressors.
>    d) Changing the cpu-to-iaa mappings to more evenly balance cores to IAA
>       devices.
>    e) Addition of a "global_wq" per IAA, which can be used as a global
>       resource for the socket. If the user configures 2 WQs per IAA device,
>       the driver will distribute compress jobs from all cores on the
>       socket to the "global_wqs" of all the IAA devices on that socket, in
>       a round-robin manner. This can be used to improve compression
>       throughput for workloads that see a lot of swapout activity.
>
> 2) Migrating zswap to use async poll in zswap_compress()/decompress().
> 3) A centralized batch compression API that can be used by swap modules.
> 4) IAA compress batching within large folio zswap stores.
> 5) IAA compress batching of any-order hybrid folios in
>    shrink_folio_list(). The newly added "sysctl vm.compress-batchsize"
>    parameter can be used to configure the number of folios in [1, 32] to
>    be reclaimed using compress batching.

I am still digesting this series but I have some high-level questions
that I left on some patches. My intuition though is that we should drop
(5) from the initial proposal as it's most controversial. Batching
reclaim of unrelated folios through zswap *might* make sense, but it
needs a broader conversation and it needs justification on its own
merit, without the rest of the series.
>
> IAA compress batching can be enabled only on platforms that have IAA, by
> setting this config variable:
>
>   CONFIG_ZSWAP_STORE_BATCHING_ENABLED="y"
>
> The performance testing data with usemem 30 instances shows throughput
> gains of up to 40%, elapsed time reduction of up to 22% and sys time
> reduction of up to 30% with IAA compression batching.
>
> Our internal validation of IAA compress/decompress batching in highly
> contended Sapphire Rapids server setups, with workloads running on 72 cores
> for ~25 minutes under stringent memory limit constraints, has shown up to
> 50% reduction in sys time and 3.5% reduction in workload run time as
> compared to software compressors.
>
>
> System setup for testing:
> =========================
> Testing of this patch-series was done with mm-unstable as of 10-16-2024,
> commit 817952b8be34, without and with this patch-series.
> Data was gathered on an Intel Sapphire Rapids server, dual-socket, 56 cores
> per socket, 4 IAA devices per socket, 503 GiB RAM and a 525G SSD disk
> partition as swap. Core frequency was fixed at 2500MHz.
>
> The vm-scalability "usemem" test was run in a cgroup whose memory.high
> was fixed at 150G. There is no swap limit set for the cgroup. 30 usemem
> processes were run, each allocating and writing 10G of memory, and sleeping
> for 10 sec before exiting:
>
>   usemem --init-time -w -O -s 10 -n 30 10g
>
> Other kernel configuration parameters:
>
>   zswap compressor : deflate-iaa
>   zswap allocator  : zsmalloc
>   vm.page-cluster  : 2,4
>
> IAA "compression verification" is disabled and the async poll acomp
> interface is used in the iaa_crypto driver (the defaults with this
> series).
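For anyone trying to reproduce the setup above, the knobs map roughly to the commands below. The zswap parameter paths and cgroup v2 `memory.high` are standard; `vm.compress-batchsize` is the new sysctl introduced by this series, and the cgroup name is an arbitrary example.

```shell
# zswap configuration used in the runs above
echo deflate-iaa > /sys/module/zswap/parameters/compressor
echo zsmalloc    > /sys/module/zswap/parameters/zpool

# swap readahead window; the tables report runs with both 2 and 4
sysctl vm.page-cluster=2

# new knob from this series: folios per reclaim batch, range [1, 32]
sysctl vm.compress-batchsize=32

# cgroup v2 limit the workload ran under (no swap limit was set)
echo 150G > /sys/fs/cgroup/test/memory.high

# 30 usemem processes, 10G each, sleeping 10s before exiting
usemem --init-time -w -O -s 10 -n 30 10g
```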
>
>
> Performance testing (usemem30):
> ===============================
>
> 4K folios: deflate-iaa:
> =======================
>
> -------------------------------------------------------------------------------
>                        mm-unstable-     shrink_folio_list()  shrink_folio_list()
>                        10-16-2024       batching of folios   batching of folios
> -------------------------------------------------------------------------------
> zswap compressor       deflate-iaa      deflate-iaa          deflate-iaa
> vm.compress-batchsize  n/a              1                    32
> vm.page-cluster        2                2                    2
> -------------------------------------------------------------------------------
> Total throughput       4,470,466        5,770,824            6,363,045
>  (KB/s)
> Average throughput     149,015          192,360              212,101
>  (KB/s)
> elapsed time           119.24           100.96               92.99
>  (sec)
> sys time (sec)         2,819.29         2,168.08             1,970.79
>
> -------------------------------------------------------------------------------
> memcg_high             668,185          646,357              613,421
> memcg_swap_fail        0                0                    0
> zswpout                62,991,796       58,275,673           53,070,201
> zswpin                 431              415                  396
> pswpout                0                0                    0
> pswpin                 0                0                    0
> thp_swpout             0                0                    0
> thp_swpout_fallback    0                0                    0
> pgmajfault             3,137            3,085                3,440
> swap_ra                99               100                  95
> swap_ra_hit            42               44                   45
> -------------------------------------------------------------------------------
>
>
> 16k/32k/64k folios: deflate-iaa:
> ================================
> All three large folio sizes 16k/32k/64k were enabled to "always".
>
> -------------------------------------------------------------------------------
>                     mm-unstable-  zswap_store() +  shrink_folio_list()
>                     10-16-2024    batching of      batching of folios
>                                   pages in
>                                   large folios
> -------------------------------------------------------------------------------
> zswap compr         deflate-iaa   deflate-iaa   deflate-iaa
> vm.compress-        n/a           n/a           4           8           16
>  batchsize
> vm.page-            2             2             2           2           2
>  cluster
> -------------------------------------------------------------------------------
> Total throughput    7,182,198     8,448,994     8,584,728   8,729,643   8,775,944
>  (KB/s)
> Avg throughput      239,406       281,633       286,157     290,988     292,531
>  (KB/s)
> elapsed time        85.04         77.84         77.03       75.18       74.98
>  (sec)
> sys time (sec)      1,730.77      1,527.40      1,528.52    1,473.76    1,465.97
>
> -------------------------------------------------------------------------------
> memcg_high          648,125       694,188       696,004     699,728     724,887
> memcg_swap_fail     1,550         2,540         1,627       1,577       1,517
> zswpout             57,606,876    56,624,450    56,125,082  55,999,42   57,352,204
> zswpin              421           406           422         400         437
> pswpout             0             0             0           0           0
> pswpin              0             0             0           0           0
> thp_swpout          0             0             0           0           0
> thp_swpout_fallback 0             0             0           0           0
> 16kB-mthp_swpout_   0             0             0           0           0
>  fallback
> 32kB-mthp_swpout_   0             0             0           0           0
>  fallback
> 64kB-mthp_swpout_   1,550         2,539         1,627       1,577       1,517
>  fallback
> pgmajfault          3,102         3,126         3,473       3,454       3,134
> swap_ra             107           144           109         124         181
> swap_ra_hit         51            88            45          66          107
> ZSWPOUT-16kB        2             3             4           4           3
> ZSWPOUT-32kB        0             2             1           1           0
> ZSWPOUT-64kB        3,598,889     3,536,556     3,506,134   3,498,324   3,582,921
> SWPOUT-16kB         0             0             0           0           0
> SWPOUT-32kB         0             0             0           0           0
> SWPOUT-64kB         0             0             0           0           0
> -------------------------------------------------------------------------------
>
>
> 2M folios: deflate-iaa:
> =======================
>
> -------------------------------------------------------------------------------
>                         mm-unstable-10-16-2024   zswap_store() batching of pages
>                                                  in pmd-mappable folios
>
> -------------------------------------------------------------------------------
> zswap compressor        deflate-iaa              deflate-iaa
> vm.compress-batchsize   n/a                      n/a
> vm.page-cluster         2                        2
> -------------------------------------------------------------------------------
> Total throughput        7,444,592                8,916,349
>  (KB/s)
> Average throughput      248,153                  297,211
>  (KB/s)
> elapsed time            86.29                    73.44
>  (sec)
> sys time (sec)          1,833.21                 1,418.58
>
> -------------------------------------------------------------------------------
> memcg_high              81,786                   89,905
> memcg_swap_fail         82                       395
> zswpout                 58,874,092               57,721,884
> zswpin                  422                      458
> pswpout                 0                        0
> pswpin                  0                        0
> thp_swpout              0                        0
> thp_swpout_fallback     82                       394
> pgmajfault              14,864                   21,544
> swap_ra                 34,953                   53,751
> swap_ra_hit             34,895                   53,660
> ZSWPOUT-2048kB          114,815                  112,269
> SWPOUT-2048kB           0                        0
> -------------------------------------------------------------------------------
>
> Since 4K folios account for ~0.4% of all zswapouts when pmd-mappable folios
> are enabled for usemem30, we cannot expect much improvement from reclaim
> batching.
>
>
> Performance testing (Kernel compilation):
> =========================================
>
> As mentioned earlier, for workloads that see a lot of swapout activity, we
> can benefit from configuring 2 WQs per IAA device, with compress jobs from
> all same-socket cores being distributed to the wq.1 of all IAAs on the
> socket, with the "global_wq" developed in this patch-series.
>
> Although this data includes IAA decompress batching, which will be
> submitted as a separate RFC patch-series, I am listing it here to quantify
> the benefit of distributing compress jobs among all IAAs.
> The kernel
> compilation test with "allmodconfig" is able to quantify this well:
>
>
> 4K folios: deflate-iaa: kernel compilation to quantify crypto patches
> =====================================================================
>
> ------------------------------------------------------------------------------
>                        IAA shrink_folio_list() compress batching and
>                        swapin_readahead() decompress batching
>
>                        1WQ                  2WQ (distribute compress jobs)
>
>                        1 local WQ (wq.0)    1 local WQ (wq.0) +
>                        per IAA              1 global WQ (wq.1) per IAA
>
> ------------------------------------------------------------------------------
> zswap compressor       deflate-iaa          deflate-iaa
> vm.compress-batchsize  32                   32
> vm.page-cluster        4                    4
> ------------------------------------------------------------------------------
> real_sec               746.77               745.42
> user_sec               15,732.66            15,738.85
> sys_sec                5,384.14             5,247.86
> Max_Res_Set_Size_KB    1,874,432            1,872,640
>
> ------------------------------------------------------------------------------
> zswpout                101,648,460          104,882,982
> zswpin                 27,418,319           29,428,515
> pswpout                213                  22
> pswpin                 207                  6
> pgmajfault             21,896,616           23,629,768
> swap_ra                6,054,409            6,385,080
> swap_ra_hit            3,791,628            3,985,141
> ------------------------------------------------------------------------------
>
> The iaa_crypto wq stats will show almost the same number of compress calls
> for wq.1 of all IAA devices. wq.0 will handle decompress calls exclusively.
> We see a latency reduction of 2.5% by distributing compress jobs among all
> IAA devices on the socket.
>
> I would greatly appreciate code review comments for the iaa_crypto driver
> and mm patches included in this series!
>
> Thanks,
> Kanchana
>
>
> Kanchana P Sridhar (13):
>   crypto: acomp - Add a poll() operation to acomp_alg and acomp_req
>   crypto: iaa - Add support for irq-less crypto async interface
>   crypto: testmgr - Add crypto testmgr acomp poll support.
>   mm: zswap: zswap_compress()/decompress() can submit, then poll an
>     acomp_req.
>   crypto: iaa - Make async mode the default.
>   crypto: iaa - Disable iaa_verify_compress by default.
>   crypto: iaa - Change cpu-to-iaa mappings to evenly balance cores to
>     IAAs.
>   crypto: iaa - Distribute compress jobs to all IAA devices on a NUMA
>     node.
>   mm: zswap: Config variable to enable compress batching in
>     zswap_store().
>   mm: zswap: Create multiple reqs/buffers in crypto_acomp_ctx if
>     platform has IAA.
>   mm: swap: Add IAA batch compression API
>     swap_crypto_acomp_compress_batch().
>   mm: zswap: Compress batching with Intel IAA in zswap_store() of large
>     folios.
>   mm: vmscan, swap, zswap: Compress batching of folios in
>     shrink_folio_list().
>
>  crypto/acompress.c                         |   1 +
>  crypto/testmgr.c                           |  70 +-
>  drivers/crypto/intel/iaa/iaa_crypto_main.c | 467 +++++++++++--
>  include/crypto/acompress.h                 |  18 +
>  include/crypto/internal/acompress.h        |   1 +
>  include/linux/fs.h                         |   2 +
>  include/linux/mm.h                         |   8 +
>  include/linux/writeback.h                  |   5 +
>  include/linux/zswap.h                      | 106 +++
>  kernel/sysctl.c                            |   9 +
>  mm/Kconfig                                 |  12 +
>  mm/page_io.c                               | 152 +++-
>  mm/swap.c                                  |  15 +
>  mm/swap.h                                  |  96 +++
>  mm/swap_state.c                            | 115 +++
>  mm/vmscan.c                                | 154 +++-
>  mm/zswap.c                                 | 771 +++++++++++++++++++--
>  17 files changed, 1870 insertions(+), 132 deletions(-)
>
>
> base-commit: 817952b8be34aad40e07f6832fb9d1fc08961550
> --
> 2.27.0