From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 22 Oct 2024 17:56:58 -0700
Subject: Re: [RFC PATCH v1 00/13] zswap IAA compress batching
To: Kanchana P Sridhar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com, zanussi@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, mcgrof@kernel.org, kees@kernel.org, joel.granados@kernel.org, bfoster@redhat.com, willy@infradead.org, linux-fsdevel@vger.kernel.org, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
In-Reply-To: <20241018064101.336232-1-kanchana.p.sridhar@intel.com>
References: <20241018064101.336232-1-kanchana.p.sridhar@intel.com>
On Thu, Oct 17, 2024 at 11:41 PM Kanchana P Sridhar wrote:
>
>
> IAA Compression Batching:
> =========================
>
> This RFC patch-series introduces the use of the Intel Analytics Accelerator
> (IAA) for parallel compression of pages in a folio, and for batched reclaim
> of hybrid any-order batches of folios in shrink_folio_list().
>
> The patch-series is organized as follows:
>
> 1) iaa_crypto driver enablers for batching: Relevant patches are tagged
>    with "crypto:" in the subject:
>
>    a) async poll crypto_acomp interface without interrupts.
>    b) crypto testmgr acomp poll support.
>    c) Modifying the default sync_mode to "async" and disabling
>       verify_compress by default, to facilitate users to run IAA easily for
>       comparison with software compressors.
>    d) Changing the cpu-to-iaa mappings to more evenly balance cores to IAA
>       devices.
>    e) Addition of a "global_wq" per IAA, which can be used as a global
>       resource for the socket. If the user configures 2 WQs per IAA device,
>       the driver will distribute compress jobs from all cores on the
>       socket to the "global_wqs" of all the IAA devices on that socket, in
>       a round-robin manner. This can be used to improve compression
>       throughput for workloads that see a lot of swapout activity.
>
> 2) Migrating zswap to use async poll in zswap_compress()/decompress().
> 3) A centralized batch compression API that can be used by swap modules.
> 4) IAA compress batching within large folio zswap stores.
> 5) IAA compress batching of any-order hybrid folios in
>    shrink_folio_list(). The newly added "sysctl vm.compress-batchsize"
>    parameter can be used to configure the number of folios in [1, 32] to
>    be reclaimed using compress batching.

I am still digesting this series but I have some high-level questions
that I left on some patches. My intuition though is that we should drop
(5) from the initial proposal as it's most controversial. Batching
reclaim of unrelated folios through zswap *might* make sense, but it
needs a broader conversation and it needs justification on its own
merit, without the rest of the series.
>
> IAA compress batching can be enabled only on platforms that have IAA, by
> setting this config variable:
>
>   CONFIG_ZSWAP_STORE_BATCHING_ENABLED="y"
>
> The performance testing data with usemem 30 instances shows throughput
> gains of up to 40%, elapsed time reduction of up to 22% and sys time
> reduction of up to 30% with IAA compression batching.
>
> Our internal validation of IAA compress/decompress batching in highly
> contended Sapphire Rapids server setups, with workloads running on 72 cores
> for ~25 minutes under stringent memory limit constraints, has shown up to
> 50% reduction in sys time and 3.5% reduction in workload run time as
> compared to software compressors.
>
>
> System setup for testing:
> =========================
> Testing of this patch-series was done with mm-unstable as of 10-16-2024,
> commit 817952b8be34, without and with this patch-series.
> Data was gathered on an Intel Sapphire Rapids server, dual-socket, 56 cores
> per socket, 4 IAA devices per socket, 503 GiB RAM and a 525G SSD disk
> partition as swap. Core frequency was fixed at 2500MHz.
>
> The vm-scalability "usemem" test was run in a cgroup whose memory.high
> was fixed at 150G. There is no swap limit set for the cgroup. 30 usemem
> processes were run, each allocating and writing 10G of memory, and sleeping
> for 10 sec before exiting:
>
>   usemem --init-time -w -O -s 10 -n 30 10g
>
> Other kernel configuration parameters:
>
>   zswap compressor : deflate-iaa
>   zswap allocator  : zsmalloc
>   vm.page-cluster  : 2,4
>
> IAA "compression verification" is disabled and the async poll acomp
> interface is used in the iaa_crypto driver (the defaults with this
> series).
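For anyone trying to reproduce the setup above, the knobs map roughly to the commands below. The zswap parameter paths and cgroup v2 `memory.high` are standard; `vm.compress-batchsize` is the new sysctl introduced by this series, and the cgroup name is an arbitrary example.

```shell
# zswap configuration used in the runs above
echo deflate-iaa > /sys/module/zswap/parameters/compressor
echo zsmalloc    > /sys/module/zswap/parameters/zpool

# swap readahead window; the tables report runs with both 2 and 4
sysctl vm.page-cluster=2

# new knob from this series: folios per reclaim batch, range [1, 32]
sysctl vm.compress-batchsize=32

# cgroup v2 limit the workload ran under (no swap limit was set)
echo 150G > /sys/fs/cgroup/test/memory.high

# 30 usemem processes, 10G each, sleeping 10s before exiting
usemem --init-time -w -O -s 10 -n 30 10g
```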
>
>
> Performance testing (usemem30):
> ===============================
>
> 4K folios: deflate-iaa:
> =======================
>
> -------------------------------------------------------------------------------
>                        mm-unstable-     shrink_folio_list()  shrink_folio_list()
>                        10-16-2024       batching of folios   batching of folios
> -------------------------------------------------------------------------------
> zswap compressor       deflate-iaa      deflate-iaa          deflate-iaa
> vm.compress-batchsize  n/a              1                    32
> vm.page-cluster        2                2                    2
> -------------------------------------------------------------------------------
> Total throughput       4,470,466        5,770,824            6,363,045
>  (KB/s)
> Average throughput     149,015          192,360              212,101
>  (KB/s)
> elapsed time           119.24           100.96               92.99
>  (sec)
> sys time (sec)         2,819.29         2,168.08             1,970.79
>
> -------------------------------------------------------------------------------
> memcg_high             668,185          646,357              613,421
> memcg_swap_fail        0                0                    0
> zswpout                62,991,796       58,275,673           53,070,201
> zswpin                 431              415                  396
> pswpout                0                0                    0
> pswpin                 0                0                    0
> thp_swpout             0                0                    0
> thp_swpout_fallback    0                0                    0
> pgmajfault             3,137            3,085                3,440
> swap_ra                99               100                  95
> swap_ra_hit            42               44                   45
> -------------------------------------------------------------------------------
>
>
> 16k/32k/64k folios: deflate-iaa:
> ================================
> All three large folio sizes 16k/32k/64k were enabled to "always".
>
> -------------------------------------------------------------------------------
>                     mm-unstable-  zswap_store() +  shrink_folio_list()
>                     10-16-2024    batching of      batching of folios
>                                   pages in
>                                   large folios
> -------------------------------------------------------------------------------
> zswap compr         deflate-iaa   deflate-iaa   deflate-iaa
> vm.compress-        n/a           n/a           4           8           16
>  batchsize
> vm.page-            2             2             2           2           2
>  cluster
> -------------------------------------------------------------------------------
> Total throughput    7,182,198     8,448,994     8,584,728   8,729,643   8,775,944
>  (KB/s)
> Avg throughput      239,406       281,633       286,157     290,988     292,531
>  (KB/s)
> elapsed time        85.04         77.84         77.03       75.18       74.98
>  (sec)
> sys time (sec)      1,730.77      1,527.40      1,528.52    1,473.76    1,465.97
>
> -------------------------------------------------------------------------------
> memcg_high          648,125       694,188       696,004     699,728     724,887
> memcg_swap_fail     1,550         2,540         1,627       1,577       1,517
> zswpout             57,606,876    56,624,450    56,125,082  55,999,42   57,352,204
> zswpin              421           406           422         400         437
> pswpout             0             0             0           0           0
> pswpin              0             0             0           0           0
> thp_swpout          0             0             0           0           0
> thp_swpout_fallback 0             0             0           0           0
> 16kB-mthp_swpout_   0             0             0           0           0
>  fallback
> 32kB-mthp_swpout_   0             0             0           0           0
>  fallback
> 64kB-mthp_swpout_   1,550         2,539         1,627       1,577       1,517
>  fallback
> pgmajfault          3,102         3,126         3,473       3,454       3,134
> swap_ra             107           144           109         124         181
> swap_ra_hit         51            88            45          66          107
> ZSWPOUT-16kB        2             3             4           4           3
> ZSWPOUT-32kB        0             2             1           1           0
> ZSWPOUT-64kB        3,598,889     3,536,556     3,506,134   3,498,324   3,582,921
> SWPOUT-16kB         0             0             0           0           0
> SWPOUT-32kB         0             0             0           0           0
> SWPOUT-64kB         0             0             0           0           0
> -------------------------------------------------------------------------------
>
>
> 2M folios: deflate-iaa:
> =======================
>
> -------------------------------------------------------------------------------
>                         mm-unstable-10-16-2024   zswap_store() batching of pages
>                                                  in pmd-mappable folios
>
> -------------------------------------------------------------------------------
> zswap compressor        deflate-iaa              deflate-iaa
> vm.compress-batchsize   n/a                      n/a
> vm.page-cluster         2                        2
> -------------------------------------------------------------------------------
> Total throughput        7,444,592                8,916,349
>  (KB/s)
> Average throughput      248,153                  297,211
>  (KB/s)
> elapsed time            86.29                    73.44
>  (sec)
> sys time (sec)          1,833.21                 1,418.58
>
> -------------------------------------------------------------------------------
> memcg_high              81,786                   89,905
> memcg_swap_fail         82                       395
> zswpout                 58,874,092               57,721,884
> zswpin                  422                      458
> pswpout                 0                        0
> pswpin                  0                        0
> thp_swpout              0                        0
> thp_swpout_fallback     82                       394
> pgmajfault              14,864                   21,544
> swap_ra                 34,953                   53,751
> swap_ra_hit             34,895                   53,660
> ZSWPOUT-2048kB          114,815                  112,269
> SWPOUT-2048kB           0                        0
> -------------------------------------------------------------------------------
>
> Since 4K folios account for ~0.4% of all zswapouts when pmd-mappable folios
> are enabled for usemem30, we cannot expect much improvement from reclaim
> batching.
>
>
> Performance testing (Kernel compilation):
> =========================================
>
> As mentioned earlier, for workloads that see a lot of swapout activity, we
> can benefit from configuring 2 WQs per IAA device, with compress jobs from
> all same-socket cores being distributed to the wq.1 of all IAAs on the
> socket, with the "global_wq" developed in this patch-series.
>
> Although this data includes IAA decompress batching, which will be
> submitted as a separate RFC patch-series, I am listing it here to quantify
> the benefit of distributing compress jobs among all IAAs.
> The kernel
> compilation test with "allmodconfig" is able to quantify this well:
>
>
> 4K folios: deflate-iaa: kernel compilation to quantify crypto patches
> =====================================================================
>
> ------------------------------------------------------------------------------
>                        IAA shrink_folio_list() compress batching and
>                        swapin_readahead() decompress batching
>
>                        1WQ                  2WQ (distribute compress jobs)
>
>                        1 local WQ (wq.0)    1 local WQ (wq.0) +
>                        per IAA              1 global WQ (wq.1) per IAA
>
> ------------------------------------------------------------------------------
> zswap compressor       deflate-iaa          deflate-iaa
> vm.compress-batchsize  32                   32
> vm.page-cluster        4                    4
> ------------------------------------------------------------------------------
> real_sec               746.77               745.42
> user_sec               15,732.66            15,738.85
> sys_sec                5,384.14             5,247.86
> Max_Res_Set_Size_KB    1,874,432            1,872,640
>
> ------------------------------------------------------------------------------
> zswpout                101,648,460          104,882,982
> zswpin                 27,418,319           29,428,515
> pswpout                213                  22
> pswpin                 207                  6
> pgmajfault             21,896,616           23,629,768
> swap_ra                6,054,409            6,385,080
> swap_ra_hit            3,791,628            3,985,141
> ------------------------------------------------------------------------------
>
> The iaa_crypto wq stats will show almost the same number of compress calls
> for wq.1 of all IAA devices. wq.0 will handle decompress calls exclusively.
> We see a latency reduction of 2.5% by distributing compress jobs among all
> IAA devices on the socket.
>
> I would greatly appreciate code review comments for the iaa_crypto driver
> and mm patches included in this series!
>
> Thanks,
> Kanchana
>
>
> Kanchana P Sridhar (13):
>   crypto: acomp - Add a poll() operation to acomp_alg and acomp_req
>   crypto: iaa - Add support for irq-less crypto async interface
>   crypto: testmgr - Add crypto testmgr acomp poll support.
>   mm: zswap: zswap_compress()/decompress() can submit, then poll an
>     acomp_req.
>   crypto: iaa - Make async mode the default.
>   crypto: iaa - Disable iaa_verify_compress by default.
>   crypto: iaa - Change cpu-to-iaa mappings to evenly balance cores to
>     IAAs.
>   crypto: iaa - Distribute compress jobs to all IAA devices on a NUMA
>     node.
>   mm: zswap: Config variable to enable compress batching in
>     zswap_store().
>   mm: zswap: Create multiple reqs/buffers in crypto_acomp_ctx if
>     platform has IAA.
>   mm: swap: Add IAA batch compression API
>     swap_crypto_acomp_compress_batch().
>   mm: zswap: Compress batching with Intel IAA in zswap_store() of large
>     folios.
>   mm: vmscan, swap, zswap: Compress batching of folios in
>     shrink_folio_list().
>
>  crypto/acompress.c                         |   1 +
>  crypto/testmgr.c                           |  70 +-
>  drivers/crypto/intel/iaa/iaa_crypto_main.c | 467 +++++++++++--
>  include/crypto/acompress.h                 |  18 +
>  include/crypto/internal/acompress.h        |   1 +
>  include/linux/fs.h                         |   2 +
>  include/linux/mm.h                         |   8 +
>  include/linux/writeback.h                  |   5 +
>  include/linux/zswap.h                      | 106 +++
>  kernel/sysctl.c                            |   9 +
>  mm/Kconfig                                 |  12 +
>  mm/page_io.c                               | 152 +++-
>  mm/swap.c                                  |  15 +
>  mm/swap.h                                  |  96 +++
>  mm/swap_state.c                            | 115 +++
>  mm/vmscan.c                                | 154 +++-
>  mm/zswap.c                                 | 771 +++++++++++++++++++--
>  17 files changed, 1870 insertions(+), 132 deletions(-)
>
>
> base-commit: 817952b8be34aad40e07f6832fb9d1fc08961550
> --
> 2.27.0