From: Andre Glover
To: tom.zanussi@linux.intel.com, minchan@kernel.org,
    senozhatsky@chromium.org, hannes@cmpxchg.org, yosryahmed@google.com,
    nphamcs@gmail.com, chengming.zhou@linux.dev,
    herbert@gondor.apana.org.au, davem@davemloft.net,
    fenghua.yu@intel.com, dave.jiang@intel.com
Cc: wajdi.k.feghali@intel.com, james.guilford@intel.com,
    vinodh.gopal@intel.com, bala.seshasayee@intel.com,
    heath.caldwell@intel.com, kanchana.p.sridhar@intel.com,
    andre.glover@linux.intel.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, 21cnbao@gmail.com, ryan.roberts@arm.com,
    linux-crypto@vger.kernel.org, dmaengine@vger.kernel.org
Subject: [RFC PATCH 0/3] by_n compression and decompression with Intel IAA
Date: Wed, 1 May 2024 14:46:26 -0700

With the introduction of the 'canned' compression algorithm [1], we see
better latencies than 'dynamic' Deflate and a better compression ratio
than 'fixed' Deflate. When using IAA hardware-accelerated 'canned'
compression, we can take advantage of the IAA architecture and initiate
independent parallel operations for both compress and decompress.

In support of mTHP and large folio swap in/out, we have developed an
algorithm based on 'canned' compression, called 'canned-by_n'. It takes
advantage of the multiple compression and decompression engines in the
IAA hardware to parallelize each single compress and/or decompress
operation, thus greatly reducing latency.

When using the 'canned-by_n' algorithm, the user provides an input
buffer, an output buffer, and a parameter N.
The 'canned-by_n' crypto algorithm compresses (or decompresses) a single
input buffer into a single output buffer. This is done in such a way
that each compress or decompress operation can be split into up to N
operations that run in parallel, while the caller still sees one input
buffer and one output buffer.

Usage
=====

With the introduction of the 'canned-by_n' algorithm, the user would
simply do the following to initiate an operation:

  struct crypto_acomp *tfm;
  struct acomp_req *req;

  tfm = crypto_alloc_acomp("deflate-iaa-canned-by_n", 0, 0);
  ....
  // Ignored by non 'by_n' algorithms
  req->by_n = N;

  err = crypto_wait_req(crypto_acomp_compress(req), &wait);

In the above example, the only new initialization for an acomp_req is
to specify the by_n number N, where N is a power of 2 and 1 <= N <= 64
(64 is the current limit, but this can be changed to a greater value
based on the hardware capability).

Performance
===========

'Canned-by_n' compression shows promising performance improvements when
applied to recent patches pertaining to multi-sized THPs in mm-unstable
(7cca940d) -- swapping out the large folios and storing them in zram as
outlined in [2] and swapping them back in as large folios [3].

Our results with a simple madvise-based benchmark, swapping out/in
folios comprised of data folios collected from SPEC benchmarks, show a
more than 16x improvement in compression latency and close to 10x in
decompression latency over lzo-rle on 64KB mTHPs. This translates to a
greater than 10x improvement in zram write latency and a 7x improvement
in zram read latency. The achieved compression ratio, at 2.8, is better
than that of lzo-rle. These results were obtained with a by_n setting
of 8. See the table below for additional data.

With larger values of N, the latency of compression and decompression
drops, due to more parallelism. Concurrently, the overheads also
increase with larger N values, and start to dominate the cost after a
point (in the table below, latency stops improving beyond by_8).
Compression ratio also drops with larger values of N, due to the
increased splitting.

Performance comparison for each 64KB folio with zram on Sapphire Rapids,
whose core frequency is fixed at 2500MHz, is shown below:

+------------+-------------+---------+-------------+----------+----------+
|            | Compression | Decomp  | Compression | zram     | zram     |
| Algorithm  | latency     | latency | ratio       | write    | read     |
+------------+-------------+---------+-------------+----------+----------+
|            | Median (ns)           |             | Median (ns)         |
+------------+-------------+---------+-------------+----------+----------+
|            |             |         |             |          |          |
| IAA by_1   |      34,493 |  20,038 |        2.93 |   40,130 |   24,478 |
| IAA by_2   |      18,830 |  11,888 |        2.93 |   24,149 |   15,536 |
| IAA by_4   |      11,364 |   8,146 |        2.90 |   16,735 |   11,469 |
| IAA by_8   |       8,344 |   6,342 |        2.77 |   13,527 |    9,177 |
| IAA by_16  |       8,837 |   6,549 |        2.33 |   15,309 |    9,547 |
| IAA by_32  |      11,153 |   9,641 |        2.19 |   16,457 |   14,086 |
| IAA by_64  |      18,272 |  16,696 |        1.96 |   24,294 |   20,048 |
|            |             |         |             |          |          |
| lz4        |     139,190 |  33,687 |        2.40 |  144,940 |   37,312 |
|            |             |         |             |          |          |
| lzo-rle    |     138,235 |  61,055 |        2.52 |  143,666 |   64,321 |
|            |             |         |             |          |          |
| zstd       |     251,820 |  90,878 |        3.40 |  256,384 |   94,328 |
+------------+-------------+---------+-------------+----------+----------+

[1] https://lore.kernel.org/all/cover.1710969449.git.andre.glover@linux.intel.com/
[2] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
[3] https://lore.kernel.org/linux-mm/20240304081348.197341-1-21cnbao@gmail.com/

Andre Glover (3):
  crypto: Add pre_alloc and post_free callbacks for acomp algorithms
  crypto: add by_n attribute to acomp_req
  crypto: Add deflate-canned-byN algorithm to IAA

 crypto/acompress.c                         |  13 +
 drivers/crypto/intel/iaa/iaa_crypto.h      |   9 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 402 ++++++++++++++++++++-
 include/crypto/acompress.h                 |   4 +
 include/crypto/internal/acompress.h        |   6 +
 5 files changed, 421 insertions(+), 13 deletions(-)

-- 
2.27.0
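
As a quick cross-check of the speedups quoted in the Performance
section, the ratios can be recomputed from the by_8 and lzo-rle rows of
the table. The following is a small userspace C sketch and not part of
the patch series; the struct and function names (latency_row,
speedup_x100, print_speedups) are made up for illustration, and only
the numbers come from the table.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Median latencies in ns, copied from the by_8 and lzo-rle table rows. */
struct latency_row {
	const char *name;
	uint64_t compress_ns;
	uint64_t decompress_ns;
	uint64_t zram_write_ns;
	uint64_t zram_read_ns;
};

const struct latency_row iaa_by_8 = { "IAA by_8", 8344, 6342, 13527, 9177 };
const struct latency_row lzo_rle = { "lzo-rle", 138235, 61055, 143666, 64321 };

/*
 * Speedup of the faster latency over the slower one, in hundredths and
 * truncated (1656 means 16.56x). Integer math keeps the result exact.
 */
uint64_t speedup_x100(uint64_t slow_ns, uint64_t fast_ns)
{
	assert(fast_ns != 0);
	return slow_ns * 100 / fast_ns;
}

/* Print one "label: N.NNx" line for a pair of latencies. */
void print_one(const char *label, uint64_t slow_ns, uint64_t fast_ns)
{
	uint64_t s = speedup_x100(slow_ns, fast_ns);

	printf("%-12s %" PRIu64 ".%02" PRIu64 "x\n", label, s / 100, s % 100);
}

/* Print the four by_8 vs lzo-rle ratios discussed in the text. */
void print_speedups(void)
{
	print_one("compress", lzo_rle.compress_ns, iaa_by_8.compress_ns);
	print_one("decompress", lzo_rle.decompress_ns, iaa_by_8.decompress_ns);
	print_one("zram write", lzo_rle.zram_write_ns, iaa_by_8.zram_write_ns);
	print_one("zram read", lzo_rle.zram_read_ns, iaa_by_8.zram_read_ns);
}
```

Calling print_speedups() from a main() gives 16.56x, 9.62x, 10.62x and
7.00x, which agrees with the "over 16x", "close to 10x", "greater than
10x" and "7x" figures in the text.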