References: <20240903130757.f584c73f356c03617a2c8804@linux-foundation.org>
 <94eb70cd-b508-42ef-b5d2-acc29e22eb0e@gmail.com>
 <2c418b81-8f67-4a45-b4d2-d158fa4f05d9@gmail.com>
 <20240923121041.GB437832@cmpxchg.org>
In-Reply-To: <20240923121041.GB437832@cmpxchg.org>
From: Yosry Ahmed
Date: Mon, 23 Sep 2024 09:53:45 -0700
Subject: Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io devices
To: Johannes Weiner
Cc: Usama Arif, Barry Song <21cnbao@gmail.com>, Andrew Morton, Kairui Song,
 hanchuanhua@oppo.com, linux-mm@kvack.org, baolin.wang@linux.alibaba.com,
 chrisl@kernel.org, david@redhat.com, hughd@google.com,
 kaleshsingh@google.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 minchan@kernel.org, nphamcs@gmail.com, ryan.roberts@arm.com,
 senozhatsky@chromium.org, shakeel.butt@linux.dev, shy828301@gmail.com,
 surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org,
 xiang@kernel.org, ying.huang@intel.com, hch@infradead.org, Hailong Liu

On Mon, Sep 23, 2024 at 5:10 AM Johannes Weiner wrote:
>
> On Mon, Sep 23, 2024 at 11:22:30AM +0100, Usama Arif wrote:
> > On 23/09/2024 00:57, Barry Song wrote:
> > > On Thu, Sep 5, 2024 at 7:36 AM Yosry Ahmed wrote:
> > >>>> On the other hand, if you read the code of zRAM, you will find zRAM has
> > >>>> exactly the same mechanism as zeromap but zRAM can even do more
> > >>>> by same_pages filled. since zRAM does the job in swapfile layer, there
> > >>>> is no this kind of consistency issue like zeromap.
> > >>>>
> > >>>> So I feel for zRAM case, we don't need zeromap at all as there are duplicated
> > >>>> efforts while I really appreciate your job which can benefit all swapfiles.
> > >>>> i mean, zRAM has the ability to check "zero"(and also non-zero but same
> > >>>> content). after zeromap checks zeromap, zRAM will check again:
> > >>>>
> > >>>
> > >>> Yes, so there is a reason for having the zeromap patches, which I have outlined
> > >>> in the coverletter.
> > >>>
> > >>> https://lore.kernel.org/all/20240627105730.3110705-1-usamaarif642@gmail.com/
> > >>>
> > >>> There are usecases where zswap/zram might not be used in production.
> > >>> We can reduce I/O and flash wear in those cases by a large amount.
> > >>>
> > >>> Also running in Meta production, we found that the number of non-zero filled
> > >>> complete pages were less than 1%, so essentially its only the zero-filled pages
> > >>> that matter.
> > >>>
> > >>> I believe after zeromap, it might be a good idea to remove the page_same_filled
> > >>> check from zram code? Its not really a problem if its kept as well as I dont
> > >>> believe any zero-filled pages should reach zram_write_page?
> > >>
> > >> I brought this up before and Sergey pointed out that zram is sometimes
> > >> used as a block device without swap, and that use case would benefit
> > >> from having this handling in zram. That being said, I have no idea how
> > >> many people care about this specific scenario.
> > >
> > > Hi Usama/Yosry,
> > >
> > > We successfully gathered page_same_filled data for zram on Android.
> > > Interestingly, our findings differ from yours on zswap.
> > >
> > > Hailong discovered that around 85-86% of the page_same_filled data
> > > consists of zeros, while about 15% are non-zero. We suspect that on
> > > Android or similar systems, some graphics or media data might be
> > > duplicated at times, such as a red block displayed on the screen.
> > >
> > > Does this suggest that page_same_filled could still provide some
> > > benefits in zram cases?
> >
> > Hi Barry,
> >
> > Thanks for the data, its very interesting to know this from mobile side.
> > Eventhough its not 99% that I observed, I do feel 85% is still quite high.
>
> Would it be possible to benchmark Android with zram only optimizing
> zero pages?
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index c3d245617083..f6ded491fd00 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -211,6 +211,9 @@ static bool page_same_filled(void *ptr, unsigned long *element)
>  	page = (unsigned long *)ptr;
>  	val = page[0];
>
> +	if (val)
> +		return false;
> +
>  	if (val != page[last_pos])
>  		return false;
>
> My take is that, if this is worth optimizing for, then it's probably
> worth optimizing for in the generic swap layer too. It makes sense to
> maintain feature parity if we one day want Android to work with zswap.

I am not sure it's worth it for the generic swap layer. We would need to
store 8 bytes per swap entry to maintain feature parity; with 4K pages
that's about 0.2% of swap capacity (8 / 4096) as a constant memory
overhead. Swap capacity is usually higher than the actual size of
swapped data, so relative to what is actually swapped out, the overhead
is even larger.

IIUC the data you gathered from prod showed that 1% of same-filled pages
were non-zero, and 10-20% of swapped data was same-filled [1]. That
means that ~0.15% of swapped data (1% of 10-20%) is non-zero
same-filled.

With zswap, assuming a 3:1 compression ratio, we'd be paying 0.2% of
swap capacity to save around 0.05% of swapped data (0.15% / 3) in
memory.
I think it may be even worse than that, because the compression ratio
may be higher for same-filled data.

With SSD swap, I am not sure a 0.15% reduction in I/O is worth the
memory overhead.

OTOH, zram keeps track of the same-filled value for free because it
stores the value in the same field as the zsmalloc handle (like zswap
used to do), so the same tradeoffs do not apply.

Barry mentioned that 15% of same-filled pages are non-zero in their
Android experiment, but what % of total swapped memory is this, and how
much space would it take if we just compressed it instead? IOW, how
much memory is this really saving with zram (especially given that the
metadata is statically allocated)?