From: Barry Song <21cnbao@gmail.com>
Date: Wed, 4 Sep 2024 19:54:24 +1200
Subject: Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap
To: Yosry Ahmed
Cc: usamaarif642@gmail.com, akpm@linux-foundation.org,
    chengming.zhou@linux.dev, david@redhat.com, hannes@cmpxchg.org,
    hughd@google.com, kernel-team@meta.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, nphamcs@gmail.com, shakeel.butt@linux.dev,
    willy@infradead.org, ying.huang@intel.com, hanchuanhua@oppo.com
References: <20240612124750.2220726-2-usamaarif642@gmail.com>
    <20240904055522.2376-1-21cnbao@gmail.com>

On Wed, Sep 4, 2024 at 7:22 PM Yosry Ahmed wrote:
>
> On Wed, Sep 4, 2024 at 12:17 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, Sep 4, 2024 at 7:12 PM Yosry Ahmed wrote:
> > >
> > > [..]
> > > > > @@ -426,6 +515,26 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
> > > > >         mempool_free(sio, sio_pool);
> > > > >  }
> > > > >
> > > > > +static bool swap_read_folio_zeromap(struct folio *folio)
> > > > > +{
> > > > > +       unsigned int idx = swap_zeromap_folio_test(folio);
> > > > > +
> > > > > +       if (idx == 0)
> > > > > +               return false;
> > > > > +
> > > > > +       /*
> > > > > +        * Swapping in a large folio that is partially in the zeromap is not
> > > > > +        * currently handled. Return true without marking the folio uptodate so
> > > > > +        * that an IO error is emitted (e.g. do_swap_page() will sigbus).
> > > > > +        */
> > > > > +       if (WARN_ON_ONCE(idx < folio_nr_pages(folio)))
> > > > > +               return true;
> > > >
> > > > Hi Usama, Yosry,
> > > >
> > > > I feel the warning is wrong as we could have the case where idx==0
> > > > is not zeromap but idx=1 is zeromap. idx == 0 doesn't necessarily
> > > > mean we should return false.
> > >
> > > Good catch. Yeah if idx == 0 is not in the zeromap but other indices
> > > are we will mistakenly read the entire folio from swap.
> > >
> > > >
> > > > What about the below change which both fixes the warning and unblocks
> > > > large folios swap-in?
> > >
> > > But I don't see how that unblocks the large folios swap-in work? We
> > > still need to actually handle the case where a large folio being
> > > swapped in is partially in the zeromap. Right now we warn and unlock
> > > the folio without calling folio_mark_uptodate(), which emits an IO
> > > error.
> >
> > I placed this in mm/swap.h so that during swap-in, it can filter out any large
> > folios where swap_zeromap_entries_count() is greater than 0 and less than
> > nr.
> >
> > I believe this case would be quite rare, as it can only occur when some small
> > folios that are swapped out happen to have contiguous and aligned swap
> > slots.
>
> I am assuming this would be near where the zswap_never_enabled() check
> is today, right?

The code is close to that area, but it doesn't rely on zeromap being
disabled.

>
> I understand the point of doing this to unblock the synchronous large
> folio swapin support work, but at some point we're gonna have to
> actually handle the cases where a large folio being swapped in is
> partially in the swap cache, zswap, the zeromap, etc.
>
> All these cases will need similar-ish handling, and I suspect we won't
> just skip swapping in large folios in all these cases.

I agree that this is definitely the goal. `swap_read_folio()` should be a
dependable API that always returns reliable data, regardless of whether
`zeromap` or `zswap` is involved. Despite these issues, mTHP swap-in
shouldn't be held back. Significant efforts are underway to support large
folios in `zswap`, and progress is being made. Not to mention we've already
allowed `zeromap` to proceed, even though it doesn't support large folios.

It's genuinely unfair to let the lack of mTHP support in `zeromap` and
`zswap` hold swap-in hostage.

Nonetheless, `zeromap` and `zswap` are distinct cases. With `zeromap`, we
permit almost all mTHP swap-ins, except for those rare situations where
small folios that were swapped out happen to have contiguous and aligned
swap slots.

The swap cache is quite a different story: since our user scenarios begin
with the simplest sync I/O on mobile phones, we don't care much about the
swap cache.

Thanks
Barry
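
For readers following the idx discussion above, here is a minimal userspace
sketch of the difference between a "leading run" check and a full count over
the range. It is not the kernel code: the bitmap helper, the function names
zeromap_leading_run(), zeromap_entries_count() and zeromap_classify(), and the
enum are all hypothetical stand-ins. It also assumes that
swap_zeromap_folio_test() returns the index of the first slot that is not in
the zeromap, which the thread implies but does not show.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's test_bit(); not the kernel helper. */
static bool zeromap_test_bit(const unsigned long *map, size_t bit)
{
	return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}

/*
 * Assumed behaviour of the check under review: return the index of the
 * first slot in [start, start + nr) that is NOT in the zeromap, or nr if
 * every slot is. A result of 0 only says slot 0 is not zero-filled.
 */
static size_t zeromap_leading_run(const unsigned long *map, size_t start,
				  size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!zeromap_test_bit(map, start + i))
			break;
	return i;
}

/*
 * Hypothetical counterpart of the swap_zeromap_entries_count() idea:
 * count every zeromap slot in the range, not just the leading ones.
 */
static size_t zeromap_entries_count(const unsigned long *map, size_t start,
				    size_t nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		if (zeromap_test_bit(map, start + i))
			count++;
	return count;
}

enum zeromap_state { ZEROMAP_NONE, ZEROMAP_ALL, ZEROMAP_PARTIAL };

/* Classify a range: only 0 < count < nr is the "partial" case to filter. */
static enum zeromap_state zeromap_classify(const unsigned long *map,
					   size_t start, size_t nr)
{
	size_t count;

	assert(nr > 0);
	count = zeromap_entries_count(map, start, nr);
	if (count == 0)
		return ZEROMAP_NONE;
	if (count == nr)
		return ZEROMAP_ALL;
	return ZEROMAP_PARTIAL;
}
```

With a bitmap word of 0x2 (slot 0 clear, slot 1 set) and nr = 4,
zeromap_leading_run() returns 0, which looks like "not in the zeromap at
all", while zeromap_entries_count() returns 1 and zeromap_classify() reports
ZEROMAP_PARTIAL. That is the case the idx == 0 early return misses.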