From: Yang Shi <shy828301@gmail.com>
Date: Thu, 17 Oct 2024 09:48:19 -0700
Subject: Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance
To: Baolin Wang
Cc: Matthew Wilcox, akpm@linux-foundation.org, hughd@google.com, david@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <2b3572e1-a618-4f86-979d-87f59282fe8f@linux.alibaba.com>

On Wed, Oct 16, 2024 at 8:25 PM Baolin Wang wrote:
>
> On 2024/10/17 01:33, Yang Shi wrote:
> > On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox wrote:
> >>
> >> On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote:
> >>> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >>>                 if (folio) {
> >>>                         folio_unlock(folio);
> >>>
> >>> -                       page = folio_file_page(folio, index);
> >>> -                       if (PageHWPoison(page)) {
> >>> +                       if (folio_test_hwpoison(folio) ||
> >>> +                           (folio_test_large(folio) &&
> >>> +                            folio_test_has_hwpoisoned(folio))) {
> >>
> >> Hm, so if we have hwpoison set on one page in a folio, we now can't read
> >> bytes from any page in the folio?  That seems like we've made a bad
> >> situation worse.
> >
> > Yeah, I agree. I think we can fall back to a page copy if
> > folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page.
> >
> > The folio_test_has_hwpoisoned flag is kept set if the folio split fails
> > in the memory failure handler.
>
> Right. I can still keep the page-size copy if
> folio_test_has_hwpoisoned() is true. Some sample changes are as follows.
>
> Moreover, I noticed shmem splice_read() and write() also simply return
> an error if folio_test_has_hwpoisoned() is true, without any fallback
> to page granularity. I wonder if it is worth adding page granularity
> support as well?

I think you should do the same.

>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 7e79b6a96da0..f30e24e529b9 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3111,9 +3111,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>
>         for (;;) {
>                 struct folio *folio = NULL;
> +               struct page *page = NULL;
>                 unsigned long nr, ret;
>                 loff_t end_offset, i_size = i_size_read(inode);
>                 size_t fsize;
> +               bool fallback_page_copy = false;
>
>                 if (unlikely(iocb->ki_pos >= i_size))
>                         break;
> @@ -3127,13 +3129,16 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>                 if (folio) {
>                         folio_unlock(folio);
>
> -                       if (folio_test_hwpoison(folio) ||
> -                           (folio_test_large(folio) &&
> -                            folio_test_has_hwpoisoned(folio))) {
> +                       page = folio_file_page(folio, index);
> +                       if (PageHWPoison(page)) {
>                                 folio_put(folio);
>                                 error = -EIO;
>                                 break;
>                         }
> +
> +                       if (folio_test_large(folio) &&
> +                           folio_test_has_hwpoisoned(folio))
> +                               fallback_page_copy = true;
>                 }
>
>                 /*
> @@ -3147,7 +3152,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>                         break;
>                 }
>                 end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count);
> -               if (folio)
> +               if (folio && likely(!fallback_page_copy))
>                         fsize = folio_size(folio);
>                 else
>                         fsize = PAGE_SIZE;
> @@ -3160,8 +3165,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>                          * virtual addresses, take care about potential aliasing
>                          * before reading the page on the kernel side.
>                          */
> -                       if (mapping_writably_mapped(mapping))
> -                               flush_dcache_folio(folio);
> +                       if (mapping_writably_mapped(mapping)) {
> +                               if (unlikely(fallback_page_copy))
> +                                       flush_dcache_page(page);
> +                               else
> +                                       flush_dcache_folio(folio);
> +                       }
> +
>                         /*
>                          * Mark the page accessed if we read the beginning.
>                          */
> @@ -3171,7 +3181,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>                          * Ok, we have the page, and it's up-to-date, so
>                          * now we can copy it to user space...
>                          */
> -                       ret = copy_folio_to_iter(folio, offset, nr, to);
> +                       if (unlikely(fallback_page_copy))
> +                               ret = copy_page_to_iter(page, offset, nr, to);
> +                       else
> +                               ret = copy_folio_to_iter(folio, offset, nr, to);
>                         folio_put(folio);
>                 } else if (user_backed_iter(to)) {
>                         /*

The change seems fine to me.
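For readers skimming the thread, the fallback being discussed comes down to one decision: if the page backing the current offset is itself hwpoisoned, the read still fails with -EIO as before; otherwise, a large folio that carries PG_has_hwpoisoned is still readable, just one PAGE_SIZE page at a time instead of a whole folio at a time. Below is a minimal sketch of that decision; shmem_hwpoison_fallback() is a hypothetical helper written for illustration, not part of the patch above.

/*
 * Illustrative sketch only: shmem_hwpoison_fallback() is a hypothetical
 * helper, not part of the quoted patch. It condenses the decision the
 * diff makes before copying data out of a shmem folio.
 */
static int shmem_hwpoison_fallback(struct folio *folio, pgoff_t index,
                                   bool *fallback_page_copy)
{
        struct page *page = folio_file_page(folio, index);

        /* The page backing this offset is itself unreadable. */
        if (PageHWPoison(page))
                return -EIO;

        /*
         * PG_has_hwpoisoned on a large folio means some other subpage is
         * bad, so a folio_size() copy could touch it; restrict the copy
         * to the single PAGE_SIZE page for this offset instead.
         */
        *fallback_page_copy = folio_test_large(folio) &&
                              folio_test_has_hwpoisoned(folio);
        return 0;
}

The same pattern could presumably be reused for the shmem splice_read() and write() paths Baolin mentions, which today take the all-or-nothing -EIO branch.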