From: Yafang Shao
Date: Thu, 15 Aug 2024 11:38:05 +0800
Subject: Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
To: Dave Chinner
Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
 jack@suse.cz, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 Kent Overstreet
References: <20240812090525.80299-1-laoar.shao@gmail.com>
 <20240812090525.80299-2-laoar.shao@gmail.com>

On Thu, Aug 15, 2024 at 10:54 AM Dave Chinner wrote:
>
> On Wed, Aug 14, 2024 at 03:32:26PM +0800, Yafang Shao wrote:
> > On Wed, Aug 14, 2024 at 1:42 PM Dave Chinner wrote:
> > >
> > > On Wed, Aug 14, 2024 at 10:19:36AM +0800, Yafang Shao wrote:
> > > > On Wed, Aug 14, 2024 at 8:28 AM Dave Chinner wrote:
> > > > >
> > > > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > > > reclamation.
> > > > >
> > > > > Readahead already uses this context:
> > > > >
> > > > > static inline gfp_t readahead_gfp_mask(struct address_space *x)
> > > > > {
> > > > >         return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
> > > > > }
> > > > >
> > > > > and __GFP_NORETRY means minimal direct reclaim should be performed.
> > > > > Most filesystems already have GFP_NOFS context from
> > > > > mapping_gfp_mask(), so how much difference does completely avoiding
> > > > > direct reclaim actually make under memory pressure?
> > > >
> > > > Besides the __GFP_NOFS, ~__GFP_DIRECT_RECLAIM also implies
> > > > __GFP_NOIO. If we don't set __GFP_NOIO, the readahead can wait for IO,
> > > > right?
> > >
> > > There's a *lot* more difference between __GFP_NORETRY and
> > > __GFP_NOWAIT than just __GFP_NOIO. I don't need you to try to
> > > describe to me what the differences are; What I'm asking you is this:
> > >
> > > > > i.e. doing some direct reclaim without blocking when under memory
> > > > > pressure might actually give better performance than skipping direct
> > > > > reclaim and aborting readahead altogether....
> > > > >
> > > > > This really, really needs some numbers (both throughput and IO
> > > > > latency histograms) to go with it because we have no evidence either
> > > > > way to determine what is the best approach here.
> > >
> > > Put simply: does the existing readahead mechanism give better results
> > > than the proposed one, and if so, why wouldn't we just reenable
> > > readahead unconditionally instead of making it behave differently
> > > for this specific case?
> >
> > Are you suggesting we compare the following change with the current proposal?
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index fd34b5755c0b..ced74b1b350d 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -3455,7 +3455,6 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
> >         if (flags & RWF_NOWAIT) {
> >                 if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
> >                         return -EOPNOTSUPP;
> > -               kiocb_flags |= IOCB_NOIO;
> >         }
> >         if (flags & RWF_ATOMIC) {
> >                 if (rw_type != WRITE)
>
> Yes.
>
> > Doesn't unconditional readahead break the semantics of RWF_NOWAIT,
> > which is supposed to avoid waiting for I/O? For example, it might
> > trigger a pageout for a dirty page.
>
> Yes, but only for *some filesystems* in *some configurations*.
> Readahead allocation behaviour is specifically controlled by the gfp
> mask set on the mapping by the filesystem at inode instantiation
> time. i.e. via a call to mapping_set_gfp_mask().
>
> XFS, for one, always clears __GFP_FS from this mask, and several
> other filesystems set it to GFP_NOFS. Filesystems that do this will
> not do pageout for a dirty page during memory allocation.
>
> Further, memory reclaim can not write dirty pages to a filesystem
> without a ->writepage implementation. ->writepage is almost
> completely gone - neither ext4, btrfs or XFS have a ->writepage
> implementation anymore - with f2fs being the only "major" filesystem
> with a ->writepage implementation remaining.
>
> IOWs, for most readahead cases right now, direct memory reclaim will
> not issue writeback IO on dirty cached file pages and in the near
> future that will change to -never-.
>
> That means the only IO that direct reclaim will be able to do is for
> swapping and compaction. Both of these can be prevented simply by
> setting a GFP_NOIO allocation context. IOWs, in the not-too-distant
> future we won't have to turn direct reclaim off to prevent IO from,
> and blocking in, direct reclaim during readahead - GFP_NOIO context
> will be all that is necessary for IOCB_NOWAIT readahead.
>
> That's why I'm asking if just doing readahead as it stands from
> RWF_NOWAIT causes any obvious problems. I think we really only need
> GFP_NOIO | __GFP_NORETRY allocation context for NOWAIT
> readahead IO, and that's something we already have a context API
> for.

Understood, thanks for your explanation.
So we need the changes below:

@@ -2526,8 +2528,12 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
        if (!folio_batch_count(fbatch)) {
                if (iocb->ki_flags & IOCB_NOIO)
                        return -EAGAIN;
+               if (iocb->ki_flags & IOCB_NOWAIT)
+                       flags = memalloc_noio_save();
                page_cache_sync_readahead(mapping, ra, filp, index,
                                last_index - index);
+               if (iocb->ki_flags & IOCB_NOWAIT)
+                       memalloc_noio_restore(flags);
                filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
        }
        if (!folio_batch_count(fbatch)) {

What data would you recommend collecting after implementing the above
change? Should we measure the latency of preadv2(2) under high memory
pressure? Although latency can vary, it seems we have no choice but to
use memalloc_noio_save() instead of memalloc_nowait_save(), as the MM
folks are not in favor of the latter.

--
Regards
Yafang
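
P.S. To make the scope semantics in the hunk above concrete, here is a
small userspace model. It is only a sketch, not kernel code: every
MODEL_* constant and model_* helper is a made-up stand-in, and the bit
values are illustrative rather than the kernel's real ones. It assumes
the usual behaviour of the scope API, where the save call sets a
per-task flag and returns the previous state, and allocations inside
the scope have their IO and FS bits stripped while direct reclaim
itself stays enabled.

```c
#include <assert.h>

/* Hypothetical gfp bits for this model only (not the kernel's values). */
#define MODEL_GFP_IO             0x1u  /* allocation may start block IO */
#define MODEL_GFP_FS             0x2u  /* allocation may call into the filesystem */
#define MODEL_GFP_DIRECT_RECLAIM 0x4u  /* allocation may enter direct reclaim */
#define MODEL_GFP_NORETRY        0x8u  /* give up after a light reclaim attempt */

/* Hypothetical per-task flag marking an active NOIO scope. */
#define MODEL_PF_MEMALLOC_NOIO   0x1u

/* Stand-in for current->flags. */
static unsigned int model_task_flags;

/* Enter a NOIO scope and return the previous state of the scope bit. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOIO;

	model_task_flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a NOIO scope, putting back whatever the enclosing scope had. */
static void model_noio_restore(unsigned int old)
{
	/* Only the scope bit may be passed back in. */
	assert((old & ~MODEL_PF_MEMALLOC_NOIO) == 0);
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Mask applied to each allocation: inside a NOIO scope, drop IO and FS. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_MEMALLOC_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}

/*
 * Model of the proposed hunk: only NOWAIT reads enter the NOIO scope
 * around readahead. Returns the mask the readahead allocation would see.
 */
static unsigned int model_readahead_gfp(int nowait, unsigned int mapping_gfp)
{
	unsigned int flags = 0;
	unsigned int gfp;

	if (nowait)
		flags = model_noio_save();
	gfp = model_effective_gfp(mapping_gfp | MODEL_GFP_NORETRY);
	if (nowait)
		model_noio_restore(flags);
	return gfp;
}
```

Calling model_readahead_gfp(1, mask) with all three permission bits set
leaves only the direct-reclaim and NORETRY bits, which is the
"GFP_NOIO | __GFP_NORETRY" context described above; with nowait == 0 the
mask is passed through unchanged. Because the save call returns the old
state, a NOWAIT read issued from inside an outer NOIO scope does not
clear that outer scope on restore.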