From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anders Blomdell <anders.blomdell@gmail.com>
Date: Tue, 26 Nov 2024 21:22:25 +0100
Subject: Re: Regression in NFS probably due to very large amounts of readahead
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Jan Kara, phil@fifi.org, Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, LKML, NeilBrown
References: <49648605-d800-4859-be49-624bbe60519d@gmail.com> <3b1d4265b384424688711a9259f98dec44c77848.camel@fifi.org> <4bb8bfe1-5de6-4b5d-af90-ab24848c772b@gmail.com> <20241126103719.bvd2umwarh26pmb3@quack3> <20241126150613.a4b57y2qmolapsuc@quack3> <569d0df0-71d5-4227-aa28-e57cd60bc9f1@gmail.com>
More like me not reading the comments properly, sorry. What I thought I said was that the problematic code in the call to do_page_cache_ra was reached when the folio allocation returned an error.
Sorry for not being clear, and thanks for your patience.

/Anders

On Tue, 26 Nov 2024, 19:42 Matthew Wilcox, wrote:
> On Tue, Nov 26, 2024 at 06:26:13PM +0100, Anders Blomdell wrote:
> > On 2024-11-26 17:55, Matthew Wilcox wrote:
> > > On Tue, Nov 26, 2024 at 04:28:04PM +0100, Anders Blomdell wrote:
> > > > On 2024-11-26 16:06, Jan Kara wrote:
> > > > > Hum, checking the history, the update of ra->size was added by Neil two
> > > > > years ago in 9fd472af84ab ("mm: improve cleanup when ->readpages doesn't
> > > > > process all pages"). Neil, the changelog reads as if there was some real
> > > > > motivation behind updating ra->size in read_pages(). What was it? Now I
> > > > > somewhat disagree with reducing ra->size in read_pages(), because it seems
> > > > > like the wrong place to do that, and if we do need something like that,
> > > > > the readahead window sizing logic should rather be changed to take it into
> > > > > account. But it all depends on what the real rationale was behind reducing
> > > > > ra->size in read_pages()...
> > > >
> > > > My (rather limited) understanding of the patch is that it was intended to
> > > > read those pages that didn't get read because the allocation of a bigger
> > > > folio failed, while not redoing what readpages already did; how it was
> > > > actually going to accomplish that is still unclear to me, and I don't
> > > > even quite understand the comment...
> > > >
> > > > /*
> > > >  * If there were already pages in the page cache, then we may have
> > > >  * left some gaps.  Let the regular readahead code take care of this
> > > >  * situation.
> > > >  */
> > > >
> > > > The reason for an unchanged async_size is also beyond my understanding.
> > >
> > > This isn't because we couldn't allocate a folio; this is when we
> > > allocated folios, tried to read them, and failed to submit the I/O.
> > > This is a pretty rare occurrence under normal conditions.
> >
> > I beg to differ: the code is reached when there is
> > no folio support, or ra->size < 4 (not considered in
> > this discussion), or when falling through with !err, where err
> > is set by:
> >
> >         err = ra_alloc_folio(ractl, index, mark, order, gfp);
> >         if (err)
> >                 break;
> >
> > Isn't the reading done by:
> >
> >         read_pages(ractl);
> >
> > which does not set err?
>
> You're misunderstanding.  Yes, read_pages() is called when we fail to
> allocate a fresh folio, either because there's already one in the
> page cache or because of -ENOMEM (or because we raced to install one), but
> it's also called when all folios are normally allocated.  Here:
>
>         /*
>          * Now start the IO.  We ignore I/O errors - if the folio is not
>          * uptodate then the caller will launch read_folio again, and
>          * will then handle the error.
>          */
>         read_pages(ractl);
>
> So at the point that read_pages() is called, all folios that ractl
> describes are present in the page cache, locked and !uptodate.
>
> After calling aops->readahead() in read_pages(), most filesystems will
> have consumed all folios described by ractl.  It seems that NFS is
> choosing not to submit some folios, so rather than leave them sitting
> around in the page cache, Neil decided that we should remove them from
> the page cache.
