From: Yafang Shao
Date: Tue, 12 Nov 2024 00:13:50 +0800
Subject: Re: [PATCH v2] mm/readahead: Fix large folio support in async readahead
To: David Hildenbrand
Cc: willy@infradead.org, akpm@linux-foundation.org, linux-mm@kvack.org, stable@vger.kernel.org
References: <20241108141710.9721-1-laoar.shao@gmail.com> <85cfc467-320f-4388-b027-2cbad85dfbed@redhat.com>

On Mon, Nov 11, 2024 at 11:26 PM David Hildenbrand wrote:
>
> On 11.11.24 16:05, David Hildenbrand wrote:
> > On 11.11.24 15:28, Yafang Shao wrote:
> >> On Mon, Nov 11, 2024 at 6:33 PM David Hildenbrand wrote:
> >>>
> >>> On 08.11.24 15:17, Yafang Shao wrote:
> >>>> When testing large folio support with XFS on our servers, we observed that
> >>>> only a few large folios are mapped when reading large files via mmap.
> >>>> After a thorough analysis, I identified it was caused by the
> >>>> `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
> >>>> parameter is set to 128KB. After I tune it to 2MB, the large folio can
> >>>> work as expected. However, I believe the large folio behavior should not be
> >>>> dependent on the value of read_ahead_kb. It would be more robust if the
> >>>> kernel can automatically adapt to it.
> >>>
> >>> Now I am extremely confused.
> >>>
> >>> Documentation/ABI/stable/sysfs-block:
> >>>
> >>> "[RW] Maximum number of kilobytes to read-ahead for filesystems on this
> >>> block device."
> >>>
> >>>
> >>> So, with your patch, will we also be changing the readahead size to
> >>> exceed that, or simply allocate larger folios and not exceeding the
> >>> readahead size (e.g., leaving them partially non-filled)?
> >>
> >> Exceeding the readahead size for the MADV_HUGEPAGE case is
> >> straightforward; this is what the current patch accomplishes.
> >>
> >
> > Okay, so this only applies with MADV_HUGEPAGE I assume. Likely we should
> > also make that clearer in the subject.
> >
> > mm/readahead: allow exceeding configured read_ahead_kb with MADV_HUGEPAGE
> >
> >
> > If this is really a fix, especially one that deserves CC-stable, I
> > cannot tell. Willy is the obvious expert :)
> >
> >>>
> >>> If you're also changing the readahead behavior to exceed the
> >>> configuration parameter it would sound to me like "I am pushing the
> >>> brake pedal and my car brakes; fix the brakes to adapt whether to brake
> >>> automatically" :)
> >>>
> >>> Likely I am missing something here, and how the read_ahead_kb parameter
> >>> is used after your patch.
> >>
> >> The read_ahead_kb parameter continues to function for
> >> non-MADV_HUGEPAGE scenarios, whereas special handling is required for
> >> the MADV_HUGEPAGE case. It appears that we ought to update the
> >> Documentation/ABI/stable/sysfs-block to reflect the changes related to
> >> large folios, correct?
> >
> > Yes, how it relates to MADV_HUGEPAGE. I would assume that it would get
> > ignored, but ...
> >
> > ... staring at get_next_ra_size(), it's not quite ignored, because we
> > still use it as a baseline to detect how much we want to bump up the
> > limit when the requested size is small? (*2 vs *4 etc) :/
> >
> > So the semantics are really starting to get weird, unless I am missing
> > something important.
> Likely what I am missing is that the value of get_next_ra_size() will never be relevant
> in that case. I assume the following would end up doing the same:
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 475d2940a1edb..cc7f883f83d86 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -668,7 +668,12 @@ void page_cache_async_ra(struct readahead_control *ractl,
>                  ra->start = start;
>                  ra->size = start - index;       /* old async_size */
>                  ra->size += req_count;
> -               ra->size = get_next_ra_size(ra, max_pages);
> +               /*
> +                * Allow the actual size to exceed the readahead window for
> +                * MADV_HUGEPAGE.
> +                */
> +               if (ra->size < max_pages)
> +                       ra->size = get_next_ra_size(ra, max_pages);

This change doesn't help the MADV_HUGEPAGE case: by the time this code
runs, any `ra->size > max_pages` has already been set to max_pages, so
the new check never sees a size larger than max_pages. This can be
easily verified with the example provided in the previous version[1].

[1]. https://lore.kernel.org/linux-mm/20241106092114.8408-1-laoar.shao@gmail.com/

>                  ra->async_size = ra->size;
>   readit:
>          ractl->_index = ra->start;
>
>
> So maybe it should just be in get_next_ra_size() where we clarify what "max_pages"
> means and why we simply decide to ignore the value ...

-- 
Regards
Yafang
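
For readers following the get_next_ra_size() discussion above, the ramp-up rule ("*2 vs *4") and the guard from the quoted diff can be sketched in userspace C. This is a minimal sketch, not kernel code: struct ra_state, next_ra_size(), guarded_next_ra_size() and ramp_demo() are hypothetical names, the rule is assumed to match the mainline helper (quadruple below max/16, double up to max/2, otherwise clamp to max), and 4 KiB pages are assumed so that read_ahead_kb=128 corresponds to max = 32 pages.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's struct file_ra_state;
 * only the field read by the ramp-up rule is kept. */
struct ra_state {
	unsigned long size;	/* current readahead window, in pages */
};

/* Assumed shape of get_next_ra_size(): grow small windows quickly,
 * and never return more than max. */
unsigned long next_ra_size(const struct ra_state *ra, unsigned long max)
{
	unsigned long cur = ra->size;

	if (cur < max / 16)
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	return max;
}

/* The guard from the quoted diff: only consult the ramp-up rule while
 * the window is still below max, otherwise keep the window unchanged. */
unsigned long guarded_next_ra_size(const struct ra_state *ra, unsigned long max)
{
	if (ra->size < max)
		return next_ra_size(ra, max);
	return ra->size;
}

/* Usage example with max = 32 pages (128KB at an assumed 4 KiB page size). */
void ramp_demo(void)
{
	const unsigned long max = 32;
	struct ra_state ra;

	ra.size = 1;
	assert(next_ra_size(&ra, max) == 4);	/* 1 < 32/16: quadruple */
	ra.size = 8;
	assert(next_ra_size(&ra, max) == 16);	/* 8 <= 32/2: double */
	ra.size = 24;
	assert(next_ra_size(&ra, max) == 32);	/* above max/2: clamp to max */

	ra.size = 512;				/* a 2MB window, larger than max */
	assert(next_ra_size(&ra, max) == 32);	/* unguarded: clamped down */
	assert(guarded_next_ra_size(&ra, max) == 512);	/* guarded: kept */
}
```

Calling ramp_demo() from a main() exercises both paths. The last two assertions show the only difference the guard makes at this level: a window already larger than max is left alone instead of being clamped. Whether such a window can actually reach this point in page_cache_async_ra() is the question debated above, and the sketch does not model it.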