From: Yafang Shao
Date: Tue, 12 Nov 2024 00:08:30 +0800
Subject: Re: [PATCH v2] mm/readahead: Fix large folio support in async readahead
To: David Hildenbrand
Cc: willy@infradead.org, akpm@linux-foundation.org, linux-mm@kvack.org, stable@vger.kernel.org
References: <20241108141710.9721-1-laoar.shao@gmail.com> <85cfc467-320f-4388-b027-2cbad85dfbed@redhat.com>

On Mon, Nov 11, 2024 at 11:05 PM David Hildenbrand wrote:
>
> On 11.11.24 15:28, Yafang Shao wrote:
> > On Mon, Nov 11, 2024 at 6:33 PM David Hildenbrand wrote:
> >>
> >> On 08.11.24 15:17, Yafang Shao wrote:
> >>> When testing large folio support with XFS on our servers, we observed that
> >>> only a few large folios are mapped when reading large files via mmap.
> >>> After a thorough analysis, I identified that it was caused by the
> >>> `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
> >>> parameter is set to 128KB. After I tuned it to 2MB, large folios
> >>> worked as expected. However, I believe the large folio behavior should not be
> >>> dependent on the value of read_ahead_kb. It would be more robust if the
> >>> kernel could automatically adapt to it.
> >>
> >> Now I am extremely confused.
> >>
> >> Documentation/ABI/stable/sysfs-block:
> >>
> >> "[RW] Maximum number of kilobytes to read-ahead for filesystems on this
> >> block device."
> >>
> >>
> >> So, with your patch, will we also be changing the readahead size to
> >> exceed that, or simply allocate larger folios without exceeding the
> >> readahead size (e.g., leaving them partially non-filled)?
> >
> > Exceeding the readahead size for the MADV_HUGEPAGE case is
> > straightforward; this is what the current patch accomplishes.
> >
>
> Okay, so this only applies with MADV_HUGEPAGE I assume. Likely we should
> also make that clearer in the subject.
>
> mm/readahead: allow exceeding configured read_ahead_kb with MADV_HUGEPAGE
>
>
> If this is really a fix, especially one that deserves CC-stable, I
> cannot tell. Willy is the obvious expert :)
>
> >>
> >> If you're also changing the readahead behavior to exceed the
> >> configuration parameter it would sound to me like "I am pushing the
> >> brake pedal and my car brakes; fix the brakes to adapt whether to brake
> >> automatically" :)
> >>
> >> Likely I am missing something here, and how the read_ahead_kb parameter
> >> is used after your patch.
> >
> > The read_ahead_kb parameter continues to function for
> > non-MADV_HUGEPAGE scenarios, whereas special handling is required for
> > the MADV_HUGEPAGE case. It appears that we ought to update the
> > Documentation/ABI/stable/sysfs-block to reflect the changes related to
> > large folios, correct?
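To make the read_ahead_kb numbers in this exchange concrete, here is a minimal userspace sketch in C. It assumes 4 KiB base pages and order-9 (2 MiB) PMD-sized folios as on x86-64, and that read_ahead_kb is turned into a page count by shifting out (PAGE_SHIFT - 10). The names SKETCH_PAGE_SHIFT, SKETCH_PMD_ORDER, ra_kb_to_pages() and sketch_ra_window() are hypothetical and do not exist in the kernel; sketch_ra_window() only models the "exceed read_ahead_kb for MADV_HUGEPAGE" idea discussed above, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for this sketch: 4 KiB base pages (PAGE_SHIFT 12) and
 * PMD-sized folios of order 9 (2 MiB), as on x86-64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_ORDER  9

/* Convert read_ahead_kb (in KiB) to a page count by shifting out the
 * difference between the page size and the 1 KiB unit. */
static unsigned long ra_kb_to_pages(unsigned long read_ahead_kb)
{
	assert(SKETCH_PAGE_SHIFT >= 10);
	return read_ahead_kb >> (SKETCH_PAGE_SHIFT - 10);
}

/* Hypothetical helper: pages one readahead round may cover. With
 * allow_exceed set (the MADV_HUGEPAGE case discussed above), the window
 * grows to at least one folio of the given order; otherwise
 * read_ahead_kb remains the ceiling. */
static unsigned long sketch_ra_window(unsigned long read_ahead_kb,
				      unsigned int order, bool allow_exceed)
{
	unsigned long max_pages = ra_kb_to_pages(read_ahead_kb);
	unsigned long folio_pages = 1UL << order;

	if (allow_exceed && folio_pages > max_pages)
		return folio_pages;
	return max_pages;
}
```

With read_ahead_kb = 128 the limit works out to 32 pages, far below the 512 pages of one order-9 folio; at 2048 (2MB) the two match, which is in line with the observation that tuning read_ahead_kb to 2MB made large folios work.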
>
> Yes, how it relates to MADV_HUGEPAGE. I would assume that it would get
> ignored, but ...
>
> ... staring at get_next_ra_size(), it's not quite ignored, because we
> still use it as a baseline to detect how much we want to bump up the
> limit when the requested size is small? (*2 vs *4 etc) :/
>
> So the semantics are really starting to get weird, unless I am missing
> something important.
>
> [...]
>
> > Perhaps a more straightforward solution would be to implement it
> > directly at the callsite, as demonstrated below?
>
> Likely something in this direction would be better, but Willy is the
> expert on that code.
>
> >
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 3dc6c7a128dd..187efae95b02 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -642,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
> >                                           1UL << order);
> >         if (index == expected) {
> >                 ra->start += ra->size;
> > -               ra->size = get_next_ra_size(ra, max_pages);
> > +               /*
> > +                * Allow the actual size to exceed the readahead window for a
> > +                * large folio.
>
> "a large folio" -> "with MADV_HUGEPAGE" ? Or can this be hit on
> different paths that are not covered in the patch description?

This branch may also be triggered by other large folios that are not
necessarily order-9. Therefore, I've referred to it as a 'large folio'
rather than associating it specifically with MADV_HUGEPAGE. If we were to
handle only the MADV_HUGEPAGE case, we would proceed as outlined in the
initial RFC patch[0]. However, following Willy's recommendation, I
implemented it this way, as he likely has a deeper understanding of the
intended behavior.

[0] https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@gmail.com/

--
Regards
Yafang
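For reference, the ramp-up David refers to (the *4 vs *2 step in get_next_ra_size()) can be replicated in userspace roughly as below. The replica is a best-effort copy of the rule in mm/readahead.c and should be checked against the tree in question. sketch_next_ra_size() and sketch_next_ra_size_folio_floor() are hypothetical names; the latter is one possible way to keep the next window from falling below one folio of a given order, and it is not the body of the truncated diff quoted above.

```c
#include <assert.h>

/* Best-effort userspace replica of get_next_ra_size() (mm/readahead.c).
 * cur: previous window in pages; max: limit derived from read_ahead_kb. */
static unsigned long sketch_next_ra_size(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)
		return 4 * cur;		/* small window: quadruple */
	if (cur <= max / 2)
		return 2 * cur;		/* medium window: double */
	return max;			/* large window: clamp to the limit */
}

/* Hypothetical variant, NOT the truncated patch above: never let the
 * next window drop below one folio of the given order. */
static unsigned long sketch_next_ra_size_folio_floor(unsigned long cur,
						     unsigned long max,
						     unsigned int order)
{
	unsigned long next = sketch_next_ra_size(cur, max);
	unsigned long folio_pages;

	assert(order < 8 * sizeof(unsigned long));	/* keep the shift defined */
	folio_pages = 1UL << order;
	return next > folio_pages ? next : folio_pages;
}
```

With max = 32 pages (read_ahead_kb = 128), a window starting at 1 page grows 1 -> 4 -> 8 -> 16 -> 32 and then stays at 32; with max = 512, a window starting at 4 pages grows 4 -> 16 -> 64 -> 128. So even if a folio floor lets the window exceed the limit, read_ahead_kb still decides whether the next step quadruples or doubles, which is the semantic oddity raised above.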