From: Mike Snitzer
Date: Sun, 28 Jan 2024 19:39:49 -0500
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
To: Matthew Wilcox
Cc: Mike Snitzer, Ming Lei, Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Don Dutile, Raghavendra K T, Alexander Viro, Christian Brauner
References: <20240128142522.1524741-1-ming.lei@redhat.com>

On Sun, Jan 28, 2024 at 7:22 PM Matthew Wilcox wrote:
>
> On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > On Sun, Jan 28 2024 at  5:02P -0500,
> > Matthew Wilcox wrote:
> >
> > > On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote:
> > > > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
> > > > memoryless NUMA nodes and limit readahead max_pages"), ADV_WILLNEED
> > > > only tries to readahead 512 pages, and the remaining part of the
> > > > advised range falls back to normal readahead.
> > >
> > > Does the MAINTAINERS file mean nothing any more?
> >
> > "Ming, please use scripts/get_maintainer.pl when submitting patches."
>
> That's an appropriate response to a new contributor, sure.  Ming has
> been submitting patches since, what, 2008?  Surely they know how to
> submit patches by now.
>
> > I agree this patch's header could've worked harder to establish the
> > problem that it fixes.  But I'll now take a crack at backfilling the
> > regression report that motivated this patch's development:
>
> Thank you.
>
> > Linux 3.14 was the last kernel where madvise(MADV_WILLNEED) allowed
> > mmap'ing a file more optimally if read_ahead_kb < max_sectors_kb.
> >
> > This regressed with commit 6d2be915e589 (so Linux 3.15) such that
> > mounting XFS on a device with read_ahead_kb=64 and max_sectors_kb=1024
> > and running this reproducer against a 2G file will take ~5x longer
> > (depending on the system's capabilities), mmap_load_test.java follows:
> >
> > import java.nio.ByteBuffer;
> > import java.nio.ByteOrder;
> > import java.io.RandomAccessFile;
> > import java.nio.MappedByteBuffer;
> > import java.nio.channels.FileChannel;
> > import java.io.File;
> > import java.io.FileNotFoundException;
> > import java.io.IOException;
> >
> > public class mmap_load_test {
> >
> >         public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {
> >                 if (args.length == 0) {
> >                         System.out.println("Please provide a file");
> >                         System.exit(0);
> >                 }
> >                 FileChannel fc = new RandomAccessFile(new File(args[0]), "rw").getChannel();
> >                 MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
> >
> >                 System.out.println("Loading the file");
> >
> >                 long startTime = System.currentTimeMillis();
> >                 mem.load();
> >                 long endTime = System.currentTimeMillis();
> >                 System.out.println("Done! Loading took " + (endTime-startTime) + " ms");
> >
> >         }
> > }
>
> It's good to have the original reproducer.  The unfortunate part is
> that being at such a high level, it doesn't really show what syscalls
> the library makes on behalf of the application.  I'll take your word
> for it that it calls madvise(MADV_WILLNEED).  An strace might not go
> amiss.
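
In the absence of an strace, here is a minimal C sketch of the syscall
sequence the Java reproducer is assumed to reduce to: map the file
read-only and shared, issue madvise(MADV_WILLNEED) over the whole
mapping, then touch one byte per page.  That MappedByteBuffer.load()
behaves this way is an assumption, not something confirmed in this
thread, and load_with_willneed() is a hypothetical helper name.

```c
/* Sketch only: assumed userspace equivalent of the Java reproducer. */
#define _DEFAULT_SOURCE		/* exposes madvise() and MADV_WILLNEED on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the number of bytes mapped and touched,
 * or -1 on any error (including an empty file, which cannot be mapped).
 */
long long load_with_willneed(const char *path)
{
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}

	size_t len = (size_t)st.st_size;
	unsigned char *mem = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	close(fd);		/* the mapping keeps the file referenced */
	if (mem == MAP_FAILED)
		return -1;

	/* The hint under discussion: advise the whole range up front. */
	if (madvise(mem, len, MADV_WILLNEED) < 0) {
		munmap(mem, len);
		return -1;
	}

	/* Fault in every page; volatile keeps the reads from being elided. */
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	volatile unsigned char sink = 0;

	for (size_t off = 0; off < len; off += page)
		sink ^= mem[off];
	(void)sink;

	munmap(mem, len);
	return (long long)len;
}
```

Calling load_with_willneed("/mnt/test/2G_file") from a small main() and
running it under strace, with iostat alongside, should make the madvise
call and the resulting request sizes visible without a JVM in the way.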
>
> > reproduce with:
> >
> > javac mmap_load_test.java
> > echo 64 > /sys/block/sda/queue/read_ahead_kb
> > echo 1024 > /sys/block/sda/queue/max_sectors_kb
> > mkfs.xfs /dev/sda
> > mount /dev/sda /mnt/test
> > dd if=/dev/zero of=/mnt/test/2G_file bs=1024k count=2000
> >
> > echo 3 > /proc/sys/vm/drop_caches
>
> (I prefer to unmount/mount /mnt/test; it drops the cache for
> /mnt/test/2G_file without affecting the rest of the system)
>
> > java mmap_load_test /mnt/test/2G_file
> >
> > Without a fix, like the patch Ming provided, iostat will show rareq-sz
> > is 64 rather than ~1024.
>
> Understood.  But ... the application is asking for as much readahead as
> possible, and the sysadmin has said "Don't readahead more than 64kB at
> a time".  So why will we not get a bug report in 1-15 years time saying
> "I put a limit on readahead and the kernel is ignoring it"?  I think
> typically we allow the sysadmin to override application requests,
> don't we?

The application isn't knowingly asking for readahead.  It is asking to
mmap the file (and the reporter wants that done as quickly as possible,
as it was before the regression).

This fix is comparable to Jens' commit 9491ae4aade6 ("mm: don't cap
request size based on read-ahead setting") -- same logic, just applied
to the call chain that ends up using madvise(MADV_WILLNEED).

> > > > @@ -972,6 +974,7 @@ struct file_ra_state {
> > > >   unsigned int ra_pages;
> > > >   unsigned int mmap_miss;
> > > >   loff_t prev_pos;
> > > > + struct maple_tree *need_mt;
> > >
> > > No.  Embed the struct maple tree.  Don't allocate it.
> >
> > Constructive feedback, thanks.
> >
> > > What made you think this was the right approach?
> >
> > But then you closed with an attack, rather than inform Ming and/or
> > others why you feel so strongly, e.g.: Best to keep memory used for
> > file_ra_state contiguous.
>
> That's not an attack, it's a genuine question.  Is there somewhere else
> doing it wrong that Ming copied from?  Does the documentation need to
> be clearer?  I can't fix what I don't know.

OK