Date: Sun, 28 Jan 2024 18:12:29 -0500
From: Mike Snitzer
To: Matthew Wilcox
Cc: Ming Lei, Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Don Dutile, Raghavendra K T, Alexander Viro, Christian Brauner
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range

On Sun, Jan 28 2024 at 5:02P -0500, Matthew Wilcox wrote:

> On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote:
> > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
> > memoryless NUMA nodes and limit readahead max_pages"), ADV_WILLNEED
> > only tries to readahead 512 pages, and the remained part in the advised
> > range fallback on normal readahead.
>
> Does the MAINTAINERS file mean nothing any more?

"Ming, please use scripts/get_maintainer.pl when submitting patches."
(I've cc'd the maintainers accordingly on this email.)

> > If bdi->ra_pages is set as small, readahead will perform not efficient
> > enough. Increasing read ahead may not be an option since workload may
> > have mixed random and sequential I/O.
>
> I thik there needs to be a lot more explanation than this about what's
> going on before we jump to "And therefore this patch is the right
> answer".

The patch is an RFC. Ming didn't declare his RFC to be "the right
answer". All ideas for how best to fix this issue are welcome.

I agree this patch's header could have worked harder to establish the
problem it fixes. So I'll now take a crack at backfilling the
regression report that motivated this patch:

Linux 3.14 was the last kernel in which madvise(MADV_WILLNEED) allowed
an mmap'ed file to be loaded more efficiently when
read_ahead_kb < max_sectors_kb.
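To put rough numbers on the gap, here is a back-of-the-envelope sketch
(userspace C, not kernel code). It assumes 4 KiB pages, takes the
512-page WILLNEED cap from Ming's description quoted above, and
simplifies by treating every request past that cap as exactly
read_ahead_kb in size. PAGE_KB, WILLNEED_CAP_PG, requests_for() and
fallback_requests() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumptions, not kernel definitions: 4 KiB pages and a
 * 512-page WILLNEED cap as described in Ming's patch header. */
#define PAGE_KB         4u
#define WILLNEED_CAP_PG 512u

/* Number of requests needed to cover total_kb when each request is at
 * most chunk_kb (ceiling division). */
static uint64_t requests_for(uint64_t total_kb, uint64_t chunk_kb)
{
    assert(chunk_kb > 0);
    return (total_kb + chunk_kb - 1) / chunk_kb;
}

/* Requests issued for the part of a WILLNEED range beyond the cap, which
 * (per the patch header) falls back to normal readahead; modeled here as
 * fixed read_ahead_kb-sized requests. */
static uint64_t fallback_requests(uint64_t range_kb, uint64_t read_ahead_kb)
{
    uint64_t capped_kb = (uint64_t)WILLNEED_CAP_PG * PAGE_KB; /* 2048 KiB */

    if (range_kb <= capped_kb)
        return 0;
    return requests_for(range_kb - capped_kb, read_ahead_kb);
}
```

For the 2000 MiB file written by the dd command in the reproducer below,
fallback_requests(2000 * 1024, 64) gives 31968 requests versus 1998 when
requests can be 1024 KiB, i.e. 16x as many I/Os for the same data. That
matches the direction of the rareq-sz observation (64 vs ~1024); how much
wall-clock time it costs depends on the device.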
This regressed with commit 6d2be915e589 (so Linux 3.15): with XFS
mounted on a device with read_ahead_kb=64 and max_sectors_kb=1024,
running this reproducer against a 2G file takes ~5x longer (depending
on the system's capabilities). mmap_load_test.java follows:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class mmap_load_test {

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Please provide a file");
            System.exit(1);
        }
        // Read-only access is enough: the file is only mapped READ_ONLY.
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel fc = raf.getChannel()) {
            // A single MappedByteBuffer is limited to Integer.MAX_VALUE
            // bytes; the 2000 MiB test file stays below that.
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

            System.out.println("Loading the file");
            long startTime = System.currentTimeMillis();
            mem.load(); // load the mapped contents into physical memory
            long endTime = System.currentTimeMillis();
            System.out.println("Done! Loading took " + (endTime - startTime) + " ms");
        }
    }
}

Reproduce with:

javac mmap_load_test.java
echo 64 > /sys/block/sda/queue/read_ahead_kb
echo 1024 > /sys/block/sda/queue/max_sectors_kb
mkfs.xfs /dev/sda
mount /dev/sda /mnt/test
dd if=/dev/zero of=/mnt/test/2G_file bs=1024k count=2000
sync
echo 3 > /proc/sys/vm/drop_caches
java mmap_load_test /mnt/test/2G_file

(The sync makes sure the pages dd just dirtied are written back, so
drop_caches can actually evict them before the test.)

Without a fix such as the patch Ming provided, iostat shows a rareq-sz
of 64 rather than ~1024.

> > @@ -972,6 +974,7 @@ struct file_ra_state {
> >  	unsigned int ra_pages;
> >  	unsigned int mmap_miss;
> >  	loff_t prev_pos;
> > +	struct maple_tree *need_mt;
>
> No.  Embed the struct maple tree.  Don't allocate it.

Constructive feedback, thanks.

> What made you think this was the right approach?

But then you closed with an attack, rather than informing Ming and/or
others why you feel so strongly, e.g.: "Best to keep memory used for
file_ra_state contiguous."
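The embed-vs-allocate point comes down to C layout: a pointer member
needs a separate allocation for every file_ra_state (plus an error path
and a matching free), while an embedded member lives inside the same
object. Below is a minimal userspace sketch of the two shapes. struct
tree_stub and the ra_* types and functions are hypothetical stand-ins,
not the kernel's struct maple_tree or struct file_ra_state; in the
kernel an embedded tree would instead be initialized in place (e.g. with
mt_init()).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct maple_tree (not the real layout). */
struct tree_stub {
    void *root;
    unsigned int flags;
};

/* Pointer member, as in the RFC hunk: the tree is a second allocation. */
struct ra_state_ptr {
    unsigned int ra_pages;
    struct tree_stub *need_mt;
};

/* Embedded member, as suggested in review: one contiguous object. */
struct ra_state_embed {
    unsigned int ra_pages;
    struct tree_stub need_mt;
};

/* Pointer variant: allocation can fail, so callers need an error path. */
static int ra_ptr_init(struct ra_state_ptr *ra)
{
    ra->ra_pages = 0;
    ra->need_mt = calloc(1, sizeof(*ra->need_mt));
    return ra->need_mt ? 0 : -1;
}

/* Pointer variant: every init must be paired with a free. */
static void ra_ptr_destroy(struct ra_state_ptr *ra)
{
    free(ra->need_mt);
    ra->need_mt = NULL;
}

/* Embedded variant: initialized in place, nothing to allocate or free. */
static void ra_embed_init(struct ra_state_embed *ra)
{
    assert(ra != NULL);
    *ra = (struct ra_state_embed){ 0 };
}
```

The trade-off is size: the embedded variant makes every instance larger
by sizeof(struct tree_stub) whether or not the tree is used, in exchange
for no extra allocation, no failure path, and better locality.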