Date: Mon, 14 Jul 2025 17:16:51 +0200
From: Jan Kara
To: Roman Gushchin
Cc: Andrew Morton, Jan Kara, Matthew Wilcox, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Liu Shixin
Subject: Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing
In-Reply-To: <20250710195232.124790-1-roman.gushchin@linux.dev>

On Thu 10-07-25 12:52:32, Roman Gushchin wrote:
> We've noticed in production that under a very heavy memory pressure
> the readahead behavior becomes unstable causing spikes in memory
> pressure and CPU contention on zone locks.
>
> The current mmap_miss heuristics considers minor pagefaults as a
> good reason to decrease mmap_miss and conditionally start async
> readahead. This creates a vicious cycle: asynchronous readahead
> loads more pages, which in turn causes more minor pagefaults.
> This problem is especially pronounced when multiple threads of
> an application fault on consecutive pages of an evicted executable,
> aggressively lowering the mmap_miss counter and preventing readahead
> from being disabled.

I think you're talking about the filemap_map_pages() logic for handling
mmap_miss. It would be nice to mention it in the changelog.

There's one thing that doesn't quite make sense to me: when there's memory
pressure, I'd expect the pages to be reclaimed from memory and not just
unmapped. Also, the fact that your solution checks for !uptodate folios
suggests that the pages were actually fully reclaimed, and that the real
problem is that filemap_map_pages() treats as a minor page fault (i.e., a
cache hit) what is in fact a major page fault (i.e., a cache miss)?

Actually, now that I dug deeper, I've remembered Liu Shixin's report
(https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/),
which sounds a lot like what you're reporting. We eventually merged his
fixes, which ended up as commits 0fd44ab213bc ("mm/readahead: break
read-ahead loop if filemap_add_folio return -ENOMEM") and 5c46d5319bde
("mm/filemap: don't decrease mmap_miss when folio has workingset flag").
Did you test a kernel with these fixes (6.10 or later)?

In particular, after these fixes the !folio_test_workingset() check in
filemap_map_folio_range() and filemap_map_order0_folio() should make sure
we don't decrease mmap_miss when faulting in fresh pages. Or, in your case,
was the page evicted so long ago that its workingset bit was already clear?

Once we better understand the situation, let me also mention that I have
two patches which I originally proposed to fix Liu's problems. They didn't
quite fix them, so his patches got merged in the end, but the problems
described there are still somewhat valid:

    mm/readahead: Improve page readaround miss detection

    filemap_map_pages() decreases ra->mmap_miss for every page it maps. This
    however overestimates the number of real cache hits because we have no
    idea whether the application will use the pages we map or not. This is
    problematic in particular in memory constrained situations where we
    think we have a great readahead success rate although in fact we are
    just thrashing the page cache & disk. Change filemap_map_pages() to
    count only the successful mapping of the page we are faulting in. This
    should actually be enough to keep mmap_miss close to 0 for workloads
    doing sequential reads, because filemap_map_pages() does not map pages
    with the readahead flag and thus faults on those pages are going to
    contribute to decreasing the mmap_miss counter.

    Fixes: f1820361f83d ("mm: implement ->map_pages for page cache")

-

    mm/readahead: Fix readahead miss detection with FAULT_FLAG_RETRY_NOWAIT

    When the page fault happens with FAULT_FLAG_RETRY_NOWAIT (which is
    common) we will bail out of the page fault after issuing reads and retry
    the fault. That will then find the created pages in filemap_map_pages()
    and hence they will be treated as a cache hit, canceling out the cache
    miss in do_sync_mmap_readahead(). Increment mmap_miss by two in
    do_sync_mmap_readahead() in case FAULT_FLAG_RETRY_NOWAIT is set to
    account for the following expected hit.
    If the page gets evicted even before we manage to retry the fault, we
    are under such heavy memory pressure that increasing mmap_miss by two
    is fine.

    Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")

In particular, the second problem described could still lead to mmap_miss
not growing as fast as it should, so maybe it would be worth reviving that
patch.

								Honza
-- 
Jan Kara
SUSE Labs, CR