Date: Fri, 1 Mar 2024 16:54:55 +1100
From: Dave Chinner
To: NeilBrown
Cc: Matthew Wilcox, Amir Goldstein, paulmck@kernel.org,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel, Kent Overstreet, Jan Kara
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
In-Reply-To: <170925937840.24797.2167230750547152404@noble.neil.brown.name>

On Fri, Mar 01, 2024 at 01:16:18PM +1100, NeilBrown wrote:
> On Thu, 29 Feb 2024, Matthew Wilcox wrote:
> > On Tue, Feb 27, 2024 at 09:19:47PM +0200, Amir Goldstein wrote:
> > > On Tue, Feb 27, 2024 at 8:56 PM Paul E. McKenney wrote:
> > > >
> > > > Hello!
> > > >
> > > > Recent discussions [1] suggest that greater mutual understanding between
> > > > memory reclaim on the one hand and RCU on the other might be in order.
> > > >
> > > > One possibility would be an open discussion. If it would help, I would
> > > > be happy to describe how RCU reacts and responds to heavy load, along with
> > > > some ways that RCU's reactions and responses could be enhanced if needed.
> > > >
> > >
> > > Adding fsdevel as this should probably be a cross track session.
> >
> > Perhaps broaden this slightly. On the THP Cabal call we just had a
> > conversation about the requirements on filesystems in the writeback
> > path. We currently tell filesystem authors that the entire writeback
> > path must avoid allocating memory in order to prevent deadlock (or use
> > GFP_MEMALLOC). Is this appropriate? It's a lot of work to assure that
> > writing pagecache back will not allocate memory in, eg, the network stack,
> > the device driver, and any other layers the write must traverse.
> >
> > With the removal of ->writepage from vmscan, perhaps we can make
> > filesystem authors' lives easier by relaxing this requirement as pagecache
> > should be cleaned long before we get to reclaiming it.
> >
> > I don't think there's anything to be done about swapping anon memory.
> > We probably don't want to proactively write anon memory to swap, so by
> > the time we're in ->swap_rw we really are low on memory.
> >
> >
>
> While we are considering revising mm rules, I would really like to
> revise the rule that GFP_KERNEL allocations are allowed to fail.
> I'm not at all sure that they ever do (except for large allocations - so
> maybe we could leave that exception in - or warn if large allocations
> are tried without a MAY_FAIL flag).
>
> Given that GFP_KERNEL can wait, and that the mm can kill off processes
> and clear cache to free memory, there should be no case where failure is
> needed or where simply waiting will not eventually result in success. And if
> there is, the machine is a goner anyway.

Yes, please!

XFS was designed and implemented on an OS that gave this exact
guarantee for kernel allocations back in the early 1990s. Memory
allocation simply blocked until it succeeded unless the caller
indicated they could handle failure. That's what __GFP_NOFAIL does
and XFS is still heavily dependent on this behaviour.
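
To make the two allocation contracts concrete, here is a minimal
userspace sketch in C. It is not kernel code and does not use any
kernel API: struct toy_mm, toy_alloc_may_fail(), toy_reclaim() and
toy_alloc_nofail() are hypothetical names invented for illustration,
and "reclaim" is pure bookkeeping. It only shows the shape of the
idea: a caller that said it can handle failure gets NULL back, and
everyone else retries after reclaim.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical toy "machine": a memory budget, part of which is held
 * by a droppable cache. Purely illustrative bookkeeping.
 */
struct toy_mm {
	size_t free_bytes;	/* immediately allocatable */
	size_t cache_bytes;	/* reclaimable by dropping cache */
};

/* Contract 1: the caller can handle failure, so NULL is allowed. */
static void *toy_alloc_may_fail(struct toy_mm *mm, size_t size)
{
	void *p;

	if (size > mm->free_bytes)
		return NULL;
	p = malloc(size);
	if (p)
		mm->free_bytes -= size;
	return p;
}

/* Move up to 'want' bytes from the cache to the free pool. */
static size_t toy_reclaim(struct toy_mm *mm, size_t want)
{
	size_t got = want < mm->cache_bytes ? want : mm->cache_bytes;

	mm->cache_bytes -= got;
	mm->free_bytes += got;
	return got;
}

/*
 * Contract 2: never return NULL. Reclaim and retry until the request
 * succeeds. A real allocator would sleep here; this toy gives up with
 * abort() once nothing is left to reclaim, standing in for the
 * "machine is a goner anyway" case.
 */
static void *toy_alloc_nofail(struct toy_mm *mm, size_t size)
{
	for (;;) {
		void *p = toy_alloc_may_fail(mm, size);

		if (p)
			return p;
		if (toy_reclaim(mm, size) == 0)
			abort();
	}
}
```

With 64 bytes free and 256 bytes of cache, a 128 byte request fails
under the first contract, while the second drops 128 bytes of cache
and then succeeds.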

And before people scream "but that was 30 years ago, Unix OS code
was much simpler", consider that Irix supported machines with
hundreds of NUMA nodes, thousands of CPUs, terabytes of memory and
petabytes of storage. It had variable size high order pages in the
page cache (something we've only just got with folios!), page
migration, page compaction, memory and process locality control,
filesystem block sizes larger than page size (which we don't have
yet!), memory shrinkers for subsystem cache reclaim, page cache
dirty throttling to sustained writeback IO rates, etc.

Lots of the mm technology from that OS has been re-implemented in
Linux in the past two decades, but in several important ways Linux
still falls shy of the bar that Irix set a couple of decades ago.
One of those is the kernel memory allocation guarantee.

> Once upon a time user-space pages could not be ripped out of a process
> by the oom killer until the process actually exited, and that meant that
> GFP_KERNEL allocations of a process being oom killed should not block
> indefinitely in the allocator. I *think* that isn't the case any more.
>
> Insisting that GFP_KERNEL allocations never return NULL would allow us
> to remove a lot of untested error handling code....

This is the sort of thing I was thinking of in the "remove
GFP_NOFS" discussion thread when I said this to Kent:

	"We need to start designing our code in a way that doesn't require
	extensive testing to validate it as correct. If the only way to
	validate new code is correct is via stochastic coverage via error
	injection, then that is a clear sign we've made poor design choices
	along the way."

https://lore.kernel.org/linux-fsdevel/ZcqWh3OyMGjEsdPz@dread.disaster.area/

If memory allocation doesn't fail by default, then we can remove the
vast majority of allocation error handling from the kernel. Make the
common case just work - remove the need for all that code to handle
failures that is hard to exercise reliably and so is rarely
tested.
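
As a rough illustration of the code that goes away, the userspace C
sketch below sets up the same three buffers twice: once on top of an
allocator that may return NULL, once on top of one that never does.
Everything here is hypothetical and invented for the example (struct
toy_ctx, may_fail_alloc(), nofail_alloc(), the fail_at injection
knob); none of it is a kernel interface. Note that the unwind labels
in the first version only ever run when a failure is injected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_ctx {
	char *buf_a;
	char *buf_b;
	char *buf_c;
};

/* Hypothetical fault injection: fail the Nth call (-1 = never). */
static int fail_at = -1;
static int calls;

static void *may_fail_alloc(size_t n)
{
	if (calls++ == fail_at)
		return NULL;
	return malloc(n);
}

/*
 * Stand-in for a blocking allocator. A kernel would sleep and reclaim
 * rather than spin; this loop only models "does not return NULL".
 */
static void *nofail_alloc(size_t n)
{
	void *p;

	do {
		p = malloc(n);
	} while (!p);
	return p;
}

/* Fallible allocator: every allocation adds an unwind branch. */
static int toy_ctx_init_fallible(struct toy_ctx *ctx)
{
	ctx->buf_a = may_fail_alloc(16);
	if (!ctx->buf_a)
		goto err;
	ctx->buf_b = may_fail_alloc(32);
	if (!ctx->buf_b)
		goto err_free_a;
	ctx->buf_c = may_fail_alloc(64);
	if (!ctx->buf_c)
		goto err_free_b;
	return 0;

err_free_b:
	free(ctx->buf_b);
err_free_a:
	free(ctx->buf_a);
err:
	return -ENOMEM;
}

/* No-fail allocator: the same setup has no error paths to test. */
static void toy_ctx_init_nofail(struct toy_ctx *ctx)
{
	ctx->buf_a = nofail_alloc(16);
	ctx->buf_b = nofail_alloc(32);
	ctx->buf_c = nofail_alloc(64);
}
```

Setting fail_at to 1 before calling toy_ctx_init_fallible() makes the
second allocation fail and drives the err_free_a path; without that
kind of injection, those lines are never executed.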

A simple change to make long standing behaviour an actual policy we
can rely on means we can remove both code and test matrix overhead -
it's a win-win IMO.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com