Date: Mon, 11 Mar 2024 17:49:09 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts
Cc: Andrew Morton, linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
In-Reply-To: <7177da75-0158-4227-98cf-ca93a73389a0@arm.com>

On Mon, Mar 11, 2024 at 04:14:06PM +0000, Ryan Roberts wrote:
> > With this patch, it took about 700 seconds in my xfstests run to find
> > the one in shrink_folio_list().  It took 3260 seconds to catch the one
> > in move_folios_to_lru().
>
> 54 hours?? That's before I even reported that we still had a bug! Either
> you're anticipating my every move, or you have a lot of stuff running in
> parallel :)

One hour is 3600 seconds ;-)  So more like 53 minutes.

> > That one is only probably OK.  We usually manage to split a folio before
> > we get to this point, so we should remove it from the deferred list then.
> > You'd need to buy a lottery ticket if you managed to get hwpoison in a
> > folio that was deferred split ...
>
> OK, but "probably"? Given hwpoison is surely not a hot path, why not just
> be safe and call folio_undo_large_rmappable()?

Yes, I've added it; I was just commenting on the likelihood of being able
to hit it in testing, even with aggressive error injection.

> >> But taking a step back, I wonder whether the charge() and uncharge()
> >> functions should really be checking to see if the folio is on a deferred
> >> split list and, if so, move the folio to the correct list?
> >
> > I don't think that's the right thing to do.  All of these places which
> > uncharge a folio are part of the freeing path, so we always want it
> > removed, not moved to a different deferred list.
>
> Well, I'm just thinking about trying to be robust.  Clearly you would
> prefer that folio_undo_large_rmappable() has been called before uncharge();
> then uncharge() notices that there is nothing on the deferred list (and
> doesn't take the lock).  But if it's not, is it better to randomly crash
> (costing the best part of a week to debug) or to move the folio to the
> right list?

Neither ;-)  The right option is to include an assertion that the deferred
list is empty.  That way we get to see the backtrace of whoever forgot to
take the folio off the deferred list.
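Concretely, the kind of assertion being suggested could look something like
the sketch below.  This is illustrative only: the helper name and its exact
placement in the uncharge path are assumptions, not taken from the posted
series.

	/*
	 * Illustrative sketch, not from the posted series: warn (with a
	 * backtrace of the caller) if a large folio reaches the uncharge
	 * path while still on a deferred split list, rather than quietly
	 * fixing it up.  Assumes the usual mm context (linux/mm.h,
	 * linux/mmdebug.h).
	 */
	static inline void assert_not_deferred_split(struct folio *folio)
	{
		if (folio_test_large(folio))
			VM_WARN_ON_ONCE_FOLIO(
				!list_empty(&folio->_deferred_list), folio);
	}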
> Alternatively, can we refactor so that there aren't 9 separate uncharge()
> call sites?  Those sites are all trying to free the folio, so is there a way
> to better refactor that into a single place (I guess the argument for the
> current arrangement is reducing the number of times that we have to iterate
> through the batch?).  Then we only have to get it right once.

I have been wondering about a better way to do it.  I've also been looking
a bit askance at put_pages_list(), which doesn't do memcg uncharging ...

> > But what about mem_cgroup_move_account()?  Looks like that's memcg v1
> > only?  Should still be fixed though ...
>
> Right.
>
> And what about the first bug you found with the local list corruption?  I'm
> not running with that fix, so it's obviously not a problem here.  But I still
> think it's a bug that we should fix?  list_for_each_entry_safe() isn't safe
> against *concurrent* list modification, right?

I've been thinking about that too.  I decided that the local list is
actually protected by the lock after all.  It's a bit fiddly to prove, but:

1. We have a reference on every folio ahead of us on the list (not on those
   behind us, but see below).
2. If split_folio() succeeds, it takes the lock that would protect the list
   we are on.
3. If it doesn't, and folio_put() turns out to drop the last reference,
   __folio_put_large -> destroy_large_folio -> folio_undo_large_rmappable
   takes the lock that protects the list we would be on.

So we can analyse this loop as:

	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
		if (random() & 1)
			continue;
		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		list_del_init(&folio->_deferred_list);
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}

We're guaranteed that 'next' is a valid folio because we hold a refcount on
it.  Anything left on the list between &list and next may have been removed
from the list, but we don't look at those entries until after we take the
split_queue_lock again to do the list_splice_tail().

I'm too scared to write a loop like this, but I don't think it contains a
bug.
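For completeness, the surrounding pattern being analysed looks roughly like
the sketch below.  It is modelled loosely on deferred_split_scan(); the
function name, parameters and error handling are simplified assumptions, not
the actual code.  The point it illustrates is that every removal from the
local list, including the one triggered by a final folio_put(), happens
under split_queue_lock, and the survivors are spliced back under that same
lock.

	/*
	 * Rough sketch of the pattern discussed above, modelled on
	 * deferred_split_scan(); names and details simplified.  Each folio
	 * was moved onto @list with a reference held, which is what keeps
	 * 'next' valid during the unlocked walk.
	 */
	static void walk_local_deferred_list(struct deferred_split *ds_queue,
					     struct list_head *list)
	{
		struct folio *folio, *next;
		unsigned long flags;

		list_for_each_entry_safe(folio, next, list, _deferred_list) {
			if (!folio_trylock(folio)) {
				folio_put(folio);
				continue;
			}
			/*
			 * On success, split_folio() takes split_queue_lock
			 * and removes the folio from the local list itself.
			 */
			split_folio(folio);
			folio_unlock(folio);
			/*
			 * If this drops the last reference, the folio is
			 * removed from the local list under split_queue_lock
			 * via folio_undo_large_rmappable().
			 */
			folio_put(folio);
		}

		/* Splice the survivors back onto the real queue. */
		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		list_splice_tail(list, &ds_queue->split_queue);
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}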