Date: Mon, 11 Mar 2024 17:49:09 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts
Cc: Andrew Morton, linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
In-Reply-To: <7177da75-0158-4227-98cf-ca93a73389a0@arm.com>

On Mon, Mar 11, 2024 at 04:14:06PM +0000, Ryan Roberts wrote:
> > With this patch, it took about 700 seconds in my xfstests run to find
> > the one in shrink_folio_list().  It took 3260 seconds to catch the one
> > in move_folios_to_lru().
>
> 54 hours?? That's before I even reported that we still had a bug! Either
> you're anticipating my every move, or you have a lot of stuff running in
> parallel :)

One hour is 3600 seconds ;-)  So more like 53 minutes.

> > That one is only probably OK.  We usually manage to split a folio before
> > we get to this point, so we should remove it from the deferred list then.
> > You'd need to buy a lottery ticket if you managed to get hwpoison in a
> > folio that was deferred split ...
>
> OK, but "probably"? Given hwpoison is surely not a hot path, why not just
> be safe and call folio_undo_large_rmappable()?

Yes, I've added it; I was just commenting on the likelihood of being able
to hit it in testing, even with aggressive error injection.

> >> But taking a step back, I wonder whether the charge() and uncharge()
> >> functions should really be checking to see if the folio is on a deferred
> >> split list and, if so, move the folio to the correct list?
> >
> > I don't think that's the right thing to do.  All of these places which
> > uncharge a folio are part of the freeing path, so we always want it
> > removed, not moved to a different deferred list.
>
> Well, I'm just thinking about trying to be robust.  Clearly you would
> prefer that folio_undo_large_rmappable() has been called before uncharge();
> then uncharge() notices that there is nothing on the deferred list (and
> doesn't take the lock).  But if it's not, is it better to randomly crash
> (costing the best part of a week to debug) or to move the folio to the
> right list?

Neither ;-)  The right option is to include an assertion that the deferred
list is empty.  That way we get to see the backtrace of whoever forgot to
take the folio off the deferred list.
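Concretely, the kind of assertion being suggested could look something like
the sketch below.  This is illustrative only: the helper name and its exact
placement in the uncharge path are assumptions, not taken from the posted
series.

	/*
	 * Illustrative sketch, not from the posted series: warn (with a
	 * backtrace of the caller) if a large folio reaches the uncharge
	 * path while still on a deferred split list, rather than quietly
	 * fixing it up.  Assumes the usual mm context (linux/mm.h,
	 * linux/mmdebug.h).
	 */
	static inline void assert_not_deferred_split(struct folio *folio)
	{
		if (folio_test_large(folio))
			VM_WARN_ON_ONCE_FOLIO(
				!list_empty(&folio->_deferred_list), folio);
	}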
> Alternatively, can we refactor so that there aren't 9 separate uncharge()
> call sites?  Those sites are all trying to free the folio, so is there a way
> to better refactor that into a single place (I guess the argument for the
> current arrangement is reducing the number of times that we have to iterate
> through the batch?).  Then we only have to get it right once.

I have been wondering about a better way to do it.  I've also been looking
a bit askance at put_pages_list(), which doesn't do memcg uncharging ...

> > But what about mem_cgroup_move_account()?  Looks like that's memcg v1
> > only?  Should still be fixed though ...
>
> Right.
>
> And what about the first bug you found with the local list corruption?  I'm
> not running with that fix, so it's obviously not a problem here.  But I still
> think it's a bug that we should fix?  list_for_each_entry_safe() isn't safe
> against *concurrent* list modification, right?

I've been thinking about that too.  I decided that the local list is
actually protected by the lock after all.  It's a bit fiddly to prove, but:

1. We have a reference on every folio ahead of us on the list (not on those
   behind us, but see below).
2. If split_folio() succeeds, it takes the lock that would protect the list
   we are on.
3. If it doesn't, and folio_put() turns out to drop the last reference,
   __folio_put_large -> destroy_large_folio -> folio_undo_large_rmappable
   takes the lock that protects the list we would be on.

So we can analyse this loop as:

	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
		if (random() & 1)
			continue;
		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		list_del_init(&folio->_deferred_list);
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}

We're guaranteed that 'next' is a valid folio because we hold a refcount on
it.  Anything left on the list between &list and next may have been removed
from the list, but we don't look at those entries until after we take the
split_queue_lock again to do the list_splice_tail().

I'm too scared to write a loop like this, but I don't think it contains a
bug.
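For completeness, the surrounding pattern being analysed looks roughly like
the sketch below.  It is modelled loosely on deferred_split_scan(); the
function name, parameters and error handling are simplified assumptions, not
the actual code.  The point it illustrates is that every removal from the
local list, including the one triggered by a final folio_put(), happens
under split_queue_lock, and the survivors are spliced back under that same
lock.

	/*
	 * Rough sketch of the pattern discussed above, modelled on
	 * deferred_split_scan(); names and details simplified.  Each folio
	 * was moved onto @list with a reference held, which is what keeps
	 * 'next' valid during the unlocked walk.
	 */
	static void walk_local_deferred_list(struct deferred_split *ds_queue,
					     struct list_head *list)
	{
		struct folio *folio, *next;
		unsigned long flags;

		list_for_each_entry_safe(folio, next, list, _deferred_list) {
			if (!folio_trylock(folio)) {
				folio_put(folio);
				continue;
			}
			/*
			 * On success, split_folio() takes split_queue_lock
			 * and removes the folio from the local list itself.
			 */
			split_folio(folio);
			folio_unlock(folio);
			/*
			 * If this drops the last reference, the folio is
			 * removed from the local list under split_queue_lock
			 * via folio_undo_large_rmappable().
			 */
			folio_put(folio);
		}

		/* Splice the survivors back onto the real queue. */
		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		list_splice_tail(list, &ds_queue->split_queue);
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}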