Date: Sun, 10 Nov 2024 13:08:37 -0800 (PST)
From: Hugh Dickins
To: Zi Yan
cc: Hugh Dickins, Andrew Morton, Usama Arif, Yang Shi, Wei Yang,
    "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Johannes Weiner, Baolin Wang, Barry Song, Kefeng Wang, Ryan Roberts,
    Nhat Pham, Chris Li, Shakeel Butt, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH hotfix v2 1/2] mm/thp: fix deferred split queue not partially_mapped
Message-ID: <6fcaaa72-4ef6-ebda-cf37-b6f49874d966@google.com>
References: <81e34a8b-113a-0701-740e-2135c97eb1d7@google.com>

On Sun, 27 Oct 2024, Zi Yan wrote:
> On 27 Oct 2024, at 15:59, Hugh Dickins wrote:
>
> > Recent changes are putting more pressure on THP deferred split queues:
> > under load revealing long-standing races, causing list_del corruptions,
> > "Bad page state"s and worse (I keep BUGs in both of those, so usually
> > don't get to see how badly they end up without). The relevant recent
> > changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
> > improved swap allocation, and underused THP splitting.
> >
> > The new unlocked list_del_init() in deferred_split_scan() is buggy.
> > I gave bad advice, it looks plausible since that's a local on-stack
> > list, but the fact is that it can race with a third party freeing or
> > migrating the preceding folio (properly unqueueing it with refcount 0
> > while holding split_queue_lock), thereby corrupting the list linkage.
> >
> > The obvious answer would be to take split_queue_lock there: but it has
> > a long history of contention, so I'm reluctant to add to that. Instead,
> > make sure that there is always one safe (raised refcount) folio before,
> > by delaying its folio_put().
> > (And of course I was wrong to suggest
> > updating split_queue_len without the lock: leave that until the splice.)
> >
> > And remove two over-eager partially_mapped checks, restoring those tests
> > to how they were before: if uncharge_folio() or free_tail_page_prepare()
> > finds _deferred_list non-empty, it's in trouble whether or not that folio
> > is partially_mapped (and the flag was already cleared in the latter case).
> >
> > Fixes: dafff3f4c850 ("mm: split underused THPs")
> > Signed-off-by: Hugh Dickins
> > Acked-by: Usama Arif
> > Reviewed-by: David Hildenbrand
> > Reviewed-by: Baolin Wang
> > ---
> > Based on 6.12-rc4
> > v2: added ack and reviewed-bys
>
> Acked-by: Zi Yan

Thank you: but I owe you and Andrew and everyone else an apology.

Those 1/2 and 2/2, which have gone in to Linus's tree this morning
(thank you all), have still left a once-a-week list_del corruption on
the deferred split queue: which I've been agonizing over then giving
up on repeatedly for three weeks now (last weekend's seemed to get
fixed by applying a missed microcode update; but then another crash
this Friday).

Sorry if the timing makes it look as if I'm trying to game the system
in some way, but it was only yesterday evening that at last I
understood the reason for (I hope the last of) these deferred split
queue corruptions; and the fix turns out to be to this patch.

Perhaps if I'd worked out why sooner, I'd have just switched to proper
spinlocking as you asked; but now that I do understand, I still prefer
to continue this much more tested way.

My ability to reproduce these crashes seems to be one or two orders of
magnitude weaker than it used to be (generally a good thing I suppose:
but frustrating when I want to test), and there's no way I can satisfy
myself that the crashes are completely eliminated in a single week.
But I have been successful in adding temporary debug code, to check
that the preceding "safe" folio on the local list has non-0 refcount:
that check fails much sooner than reaching corruption, and I've run it
often enough now to confirm that the fix does fix that.

Fix patch follows... as you'll see, it's very obvious *in retrospect*.

Hugh
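
For readers following the thread, the kind of check described above (that the entry preceding a folio on the local list still has a raised refcount before an unlocked unlink) can be illustrated with a small userspace sketch. This is not the temporary debug code mentioned in the message, which is not shown here, and it is not kernel code: every toy_* name below is hypothetical, the refcount is a plain int rather than an atomic, and nothing here runs concurrently. It only models the invariant: an unlink writes through the predecessor's next pointer, so the predecessor must either be the local list head or an entry that cannot be freed underneath us.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list, in the style of list_head. */
struct toy_list {
	struct toy_list *prev, *next;
};

/* Hypothetical stand-in for a folio: a refcount plus its list linkage. */
struct toy_folio {
	int refcount;
	struct toy_list deferred;
};

static void toy_list_init(struct toy_list *node)
{
	node->prev = node;
	node->next = node;
}

static void toy_list_add_tail(struct toy_list *node, struct toy_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink node and leave it pointing at itself (an "empty" entry). */
static void toy_list_del_init(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	toy_list_init(node);
}

/* Recover the containing toy_folio from its embedded list node. */
static struct toy_folio *toy_folio_of(struct toy_list *node)
{
	return (struct toy_folio *)((char *)node -
				    offsetof(struct toy_folio, deferred));
}

/*
 * The modelled invariant: the predecessor is "safe" if it is the local
 * list head itself, or an entry whose refcount is still raised.
 */
static bool toy_prev_is_safe(struct toy_list *head, struct toy_list *node)
{
	if (node->prev == head)
		return true;
	return toy_folio_of(node->prev)->refcount > 0;
}

/* Unlink only after asserting the invariant, as a debug build might. */
static void toy_unlink_checked(struct toy_list *head, struct toy_list *node)
{
	assert(toy_prev_is_safe(head, node));
	toy_list_del_init(node);
}
```

In this model, an entry whose predecessor has refcount 0 fails the check immediately, well before any list linkage is actually damaged, which is the property that makes such a check useful when the corruption itself is hard to reproduce.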