Date: Mon, 5 May 2025 23:44:55 -0700 (PDT)
From: Hugh Dickins
To: Johannes Weiner, Muchun Song
cc: mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, Hugh Dickins
Subject: Re: [PATCH RFC 07/28] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
In-Reply-To: <20250430143714.GA2020@cmpxchg.org>
Message-ID: <235f2616-99dd-abfa-f6d1-c178d8ffb363@google.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com> <20250415024532.26632-8-songmuchun@bytedance.com> <20250430143714.GA2020@cmpxchg.org>
On Wed, 30 Apr 2025, Johannes Weiner wrote:

> On Tue, Apr 15, 2025 at 10:45:11AM +0800, Muchun Song wrote:
> > The maintenance of the folio->_deferred_list is intricate because it's
> > reused in a local list.
> >
> > Here are some peculiarities:
> >
> > 1) When a folio is removed from its split queue and added to a local
> >    on-stack list in deferred_split_scan(), the ->split_queue_len isn't
> >    updated, leading to an inconsistency between it and the actual
> >    number of folios in the split queue.
> >
> > 2) When the folio is split via split_folio() later, it's removed from
> >    the local list while holding the split queue lock. At this time,
> >    this lock protects the local list, not the split queue.
> >
> > 3) To handle the race condition with a third-party freeing or migrating
> >    the preceding folio, we must ensure there's always one safe (with
> >    raised refcount) folio before by delaying its folio_put(). More
> >    details can be found in commit e66f3185fa04. It's rather tricky.
> >
> > We can use the folio_batch infrastructure to handle this clearly. In this
> > case, ->split_queue_len will be consistent with the real number of folios
> > in the split queue. If list_empty(&folio->_deferred_list) returns false,
> > it's clear the folio must be in its split queue (not in a local list
> > anymore).
> >
> > In the future, we aim to reparent LRU folios during memcg offline to
> > eliminate dying memory cgroups. This patch prepares for using
> > folio_split_queue_lock_irqsave() as folio memcg may change then.
> >
> > Signed-off-by: Muchun Song
>
> This is a very nice simplification. And getting rid of the stack list
> and its subtle implication on all the various current and future
> list_empty(&folio->_deferred_list) checks should be much more robust.
>
> However, I think there is one snag related to this:
>
>...
>
> There IS a list_empty() check in the splitting code that we actually
> relied on, for cleaning up the partially_mapped state and counter:
>
>             !list_empty(&folio->_deferred_list)) {
>         ds_queue->split_queue_len--;
>         if (folio_test_partially_mapped(folio)) {
>             folio_clear_partially_mapped(folio);
>             mod_mthp_stat(folio_order(folio),
>                           MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>         }
>         /*
>          * Reinitialize page_deferred_list after removing the
>          * page from the split_queue, otherwise a subsequent
>          * split will see list corruption when checking the
>          * page_deferred_list.
>          */
>         list_del_init(&folio->_deferred_list);
>
> With the folios isolated up front, it looks like you need to handle
> this from the shrinker.

Good catch. I loaded up patches 01-07/28 on top of 6.15-rc5 yesterday,
and after a good run of 12 hours on this laptop, indeed I can see
vmstat nr_anon_partially_mapped 78299, whereas it usually ends up at 0.

>
> Otherwise this looks correct to me. But this code is subtle, I would
> feel much better if Hugh (CC-ed) could take a look as well.

However... I was intending to run it for 12 hours on the workstation,
but after 11 hours and 35 minutes, that crashed with list_del corruption,
kernel BUG at lib/list_debug.c:65! from deferred_split_scan()'s
list_del_init().

I've not yet put together the explanation: I am deeply suspicious of
the change to when list_empty() becomes true (the block Hannes shows
above is not the only such: (__)folio_unqueue_deferred_split() and
migrate_pages_batch() consult it too), but each time I think I have
the explanation, it's ruled out by folio_try_get()'s reference.

And aside from the crash (I don't suppose 6.15-rc5 is responsible,
or that patches 08-28/28 would fix it), I'm not so sure that this
patch is really an improvement (folio reference held for longer,
and list lock taken more often when split fails: maybe not important,
but I'm also not so keen on adding in fbatch myself).
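As a rough userspace illustration of the accounting gap Hannes describes, here is a sketch of one possible shape for it. It is not kernel code: toy_folio, toy_split_queue, TOY_BATCH_SIZE and the toy_* helpers are hypothetical stand-ins, the list helpers are minimal re-implementations modelled loosely on include/linux/list.h, and the batch is a plain array rather than struct folio_batch. Once folios leave the queue up front, the split path's !list_empty() block never runs for them. So in this sketch the shrinker adjusts the queue length at isolation time and clears the partially_mapped flag and counter itself after a successful split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace intrusive list, modelled loosely on include/linux/list.h. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-ins for a folio, its split queue, and the
 * MTHP_STAT_NR_ANON_PARTIALLY_MAPPED counter. */
struct toy_folio {
	struct list_head deferred_list;
	bool partially_mapped;
};

struct toy_split_queue {
	struct list_head list;
	long len;
};

static long nr_anon_partially_mapped;

#define TOY_BATCH_SIZE 15	/* hypothetical batch capacity */

/* Queue a folio, counting it if it is partially mapped. */
static void toy_queue_folio(struct toy_split_queue *q, struct toy_folio *f,
			    bool partially_mapped)
{
	list_add_tail(&f->deferred_list, &q->list);
	q->len++;
	f->partially_mapped = partially_mapped;
	if (partially_mapped)
		nr_anon_partially_mapped++;
}

/* Step 1: take folios off the queue; ->len keeps matching the list. */
static int toy_isolate_batch(struct toy_split_queue *q,
			     struct toy_folio *batch[TOY_BATCH_SIZE])
{
	int n = 0;

	while (!list_empty(&q->list) && n < TOY_BATCH_SIZE) {
		struct toy_folio *f = container_of(q->list.next,
						   struct toy_folio,
						   deferred_list);

		list_del_init(&f->deferred_list);
		q->len--;
		batch[n++] = f;
	}
	assert(q->len >= 0);
	return n;
}

/*
 * Step 2: after the split attempt. The split path saw list_empty() and
 * skipped its cleanup, so undo the partially-mapped accounting here;
 * a folio that failed to split goes back on the queue, flag intact.
 */
static void toy_finish_folio(struct toy_split_queue *q, struct toy_folio *f,
			     bool split_ok)
{
	if (split_ok) {
		if (f->partially_mapped) {
			f->partially_mapped = false;
			nr_anon_partially_mapped--;
		}
	} else {
		list_add_tail(&f->deferred_list, &q->list);
		q->len++;
	}
}
```

In this model, a folio that fails to split keeps its flag and is requeued, so the counter only falls for folios that were actually split. Leaving out the step-2 cleanup would let the counter drift upward, which is consistent with the nr_anon_partially_mapped reading above.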
I didn't spend very long looking through the patches, but maybe
this 07/28 is not essential?

Let me try again to work out what's wrong tomorrow,
Hugh
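For the crash, it may help to recall what "list_del corruption" reports. With CONFIG_DEBUG_LIST, the kernel checks before unlinking an entry that both of its neighbours still point back at it. If they do not, it reports list_del corruption and, depending on configuration, BUG()s. The sketch below is a simplified userspace model of that check, not the kernel's code. The toy_* names are hypothetical, and the real lib/list_debug.c also tests for the LIST_POISON values left by plain list_del(). The test below simulates a half-finished unlink to stand in for two paths touching the same list without common locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace list, modelled loosely on include/linux/list.h. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/*
 * Simplified model of the CONFIG_DEBUG_LIST pre-unlink check: both
 * neighbours must still point back at the entry being removed.
 */
static bool toy_list_del_entry_valid(const struct list_head *entry)
{
	return entry->prev->next == entry && entry->next->prev == entry;
}

/*
 * Unlink and reinitialise the entry. Returns false, leaving the list
 * untouched, where the kernel would report list_del corruption.
 */
static bool toy_list_del_init(struct list_head *entry)
{
	if (!toy_list_del_entry_valid(entry))
		return false;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);	/* self-linked: list_empty(entry) is now true */
	return true;
}
```

Calling toy_list_del_init() twice on the same entry passes the check, because list_del_init() leaves the entry pointing at itself. That self-link is also what makes list_empty() true on it. A failure therefore implies that some neighbour was rewritten behind the entry's back.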