Date: Wed, 19 Jun 2024 21:42:59 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton, Barry Song
cc: Hugh Dickins, Baolin Wang, willy@infradead.org, david@redhat.com,
    wangkefeng.wang@huawei.com, chrisl@kernel.org, ying.huang@intel.com,
    21cnbao@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com,
    ziy@nvidia.com, ioworker0@gmail.com, da.gomez@samsung.com,
    p.raghav@samsung.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/9] support large folio swap-out and swap-in for shmem
In-Reply-To: <2683b71d-aebd-5527-348c-18c0e021b653@google.com>
Message-ID: <25ae21b4-23d5-73ba-2e0a-e642ec4b69a0@google.com>
References: <20240618130538.ffab3ce1b4e66e3ba095d8cf@linux-foundation.org>
    <475f0f2c-afc7-4225-809f-93c93f45c830@linux.alibaba.com>
    <2683b71d-aebd-5527-348c-18c0e021b653@google.com>

On Wed, 19 Jun 2024, Hugh Dickins wrote:
>
> and on second attempt, then a VM_BUG_ON_FOLIO(!folio_contains) from
> find_lock_entries().
>
> Or maybe that VM_BUG_ON_FOLIO() was unrelated, but a symptom of the bug
> I'm trying to chase even when this series is reverted:

Yes, I doubt now that the VM_BUG_ON_FOLIO(!folio_contains) was related
to Baolin's series: much more likely to be an instance of other problems.

> some kind of page
> double usage, manifesting as miscellaneous "Bad page"s and VM_BUG_ONs,
> mostly from page reclaim or from exit_mmap(). I'm still getting a feel
> for it, maybe it occurs soon enough for a reliable bisection, maybe not.
>
> (While writing, a run with mm-unstable cut off at 2a9964cc5d27,
> drop KSM_KMEM_CACHE(), instead of reverting just Baolin's latest,
> has not yet hit any problem: too early to tell but promising.)

Yes, that ran without trouble for many hours on two machines. I didn't
do a formal bisection, but did appear to narrow it down convincingly to
Barry's folio_add_new_anon_rmap() series: crashes soon on both machines
with Barry's in but Baolin's out, no crashes with both out.

Yet while I was studying Barry's patches trying to explain it, one
of the machines did at last crash: it's as if Barry's has opened a
window which makes these crashes more likely, but not itself to blame.

I'll go back to studying that crash now: two CPUs crashed about the
same time, perhaps they interacted and give a hint at root cause.

(I do have doubts about Barry's: the "_new" in folio_add_new_anon_rmap()
was all about optimizing a known-exclusive case, so it surprises me
to see it being extended to non-exclusive; and I worry over how its
atomic_set(&page->_mapcount, 0)s can be safe when non-exclusive (but
I've never caught up with David's exclusive changes, I'm out of date).
But even if those are wrong, I'd expect them to tend towards a mapped
page becoming unreclaimable, then "Bad page map" when munmapped,
not to any of the double-free symptoms I've actually seen.)

>
> And before 2024-06-18, I was working on mm-everything-2024-06-15 minus
> Chris Li's mTHP swap series: which worked fairly well, until it locked
> up with __try_to_reclaim_swap()'s filemap_get_folio() spinning around
> on a page with 0 refcount, while a page table lock is held which one
> by one the other CPUs come to want for reclaim. On two machines.

I've not seen that symptom at all since 2024-06-15: intriguing,
but none of us can afford the time to worry about vanished bugs.

Hugh
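
The worry above about atomic_set(&page->_mapcount, 0) in a non-exclusive
case can be illustrated with a small userspace sketch. This is not kernel
code and says nothing about whether Barry's series actually has the
problem: it uses C11 <stdatomic.h>, and the helper names and the
UNMAPPED constant are hypothetical, chosen only for illustration. It
assumes a counter that starts at -1 for "unmapped", and replays one
interleaving sequentially so the result is deterministic: a plain store
of 0 after another mapper's increment records one mapping where two
exist, while two increments record both.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a per-page map counter: -1 means "unmapped",
 * so a stored value of 0 means "one mapping". */
#define UNMAPPED (-1)

/* Exclusive-only path: a plain store, safe only when no other mapper
 * can be updating the counter at the same time. */
static void record_mapping_store(atomic_int *mapcount)
{
	atomic_store(mapcount, 0);
}

/* Shared-safe path: a read-modify-write that cannot lose an update. */
static void record_mapping_inc(atomic_int *mapcount)
{
	atomic_fetch_add(mapcount, 1);
}

static int mappings_recorded(atomic_int *mapcount)
{
	return atomic_load(mapcount) + 1;
}

/* Mapper A increments, then mapper B wrongly takes the store path. */
static int replay_inc_then_store(void)
{
	atomic_int mapcount;

	atomic_init(&mapcount, UNMAPPED);
	assert(mappings_recorded(&mapcount) == 0);	/* starts unmapped */
	record_mapping_inc(&mapcount);		/* A: -1 -> 0 */
	record_mapping_store(&mapcount);	/* B: overwrites with 0 */
	return mappings_recorded(&mapcount);
}

/* Both mappers take the increment path. */
static int replay_inc_then_inc(void)
{
	atomic_int mapcount;

	atomic_init(&mapcount, UNMAPPED);
	assert(mappings_recorded(&mapcount) == 0);	/* starts unmapped */
	record_mapping_inc(&mapcount);		/* A: -1 -> 0 */
	record_mapping_inc(&mapcount);		/* B:  0 -> 1 */
	return mappings_recorded(&mapcount);
}
```

Here replay_inc_then_store() returns 1 although two mappings were made,
and replay_inc_then_inc() returns 2. Which symptoms such a miscount would
produce in the kernel is a separate question that this sketch does not
try to answer.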