From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D613C30658 for ; Wed, 3 Jul 2024 01:51:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 421BB6B0095; Tue, 2 Jul 2024 21:51:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3AB1C6B0096; Tue, 2 Jul 2024 21:51:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 272836B0099; Tue, 2 Jul 2024 21:51:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 057376B0095 for ; Tue, 2 Jul 2024 21:51:45 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 9D5721406BA for ; Wed, 3 Jul 2024 01:51:45 +0000 (UTC) X-FDA: 82296764970.10.48BA929 Received: from out30-99.freemail.mail.aliyun.com (out30-99.freemail.mail.aliyun.com [115.124.30.99]) by imf05.hostedemail.com (Postfix) with ESMTP id 22095100009 for ; Wed, 3 Jul 2024 01:51:41 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=YpQcY8jE; spf=pass (imf05.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.99 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719971485; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=9ySc3bYg15NcsW1Q0nlz2/qGVTHyuUlAyowi9v5Eek0=; b=JgimcLyO6Lkn3mil0+nunIo2Zz7H2OWIMgFZ7gqT2w+UtHukVctTjwYc+KsGrtw/5Sb0LL 2isBM0ukXk0afhc2BOZEKDWGLNAXm7kSbdaiIc8F+HbspzgbAuRX/mBIi/W0+PefIvCG80 ciNQByg8SgICPOQamvNLghjsN/Az2eI= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=YpQcY8jE; spf=pass (imf05.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.99 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719971485; a=rsa-sha256; cv=none; b=oEOPX2Z0C0JMUEo2LdVUILGOwYSQKnmpTUIeApxJs973UrTzqZgoSYB9IrYlDFJa4CZbYk dkrYIiv3z4bU9KdYkxLe0msuM/TtVnGENmYDDqoxXAAoWStApvTCaEDf/fZamZB8nxFs9j 5IAyHL9+maNQcpPXr6/xUMKK9Trqtck= DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1719971491; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=9ySc3bYg15NcsW1Q0nlz2/qGVTHyuUlAyowi9v5Eek0=; b=YpQcY8jE0tDqOnBjhjAMR8fYzQTXGAj0pc5hiMfYp4kJtnnxkYrRPYfdNfvpObvqpjbPTW4TQufunCOkxvXm6eCLUHm0MFyfhUmaR6mMoBOiP1LPrkfFo2FMOA+gboIENy6delQmmY7Mz2FAP35JBqzAkPgdLppdIbpKlRx6LUU= X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R831e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033037067111;MF=baolin.wang@linux.alibaba.com;NM=1;PH=DS;RN=11;SR=0;TI=SMTPD_---0W9kB2Z-_1719971489; Received: from 30.97.56.62(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0W9kB2Z-_1719971489) by smtp.aliyun-inc.com; Wed, 03 Jul 2024 09:51:30 +0800 Message-ID: Date: Wed, 3 Jul 2024 09:51:28 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH hotfix] mm: fix crashes from deferred split racing folio migration To: Hugh Dickins Cc: Andrew Morton , Nhat Pham , Yang Shi , Zi Yan , Barry Song , Kefeng Wang , David Hildenbrand , Matthew Wilcox , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <29c83d1a-11ca-b6c9-f92e-6ccb322af510@google.com> From: Baolin Wang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: djgr1jmtn3rk31d65smu4nq8yg9844fc X-Rspam-User: X-Rspamd-Queue-Id: 22095100009 X-Rspamd-Server: rspam02 X-HE-Tag: 1719971501-34018 X-HE-Meta: U2FsdGVkX19oun/sxwRpPBLg1czznjFGawvNO7AaOhn1PGiGNpG9Tpd6kqsJgqMd9iWRQm25yj4OAFEXml80lq/Nz1J7eCg3FBlGItK6oCxIfYMe0h7suJdpqgEHWoWj5tSvS/L5HE8Ghgvejymy06I3JPMn0v7La+H+qtk9VwmH2i7IKageB/j43u42G00qgTZhzZ30hBDMN1kAsfiqm2a/NaxbvOrQVa+LrXe1WUuhOb3zRrgmOuF0zHfVj/2IszYY2h9KzPQh1xnkYOk6upvYT4lwQwgk8XPTZHYm4bkkOz6/cgabTLwR3vqf/VzWiqHH6qYHmQNn8LyoWEXJaT3EceF0mAcOrLjDejz3+E30ckXkKVTnnO0S13nODyO09lICT7JBgVh9CyQ7kNvKZ92Vr/JeEv1+uch3kBkj3Eb4ZO3ysNoHyoCSR9EuDP0ybotp9ChIu9N1bzD1XAxTkvN3r5vw0T1+Zew4SE9TmOcglJ8wEjkLAt20EKKOkW48+6QUhkfDffdU+T7RiWZOyIM6si94PrDTSKTYrUdQwg76+N4Z07m/7YbkxIcUpaR4IcnPFSHn8WkubhGRyTaWxMcjGKUg1y3GZjy8vQE8ThO0lXJ6akeqqp1oeAtvPypc4CC/tpNK53LmjLL24lwGiyzExgNSYs+oHUkWqPX/+/WPffasat5oz/mhzaeDi6sz1PAhkD7aTBNrIkA9dUkt3nTBUsrd9AJ6AVfkBgZeIMEKc+W391yQuIKu+YdhJaEde9ld6l+FV8taqdiMOiEaE1ylEoQiJxTXQuWsXiPAZsr8CHONPBIuvRm2BHZrTwSvXTmdI6mUi7PaQ0C3A6Ojfg034vmWUr3G2bMePTkBVrAJfhROM5P++ZGgpxDCg3tGQlhcHpA/svpY8XqSD/eWfe66SS8XOMVXgJmEz609vVSOjy46Kb1urCufxd/N7k7WDZGlvb5o1cii4WHcWOu Bm6YOF60 SN5bpWvaMj/LeSc6Dt28E5lYigD0VaxzvWMXDMYUNyT0y7mcajPe005QwZOpyEGp0zXBM+092y44UgbHSuYqZTqLfTSJa66sCqWFzWTrVZqt0tJDhi8rwBcI7OY6zttCAm2c75yj09p6IwSRvnma1E/Lx4hhl/7u5khAKLIXqQPNJPF7P2awxFuNDCmYmO7kQRsLgLb0uZrg4MNURk8JaJ2y/1Z6NYtytcl+c39kZ4E4mkpFtPxetjMfYlmi9sxgiyPRwsAM6hfijn5Z6pc1X7Sw6NQjOsuwn+/hrqILDdpy/Y1aVCe1ncP7ebWhheVuZF0LA X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/7/3 00:15, Hugh Dickins wrote: > On Tue, 2 Jul 2024, Baolin Wang wrote: >> On 2024/7/2 15:40, Hugh Dickins wrote: >>> Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on >>> flags when freeing, yet the flags shown are not bad: PG_locked had been >>> set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from >>> deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN >>> symptoms implying double free by deferred split and large folio migration. >>> >>> 6.7 commit 9bcef5973e31 ("mm: memcg: fix split queue list crash when large >>> folio migration") was right to fix the memcg-dependent locking broken in >>> 85ce2c517ade ("memcontrol: only transfer the memcg data for migration"), >>> but missed a subtlety of deferred_split_scan(): it moves folios to its own >>> local list to work on them without split_queue_lock, during which time >>> folio->_deferred_list is not empty, but even the "right" lock does nothing >>> to secure the folio and the list it is on. >>> >>> Fortunately, deferred_split_scan() is careful to use folio_try_get(): so >>> folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable() >>> while the old folio's reference count is temporarily frozen to 0 - adding >>> such a freeze in the !mapping case too (originally, folio lock and > > (I should have added "isolation and" into that list of conditions.) > >>> unmapping and no swap cache left an anon folio unreachable, so no freezing >>> was needed there: but the deferred split queue offers a way to reach it). >> >> Thanks Hugh. >> >> But after reading your analysis, I am concerned that the >> folio_undo_large_rmappable() and deferred_split_scan() may still encounter a >> race condition with the local list, even with your patch. >> >> Suppose folio A has already been queued into the local list in >> deferred_split_scan() by thread A, but fails to 'folio_trylock' and then >> releases the reference count. At the same time, folio A can be frozen by >> another thread B in folio_migrate_mapping(). In such a case, >> folio_undo_large_rmappable() would remove folio A from the local list without >> *any* lock protection, creating a race condition with the local list iteration >> in deferred_split_scan(). > > It's a good doubt to raise, but I think we're okay: because Kirill > designed the deferred split list (and its temporary local list) to > be safe in that way. > > You're right that if thread B's freeze to 0 wins the race, thread B > will be messing with a list on thread A's stack while thread A is > quite possibly running; but thread A will not leave that stack frame > without taking again the split_queue_lock which thread B holds while > removing from the list. > > We would certainly not want to introduce such a subtlety right now! > But never mind page migration, this is how it has always worked when > racing with the folio being freed at the same time - maybe > deferred_split_scan() wins the freeing race and is the one to remove > folio from deferred split list, or maybe the other freer does that. Yes, thanks for explanation. And after thinking more, the 'list_for_each_entry_safe' in deferred_split_scan() can maintain the list iteration safety, so I think this is safe. > I forget whether there was an initial flurry of races to be fixed when > it came in, but it has been stable against concurrent freeing for years. > > Please think this over again: do not trust my honeyed words! > >> >> Anyway, I think this patch can still fix some possible races. Feel free to >> add: >> Reviewed-by: Baolin Wang > > Thanks, but I certainly don't want this to go into the tree if it's > still flawed as you suggest. Now I have no doubt for this fix, and please continue to keep my Reviewed-by tag, thanks.