From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 18 Jul 2023 18:52:09 -0700
Subject: Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
To: Yu Zhao
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com, Yin Fengwei, Hugh Dickins

On Tue, Jul 18, 2023 at 6:32 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Jul 18, 2023 at 4:47 PM Yin Fengwei wrote:
> >
> >
> > On 7/19/23 06:48, Yosry Ahmed wrote:
> > > On Sun, Jul 16, 2023 at 6:58 PM Yin Fengwei wrote:
> > >>
> > >>
> > >> On 7/17/23 08:35, Yu Zhao wrote:
> > >>> On Sun, Jul 16, 2023 at 6:00 PM Yin, Fengwei wrote:
> > >>>>
> > >>>> On 7/15/2023 2:06 PM, Yu Zhao wrote:
> > >>>>> There is a problem here that I didn't have the time to elaborate:
> > >>>>> we can't mlock() a folio that is within the range but not fully
> > >>>>> mapped, because this folio can be on the deferred split queue.
> > >>>>> When the split happens, those unmapped folios (not mapped by this
> > >>>>> vma but mapped into other vmas) will be stranded on the
> > >>>>> unevictable lru.
> > >>>>
> > >>>> This should be fine unless I missed something. During a large folio
> > >>>> split, unmap_folio() will migrate (anon) or unmap (file) the folio.
> > >>>> The folio will be munlocked in unmap_folio(), so the head/tail
> > >>>> pages will always be evictable.
> > >>>
> > >>> It's close but not entirely accurate: munlock can fail on isolated
> > >>> folios.
> > >> Yes. The munlock just clears the PG_mlocked bit but leaves
> > >> PG_unevictable set.
> > >>
> > >> Could this also happen with a normal 4K page? I mean, when a user
> > >> tries to munlock a normal 4K page while that page is isolated, does
> > >> it become an unevictable page?
> > >
> > > Looks like it is possible. If cpu 1 is in __munlock_folio() and
> > > cpu 2 is isolating the folio for any purpose:
> > >
> > > cpu1                           cpu2
> > >                                isolate folio
> > > folio_test_clear_lru() // 0
> > >                                putback folio // add to unevictable list
> > > folio_test_clear_mlocked()
> > Yes. Yu showed this sequence to me in another email. I thought the
> > putback_lru() could correct the non-mlocked but unevictable folio. But
> > it doesn't, because of this race.
>
> (+Hugh Dickins for vis)
>
> Yu, I am not familiar with the split_folio() case, so I am not sure it
> is the same exact race I stated above.
>
> Can you confirm whether or not doing folio_test_clear_mlocked() before
> folio_test_clear_lru() would fix the race you are referring to? IIUC,
> in this case, we make sure we clear PG_mlocked before we try to clear
> PG_lru. If we fail to clear it, then someone else has the folio
> isolated after we clear PG_mlocked, so we can be sure that when they
> put the folio back it will be correctly made evictable.
>
> Is my understanding correct?

Hmm, actually this might not be enough. In folio_add_lru() we will call
folio_batch_add_and_move(), which calls lru_add_fn() and *then* sets
PG_lru. Since we check folio_evictable() in lru_add_fn(), the race can
still happen:

cpu1                           cpu2
                               folio_evictable() // false
folio_test_clear_mlocked()
folio_test_clear_lru() // false
                               folio_set_lru()

Relying on PG_lru for synchronization might not be enough with the
current code. We might need to revert 2262ace60713 ("mm/munlock:
delete smp_mb() from __pagevec_lru_add_fn()").

Sorry for going back and forth here, I am thinking out loud.
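To make the ordering concrete, the pairing I have in mind would look
roughly like this. This is an untested sketch, not actual kernel code:
munlock_side_sketch() and lru_add_side_sketch() are made-up names
standing in for __munlock_folio() and lru_add_fn(), and details like
fixing up a stale PG_unevictable or the mlock_count bookkeeping are
omitted:

/* munlock side: clear PG_mlocked *before* trying to isolate */
static void munlock_side_sketch(struct folio *folio)
{
	if (folio_test_clear_mlocked(folio))
		zone_stat_mod_folio(folio, NR_MLOCK, -folio_nr_pages(folio));

	smp_mb();	/* pairs with the barrier on the lru-add side */

	if (!folio_test_clear_lru(folio))
		return;	/* isolated elsewhere; the owner now sees !PG_mlocked */
	/* we isolated it: move it to the right list, then set PG_lru again */
}

/* lru-add side: set PG_lru *before* testing evictability */
static void lru_add_side_sketch(struct lruvec *lruvec, struct folio *folio)
{
	/* called with the lruvec lock held, like lru_add_fn() */
	folio_set_lru(folio);

	smp_mb();	/* pairs with the barrier on the munlock side */

	if (!folio_evictable(folio))	/* reads PG_mlocked */
		folio_set_unevictable(folio);
	lruvec_add_folio(lruvec, folio);
}

This is the classic store-buffering pattern: it cannot happen that the
munlock side fails folio_test_clear_lru() *and* the lru-add side still
sees PG_mlocked set, so at least one of the two sides always puts the
folio on the right list.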
>
> If yes, I can add this fix to my next version of the RFC series to
> rework mlock_count. It would be a lot more complicated with the
> current implementation (as I stated in a previous email).
>
> >
> > >
> > >
> > > The page would be stranded on the unevictable list in this case, no?
> > > Maybe we should only try to isolate the page (clear PG_lru) after we
> > > possibly clear PG_mlocked? In this case, if we fail to isolate, we
> > > know for sure that whoever has the page isolated will observe that
> > > PG_mlocked is clear and correctly make the page evictable.
> > >
> > > This probably would be complicated with the current implementation,
> > > as we first need to decrement mlock_count to determine if we want to
> > > clear PG_mlocked, and to do so we need to isolate the page, as
> > > mlock_count overlays page->lru. With the proposal in [1] to rework
> > > mlock_count, it might be much simpler as far as I can tell. I intend
> > > to refresh this proposal soon-ish.
> > >
> > > [1] https://lore.kernel.org/lkml/20230618065719.1363271-1-yosryahmed@google.com/
> > >
> > >>
> > >>
> > >> Regards
> > >> Yin, Fengwei
> > >>
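For anyone skimming: the overlay mentioned above is this union in
struct page (simplified from include/linux/mm_types.h; struct folio
mirrors the same slot):

	union {
		struct list_head lru;

		/* Or, for the Unevictable "LRU list" slot */
		struct {
			/* Always even, to negate PageTail */
			void *__filler;
			/* Count page's or folio's mlocks */
			unsigned int mlock_count;
		};
	};

lruvec_add_folio() and lruvec_del_folio() skip the actual list_add()
and list_del() for LRU_UNEVICTABLE, which is what makes the overlay
safe; but it also means mlock_count can only be touched while the folio
is isolated, i.e. after folio_test_clear_lru() succeeds, hence the
awkward isolate-first ordering in the current code.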