Date: Tue, 25 Feb 2025 16:32:01 -0500
From: Peter Xu
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, lokeshgidra@google.com, aarcange@redhat.com,
	21cnbao@gmail.com, v-songbaohua@oppo.com, david@redhat.com,
	willy@infradead.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
	hughd@google.com, jannh@google.com, kaleshsingh@google.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] userfaultfd: do not block on locking a large folio with raised refcount
References: <20250225204613.2316092-1-surenb@google.com>
In-Reply-To: <20250225204613.2316092-1-surenb@google.com>

On Tue, Feb 25, 2025 at 12:46:13PM -0800, Suren Baghdasaryan wrote:
> Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
> state when it goes into split_folio() with raised folio refcount.
> split_folio() expects the reference count to be exactly
> mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with
> EAGAIN otherwise. If multiple processes are trying to move the same
> large folio, they raise the refcount (all tasks succeed in that) then
> one of them succeeds in locking the folio, while others will block in
> folio_lock() while keeping the refcount raised. The winner of this
> race will proceed with calling split_folio() and will fail returning
> EAGAIN to the caller and unlocking the folio. The next competing process
> will get the folio locked and will go through the same flow. In the
> meantime the original winner will be retried and will block in
> folio_lock(), getting into the queue of waiting processes only to repeat
> the same path. All this results in a livelock.
> An easy fix would be to avoid waiting for the folio lock while holding
> folio refcount, similar to madvise_free_huge_pmd() where folio lock is
> acquired before raising the folio refcount.
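
The ordering described above (lock attempt first, reference only afterwards)
can be modelled in plain userspace C. This is only a rough sketch under stated
assumptions: struct model_folio and the model_*() helpers are made-up names,
not kernel API, and the return value 1 is an invention of the sketch that
stands in for the existing small-folio slow path.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a folio; not the kernel struct. */
struct model_folio {
	atomic_int refcount;	/* references held on the folio */
	atomic_bool locked;	/* true while some task holds the folio lock */
	bool large;		/* true for a multi-page folio */
};

/* Non-blocking lock attempt, loosely playing the role of folio_trylock(). */
static bool model_trylock(struct model_folio *f)
{
	return !atomic_exchange(&f->locked, true);
}

static void model_unlock(struct model_folio *f)
{
	atomic_store(&f->locked, false);
}

/*
 * Try the lock before taking a reference.
 *
 * Returns 0 with the folio locked and referenced, -EAGAIN for a contended
 * large folio (refcount left untouched, so a concurrent split is not
 * disturbed), or 1 for a contended small folio (reference taken, and the
 * caller would go on to the blocking path).
 */
static int model_lock_then_get(struct model_folio *f)
{
	bool locked = model_trylock(f);

	if (!locked && f->large)
		return -EAGAIN;

	atomic_fetch_add(&f->refcount, 1);
	return locked ? 0 : 1;
}
```

In this model a second mover that loses the trylock on a large folio returns
-EAGAIN and leaves the refcount exactly as it found it, which is the property
the fix relies on.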
> Modify move_pages_pte() to try locking the folio first and if that fails
> and the folio is large then return EAGAIN without touching the folio
> refcount. If the folio is single-page then split_folio() is not called,
> so we don't have this issue.
> Lokesh has a reproducer [1] and I verified that this change fixes the
> issue.
>
> [1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock
>
> Reported-by: Lokesh Gidra
> Signed-off-by: Suren Baghdasaryan

Reviewed-by: Peter Xu

One question unrelated to this change below..

> ---
>  mm/userfaultfd.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 867898c4e30b..f17f8290c523 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -1236,6 +1236,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>  		 */
>  		if (!src_folio) {
>  			struct folio *folio;
> +			bool locked;
>
>  			/*
>  			 * Pin the page while holding the lock to be sure the
> @@ -1255,12 +1256,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>  				goto out;
>  			}
>
> +			locked = folio_trylock(folio);
> +			/*
> +			 * We avoid waiting for folio lock with a raised refcount
> +			 * for large folios because extra refcounts will result in
> +			 * split_folio() failing later and retrying. If multiple
> +			 * tasks are trying to move a large folio we can end
> +			 * livelocking.
> +			 */
> +			if (!locked && folio_test_large(folio)) {
> +				spin_unlock(src_ptl);
> +				err = -EAGAIN;
> +				goto out;
> +			}
> +
>  			folio_get(folio);
>  			src_folio = folio;
>  			src_folio_pte = orig_src_pte;
>  			spin_unlock(src_ptl);
>
> -			if (!folio_trylock(src_folio)) {
> +			if (!locked) {
>  				pte_unmap(&orig_src_pte);
>  				pte_unmap(&orig_dst_pte);

.. just noticed this. Are these problematic? I mean, orig_*_pte are stack
variables, afaict, so I'd expect these to blow up on HIGHPTE..
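
To make the concern concrete, here is a rough userspace sketch. It assumes
that with CONFIG_HIGHPTE pte_unmap() ends up unmapping whatever pointer it is
handed, so the pointer has to be the one that came from the map call.
model_pte_t, model_map() and model_unmap() are made-up names for illustration
only, not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long model_pte_t;	/* hypothetical stand-in for pte_t */

static model_pte_t model_page_table[4];	/* pretend page-table page */
static const model_pte_t *model_mapped;	/* pointer handed out by model_map() */

/* Loosely plays the role of pte_offset_map(): returns a pointer into the table. */
static model_pte_t *model_map(size_t idx)
{
	model_mapped = &model_page_table[idx];
	return &model_page_table[idx];
}

/*
 * Loosely plays the role of pte_unmap() under the HIGHPTE assumption above:
 * it only succeeds for the pointer that model_map() returned.
 */
static bool model_unmap(const model_pte_t *ptep)
{
	if (ptep == NULL || ptep != model_mapped)
		return false;
	model_mapped = NULL;
	return true;
}
```

In this model, unmapping the address of a local copy (the analogue of
&orig_src_pte) is rejected, while unmapping the pointer returned by
model_map() (the analogue of src_pte) succeeds.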
>  				src_pte = dst_pte = NULL;
>
> base-commit: 801d47bd96ce22acd43809bc09e004679f707c39
> --
> 2.48.1.658.g4767266eb4-goog
>

-- 
Peter Xu