From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C43ECEE49AF for ; Tue, 22 Aug 2023 18:34:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0FD1D280053; Tue, 22 Aug 2023 14:34:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 086E3280040; Tue, 22 Aug 2023 14:34:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E1A8E280053; Tue, 22 Aug 2023 14:34:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id CE267280040 for ; Tue, 22 Aug 2023 14:34:34 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 8F9B216044C for ; Tue, 22 Aug 2023 18:34:34 +0000 (UTC) X-FDA: 81152591268.30.627A68F Received: from mail-yw1-f179.google.com (mail-yw1-f179.google.com [209.85.128.179]) by imf09.hostedemail.com (Postfix) with ESMTP id A5E7F140032 for ; Tue, 22 Aug 2023 18:34:31 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=CqKdw30W; spf=pass (imf09.hostedemail.com: domain of hughd@google.com designates 209.85.128.179 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692729271; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1WMcORuWW+xF6gY7IBQ8Ihi596ledfzVY2HvWzF69bs=; b=UBEfoosRSlN+JBD18SFU5nvViBfO3K7g9Xz5VOMtr4B7E1KvjaD62337GjbbQCybt7Kl8X 8+MLZakxjznFhV5vmkv+Hdu5v/QpvECJIQ3/eH021fpGqTT4N/Dx5iFQmexQYkfD0YxZud bzFs82O9vDY6Am8QxxNqEuWZ2LamPNw= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=CqKdw30W; spf=pass (imf09.hostedemail.com: domain of hughd@google.com designates 209.85.128.179 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692729271; a=rsa-sha256; cv=none; b=CRiBCXNr8R+RElCPS8xIbzd9DiS7Ke2htn9Fni5O5ZwACZmd9K41G+HZyVFdZyU1oH+bdJ IkE0qUdbPWYomotgjbttXZdBD8HYZL/jT6kb8VThYetehhHPDkBFApWsxeHbyFlz+tnzxR SQpvcxh/tfQBNed44dQpB2AqH1Dtxvg= Received: by mail-yw1-f179.google.com with SMTP id 00721157ae682-58caaedb20bso53896977b3.1 for ; Tue, 22 Aug 2023 11:34:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1692729271; x=1693334071; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=1WMcORuWW+xF6gY7IBQ8Ihi596ledfzVY2HvWzF69bs=; b=CqKdw30WIbzK4C9BVl49dnPtQXsgCMK0Fwag7SfAQ7kJ3tRw3a7cCtAS2xlL+AAHPv r4CyCLMJ96e3fzvi6KQ2+Mk2O4mxYTfynjOc4lp8n2WAusPnJ0mJQFK7v37QkqTuklZU tnqIswyE/oOd6pufMG74u6bbYZve+ApM3b7Zli+E8Pyg6W+tz1RxcHGcTmcykMK316cc LUBM256+2HbMmx9hRSmMDeQiyL3Rq3sOsZSKyAj8s8Z/mqaB/WaY7mvAo/e/OzP+iP5y 9dZszyvUph+0J7RUe79fX1n/CELkDCL8vwhLU2dWg1KmyLTTud17ZRZdF97E/T8sm0bz yJyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692729271; x=1693334071; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=1WMcORuWW+xF6gY7IBQ8Ihi596ledfzVY2HvWzF69bs=; b=A6gcDQFQV4jmCWsnaRbZ338u3heaHTK0guDLfnt/Sv+7DMNRm6mloi0bp2r9DYUC2p QwFPH+UK6v8LCLLQT127kymdHPqaGGH4rK5dbm0vCSeyEqYt/LvAktvXGK8sh/EcpdX4 vUXPZbvsnTLWq/vc9XOW1dYCN1G5egO7t1/ZmfkMGJuJChqHvRfY+ekhs/iZKL1AGKAY LjX4RUh+u0b8PKq3uFXr3+/KoSvcidfxLHlSaifqeqCy2oW3vZWlZdcheCB2xgwIIdEM 31OovSR56qquFDrRErwnTpjF537U2zZGdq3yvMOpNuckpS21OF49YcMIRufc3vEduI7w C50A== X-Gm-Message-State: AOJu0YyNyzlSRxQX7xiHll4u8M24ydvxh1z/CvG2+2JcilPnPNwKCv1W hnyjiGW/qGSI4lkY9LIbcK3H8w== X-Google-Smtp-Source: AGHT+IHKsobp5e7zqoup7VzvHJTcsGRbS8zrGrBX3SoF5xm7qcy+ngOpN9zqdPkzqj75WfGQt5eHpQ== X-Received: by 2002:a81:6009:0:b0:582:5527:ddc with SMTP id u9-20020a816009000000b0058255270ddcmr10163921ywb.34.1692729270534; Tue, 22 Aug 2023 11:34:30 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id j1-20020a0dc701000000b0058ddb62f99bsm2956436ywd.38.2023.08.22.11.34.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Aug 2023 11:34:29 -0700 (PDT) Date: Tue, 22 Aug 2023 11:34:19 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Peter Xu cc: Hugh Dickins , Jann Horn , Andrew Morton , Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Gerald Schaefer , Vasily Gorbik , Vishal Moola , Vlastimil Babka , Zi Yan , Zach O'Keefe , Linux ARM , sparclinux@vger.kernel.org, linuxppc-dev , linux-s390 , kernel list , Linux-MM Subject: Re: [PATCH mm-unstable] mm/khugepaged: fix collapse_pte_mapped_thp() versus uffd In-Reply-To: Message-ID: <1b7c7056-d742-86bf-fec-fdb024b2381@google.com> References: <4d31abf5-56c0-9f3d-d12f-c9317936691@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspamd-Queue-Id: A5E7F140032 X-Rspam-User: X-Stat-Signature: cdojack4wepyuq94i1zmwtzhk8cmr65p X-Rspamd-Server: rspam01 X-HE-Tag: 1692729271-410756 X-HE-Meta: U2FsdGVkX1+UPo1o8Sz0F81H02llpoT8aWyA+y6N1LwaLU7KKTgyIqr/bUZ3+rELdATfD2rp0Edt6/6JIQOaRT2Gg7Q0I7ZnUqQ+it7EQCunYyPTPUKQ9Oyt22G4pWEoj0yVHM9gb3KG/RIoJ2mKTLUvFRC5SP3UzqLoj8OYlHU+cMmWc0YEBjC+jLTD6pkB7/P0m0uN/3P+/TmYqLxYGRqcE+ZQlqSok0U9iZ3+uCOWohjVhMFujoKORuirJjQJFVcfHEQV6gF9tHrMdPF7wM5ywBGXxb+LNAOSOVFONhkGFO0sO5xB1G9tDPxJPuH9q2I8tGUUap15g4WI0ZDU+b6G/qQXeZJojhqYgyx/IHOfU9NkOHyhRfFQ02V7i+tg319kEX02FI+Nyo6wYQDrZLFeXkrfWcOz/NTk87dc0YlScF+0tF3ow6/b3CPEumf1ZipsV8nEwN7IGwYApuXdFNlpW2VWSONAa39ib8Bft9GCwMGFg8nyTtrR14ACym4AkDKt+JV0endwkQSHIqaEsv1znxzRroc9vECFcW9gcFdaAsw7787YLVRvVPb+0+TGt6faYE3fPqE7rOek9KGNvj1EqGZpfNp0O/b8pSFtvbW5TjXSv6B9o7vvbblt8krnyyXPbeGBQbsTQYpxXgYCYZeKXXa9vTGcLir/MODDY0FSRFDXLM3b+qX/4M9X07mlpROby5CI7ptg38lAr9QugJj0wsp5C1BONOg5YAJ5dJSuC1UXf5Xm1+1ZpHI2YP5d5/a4vAy9xgaaOqulkEbrznSRe1aA+8dLXBJo/fAFM8Yu72WCGEbH7ONktF7VeyyAjfNmmCx/30PM6evd3AJF9vAMyGk+Bclq2IXm9CBY6xgR2dBfdMRAxlYTj3ZPEOsra+7Hq+/syOQRapXqsb4QvkO6h0ju+PY/2VKP0MXYqZCsx193l0EUbMU4Jrmh4LTOB273Xfv5sZJAwz5BJ5G DNjyK4W3 88s9uTRrfA7H32vc8S5SFJAFNmUN+H2LVMIvGD6yat2hMyu3PyHJp369xSG8/92u/fKzVkgchFKdWVbHxmU7eFHTmPOyh4X5YkGfIriFa7kxogNipR78kVN7pJ9CxSsEsPUGJ79cQEh5iZoV9N8zvloB6aBiG8dOEVq2dT/w+1XBuxkhmFd+XBGA9fz0ykQRxu0IIN29PxA488gFSXHXE7rHtK9qTy961CxB/sFodM9Ul3j5LrVPuzTkUDJGDumTx2zBuq7uavfnjadm6qukSsK/iL1Cz2oIlkH9wmI0IQ8PfFaaG7GC9lp1I3MlA98B+qdepXiLbN7WCYxcWoLUB1s8MkY1Cv7mTLGFrZYOL5a/JOQ+i/AwR7ICVk9VwUqfXIcNmPwRXetV42vOiYOloZjyoKTwrtjRNisxA2OzFiea8qR7vWVDCPJVVx7WVY+EFRuNv6hHCgtN1VjkUAZB3a0suJAzurh9FCeVS X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, 22 Aug 2023, Peter Xu wrote: > On Mon, Aug 21, 2023 at 07:51:38PM -0700, Hugh Dickins wrote: > > On Mon, 21 Aug 2023, Jann Horn wrote: ... > > > > > > I guess an alternative would be to use a spin_trylock() instead of the > > > current pmd_lock(), and if that fails, temporarily drop the page table > > > lock and then restart from step 2 with both locks held - and at that > > > point the page table scan should be fast since we expect it to usually > > > be empty. > > > > That's certainly a good idea, if collapse on userfaultfd_armed private > > is anything of a common case (I doubt, but I don't know). It may be a > > better idea anyway (saving a drop and retake of ptlock). > > > > I gave it a try, expecting to end up with something that would lead > > me to say "I tried it, but it didn't work out well"; but actually it > > looks okay to me. I wouldn't say I prefer it, but it seems reasonable, > > and no more complicated (as Peter rightly observes) than the original. > > > > It's up to you and Peter, and whoever has strong feelings about it, > > to choose between them: I don't mind (but I shall be sad if someone > > demands that I indent that comment deeper - I'm not a fan of long > > multi-line comments near column 80). > > No strong opinion here, either. Just one trivial comment/question below on > the new patch (if that will be preferred).. I'm going to settle for the original v1 for now (I'll explain why in reply to Jann next) - which already has the blessing of your Acked-by, thanks. (Yes, the locking is a bit confusing: but mainly for the unrelated reason, that with the split locking configs, we never quite know whether this lock is the same as that lock or not, and so have to be rather careful.) > > [PATCH mm-unstable v2] mm/khugepaged: fix collapse_pte_mapped_thp() versus uffd ... > > @@ -1572,9 +1572,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, > > haddr, haddr + HPAGE_PMD_SIZE); > > mmu_notifier_invalidate_range_start(&range); > > notified = true; > > - start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl); > > - if (!start_pte) /* mmap_lock + page lock should prevent this */ > > - goto abort; > > + spin_lock(ptl); > > .. here will the ptl always be valid? > > That comes from the previous round of pte_offset_map_lock(), and I assume > after this whole "thp collapse without write lock" work landed, it has the > same lifecycle with the *pte pointer, so can be invalid right after the rcu > read lock released; mmap read lock isn't strong enough to protect the ptl, > not anymore. > > Maybe it's all fine because the thp collapse path is the solo path(s) that > will release the pte pgtable page without write mmap lock (so as to release > the ptl too when doing so), and we at least still hold the page lock, so > the worst case is the other concurrent "thp collapse" will still serialize > with this one on the huge page lock. But that doesn't look as solid as > fetching again the ptl from another pte_offset_map_nolock(). So still just > raise this question up. It's possible I just missed something. It is safe, as you say because of us holding the hpage lock, which stops any racing callers of collapse_pte_mapped_thp() or retract_page_tables(): and these are the functions which (currently) make the *pmd transition which pte_offset_map_lock() etc. are being careful to guard against. [In future we can imagine empty page table removal making that transition too: and that wouldn't even have any hpage to lock. Will it rely on mmap_lock for write? or pmd_lock? probably both, but no need to design for it now.] But I agree that it does *look* more questionable in this patch: there was a reassuring pte_offset_map_lock() there before, and now I rely more on the assumptions and just use the "previous" ptl (and that's why I chose to make the !start_pte case a VM_BUG_ON a few lines later). I expect, with more time spent, I could cast it back into more reassuring form: but it's all a bit of a con trick - if you look further down (even before v2 or v1 fixes) to "step 4", there we have "if (ptl != pml)" which is also relying on the fact that ptl cannot have changed. And no doubt that too could be recast into more reassuring-looking form, but it wouldn't actually be worthwhile. Thanks for considering these, Peter: I'll recommend v1 to Andrew. Hugh