From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74318EE49AA for ; Tue, 22 Aug 2023 14:40:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9725428003E; Tue, 22 Aug 2023 10:40:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9231F94001B; Tue, 22 Aug 2023 10:40:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7C25128003E; Tue, 22 Aug 2023 10:40:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 6A07094001B for ; Tue, 22 Aug 2023 10:40:24 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 1AD53C034C for ; Tue, 22 Aug 2023 14:40:24 +0000 (UTC) X-FDA: 81152001168.26.7A9FA18 Received: from mail-ed1-f45.google.com (mail-ed1-f45.google.com [209.85.208.45]) by imf08.hostedemail.com (Postfix) with ESMTP id 3A855160021 for ; Tue, 22 Aug 2023 14:40:21 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=OIlW5unF; spf=pass (imf08.hostedemail.com: domain of jannh@google.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692715222; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1C3quYQxeFIPsrSFnkoQ66g6yHaIAyBnzAc9BMpYipc=; b=AArw+83bzCYTfNSAH2OGZWKAvkih3LsFGDAlD2nOFiLBiE3dlFKW5NQuslVW5AbCGuLgXR QL43nGO26DuRIKqRrdUa6vfR7sA5kWVYnNUVSg6LxpduPr3NdI0nLcdx7DeJ2TG2de0b7i pu6f93r3b2jlccD7XBUv8PjU85cgQjQ= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692715222; a=rsa-sha256; cv=none; b=JPM8cxgMbMlyHc6GDYgkfgOBe7D1ERfpxOlZJEZj7Gjiu6bl/Mj70LbbxJAJkDZBJh68oB hJrpv1c7ZQnGO1t3ZnsJtKvjJooLQWfb9pb9Y95SA/TWDo8pnZDUBNU1UbPjdarmQ+CQB+ Ha9FpdXji7aK0JWMVk5rFYZ4EapdtJ4= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=OIlW5unF; spf=pass (imf08.hostedemail.com: domain of jannh@google.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ed1-f45.google.com with SMTP id 4fb4d7f45d1cf-529fa243739so14143a12.0 for ; Tue, 22 Aug 2023 07:40:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1692715221; x=1693320021; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=1C3quYQxeFIPsrSFnkoQ66g6yHaIAyBnzAc9BMpYipc=; b=OIlW5unFzV7NWOV6TmLpbogd3cj66r1HwrEAEgnOF/ZkPVlYd9eVUxDfxmFaXHg08Z +WZDP1byYy7Gdu80MZ3Qu2k2pNdEoCOyB6ATO//CRJGvDD5AorBPal+hvU31QZLFCMBJ 6e4zzSDy/w4RB03LeKCvDfLUw5YRCZ28WtCzWt2Vbng2AJDEzISlqMA6jYWoMUXyaL1K qPhxmkouE6NH0WXVEhbblWYZd/uH//lE0dz/pdeDr5wClUDPBLh7bplFMBpzoPs4OvzL 8leJoRVtJ826ssAKqPLkH9r1rGZxyXZwvlP1wBqLPNiDz5aY4+PZ0CiZ1yMzuJTRBRsk 3e/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692715221; x=1693320021; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1C3quYQxeFIPsrSFnkoQ66g6yHaIAyBnzAc9BMpYipc=; b=gcrobpEzOvWfl2khaIOSHgzclMQ6rNsIHSd8UGxYT+aLd4ny+XqeeFvu4if0/NFQSn kP8I73k+2Rr+cy+exgKwuo+n0Kynu1pRKiWI64i1KPhWNvcm3skhazFCRLgT4N4Fhbou RJm47gC7hQHV/l5xaxodMT0/OG9dXQZcgh0/M/z665FFoJ6EsLg77LdY2aPr38P/RVWI tcbHLxrSi7/wLJ3Fw0FFDIfoyIo+ECf7gmVv8Of6fu/wDprklw9RQyziF+DZz0EYcmTt sc2jeWrB38aXAa9z9DdXs10bD7zdFPa9HGpDFNXKsgshW2jo3yYuBEGgVjHeCUZvszHc tX+g== X-Gm-Message-State: AOJu0YxJcrlbGIPUAOyUXUSr+eaMjWCm3e+SndHD08qaeRo05Zq/Obqk 8gpjr7h9VzM4fnyVmhnPNdrXLi51XhO6F70rc8AGgQ== X-Google-Smtp-Source: AGHT+IF6H9w5RaJ7PE+MBKpDky923+JUzdrwmx+So9M6hSjxTrf0I+zQW7oUyRqPX7VRVw66ynmMZa9PJ/Zeqq91bQo= X-Received: by 2002:a50:f690:0:b0:523:d5bc:8424 with SMTP id d16-20020a50f690000000b00523d5bc8424mr108311edn.5.1692715220567; Tue, 22 Aug 2023 07:40:20 -0700 (PDT) MIME-Version: 1.0 References: <4d31abf5-56c0-9f3d-d12f-c9317936691@google.com> In-Reply-To: From: Jann Horn Date: Tue, 22 Aug 2023 16:39:43 +0200 Message-ID: Subject: Re: [PATCH mm-unstable] mm/khugepaged: fix collapse_pte_mapped_thp() versus uffd To: Hugh Dickins Cc: Andrew Morton , Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Gerald Schaefer , Vasily Gorbik , Vishal Moola , Vlastimil Babka , Zi Yan , "Zach O'Keefe" , Linux ARM , sparclinux@vger.kernel.org, linuxppc-dev , linux-s390 , kernel list , Linux-MM Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: 3yf1x4yic8h6hyjsrt7smjuftwor4myt X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 3A855160021 X-Rspam-User: X-HE-Tag: 1692715221-847586 X-HE-Meta: U2FsdGVkX1+ibbSggRjYBhAApnfgkP+16uDiQIO6us1JihUYQtI9Bp2TA5uRvmaMc2XE6YGynWqAbNQUx8po+nGtss9UWLEXAgVsSvcJdHriSe/SiKQ63I9nQF66/jMRUwpimG/xQMzkP3Su96Idh8yshBkfBL98aonFaM0Cmv2EpW3vaYlVldqN4CQ1WHzvjJPC+X35MeNX7AgEfxmJsPTpwIYjJZo+WY/NblTCybVgy1FCDetUReNBnhC/rHRC3bqmK0LapmEL/uQm0acUvKifo26kv3ZLA6abRMVu7NNbNSg9hNu2hj9SZooBCVwMvXLvTTXc2Y21Kh66berP1KfDmpG9SQZNM65/9/niq1BdKNvCGwC/KpWF7vfZGs2w1sLu1A5eyAC4rnwZj9/L1qfT11aTPUiHSFJbMUOQn/x4+XGLgT/TfSshETJnhl6v2L2NhBkzXR0WNEnSPY3Ve7IjRNP25YlZtlNd4MRTUbGsmGMmsCfC6AU7zQI3p3Bi3LwUfmYnawsXDKWTmZxvR09pbbZ2d6XbfLdDk21icEhQF3fapKLiOvsDBDYy44c8F9EFRnqGUPlKJHj08mxGcq+1rFEtkbM8RiTwEVvpfHwyhg5/7rS+9YOGLlgHFbGOuDmjKN4kexPi6CBP0vD2fqoYJjoaS8SL8pI2En9lHSD1jug4VSV7MYI4gPQWkf8TcZ8w5uWRaSfAIcQ14DMf5xX/jqAFhpPcb4iv58XGmdqJAR+8I3qUPGs9D8UHf1JPXmcv2XYBc54UnLMDw/tV+BijUDXHXnAG6QaelxL/8KCtlKpJSKKsZ3uj32R5+03Q1rpxPsvdqzBXaoqKk8KI8ZYWHuxIAgBBAjIMEoqgVAg8ZI+Vo/mpj3N9ni7kEtDy2Ua1vTDT7jM3EMaV304lH69PnTCNZB8nib8oFFoVsSYBYn6MIt1y/stc7NNVRgD84mK72k9fMmqTHF73PDA 2r5Fb4pD GBRB05ZHA1iic6I7j/O72zUkmOkmILTrbJ1m/orix0E6EsGFCSEQ5eGUcOhnSaaKGbq+B5xuHvIZehNpAgfdEZWqQd+5ZXe/6pt/ygeAe5VLLZ8LvX4igIhnI4Tl0HWm94o1Ujdb/C56rWTx0tV5oklEpHKknvrWf4khxSK+QVDeLVZnIs/OXFKfo6opYibfMdWLTlrC7HPqhQswbVQHJn3eYHvXJiMAgG8OAv6IU5QpKXjgME35hYd7FdEOdl/y28PgXGwX4KDtuyIxlmuqZP11wyuxEBXLPa08vxRukd8mRsQjURYAPal69ZgtUb6Ia505zOhNgYTSxwAMEvfvN1aqzR3lwxK6GUnn6Vi1L/J448Jb6eSRs2D+GoHTwHNDGQrvMEN6LyAqtFgOgcSuz9Lmrtw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Aug 22, 2023 at 4:51=E2=80=AFAM Hugh Dickins wro= te: > On Mon, 21 Aug 2023, Jann Horn wrote: > > On Mon, Aug 21, 2023 at 9:51=E2=80=AFPM Hugh Dickins = wrote: > > > Just for this case, take the pmd_lock() two steps earlier: not becaus= e > > > it gives any protection against this case itself, but because ptlock > > > nests inside it, and it's the dropping of ptlock which let the bug in= . > > > In other cases, continue to minimize the pmd_lock() hold time. > > > > Special-casing userfaultfd like this makes me a bit uncomfortable; but > > I also can't find anything other than userfaultfd that would insert > > pages into regions that are khugepaged-compatible, so I guess this > > works? > > I'm as sure as I can be that it's solely because userfaultfd breaks > the usual rules here (and in fairness, IIRC Andrea did ask my permission > before making it behave that way on shmem, COWing without a source page). > > Perhaps something else will want that same behaviour in future (it's > tempting, but difficult to guarantee correctness); for now, it is just > userfaultfd (but by saying "_armed" rather than "_missing", I'm half- > expecting uffd to add more such exceptional modes in future). Hm, yeah, sounds okay. (I guess we'd also run into this if we ever wanted to make it possible to reliably install PTE markers with madvise() or something like that, which might be nice for allowing userspace to create guard pages without unnecessary extra VMAs...) > > I guess an alternative would be to use a spin_trylock() instead of the > > current pmd_lock(), and if that fails, temporarily drop the page table > > lock and then restart from step 2 with both locks held - and at that > > point the page table scan should be fast since we expect it to usually > > be empty. > > That's certainly a good idea, if collapse on userfaultfd_armed private > is anything of a common case (I doubt, but I don't know). It may be a > better idea anyway (saving a drop and retake of ptlock). I was thinking it also has the advantage that it would still perform okay if we got rid of the userfaultfd_armed() condition at some point - though I realize that designing too much for hypothetical future features is an antipattern. > I gave it a try, expecting to end up with something that would lead > me to say "I tried it, but it didn't work out well"; but actually it > looks okay to me. I wouldn't say I prefer it, but it seems reasonable, > and no more complicated (as Peter rightly observes) than the original. > > It's up to you and Peter, and whoever has strong feelings about it, > to choose between them: I don't mind (but I shall be sad if someone > demands that I indent that comment deeper - I'm not a fan of long > multi-line comments near column 80). I prefer this version because it would make it easier to remove the "userfaultfd_armed()" check in the future if we have to, but I guess we could also always change it later if that becomes necessary, so I don't really have strong feelings on it at this point.