From: Jann Horn
Date: Mon, 21 Aug 2023 23:59:46 +0200
Subject: Re: [PATCH mm-unstable] mm/khugepaged: fix collapse_pte_mapped_thp() versus uffd
To: Hugh Dickins
Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov",
 Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi,
 Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
 Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park, Lorenzo Stoakes,
 Huang Ying, Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe,
 Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
 Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King,
 "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens,
 Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev, Gerald Schaefer,
 Vasily Gorbik, Vishal Moola, Vlastimil Babka, Zi Yan, "Zach O'Keefe",
 Linux ARM, sparclinux@vger.kernel.org, linuxppc-dev, linux-s390,
 kernel list, Linux-MM
In-Reply-To: <4d31abf5-56c0-9f3d-d12f-c9317936691@google.com>
References: <4d31abf5-56c0-9f3d-d12f-c9317936691@google.com>

On Mon, Aug 21, 2023 at 9:51 PM Hugh Dickins wrote:
> Jann Horn demonstrated how userfaultfd ioctl UFFDIO_COPY into a private
> shmem mapping can add valid PTEs to page table collapse_pte_mapped_thp()
> thought it had emptied: page lock on the huge page is enough to protect
> against WP faults (which find the PTE has been cleared), but not enough
> to protect against userfaultfd. "BUG: Bad rss-counter state" followed.
>
> retract_page_tables() protects against this by checking !vma->anon_vma;
> but we know that MADV_COLLAPSE needs to be able to work on private shmem
> mappings, even those with an anon_vma prepared for another part of the
> mapping; and we know that MADV_COLLAPSE needs to work on shared shmem
> mappings which are userfaultfd_armed(). Whether it needs to work on
> private shmem mappings which are userfaultfd_armed(), I'm not so sure:
> but assume that it does.

I think we couldn't rely on anon_vma here anyway, since holding the
mmap_lock in read mode doesn't prevent concurrent creation of an
anon_vma?

> Just for this case, take the pmd_lock() two steps earlier: not because
> it gives any protection against this case itself, but because ptlock
> nests inside it, and it's the dropping of ptlock which let the bug in.
> In other cases, continue to minimize the pmd_lock() hold time.

Special-casing userfaultfd like this makes me a bit uncomfortable; but
I also can't find anything other than userfaultfd that would insert
pages into regions that are khugepaged-compatible, so I guess this
works?

I guess an alternative would be to use a spin_trylock() instead of the
current pmd_lock(), and if that fails, temporarily drop the page table
lock and then restart from step 2 with both locks held - and at that
point the page table scan should be fast since we expect it to usually
be empty.
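
To make the trylock alternative above a bit more concrete, here is a
minimal userspace sketch of the locking pattern. It is not kernel code and
not a proposed patch. As assumptions for illustration only, pthread mutexes
stand in for the two spinlocks ("outer" for pmd_lock(), "inner" for the page
table lock), and struct table_model, clear_slots(),
empty_table_then_lock_outer() and unlock_both() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8 /* hypothetical stand-in for the entries of one page table */

struct table_model {
	pthread_mutex_t outer;  /* models pmd_lock(): ordered before inner */
	pthread_mutex_t inner;  /* models the page table lock (ptlock) */
	int slots[NR_SLOTS];    /* nonzero means "entry populated" */
};

/* Stand-in for step 2: clear every populated slot. Caller holds inner. */
int clear_slots(struct table_model *t)
{
	int cleared = 0;

	for (size_t i = 0; i < NR_SLOTS; i++) {
		if (t->slots[i]) {
			t->slots[i] = 0;
			cleared++;
		}
	}
	return cleared;
}

/*
 * Empty the table, then return with BOTH locks held so that nothing can
 * repopulate it before the caller removes it.
 * Returns true if the slow path (drop inner, relock, rescan) was taken.
 */
bool empty_table_then_lock_outer(struct table_model *t)
{
	bool restarted = false;

	pthread_mutex_lock(&t->inner);
	clear_slots(t);

	/*
	 * inner nests inside outer, so blocking on outer while holding
	 * inner would invert the lock order. Only try it.
	 */
	if (pthread_mutex_trylock(&t->outer) != 0) {
		/* Contended: back off and retake both in the right order. */
		pthread_mutex_unlock(&t->inner);
		pthread_mutex_lock(&t->outer);
		pthread_mutex_lock(&t->inner);

		/*
		 * inner was dropped, so slots may have been refilled in the
		 * window. Rescan with both locks held; this is expected to
		 * be cheap because the table is usually still empty.
		 */
		clear_slots(t);
		restarted = true;
	}
	return restarted;
}

/* Release in reverse order of acquisition. */
void unlock_both(struct table_model *t)
{
	pthread_mutex_unlock(&t->inner);
	pthread_mutex_unlock(&t->outer);
}
```

The point of the pattern is that the outer lock is only held from the end of
the scan onward in the uncontended case, while the contended case pays for
one extra, usually trivial, rescan instead of holding the outer lock across
the whole operation.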