Date: Mon, 21 Aug 2023 12:48:29 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Jann Horn
Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
    "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
    Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
    Lorenzo Stoakes, Huang Ying, Naoya Horiguchi, Christophe Leroy,
    Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual,
    Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig,
    Song Liu, Thomas Hellstrom, Russell King, "David S. Miller",
    Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens,
    Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
    Gerald Schaefer, Vasily Gorbik, Vishal Moola, Vlastimil Babka,
    Zi Yan, Zach O'Keefe, Linux ARM, sparclinux@vger.kernel.org,
    linuxppc-dev, linux-s390, kernel list, Linux-MM
Subject: Re: [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
References: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

On Mon, 14 Aug 2023, Jann Horn wrote:
> On Wed, Jul 12, 2023 at 6:42 AM Hugh Dickins wrote:
> > Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().
> > It does need mmap_read_lock(), but it does not need mmap_write_lock(),
> > nor vma_start_write() nor i_mmap lock nor anon_vma lock.  All racing
> > paths are relying on pte_offset_map_lock() and pmd_lock(), so use those.
>
> We can still have a racing userfaultfd operation at the "/* step 4:
> remove page table */" point that installs a new PTE before the page
> table is removed.

And you've been very polite not to remind me that this is exactly
what you warned me about, in connection with retract_page_tables(),
nearly three months ago:

https://lore.kernel.org/linux-mm/CAG48ez0aF1Rf1apSjn9YcnfyFQ4YqSd4GqB6f2wfhF7jMdi5Hg@mail.gmail.com/

>
> To reproduce, patch a delay into the kernel like this:
>
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 9a6e0d507759..27cc8dfbf3a7 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -20,6 +20,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -1617,6 +1618,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>         }
>
>         /* step 4: remove page table */
> +       if (strcmp(current->comm, "DELAYME") == 0) {
> +               pr_warn("%s: BEGIN DELAY INJECTION\n", __func__);
> +               mdelay(5000);
> +               pr_warn("%s: END DELAY INJECTION\n", __func__);
> +       }
>
>         /* Huge page lock is still held, so page table must remain empty */
>         pml = pmd_lock(mm, pmd);
>
>
> And then run the attached reproducer against mm/mm-everything. You
> should get this in dmesg:
>
> [ 206.578096] BUG: Bad rss-counter state mm:000000000942ebea type:MM_ANONPAGES val:1

Very helpful, thank you Jann.

I got a bit distracted when I then found mm's recent addition of
UFFDIO_POISON: thought I needed to change both collapse_pte_mapped_thp()
and retract_page_tables() now to cope with mfill_atomic_pte_poison()
inserting into even a userfaultfd_armed shared VMA.

But eventually, on second thoughts, realized that's only inserting a pte
marker, invalid, so won't cause any actual trouble.  A little untidy,
to leave that behind in a supposedly empty page table about to be freed,
but not worth refactoring these functions to avoid a non-bug.

And though syzbot and JH may find some fun with it, I don't think any
real application would be inserting a PTE_MARKER_POISONED where a huge
page collapse is almost complete.

So I scaled back to a more proportionate fix, following.

Sorry, I've slightly messed up applying the "DELAY INJECTION" patch
above: not intentional, honest!  (mdelay while holding the locks is
still good.)

Hugh
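
For readers following the thread, the window Jann describes can be modelled in a few lines of user-space C. This is only a toy sketch, not the kernel code and not the fix referred to in the message: struct toy_table and every toy_* function are hypothetical names invented for illustration, and locking is represented only by the order of the calls. It shows why removing a table that was emptied earlier, without rechecking it, can leave an entry unaccounted for (loosely analogous to the "Bad rss-counter state ... val:1" report), and how a recheck at removal time avoids that in the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

/* Hypothetical stand-in for a page table: 0 means an empty slot. */
struct toy_table {
    int slot[NSLOTS];
    bool freed;   /* set once the table has been removed */
    int leaked;   /* entries still present when the table was removed */
};

/* Models the earlier step that empties every entry. */
static void toy_clear_all(struct toy_table *t)
{
    for (size_t i = 0; i < NSLOTS; i++)
        t->slot[i] = 0;
}

static bool toy_is_empty(const struct toy_table *t)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if (t->slot[i] != 0)
            return false;
    return true;
}

/* Models a racing writer installing an entry after the table was emptied. */
static void toy_racing_insert(struct toy_table *t, size_t i, int value)
{
    assert(i < NSLOTS && value != 0 && !t->freed);
    t->slot[i] = value;
}

/* Removal that trusts the table is still empty: counts what it loses. */
static void toy_remove_unchecked(struct toy_table *t)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if (t->slot[i] != 0)
            t->leaked++;
    t->freed = true;
}

/* Removal that rechecks first and backs off if an entry has appeared. */
static bool toy_remove_checked(struct toy_table *t)
{
    if (!toy_is_empty(t))
        return false;
    t->freed = true;
    return true;
}
```

Calling toy_clear_all(), then toy_racing_insert(), then toy_remove_unchecked() leaves leaked at 1, whereas toy_remove_checked() in the same sequence returns false and leaves the table in place. In the real kernel the recheck would have to happen under the appropriate page table locks; the sketch makes no claim about which locks or which check the actual fix uses.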