Date: Wed, 18 Jan 2023 10:40:09 +0100
From: Michal Hocko
To: Jann Horn
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, michel@lespinasse.org, jglisse@google.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, hughlynch@google.com, leewalsh@google.com, posk@google.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 18/41] mm/khugepaged: write-lock VMA while collapsing a huge page
References: <20230109205336.3665937-1-surenb@google.com> <20230109205336.3665937-19-surenb@google.com>

On Tue 17-01-23 21:28:06, Jann Horn wrote:
> On Tue, Jan 17, 2023 at 4:25 PM Michal Hocko wrote:
> > On Mon 09-01-23 12:53:13, Suren Baghdasaryan wrote:
> > > Protect VMA from concurrent page fault handler while collapsing a huge
> > > page. Page fault handler needs a stable PMD to use PTL and relies on
> > > per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(),
> > > set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will
> > > not be detected by a page fault handler without proper locking.
> >
> > I am struggling with this changelog. Maybe because my recollection of
> > the THP collapsing subtleties is weak. But aren't you just trying to say
> > that the current #PF handling and THP collapsing need to be mutually
> > exclusive currently, so in order to keep that assumption you have to mark
> > the vma write locked?
> >
> > Also it is not really clear to me how that handles other vmas which can
> > share the same thp?
>
> It's not about the hugepage itself, it's about how the THP collapse
> operation frees page tables.
>
> Before this series, page tables can be walked under any one of the
> mmap lock, the mapping lock, and the anon_vma lock; so when khugepaged
> unlinks and frees page tables, it must ensure that all of those either
> are locked or don't exist. This series adds a fourth lock under which
> page tables can be traversed, and so khugepaged must also lock out that one.
>
> There is a codepath in khugepaged that iterates through all mappings
> of a file to zap page tables (retract_page_tables()), which locks each
> visited mm with mmap_write_trylock() and now also does
> vma_write_lock().

OK, I see. This would be a great addendum to the changelog.

> I think one aspect of this patch that might cause trouble later on, if
> support for non-anonymous VMAs is added, is that retract_page_tables()
> now does vma_write_lock() while holding the mapping lock; the page
> fault handling path would probably take the locks the other way
> around, leading to a deadlock? So the vma_write_lock() in
> retract_page_tables() might have to become a trylock later on.

This, right?

#PF                         retract_page_tables
vma_read_lock
                            i_mmap_lock_write
i_mmap_lock_read
                            vma_write_lock

I might be missing something but I have only found huge_pmd_share to be
called from the #PF path. That one should be safe as it cannot be a target
for THP. Not that it would matter much because such a dependency chain
would be really subtle.

-- 
Michal Hocko
SUSE Labs