From: Jann Horn
Date: Tue, 17 Jan 2023 21:28:06 +0100
Subject: Re: [PATCH 18/41] mm/khugepaged: write-lock VMA while collapsing a huge page
To: Michal Hocko
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, michel@lespinasse.org,
 jglisse@google.com, vbabka@suse.cz, hannes@cmpxchg.org,
 mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
 liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
 laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org,
 songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
 dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
 kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com,
 peterjung1337@gmail.com, rientjes@google.com, axelrasmussen@google.com,
 joelaf@google.com, minchan@google.com, shakeelb@google.com,
 tatashin@google.com, edumazet@google.com, gthelen@google.com,
 gurua@google.com, arjunroy@google.com, soheil@google.com,
 hughlynch@google.com, leewalsh@google.com, posk@google.com,
 linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com
References: <20230109205336.3665937-1-surenb@google.com> <20230109205336.3665937-19-surenb@google.com>

On Tue, Jan 17, 2023 at 4:25 PM Michal Hocko wrote:
> On Mon 09-01-23 12:53:13, Suren Baghdasaryan wrote:
> > Protect VMA from concurrent page fault handler while collapsing a huge
> > page. Page fault handler needs a stable PMD to use PTL and relies on
> > per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(),
> > set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will
> > not be detected by a page fault handler without proper locking.
>
> I am struggling with this changelog. Maybe because my recollection of
> the THP collapsing subtleties is weak.
> But aren't you just trying to say
> that the current #PF handling and THP collapsing need to be mutually
> exclusive currently so in order to keep that assumption you have mark
> the vma write locked?
>
> Also it is not really clear to me how that handles other vmas which can
> share the same thp?

It's not about the hugepage itself, it's about how the THP collapse
operation frees page tables.

Before this series, page tables can be walked under any one of the
mmap lock, the mapping lock, and the anon_vma lock; so when khugepaged
unlinks and frees page tables, it must ensure that all of those either
are locked or don't exist. This series adds a fourth lock under which
page tables can be traversed, and so khugepaged must also lock out that
one.

There is a codepath in khugepaged that iterates through all mappings
of a file to zap page tables (retract_page_tables()), which locks each
visited mm with mmap_write_trylock() and now also does
vma_write_lock().

I think one aspect of this patch that might cause trouble later on, if
support for non-anonymous VMAs is added, is that retract_page_tables()
now does vma_write_lock() while holding the mapping lock; the page
fault handling path would probably take the locks the other way
around, leading to a deadlock? So the vma_write_lock() in
retract_page_tables() might have to become a trylock later on.

Related: Please add the new VMA lock to the big lock ordering comments
at the top of mm/rmap.c. (And maybe later mm/filemap.c, if/when you
add file VMA support.)
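
To make the lock-ordering concern about retract_page_tables() more
concrete, here is a rough userspace sketch using pthreads. It is not
kernel code: the toy_* names and structures are made up for
illustration, and the lock order in toy_fault() (VMA lock first, then
mapping lock) is only my assumption about what a future file-backed
fault path might do. The point is just that the path which already
holds the mapping lock uses a trylock on the per-VMA lock and skips the
VMA on contention instead of blocking, so the two paths cannot end up
waiting on each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's data structures. */
struct toy_vma {
	pthread_mutex_t lock;		/* stands in for the per-VMA lock */
	bool page_table_present;
};

struct toy_mapping {
	pthread_mutex_t lock;		/* stands in for the mapping lock */
	struct toy_vma *vmas;
	int nr_vmas;
};

/*
 * Fault-like path. ASSUMED order: VMA lock first, then mapping lock.
 */
static void toy_fault(struct toy_mapping *m, struct toy_vma *v)
{
	pthread_mutex_lock(&v->lock);
	pthread_mutex_lock(&m->lock);
	v->page_table_present = true;
	pthread_mutex_unlock(&m->lock);
	pthread_mutex_unlock(&v->lock);
}

/*
 * Retract-like path. Takes the mapping lock first, i.e. the opposite
 * order from toy_fault(). Blocking on the VMA lock here could deadlock
 * against toy_fault(), so only trylock it and skip contended VMAs.
 * Returns the number of VMAs whose page table was retracted.
 */
static int toy_retract(struct toy_mapping *m)
{
	int retracted = 0;

	pthread_mutex_lock(&m->lock);
	for (int i = 0; i < m->nr_vmas; i++) {
		struct toy_vma *v = &m->vmas[i];

		if (pthread_mutex_trylock(&v->lock) != 0)
			continue;	/* contended: leave this VMA alone */
		if (v->page_table_present) {
			v->page_table_present = false;
			retracted++;
		}
		assert(pthread_mutex_unlock(&v->lock) == 0);
	}
	pthread_mutex_unlock(&m->lock);
	return retracted;
}
```

The cost of the trylock variant is that a contended VMA simply keeps
its page table until some later pass, which seems acceptable for an
opportunistic operation like this one.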