From: Barry Song <21cnbao@gmail.com>
To: lokeshgidra@google.com
Cc: Liam.Howlett@oracle.com, aarcange@redhat.com, akpm@linux-foundation.org,
    axelrasmussen@google.com, bgeffon@google.com, david@redhat.com,
    jannh@google.com, kaleshsingh@google.com, kernel-team@android.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, ngeoffray@google.com, peterx@redhat.com,
    rppt@kernel.org, ryan.roberts@arm.com, selinux@vger.kernel.org,
    surenb@google.com, timmurray@google.com, willy@infradead.org
Subject: Re: [PATCH v7 4/4] userfaultfd: use per-vma locks in userfaultfd operations
Date: Thu, 23 Jan 2025 17:14:27 +1300
Message-Id: <20250123041427.1987-1-21cnbao@gmail.com>
In-Reply-To: <20240215182756.3448972-5-lokeshgidra@google.com>
References: <20240215182756.3448972-5-lokeshgidra@google.com>

> All userfaultfd operations, except write-protect, opportunistically use
> per-vma locks to lock vmas. On failure, attempt again inside mmap_lock
> critical section.
>
> Write-protect operation requires mmap_lock as it iterates over multiple
> vmas.

Hi Lokesh,

Apologies for reviving this old thread. We truly appreciate the excellent
work you’ve done in transitioning many userfaultfd operations to per-VMA
locks.

However, we’ve noticed that userfaultfd remains one of the largest users
of mmap_lock in write mode. The other, binder, was recently addressed by
Carlos Llamas's "binder: faster page installations" series:

https://lore.kernel.org/lkml/20241203215452.2820071-1-cmllamas@google.com/

The HeapTaskDaemon (Java GC) may frequently perform userfaultfd_register()
and userfaultfd_unregister() operations, both of which require the
mmap_lock in write mode to either split or merge VMAs. Since HeapTaskDaemon
is a lower-priority background task, there are cases where it gets
preempted by other tasks after acquiring the mmap_lock. As a result, even
high-priority threads waiting for the mmap_lock, whether in writer or
reader mode, can experience significant delays (several hundred
milliseconds in the worst case).

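For reference, the register/unregister pattern in question looks roughly
like the following from userspace. This is only a minimal sketch, not the
actual GC code: the function name uffd_register_cycle() and its npages
parameter are made up for illustration, and it assumes a Linux system
whose headers provide <linux/userfaultfd.h> and SYS_userfaultfd. Each of
the two ioctls below ends up in userfaultfd_register() or
userfaultfd_unregister() on the kernel side.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any real code base): map npages of
 * anonymous memory, register the range with userfaultfd, then
 * unregister it again. Returns 0 on success, -1 on any failure.
 */
int uffd_register_cycle(size_t npages)
{
	struct uffdio_api api;
	struct uffdio_register reg;
	struct uffdio_range range;
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	void *area;
	int ret = -1;
	int uffd;

	/* glibc has no wrapper for userfaultfd(2), so use syscall(2). */
	uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0)
		return -1;

	/* The API handshake must happen before any other ioctl. */
	memset(&api, 0, sizeof(api));
	api.api = UFFD_API;
	if (ioctl(uffd, UFFDIO_API, &api) < 0)
		goto out_close;

	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		goto out_close;

	/* Register the whole mapping for missing-page faults. */
	memset(&reg, 0, sizeof(reg));
	reg.range.start = (unsigned long)area;
	reg.range.len = len;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto out_unmap;

	/* Drop the registration again. */
	range.start = (unsigned long)area;
	range.len = len;
	if (ioctl(uffd, UFFDIO_UNREGISTER, &range) < 0)
		goto out_unmap;

	ret = 0;
out_unmap:
	munmap(area, len);
out_close:
	close(uffd);
	return ret;
}
```

Calling uffd_register_cycle(4) runs one register/unregister round trip on
a four-page mapping. It may return -1 on systems where userfaultfd is
unavailable or restricted for unprivileged users.
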
We haven’t yet identified an ideal solution for this. However, the Java
heap appears to behave like a "volatile" VMA in its usage. A somewhat
simplistic idea would be to designate a specific region of the user
address space as "volatile" and restrict all "volatile" VMAs to this
isolated region.

We could add a MAP_VOLATILE flag to mmap(). VMAs created with this flag
would be placed in the volatile space, while those without it would be
placed in the non-volatile space.

         ┌────────────┐TASK_SIZE
         │            │
         │            │
         │            │mmap VOLATILE
         ┼────────────┤
         │            │
         │            │
         │            │
         │            │
         │            │default mmap
         │            │
         │            │
         └────────────┘

VMAs in the volatile region are assigned their own volatile_mmap_lock,
which is independent of the mmap_lock for the non-volatile region.
Additionally, we ensure that no single VMA spans the boundary between
the volatile and non-volatile regions. This separation prevents frequent
modifications of a small number of volatile VMAs from blocking other
operations on a large number of non-volatile VMAs.

The implementation itself wouldn’t be overly complex, but the design
might come across as somewhat hacky.

Lastly, I have two questions:

1. Have you observed similar issues where userfaultfd continues to
   cause lock contention and priority inversion?

2. If so, do you have any ideas or suggestions on how to address this
   problem?

>
> Signed-off-by: Lokesh Gidra
> Reviewed-by: Liam R. Howlett
> ---
>  fs/userfaultfd.c              |  13 +-
>  include/linux/userfaultfd_k.h |   5 +-
>  mm/huge_memory.c              |   5 +-
>  mm/userfaultfd.c              | 380 ++++++++++++++++++++++++++--------
>  4 files changed, 299 insertions(+), 104 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index c00a021bcce4..60dcfafdc11a 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c

Thanks
Barry