Date: Thu, 11 Apr 2024 11:34:44 -0400
From: Peter Xu <peterx@redhat.com>
To: Matthew Wilcox
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
    Suren Baghdasaryan, Lokesh Gidra, Liam R. Howlett, Alistair Popple
Howlett" , Alistair Popple Subject: Re: [PATCH] mm: Always sanity check anon_vma first for per-vma locks Message-ID: References: <20240410170621.2011171-1-peterx@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspamd-Queue-Id: D543380015 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: ftzjh8fp5b9jtq6be7sdasbair55n7yx X-HE-Tag: 1712849690-969287 X-HE-Meta: U2FsdGVkX197Qz4KbJEpsXgTHQhIQk2mX8I17yP9+fN+RIwlDLMs7E5I8fKZSHliY7soa1Bu1W9Fw1N0sDdqktEk7Cr8rXk0RThU7vxnU3QeG/n05F1YA3jjDARlpUHhFVlcLDFGeJY7SyrW4tEj6vPRWwdPxko+M8l3U1S0y/0oMh4/zC46qK/NU4K6srkiTKAy/sYzPiMolcdXBAtvs48zYjcFXbX2X0kNyU2nRzOCxs2Uonf2K1TakH2gni36AWeanH4V8I06vd5+evRl/Iy489vnvhMD3J7rj2Mpj8v2NJwoLc7wwaIC80/nZmdq9dXF2FxQ8jtYr85cx3orbOYXYtwjmTNuT9NPvtKzWlb2eKjG4pzD4UOa9QVDbblhxTOA/zFYp7uVyokwyhFKvPo2g/2QC1sWxl0EF2tefHP+0N5KVEdfXAt0GU1v5xK1Scs1etXrHxQXYLEaOARc8IgipvLcU0r0fupAtZDZQsae+TOb7I7CMVh45I8Fc11/uXFj1+PTB746xQfHkBINax1fGkaTj8a3/2/ld8TmK4AtOGwcrDnfrbaE+Olbro3curmmEfcyzxOy+w7hxV5vdtx+E5wilAC+gMJPygYVbvw6BzmRULx/O8hNCxi8lf3PGSkJ+qt7IwdEwKe8XaSfMxUVDhlFSB/CANtcoiYgfCpghKU9ANxvL5jnCaOzcGFvv/zHaOPtYAd2pjBJ+FxF/U+6ib8AhfuRNdHTOr3U9Ne8HvoOYgiv8D+3pHy3kC/BvOsQZ0WryptIE+4XNwDuXVAuujrY/VlWJ6SiABIOxZqKuqDor3yUaZ0Dyppnk/8HeIjPpow9B+GEFdgMZtuqiFc6IlRHWTZhQS77XGWqRX34r6mExC9Cd9YoEg/5chrO//JISkcdSTKwDYn2HTv38L2DMu+ex4FxACX0dl8OtkxZsrKNMALe8O/iGTfQ4+dnnj0w9WzFV9lsOv8BvCO CKF1uVf1 zy2WcYCjOcowwl7n3+g0HorT1Tpat53zVhLGSUBzcjUtjw4flT0OjyvppMaldNLOZMkhsdmKHI2aTyGaqefhjOyS+ONzdPepm243F2SSHet3x6wqu2xVmlCWivEvS2VStszfrwN30djFfRUagwFxQwEc+Bc/AheGfEErIBde6Ynhj5n08Lks90CQJ1yaHYJLqMERVwTa96UoakyMwLCx9H7dbMfK/1rbrApIqx842lWkWPBMxQu4oUVMUGnk+j7xvkbJ9XYxM4BayCg9opaVVhNX2oF9u8HutHZnL3CDsG/y6GI8KptLpAzhBeollcK9ACv9Vz8bfMVYW2KNJvYmzbdDWgY1PGgJ1byqEWgcr+06+lZrmHYFYjw6tZ7KC+3F4ryC4JAAvu/ZOLUSwj75/gzpnQOG8BH2zqkcgHF15Cgl6Ujqd78zTv7T0SIlcFK5y/GOdkoRptLnfB+bf8rKN7G+gMJCqtdUOegTN33xaEE/J04nlBTYTbCHiI46ogHS0DolwkRhJS/Ldj1k= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Apr 11, 2024 at 03:50:54PM +0100, Matthew Wilcox wrote: > On Wed, Apr 10, 2024 at 08:20:04PM -0400, Peter Xu wrote: > > On Thu, Apr 11, 2024 at 12:59:09AM +0100, Matthew Wilcox wrote: > > > On Wed, Apr 10, 2024 at 05:23:18PM -0400, Peter Xu wrote: > > > > On Wed, Apr 10, 2024 at 10:10:45PM +0100, Matthew Wilcox wrote: > > > > > > I can do some tests later today or tomorrow. Any suggestion you have on > > > > > > amplifying such effect that you have concern with? > > > > > > > > > > 8 socket NUMA system, 800MB text segment, 10,000 threads. No, I'm not > > > > > joking, that's a real customer workload. > > > > > > > > Well, I believe you, but even with this, that's a total of 800MB memory on > > > > a giant moster system... probably just to fault in once. > > > > > > > > And even before we talk about that into details.. we're talking about such > > > > giant program running acorss hundreds of cores with hundreds of MB text, > > > > then... hasn't the program developer already considered mlockall() at the > > > > entry of the program? Wouldn't that greatly beneficial already with > > > > whatever granule of locks that a future fault would take? > > > > > > I don't care what your theory is, or even what your benchmarking shows. 
> > > I had basically the inverse of this patch, and my customer's workload
> > > showed significant improvement as a result.  Data talks, bullshit walks.
> > > Your patch is NAKed and will remain NAKed.
> >
> > Either tell me your workload, and I may try it.
> >
> > Or, please explain why it helps?  If such a huge library is in a single
> > VMA, I don't see why the per-vma lock is better than the mmap lock.  If
> > the text is composed of multiple VMAs, it should only help when the
> > cores fault on different VMAs, not on the same one.
>
> Oh, you really don't understand.  The mmap_lock is catastrophically
> overloaded.  Before the per-VMA lock, every page fault took it for read,
> and every call to mmap() took it for write.  Because our rwsems are
> fair, once one thread has called mmap() it waits for all existing page
> faults to complete _and_ blocks all page faults from starting until
> it has completed.  That's a huge source of unexpected latency for any
> multithreaded application.
>
> Anything we can do to avoid taking the mmap_sem, even for read, helps any
> multithreaded workload.  Your suggestion that "this is rare, it doesn't
> matter" shows that you don't get it.  That you haven't found a workload
> where you can measure it shows that your testing is inadequate.
>
> Yes, there's added complexity with the per-VMA locks.  But we need it for
> good performance.  Throwing away performance on a very small reduction
> in complexity is a terrible trade-off.

Yes, this is a technical discussion, and such comments are what I'm
looking for.  Thank you.

What I am still not sure about is whether the performance degradation you
worry about for "this small corner case" exists at all.

Let's first check when those vmf_anon_prepare() lines were introduced:

  commit 17c05f18e54158a3eed0c22c85b7a756b63dcc01
  Author: Suren Baghdasaryan
  Date:   Mon Feb 27 09:36:25 2023 -0800

      mm: prevent do_swap_page from handling page faults under VMA lock

It doesn't look like the late anon_vma check was ever done for performance
reasons.  To figure this out, let me ask some more questions.

1) When you said "you ran a customer workload, and that greatly improved
performance", are you comparing between:

   - with/without the file-typed per-vma lock, or,
   - with/without this patch?

Note that I'm hopefully not touching the fact that the file per-vma lock
should work as before for the majority of cases.  And I'm surprised to see
your comment, because I didn't expect this had even been measured before.

To ask in another way: do you mean it was your intention to check anon_vma
late for private file mappings when working on file support for the
per-vma lock?

If the answer is yes, I'd say please provide a documentation patch to back
such behavior.  You can stop my patch from getting merged now, but it's
never clear whether someone else will see this and post it again.  If it
wasn't obvious to Suren, who introduced the per-vma lock [1], then I won't
be surprised if it's unknown to most developers on the list.

2) What happens when lock_vma_under_rcu() keeps spreading its usage?

It has already spread to the uffd world, where lock_vma() carries a
special check to make sure private file mappings are fine.  If this keeps
going, I won't be surprised to see future users of lock_vma_under_rcu()
forget about private file mappings, causing hard-to-debug issues.
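For reference, that uffd special case looks roughly like below; this is a
simplified paraphrase of lock_vma() in mm/userfaultfd.c from memory, with
the mmap-lock fallback path elided, so the details may not match the tree
exactly:

static struct vm_area_struct *lock_vma(struct mm_struct *mm,
				       unsigned long address)
{
	struct vm_area_struct *vma;

	vma = lock_vma_under_rcu(mm, address);
	if (vma) {
		/*
		 * lock_vma_under_rcu() only checks anon_vma for private
		 * anonymous mappings; a private file-backed vma can still
		 * arrive here with anon_vma unset, so check it explicitly
		 * before uffd can safely use the vma.
		 */
		if (!(vma->vm_flags & VM_SHARED) && unlikely(!vma->anon_vma))
			vma_end_read(vma);
		else
			return vma;
	}

	/*
	 * Slow path: take the mmap read lock, find the vma, prepare its
	 * anon_vma, and take the per-vma lock before dropping mmap_lock.
	 */
	...
}

That explicit check is exactly the kind of thing I'd expect a future
caller to forget.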
Even if lock_vma_under_rcu() is exactly what the fault path is looking
for, I think we may need a separate lock_vma_under_rcu_fault(); then
lock_vma_under_rcu() itself should cover private file mappings, to make
sure future users aren't easily exposed to hard-to-debug issues.
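Something like below, as a purely hypothetical sketch: neither
lock_vma_under_rcu_fault() nor __lock_vma_under_rcu() exists today, and
both names are made up for illustration only:

/*
 * The fault path keeps today's behavior: anon_vma is still checked
 * late, in vmf_anon_prepare(), so the fault fast path stays untouched.
 */
struct vm_area_struct *lock_vma_under_rcu_fault(struct mm_struct *mm,
						unsigned long address)
{
	return __lock_vma_under_rcu(mm, address);
}

/*
 * All other users get the paranoid default: refuse private mappings
 * whose anon_vma is not prepared yet, forcing the mmap-lock fallback.
 */
struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
					  unsigned long address)
{
	struct vm_area_struct *vma = __lock_vma_under_rcu(mm, address);

	if (vma && !(vma->vm_flags & VM_SHARED) && !vma->anon_vma) {
		vma_end_read(vma);
		return NULL;
	}
	return vma;
}

[1] https://lore.kernel.org/r/CAJuCfpGj5xk-NxSwW6Mt8NGZcV9N__8zVPMGXDPAYKMcN9=Oig@mail.gmail.com

Thanks,

-- 
Peter Xu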