Date: Tue, 3 Sep 2024 13:55:44 +0000
From: Carlos Llamas
To: Barry Song <21cnbao@gmail.com>
Cc: Hillf Danton, Greg Kroah-Hartman, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Barry Song, Suren Baghdasaryan,
    Michal Hocko, Tangquan Zheng
Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to
    reduce the lock duration
References: <20240902225009.34576-1-21cnbao@gmail.com>
    <20240903110109.1696-1-hdanton@sina.com>

On Tue, Sep 03, 2024 at 07:45:12PM +0800, Barry Song wrote:
> On Tue, Sep 3, 2024 at 7:01 PM Hillf Danton wrote:
> >
> > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote:
> > > From: Barry Song
> > >
> > > The mmap_write_lock() can block all access to the VMAs, for example page
> > > faults.
> > > Performing memory allocation while holding this lock may trigger
> > > direct reclamation, leading to others being queued in the rwsem for an
> > > extended period.
> > > We've observed that the allocation can sometimes take more than 300ms,
> > > significantly blocking other threads. The user interface sometimes
> > > becomes less responsive as a result. To prevent this, let's move the
> > > allocation outside of the write lock.

Thanks for your patch, Barry. So, we are aware of this contention and I've
been working on a fix for it. See more about this below.

> >
> > I suspect concurrent allocators make things better wrt response, cutting
> > alloc latency down to 10ms for instance in your scenario. Feel free to
> > show figures given Tangquan's 48-hour profiling.
>
> Likely.
>
> Concurrent allocators are quite common in PFs which occur
> in the same PTE. whoever gets PTL sets PTE, others free the allocated
> pages.
>
> >
> > > A potential side effect could be an extra alloc_page() for the second
> > > thread executing binder_install_single_page() while the first thread
> > > has done it earlier. However, according to Tangquan's 48-hour profiling
> > > using monkey, the likelihood of this occurring is minimal, with a ratio
> > > of only 1 in 2400. Compared to the significantly costly rwsem, this is
> > > negligible.

This is not negligible. In fact, it is the exact reason for the page
allocation to be done with the mmap sem held. If the first thread sleeps on
vm_insert_page(), then binder gets into a bad state of multiple threads
trying to reclaim pages that won't really be used. Memory pressure goes
from bad to worse pretty quickly.

FWIW, I believe this was first talked about here:
https://lore.kernel.org/all/ZWmNpxPXZSxdmDE1@google.com/

> > > On the other hand, holding a write lock without making any VMA
> > > modifications appears questionable and likely incorrect.
> > > While this patch focuses on reducing the lock duration, future
> > > updates may aim to eliminate the write lock entirely.
> >
> > If spin, better not before taking a look at vm_insert_page().
>
> I have patch 2/3 transitioning to mmap_read_lock, and per_vma_lock is
> currently in the testing queue. At the moment, alloc->spin is in place,
> but I'm not entirely convinced it's the best replacement for the write
> lock. Let's wait for Tangquan's test results.
>
> Patch 2 is detailed below, but it has only passed the build-test phase
> so far, so its result is uncertain. I'm sharing it early in case you
> find it interesting. And I am not convinced Commit d1d8875c8c13
> ("binder: fix UAF of alloc->vma in race with munmap()") is a correct
> fix to really avoid all UAF of alloc->vma.
>
> [PATCH] binder_alloc: Don't use mmap_write_lock for installing page
>
> Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with
> munmap()") uses the mmap_rwsem write lock to protect against a race
> condition with munmap, where the vma is detached by the write lock,
> but pages are zapped by the read lock. This approach is extremely
> expensive for the system, though perhaps less so for binder itself,
> as the write lock can block all other operations.
>
> As an alternative, we could hold only the read lock and re-check
> that the vma hasn't been detached. To protect simultaneous page
> installation, we could use alloc->lock instead.
>
> Signed-off-by: Barry Song
> ---
>  drivers/android/binder_alloc.c | 32 +++++++++++++++++---------------
>  1 file changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index f20074e23a7c..a2281dfacbbc 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -228,24 +228,17 @@ static int binder_install_single_page(struct binder_alloc *alloc,
>                 return -ESRCH;
>
>         /*
> -        * Don't allocate page in mmap_write_lock, this can block
> -        * mmap_rwsem for a long time; Meanwhile, allocation failure
> -        * doesn't necessarily need to return -ENOMEM, if lru_page
> -        * has been installed, we can still return 0(success).
> +        * Allocation failure doesn't necessarily need to return -ENOMEM,
> +        * if lru_page has been installed, we can still return 0(success).
> +        * So, defer the !page check until after binder_get_installed_page()
> +        * is completed.
>          */
>         page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
>
> -       /*
> -        * Protected with mmap_sem in write mode as multiple tasks
> -        * might race to install the same page.
> -        */
> -       mmap_write_lock(alloc->mm);
> -       if (binder_get_installed_page(lru_page)) {
> -               ret = 1;
> -               goto out;
> -       }
> +       mmap_read_lock(alloc->mm);
>
> -       if (!alloc->vma) {
> +       /* vma might have been dropped or deattached */
> +       if (!alloc->vma || !find_vma(alloc->mm, addr)) {
>                 pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
>                 ret = -ESRCH;
>                 goto out;
> @@ -257,18 +250,27 @@ static int binder_install_single_page(struct binder_alloc *alloc,
>                 goto out;
>         }
>
> +       spin_lock(&alloc->lock);

You can't hold a spinlock and then call vm_insert_page().

> +       if (binder_get_installed_page(lru_page)) {
> +               spin_unlock(&alloc->lock);
> +               ret = 1;
> +               goto out;
> +       }
> +
>         ret = vm_insert_page(alloc->vma, addr, page);
>         if (ret) {
>                 pr_err("%d: %s failed to insert page at offset %lx with %d\n",
>                        alloc->pid, __func__, addr - alloc->buffer, ret);
> +               spin_unlock(&alloc->lock);
>                 ret = -ENOMEM;
>                 goto out;
>         }
>
>         /* Mark page installation complete and safe to use */
>         binder_set_installed_page(lru_page, page);
> +       spin_unlock(&alloc->lock);
>  out:
> -       mmap_write_unlock(alloc->mm);
> +       mmap_read_unlock(alloc->mm);
>         mmput_async(alloc->mm);
>         if (ret && page)
>                 __free_page(page);
> --
> 2.39.3 (Apple Git-146)

Sorry, but as I mentioned, I've been working on fixing this contention by
supporting concurrent "faults" in binder_install_single_page(). This is
the appropriate fix. I should be sending a patch soon after working out
the conflicts with the shrinker's callback.

Thanks,
--
Carlos Llamas
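
For readers following the thread, the "whoever gets there first installs,
the others free their page" pattern discussed above can be sketched in
plain userspace C. This is only an illustration under stated assumptions,
not the binder fix and not kernel code: struct slot, publish_page(),
install_single_page() and demo() are hypothetical names, calloc() stands
in for alloc_page(), and a C11 compare-and-swap stands in for whatever
serialization the real patch ends up using.

```c
/* Userspace illustration only; none of this is kernel or binder code. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

#define PAGE_SZ 4096

/* Hypothetical stand-in for one binder lru_page slot. */
struct slot {
	_Atomic(void *) page;	/* NULL until a page is installed */
};

/*
 * Try to publish an already-allocated page.
 * Returns 0 if we installed it, 1 if another caller beat us to it
 * (our page is freed in that case).
 */
static int publish_page(struct slot *s, void *page)
{
	void *expected = NULL;

	if (atomic_compare_exchange_strong(&s->page, &expected, page))
		return 0;

	free(page);
	return 1;
}

/* Allocate with no lock held, then race to publish. */
static int install_single_page(struct slot *s)
{
	void *page;

	if (atomic_load(&s->page))
		return 1;	/* already installed, nothing to allocate */

	page = calloc(1, PAGE_SZ);	/* stand-in for alloc_page() */
	if (!page)
		return atomic_load(&s->page) ? 1 : -ENOMEM;

	return publish_page(s, page);
}

/* Deterministic replay of two "faults" that both allocated before publishing. */
static void demo(void)
{
	struct slot s;
	void *a = calloc(1, PAGE_SZ);
	void *b = calloc(1, PAGE_SZ);

	assert(a && b);
	atomic_init(&s.page, NULL);
	assert(publish_page(&s, a) == 0);	/* first publisher wins */
	assert(publish_page(&s, b) == 1);	/* second one frees its copy */
	assert(atomic_load(&s.page) == a);
	assert(install_single_page(&s) == 1);	/* later callers skip the alloc */
	free(a);
}
```

The page freed by the losing caller corresponds to the "extra
alloc_page()" whose cost is debated in this thread: the sketch shows why
the pattern is correct, but says nothing about how often the wasted
allocation happens or what it costs under memory pressure.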