From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7C42CD3420 for ; Tue, 3 Sep 2024 10:37:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 477A08D0159; Tue, 3 Sep 2024 06:37:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3FE0E8D013C; Tue, 3 Sep 2024 06:37:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 29F5F8D0159; Tue, 3 Sep 2024 06:37:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 079EB8D013C for ; Tue, 3 Sep 2024 06:37:14 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id B55E1A1437 for ; Tue, 3 Sep 2024 10:37:13 +0000 (UTC) X-FDA: 82523074746.08.F9820A8 Received: from nyc.source.kernel.org (nyc.source.kernel.org [147.75.193.91]) by imf03.hostedemail.com (Postfix) with ESMTP id 0FF3F2000D for ; Tue, 3 Sep 2024 10:37:11 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linuxfoundation.org header.s=korg header.b=ew2bIW0p; dmarc=pass (policy=none) header.from=linuxfoundation.org; spf=pass (imf03.hostedemail.com: domain of gregkh@linuxfoundation.org designates 147.75.193.91 as permitted sender) smtp.mailfrom=gregkh@linuxfoundation.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1725359755; a=rsa-sha256; cv=none; b=1aK8gR6OaF/w+E6ogCct/33ldB39UUwxmgqZ9sVwlZvLJRbSUwRs88u9ZRXCe5o9BcJtHu tHDatrnzy5CJXZ07yMEP0T/wa08waJwTkP8LAVu+Sa9lAdTmL/Z2F/WiNgL0mr+38mAXi7 /YpwN5YitZTDvY7GQdpwkNZi021mJrs= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=linuxfoundation.org header.s=korg header.b=ew2bIW0p; dmarc=pass (policy=none) header.from=linuxfoundation.org; spf=pass (imf03.hostedemail.com: domain of gregkh@linuxfoundation.org designates 147.75.193.91 as permitted sender) smtp.mailfrom=gregkh@linuxfoundation.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1725359755; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=QwFiHFPmpl6hITmumHOmHcHH262zjxCXR2maEi5nZ88=; b=A7X77MyhubFNQqpiemu7YlGpFG4pMnkvVqspDsavoLNMtv7wTj11UnTJsdhkhpdYrq7mq0 jx0x7csJXLVVBbQKBlbS0d0zG/OVw5zblhXS6/bdjyuekMsR4oKvcslvONPDxjiHPgA6uM M2FaI6SBeLuMQzhb4VN9z7bmf0oKRb0= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by nyc.source.kernel.org (Postfix) with ESMTP id 0088DA4262C; Tue, 3 Sep 2024 10:37:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 57F97C4CEC5; Tue, 3 Sep 2024 10:37:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1725359830; bh=lgFfoM9wea4mTJR7aEIx+Ij9DkleM4tc+jOoWTKZ3e0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=ew2bIW0pEtrD5VzvPKrMlwVAbI4f+7Oc/dfujjthQkAzX5oBReRBiS5fcEuKaQA8h OO3gvBhp3/yJ3qwoHOidZaaus87ctEDfPlrgrmm7lgaAmH/HFdcu6mQu0Om/P0KHK3 nUr1q20AvI1QNiFEysbA86+pROmi9HGTItEPwnnM= Date: Tue, 3 Sep 2024 12:04:57 +0200 From: Greg Kroah-Hartman To: Barry Song <21cnbao@gmail.com> Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song , Arve =?iso-8859-1?B?SGr4bm5lduVn?= , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Carlos Llamas , Suren Baghdasaryan , Tangquan Zheng Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to reduce the lock duration Message-ID: <2024090331-rewire-ransack-e73b@gregkh> References: <20240902225009.34576-1-21cnbao@gmail.com> <2024090325-sublet-unsworn-b6a3@gregkh> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 0FF3F2000D X-Stat-Signature: c9fa36w1nb9i6fnegjwqyj83f9ndw7ui X-Rspam-User: X-HE-Tag: 1725359831-509547 X-HE-Meta: U2FsdGVkX1+s3WzLOkcfaSWS8vVROZx+VO68ek6I3OUeUbiMW8dlqtOfAs3p0CP41Va2DuBCJcdpvXzvif97Qd+eyYBqNGZ5NghWpc6Jdttv8ZuYJ2nOCN5YpFAmMRcqbIntQZZgTUAE2d7TU4uIrRhspduy3w2iZ3oIJ6BJYjCkmMtOyM2B0svxD8zAXDSVteCvJNIqrZFZEOb5Bx8TJrqvZLT6+lk4UCrvbz1ABy7Srq9fD7OjkVGs4p4xIIYCnoNTqUHfMFINNR3qmL9gATSdqH9lErmPsDPgKQXeyhwZQPTLi1D1OR2sVVEwDQBAqF/93SNpsQhdMl6COIbx26siwgEkuCBcP+xujnC/vZF0FTHe1NuVKOMbIeUWL3kJvuVReZOPHGxV3Oul31EBhC0MXz/fuYzBkBwRU5JG6ZU5eJt33/rpDR+PeoRtz72zzBZl8L2v/7c1U4TzkumkYXBy4f0ALcibSQu2g3Zzr7jMZrdBm0sqVHae8Xbr67qF9AmAYCmA3Y7tm/YeTnjQ8A+JGo4v/UKdSJZoJjlHOzyI14rVwbzeUwq+T8m3INKLjkvR75cYCRj1/N4i1pPF/bLx0VFvozp4pflEdWH4Hc3kKOu/AEArFFjz5hz4YuZ6S4usKydpbONyYm22RFIfQdwbntMXR99S2LBoXIuH3ozfgkv9Hn00SZGe4yFGOWG9F7ioDVY1GQYOZe3sd2nx/QyFLZgEwKPlqf9UHDSywd1MJZkNsWCmYxIKrr2ujnmTf2AxjXfh5U9DIhhAbstXDXUK0PKz66HCg1uD7qITqElVhEXPs/SRgIF+vDzu4nDbyKb5IwPbWN7fohTXa6zW5GeIroB2Gg2NQ6rwCYAay+kGza+D/lmngd4kmL1uMpCPiCi9SmZpchexNislw59hvzBYsrFQggx6Q3plWYliy6fusSjfpmNZIb//I5VdVTIDFrd/us8mIfahI8GQqdf P8pt1ZF1 zq+bTlwya1iZbnqVP401t4fh/d2GdmaECWhoVIyxdaZnz685s3I3e5XWcomEAc9mD0XWK93m6ko+lljqq0Fdk0VyliMdMSTtd5j+IbSBIpJnoro/hHsQMtR7GXia/q9lUXKQ3UpoGL0LSGIDNJ3Z4o72CfAjYSTGdfGPb4m1mmAwR8iAn3l5DkMWQXlWNgpCBgJFhvBS+aZPKDrhV2MMPf0X87cRJNSQF277wX2nxtOM6acWepf2eumDbYwFXu0JHk6rue6GO40YaeKnlVsBxAPJeHdBfC6kt+PG5hd7qO1+CMnEDrBSBGmd+ulx2iAP2239JPOfDSHXUO6LuwJxFrjG8ImbwoxXau7EXdfS20HCxndeoLGNmmGsIYJDnSO44ep+iqi9ixKkyL1P1FMdbPH2dtMSyFdaTJncRU42u02T0C4KW21ksPZT0USugMkA52u7fAGY3j/KeIHsuf59yVvcGZIabcQaJq8fxpOr4FZdgOYJ0i2YRIl2D0gSI73Hb0x1gVPyiwhFfGFGn8vZfnUIE0pgRw5/QYkTHupCQQZfnI1FlllCMsfLk2hGO8OkcO/SgKFW4ol8hS9CG3kMIme5R8CBEFmzJZe9NLydzogQdvPplC1zlPdU+g1wZLaTlIOJo/QweRcqxLWVo36rtSJIy2w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Sep 03, 2024 at 05:07:23PM +0800, Barry Song wrote: > On Tue, Sep 3, 2024 at 4:57 PM Greg Kroah-Hartman > wrote: > > > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > > From: Barry Song > > > > > > The mmap_write_lock() can block all access to the VMAs, for example page > > > faults. Performing memory allocation while holding this lock may trigger > > > direct reclamation, leading to others being queued in the rwsem for an > > > extended period. > > > We've observed that the allocation can sometimes take more than 300ms, > > > significantly blocking other threads. The user interface sometimes > > > becomes less responsive as a result. To prevent this, let's move the > > > allocation outside of the write lock. > > > A potential side effect could be an extra alloc_page() for the second > > > thread executing binder_install_single_page() while the first thread > > > has done it earlier. However, according to Tangquan's 48-hour profiling > > > using monkey, the likelihood of this occurring is minimal, with a ratio > > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > > negligible. > > > On the other hand, holding a write lock without making any VMA > > > modifications appears questionable and likely incorrect. While this > > > patch focuses on reducing the lock duration, future updates may aim > > > to eliminate the write lock entirely. > > > > > > Cc: Greg Kroah-Hartman > > > Cc: "Arve Hjønnevåg" > > > Cc: Todd Kjos > > > Cc: Martijn Coenen > > > Cc: Joel Fernandes > > > Cc: Christian Brauner > > > Cc: Carlos Llamas > > > Cc: Suren Baghdasaryan > > > Tested-by: Tangquan Zheng > > > Signed-off-by: Barry Song > > > --- > > > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c > > > index b3acbc4174fb..f20074e23a7c 100644 > > > --- a/drivers/android/binder_alloc.c > > > +++ b/drivers/android/binder_alloc.c > > > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > if (!mmget_not_zero(alloc->mm)) > > > return -ESRCH; > > > > > > + /* > > > + * Don't allocate page in mmap_write_lock, this can block > > > + * mmap_rwsem for a long time; Meanwhile, allocation failure > > > + * doesn't necessarily need to return -ENOMEM, if lru_page > > > + * has been installed, we can still return 0(success). > > > + */ > > > + page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > But now you are allocating new pages even if binder_get_installed_page() > > is an error, right? Doesn't that slow things down? > > very very unlikely, as the ratio is only 1/2400 while write lock 100% blocks > everyone. Ok, but: > > How was this benchmarked? > > i actually put Tangquan's profiling result: > " > However, according to Tangquan's 48-hour profiling > using monkey, the likelihood of this occurring is minimal, with a ratio > of only 1 in 2400. Compared to the significantly costly rwsem, this is > negligible." That's not a benchmark, or any real numbers of how this overall saves any time. > > > /* > > > * Protected with mmap_sem in write mode as multiple tasks > > > * might race to install the same page. > > > */ > > > mmap_write_lock(alloc->mm); > > > - if (binder_get_installed_page(lru_page)) > > > + if (binder_get_installed_page(lru_page)) { > > > + ret = 1; > > > > That is not a valid error value :( > > > > > goto out; > > > + } > > > > > > if (!alloc->vma) { > > > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); > > > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > goto out; > > > } > > > > > > - page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > if (!page) { > > > pr_err("%d: failed to allocate page\n", alloc->pid); > > > ret = -ENOMEM; > > > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > if (ret) { > > > pr_err("%d: %s failed to insert page at offset %lx with %d\n", > > > alloc->pid, __func__, addr - alloc->buffer, ret); > > > - __free_page(page); > > > ret = -ENOMEM; > > > goto out; > > > } > > > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc, > > > out: > > > mmap_write_unlock(alloc->mm); > > > mmput_async(alloc->mm); > > > - return ret; > > > + if (ret && page) > > > + __free_page(page); > > > + return ret < 0 ? ret : 0; > > > > Please only use ? : for when you have to, otherwise please spell it out > > with a normal if statement: > > if (ret < 0) > > return ret; > > return 0; > > > > But, this is abusing the fact that you set "ret = 1" above, which is > > going to trip someone up in the future as that is NOT a normal coding > > pattern we have in the kernel, sorry. > > > > If you insist on this change, please rework it to not have that type of > > "positive means one thing, 0 means another, and negative means yet > > something else" please. > > I was trying to consolidate all free_page() calls into one place. Otherwise, > we would need multiple free_page() calls. I'm perfectly fine with having more > free_page() calls in both the ret = 1 and ret < 0 paths. In that case, > the ret = 1 > path can be removed if you prefer. Remember, we write code for people first, and compilers second. You have to maintain this code for the next 10+ years, make it _VERY_ obvious what is happening and how it works as you will be coming back to it and not remembering what was made for what reason at all. thanks, greg k-h