From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EDA7CD13CF for ; Tue, 3 Sep 2024 11:01:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C065F6B02ED; Tue, 3 Sep 2024 07:01:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B8E8C6B02EE; Tue, 3 Sep 2024 07:01:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A30DE6B02EF; Tue, 3 Sep 2024 07:01:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 770926B02ED for ; Tue, 3 Sep 2024 07:01:28 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 33781811D3 for ; Tue, 3 Sep 2024 11:01:28 +0000 (UTC) X-FDA: 82523135856.13.442DCFC Received: from mail-vk1-f172.google.com (mail-vk1-f172.google.com [209.85.221.172]) by imf13.hostedemail.com (Postfix) with ESMTP id 68B5520024 for ; Tue, 3 Sep 2024 11:01:24 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Afai0JcG; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf13.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.172 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1725361178; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Z+W5ncyMf/3/rrJ60/0xzarvVQ+hV/TRfWbGHxMBYTY=; b=cJuV0j+O0Z1fGLiSNzRSuDmwNPet2MsQLMPPPlY3SizBhq7pUZiaAYLVzTcb4KtjzJOgFV Kcl9v5Ybrzu/jXVUYqK8F6w85cNeRQXd4Lf7WtC5iD/fupYWxs7lJhDAqpXR1eoFoUNkO7 lML2VGkyUcIR+jU2HZht2W0Qi/PNTTg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1725361178; a=rsa-sha256; cv=none; b=zGZug0+2p2RbHIp5GgukeHVh2eFMT9/YPSeinC5DZD8wbLZPS9l095jyKkppcghuXSd8Ix OTITsbCJl1RaBzP2l2c7HJDu8oeKgJsOLaWil50DE0DHxck+G2WSbBqBK5uWBX36gHP1rN /summ/IouaTR1DOcAJKNQ1RXJPvdZKU= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Afai0JcG; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf13.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.172 as permitted sender) smtp.mailfrom=21cnbao@gmail.com Received: by mail-vk1-f172.google.com with SMTP id 71dfb90a1353d-4fce352bd28so1450870e0c.2 for ; Tue, 03 Sep 2024 04:01:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725361283; x=1725966083; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Z+W5ncyMf/3/rrJ60/0xzarvVQ+hV/TRfWbGHxMBYTY=; b=Afai0JcGnN7bQcLSixNtZwnMepOvrqdbUkLCdKmhSgr38aQH1usVMIZ0qROU3zrrOi kAe7g1pTFXcW/BnVEPYES7vjkQ83ULNx9CaAY2d1P1XzI+lSPWr19ftXkr1YfkPevX62 4hWkxPuvjO4jO49i6+sDh1t/PU4cuafmiegv7VjYjEIkDaG/N45oa5qkWUb2Jn6B2se5 N1mKX12kP58O5ECy1rn6AK8gSdoDNTEI/6j/U1tpHgpX1J6lJkRvW2j6tRAF5PCyoPlM ZRuyAxB7i5BYmieShldTS3p8YfjlFUgGKz1r1SOJFAFffq6s/iuxklOPAk/ZHaWNQ9aC tZPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725361283; x=1725966083; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Z+W5ncyMf/3/rrJ60/0xzarvVQ+hV/TRfWbGHxMBYTY=; b=a0qEuvXBPwTbctKRYs51gHw0hkgVKxoGDPKivRDcadP21MIdr29/Kh38FjWVkbav+m s6oUBfKakJ1t+qmm9Zsf+CfUboR3Wi6XQ+UKQSnKl5bYX011n9PXMTL8cIR57hRKdE4q oHQC+kuwHNHD9gSAji7eeHbesgvQzLCjPfmZUP85kz6LjtAEy2uOX6lA80uhirlhlqWI JUpTIUXJAs5RxuSMTAHvdcc9hXnOmsbrDUeN4Y4OqCv7Dd8INxxEIuzp3zvzYyt3YqTj oTLir5MSM3YOrbw4PpnXJyvLgWoSfoH2sS/IlweulI2YFdBw+G0d7SEj9Mm+tvHz86ig T3fw== X-Forwarded-Encrypted: i=1; AJvYcCUBRQd5Yng2hy+NWC0zcn7/BLBQ3wZP3PcfXndZB/kKS5U0g40OnfPC9Ppqa/O89bhO1aNxMQzUSw==@kvack.org X-Gm-Message-State: AOJu0Yz0FpRFlPwM2fDY+0bcm7YsZFmfO1M6NgWNTp0SOOXjoQiBxhjf svoms13YA6HT7BB0doiY20LcTdq5d5TZ7tZPmgKYqlczHFDMwqQYyCJ4cvWUC9gQIFd5N4IpDJ3 7piYoGhzRMYZ+bIwj6pY7AyXrkZ4= X-Google-Smtp-Source: AGHT+IEqU2aFaT+yL685jTuWGCTiEYj5BFrK9lDDwFRzVWMgsH4ldSem0QL8Xruidl3kmw3nBoEuaaC7+Rb+LgnOQpQ= X-Received: by 2002:a05:6122:182a:b0:4df:1d06:eeb7 with SMTP id 71dfb90a1353d-5009ac02acfmr11087699e0c.1.1725361283250; Tue, 03 Sep 2024 04:01:23 -0700 (PDT) MIME-Version: 1.0 References: <20240902225009.34576-1-21cnbao@gmail.com> <2024090325-sublet-unsworn-b6a3@gregkh> <2024090331-rewire-ransack-e73b@gregkh> In-Reply-To: <2024090331-rewire-ransack-e73b@gregkh> From: Barry Song <21cnbao@gmail.com> Date: Tue, 3 Sep 2024 19:01:08 +0800 Message-ID: Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to reduce the lock duration To: Greg Kroah-Hartman Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song , =?UTF-8?B?QXJ2ZSBIasO4bm5ldsOlZw==?= , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Carlos Llamas , Suren Baghdasaryan , Tangquan Zheng Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 68B5520024 X-Stat-Signature: wq4xuy8b9hdfxgf138ja6rzd58wfiykp X-Rspam-User: X-HE-Tag: 1725361284-213695 X-HE-Meta: U2FsdGVkX1/zvNgDfI2yj+A+FeaIgdfHD1ena47E9gY1CHW3rxs8HKJUhb9VgEeKyFOFYlig/wTlIRpniRjNzTf9hmR1FT2qAr3hgvpKNecbHpB300E1YR9mWmEFWVmeX1rCE0XGanM9Y3l7lLzBvUwgNQ1Eep6SVvdeN0rsZdCYo98E5v/VOSSL5zcqiooMa+PWfRdtCeGWuBoIc0yTLTSXg6vDkNRyWZSucERN9dKVB1vE+usCYEUzuLC+GRm0kgPgDuRwVnRCevvT2ZjMUAPlZ1z9nEJuFTuFiJqgXfPGyumtgjMIS0Dbn/BSihLt74vYBW6StLI/1VdSNgiepJ+qzIVlhOxhrV+tk3Do6VFYdG3bc5bmuB7tbKvS/3tVpZryI+Kd6ybApfnXZTWy+u0j0v/Ax45xEoCXZvJoUKzriXyi8H0O1wout244xxePDDFPyDFEMElNtybEbjL+jJxuWGI5pabJOIQuppw/kAGJv59AUaPRJ+ZlZ+VtkHyV98NSOLNiZht5hbDOF4HntfWwPU7mNZqlWCTu/zHWO7ElO+PuysBPvW3lf916+OlfO/H2nsvn9RWjwPt7pM2RZyyiRd0Ex4nGnefoskuBSlxqT+WkVSaCM/uT8qvzOpslYrpxb+8z4zVHOr+qR92WskZzcB8GDnEpap9Jg8ssNxDejfyyl51f5pmpIneBJ9OfBJW8sXbg88AkTOIdMO1tP0jXA/ya7gpxYp0wh3dTAE3lbpZvqUjZz8HeW4LDholjAX4IedB7/Hy2Q2csa3ZsfvSKL4tscGISuqzCy2YECUxdoSsrRnOdI/qh9VLOyOzUsN/NCE0dwrgfcaXIi+CBIjBPrIM0cBqfoVUivyJ5Z5OehKzYMjmUgbgy+J+h8OCrDx/J0SYZndlTRgZsvoDldGAv3uCoetZCO65iPeADmzaEMMn1g45RG6PLgL9JfA+anoyIj4Ey4svETJnrZBY iSB41/x6 gcU8O5RVpEPmrHZ4iKJypuEV5KV49VLP7vJc94S8qtBkNptORgj8FNyC5K2RVngFz6TYe9zTvN0ZOyWGWIxSUdpqfOTsk/z61hi8Y4Sg1J/smzlbbPA7/TS9rfkbgLHaGEayHq74Evw9aWjfOgJAOBbs6sQpT4VJSfS3CYRxPJ7Vjj6BjL9Akq9VflPR4a51v3TPRh2TsGg4nCz/i5/nJ7CzeuStmI+14zVz3dducGsgXEgDUOUwoQgt0a5ya1FxenVXhiOwLmb/Bj/RvjjLNqTVunKYDfy3q73FnnDnaUgovLkxdKMjIy3r1Kbg91qh2OVQn4e7s0hvy9vEahLGCNrMlrhGtdapBopaP/VQ8uFow9Jpt6gRUGaGcNFcvCdH1tcaYbNdYq8XkCBtFp9n4gKOeJSswCrQbPo6SKAq6jXLbtGLU4uH+py3lgS07jw0E9wmIxR5TLPbLlGl35tS8sT8voXDqJZOPWTF9c+e0lR5XXqb1rgpdRKryZOw5pgb5+NnpFCCIpmFwfocDErxIVxTZN/zmO/MCgyRnyAEvwcNNY4hOelk32QqGvPhUpKX3iXt4hLYQvV5G4Sk= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Sep 3, 2024 at 6:37=E2=80=AFPM Greg Kroah-Hartman wrote: > > On Tue, Sep 03, 2024 at 05:07:23PM +0800, Barry Song wrote: > > On Tue, Sep 3, 2024 at 4:57=E2=80=AFPM Greg Kroah-Hartman > > wrote: > > > > > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > > > From: Barry Song > > > > > > > > The mmap_write_lock() can block all access to the VMAs, for example= page > > > > faults. Performing memory allocation while holding this lock may tr= igger > > > > direct reclamation, leading to others being queued in the rwsem for= an > > > > extended period. > > > > We've observed that the allocation can sometimes take more than 300= ms, > > > > significantly blocking other threads. The user interface sometimes > > > > becomes less responsive as a result. To prevent this, let's move th= e > > > > allocation outside of the write lock. > > > > A potential side effect could be an extra alloc_page() for the seco= nd > > > > thread executing binder_install_single_page() while the first threa= d > > > > has done it earlier. However, according to Tangquan's 48-hour profi= ling > > > > using monkey, the likelihood of this occurring is minimal, with a r= atio > > > > of only 1 in 2400. Compared to the significantly costly rwsem, this= is > > > > negligible. > > > > On the other hand, holding a write lock without making any VMA > > > > modifications appears questionable and likely incorrect. While this > > > > patch focuses on reducing the lock duration, future updates may aim > > > > to eliminate the write lock entirely. > > > > > > > > Cc: Greg Kroah-Hartman > > > > Cc: "Arve Hj=C3=B8nnev=C3=A5g" > > > > Cc: Todd Kjos > > > > Cc: Martijn Coenen > > > > Cc: Joel Fernandes > > > > Cc: Christian Brauner > > > > Cc: Carlos Llamas > > > > Cc: Suren Baghdasaryan > > > > Tested-by: Tangquan Zheng > > > > Signed-off-by: Barry Song > > > > --- > > > > drivers/android/binder_alloc.c | 18 ++++++++++++++---- > > > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binde= r_alloc.c > > > > index b3acbc4174fb..f20074e23a7c 100644 > > > > --- a/drivers/android/binder_alloc.c > > > > +++ b/drivers/android/binder_alloc.c > > > > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct = binder_alloc *alloc, > > > > if (!mmget_not_zero(alloc->mm)) > > > > return -ESRCH; > > > > > > > > + /* > > > > + * Don't allocate page in mmap_write_lock, this can block > > > > + * mmap_rwsem for a long time; Meanwhile, allocation failure > > > > + * doesn't necessarily need to return -ENOMEM, if lru_page > > > > + * has been installed, we can still return 0(success). > > > > + */ > > > > + page =3D alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > > > But now you are allocating new pages even if binder_get_installed_pag= e() > > > is an error, right? Doesn't that slow things down? > > > > very very unlikely, as the ratio is only 1/2400 while write lock 100% b= locks > > everyone. > > Ok, but: > > > > How was this benchmarked? > > > > i actually put Tangquan's profiling result: > > " > > However, according to Tangquan's 48-hour profiling > > using monkey, the likelihood of this occurring is minimal, with a rati= o > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > negligible." > > That's not a benchmark, or any real numbers of how this overall saves > any time. My point is that mmap_rw is one of the most notorious locks impacting system performance. Our profiling indicates that this binder's rwsem lock is among the top two write locks, blocking page faults and other accesses to the VMA. >From the memory management perspective, I believe the improvement will be quite evident. However, I agree that we should include more data to show how reducing lock contention can enhance overall user experience and minimize UI lag. > > > > > /* > > > > * Protected with mmap_sem in write mode as multiple tasks > > > > * might race to install the same page. > > > > */ > > > > mmap_write_lock(alloc->mm); > > > > - if (binder_get_installed_page(lru_page)) > > > > + if (binder_get_installed_page(lru_page)) { > > > > + ret =3D 1; > > > > > > That is not a valid error value :( > > > > > > > goto out; > > > > + } > > > > > > > > if (!alloc->vma) { > > > > pr_err("%d: %s failed, no vma\n", alloc->pid, __func_= _); > > > > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct bi= nder_alloc *alloc, > > > > goto out; > > > > } > > > > > > > > - page =3D alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); > > > > if (!page) { > > > > pr_err("%d: failed to allocate page\n", alloc->pid); > > > > ret =3D -ENOMEM; > > > > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct bi= nder_alloc *alloc, > > > > if (ret) { > > > > pr_err("%d: %s failed to insert page at offset %lx wi= th %d\n", > > > > alloc->pid, __func__, addr - alloc->buffer, re= t); > > > > - __free_page(page); > > > > ret =3D -ENOMEM; > > > > goto out; > > > > } > > > > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct bi= nder_alloc *alloc, > > > > out: > > > > mmap_write_unlock(alloc->mm); > > > > mmput_async(alloc->mm); > > > > - return ret; > > > > + if (ret && page) > > > > + __free_page(page); > > > > + return ret < 0 ? ret : 0; > > > > > > Please only use ? : for when you have to, otherwise please spell it o= ut > > > with a normal if statement: > > > if (ret < 0) > > > return ret; > > > return 0; > > > > > > But, this is abusing the fact that you set "ret =3D 1" above, which i= s > > > going to trip someone up in the future as that is NOT a normal coding > > > pattern we have in the kernel, sorry. > > > > > > If you insist on this change, please rework it to not have that type = of > > > "positive means one thing, 0 means another, and negative means yet > > > something else" please. > > > > I was trying to consolidate all free_page() calls into one place. Other= wise, > > we would need multiple free_page() calls. I'm perfectly fine with havin= g more > > free_page() calls in both the ret =3D 1 and ret < 0 paths. In that case= , > > the ret =3D 1 > > path can be removed if you prefer. > > Remember, we write code for people first, and compilers second. You > have to maintain this code for the next 10+ years, make it _VERY_ > obvious what is happening and how it works as you will be coming back to > it and not remembering what was made for what reason at all. no objection to this. > > thanks, > > greg k-h Thanks Barry