From: Barry Song <21cnbao@gmail.com>
Date: Tue, 3 Sep 2024 17:07:23 +0800
Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to reduce the lock duration
To: Greg Kroah-Hartman
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Tangquan Zheng
References: <20240902225009.34576-1-21cnbao@gmail.com> <2024090325-sublet-unsworn-b6a3@gregkh>
In-Reply-To: <2024090325-sublet-unsworn-b6a3@gregkh>
On Tue, Sep 3, 2024 at 4:57 PM Greg Kroah-Hartman wrote:
>
> On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote:
> > From: Barry Song
> >
> > The mmap_write_lock() can block all access to the VMAs, for
example page
> > faults. Performing memory allocation while holding this lock may trigger
> > direct reclamation, leading to others being queued in the rwsem for an
> > extended period.
> > We've observed that the allocation can sometimes take more than 300ms,
> > significantly blocking other threads. The user interface sometimes
> > becomes less responsive as a result. To prevent this, let's move the
> > allocation outside of the write lock.
> > A potential side effect could be an extra alloc_page() for the second
> > thread executing binder_install_single_page() while the first thread
> > has done it earlier. However, according to Tangquan's 48-hour profiling
> > using monkey, the likelihood of this occurring is minimal, with a ratio
> > of only 1 in 2400. Compared to the significantly costly rwsem, this is
> > negligible.
> > On the other hand, holding a write lock without making any VMA
> > modifications appears questionable and likely incorrect. While this
> > patch focuses on reducing the lock duration, future updates may aim
> > to eliminate the write lock entirely.
> >
> > Cc: Greg Kroah-Hartman
> > Cc: "Arve Hjønnevåg"
> > Cc: Todd Kjos
> > Cc: Martijn Coenen
> > Cc: Joel Fernandes
> > Cc: Christian Brauner
> > Cc: Carlos Llamas
> > Cc: Suren Baghdasaryan
> > Tested-by: Tangquan Zheng
> > Signed-off-by: Barry Song
> > ---
> >  drivers/android/binder_alloc.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index b3acbc4174fb..f20074e23a7c 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  	if (!mmget_not_zero(alloc->mm))
> >  		return -ESRCH;
> >
> > +	/*
> > +	 * Don't allocate page in mmap_write_lock, this can block
> > +	 * mmap_rwsem for a long time; Meanwhile, allocation failure
> > +	 * doesn't necessarily need to return -ENOMEM, if lru_page
> > +	 * has been installed, we can still return 0 (success).
> > +	 */
> > +	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
>
> But now you are allocating new pages even if binder_get_installed_page()
> is an error, right?  Doesn't that slow things down?

Very unlikely, as the ratio is only 1/2400, while the write lock blocks
everyone 100% of the time.

> How was this benchmarked?

I actually put Tangquan's profiling result in the commit message:
"However, according to Tangquan's 48-hour profiling using monkey, the
likelihood of this occurring is minimal, with a ratio of only 1 in 2400.
Compared to the significantly costly rwsem, this is negligible."

> > +
> >  	/*
> >  	 * Protected with mmap_sem in write mode as multiple tasks
> >  	 * might race to install the same page.
> >  	 */
> >  	mmap_write_lock(alloc->mm);
> > -	if (binder_get_installed_page(lru_page))
> > +	if (binder_get_installed_page(lru_page)) {
> > +		ret = 1;
>
> That is not a valid error value :(
>
> >  		goto out;
> > +	}
> >
> >  	if (!alloc->vma) {
> >  		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  		goto out;
> >  	}
> >
> > -	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> >  	if (!page) {
> >  		pr_err("%d: failed to allocate page\n", alloc->pid);
> >  		ret = -ENOMEM;
> > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  	if (ret) {
> >  		pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> >  		       alloc->pid, __func__, addr - alloc->buffer, ret);
> > -		__free_page(page);
> >  		ret = -ENOMEM;
> >  		goto out;
> >  	}
> > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> >  out:
> >  	mmap_write_unlock(alloc->mm);
> >  	mmput_async(alloc->mm);
> > -	return ret;
> > +	if (ret && page)
> > +		__free_page(page);
> > +	return ret < 0 ? ret : 0;
>
> Please only use ? : for when you have to, otherwise please spell it out
> with a normal if statement:
> 	if (ret < 0)
> 		return ret;
> 	return 0;
>
> But, this is abusing the fact that you set "ret = 1" above, which is
> going to trip someone up in the future as that is NOT a normal coding
> pattern we have in the kernel, sorry.
>
> If you insist on this change, please rework it to not have that type of
> "positive means one thing, 0 means another, and negative means yet
> something else" please.

I was trying to consolidate all free_page() calls into one place. Otherwise,
we would need multiple free_page() calls. I'm perfectly fine with having more
free_page() calls in both the ret = 1 and ret < 0 paths. In that case, the
ret = 1 path can be removed if you prefer.

> thanks,
>
> greg k-h

Thanks
Barry