From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4095CD3420 for ; Tue, 3 Sep 2024 11:45:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 62A218D0161; Tue, 3 Sep 2024 07:45:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5DA668D015F; Tue, 3 Sep 2024 07:45:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 47AA18D0161; Tue, 3 Sep 2024 07:45:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 26A558D015F for ; Tue, 3 Sep 2024 07:45:31 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id C585B1C2297 for ; Tue, 3 Sep 2024 11:45:30 +0000 (UTC) X-FDA: 82523246820.01.8C04124 Received: from mail-vk1-f181.google.com (mail-vk1-f181.google.com [209.85.221.181]) by imf24.hostedemail.com (Postfix) with ESMTP id 05B92180026 for ; Tue, 3 Sep 2024 11:45:27 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=HExPhri4; spf=pass (imf24.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.181 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1725363833; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Ac8wg5vVxztSH0hsUHH1RGi5ZxFIg+rR7fcxRj41EWc=; b=rJKNce90CcZVe6NjkqPNParHcDl1DFfTD0kKVB5W2wx753ssCqfe/vurQgn0QAkMOTtWI6 AA/m6+HVL8Gsa8Kfl0WYkbAi3I4aE219SmQ3lKYVo5rKq7ESotjciiMkFYltju5Hx/3GBJ /8XXdF/y2JjBUlqhzNW/zZUsu1+ufYE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1725363833; a=rsa-sha256; cv=none; b=GKsunPQokp8LdFoEuhFLBVYi6bwxH7ZbsdphFR4/9/To41XKfoun5u8sIwemzqMLtAeIoT MZX9vrfsr6+EDzdLViBlLyjB2dTRbZzBw2UyL2Osw806tvJ3QFhCXn46dbBRELOi44bQVL QW7aul3wayB/zk77YO2VdehxvfCeevs= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=HExPhri4; spf=pass (imf24.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.181 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-vk1-f181.google.com with SMTP id 71dfb90a1353d-4fe9f618665so1745071e0c.1 for ; Tue, 03 Sep 2024 04:45:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725363927; x=1725968727; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Ac8wg5vVxztSH0hsUHH1RGi5ZxFIg+rR7fcxRj41EWc=; b=HExPhri4QDVG21F22cHuqb5HWnbzVa9xqaky9KQXvQihWo056iq6zKyigIdjJdJFPC 6Nz1p3cFBkZvzgJ/wqMrpFbfNWIbgqBvlN+gFtlfS/B1qf+6DTwxdj3yqGQV84XKkwbg PKsfJP0Ptq7Vtj0YfquSXcxTI3itouFnZBR4h3x0Kx8XXvTFUqfCxXJgsud+0V3gTZS/ l/H542M8ULBuU/hzh5VfZg0Qdt1vKCJBfcVYoUmEAlmwCicccQej4XHAJeqsmOy1prcw SfEZFMQzdkWKvoEGcu2E8HkDK5LVrhR70bsp0EYMZOlzrW1WFKPoSVkOvwMpnFFPyx0I PJZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725363927; x=1725968727; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Ac8wg5vVxztSH0hsUHH1RGi5ZxFIg+rR7fcxRj41EWc=; b=HGqIBnltARpB3RUWTUNvU2HNbJpDcpYb+qiqtjwv+JS4thMpr00cId1jyXvGNnLlsX HJTUhCJNtpWdVj5vruulvUQHJTryFxxYBnTXv8I9EKYbszzfwn+/+oTIwwCiuH9MlzxQ quU2f3h7HOg4sTOOGSbcoGEfzswQrLaVsufWRnMn1G2SAUoLirjwZPgQz8RQoscBVjyF RSWYyd5CIpN379Zkgy8nnSpxuP6T5olXF9OOb3N0Sw1dBdlNK5YZON4/Vt4mI3QsokCH ++M4wJuYErJefSwwH5G1L95tQbQpzISmytOuovffhjXD3J9Eel8d2efu7kfCLNwbWAHa lIXg== X-Forwarded-Encrypted: i=1; AJvYcCU7OxH56EUSoBvq0tsAqnU/qe/baZ7kQPjr61+zekq3sHgB8Vf21oktsz0QizCHRRLLof9wpBxlSA==@kvack.org X-Gm-Message-State: AOJu0YyRdIRzCKKIqlFzorXMVyqegfqvCoRCnBcywmJv34qwft7oyZal l0FoVJ/CQkN1BNHAPpRBJSqMFzI6U4mon1oLy4yIdFVtAF8Jx+C8nqHEb6L4b5QsY/f6+LXaaaw CJa4GFGxOAQ+tY9CggwLef0ni/504fkpSwQA= X-Google-Smtp-Source: AGHT+IHYMvDwkTdUR5bfTMCeqJTcS8otsYNrbeRZ5Yur7qnM4Rq+zwENd+b/UbSwEAoxlAzkelenCAJbOjfJeUxYx6o= X-Received: by 2002:a05:6122:3b11:b0:4f6:a618:61ce with SMTP id 71dfb90a1353d-5009db5b0e7mr10178833e0c.7.1725363926973; Tue, 03 Sep 2024 04:45:26 -0700 (PDT) MIME-Version: 1.0 References: <20240902225009.34576-1-21cnbao@gmail.com> <20240903110109.1696-1-hdanton@sina.com> In-Reply-To: <20240903110109.1696-1-hdanton@sina.com> From: Barry Song <21cnbao@gmail.com> Date: Tue, 3 Sep 2024 19:45:12 +0800 Message-ID: Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to reduce the lock duration To: Hillf Danton Cc: Greg Kroah-Hartman , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song , Carlos Llamas , Suren Baghdasaryan , Michal Hocko , Tangquan Zheng Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 05B92180026 X-Stat-Signature: i5o87o7bx64d4du8mt8ihk1ieenage13 X-HE-Tag: 1725363927-484249 X-HE-Meta: U2FsdGVkX18qmL53bdJK9FCq59WbqETJ9SQghELacRkFNElmkt1bkI5Zi4izfVHujIdcMuFD1T2vZFLmvGD0X4TvpEyLx57dHl3enuF+Ob8U/lxHZgLWaE7NzVBEyAGUmnGRsu+dpYIcp5lja6Jh7vd21PprHs9UxWWCJsJdZBmRs2uVlYKZbHM9eLGB4Kb3NM95MbmHVFQzx4V3MrUXsJfj5pzqZFzFUV9V0C+4gLlxMBtfXmVeKQRHgVytjZxN8YjvZGbW/YiC3JIQ9fraO2MPIF8l4EbOxo62kUtnr23CmzyzV5H1gUdeBgK2nKVn/JunWDia+3i70yH+GNNRw63q5LcdyX0v4ctUFRBUnIgNJfLHxDlixtysUbr7Q0VMssT5VAZuRoIesJFh48PwVh0GJsXsp5LTYfZOY1hN9Njz7R1eWG33ie7Ag336sh4PbxuGmdw7RnBpgpWcR7iAVDB6bxu17ZHGCDUkyMghJa9oP/rA9hD3+RzOwI9Hmv35G5pzQXUIaxrpIpQydpC1H4pQ4Nf3hXyPwGulsK7C+fLhI33eZtJTSIZNZDZh5krDElJSE/J1pFf7yEu8dI4lAV6JfC0jhZzUHEHCbwBjsQmspHwYQIHgOO7YjpN3SzkW72/3n0/B3U/SwUSlKvjV//jF54v/Ft1HdJPhjZXVKcMuS6INWfGpFv6+CNWGmbIZSsHUhhL7Q7WP6/TjdXfpbvFaTR2g/rXWjeQ0XP1zOWd4N5ioarN7ABAEoKCqXizaXuAhJAxgwirGBDbunu4lpvv+3u+jhpdd2LmOGCRgUdAvYx14p5z8+fJwD3ViQLS/XRiDoMY7kn26G0V4zQbvMnezE/Y4Gd7+sJ7SV9g/Fuv/D1oa7+z3NS5mZKDrjuKVYnrqqs9yl+mOZevg8gNCni7mrofEgk74KfsWxSDZLPpjix8OOCr1Ycg+GgyD9JHneZYSduEfAUyaEKQ4s1w +CTx2wsu W7uu/1L9aCCoSqopxx+2WjaoVJ1Js+9/6es+JUek20FTQqGiHhrnkyGtDAAyD6TjgmY3OeQA/+AAipRnEXvfLGIAG7h4zC6g55aQVLfC7f3TmBUuSkCkMtDWAnnu22srVYtHyBKFK7y75AZQsjE7mw5mgHxueN/oXV70a0rjzurZAE65ZrAp1uyZ1Wpc0P9Rm2FbwgtHBMJs8o9f66b07SciictQHuBmmM8HtAAXkNuR4gzKU79No1D0iwlnraFgLgwqZurLk2Fijzf4t6qVZma40aNTfc+YlzPqjtv+PJIzwmpkkXcA7p/dZcOCieFGq/owvwww/+HmgSqHXEEKKxCQU1sufxm06SHpZjF+DaYQ3PGYU5Me32wQrYmjAy2kn4DxYdZi0/UFgut/nw2elcmJMMXF4cdmVoxiF X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Sep 3, 2024 at 7:01=E2=80=AFPM Hillf Danton wrot= e: > > On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote: > > From: Barry Song > > > > The mmap_write_lock() can block all access to the VMAs, for example pag= e > > faults. Performing memory allocation while holding this lock may trigge= r > > direct reclamation, leading to others being queued in the rwsem for an > > extended period. > > We've observed that the allocation can sometimes take more than 300ms, > > significantly blocking other threads. The user interface sometimes > > becomes less responsive as a result. To prevent this, let's move the > > allocation outside of the write lock. > > I suspect concurrent allocators make things better wrt response, cutting > alloc latency down to 10ms for instance in your scenario. Feel free to > show figures given Tangquan's 48-hour profiling. Likely. Concurrent allocators are quite common in PFs which occur in the same PTE. whoever gets PTL sets PTE, others free the allocated pages. > > > A potential side effect could be an extra alloc_page() for the second > > thread executing binder_install_single_page() while the first thread > > has done it earlier. However, according to Tangquan's 48-hour profiling > > using monkey, the likelihood of this occurring is minimal, with a ratio > > of only 1 in 2400. Compared to the significantly costly rwsem, this is > > negligible. > > On the other hand, holding a write lock without making any VMA > > modifications appears questionable and likely incorrect. While this > > patch focuses on reducing the lock duration, future updates may aim > > to eliminate the write lock entirely. > > If spin, better not before taking a look at vm_insert_page(). I have patch 2/3 transitioning to mmap_read_lock, and per_vma_lock is currently in the testing queue. At the moment, alloc->spin is in place, but I'm not entirely convinced it's the best replacement for the write lock. Let's wait for Tangquan's test results. Patch 2 is detailed below, but it has only passed the build-test phase so far, so its result is uncertain. I'm sharing it early in case you find it interesting. And I am not convinced Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with munmap()") is a correct fix to really avoid all UAF of alloc->vma= . [PATCH] binder_alloc: Don't use mmap_write_lock for installing page Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with munmap()") uses the mmap_rwsem write lock to protect against a race condition with munmap, where the vma is detached by the write lock, but pages are zapped by the read lock. This approach is extremely expensive for the system, though perhaps less so for binder itself, as the write lock can block all other operations. As an alternative, we could hold only the read lock and re-check that the vma hasn't been detached. To protect simultaneous page installation, we could use alloc->lock instead. Signed-off-by: Barry Song --- drivers/android/binder_alloc.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.= c index f20074e23a7c..a2281dfacbbc 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -228,24 +228,17 @@ static int binder_install_single_page(struct binder_alloc *alloc, return -ESRCH; /* - * Don't allocate page in mmap_write_lock, this can block - * mmap_rwsem for a long time; Meanwhile, allocation failure - * doesn't necessarily need to return -ENOMEM, if lru_page - * has been installed, we can still return 0(success). + * Allocation failure doesn't necessarily need to return -ENOMEM, + * if lru_page has been installed, we can still return 0(success). + * So, defer the !page check until after binder_get_installed_page(= ) + * is completed. */ page =3D alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); - /* - * Protected with mmap_sem in write mode as multiple tasks - * might race to install the same page. - */ - mmap_write_lock(alloc->mm); - if (binder_get_installed_page(lru_page)) { - ret =3D 1; - goto out; - } + mmap_read_lock(alloc->mm); - if (!alloc->vma) { + /* vma might have been dropped or deattached */ + if (!alloc->vma || !find_vma(alloc->mm, addr)) { pr_err("%d: %s failed, no vma\n", alloc->pid, __func__); ret =3D -ESRCH; goto out; @@ -257,18 +250,27 @@ static int binder_install_single_page(struct binder_alloc *alloc, goto out; } + spin_lock(&alloc->lock); + if (binder_get_installed_page(lru_page)) { + spin_unlock(&alloc->lock); + ret =3D 1; + goto out; + } + ret =3D vm_insert_page(alloc->vma, addr, page); if (ret) { pr_err("%d: %s failed to insert page at offset %lx with %d\= n", alloc->pid, __func__, addr - alloc->buffer, ret); + spin_unlock(&alloc->lock); ret =3D -ENOMEM; goto out; } /* Mark page installation complete and safe to use */ binder_set_installed_page(lru_page, page); + spin_unlock(&alloc->lock); out: - mmap_write_unlock(alloc->mm); + mmap_read_unlock(alloc->mm); mmput_async(alloc->mm); if (ret && page) __free_page(page); -- 2.39.3 (Apple Git-146)