From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 20 Jul 2023 10:32:04 -0700
Subject: Re: [PATCH] mm: Don't drop VMA locks in mm_drop_all_locks()
To: Jann Horn <jannh@google.com>
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230720013249.199981-1-jannh@google.com>
On Thu, Jul 20, 2023 at 10:14 AM Jann Horn wrote:
>
> On Thu, Jul 20, 2023 at 6:52 PM Suren Baghdasaryan wrote:
> > On Wed, Jul 19, 2023 at 6:33 PM Jann Horn wrote:
> > >
> > > Despite its name, mm_drop_all_locks() does not drop _all_ locks; the
> > > mmap lock is held write-locked by the caller, and the caller is
> > > responsible for dropping
> > > the mmap lock at a later point (which will also release the VMA
> > > locks).
> > > Calling vma_end_write_all() here is dangerous because the caller
> > > might have write-locked a VMA with the expectation that it will stay
> > > write-locked until the mmap_lock is released, as usual.
> > >
> > > This _almost_ becomes a problem in the following scenario:
> > >
> > > An anonymous VMA A and an SGX VMA B are mapped adjacent to each
> > > other. Userspace calls munmap() on a range starting at the start
> > > address of A and ending in the middle of B.
> > >
> > > Hypothetical call graph with additional notes in brackets:
> > >
> > > do_vmi_align_munmap
> > >   [begin first for_each_vma_range loop]
> > >   vma_start_write [on VMA A]
> > >   vma_mark_detached [on VMA A]
> > >   __split_vma [on VMA B]
> > >     sgx_vma_open [== new->vm_ops->open]
> > >       sgx_encl_mm_add
> > >         __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN]
> > >           mm_take_all_locks
> > >           mm_drop_all_locks
> > >             vma_end_write_all [drops VMA lock taken on VMA A before]
> > >   vma_start_write [on VMA B]
> > >   vma_mark_detached [on VMA B]
> > >   [end first for_each_vma_range loop]
> > >   vma_iter_clear_gfp [removes VMAs from maple tree]
> > >   mmap_write_downgrade
> > >   unmap_region
> > >   mmap_read_unlock
> > >
> > > In this hypothetical scenario, while do_vmi_align_munmap() thinks it
> > > still holds a VMA write lock on VMA A, that write lock has actually
> > > been invalidated inside __split_vma().
> > >
> > > The call from sgx_encl_mm_add() to __mmu_notifier_register() can't
> > > actually happen here, as far as I understand, because we are
> > > duplicating an existing SGX VMA, and sgx_encl_mm_add() only calls
> > > __mmu_notifier_register() for the first SGX VMA created in a given
> > > process. So this could only happen in fork(), not on munmap(). But
> > > in my view it is just pure luck that this can't happen.
> > >
> > > Also, we wouldn't actually have any bad consequences from this in
> > > do_vmi_align_munmap(), because by the time the bug drops the lock on
> > > VMA A, we've already marked VMA A as detached, which makes it
> > > completely ineligible for any VMA-locked page faults. But again,
> > > that's just pure luck.
> > >
> > > So remove the vma_end_write_all(), so that VMA write locks are only
> > > ever released on mmap_write_unlock() or mmap_write_downgrade().
> >
> > Your logic makes sense to me. mm_drop_all_locks() unlocking all VMAs,
> > even the ones which were locked before mm_take_all_locks(), seems
> > dangerous.
> > One concern I have is that mm_take_all_locks() and mm_drop_all_locks()
> > become asymmetric with this change: mm_take_all_locks() locks all VMAs
> > but mm_drop_all_locks() does not release them. I think there should be
> > an additional comment explaining this asymmetry.
> > Another side effect which would be nice to document in a comment is
> > that when mm_take_all_locks() fails after it has locked the VMAs,
> > those VMAs will stay locked until
> > mmap_write_unlock()/mmap_write_downgrade(). This happens because on
> > failure mm_take_all_locks() jumps to mm_drop_all_locks(), which will
> > no longer unlock the already-locked VMAs.
> > Other than that LGTM. Thanks!
>
> But this is not specific to mm_drop_all_locks() at all, right? It's
> just fundamentally how per-VMA locks are used everywhere. Somewhere
> deep down in some call path, while the mmap lock is held in write
> mode, a VMA is marked as being written to, and then this marking
> persists until the mmap lock is dropped.
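To make that lifecycle concrete: the write marking is a sequence-count
scheme. Here is a stripped-down userspace model of the idea (my sketch,
not the kernel code; it omits the per-VMA lock that the real
vma_start_write()/vma_start_read() also take):

    #include <stdbool.h>
    #include <stdio.h>

    /* Model of mm_struct: mm_lock_seq only advances when the mmap lock
     * is write-unlocked or downgraded. */
    struct mm { int mm_lock_seq; };

    /* Model of vm_area_struct: vm_lock_seq == mm->mm_lock_seq means
     * "write-locked for the current mmap_lock write cycle". */
    struct vma { struct mm *mm; int vm_lock_seq; };

    /* vma_start_write: tag the VMA with the current sequence number. */
    static void vma_start_write(struct vma *vma)
    {
            vma->vm_lock_seq = vma->mm->mm_lock_seq;
    }

    /* vma_end_write_all: bump the sequence, expiring every write tag
     * from the current write cycle at once. */
    static void vma_end_write_all(struct mm *mm)
    {
            mm->mm_lock_seq++;
    }

    /* vma_start_read: a VMA-locked page fault must bail out while the
     * VMA is tagged as write-locked. */
    static bool vma_start_read(struct vma *vma)
    {
            return vma->vm_lock_seq != vma->mm->mm_lock_seq;
    }

    int main(void)
    {
            struct mm mm = { .mm_lock_seq = 0 };
            struct vma a = { .mm = &mm, .vm_lock_seq = -1 };

            vma_start_write(&a);    /* do_vmi_align_munmap() locks VMA A */
            vma_end_write_all(&mm); /* premature, like the buggy
                                       mm_drop_all_locks() */

            /* Prints 1: VMA A is readable again even though its owner
             * believes it is still write-locked. */
            printf("%d\n", vma_start_read(&a));
            return 0;
    }

The bug under discussion is exactly such a premature bump: a
vma_end_write_all() expiring write tags its caller never created.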
Yes, but for all other locks mm_take_all_locks()/mm_drop_all_locks() is
perfectly symmetrical AFAICT. There is a pretty descriptive comment on
mm_take_all_locks() explaining the locking rules, so mentioning this
asymmetry in there seems appropriate to me.

> If we want to clarify this, I guess some comments on
> vma_end_write_all() and vma_start_write() might help, but I think
> that's independent of this patch.

Yes and no. This patch changes vma_end_write_all() to be called
exclusively from mmap_write_unlock()/mmap_write_downgrade(), so it
introduces the notion that once a VMA is write-locked, it is never
released until mmap_lock itself is released or downgraded. So I agree
that a comment for vma_start_write() is appropriate, but I also think it
does belong in this patch.

Thanks,
Suren.
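P.S. To make the comment suggestion concrete, here is a strawman for the
mm_take_all_locks() comment block. The wording is only a sketch of what
I have in mind, to be refined in the patch:

    /*
     * mm_take_all_locks() write-locks every VMA in the mm, but
     * mm_drop_all_locks() intentionally does NOT release those VMA
     * write locks: VMA write locks are only ever released by
     * vma_end_write_all(), which runs from mmap_write_unlock() and
     * mmap_write_downgrade(). The asymmetry is required because the
     * caller, or someone further up the call chain, may have
     * write-locked VMAs that must stay write-locked until the
     * mmap_lock is dropped. A side effect: if mm_take_all_locks()
     * fails after it has started write-locking VMAs, the VMAs it
     * already locked stay locked until
     * mmap_write_unlock()/mmap_write_downgrade().
     */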