From: Jann Horn
Date: Thu, 20 Jul 2023 19:13:29 +0200
Subject: Re: [PATCH] mm: Don't drop VMA locks in mm_drop_all_locks()
To: Suren Baghdasaryan
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230720013249.199981-1-jannh@google.com>

On Thu, Jul 20, 2023 at 6:52 PM Suren Baghdasaryan wrote:
> On Wed, Jul 19, 2023 at 6:33 PM Jann Horn wrote:
> >
> > Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap
> > lock is held write-locked by the caller, and the caller is responsible for
> > dropping the mmap lock at a later point (which will also release the VMA
> > locks).
> > Calling vma_end_write_all() here is dangerous because the caller might have
> > write-locked a VMA with the expectation that it will stay write-locked
> > until the mmap_lock is released, as usual.
> >
> > This _almost_ becomes a problem in the following scenario:
> >
> > An anonymous VMA A and an SGX VMA B are mapped adjacent to each other.
> > Userspace calls munmap() on a range starting at the start address of A and
> > ending in the middle of B.
> >
> > Hypothetical call graph with additional notes in brackets:
> >
> > do_vmi_align_munmap
> >   [begin first for_each_vma_range loop]
> >   vma_start_write [on VMA A]
> >   vma_mark_detached [on VMA A]
> >   __split_vma [on VMA B]
> >     sgx_vma_open [== new->vm_ops->open]
> >       sgx_encl_mm_add
> >         __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN]
> >           mm_take_all_locks
> >           mm_drop_all_locks
> >             vma_end_write_all [drops VMA lock taken on VMA A before]
> >   vma_start_write [on VMA B]
> >   vma_mark_detached [on VMA B]
> >   [end first for_each_vma_range loop]
> >   vma_iter_clear_gfp [removes VMAs from maple tree]
> >   mmap_write_downgrade
> >   unmap_region
> >   mmap_read_unlock
> >
> > In this hypothetical scenario, while do_vmi_align_munmap() thinks it still
> > holds a VMA write lock on VMA A, the VMA write lock has actually been
> > invalidated inside __split_vma().
> >
> > The call from sgx_encl_mm_add() to __mmu_notifier_register() can't
> > actually happen here, as far as I understand, because we are duplicating an
> > existing SGX VMA, but sgx_encl_mm_add() only calls
> > __mmu_notifier_register() for the first SGX VMA created in a given process.
> > So this could only happen in fork(), not on munmap().
> > But in my view it is just pure luck that this can't happen.
> >
> > Also, we wouldn't actually have any bad consequences from this in
> > do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A,
> > we've already marked VMA A as detached, which makes it completely
> > ineligible for any VMA-locked page faults.
> > But again, that's just pure luck.
> >
> > So remove the vma_end_write_all(), so that VMA write locks are only ever
> > released on mmap_write_unlock() or mmap_write_downgrade().
>
> Your logic makes sense to me. mm_drop_all_locks() unlocking all VMAs,
> even the ones which were locked before mm_take_all_locks(), seems
> dangerous.
> One concern I have is that mm_take_all_locks() and mm_drop_all_locks()
> become asymmetric with this change: mm_take_all_locks() locks all VMAs,
> but mm_drop_all_locks() does not release them. I think there should be
> an additional comment explaining this asymmetry.
> Another side effect which would be nice to document in a comment is
> that when mm_take_all_locks() fails after it has already locked VMAs,
> those VMAs will stay locked until mmap_write_unlock/mmap_write_downgrade.
> This happens because on failure mm_take_all_locks() jumps to
> mm_drop_all_locks(), which will not unlock the already-locked VMAs.
> Other than that LGTM.

Thanks! But this is not specific to mm_drop_all_locks() at all, right?
It's just fundamentally how per-VMA locks are used everywhere.
Somewhere deep down in some call path, while the mmap lock is held in
write mode, a VMA is marked as being written to, and then this marking
persists until the mmap lock is dropped.

If we want to clarify this, I guess some comments on vma_end_write_all()
and vma_start_write() might help, but I think that's independent of
this patch.
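
To illustrate that model, here is a minimal userspace sketch (not the
kernel code: the struct layouts and the vma_is_write_locked() helper are
stand-ins for this example, only the function names and the
sequence-count idea mirror the vma_start_write()/vma_end_write_all()
scheme discussed above):

  /* Simplified model: a VMA counts as write-locked while its
   * vm_lock_seq matches the mm-wide mm_lock_seq; bumping the mm-wide
   * sequence expires every VMA write lock at once. */
  #include <assert.h>
  #include <stdbool.h>
  #include <stdio.h>

  struct mm_model {
          bool mmap_write_locked;    /* stand-in for the mmap_lock rwsem */
          unsigned long mm_lock_seq; /* bumped to expire all VMA write locks */
  };

  struct vma_model {
          struct mm_model *mm;
          unsigned long vm_lock_seq; /* write-locked iff == mm->mm_lock_seq */
  };

  /* Mark a VMA as written to; only valid under the mmap write lock. */
  static void vma_start_write(struct vma_model *vma)
  {
          assert(vma->mm->mmap_write_locked);
          vma->vm_lock_seq = vma->mm->mm_lock_seq;
  }

  static bool vma_is_write_locked(const struct vma_model *vma)
  {
          return vma->vm_lock_seq == vma->mm->mm_lock_seq;
  }

  /* Expire all VMA write locks at once by bumping the sequence... */
  static void vma_end_write_all(struct mm_model *mm)
  {
          assert(mm->mmap_write_locked);
          mm->mm_lock_seq++;
  }

  /* ...which is why this belongs only next to releasing the mmap lock. */
  static void mmap_write_unlock(struct mm_model *mm)
  {
          vma_end_write_all(mm);
          mm->mmap_write_locked = false;
  }

  int main(void)
  {
          struct mm_model mm = { .mmap_write_locked = true, .mm_lock_seq = 1 };
          struct vma_model vma_a = { .mm = &mm, .vm_lock_seq = 0 };

          vma_start_write(&vma_a);       /* deep inside some munmap-like path */
          assert(vma_is_write_locked(&vma_a));

          /* If something called vma_end_write_all() here, vma_a would
           * silently become unlocked while its caller still believes it
           * holds the VMA write lock -- the hazard described above. */

          mmap_write_unlock(&mm);        /* the only place locks should expire */
          assert(!vma_is_write_locked(&vma_a));
          printf("VMA write locks expire only at mmap_write_unlock()\n");
          return 0;
  }

In this model, as in the real sequence-count scheme, a VMA write lock is
just "my vm_lock_seq matches mm_lock_seq", so the only consistent place
to invalidate all of them at once is where the mmap write lock itself is
released or downgraded.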