From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB3A1EB64D7 for ; Wed, 21 Jun 2023 05:56:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3B6308D0003; Wed, 21 Jun 2023 01:56:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3666C8D0001; Wed, 21 Jun 2023 01:56:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 22F168D0003; Wed, 21 Jun 2023 01:56:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 1698A8D0001 for ; Wed, 21 Jun 2023 01:56:39 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id CA0311A06CC for ; Wed, 21 Jun 2023 05:56:38 +0000 (UTC) X-FDA: 80925695676.16.BA59E92 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf30.hostedemail.com (Postfix) with ESMTP id 10A2980005 for ; Wed, 21 Jun 2023 05:56:36 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=ApWBaV3T; spf=pass (imf30.hostedemail.com: domain of rppt@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=rppt@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1687326997; a=rsa-sha256; cv=none; b=B0m4KOQQUhGDfsmYPx81jzSOSNNg65HbOWiMnUcrX+B2l25/TnHQTuZRLf0dX7400hZUvI mwDPIV0pqoJGC4n9EfRZnPhe076YDyD4BfzgxhnFwsXrknBxwT0XaZzm73ETUNuuB5d4Rd +mcGjqi1gi88d/HFdOPxkgTQdfJje0I= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=ApWBaV3T; spf=pass (imf30.hostedemail.com: domain of rppt@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=rppt@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1687326997; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dNod9SmAh5bXGJqsnotODRNDpF91TOvUX2oEAIJT7fs=; b=D0+zpDShyuNPvT9Kd0paGOT586LyxC8OL4ajUJIPruCuJBWloxNQalOS/kq0RaoyDFw8jn P/KRER6pDyVH5hsu7PRVq840kvruUNCVE+zdqJt+Ri7oys0mICR0o8ojqQX8Cj7OK0iIqJ 5PaykMtsdAgzQgEaS1usKo5Fi8KlG9Y= Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C969161418; Wed, 21 Jun 2023 05:56:35 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 24983C433C8; Wed, 21 Jun 2023 05:56:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1687326995; bh=US3Xe7yz1+AVCinByPs1eOKpdoQy+FtobpB/5DLulWo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=ApWBaV3TgZW9WTYCyLlgF0j+nqO+nqmitwslpm8g44yprfBubzuc/R2N7ybTRvLh/ Ns2uFpGWP95Ae6rWEKexoj9fyhiRxZA/wLLqVZFEM73cbQIxwPlddy3soGTxo8yvU+ KmhjFIivoFQ3qB5ah+k5e8Gio3hzQKnNmch9z4ONG0+pLoezU4BG4f+2eIC028XG9j utIhP2VICtz5KV8tRvjIeDWimNFy4DzoSprB2oAuTb+YWOSZbTF0QLt2bsSZrnA02M 1XRU1qkVxPK3DQN1U2NTHyiuI13Tc/yQ9JPpS4V1ZAVmfzUQehSI+M00JtumYmXzHB sFdwPG5FVkIVw== Date: Wed, 21 Jun 2023 08:55:51 +0300 From: Mike Rapoport To: Jeff Xu Cc: "Liam R. Howlett" , Peter Xu , linux-mm@kvack.org, linux-hardening@vger.kernel.org, zhangpeng.00@bytedance.com, akpm@linux-foundation.org, koct9i@gmail.com, david@redhat.com, ak@linux.intel.com, hughd@google.com, emunson@akamai.com, rppt@linux.ibm.com, aarcange@redhat.com, linux-kernel@vger.kernel.org, Lorenzo Stoakes Subject: Re: inconsistence in mprotect_fixup mlock_fixup madvise_update_vma Message-ID: <20230621055551.GE52412@kernel.org> References: <20230614011814.sz2l6z6wbaubabk2@revolver> <20230614125731.GY52412@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 10A2980005 X-Stat-Signature: bzaeb3yn4oycysznespgcg67gq4g8zp5 X-Rspam-User: X-HE-Tag: 1687326996-701334 X-HE-Meta: U2FsdGVkX1/awqkNVnBV6Xn0IvE+b5bvmfk+OPTYFRv9wop4/IE7Va+hqJl2tZjjPCBOdoc2vqLkqEojY7iqF2BgfM4IL6mDOf1wtp4f/Hji+CogzH6mxxXrCkSFwpIDOccYYKYFGH56ewnwp2ULPWW5zjzEM8AQENH040fVNwYN4XAjk/dwFq/B8QTedEuNPYx2ASqaA3RKPuuN8WKxiFyaD1ty8OfSVyH4dLNFEhcaM3DJOj03HFQK2B1N9fBDTYLp/gAlBFCFYj8yRNHQEmCUKjkmCvSs3QGRgOoMAW761YKl9h+/7oJ8aHHltoXi3BqJMCzn+MWDM3bgdBgOP3lN6VFiWfGlblHkw2e3Z0/ZBQPWeSqeAp2fTaJ3ItEyN8FnmOvlWUh9LGgyAa4XOetWOEruIA+qzoQnBlq9GGWinnwwNQcquXSF/+XUxz9WKZoCcDws6imhOkKUEIHvPKUgDlXCboVt3cRp8LdbzQQup7Rag9odx9KCaArQp4+SEISLQBGS/FKlKhb59csqZaYbH1wnZUmy5GxJg/ZNQDFySy74isHYiGgXbZJsukPaKWUC4xP9teALhhHt4FwTrPg9y7ImBqg5+G/ef0WguCIDIVkPvUj/ajNlTl/xzuUZWm7/spOHJdwcmIDp12ng+0mHTkqlQSzZAc3Q7lwmnZKdFv+znTZqGcfKIYTKy2SABhKzxBelds/3Wu71pH2lD7mR7kQU+q/Mb/cWG/6f/CQlT+NvPw9HfGQ4A4Ukh2BhdtfX8m5Yphkj6sC6PIpICpO6zbMG/qyXSfFYXn03Tp+Rt80xwarGIjY2/A7fAH4DZ7OjyPKJDCYvTV3LcPtTohCxPE5AkLIdWeTqCOx0tc5hNc3oBfg8etM+wW5hZEuhEUrU1GYUPUgROsYdv6lNnAD1oUI8VCqCv6GytFaEHwY7UknTjnQ4zHQeNTkmi6yIdMyrYF1upAZNz5jBx4i 9xV021qR ZyZ23uQf9CHAhVBiGq5yp8wa/++DbR74GV62pNKQkDNvNObQBOuMW4iMXKAlZ4K9sX9i0mT1alUHmaYikjgmccTVkF3v73KjoRbZJugnXA5SbDasyZEhpO5kvqiYwpLPiF5wpVDwVom0nP/DQ87EaTok+keR6zxQCceGfl8VCh7SgUHx9kBIe5fLWVydC3sPxaET51RHVUYd9up1Xb23O4MnKMMGmWEGR7WGiOptA8h3oFtubxaPRnUOExfggjpSABveoY1DPv4ZOlt6IWmeQ1D2tdNEUvFDPZzkAOm8o+kKs0uyAgqa3jEH4+d5EaYIjspqlhEGEozdEu2cQM3wpY2Recnr0QiA38UGdXvT1xrGF5lzpXB1wC3DDES3mkUSK8iswiYUhOCuj7AN4cunL6jPnbV0y7qUYkQoyjxBYhGq+ey9J4IhLGIphj9joYXziP0ZRAynQAXvV50Eac+KPsldBt7BAPci+laxO X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Jun 20, 2023 at 03:29:34PM -0700, Jeff Xu wrote: > On Wed, Jun 14, 2023 at 5:58 AM Mike Rapoport wrote: > > > > On Tue, Jun 13, 2023 at 09:18:14PM -0400, Liam R. Howlett wrote: > > > * Jeff Xu [230613 17:29]: > > > > Hello Peter, > > > > > > > > Thanks for responding. > > > > > > > > On Tue, Jun 13, 2023 at 1:16 PM Peter Xu wrote: > > > > > > > > > > Hi, Jeff, > > > > > > > > > > On Tue, Jun 13, 2023 at 08:26:26AM -0700, Jeff Xu wrote: > > > > > > + more ppl to the list. > > > > > > > > > > > > On Mon, Jun 12, 2023 at 6:04 PM Jeff Xu wrote: > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > There seems to be inconsistency in different VMA fixup > > > > > > > implementations, for example: > > > > > > > mlock_fixup will skip VMA that is hugettlb, etc, but those checks do > > > > > > > not exist in mprotect_fixup and madvise_update_vma. Wouldn't this be a > > > > > > > problem? the merge/split skipped by mlock_fixup, might get acted on in > > > > > > > the madvice/mprotect case. > > > > > > > > > > > > > > mlock_fixup currently check for > > > > > > > if (newflags == oldflags || > > > > > > newflags == oldflags, then we don't need to do anything here, it's > > > already at the desired mlock. mprotect does this, madvise does this.. > > > probably.. it's ugly. > > > > > > > > > > (oldflags & VM_SPECIAL) || > > > > > > It's special, merging will fail always. I don't know about splitting, > > > but I guess we don't want to alter the mlock state on special mappings. > > > > > > > > > > is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) || > > > > > > > vma_is_dax(vma) || vma_is_secretmem(vma)) > > > > > > > > > > The special handling you mentioned in mlock_fixup mostly makes sense to me. > > > > > > > > > > E.g., I think we can just ignore mlock a hugetlb page if it won't be > > > > > swapped anyway. > > > > > > > > > > Do you encounter any issue with above? > > > > > > > > > > > > Should there be a common function to handle VMA merge/split ? > > > > > > > > > > IMHO vma_merge() and split_vma() are the "common functions". Copy Lorenzo > > > > > as I think he has plan to look into the interface to make it even easier to > > > > > use. > > > > > > > > > The mprotect_fixup doesn't have the same check as mlock_fixup. When > > > > userspace calls mlock(), two VMAs might not merge or split because of > > > > vma_is_secretmem check, However, when user space calls mprotect() with > > > > the same address range, it will merge/split. If mlock() is doing the > > > > right thing to merge/split the VMAs, then mprotect() is not ? > > > > > > It looks like secretmem is mlock'ed to begin with so they don't want it > > > to be touched. So, I think they will be treated differently and I think > > > it is correct. > > > > Right, they don't :) > > > > secretmem VMAs are always mlocked, they cannot be munlocked and there is no > > point trying to mlock them again. > > > > The mprotect for secretmem is Ok though, so e.g. if we (unlikely) have two > > adjacent secretmem VMAs in a range passed to mprotect, it's fine to merge > > them. > > > > I m thinking/brainstorming below, assuming: > Address range 1: 0x5000 to 0x6000 (regular mmap) > Address range 2: 0x6000 to 0x7000 (allocated to secretmem) > Address range 3: 0x7000 to 0x8000 (regular mmap) > > User space call: mlock(0x5000,0x3000) > range 1 and 2 won't merge. > range 2 and 3 could merge, when mlock_fixup checks current vma > (range 3), it is not secretmem, so it will merge with prev vma. But 2 and 3 have different vm_file, they won't merge. > user space call: mprotect(0x5000,0x3000) > range 1 2 3 could merge, all three can have the same flags. > Note: vma_is_secretmem() isn't checked in mprotect_fixup, same for > vma_is_dax and get_gate_vma, those doesn't have included in > vma->vm_flags > > Once 1 and 2 are merged, maybe user space is able to use > munlock(0x5000,0x3000) > to unlock range 1 to 3, this will include 2, right ? (haven't used the > code to prove it) But 1 and 2 won't merge because their vm_file's are different. > I'm using secretmem as an example here, having 3 different _fixup > implementations seems to be error prone to me. The actual decision whether to merge VMAs is taken in vma_merge rather than by the _fixup functions. So while the checks around vma_merge might be different in these functions, it does not mean it's possible to wrongly merge VMA unless there is a bug in vma_merge. So in the end it boils down to a single core implementation, don't you agree? > Thanks > -Jeff -- Sincerely yours, Mike.