Message-ID: <074fc253-beb4-f7be-14a1-ee5f4745c15b@suse.cz>
Date: Tue, 27 Jun 2023 08:28:25 +0200
Subject: Re: [PATCH] mm/mprotect: allow unfaulted VMAs to be unaccounted on mprotect()
To: Lorenzo Stoakes, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton
Cc: Mike Rapoport, David Hildenbrand, "Liam R . Howlett"
References: <20230626204612.106165-1-lstoakes@gmail.com>
From: Vlastimil Babka
In-Reply-To: <20230626204612.106165-1-lstoakes@gmail.com>

On 6/26/23 22:46, Lorenzo Stoakes wrote:
> When mprotect() is used to make unwritable VMAs writable, they have the
> VM_ACCOUNT flag applied and memory accounted accordingly.
>
> If the VMA has had no pages faulted in and is then made unwritable once
> again, it will remain accounted for, despite not being capable of extending
> memory usage.
>
> Consider:-
>
> ptr = mmap(NULL, page_size * 3, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
> mprotect(ptr + page_size, page_size, PROT_READ | PROT_WRITE);
> mprotect(ptr + page_size, page_size, PROT_READ);

In Mike's original example there were actual pages populated; in that case
we still won't merge the VMAs, right? Guess that can't be helped.

> The first mprotect() splits the range into 3 VMAs and the second fails to
> merge the three as the middle VMA has VM_ACCOUNT set and the others do not,
> rendering them unmergeable.
>
> This is unnecessary, since no pages have actually been allocated and the
> middle VMA is not capable of utilising more memory, thereby introducing
> unnecessary VMA fragmentation (and accounting for more memory than is
> necessary).
>
> Since we cannot efficiently determine which pages map to an anonymous VMA,
> we have to be very conservative - determining whether any pages at all have
> been faulted in, by checking whether vma->anon_vma is NULL.
>
> We can see that the lack of anon_vma implies that no anonymous pages are
> present as evidenced by vma_needs_copy() utilising this on fork to
> determine whether page tables need to be copied.
>
> The only place where anon_vma is set NULL explicitly is on fork with
> VM_WIPEONFORK set, however since this flag is intended to cause the child
> process to not CoW on a given memory range, it is right to interpret this
> as indicating the VMA has no faulted-in anonymous memory mapped.
>
> If the VMA was forked without VM_WIPEONFORK set, then anon_vma_fork() will
> have ensured that a new anon_vma is assigned (and correctly related to its
> parent anon_vma) should any pages be CoW-mapped.
>
> The overall operation is safe against races as we hold a write lock against
> mm->mmap_lock.
>
> If we could efficiently look up the VMA's faulted-in pages then we would
> unaccount all those pages not yet faulted in. However as the original
> comment alludes this simply isn't currently possible, so we remain
> conservative and account all pages or none at all.
>
> Signed-off-by: Lorenzo Stoakes

So in practice programs will likely do the PROT_WRITE in order to actually
populate the area, so this won't trigger as I commented above. But it can
still help in some cases and is cheap to do, so:

Acked-by: Vlastimil Babka

> ---
>  mm/mprotect.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 6f658d483704..9461c936082b 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -607,8 +607,11 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
>  	/*
>  	 * If we make a private mapping writable we increase our commit;
>  	 * but (without finer accounting) cannot reduce our commit if we
> -	 * make it unwritable again. hugetlb mapping were accounted for
> -	 * even if read-only so there is no need to account for them here
> +	 * make it unwritable again except in the anonymous case where no
> +	 * anon_vma has yet been assigned.
> +	 *
> +	 * hugetlb mapping were accounted for even if read-only so there is
> +	 * no need to account for them here.
>  	 */
>  	if (newflags & VM_WRITE) {
>  		/* Check space limits when area turns into data. */
> @@ -622,6 +625,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
>  				return -ENOMEM;
>  			newflags |= VM_ACCOUNT;
>  		}
> +	} else if ((oldflags & VM_ACCOUNT) && vma_is_anonymous(vma) &&
> +		   !vma->anon_vma) {
> +		newflags &= ~VM_ACCOUNT;
>  	}
>
>  	/*
> @@ -652,6 +658,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
>  	}
>
>  success:
> +	if ((oldflags & VM_ACCOUNT) && !(newflags & VM_ACCOUNT))
> +		vm_unacct_memory(nrpages);
> +
>  	/*
>  	 * vm_flags and vm_page_prot are protected by the mmap_lock
>  	 * held in write mode.