From: Ryan Roberts
Date: Mon, 27 Nov 2023 09:35:49 +0000
Subject: Re: [PATCH v2 01/14] mm: Batch-copy PTE ranges during fork()
To: Barry Song <21cnbao@gmail.com>, david@redhat.com
Cc: akpm@linux-foundation.org, andreyknvl@gmail.com,
 anshuman.khandual@arm.com, ardb@kernel.org, catalin.marinas@arm.com,
 dvyukov@google.com, glider@google.com, james.morse@arm.com,
 jhubbard@nvidia.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, mark.rutland@arm.com,
 maz@kernel.org, oliver.upton@linux.dev, ryabinin.a.a@gmail.com,
 suzuki.poulose@arm.com, vincenzo.frascino@arm.com,
 wangkefeng.wang@huawei.com, will@kernel.org, willy@infradead.org,
 yuzenghui@huawei.com, yuzhao@google.com, ziy@nvidia.com
References: <271f1e98-6217-4b40-bae0-0ac9fe5851cb@redhat.com>
 <20231127084217.13110-1-v-songbaohua@oppo.com>
In-Reply-To: <20231127084217.13110-1-v-songbaohua@oppo.com>

On 27/11/2023 08:42, Barry Song wrote:
>>> +		for (i = 0; i < nr; i++, page++) {
>>> +			if (anon) {
>>> +				/*
>>> +				 * If this page may have been pinned by the
>>> +				 * parent process, copy the page immediately for
>>> +				 * the child so that we'll always guarantee the
>>> +				 * pinned page won't be randomly replaced in the
>>> +				 * future.
>>> +				 */
>>> +				if (unlikely(page_try_dup_anon_rmap(
>>> +						page, false, src_vma))) {
>>> +					if (i != 0)
>>> +						break;
>>> +					/* Page may be pinned, we have to copy. */
>>> +					return copy_present_page(
>>> +						dst_vma, src_vma, dst_pte,
>>> +						src_pte, addr, rss, prealloc,
>>> +						page);
>>> +				}
>>> +				rss[MM_ANONPAGES]++;
>>> +				VM_BUG_ON(PageAnonExclusive(page));
>>> +			} else {
>>> +				page_dup_file_rmap(page, false);
>>> +				rss[mm_counter_file(page)]++;
>>> +			}
>>>  		}
>>> -		rss[MM_ANONPAGES]++;
>>> -	} else if (page) {
>>> -		folio_get(folio);
>>> -		page_dup_file_rmap(page, false);
>>> -		rss[mm_counter_file(page)]++;
>>> +
>>> +		nr = i;
>>> +		folio_ref_add(folio, nr);
>>
>> You're changing the order of mapcount vs. refcount increment. Don't.
>> Make sure your refcount >= mapcount.
>>
>> You can do that easily by doing the folio_ref_add(folio, nr) first and
>> then decrementing in case of error accordingly. Errors due to pinned
>> pages are the corner case.
>>
>> I'll note that it will make a lot of sense to have batch variants of
>> page_try_dup_anon_rmap() and page_dup_file_rmap().
>>
>
> I still don't understand why it is not an entire map+1, but an increment
> in each base page.

Because we are PTE-mapping the folio, we have to account each individual page.
If we accounted the entire folio, where would we unaccount it? Each page can be
unmapped individually (e.g. munmap() part of the folio), so we need to account
each page. When PMD mapping, the whole thing is either mapped or unmapped, and
it's atomic, so we can account the entire thing.

>
> As long as it is a CONTPTE large folio, there is not much difference from a
> PMD-mapped large folio. It has every chance of being DoubleMap and needing
> a split.
>
> When A and B share a CONTPTE large folio, we do madvise(DONTNEED) or any
> similar thing on a part of the large folio in process A,
>
> this large folio will have partially mapped subpages in A (all CONTPTE bits
> in all subpages need to be removed though we only unmap a part of the
> large folio, as HW requires consistent CONTPTEs); and it has an entire map
> in process B (all PTEs are still CONTPTEs in process B).
>
> Isn't it more sensible for this large folio to have entire_map = 0 (for
> process B), and for the subpages which are still mapped in process A to
> have map_count = 0? (start from -1).
>
>> Especially, the batch variant of page_try_dup_anon_rmap() would only
>> check once if the folio may be pinned, and in that case, you can simply
>> drop all references again. So you either have all or no ptes to process,
>> which makes that code easier.

I'm afraid this doesn't make sense to me. Perhaps I've misunderstood. But
fundamentally you can only use entire_mapcount if it's only possible to map and
unmap the whole folio atomically.

>>
>> But that can be added on top, and I'll happily do that.
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>
> Thanks
> Barry
>
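
To make the two accounting points in this thread concrete, here is a minimal
userspace sketch in C. It is a toy model and not kernel code: struct toy_folio,
the dup_would_fail flag, toy_dup_range() and toy_unmap_page() are hypothetical
names made up for illustration, and mapcounts start at 0 here rather than at
the kernel's -1. It assumes the ordering David asks for (take all references
first, then give back the unused ones on error) and the per-page mapcount
described above for a PTE-mapped folio.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES 16

/* Toy stand-in for a large folio; NOT the kernel's struct folio. */
struct toy_folio {
	int refcount;				/* references on the whole folio */
	int mapcount[TOY_NR_PAGES];		/* per-page PTE mappings, 0-based */
	bool dup_would_fail[TOY_NR_PAGES];	/* hypothetical "may be pinned" flag */
};

/* Sum of all per-page mapcounts. */
static int toy_total_mapcount(const struct toy_folio *f)
{
	int total = 0;

	for (int i = 0; i < TOY_NR_PAGES; i++)
		total += f->mapcount[i];
	return total;
}

/*
 * Duplicate up to nr PTE mappings starting at page index first.
 * Returns how many pages were actually mapped before a failure.
 */
static int toy_dup_range(struct toy_folio *f, int first, int nr)
{
	int i;

	assert(first >= 0 && nr >= 0 && first + nr <= TOY_NR_PAGES);

	/* Take every reference up front so refcount never trails mapcount. */
	f->refcount += nr;

	for (i = 0; i < nr; i++) {
		if (f->dup_would_fail[first + i])
			break;		/* corner case: stop the batch here */
		f->mapcount[first + i]++;
		assert(f->refcount >= toy_total_mapcount(f));
	}

	/* Give back the references for pages we did not map. */
	f->refcount -= nr - i;
	return i;
}

/* Unmap a single page: only that page's counter changes. */
static void toy_unmap_page(struct toy_folio *f, int idx)
{
	assert(idx >= 0 && idx < TOY_NR_PAGES && f->mapcount[idx] > 0);
	f->mapcount[idx]--;
	f->refcount--;
}
```

In this model, asking for pages 0..7 with page 5 flagged maps five pages and
leaves the refcount equal to the total mapcount. Unmapping page 2 afterwards
touches only that page's counter, which is the partial-unmap case a single
entire-folio count has nowhere to record.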