From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8CD5CC4167B for ; Wed, 1 Nov 2023 13:56:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 227978D0064; Wed, 1 Nov 2023 09:56:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D8608D0001; Wed, 1 Nov 2023 09:56:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 09FB38D0064; Wed, 1 Nov 2023 09:56:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id ED0538D0001 for ; Wed, 1 Nov 2023 09:56:39 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id C9B7540106 for ; Wed, 1 Nov 2023 13:56:39 +0000 (UTC) X-FDA: 81409535718.24.F450959 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf29.hostedemail.com (Postfix) with ESMTP id 8C418120011 for ; Wed, 1 Nov 2023 13:56:37 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf29.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1698846998; a=rsa-sha256; cv=none; b=6Hb79ZyYCavzKiPTgjkv/pLSP2swwHWZCfTtMqomboqJTz6FQ887whLIhQbZ8LQs2rIjev 5r92UGjsnXJ0H9TXY5xo4ZWjLwo6SBgDoTm8ppzNrrSju1lC8yIBb83I5LRvCgB7jAS+Ce WT58vyNAfeX0AKtcsXznaq0jOfAZGfc= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf29.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1698846998; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5owW9J9iu4qcA1iPIzCwCJhaOJrgXERv7uFxRS8u1r4=; b=TyEFnFUJfC7flfkXmUlioeb3spEKDH3lVGwH631TokVM6hUoiKa0XtFoG23xcHpJDdKOY+ asBpzye0OGE3Slt/a9lYxwLWECgLnqGUjZFkO7CyV+NnYzhrnCZhw//1ShuOqo9EH1l2PN AA9xgBbJ3NH3ikx5LJ0dy8YRTI9PhFE= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1C1622F4; Wed, 1 Nov 2023 06:57:18 -0700 (PDT) Received: from [10.1.34.131] (XHFQ2J9959.cambridge.arm.com [10.1.34.131]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EF9EA3F64C; Wed, 1 Nov 2023 06:56:33 -0700 (PDT) Message-ID: Date: Wed, 1 Nov 2023 13:56:32 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6 5/9] mm: thp: Extend THP to allocate anonymous large folios Content-Language: en-GB To: John Hubbard , Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. Shutemov" , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org References: <20230929114421.3761121-1-ryan.roberts@arm.com> <20230929114421.3761121-6-ryan.roberts@arm.com> <8a72da61-b2ef-48ad-ae59-0bae7ac2ce10@nvidia.com> <5993c198-0d27-46c3-b757-3a02c2aacfc9@arm.com> From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 8C418120011 X-Stat-Signature: 4k9ygfjq4hekxonuizh4euwt5ba6cqqo X-Rspam-User: X-HE-Tag: 1698846997-385300 X-HE-Meta: U2FsdGVkX1+8xUEr/P3dkQMpERzYDo1hBa8wc5aXoo/f6imZUjltX9r4zKOLl9aH59OcRa0xw2DeWMYO1DgLcfBtS9dUxmpJUbb5nyceiqxp661lib9PiGS4JHuU3aZg/aEHOEbvo2ckmsjuWplD9ATYQ4VEO1sRkeZGDU322+bAn1RbGgfIwr0WUmiGZ82N+94aXQI0EKEtztz6WhHO5yItl7CBFYFepwQrQQWLYQa185qebu1sH+BQ/ARRaL7ptXmfCtachY3l29PaDgaPbpQZAXRaZWW/wZQe3WJymVKDHdmknhXmeVt6283fmgzfNaAKhgFtUxHLwPpOAURitshogvBP6PTXj27z8xQqMcsVnh3hCW7HcWRpIhywBKNJTbGHos/9CV0mlkGax4VW6s28vWtjXGSkPGq2YSSzEevQWLF2ChHlZJK3OZ3r4VNTLRcsV3EsEScVdllrxm0YIxuOy0Hp7gmrSiBl7bfk6i7yaBolACAJvv4P7gFQJHxzzf5tiNifCxREcMH2iRHd6UdDMhh7r49df9i+PxShCsbQ2g8C2xTSvl3H18C1cWNZOVJeUDOM04YNnYPHPxFKikYxDJq9rdBF+B65YMhlL2JurnmHKKmX3He2TFkuFbpFro2mnxpMVlONfRTmkSrHVi9trw3CWjMJtUhsTd18AfjM3R5t2I9BVsXJ7XSj3m5pdFJtb2ZguG4kxd0Ri0YorlCFrBpJMWpAJOGyUjq8gvCRkdnEzPueeVHLUk1FWCLJnG7leM7/qMwWK+8M2NXD5TRr8DICpn1FAaKz+k16+8BhLngwneKiIVMo6lmTQd48cPWUKS9PxZCQ2HAUyjKfDSTNbZW3Ql1O8h1MYhw2UbSl2QHf4wut40VarLSmYbfQMkl1bBUMgGb3h5oANoiYgNn8qmRbRZFDp3xUPTG5nd+WhsQqBQZmLzDaqBpFaDgFm+7XuzX/0kpHKiDm4Gr VMalwAOL QHfSU4+SBFpnMPS7+HbKEHt4OGeGuy/R5q49/YQLbv2uyAwVR/3Ok4ZL6YYAZN1HL3yfTqOGibMk88TddY7XUPVTP1ovGTVkke7wdzZjpALj2/RbSFlfjGGI6VLCkBHBiBDvgByy1GgpDIGClcIdyW/Hc80loRXBLlkUOvUg9xEjTdphXsFl9jY7NDwO/ilLZ6XYc14w4kR045w/H9pObLRUAxXIBiGEN1j4+ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 30/10/2023 23:25, John Hubbard wrote: > On 10/30/23 04:43, Ryan Roberts wrote: >> On 28/10/2023 00:04, John Hubbard wrote: >>> On 9/29/23 04:44, Ryan Roberts wrote: > ... >>>>    +static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages) >>>> +{ >>>> +    int i; >>>> + >>>> +    if (nr_pages == 1) >>>> +        return vmf_pte_changed(vmf); >>>> + >>>> +    for (i = 0; i < nr_pages; i++) { >>>> +        if (!pte_none(ptep_get_lockless(vmf->pte + i))) >>>> +            return true; >>> >>> This seems like something different than the function name implies. >>> It's really confusing: for a single page case, return true if the >>> pte in the page tables has changed, yes that is very clear. >>> >>> But then for multiple page cases, which is really the main >>> focus here--for that, claim that the range has changed if any >>> pte is present (!pte_none). Can you please help me understand >>> what this means? >> >> Yes I understand your confusion. Although I'm confident that the code is >> correct, its a bad name - I'll make the excuse that this has evolved through >> rebasing to cope with additions to UFFD. Perhaps something like >> vmf_is_large_folio_suitable() is a better name. >> >> It used to be that we would only take the do_anonymous_page() path if the pte >> was none; i.e. this is the first time we are faulting on an address covered by >> an anon VMA and we need to allocate some memory. But more recently we also end >> up here if the pte is a uffd_wp marker. So for a single pte, instead of checking >> none, we can check if the pte has changed from our original check (where we >> determined it was a uffd_wp marker or none). But for multiple ptes, we don't >> have storage to store all the original ptes from the first check. >> >> Fortunately, if uffd is in use for a vma, then we don't want to use a large >> folio anyway (this would break uffd semantics because we would no longer get a >> fault for every page). So we only care about the "same but not none" case for >> nr_pages=1. >> >> Would changing the name to vmf_is_large_folio_suitable() help here? > > Yes it would! And adding in a sentence or two from above about the uffd, as > a function-level comment might be just the right of demystification for > the code. Actually I don't think the name I proposed it quite right either - this gets called for small folios too. I think its cleaner to change the name to vmf_pte_range_none() and strip out the nr_pages==1 case. The checking-for-none part is required by alloc_anon_folio() and needs to be safe without holding the PTL. vmf_pte_changed() is not safe in without the lock. So I've just hoisted the nr_pages==1 case directly into do_anonymous_page(). Shout if you think we can do better: diff --git a/mm/memory.c b/mm/memory.c index 569c828b1cdc..b48e4de1bf20 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4117,19 +4117,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) return ret; } -static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages) +static bool pte_range_none(pte_t *pte, int nr_pages) { int i; - if (nr_pages == 1) - return vmf_pte_changed(vmf); - for (i = 0; i < nr_pages; i++) { - if (!pte_none(ptep_get_lockless(vmf->pte + i))) - return true; + if (!pte_none(ptep_get_lockless(pte + i))) + return false; } - return false; + return true; } #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -4170,7 +4167,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf) while (orders) { addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); vmf->pte = pte + pte_index(addr); - if (!vmf_pte_range_changed(vmf, 1 << order)) + if (pte_range_none(vmf->pte, 1 << order)) break; order = next_order(&orders, order); } @@ -4280,7 +4277,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl); if (!vmf->pte) goto release; - if (vmf_pte_range_changed(vmf, nr_pages)) { + if ((nr_pages == 1 && vmf_pte_changed(vmf)) || + (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages))) { for (i = 0; i < nr_pages; i++) update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i); goto release;