From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF12EC4167B for ; Thu, 7 Dec 2023 14:45:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 085B06B007E; Thu, 7 Dec 2023 09:45:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 036776B0088; Thu, 7 Dec 2023 09:45:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E3FAB6B0089; Thu, 7 Dec 2023 09:45:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id D0E2E6B007E for ; Thu, 7 Dec 2023 09:45:46 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id A45B01C07DD for ; Thu, 7 Dec 2023 14:45:46 +0000 (UTC) X-FDA: 81540296292.05.5CE6743 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf15.hostedemail.com (Postfix) with ESMTP id E43B7A0007 for ; Thu, 7 Dec 2023 14:45:43 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=none; spf=pass (imf15.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1701960344; a=rsa-sha256; cv=none; b=BGWWhpyQ6DSuhevW9Ugc8bjpz/RH1w7NVZSBGoFjaseQArHsPhRIsLwIuUC9InL30pIpmf M50kPZEhaVwI4y8n2SNkqyhCZVVEyw9JA26pYkyY7+Baxdeedd6IH3G0bJXOlJ1cCb5sFq 4IcnJEn5OFPAftxnI1y31o/PyVmo0yM= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=none; spf=pass (imf15.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1701960344; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uWOQJcrASnePGzg4t2l9x+WESrxzkWP/CBSeufNvIPo=; b=3MC2M3CwxlU1Clc6b8rQYsqQKASRvBwxnvNyxWKcgD1GBVSasBrkR2sYbzNPF7hLRhy2WH OOtUUTDLjVwM3uVnm7bh4exthfUrnuAVIfdx7jyeyP+unF7sWf3IwQpf4kR0tQUb9bvDN0 W4KCf9iVaSYKKBe8Klzg5r8D+Te5rps= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9CCE92F4; Thu, 7 Dec 2023 06:46:28 -0800 (PST) Received: from [10.1.32.134] (XHFQ2J9959.cambridge.arm.com [10.1.32.134]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A89C33F6C4; Thu, 7 Dec 2023 06:45:39 -0800 (PST) Message-ID: <3d49bcbf-1f9b-48e8-a91a-ede0762b795c@arm.com> Date: Thu, 7 Dec 2023 14:45:38 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v8 04/10] mm: thp: Support allocation of anonymous multi-size THP Content-Language: en-GB To: David Hildenbrand , Andrew Morton , Matthew Wilcox , Yin Fengwei , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins , Kefeng Wang , Barry Song <21cnbao@gmail.com>, Alistair Popple Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org References: <20231204102027.57185-1-ryan.roberts@arm.com> <20231204102027.57185-5-ryan.roberts@arm.com> <71040a8c-4ea1-4f21-8ac8-65f7c25c217e@redhat.com> <126c3b71-1acc-4851-9986-4228cb8a8660@arm.com> <94806b4f-2370-4999-9586-2c936955cb87@redhat.com> From: Ryan Roberts In-Reply-To: <94806b4f-2370-4999-9586-2c936955cb87@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: E43B7A0007 X-Stat-Signature: 1rip893wb41ixe6y7tqacn4kw6pc4i31 X-HE-Tag: 1701960343-970236 X-HE-Meta: U2FsdGVkX18TFdW1rAbwNyjXCRozq+6BdLySls6tkysdcbhtvwOc+F93e4ORiQfWB4WXtHX56OoH8skroj1yyvEYHnrCO8Y5eaD9nUJCtcZV0PWJSD5QI1+5jKxny4xNa5UpFYpAC4IPZdsOYyP7MQ8NAK2tiX0YOIsHpWFb6zFgk2pxArxsZqr7GGM9G7e7wR0VCjFqML4sEUaIoIr43q58dYc6z7y98X9W2crC0t+L1WZbcoSGjdmD/ZNDnbnDHIT8f7ZwIXVrZ58IcEKE0W36m0V4STfZm5D3ei2HIA+3UNw3o9+xvLDRjHq5xFTh/O7Vvl5gsQItkuYpyCTtgs8O9NDyjll2AuFYEr/EdYEiBEX/S4KomWnVAl69CCMNvfMStd5kzszay027NEJzZtiKEcGzahmuFf8/G+VMmwjMHEDLhcCrKMDyYvYCJXUCKLXcCsrdtuOTNrZNyqXA+Pd1JqMpJAdbDy59eGDwLWs7y9IPC6GCES/kl4LkQu6nOeFRN0riiByaji+kLUBeaWXO7k9MapXpUVCNycIvtBNCh349J9jZdyYdK1KyHGWcaQBjeoWZmam5hYhNUt2oco2nbtdO7r/h3Mmzl+Pt72rtkEhnZOqaX1hbLG+ggkGvl014y6n7JfOTsZO7/r53hGtbJ5YU13/PC+o6XH6pRzW329qWAobAKkxVh1JeA7mtFogLWTexBI/79iJ4cBpTCxq0VvnJMXUVJvu9hZ4fGOmWRb96FKjDvZTsSLHni2YnSTI/A1kzKcD99SY/GrKQejHvtVMgOXrtP5oWI0otNf2vjimx74pQOkasNHzCFpSjllZNEn66MEel4jasFF7VCha8E1gV8IQj79cmjz8HZA7e7uJKJOeD/VaO//PZ6oIW82qDZwJrjQFe4usQguTlaO2HYrtKp94nfI9T/82TGG2J1nCHisWOhiioGl/Bce4Pj1tTQltbSCVJEgQezD0 yix86BJS kf3yzGxH2r7zn97yNdG11PB/fQWa2hXV6MtaDbOJXw3HwzmngaplftFNZ9dIy1LxS12cPNkX4/0Q/H57CMFVWHzSwR85Nzqgo1SAeiWNUOdXQQt6hjhhPO3v8Ip4tIWYnB2X+nzpvXqAYjsemCB56rVjELQthQcbeD2ozZ68Mg4hbuqAlqZWd8HnnwXLh8Qejfiut+SLoexXkz55reXHVvL8kSgM5xz5GLi1m X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 07/12/2023 13:28, David Hildenbrand wrote: >>> >>> Right, but you know from the first loop which order is applicable (and will be >>> fed to the second loop) and could just pte_unmap(pte) + tryalloc. If that fails, >>> remap and try with the next orders. >> >> You mean something like this? >> >>     pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); >>     if (!pte) >>         return ERR_PTR(-EAGAIN); >> >>     order = highest_order(orders); >>     while (orders) { >>         addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); >>         if (!pte_range_none(pte + pte_index(addr), 1 << order)) { >>             order = next_order(&orders, order); >>             continue; >>         } >> >>         pte_unmap(pte); >>         >>         folio = vma_alloc_folio(gfp, order, vma, addr, true); >>         if (folio) { >>             clear_huge_page(&folio->page, vmf->address, 1 << order); >>             return folio; >>         } >> >>         pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); >>         if (!pte) >>             return ERR_PTR(-EAGAIN); >> >>         order = next_order(&orders, order); >>     } >> >>     pte_unmap(pte); >> >> I don't really like that because if high order folio allocations fail, then you >> are calling pte_range_none() again for the next lower order; once that check has >> succeeded for an order it shouldn't be required for any lower orders. In this >> case you also have lots of pte map/unmap. > > I see what you mean. > >> >> The original version feels more efficient to me. > Yes it is. Adding in some comments might help, like > > /* >  * Find the largest order where the aligned range is completely prot_none(). Note >  * that all remaining orders will be completely prot_none(). >  */ > ... > > /* Try allocating the largest of the remaining orders. */ OK added. > >> >>> >>> That would make the code certainly easier to understand. That "orders" magic of >>> constructing, filtering, walking is confusing :) >>> >>> >>> I might find some time today to see if there is an easy way to cleanup all what >>> I spelled out above. It really is a mess. But likely that cleanup could be >>> deferred (but you're touching it, so ... :) ). >> >> I'm going to ignore the last 5 words. I heard the "that cleanup could be >> deferred" part loud and clear though :) > > :) > > If we could stop passing orders into thp_vma_allowable_orders(), that would > probably > be the biggest win. It's just all a confusing mess. I tried an approach like you suggested in the other thread originally, but I struggled to define exactly what "thp_vma_configured_orders()" should mean; Ideally, I just want "all the THP orders that are currently enabled for this VMA+flags". But some callers want to enforce_sysfs and others don't, so you probably have to at least pass that flag. Then you have DAX which explicitly ignores enforce_sysfs, but only in a page fault. And shmem, which ignores enforce_sysfs, but only outside of a page fault. So it quickly becomes pretty complex. It is basically thp_vma_allowable_orders() as currently defined. If this could be a simple function then it could be inline and as you say, we can do the masking in the caller and exit early for the order-0 case. But it is very complex (at least if you want to retain the equivalent logic to what thp_vma_allowable_orders() has) so I'm not sure how to do the order-0 early exit without passing in the orders bitfield. And we are unlikely to exit early because PMD-sized THP is likely enabled and because we didn't pass in a orders bitfield, that wasn't filtered out. In short, I can't see a solution that's better than the one I have. But if you have something in mind, if you can spell it out, then I'll have a go at tidying it up and integrating it into the series. Otherwise I really would prefer to leave it for a separate series.