Date: Thu, 2 Feb 2023 22:49:38 +0000
From: Matthew Wilcox
To: "Kirill A. Shutemov"
Cc: linux-arch@vger.kernel.org, Yin Fengwei, linux-mm@kvack.org
Subject: Re: API for setting multiple PTEs at once
References: <20230202214858.btrzrcevzxjfk6wg@box.shutemov.name>
In-Reply-To: <20230202214858.btrzrcevzxjfk6wg@box.shutemov.name>

On Fri, Feb 03, 2023 at 12:48:58AM +0300, Kirill A.
Shutemov wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> > For those of you not subscribed, linux-mm is currently discussing
> > how best to handle page faults on large folios.  I simply made it work
> > when adding large folio support.  Now Yin Fengwei is working on
> > making it fast.
> >
> > https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@casper.infradead.org/
> > is perhaps the best place to start as it pertains to what the
> > architecture will see.
> >
> > At the bottom of that function, I propose
> >
> > +	for (i = 0; i < nr; i++) {
> > +		set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> > +		/* no need to invalidate: a not-present page won't be cached */
> > +		update_mmu_cache(vma, addr, vmf->pte + i);
> > +		addr += PAGE_SIZE;
> > +		entry = pte_next(entry);
> > +	}
> >
> > (or I would have, had I not forgotten that pte_t isn't an integral type)
> >
> > But I think that some architectures want to mark PTEs specially for
> > "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> > an API like:
> >
> > 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);
>
> Maybe just set_ptes(). arch_ doesn't contribute much.

Sure.

> > 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
> >
> > There are some challenges here.  For example, folios may be mapped
> > askew (ie not naturally aligned).  Another problem is that folios may
> > be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> > the pages in the folio), and I presume you'd need to go and unmark the
> > other PTEs in that case.  So it's not as simple as just checking whether
> > 'addr' and 'nr' are in some way compatible.
>
> I think the key question is who is responsible for 'nr' being safe. Like
> is it caller or set_ptes() need to check that it belong to the same PTE
> page table, folio, VMA, etc.
>
> I think it has to be done by caller and set_pte() has to be as simple as
> possible.

Caller guarantees that 'nr' is bounded by all of (vma, PMD table, folio).
We don't currently allocate folios larger than PMD size, but perhaps we
should prepare for that and as part of this same exercise define

	set_pmds(mm, addr, vmf->pmd, entry, nr);

... where 'nr' is the number of PMDs to set, not number of pages.
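
To make the division of labour concrete, here is a rough user-space sketch
(not kernel code) of a caller clamping 'nr' before handing it to a
deliberately simple set_ptes().  Everything in it is an assumption made for
illustration: the 4KiB page / 2MiB PMD geometry, the pte_t layout, the
simplified set_ptes() signature, and the hypothetical max_batch() helper.
It reads "PMD table" as the page table that one PMD entry points to.

```c
#include <assert.h>
#include <stdint.h>

/* Model constants only; real values are per-architecture. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PMD_SHIFT  21
#define PMD_SIZE   (1UL << PMD_SHIFT)

/* pte_t is a struct, not an integer, so "entry + 1" would not compile. */
typedef struct { uint64_t val; } pte_t;

/* Hypothetical helper: the entry for the next physical page.
 * Assumes the physical address occupies the bits from PAGE_SHIFT up. */
static pte_t pte_next(pte_t pte)
{
	pte.val += PAGE_SIZE;
	return pte;
}

/* Simplified fallback: store nr consecutive entries, with no checking.
 * An architecture could override this to mark contiguous ranges. */
static void set_ptes(pte_t *ptep, pte_t entry, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++) {
		ptep[i] = entry;
		entry = pte_next(entry);
	}
}

/* Caller-side clamp: how many pages can be mapped starting at addr
 * without leaving the VMA, the current page table, or the folio. */
static unsigned long max_batch(unsigned long addr, unsigned long vma_end,
			       unsigned long folio_pages_left)
{
	assert(addr < vma_end);

	/* One page table covers PMD_SIZE bytes of virtual address space. */
	unsigned long table_end = (addr & ~(PMD_SIZE - 1)) + PMD_SIZE;
	unsigned long end = vma_end < table_end ? vma_end : table_end;
	unsigned long nr = (end - addr) >> PAGE_SHIFT;

	return nr < folio_pages_left ? nr : folio_pages_left;
}
```

With the bounds computed once by the caller, set_ptes() itself needs no
knowledge of VMAs or folios, which is the property being asked for above.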