Subject: Re: CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is broken, was Re: [RFC PATCH 0/6] Deep talk about folio vmap
From: Muchun Song
In-Reply-To: <55306631-8421-455a-9d03-709ebee97d76@vivo.com>
Date: Mon, 7 Apr 2025 15:22:22 +0800
Cc: bingbu.cao@linux.intel.com, Christoph Hellwig, Matthew Wilcox,
 Gerd Hoffmann, Vivek Kasireddy, Sumit Semwal, Christian König,
 Andrew Morton, Uladzislau Rezki, Shuah Khan,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
 opensource.kernel@vivo.com
Message-Id: <777BB63E-245D-4027-880A-FFC3717928D7@linux.dev>
References: <20250327092922.536-1-link@vivo.com>
 <20250404090111.GB11105@lst.de>
 <9A899641-BDED-4773-B349-56AF1DD58B21@linux.dev>
 <43DD699A-5C5D-429B-A2B5-61FBEAE2E252@linux.dev>
 <6f76a497-248b-4f92-9448-755006c732c8@vivo.com>
 <35D26C00-952F-481C-8345-E339F0ED770B@linux.dev>
 <55306631-8421-455a-9d03-709ebee97d76@vivo.com>
To: Huan Yang

> On Apr 7, 2025, at 15:09, Huan Yang wrote:
>
> On 2025/4/7 14:43, Muchun Song wrote:
>>
>>> On Apr 7, 2025, at 11:37, Muchun Song wrote:
>>>
>>>> On Apr 7, 2025, at 11:21, Huan Yang wrote:
>>>>
>>>> On 2025/4/7 10:57, Muchun Song wrote:
>>>>>> On Apr 7, 2025, at 09:59, Huan Yang wrote:
>>>>>>
>>>>>> On 2025/4/4 18:07, Muchun Song wrote:
>>>>>>>> On Apr 4, 2025, at 17:38, Muchun
>>>>>>>> Song wrote:
>>>>>>>>
>>>>>>>>> On Apr 4, 2025, at 17:01, Christoph Hellwig wrote:
>>>>>>>>>
>>>>>>>>> After the btrfs compressed bio discussion I think the hugetlb changes that
>>>>>>>>> skip the tail pages are fundamentally unsafe in the current kernel.
>>>>>>>>>
>>>>>>>>> That is because the bio_vec representation assumes tail pages do exist, so
>>>>>>>>> as soon as you are doing direct I/O that generates a bvec starting beyond
>>>>>>>>> the present head page things will blow up. Other users of bio_vecs might
>>>>>>>>> do the same, but the way the block bio_vecs are generated is very susceptible
>>>>>>>>> to that. So we'll first need to sort that out and a few other things
>>>>>>>>> before we can even think of enabling such a feature.
>>>>>>>>>
>>>>>>>> I would like to express my gratitude to Christoph for including me in the
>>>>>>>> thread. I have carefully read the cover letter in [1], which indicates
>>>>>>>> that an issue has arisen due to the improper use of `vmap_pfn()`. I'm
>>>>>>>> wondering if we could consider using `vmap()` instead. In the HVO scenario,
>>>>>>>> the tail struct pages do **exist**, but they are read-only. I've examined
>>>>>>>> the code of `vmap()`, and it appears that it only reads the struct page.
>>>>>>>> Therefore, it seems feasible for us to use `vmap()` (I am not an expert in
>>>>>>>> udmabuf). Right?
>>>>>>> I believe my stance is correct. I've also reviewed another thread in [2].
>>>>>>> Allow me to clarify and correct the viewpoints you presented. You stated:
>>>>>>> "
>>>>>>> So by HVO, it also not backed by pages, only contains folio head, each
>>>>>>> tail pfn's page struct go away.
>>>>>>> "
>>>>>>> This statement is entirely inaccurate. The tail pages do not cease to exist;
>>>>>>> rather, they are read-only. For your specific use-case, please use `vmap()`
>>>>>>> to resolve the issue at hand.
>>>>>>> If you wish to gain a comprehensive understanding
>>>>>> I see the document gives a simple diagram to illustrate the point:
>>>>>>
>>>>>> +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
>>>>>> |           |                     |     0     | -------------> |     0     |
>>>>>> |           |                     +-----------+                +-----------+
>>>>>> |           |                     |     1     | -------------> |     1     |
>>>>>> |           |                     +-----------+                +-----------+
>>>>>> |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
>>>>>> |           |                     +-----------+                   | | | | |
>>>>>> |           |                     |     3     | ------------------+ | | | |
>>>>>> |           |                     +-----------+                     | | | |
>>>>>> |           |                     |     4     | --------------------+ | | |
>>>>>> |    PMD    |                     +-----------+                       | | |
>>>>>> |   level   |                     |     5     | ----------------------+ | |
>>>>>> |  mapping  |                     +-----------+                         | |
>>>>>> |           |                     |     6     | ------------------------+ |
>>>>>> |           |                     +-----------+                           |
>>>>>> |           |                     |     7     | --------------------------+
>>>>>> |           |                     +-----------+
>>>>>> |           |
>>>>>> |           |
>>>>>> |           |
>>>>>> +-----------+
>>>>>>
>>>>>> If I understand correctly, the page structs of tail pages 2-7 are freed, so if
>>>>>> I just need to map pages 2-7, can vmap do that correctly?
>>>>> The answer is you can. It is essential to distinguish between virtual
>>>> Thanks for your reply, but I still can't understand it. For example, I need to
>>>> vmap pages 2-7 of a hugetlb HVO folio:
>>>>
>>>> struct page **pages = kvmalloc_array(6, sizeof(*pages), GFP_KERNEL);
>>>>
>>>> for (i = 2; i < 8; ++i)
>>>>         pages[i - 2] = folio_page(folio, i); /* store pages 2-7 in pages[0..5] */
>>>>
>>>> void *vaddr = vmap(pages, 6, 0, PAGE_KERNEL);
>>>>
>>>> For non-HVO pages, this works. If HVO is enabled, does
>>>> "pages[i - 2] = folio_page(folio, i);" just get the head page? And how can vmap
>>>> correctly map each page?
>>> Why do you think folio_page(folio, i) (i ≠ 0) returns the head page?
>>> Is it speculation or tested? Please base it on the actual situation
>>> instead of indulging in wild thoughts.
>> By the way, in case you truly struggle to comprehend the fundamental
>> aspects of HVO, I would like to summarize for you the user-visible
>> behaviors in comparison to the situation where HVO is disabled.
>>
>> HVO Status    Tail Page Structures    Head Page Structures
>> Enabled       Read-Only (RO)          Read-Write (RW)
>> Disabled      Read-Write (RW)         Read-Write (RW)
>>
>> The sole distinction between the two scenarios lies in whether the
>> tail page structures are allowed to be written or not. Please refrain
>> from getting bogged down in the details of the implementation of HVO.
>
> Thanks, I did a test and figured out that I had totally misunderstood it.
>
> Even if HVO is enabled and the tail page structs are freed and point to the
> head, the linear mapping still exists, so page_to_pfn and page_to_virt (and
> the folio versions), which like folio_page compute each needed page starting
> from the head page, still work:
>
> hvo head 0xfffff9de849d0000, pfn=0x127400, wish offset_pfn 0x1275f1, idx 497 is 0xfffff9de849d7c40, pfn=0x1275f1.
>
> For vmap, we don't need to touch the actual page's content, just convert it to
> a pfn, so it works well.

You are able to read those tail page structures. The reason why vmap can
function is not that it doesn't read those page structures. What I mean is
that vmap will still work even if it does read the page structures, because
those tail page structures do indeed exist.

>
> BTW, even if we need to touch the actual input page struct, it points to the
> head page, so I guess it will affect nothing.

Allow me to clarify this for you to ensure that we have a shared
understanding.

Those tail page structures (virtual addresses in the vmemmap area) are
mapped to the same page frame (physical page) to which the head page
structures (virtual addresses in the vmemmap area) are mapped. It is
analogous to the shared-mapping mechanism in user space.

>
> If I still misunderstand anything, please correct me.
> :)
>
> Muchun, thank you for your patience,
>
> Huan Yang
>
>>
>> Thanks,
>> Muchun.
>>
>>> Thanks,
>>> Muchun.
>>>
>>>> Please correct me. :)
>>>>
>>>> Thanks,
>>>>
>>>> Huan Yang
>>>>
>>>>> address (VA) and physical address (PA). The VAs of tail struct pages
>>>>> aren't freed but remapped to the physical page mapped by the VA of the
>>>>> head struct page (since the contents of those tail physical pages are the
>>>>> same). Thus, the freed pages are the physical pages originally mapped by
>>>>> the tail struct pages, not their virtual addresses. Moreover, while it
>>>>> is possible to read the virtual addresses of these tail struct pages,
>>>>> any write operations are prohibited. That is acceptable because the
>>>>> kernel is expected to write only to the head struct page of a compound
>>>>> page and to perform only reads on the tail struct pages. BTW, folio
>>>>> infrastructure is also based on this assumption.
>>>>>
>>>>> Thanks,
>>>>> Muchun.
>>>>>
>>>>>> Or if I still misunderstand something, please correct me.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Huan Yang
>>>>>>
>>>>>>> of the fundamentals of HVO, I kindly suggest a thorough review of the document
>>>>>>> in [3].
>>>>>>>
>>>>>>> [2] https://lore.kernel.org/lkml/5229b24f-1984-4225-ae03-8b952de56e3b@vivo.com/#t
>>>>>>> [3] Documentation/mm/vmemmap_dedup.rst
>>>>>>>
>>>>>>>> [1] https://lore.kernel.org/linux-mm/20250327092922.536-1-link@vivo.com/T/#m055b34978cf882fd44d2d08d929b50292d8502b4
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Muchun.
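
For anyone who wants to check the numbers in the test output quoted earlier
in this mail, here is a minimal sketch of the arithmetic. It assumes
sizeof(struct page) is 64 bytes (common on x86-64, but config-dependent)
and that both the vmemmap array and the folio's PFNs are linear. The helper
names are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/* Assumption: sizeof(struct page) == 64; this depends on the kernel config. */
#define STRUCT_PAGE_SIZE UINT64_C(64)

/*
 * Hypothetical helper: vmemmap address of the struct page for page `idx`
 * of a folio, given the address of the head struct page. The vmemmap is
 * a linear array of struct page, so this is plain array indexing.
 */
static inline uint64_t struct_page_va(uint64_t head_va, uint64_t idx)
{
	return head_va + idx * STRUCT_PAGE_SIZE;
}

/*
 * Hypothetical helper: PFN of page `idx` of a folio. The pages of a
 * folio are physically contiguous, so the PFN grows by one per page.
 */
static inline uint64_t folio_page_pfn(uint64_t head_pfn, uint64_t idx)
{
	return head_pfn + idx;
}
```

With the values from the log, struct_page_va(0xfffff9de849d0000, 497)
gives 0xfffff9de849d7c40 and folio_page_pfn(0x127400, 497) gives 0x1275f1.
Neither computation dereferences a struct page, so it does not depend on
whether the tail struct pages are writable.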