Subject: Re: CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is broken, was Re: [RFC PATCH 0/6] Deep talk about folio vmap
From: Muchun Song
Date: Mon, 7 Apr 2025 11:37:56 +0800
To: Huan Yang
Cc: bingbu.cao@linux.intel.com, Christoph Hellwig, Matthew Wilcox,
 Gerd Hoffmann, Vivek Kasireddy, Sumit Semwal, Christian König,
 Andrew Morton, Uladzislau Rezki, Shuah Khan,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
 opensource.kernel@vivo.com
In-Reply-To: <6f76a497-248b-4f92-9448-755006c732c8@vivo.com>
References: <20250327092922.536-1-link@vivo.com>
 <20250404090111.GB11105@lst.de>
 <9A899641-BDED-4773-B349-56AF1DD58B21@linux.dev>
 <43DD699A-5C5D-429B-A2B5-61FBEAE2E252@linux.dev>
 <6f76a497-248b-4f92-9448-755006c732c8@vivo.com>

> On Apr 7, 2025, at 11:21, Huan Yang wrote:
>
>
> On 2025/4/7 10:57, Muchun Song wrote:
>>
>>> On Apr 7, 2025, at 09:59, Huan Yang wrote:
>>>
>>>
>>> On 2025/4/4 18:07, Muchun Song wrote:
>>>>> On Apr 4, 2025, at 17:38, Muchun Song wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Apr 4, 2025, at 17:01, Christoph Hellwig wrote:
>>>>>>
>>>>>> After the btrfs compressed bio discussion I think the hugetlb changes that
>>>>>> skip the tail pages are fundamentally unsafe in the current kernel.
>>>>>>
>>>>>> That is because the bio_vec representation assumes tail pages do exist, so
>>>>>> as soon as you are doing direct I/O that generates a bvec starting beyond
>>>>>> the present head page things will blow up. Other users of bio_vecs might
>>>>>> do the same, but the way the block bio_vecs are generated is very
>>>>>> susceptible to that. So we'll first need to sort that out and a few other
>>>>>> things before we can even think of enabling such a feature.
>>>>>>
>>>>> I would like to express my gratitude to Christoph for including me in the
>>>>> thread. I have carefully read the cover letter in [1], which indicates
>>>>> that an issue has arisen due to the improper use of `vmap_pfn()`. I'm
>>>>> wondering if we could consider using `vmap()` instead. In the HVO scenario,
>>>>> the tail struct pages do **exist**, but they are read-only.
>>>>> I've examined the code of `vmap()`, and it appears that it only reads the
>>>>> struct page. Therefore, it seems feasible for us to use `vmap()` (I am not
>>>>> an expert in udmabuf). Right?
>>>> I believe my stance is correct. I've also reviewed another thread in [2].
>>>> Allow me to clarify and correct the viewpoints you presented. You stated:
>>>> "
>>>>   So by HVO, it also not backed by pages, only contains folio head, each
>>>>   tail pfn's page struct go away.
>>>> "
>>>> This statement is entirely inaccurate. The tail pages do not cease to exist;
>>>> rather, they are read-only. For your specific use-case, please use `vmap()`
>>>> to resolve the issue at hand. If you wish to gain a comprehensive understanding
>>> I see the document gives a simple diagram to illustrate this:
>>>
>>> +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
>>> |           |                     |     0     | -------------> |     0     |
>>> |           |                     +-----------+                +-----------+
>>> |           |                     |     1     | -------------> |     1     |
>>> |           |                     +-----------+                +-----------+
>>> |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
>>> |           |                     +-----------+                   | | | | |
>>> |           |                     |     3     | ------------------+ | | | |
>>> |           |                     +-----------+                     | | | |
>>> |           |                     |     4     | --------------------+ | | |
>>> |    PMD    |                     +-----------+                       | | |
>>> |   level   |                     |     5     | ----------------------+ | |
>>> |  mapping  |                     +-----------+                         | |
>>> |           |                     |     6     | ------------------------+ |
>>> |           |                     +-----------+                           |
>>> |           |                     |     7     | --------------------------+
>>> |           |                     +-----------+
>>> |           |
>>> |           |
>>> |           |
>>> +-----------+
>>>
>>> If I understand correctly, the struct pages of tails 2-7 are freed, so if I
>>> only need to map pages 2-7, can vmap() still do that correctly?
>> The answer is you can. It is essential to distinguish between virtual
>
> Thanks for your reply, but I still can't understand it.
> For example, I need to vmap pages 2-7 of a hugetlb HVO folio:
>
> struct page **pages = kvmalloc_array(6, sizeof(*pages), GFP_KERNEL);
>
> for (i = 2; i < 8; ++i)
>     pages[i - 2] = folio_page(folio, i); /* put pages 2-7 into pages[0..5] */
>
> void *vaddr = vmap(pages, 6, 0, PAGE_KERNEL);
>
> For non-HVO pages, this works. If HVO is enabled, does
> "pages[i - 2] = folio_page(folio, i);" just return the head page? And how
> can vmap() correctly map each page?

Why do you think folio_page(folio, i) (i ≠ 0) returns the head page? Is that
speculation, or have you tested it? Please base your conclusions on the actual
behavior rather than on guesswork.

Thanks,
Muchun.

>
> Please correct me. :)
>
> Thanks,
>
> Huan Yang
>
>> address (VA) and physical address (PA). The VAs of the tail struct pages
>> aren't freed but remapped to the physical page mapped by the VA of the
>> head struct page (since the contents of those tail physical pages are
>> the same). Thus, the freed pages are the physical pages originally
>> mapped by the tail struct pages, not their virtual addresses. Moreover,
>> while it is possible to read through the virtual addresses of these tail
>> struct pages, writes are prohibited, because the kernel is expected to
>> write only to the head struct page of a compound page and to only read
>> the tail struct pages. BTW, the folio infrastructure is also based on
>> this assumption.
>>
>> Thanks,
>> Muchun.
>>
>>> If I still misunderstand something, please correct me.
>>>
>>> Thanks,
>>>
>>> Huan Yang
>>>
>>>> of the fundamentals of HVO, I kindly suggest a thorough review of the document
>>>> in [3].
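
The VA/PA distinction in the quoted paragraph can be illustrated with a
userspace analogy. This is only a sketch and not kernel code: it assumes a
Linux/POSIX system, and the helper name map_alias_pair() and the temp file
name are made up for illustration. It backs two different virtual addresses
with the same file page, one writable (playing the role of the head vmemmap
page) and one read-only (playing the role of a remapped tail vmemmap page).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace analogy only (hypothetical helper, not a kernel API):
 * map the same file page at two virtual addresses.
 * Returns 0 on success and fills *rw, *ro and *len.
 */
static int map_alias_pair(char **rw, char **ro, size_t *len)
{
	char path[] = "/tmp/alias-demo-XXXXXX";	/* hypothetical temp name */
	long page = sysconf(_SC_PAGESIZE);
	int fd;

	assert(rw && ro && len);
	if (page <= 0)
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open fd keeps the file alive */
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	/* Writable view: stands in for the head vmemmap page. */
	*rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* Read-only alias: stands in for a remapped tail vmemmap page. */
	*ro = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mappings stay valid after close */

	if (*rw == MAP_FAILED || *ro == MAP_FAILED) {
		if (*rw != MAP_FAILED)
			munmap(*rw, (size_t)page);
		if (*ro != MAP_FAILED)
			munmap(*ro, (size_t)page);
		return -1;
	}
	*len = (size_t)page;
	return 0;
}
```

The two returned pointers differ, yet data written through the writable
mapping is readable through the alias. Freeing or sharing the backing page
does not make the second virtual address disappear, which is the point the
quoted paragraph makes about tail struct pages.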
>>>>
>>>> [2] https://lore.kernel.org/lkml/5229b24f-1984-4225-ae03-8b952de56e3b@vivo.com/#t
>>>> [3] Documentation/mm/vmemmap_dedup.rst
>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/20250327092922.536-1-link@vivo.com/T/#m055b34978cf882fd44d2d08d929b50292d8502b4
>>>>>
>>>>> Thanks,
>>>>> Muchun.
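
One possible source of the confusion, sketched as a toy model: the boxes 0-7
in the quoted diagram are pages of the vmemmap (each holding many struct
pages), not subpages 2-7 of the folio. The model below is plain userspace C,
not kernel code. It assumes 4 KiB base pages, a 64-byte struct page and a
2 MiB folio (typical x86-64 values that are not stated in this thread), it
follows the remapping shown in the quoted diagram, and every toy_* name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed typical x86-64 values; real configurations may differ. */
#define TOY_PAGE_SIZE        4096u
#define TOY_STRUCT_PAGE_SIZE 64u
#define TOY_NR_SUBPAGES      512u	/* 2 MiB folio / 4 KiB pages */
#define TOY_STRUCTS_PER_PAGE (TOY_PAGE_SIZE / TOY_STRUCT_PAGE_SIZE)	/* 64 */
#define TOY_VMEMMAP_PAGES    (TOY_NR_SUBPAGES / TOY_STRUCTS_PER_PAGE)	/* 8 */

/* Toy page table for the vmemmap range that describes one folio. */
struct toy_vmemmap {
	unsigned int frame[TOY_VMEMMAP_PAGES];	/* backing physical frame */
	int writable[TOY_VMEMMAP_PAGES];
};

/* Before optimization: every vmemmap page has its own writable frame. */
static void toy_vmemmap_init(struct toy_vmemmap *v)
{
	for (unsigned int i = 0; i < TOY_VMEMMAP_PAGES; i++) {
		v->frame[i] = i;
		v->writable[i] = 1;
	}
}

/* As in the quoted diagram: vmemmap pages 2-7 share frame 1, read-only. */
static void toy_hvo_remap(struct toy_vmemmap *v)
{
	for (unsigned int i = 2; i < TOY_VMEMMAP_PAGES; i++) {
		v->frame[i] = 1;
		v->writable[i] = 0;
	}
}

/* Byte offset of subpage i's struct page from the head struct page. */
static size_t toy_folio_page_offset(unsigned int i)
{
	assert(i < TOY_NR_SUBPAGES);
	return (size_t)i * TOY_STRUCT_PAGE_SIZE;
}

/* Index of the vmemmap page that holds subpage i's struct page. */
static unsigned int toy_vmemmap_index(unsigned int i)
{
	assert(i < TOY_NR_SUBPAGES);
	return i / TOY_STRUCTS_PER_PAGE;
}

/* Physical frame that backs subpage i's struct page in the toy table. */
static unsigned int toy_backing_frame(const struct toy_vmemmap *v,
				      unsigned int i)
{
	return v->frame[toy_vmemmap_index(i)];
}
```

Under these assumptions, the struct pages of subpages 2-7 sit at nonzero
offsets from the head struct page and all live in vmemmap page 0, which the
remapping does not touch. Only struct pages from subpage 128 onward fall into
the remapped, read-only vmemmap pages, and their virtual addresses remain
distinct and readable.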