From: Jinjiang Tu
Date: Fri, 6 Dec 2024 11:35:20 +0800
Subject: Re: [PATCH -next] ovl: respect underlying filesystem's get_unmapped_area()
To: Lorenzo Stoakes, Amir Goldstein
CC: Matthew Wilcox, Liam Howlett
Message-ID: <518c881b-8ba0-df0e-16bf-00694c59f5a7@huawei.com>
References: <20241205143038.3260233-1-tujinjiang@huawei.com> <69b72e3d-b101-4641-9ce5-51346c93a98d@lucifer.local>
Sender: owner-linux-mm@kvack.org

On 2024/12/5 23:24, Lorenzo Stoakes wrote:
> (fixing typo in cc list: tujinjiang@huawe.com -> tujinjiang@huawei.com)
>
> + Liam
>
> (JinJiang - you forgot to cc the correct maintainers, please ensure you run
> scripts/get_maintainers.pl on files you change)
>
> On Thu, Dec 05, 2024 at 04:12:12PM +0100, Amir Goldstein wrote:
>> On Thu, Dec 5, 2024 at 4:04 PM Lorenzo Stoakes wrote:
>>> + Matthew for large folio aspect
>>>
>>> On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote:
>>>> During our tests in containers, there is a read-only file (i.e., shared
>>>> libraries) in the overlayfs filesystem, and the underlying filesystem is
>>>> ext4, which supports large folio. We mmap the file with PROT_READ prot,
>>>> and then call madvise(MADV_COLLAPSE) for it. However, the madvise call
>>>> fails and returns EINVAL.
>>>>
>>>> The reason is that the mapping address isn't aligned to PMD size. Since
>>>> overlayfs doesn't support large folio, __get_unmapped_area() doesn't call
>>>> thp_get_unmapped_area() to get a THP aligned address.
>>>>
>>>> To fix it, call get_unmapped_area() with the realfile.
>>> Isn't the correct solution to get overlayfs to support large folios?
>>>
>>>> Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should
>>>> export get_unmapped_area().
>>> Yeah, not in favour of this at all. This is an internal implementation
>>> detail. It seems like you're trying to hack your way into avoiding
>>> providing support for large folios and to hand it off to the underlying
>>> file system.
>>>
>>> Again, why don't you just support large folios in overlayfs?
>>>
>> This whole discussion seems moot.
>> overlayfs does not have address_space operations
>> It does not have its own page cache.
> And here we see my total lack of knowledge of overlayfs coming into play
> here :) Thanks for pointing this out.
>
> In that case, I object even further to the original of course...
>
>> The file in vma->vm_file is not an overlayfs file at all - it is the
>> real (e.g. ext4) file
>> when returning from ovl_mmap() => backing_file_mmap()
>> so I have very little clue why the proposed solution even works,
>> but it certainly does not look correct.
> I think then Jinjiang in this case you ought to go back to the drawing
> board and reconsider what might be the underlying issue here.

When userspace calls the mmap syscall, the call trace is as follows:

do_mmap
  __get_unmapped_area
  mmap_region
    mmap_file
      ovl_mmap

__get_unmapped_area() picks the address to map at, and the file it sees at
that point is still the overlayfs file, because it runs before ovl_mmap().
Since ovl_file_operations doesn't define a get_unmapped_area callback,
__get_unmapped_area() falls back to mm_get_unmapped_area_vmflags(), which
doesn't return an address aligned to the large folio size.

>
>> Thanks,
>> Amir.
> Cheers, Lorenzo
>
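
The fallback described above can be pictured with a small userspace model.
This is only a sketch, not kernel code: every toy_* name, the 4 KiB page
size, and the 2 MiB PMD size are assumptions made for illustration, and the
real __get_unmapped_area() checks more conditions than this. It shows only
that a file whose operations table has no get_unmapped_area callback gets
the generic, page-aligned address, while a file with a THP-aware callback
gets a PMD-aligned one.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration: 4 KiB pages, 2 MiB PMD. */
#define TOY_PAGE_SIZE 4096UL
#define TOY_PMD_SIZE  (2UL * 1024 * 1024)

/* Round addr up to a multiple of align (align must be a power of two). */
static unsigned long toy_align_up(unsigned long addr, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical, simplified callback type: takes only an address hint. */
typedef unsigned long (*toy_get_area_fn)(unsigned long hint);

struct toy_file_ops {
    toy_get_area_fn get_unmapped_area; /* may be NULL */
};

struct toy_file {
    const struct toy_file_ops *f_op;
};

/* Stand-in for the generic path: guarantees page alignment only. */
static unsigned long toy_generic_get_area(unsigned long hint)
{
    return toy_align_up(hint, TOY_PAGE_SIZE);
}

/* Stand-in for a THP-aware callback: guarantees PMD alignment. */
static unsigned long toy_thp_get_area(unsigned long hint)
{
    return toy_align_up(hint, TOY_PMD_SIZE);
}

/* Model of the dispatch: use the file's callback if it has one,
 * otherwise fall back to the generic path. */
static unsigned long toy_pick_address(const struct toy_file *file,
                                      unsigned long hint)
{
    if (file && file->f_op->get_unmapped_area)
        return file->f_op->get_unmapped_area(hint);
    return toy_generic_get_area(hint);
}

/* A "real" file with a THP-aware callback, and an overlayfs-like
 * file whose operations table leaves the callback unset. */
static const struct toy_file_ops toy_real_fops = {
    .get_unmapped_area = toy_thp_get_area,
};
static const struct toy_file_ops toy_ovl_fops = {
    .get_unmapped_area = NULL,
};

static int toy_is_pmd_aligned(unsigned long addr)
{
    return (addr & (TOY_PMD_SIZE - 1)) == 0;
}
```

Calling toy_pick_address() with a file that uses toy_ovl_fops and a hint
that is page-aligned but not PMD-aligned returns the hint unchanged. The
same hint with toy_real_fops is rounded up to the next 2 MiB boundary.
That difference is the alignment madvise(MADV_COLLAPSE) depends on in the
report above.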