Message-ID: <041dcc1a-0630-27b9-661b-8c64a3775426@huawei.com>
Date: Fri, 6 Dec 2024 11:35:08 +0800
Subject: Re: [PATCH -next] ovl: respect underlying filesystem's get_unmapped_area()
From: Jinjiang Tu
To: Lorenzo Stoakes
CC: Matthew Wilcox
References: <20241205143038.3260233-1-tujinjiang@huawei.com> <69b72e3d-b101-4641-9ce5-51346c93a98d@lucifer.local>
In-Reply-To: <69b72e3d-b101-4641-9ce5-51346c93a98d@lucifer.local>

On 2024/12/5 23:04, Lorenzo Stoakes wrote:
> + Matthew for large folio aspect
>
> On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote:
>> During our tests in containers, there is a read-only file (i.e., shared
>> libraries) in the overlayfs filesystem, and the underlying filesystem is
>> ext4, which supports large folio.
>> We mmap the file with PROT_READ prot,
>> and then call madvise(MADV_COLLAPSE) for it. However, the madvise call
>> fails and returns EINVAL.
>>
>> The reason is that the mapping address isn't aligned to PMD size. Since
>> overlayfs doesn't support large folio, __get_unmapped_area() doesn't call
>> thp_get_unmapped_area() to get a THP aligned address.
>>
>> To fix it, call get_unmapped_area() with the realfile.
> Isn't the correct solution to get overlayfs to support large folios?
>
>> Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should
>> export get_unmapped_area().
> Yeah, not in favour of this at all. This is an internal implementation
> detail. It seems like you're trying to hack your way into avoiding
> providing support for large folios and to hand it off to the underlying
> file system.
>
> Again, why don't you just support large folios in overlayfs?
>
> Literally no other file system or driver appears to make use of this
> directly in this manner.
>
> And there's absolutely no way this should be exported non-GPL as if it were
> unavoidable core functionality that everyone needs. Only you seem to...
>
>> Signed-off-by: Jinjiang Tu
>> ---
>>  fs/overlayfs/file.c | 20 ++++++++++++++++++++
>>  mm/mmap.c           |  1 +
>>  2 files changed, 21 insertions(+)
>>
>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
>> index 969b458100fe..d0dcf675ebe8 100644
>> --- a/fs/overlayfs/file.c
>> +++ b/fs/overlayfs/file.c
>> @@ -653,6 +653,25 @@ static int ovl_flush(struct file *file, fl_owner_t id)
>>  	return err;
>>  }
>>
>> +static unsigned long ovl_get_unmapped_area(struct file *file,
>> +		unsigned long addr, unsigned long len, unsigned long pgoff,
>> +		unsigned long flags)
>> +{
>> +	struct file *realfile;
>> +	const struct cred *old_cred;
>> +	unsigned long ret;
>> +
>> +	realfile = ovl_real_file(file);
>> +	if (IS_ERR(realfile))
>> +		return PTR_ERR(realfile);
>> +
>> +	old_cred = ovl_override_creds(file_inode(file)->i_sb);
>> +	ret = get_unmapped_area(realfile, addr, len, pgoff, flags);
>> +	ovl_revert_creds(old_cred);
> Why are you overriding credentials, then reinstating them here? That
> seems... iffy? I knew nothing about overlayfs so this may just be a
> misunderstanding...

I followed the other file operations in overlayfs (e.g., ovl_fallocate(),
backing_file_mmap()). Since get_unmapped_area() performs security-related
operations (e.g., security_mmap_addr()), we should call it with the
credentials of the underlying file.

>
>> +
>> +	return ret;
>> +}
>> +
>>  const struct file_operations ovl_file_operations = {
>>  	.open = ovl_open,
>>  	.release = ovl_release,
>> @@ -661,6 +680,7 @@ const struct file_operations ovl_file_operations = {
>>  	.write_iter = ovl_write_iter,
>>  	.fsync = ovl_fsync,
>>  	.mmap = ovl_mmap,
>> +	.get_unmapped_area = ovl_get_unmapped_area,
>>  	.fallocate = ovl_fallocate,
>>  	.fadvise = ovl_fadvise,
>>  	.flush = ovl_flush,
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 16f8e8be01f8..60eb1ff7c9a8 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -913,6 +913,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>>  	error = security_mmap_addr(addr);
>>  	return error ? error : addr;
>>  }
>> +EXPORT_SYMBOL(__get_unmapped_area);
> We'll need a VERY good reason to export this internal implementation
> detail, and if that were provided we'd need a VERY good reason for it not
> to be GPL.
>
> This just seems to be a cheap way of invoking (),
> maybe, if it is being used by the underlying file system.

But the underlying filesystem may not support large folios. In that case,
the mmap address doesn't need to be aligned to the THP size.

>
> And again... why not just add large folio support? We can't just take a
> hack here.
>
>>  unsigned long
>>  mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
>> --
>> 2.34.1
>>
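
For reference, the failure described in the quoted commit message (mmap with PROT_READ, then madvise(MADV_COLLAPSE) returning EINVAL on a mapping that is not PMD-aligned) can be checked from userspace with a sketch along these lines. This is not the test that was actually run. The helper names, the MAP_PRIVATE flag, and the 2 MiB PMD size (x86-64 with 4 KiB pages) are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB PMD size (x86-64 with 4 KiB base pages). */
#define ASSUMED_PMD_SIZE (2UL << 20)

/* MADV_COLLAPSE was added in Linux 6.1; older libc headers may lack it. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/* Hypothetical helper: 1 if addr is a multiple of the assumed PMD size. */
int is_pmd_aligned(uintptr_t addr)
{
	return (addr & (ASSUMED_PMD_SIZE - 1)) == 0;
}

/*
 * Hypothetical helper: map `len` bytes of `path` read-only and ask the
 * kernel to collapse the range into huge pages.
 *
 * Returns 0 on success or a negative errno value on failure.
 * *aligned_out is set only if mmap() succeeded, and reports whether the
 * address chosen by the kernel is PMD-aligned.
 */
int try_collapse(const char *path, size_t len, int *aligned_out)
{
	void *addr;
	int err = 0;
	int fd;

	assert(aligned_out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* MAP_PRIVATE is an assumption; the report only says PROT_READ. */
	addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	if (addr == MAP_FAILED) {
		err = -errno;
		close(fd);
		return err;
	}
	close(fd);	/* the mapping stays valid after close() */

	*aligned_out = is_pmd_aligned((uintptr_t)addr);

	if (madvise(addr, len, MADV_COLLAPSE) != 0)
		err = -errno;

	munmap(addr, len);
	return err;
}
```

Calling try_collapse() on a shared library that lives on overlayfs and again on the same file accessed directly on ext4, then comparing the return value and *aligned_out, should show whether the address alignment is what differs between the two cases.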