From: Mike Kravetz <mike.kravetz@oracle.com>
To: Yufen Yu <yuyufen@huawei.com>,
linux-mm@kvack.org, linux-kernel <linux-kernel@vger.kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2] hugetlbfs: fix memory leak for resv_map
Date: Thu, 7 Mar 2019 15:50:55 -0800 [thread overview]
Message-ID: <6aecc2e3-030b-5a3f-2fee-14ee90a47f5a@oracle.com> (raw)
In-Reply-To: <ac030f5b-3d9c-9a71-bd39-1c1f707bc931@oracle.com>
Adding others on Cc to see if they have comments or opinions.
On 3/6/19 3:52 PM, Mike Kravetz wrote:
> On 3/5/19 10:10 PM, Yufen Yu wrote:
>> When .mknod create a block device file in hugetlbfs, it will
>> allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
>> For now, inode->i_mapping->private_data is used to point the resv_map.
>> However, when open the device, bd_acquire() will set i_mapping as
>> bd_inode->imapping, result in resv_map memory leak.
>>
>> We fix it by waiting until a call to hugetlb_reserve_pages() to allocate
>> the inode specific resv_map. We could then remove the resv_map allocation
>> at inode creation time.
>>
>> Programs to reproduce:
>> mount -t hugetlbfs nodev hugetlbfs
>> mknod hugetlbfs/dev b 0 0
>> exec 30<> hugetlbfs/dev
>> umount hugetlbfs/
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>
> Thank you. That is the approach I had in mind.
>
> Unfortunately, this patch causes several regressions in the libhugetlbfs
> test suite. I have not debugged to determine exact cause.
>
> I was unsure about one thing with this approach. We set
> inode->i_mapping->private_data while holding the inode lock, so there
> should be no problem there. However, we access inode_resv_map() in the
> page fault path without the inode lock. The page fault path should get
> NULL or a resv_map. I just wonder if there may be some races where the
> fault path may still be seeing NULL.
>
> I can do more debug, but it will take a couple days as I am busy with
> other things right now.
My apologies. Calling resv_map_alloc() only from hugetlb_reserve_pages()
is not going to work. The reason why is that the reserv_map is used to track
page allocations even if there are not reservations. So, if reservations
are not created other huge page accounting is impacted.
Sorry for suggesting that approach.
As mentioned, I do not like your original approach as it adds an extra word
to every hugetlbfs inode. How about something like the following which
only adds the resv_map to inodes which can have associated page allocations?
I have only done limited regression testing with this.
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Thu, 7 Mar 2019 15:37:31 -0800
Subject: [PATCH] hugetlbfs: only allocate reserve map for inodes that can
allocate pages
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
fs/hugetlbfs/inode.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a7fa037b876b..a3a3d256fb0e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -741,11 +741,17 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
umode_t mode, dev_t dev)
{
struct inode *inode;
- struct resv_map *resv_map;
+ struct resv_map *resv_map = NULL;
- resv_map = resv_map_alloc();
- if (!resv_map)
- return NULL;
+ /*
+ * Reserve maps are only needed for inodes that can have associated
+ * page allocations.
+ */
+ if (S_ISREG(mode) || S_ISLNK(mode)) {
+ resv_map = resv_map_alloc();
+ if (!resv_map)
+ return NULL;
+ }
inode = new_inode(sb);
if (inode) {
@@ -780,8 +786,10 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
break;
}
lockdep_annotate_inode_mutex_key(inode);
- } else
- kref_put(&resv_map->refs, resv_map_release);
+ } else {
+ if (resv_map)
+ kref_put(&resv_map->refs, resv_map_release);
+ }
return inode;
}
--
2.17.2
prev parent reply other threads:[~2019-03-07 23:51 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-06 6:10 Yufen Yu
2019-03-06 23:52 ` Mike Kravetz
2019-03-07 23:50 ` Mike Kravetz [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6aecc2e3-030b-5a3f-2fee-14ee90a47f5a@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=yuyufen@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox