linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>, Yufen Yu <yuyufen@huawei.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info
Date: Mon, 15 Apr 2019 23:59:44 +0000	[thread overview]
Message-ID: <20190415235946.GA4465@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <f063c3e7-1b37-7592-14c2-78b494dbd825@oracle.com>

On Mon, Apr 15, 2019 at 10:11:39AM -0700, Mike Kravetz wrote:
> On 4/15/19 2:15 AM, Michal Hocko wrote:
> > On Mon 15-04-19 06:16:15, Naoya Horiguchi wrote:
> >> On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
> >>> On 4/11/19 9:02 PM, Yufen Yu wrote:
> >>>> Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> >>> ...
> >>>> However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
> >>>> free or modify i_mapping->private_data that is owned by bdev inode,
> >>>> which is not expected!
> >>> ...
> >>>> We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
> >>>> be more reasonable.
> >>>
> >>> Your patches force me to consider these potential issues.  Thank you!
> >>>
> >>> The root of all these problems (including the original leak) is that the
> >>> open of a block special inode will result in bd_acquire() overwriting the
> >>> value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
> >>> resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
> >>> not free the initially allocated resv_map.  In addition, when the
> >>> inode is evicted/destroyed inode->i_mapping may point to an address space
> >>> not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
> >>> points to hugetlbfs inode address space at evict time, there may be bad
> >>> data references or worse.
> >>
> >> Let me ask a kind of elementary question: is there any good reason/purpose
> >> to create and use block special files on hugetlbfs?  I never heard about
> >> such usecases.
> 
> I am not aware of this as a common use case.  Yufen Yu may be able to provide
> more details about how the issue was discovered.  My guess is that it was
> discovered via code inspection.
> 
> >>                 I guess that the conflict of the usage of ->i_mapping is
> >> discovered recently and that's because block special files on hugetlbfs are
> >> just not considered until recently or well defined.  So I think that we might
> >> be better to begin with defining it first.
> 
> Unless I am mistaken, this is just like creating a device special file
> in any other filesystem.  Correct?  hugetlbfs is just some place for the
> inode/file to reside.  What happens when you open/ioctl/close/etc the file
> is really dependent on the vfs layer and underlying driver.
> 

OK. Generally speaking, "special files just work even on hugetlbfs" sounds
fine for me if it properly works.

> > A absolutely agree. Hugetlbfs is overly complicated even without that.
> > So if this is merely "we have tried it and it has blown up" kinda thing
> > then just refuse the create blockdev files or document it as undefined.
> > You need a root to do so anyway.
> 
> Can we just refuse to create device special files in hugetlbfs?  Do we need
> to worry about breaking any potential users?  I honestly do not know if anyone
> does this today.  However, if they did I believe things would "just work".
> The only known issue is leaking a resv_map structure when the inode is
> destroyed.  I doubt anyone would notice that leak today.

Thanks for explanation, so that's unclear now. 

> 
> Let me do a little more research.  I think this can all be cleaned up by
> making hugetlbfs always operate on the address space embedded in the inode.
> If nothing else, a change or explanation should be added as to why most code
> operates on inode->mapping and one place operates on &inode->i_data.

Sounds nice, thank you.

(Just for sharing point, not intending to block the fix ...)
My remaining concern is that this problem might not be hugetlbfs specific,
because what triggers the issue seems to be the usage of inode->i_mapping.
bd_acquire() are callable from any filesystem, so I'm wondering whether we
have something to generally prevent this kind of issue?

Thanks,
Naoya Horiguchi

  reply	other threads:[~2019-04-16  0:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12  4:02 Yufen Yu
2019-04-12 23:40 ` Mike Kravetz
2019-04-13 11:57   ` yuyufen
2019-04-15  6:16   ` Naoya Horiguchi
2019-04-15  9:15     ` Michal Hocko
2019-04-15 17:11       ` Mike Kravetz
2019-04-15 23:59         ` Naoya Horiguchi [this message]
2019-04-16  0:37           ` Mike Kravetz
2019-04-16  6:50         ` Michal Hocko
2019-04-19 20:44           ` [PATCH] hugetlbfs: always use address space in inode for resv_map pointer Mike Kravetz
2019-05-08  7:10             ` yuyufen
2019-05-08 20:16               ` Mike Kravetz
2019-05-09 23:11                 ` Andrew Morton
2019-05-09 23:32                   ` Mike Kravetz
2019-04-16 12:57         ` [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info yuyufen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190415235946.GA4465@hori.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=yuyufen@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox