Re: [PATCH 2/3] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mike Kravetz <mike.kravetz@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Davidlohr Bueso <dave@stgolabs.net>
Subject: Re: [PATCH 2/3] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch
Date: Mon, 19 Oct 2015 18:41:36 -0700	[thread overview]
Message-ID: <56259BD0.7060307@oracle.com> (raw)
In-Reply-To: <20151019161642.68e787103cacb613d372b5c4@linux-foundation.org>

On 10/19/2015 04:16 PM, Andrew Morton wrote:
> On Fri, 16 Oct 2015 15:08:29 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> When performing a fallocate hole punch, set up a hugetlb_falloc struct
>> and make i_private point to it.  i_private will point to this struct for
>> the duration of the operation.  At the end of the operation, wake up
>> anyone who faulted on the hole and is on the waitq.
>>
>> ...
>>
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -507,7 +507,9 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>  {
>>  	struct hstate *h = hstate_inode(inode);
>>  	loff_t hpage_size = huge_page_size(h);
>> +	unsigned long hpage_shift = huge_page_shift(h);
>>  	loff_t hole_start, hole_end;
>> +	struct hugetlb_falloc hugetlb_falloc;
>>  
>>  	/*
>>  	 * For hole punch round up the beginning offset of the hole and
>> @@ -518,8 +520,23 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>  
>>  	if (hole_end > hole_start) {
>>  		struct address_space *mapping = inode->i_mapping;
>> +		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(hugetlb_falloc_waitq);
>> +
>> +		/*
>> +		 * Page faults on the area to be hole punched must be
>> +		 * stopped during the operation.  Initialize struct and
>> +		 * have inode->i_private point to it.
>> +		 */
>> +		hugetlb_falloc.waitq = &hugetlb_falloc_waitq;
>> +		hugetlb_falloc.start = hole_start >> hpage_shift;
>> +		hugetlb_falloc.end = hole_end >> hpage_shift;
> 
> This is a bit neater:
> 
> --- a/fs/hugetlbfs/inode.c~mm-hugetlb-setup-hugetlb_falloc-during-fallocate-hole-punch-fix
> +++ a/fs/hugetlbfs/inode.c
> @@ -509,7 +509,6 @@ static long hugetlbfs_punch_hole(struct
>  	loff_t hpage_size = huge_page_size(h);
>  	unsigned long hpage_shift = huge_page_shift(h);
>  	loff_t hole_start, hole_end;
> -	struct hugetlb_falloc hugetlb_falloc;
>  
>  	/*
>  	 * For hole punch round up the beginning offset of the hole and
> @@ -521,15 +520,16 @@ static long hugetlbfs_punch_hole(struct
>  	if (hole_end > hole_start) {
>  		struct address_space *mapping = inode->i_mapping;
>  		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(hugetlb_falloc_waitq);
> -
>  		/*
> -		 * Page faults on the area to be hole punched must be
> -		 * stopped during the operation.  Initialize struct and
> -		 * have inode->i_private point to it.
> +		 * Page faults on the area to be hole punched must be stopped
> +		 * during the operation.  Initialize struct and have
> +		 * inode->i_private point to it.
>  		 */
> -		hugetlb_falloc.waitq = &hugetlb_falloc_waitq;
> -		hugetlb_falloc.start = hole_start >> hpage_shift;
> -		hugetlb_falloc.end = hole_end >> hpage_shift;
> +		struct hugetlb_falloc hugetlb_falloc = {
> +			.waitq = &hugetlb_falloc_waitq,
> +			.start = hole_start >> hpage_shift,
> +			.end = hole_end >> hpage_shift
> +		};
>  
>  		mutex_lock(&inode->i_mutex);
>  
> 

Thanks!

>>  		mutex_lock(&inode->i_mutex);
>> +
>> +		spin_lock(&inode->i_lock);
>> +		inode->i_private = &hugetlb_falloc;
>> +		spin_unlock(&inode->i_lock);
> 
> Locking around a single atomic assignment is a bit peculiar.  I can
> kinda see that it kinda protects the logic in hugetlb_fault(), but I
> would like to hear (in comment form) your description of how this logic
> works?

To be honest, this code/scheme was copied from shmem as it addresses
the same situation there.  I did not notice how strange this looks until
you pointed it out.  At first glance, the locking does appear to be
unnecessary.  The fault code initially checks this value outside the
lock.  However, the fault code (on another CPU) will take the lock
and access values within the structure.  Without the locking or some other
type of memory barrier here, there is no guarantee that the structure
will be initialized before setting i_private.  So, the faulting code
could see invalid values in the structure.

Hugh, is that accurate?  You provided the shmem code.

-- 
Mike Kravetz

>>  		i_mmap_lock_write(mapping);
>>  		if (!RB_EMPTY_ROOT(&mapping->i_mmap))
>>  			hugetlb_vmdelete_list(&mapping->i_mmap,
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-10-20  1:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-16 22:08 [PATCH 0/3] hugetlbfs fallocate hole punch race with page faults Mike Kravetz
2015-10-16 22:08 ` [PATCH 1/3] mm/hugetlb: Define hugetlb_falloc structure for hole punch race Mike Kravetz
2015-10-16 22:08 ` [PATCH 2/3] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch Mike Kravetz
2015-10-19 23:16   ` Andrew Morton
2015-10-20  1:41     ` Mike Kravetz [this message]
2015-10-20  2:22       ` Hugh Dickins
2015-10-20  3:12         ` Mike Kravetz
2015-10-16 22:08 ` [PATCH 3/3] mm/hugetlb: page faults check for fallocate hole punch in progress and wait Mike Kravetz
2015-10-19 23:18 ` [PATCH 0/3] hugetlbfs fallocate hole punch race with page faults Andrew Morton
2015-10-20  1:54   ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56259BD0.7060307@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox