Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified

From: Yang Shi <yang.shi@linux.alibaba.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: chrubis@suse.cz, vbabka@suse.cz, kirill@shutemov.name,
	akpm@linux-foundation.org, stable@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
Date: Wed, 20 Mar 2019 11:31:50 -0700
Message-ID: <3c880e88-6eb7-cd6d-fbf3-394b89355e10@linux.alibaba.com>
In-Reply-To: <20190320081643.3c4m5tec5vx653sn@d104.suse.de>



On 3/20/19 1:16 AM, Oscar Salvador wrote:
> On Wed, Mar 20, 2019 at 02:35:56AM +0800, Yang Shi wrote:
>> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
>> Reported-by: Cyril Hrubis <chrubis@suse.cz>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: stable@vger.kernel.org
>> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Hi Yang, thanks for the patch.
>
> Some observations below.
>
>>   	}
>>   	page = pmd_page(*pmd);
>> @@ -473,8 +480,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>>   	ret = 1;
>>   	flags = qp->flags;
>>   	/* go to thp migration */
>> -	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
>> +	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
>> +		if (!vma_migratable(walk->vma)) {
>> +			ret = -EIO;
>> +			goto unlock;
>> +		}
>> +
>>   		migrate_page_add(page, qp->pagelist, flags);
>> +	} else
>> +		ret = -EIO;
> 	if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
> 	    !vma_migratable(walk->vma)) {
> 		ret = -EIO;
> 		goto unlock;
> 	}
>
> 	migrate_page_add(page, qp->pagelist, flags);
> unlock:
> 	spin_unlock(ptl);
> out:
> 	return ret;
>
> seems more clean to me?

Yes, that looks cleaner.

>
>
>>   unlock:
>>   	spin_unlock(ptl);
>>   out:
>> @@ -499,8 +513,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>>   	ptl = pmd_trans_huge_lock(pmd, vma);
>>   	if (ptl) {
>>   		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
>> -		if (ret)
>> +		if (ret > 0)
>>   			return 0;
>> +		else if (ret < 0)
>> +			return ret;
> I would go with the following, but that's a matter of taste I guess.
>
> if (ret < 0)
> 	return ret;
> else
> 	return 0;

No, this is not correct. queue_pages_pmd() may return 0, which means the
THP was split. In that case the code should fall through to the pte loop
instead of returning.
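
To make the three cases concrete, here is a minimal userspace sketch of
the caller logic. handle_pmd_result_demo() and its walk_ptes out-parameter
are hypothetical names for illustration, not kernel code; only the return
convention (positive, zero, negative) is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the caller in queue_pages_pte_range().
 * pmd_ret stands in for the value returned by queue_pages_pmd():
 *   > 0  the huge page was handled as a whole
 *   = 0  the THP was split, so the ptes still need to be walked
 *   < 0  error, e.g. -EIO
 * *walk_ptes is set to 1 only when the caller must fall through.
 */
int handle_pmd_result_demo(int pmd_ret, int *walk_ptes)
{
	assert(walk_ptes != NULL);

	*walk_ptes = 0;
	if (pmd_ret > 0)
		return 0;		/* done with this pmd */
	if (pmd_ret < 0)
		return pmd_ret;		/* propagate the error */

	*walk_ptes = 1;			/* split THP: go on to the pte loop */
	return 0;
}
```

With a plain "if (ret < 0) return ret; else return 0;" the zero case
would return early and the ptes of the split THP would never be examined.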

>
>>   	}
>>   
>>   	if (pmd_trans_unstable(pmd))
>> @@ -521,11 +537,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>>   			continue;
>>   		if (!queue_pages_required(page, qp))
>>   			continue;
>> -		migrate_page_add(page, qp->pagelist, flags);
>> +		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
>> +			if (!vma_migratable(vma))
>> +				break;
>> +			migrate_page_add(page, qp->pagelist, flags);
>> +		} else
>> +			break;
> I might be missing something, but AFAICS neither vma nor flags is going to change
> while we are in queue_pages_pte_range(), so, could not we move the check just
> above the loop?
> In that way, 1) we only perform the check once and 2) if we enter the loop
> we know that we are going to do some work, so, something like:
>
> index af171ccb56a2..7c0e44389826 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -487,6 +487,9 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>          if (pmd_trans_unstable(pmd))
>                  return 0;
>   
> +       if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || !vma_migratable(vma))
> +               return -EIO;

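That does not look correct to me. We need to check whether there is an
existing page on a node that the policy does not allow. This is what
queue_pages_required() does. Returning -EIO before the loop would fail
the call even when every page already sits on an allowed node.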
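
A rough userspace model of the loop may help show why the check has to
come after queue_pages_required(). Everything here is a simplified
assumption: queue_range_demo(), DEMO_MF_MOVE, the page_node array and the
allowed_mask bitmask are hypothetical stand-ins, and node ids are assumed
to be smaller than the number of bits in an unsigned long.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL). */
#define DEMO_MF_MOVE 0x1u

/*
 * Hypothetical model of the pte loop in the patch.
 * page_node[i] is the node holding page i; bit n of allowed_mask is set
 * when node n is allowed by the policy. *queued counts the pages that
 * would have been passed to migrate_page_add().
 */
int queue_range_demo(const int *page_node, size_t npages,
		     unsigned long allowed_mask, unsigned int flags,
		     int migratable, size_t *queued)
{
	size_t i;

	assert(queued != NULL);
	*queued = 0;

	for (i = 0; i < npages; i++) {
		/* Models queue_pages_required(): skip correctly placed pages. */
		if (allowed_mask & (1UL << page_node[i]))
			continue;
		/* A misplaced page exists: it must be movable, or we fail. */
		if (!(flags & DEMO_MF_MOVE) || !migratable)
			break;
		(*queued)++;	/* models migrate_page_add() */
	}

	/* Leaving the loop early means a misplaced page could not be moved. */
	return i != npages ? -EIO : 0;
}
```

In this model a range whose pages are all on allowed nodes returns 0 even
without any move flag, which a check placed before the loop would turn
into -EIO.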

Thanks,
Yang

> +
>          pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
>          for (; addr != end; pte++, addr += PAGE_SIZE) {
>                  if (!pte_present(*pte))
>
>
>>   	}
>>   	pte_unmap_unlock(pte - 1, ptl);
>>   	cond_resched();
>> -	return 0;
>> +	return addr != end ? -EIO : 0;
> If we can do the above, we can leave the return value as it was.
>
