From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org
Cc: 'Dave Hansen' <dave.hansen@linux.intel.com>,
'Naoya Horiguchi' <n-horiguchi@ah.jp.nec.com>,
'David Rientjes' <rientjes@google.com>,
'Hugh Dickins' <hughd@google.com>,
'Davidlohr Bueso' <dave@stgolabs.net>,
'Aneesh Kumar' <aneesh.kumar@linux.vnet.ibm.com>,
'Christoph Hellwig' <hch@infradead.org>
Subject: Re: [PATCH 02/10] mm/hugetlb: add region_del() to delete a specific range of entries
Date: Fri, 03 Jul 2015 10:22:31 -0700 [thread overview]
Message-ID: <5596C4D7.6010605@oracle.com> (raw)
In-Reply-To: <011301d0b560$c0ffc5a0$42ff50e0$@alibaba-inc.com>
On 07/03/2015 12:20 AM, Hillf Danton wrote:
>> fallocate hole punch will want to remove a specific range of pages.
>> The existing region_truncate() routine deletes all region/reserve
>> map entries after a specified offset. region_del() will provide
>> this same functionality if the end of region is specified as -1.
>> Hence, region_del() can replace region_truncate().
>>
>> Unlike region_truncate(), region_del() can return an error in the
>> rare case where it can not allocate memory for a region descriptor.
>> This ONLY happens in the case where an existing region must be split.
>> Current callers passing -1 as end of range will never experience
>> this error and do not need to deal with error handling. Future
>> callers of region_del() (such as fallocate hole punch) will need to
>> handle this error.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> mm/hugetlb.c | 101 ++++++++++++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 75 insertions(+), 26 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 189c0d7..e8c7f68 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -460,43 +460,92 @@ static void region_abort(struct resv_map *resv, long f, long t)
>> }
>>
>> /*
>> - * Truncate the reserve map at index 'end'. Modify/truncate any
>> - * region which contains end. Delete any regions past end.
>> - * Return the number of huge pages removed from the map.
>> + * Delete the specified range [f, t) from the reserve map. If the
>> + * t parameter is -1, this indicates that ALL regions after f should
>> + * be deleted. Locate the regions which intersect [f, t) and either
>> + * trim, delete or split the existing regions.
>> + *
>> + * Returns the number of huge pages deleted from the reserve map.
>> + * In the normal case, the return value is zero or more. In the
>> + * case where a region must be split, a new region descriptor must
>> + * be allocated. If the allocation fails, -ENOMEM will be returned.
>> + * NOTE: If the parameter t == -1, then we will never split a region
>> + * and possibly return -ENOMEM. Callers specifying t == -1 do not
>> + * need to check for -ENOMEM error.
>> */
>> -static long region_truncate(struct resv_map *resv, long end)
>> +static long region_del(struct resv_map *resv, long f, long t)
>> {
>> struct list_head *head = &resv->regions;
>> struct file_region *rg, *trg;
>> - long chg = 0;
>> + struct file_region *nrg = NULL;
>> + long del = 0;
>>
>> + if (t == -1)
>> + t = LONG_MAX;
>
> Why not use
See below
>> +retry:
>> spin_lock(&resv->lock);
>> - /* Locate the region we are either in or before. */
>> - list_for_each_entry(rg, head, link)
>> - if (end <= rg->to)
>> + list_for_each_entry_safe(rg, trg, head, link) {
>> + if (rg->to <= f)
>> + continue;
>> + if (rg->from >= t)
>> break;
>> - if (&rg->link == head)
>> - goto out;
>>
>> - /* If we are in the middle of a region then adjust it. */
>> - if (end > rg->from) {
>> - chg = rg->to - end;
>> - rg->to = end;
>> - rg = list_entry(rg->link.next, typeof(*rg), link);
>> - }
>> + if (f > rg->from && t < rg->to) { /* Must split region */
>> + /*
>> + * Check for an entry in the cache before dropping
>> + * lock and attempting allocation.
>> + */
>> + if (!nrg &&
>> + resv->rgn_cache_count > resv->adds_in_progress) {
>> + nrg = list_first_entry(&resv->rgn_cache,
>> + struct file_region,
>> + link);
>> + list_del(&nrg->link);
>> + resv->rgn_cache_count--;
>> + }
>>
>> - /* Drop any remaining regions. */
>> - list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
>> - if (&rg->link == head)
>> + if (!nrg) {
>> + spin_unlock(&resv->lock);
>> + nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
>> + if (!nrg)
>> + return -ENOMEM;
>> + goto retry;
>> + }
>> +
>> + del += t - f;
>> +
>> + /* New entry for end of split region */
>> + nrg->from = t;
>> + nrg->to = rg->to;
>> + INIT_LIST_HEAD(&nrg->link);
>> +
>> + /* Original entry is trimmed */
>> + rg->to = f;
>> +
>> + list_add(&nrg->link, &rg->link);
>> + nrg = NULL;
>> break;
>> - chg += rg->to - rg->from;
>> - list_del(&rg->link);
>> - kfree(rg);
>> + }
>> +
>> + if (f <= rg->from && t >= rg->to) { /* Remove entire region */
>> + del += rg->to - rg->from;
>> + list_del(&rg->link);
>> + kfree(rg);
>> + continue;
>> + }
>> +
>> + if (f <= rg->from) { /* Trim beginning of region */
>> + del += t - rg->from;
>> + rg->from = t;
>> + } else { /* Trim end of region */
>> + del += rg->to - f;
>> + rg->to = f;
>> + }
>> }
>>
>> -out:
>> spin_unlock(&resv->lock);
>> - return chg;
>> + kfree(nrg);
>> + return del;
>> }
>>
>> /*
>> @@ -647,7 +696,7 @@ void resv_map_release(struct kref *ref)
>> struct file_region *rg, *trg;
>>
>> /* Clear out any active regions before we release the map. */
>> - region_truncate(resv_map, 0);
>> + region_del(resv_map, 0, -1);
>
> LONG_MAX is not selected, why?
I think your question is "Why not use LONG_MAX to indicate all
remaining regions in the map should be deleted (a truncate
operation)?".
The two values passed to the routine are essentially huge page
offsets. My first thought when creating the routine is that a
negative value (such as -1) is an invalid offset and a good
choice to indicate a "special" case operation. LONG_MAX is also
an invalid huge page offset, so it could be used for the same
purpose and avoid the "if (t == -1) t = LONG_MAX" at the
beginning of the routine. I have no objections in changing this
to LONG_MAX is if makes the code simpler/easier to understand.
--
Mike Kravetz
>>
>> /* ... and any entries left in the cache */
>> list_for_each_entry_safe(rg, trg, head, link)
>> @@ -3868,7 +3917,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
>> long gbl_reserve;
>>
>> if (resv_map)
>> - chg = region_truncate(resv_map, offset);
>> + chg = region_del(resv_map, offset, -1);
>> spin_lock(&inode->i_lock);
>> inode->i_blocks -= (blocks_per_huge_page(h) * freed);
>> spin_unlock(&inode->i_lock);
>> --
>> 2.1.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-07-03 17:23 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-02 15:38 [PATCH 00/10] hugetlbfs: add fallocate support Mike Kravetz
2015-07-02 15:38 ` [PATCH 01/10] mm/hugetlb: add cache of descriptors to resv_map for region_add Mike Kravetz
2015-07-03 4:21 ` Hillf Danton
2015-07-03 17:12 ` Mike Kravetz
2015-07-02 15:38 ` [PATCH 02/10] mm/hugetlb: add region_del() to delete a specific range of entries Mike Kravetz
2015-07-03 7:20 ` Hillf Danton
2015-07-03 17:22 ` Mike Kravetz [this message]
2015-07-02 15:38 ` [PATCH 03/10] mm/hugetlb: expose hugetlb fault mutex for use by fallocate Mike Kravetz
2015-07-02 15:38 ` [PATCH 04/10] hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete Mike Kravetz
2015-07-02 15:38 ` [PATCH 05/10] hugetlbfs: truncate_hugepages() takes a range of pages Mike Kravetz
2015-07-02 15:38 ` [PATCH 06/10] mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch Mike Kravetz
2015-07-02 15:38 ` [PATCH 07/10] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate Mike Kravetz
2015-07-02 15:38 ` [PATCH 08/10] hugetlbfs: New huge_add_to_page_cache helper routine Mike Kravetz
2015-07-02 15:38 ` [PATCH 09/10] hugetlbfs: add hugetlbfs_fallocate() Mike Kravetz
2015-07-02 15:38 ` [PATCH 10/10] mm: madvise allow remove operation for hugetlbfs Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5596C4D7.6010605@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=hch@infradead.org \
--cc=hillf.zj@alibaba-inc.com \
--cc=hughd@google.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox