[patch 2/2] htlb forget rss with pt sharing

* [patch 2/2] htlb forget rss with pt sharing
@ 2006-10-03 10:01 Chen, Kenneth W
  0 siblings, 0 replies; 4+ messages in thread
From: Chen, Kenneth W @ 2006-10-03 10:01 UTC (permalink / raw)
  To: 'Hugh Dickins', 'Andrew Morton',
	'Dave McCracken'
  Cc: linux-mm

Imprecise RSS accounting is an irritating side effect of pt sharing.
After consulting with several VM experts, I have tried various methods to
solve that problem: (1) iterate through all mm_structs that share the PT
and increment the count; (2) keep the RSS count in the page table structure
and sum the counts at reporting time.  Neither method yielded a
satisfactory implementation.

Since process RSS accounting is purely informational, I propose we don't
count hugetlb pages at all.  rlimit has an RSS field, though there is
absolutely no enforcement of that limit.  One other method is to account
all RSS at hugetlb mmap time, regardless of whether the pages are faulted
in or not.  I opt for the simplicity of no accounting at all.


Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>

--- ./mm/hugetlb.c.orig	2006-10-03 00:16:38.000000000 -0700
+++ ./mm/hugetlb.c	2006-10-03 00:19:45.000000000 -0700
@@ -344,7 +344,6 @@ int copy_hugetlb_page_range(struct mm_st
 			entry = *src_pte;
 			ptepage = pte_page(entry);
 			get_page(ptepage);
-			add_mm_counter(dst, file_rss, HPAGE_SIZE / PAGE_SIZE);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		}
 		spin_unlock(&src->page_table_lock);
@@ -372,10 +371,6 @@ void __unmap_hugepage_range(struct vm_ar
 
 	INIT_LIST_HEAD(&page_list);
 	spin_lock(&mm->page_table_lock);
-
-	/* Update high watermark before we lower rss */
-	update_hiwater_rss(mm);
-
 	for (address = start; address < end; address += HPAGE_SIZE) {
 		ptep = huge_pte_offset(mm, address);
 		if (!ptep)
@@ -390,9 +385,7 @@ void __unmap_hugepage_range(struct vm_ar
 
 		page = pte_page(pte);
 		list_add(&page->lru, &page_list);
-		add_mm_counter(mm, file_rss, (int) -(HPAGE_SIZE / PAGE_SIZE));
 	}
-
 	spin_unlock(&mm->page_table_lock);
 	flush_tlb_range(vma, start, end);
 	list_for_each_entry_safe(page, tmp, &page_list, lru) {
@@ -507,7 +500,6 @@ retry:
 	if (!pte_none(*ptep))
 		goto backout;
 
-	add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
 	set_huge_pte_at(mm, address, ptep, new_pte);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [patch 2/2] htlb forget rss with pt sharing
@ 2006-10-19 19:12 Chen, Kenneth W
  2006-10-21 15:58 ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Chen, Kenneth W @ 2006-10-19 19:12 UTC (permalink / raw)
  To: 'Hugh Dickins', 'Andrew Morton'; +Cc: linux-mm

Imprecise RSS accounting is an irritating side effect of pt sharing.
After consulting with several VM experts, I have tried various methods to
solve that problem: (1) iterate through all mm_structs that share the PT
and increment the count; (2) keep the RSS count in the page table structure
and sum the counts at reporting time.  Neither method yielded a
satisfactory implementation.

Since process RSS accounting is purely informational, I propose we don't
count hugetlb pages at all.  rlimit has an RSS field, though there is
absolutely no enforcement of that limit.  One other method is to account
all RSS at hugetlb mmap time, regardless of whether the pages are faulted
in or not.  I opt for the simplicity of no accounting at all.


Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>


--- ./mm/hugetlb.c.orig	2006-10-19 10:01:43.000000000 -0700
+++ ./mm/hugetlb.c	2006-10-19 10:02:15.000000000 -0700
@@ -344,7 +344,6 @@ int copy_hugetlb_page_range(struct mm_st
 			entry = *src_pte;
 			ptepage = pte_page(entry);
 			get_page(ptepage);
-			add_mm_counter(dst, file_rss, HPAGE_SIZE / PAGE_SIZE);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		}
 		spin_unlock(&src->page_table_lock);
@@ -372,10 +371,6 @@ void __unmap_hugepage_range(struct vm_ar
 	BUG_ON(end & ~HPAGE_MASK);
 
 	spin_lock(&mm->page_table_lock);
-
-	/* Update high watermark before we lower rss */
-	update_hiwater_rss(mm);
-
 	for (address = start; address < end; address += HPAGE_SIZE) {
 		ptep = huge_pte_offset(mm, address);
 		if (!ptep)
@@ -390,9 +385,7 @@ void __unmap_hugepage_range(struct vm_ar
 
 		page = pte_page(pte);
 		list_add(&page->lru, &page_list);
-		add_mm_counter(mm, file_rss, (int) -(HPAGE_SIZE / PAGE_SIZE));
 	}
-
 	spin_unlock(&mm->page_table_lock);
 	flush_tlb_range(vma, start, end);
 	list_for_each_entry_safe(page, tmp, &page_list, lru) {
@@ -515,7 +508,6 @@ retry:
 	if (!pte_none(*ptep))
 		goto backout;
 
-	add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
 	set_huge_pte_at(mm, address, ptep, new_pte);



* Re: [patch 2/2] htlb forget rss with pt sharing
  2006-10-19 19:12 Chen, Kenneth W
@ 2006-10-21 15:58 ` Peter Zijlstra
  2006-10-22 21:28   ` Chen, Kenneth W
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2006-10-21 15:58 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Hugh Dickins', 'Andrew Morton', linux-mm, arjan

On Thu, 2006-10-19 at 12:12 -0700, Chen, Kenneth W wrote:
> Imprecise RSS accounting is an irritating side effect of pt sharing.
> After consulting with several VM experts, I have tried various methods to
> solve that problem: (1) iterate through all mm_structs that share the PT
> and increment the count; (2) keep the RSS count in the page table structure
> and sum the counts at reporting time.  Neither method yielded a
> satisfactory implementation.
> 
> Since process RSS accounting is purely informational, I propose we don't
> count hugetlb pages at all.  rlimit has an RSS field, though there is
> absolutely no enforcement of that limit.  One other method is to account
> all RSS at hugetlb mmap time, regardless of whether the pages are faulted
> in or not.  I opt for the simplicity of no accounting at all.

I do feel I must object to this. Especially with hugetlb becoming really
accessible through libhugetlbfs etc., I suspect administrators will shortly
be confused about where all their memory went.

Also, as stated earlier, I don't like breaking RSS accounting now, and
then again once we have thought up a valid meaning for the field. You
state correctly that RLIMIT_RSS is currently not enforced, but it's an
active area, in that we do want to enforce it in the near future.

I do grant it's a very hard problem, coming up with a
valid/meaningful/workable definition of RSS, but I dislike this opt-out
of just not counting it at all - and thereby making the effort of
enforcing RSS harder.

Just my 0.02 eurocent ;-)



* RE: [patch 2/2] htlb forget rss with pt sharing
  2006-10-21 15:58 ` Peter Zijlstra
@ 2006-10-22 21:28   ` Chen, Kenneth W
  0 siblings, 0 replies; 4+ messages in thread
From: Chen, Kenneth W @ 2006-10-22 21:28 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Hugh Dickins', 'Andrew Morton', linux-mm, arjan

Peter Zijlstra wrote on Saturday, October 21, 2006 8:59 AM
> On Thu, 2006-10-19 at 12:12 -0700, Chen, Kenneth W wrote:
> > Imprecise RSS accounting is an irritating side effect of pt sharing.
> > After consulting with several VM experts, I have tried various methods to
> > solve that problem: (1) iterate through all mm_structs that share the PT
> > and increment the count; (2) keep the RSS count in the page table structure
> > and sum the counts at reporting time.  Neither method yielded a
> > satisfactory implementation.
> > 
> > Since process RSS accounting is purely informational, I propose we don't
> > count hugetlb pages at all.  rlimit has an RSS field, though there is
> > absolutely no enforcement of that limit.  One other method is to account
> > all RSS at hugetlb mmap time, regardless of whether the pages are faulted
> > in or not.  I opt for the simplicity of no accounting at all.
> 
> I do feel I must object to this. Especially with hugetlb becoming really
> accessible through libhugetlbfs etc., I suspect administrators will shortly
> be confused about where all their memory went.

We have /proc/<pid>/smaps.  All of that information should be there.  It
reminds me, though, that smaps needs a fix for hugetlb areas, as it prints
nothing for a hugetlb vma at the moment.  I will fix that.


> Also, as stated earlier, I don't like breaking RSS accounting now, and
> then again once we have thought up a valid meaning for the field. You
> state correctly that RLIMIT_RSS is currently not enforced, but it's an
> active area, in that we do want to enforce it in the near future.
> 
> I do grant it's a very hard problem, coming up with a
> valid/meaningful/workable definition of RSS, but I dislike this opt-out
> of just not counting it at all - and thereby making the effort of
> enforcing RSS harder.

Hugetlb pages are special: they are reserved up front in a global reservation
pool and are not reclaimable.  From a physical memory resource point of view,
that memory is already consumed regardless of whether anyone is using it.

If the concern is that RSS can be used to control resource allocation, we
can already specify a hugetlbfs size limit, and the sysadmin can enforce that
at mount time.  Combined with the two points mentioned above, I fail to see
what is affected by this patch.

- Ken


