Re: [patch][rfc] 5/5: core remove PageReserved

From: William Lee Irwin III <wli@holomorphy.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Linux Memory Management <linux-mm@kvack.org>,
	Hugh Dickins <hugh@veritas.com>,
	Badari Pulavarty <pbadari@us.ibm.com>
Subject: Re: [patch][rfc] 5/5: core remove PageReserved
Date: Thu, 23 Jun 2005 02:51:53 -0700
Message-ID: <20050623095153.GB3334@holomorphy.com>
In-Reply-To: <42BA5FE8.2060207@yahoo.com.au>

On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> @@ -337,7 +338,7 @@ static inline void get_page(struct page 
>  static inline void put_page(struct page *page)
>  {
> -	if (!PageReserved(page) && put_page_testzero(page))
> +	if (put_page_testzero(page))
>  		__page_cache_release(page);
>  }

No sweep before this? I'm not so sure.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> @@ -514,7 +534,8 @@ int copy_page_range(struct mm_struct *ds
>  	return 0;
>  }
>  
> -static void zap_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> +static void zap_pte_range(struct mmu_gather *tlb,
> +				struct vm_area_struct *vma, pmd_t *pmd,
>  				unsigned long addr, unsigned long end,
>  				struct zap_details *details)
>  {

As exciting as this is, !!(vma->vm_flags & VM_RESERVED) could trivially
go into struct zap_details without excess args or diff.
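
A minimal sketch of that alternative, using mock stand-ins for the kernel
types so it compiles on its own. The `reserved` field and the
`zap_details_set_vma()` helper are hypothetical names, not part of the
posted patch, and the VM_RESERVED value is illustrative. It also assumes
the caller always passes a non-NULL details pointer.

```c
/* Mock stand-ins; the real definitions live in include/linux/mm.h. */
#define VM_RESERVED 0x00080000UL	/* illustrative value */

struct vm_area_struct {
	unsigned long vm_flags;
};

struct zap_details {
	/* existing fields omitted from this sketch */
	unsigned int reserved;	/* hypothetical: 1 if the vma is VM_RESERVED */
};

/*
 * Hypothetical helper: record the flag once, before the page table walk,
 * so zap_pte_range() and friends keep their current argument lists.
 */
static inline void zap_details_set_vma(struct zap_details *details,
				       const struct vm_area_struct *vma)
{
	details->reserved = !!(vma->vm_flags & VM_RESERVED);
}
```

zap_pte_range() would then test details->reserved where the patch tests
vma->vm_flags & VM_RESERVED.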


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> @@ -1225,6 +1251,8 @@ static int do_wp_page(struct mm_struct *
>  	unsigned long pfn = pte_pfn(pte);
>  	pte_t entry;
>  
> +	BUG_ON(vma->vm_flags & VM_RESERVED);
> +
>  	if (unlikely(!pfn_valid(pfn))) {
>  		/*
>  		 * This should really halt the system so it can be debugged or

!pfn_valid(pfn) is banned when !(vma->vm_flags & VM_RESERVED). Here we
BUG_ON the precondition, then handle !pfn_valid(pfn) in the old manner,
even though some new infrastructure has been erected elsewhere in this
patch for reporting such errors.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> @@ -1259,13 +1286,16 @@ static int do_wp_page(struct mm_struct *
>  	/*
>  	 * Ok, we need to copy. Oh, well..
>  	 */
> -	if (!PageReserved(old_page))
> +	if (old_page == ZERO_PAGE(address))
> +		old_page = NULL;
> +	else
>  		page_cache_get(old_page);
> +
>  	spin_unlock(&mm->page_table_lock);
>  
>  	if (unlikely(anon_vma_prepare(vma)))
>  		goto no_new_page;
> -	if (old_page == ZERO_PAGE(address)) {
> +	if (old_page == NULL) {
>  		new_page = alloc_zeroed_user_highpage(vma, address);
>  		if (!new_page)
>  			goto no_new_page;

There are some micro-optimizations mixed in with this and some
subsequent do_wp_page() alterations.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> @@ -1654,7 +1656,7 @@ void __init memmap_init_zone(unsigned lo
>  
>  	for (page = start; page < (start + size); page++) {
>  		set_page_zone(page, NODEZONE(nid, zone));
> -		set_page_count(page, 0);
> +		set_page_count(page, 1);
>  		reset_page_mapcount(page);
>  		SetPageReserved(page);
>  		INIT_LIST_HEAD(&page->lru);

The refcounting and PG_reserved activity in memmap_init_zone() is
superfluous. bootmem.c does all the necessary accounting internally.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> Index: linux-2.6/mm/fremap.c
> ===================================================================
> --- linux-2.6.orig/mm/fremap.c
> +++ linux-2.6/mm/fremap.c
> @@ -29,18 +29,21 @@ static inline void zap_pte(struct mm_str
>  		return;
>  	if (pte_present(pte)) {
>  		unsigned long pfn = pte_pfn(pte);
> +		struct page *page;
>  
>  		flush_cache_page(vma, addr, pfn);
>  		pte = ptep_clear_flush(vma, addr, ptep);
> -		if (pfn_valid(pfn)) {
> -			struct page *page = pfn_to_page(pfn);
> -			if (!PageReserved(page)) {
> -				if (pte_dirty(pte))
> -					set_page_dirty(page);
> -				page_remove_rmap(page);
> -				page_cache_release(page);
> -				dec_mm_counter(mm, rss);
> -			}
> +		if (unlikely(!pfn_valid(pfn))) {
> +			print_invalid_pfn("zap_pte", pte, vma->vm_flags, addr);
> +			return;
> +		}
> +		page = pfn_to_page(pfn);
> +		if (page != ZERO_PAGE(addr)) {
> +			if (pte_dirty(pte))
> +				set_page_dirty(page);
> +			page_remove_rmap(page);
> +			dec_mm_counter(mm, rss);
> +			page_cache_release(page);
>  		}
>  	} else {
>  		if (!pte_file(pte))

There is no error returned here to be handled by the caller.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> @@ -65,6 +68,8 @@ int install_page(struct mm_struct *mm, s
>  	pgd_t *pgd;
>  	pte_t pte_val;
>  
> +	BUG_ON(vma->vm_flags & VM_RESERVED);
> +
>  	pgd = pgd_offset(mm, addr);
>  	spin_lock(&mm->page_table_lock);
>  	
> @@ -122,6 +127,8 @@ int install_file_pte(struct mm_struct *m
>  	pgd_t *pgd;
>  	pte_t pte_val;
>  
> +	BUG_ON(vma->vm_flags & VM_RESERVED);
> +
>  	pgd = pgd_offset(mm, addr);
>  	spin_lock(&mm->page_table_lock);

This has no effect but to artificially constrain the interface. There
is no a priori reason to avoid the use of install_page() to deposit
mappings to normal pages in VM_RESERVED vmas. It's only the reverse
being "banned" here. Similar comments also apply to zap_pte().


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> Index: linux-2.6/drivers/scsi/sg.c
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/sg.c
> +++ linux-2.6/drivers/scsi/sg.c
> @@ -1887,9 +1887,10 @@ st_unmap_user_pages(struct scatterlist *
>  	int i;
>  
>  	for (i=0; i < nr_pages; i++) {
> -		if (dirtied && !PageReserved(sgl[i].page))
> +		if (dirtied)
>  			SetPageDirty(sgl[i].page);
>  		/* unlock_page(sgl[i].page); */
> +		/* FIXME: XXX don't dirty/unmap VM_RESERVED regions? */
>  		/* FIXME: cache flush missing for rw==READ
>  		 * FIXME: call the correct reference counting function
>  		 */

An answer should be devised for this. My numerous SCSI CD-ROM devices
(I have 5 across several different machines of several different arches)
are rather unlikely to be happy with /* FIXME: XXX ... as an answer.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> Index: linux-2.6/drivers/scsi/st.c
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/st.c
> +++ linux-2.6/drivers/scsi/st.c
> @@ -4435,8 +4435,9 @@ static int sgl_unmap_user_pages(struct s
>  	int i;
>  
>  	for (i=0; i < nr_pages; i++) {
> -		if (dirtied && !PageReserved(sgl[i].page))
> +		if (dirtied)
>  			SetPageDirty(sgl[i].page);
> +		/* FIXME: XXX don't dirty/unmap VM_RESERVED regions? */
>  		/* FIXME: cache flush missing for rw==READ
>  		 * FIXME: call the correct reference counting function
>  		 */

Mutatis mutandis for my SCSI tape drive.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> Index: linux-2.6/sound/core/pcm_native.c
> ===================================================================
> --- linux-2.6.orig/sound/core/pcm_native.c
> +++ linux-2.6/sound/core/pcm_native.c
> @@ -2942,8 +2942,7 @@ static struct page * snd_pcm_mmap_status
>  		return NOPAGE_OOM;
>  	runtime = substream->runtime;
>  	page = virt_to_page(runtime->status);
> -	if (!PageReserved(page))
> -		get_page(page);
> +	get_page(page);
>  	if (type)
>  		*type = VM_FAULT_MINOR;
>  	return page;

snd_malloc_pages() marks all pages it allocates PG_reserved. This
merits some commentary, and likely the removal of the superfluous
PG_reserved usage.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> Index: linux-2.6/mm/mmap.c
> ===================================================================
> --- linux-2.6.orig/mm/mmap.c
> +++ linux-2.6/mm/mmap.c
> @@ -1073,6 +1073,17 @@ munmap_back:
>  		error = file->f_op->mmap(file, vma);
>  		if (error)
>  			goto unmap_and_free_vma;
> +		if ((vma->vm_flags & (VM_SHARED | VM_WRITE | VM_RESERVED))
> +						== (VM_WRITE | VM_RESERVED)) {
> +			printk(KERN_WARNING "program %s is using MAP_PRIVATE, "
> +				"PROT_WRITE mmap of VM_RESERVED memory, which "
> +				"is deprecated. Please report this to "
> +				"linux-kernel@vger.kernel.org\n",current->comm);
> +			if (vma->vm_ops && vma->vm_ops->close)
> +				vma->vm_ops->close(vma);
> +			error = -EACCES;
> +			goto unmap_and_free_vma;
> +		}
>  	} else if (vm_flags & VM_SHARED) {
>  		error = shmem_zero_setup(vma);
>  		if (error)

This is user-triggerable where the driver honors mmap() protection
flags blindly.
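
For reference, the mask comparison in the hunk matches only private,
writable mappings of VM_RESERVED vmas, which is exactly what userspace can
request with MAP_PRIVATE and PROT_WRITE. A standalone sketch of the
predicate follows; the helper name is hypothetical and the flag values are
illustrative stand-ins for the ones in include/linux/mm.h.

```c
/* Illustrative flag values; see include/linux/mm.h for the real ones. */
#define VM_WRITE	0x00000002UL
#define VM_SHARED	0x00000008UL
#define VM_RESERVED	0x00080000UL

/*
 * Hypothetical helper mirroring the test added to mm/mmap.c:
 * true when VM_WRITE and VM_RESERVED are set and VM_SHARED is clear.
 */
static inline int private_writable_reserved(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_WRITE | VM_RESERVED))
			== (VM_WRITE | VM_RESERVED);
}
```

A shared writable mapping, or a read-only private one, does not match, so
only the MAP_PRIVATE plus PROT_WRITE case reaches the printk().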


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> Index: linux-2.6/mm/bootmem.c
> ===================================================================
> --- linux-2.6.orig/mm/bootmem.c
> +++ linux-2.6/mm/bootmem.c
> @@ -286,6 +286,7 @@ static unsigned long __init free_all_boo
>  				if (j + 16 < BITS_PER_LONG)
>  					prefetchw(page + j + 16);
>  				__ClearPageReserved(page + j);
> +				set_page_count(page + j, 0);
>  			}
>  			__free_pages(page, order);
>  			i += BITS_PER_LONG;

Ibid. re: bootmem; see the memmap_init_zone() comment above.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> Index: linux-2.6/kernel/power/swsusp.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/swsusp.c
> +++ linux-2.6/kernel/power/swsusp.c
> @@ -433,15 +433,12 @@ static int save_highmem_zone(struct zone
>  			continue;
>  		page = pfn_to_page(pfn);
>  		/*
> +		 * PageReserved(page) -
>  		 * This condition results from rvmalloc() sans vmalloc_32()
>  		 * and architectural memory reservations. This should be
>  		 * corrected eventually when the cases giving rise to this
>  		 * are better understood.
>  		 */
> -		if (PageReserved(page)) {
> -			printk("highmem reserved page?!\n");
> -			continue;
> -		}
>  		BUG_ON(PageNosave(page));
>  		if (PageNosaveFree(page))
>  			continue;

This behavioral change needs to be commented on. It is beneficial in
general, but there are additional difficulties when memory holes are
unintentionally covered by mem_map[]: it's likely to triple-fault on
such holes.


On Thu, Jun 23, 2005 at 05:08:24PM +1000, Nick Piggin wrote:
> @@ -527,13 +524,8 @@ static int saveable(struct zone * zone, 
>  		return 0;
>  
>  	page = pfn_to_page(pfn);
> -	BUG_ON(PageReserved(page) && PageNosave(page));
>  	if (PageNosave(page))
>  		return 0;
> -	if (PageReserved(page) && pfn_is_nosave(pfn)) {
> -		pr_debug("[nosave pfn 0x%lx]", pfn);
> -		return 0;
> -	}
>  	if (PageNosaveFree(page))
>  		return 0;

The pfn_is_nosave() check must stand barring justification of why
unintentionally saving (and hence restoring) the page is tolerable.
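
A rough sketch of what keeping the check could look like once the
PageReserved() half of the condition is gone. Everything here is a mock:
the pfn bounds are made-up values standing in for the real nosave region,
and `saveable_pfn()` is a hypothetical reduction of saveable() to the one
test under discussion.

```c
/* Mock nosave region [begin, end); the values are made up. */
static const unsigned long nosave_begin_pfn = 0x100;
static const unsigned long nosave_end_pfn = 0x104;

/* Mock of pfn_is_nosave(): true for pfns inside the nosave region. */
static inline int pfn_is_nosave(unsigned long pfn)
{
	return pfn >= nosave_begin_pfn && pfn < nosave_end_pfn;
}

/*
 * Hypothetical reduction of saveable(): the nosave test no longer
 * depends on PageReserved(), but it is still applied to every pfn.
 */
static inline int saveable_pfn(unsigned long pfn)
{
	if (pfn_is_nosave(pfn))
		return 0;
	return 1;
}
```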


-- wli