From: Nick Piggin <nickpiggin@yahoo.com.au>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Linux Memory Management <linux-mm@kvack.org>,
Hugh Dickins <hugh@veritas.com>,
Badari Pulavarty <pbadari@us.ibm.com>
Subject: Re: [patch][rfc] 5/5: core remove PageReserved
Date: Fri, 24 Jun 2005 09:21:31 +1000 [thread overview]
Message-ID: <42BB43FB.1060609@yahoo.com.au> (raw)
In-Reply-To: <20050623220812.GD3334@holomorphy.com>
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>
>>>As exciting as this is, !!(vma->vm_flags & VM_RESERVED) could trivially
>>>go into struct zap_details without excess args or diff.
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>Actually, it isn't trivial. I thought of trying that.
>
>
> You're probably thinking of the zap_page_range() vs. unmap_page_range()
> topmost-level entrypoints as being the only places to set the flag in
> details. Unconditionally clobbering the field in details in
> zap_pud_range() should resolve any issues associated with that.
>
> This is obviously a style and/or diff-minimization issue.
>
I'm thinking of exit_mmap and unmap_vmas, that don't pass in a
details field at all, and all the tests for details being marked
unlikely. I ended up thinking it was less ugly this way.
>
> William Lee Irwin III wrote:
>
>>>The refcounting and PG_reserved activity in memmap_init_zone() is
>>>superfluous. bootmem.c does all the necessary accounting internally.
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>Well not superfluous yet. It is kept around to support all the arch
>>code that still uses it (not much, mainly reserved memory reporting).
>
>
> That activity is outside memmap_init_zone(). Specifically, the page
> structures are not used for anything whatsoever before bootmem puts
> them into the buddy allocators. It is not particularly interesting
> or difficult to defer even the initialization to the precise point
> in time bootmem cares to perform buddy bitmap insertion. This goes
> for all fields of struct page. What's notable about PG_reserved and
> refcounts here is that memmap_init_zone() goes about flipping bits one
> way where free_all_bootmem_core() undoes all its work.
>
This patch doesn't care how it works, that would be for a later patch.
>
> William Lee Irwin III wrote:
>
>>>There is no error returned here to be handled by the caller.
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>That's OK, the pte has been cleared. Nothing else we can do.
>
>
> This is called in the interior of a loop, which may be beneficial to
> terminate if this intended semantic is to be enforced. Furthermore, no
> error is propagated to the caller, which is not the desired effect in
> the stated error reporting scheme. So the code is inconsistent with
> explicitly stated intention.
>
No, the error reporting scheme says it doesn't handle any error,
that is all. What we have here in terms of behaviour is exactly
what used to happen, that is - do something saneish on error.
Changing behaviour would be outside the scope of this patch, but
be my guest.
>
> William Lee Irwin III wrote:
>
>>>This has no effect but to artificially constrain the interface. There
>>>is no a priori reason to avoid the use of install_page() to deposit
>>>mappings to normal pages in VM_RESERVED vmas. It's only the reverse
>>>being "banned" here. Similar comments also apply to zap_pte().
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>No, install_page is playing with the page (eg. page_add_file_rmap)
>>which is explicity banned even before my PageReserved removal. It
>>is unclear that this ever safely worked for normal pages, and will
>>hit NULL page dereferences if trying to do it with iomem.
>
>
> You are going on about the fact that install_page() can't be used on
> memory outside mem_map[] as it requires a page structure, and can't be
> used on reserved pages because page_add_file_rmap() will BUG. This case
> is not being discussed.
>
And that it isn't allowed to touch struct page of physical pages
in a VM_RESERVED region.
> The issue at stake is inserting normal pages into a VM_RESERVED vma.
> These will arise as e.g. kernel-allocated buffers managed by normal
> reference counting. remap_pfn_range() can't do it; it refuses to
> operate on "valid memory". install_page() now won't do it; it refuses
> to touch a VM_RESERVED vma. So this creates a giant semantic hole,
> and potentially breaks working code (i.e. if you were going to do
> this you would need not only a replacement but also a sweep to adjust
> all the drivers doing it or prove their nonexistence).
>
I think you'll find that remap_pfn_range will be happy to operate
on valid memory, and that any driver trying to use install_page
on VM_RESERVED probably needs fixing anyway.
>
> William Lee Irwin III wrote:
>
>>>An answer should be devised for this. My numerous SCSI CD-ROM devices
>>>(I have 5 across several different machines of several different arches)
>>>are rather unlikely to be happy with /* FIXME: XXX ... as an answer.
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>The worst it will do is dirty a VM_RESERVED page. So it is going
>>to work unless you're thinking about doing something crazy like
>>mmap /dev/mem and send your CDROM some pages from there. But yeah,
>>I have to find someone who knows what they're doing to look at this.
>
>
> Then you should replace FIXME: XXX with this explanation. By and large
> the presence of "FIXME: XXX" is a sign there is a gigantic hole in the
> code. It should definitely not be done with new code, but rather,
> exclusively confined to documenting discoveries of preexisting brokenness.
> After all, if a patch is introducing broken code, why would we merge it?
> Best to adjust the commentary and avoid that question altogether.
>
We wouldn't merge it. Hence this is an rfc and I explicitly said
it is not for merging.
> There are actually larger questions about this than the reserved page
> handling. If e.g. pagecache pages need to be dirtied the raw bitflag
> toggling is probably not how it should be done.
>
Yep.
>
> William Lee Irwin III wrote:
>
>>>snd_malloc_pages() marks all pages it allocates PG_reserved. This
>>>merits some commentary, and likely the removal of the superfluous
>>>PG_reserved usage.
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>Sure, but not in this patch. The aim here is just to eliminate special
>>casing of refcounting. Other PG_reserved usage can stay around for the
>>moment (and is actually good for catching errors).
>
>
> Unfortunately for this scheme, it's very much a case of putting the
> cart before the horse. PG_reserved is toggled at random in this driver
> after this change, to no useful effect (debugging or otherwise). And
> this really goes for the whole affair. Diddling the core first is just
> going to create bugs. Converting the users first is the way these
> things need to be done. When complete, nothing needs the core flags
> twiddling anymore and you just nuke the flag twiddling from the core.
>
I'm sorry, I don't see how 'diddling' the core will create bugs.
This is a fine way to do it, and "converting" users first (whatever
that means) is not possible because VM_RESERVED handling in core
code is not up to the task of replacing PageReserved without this
patch.
>
> William Lee Irwin III wrote:
>
>>>This is user-triggerable where the driver honors mmap() protection
>>>flags blindly.
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>If the user is allowed write access to VM_RESERVED memory, then I
>>suspect there is a lot worse they can do than flood the log.
>>But the check isn't going to stay around forever.
>
>
> This doesn't really do a whole lot of good for the unwitting user
> who invokes a privileged application relying on such kernel behavior.
> User-visible changes need to be taken on with somewhat more care
> (okay, vastly more, along with long-term backward compatibility).
>
It does a great lot of good, because they can tell us about it
we'll fix it. Search the kernel sources and you'll find other
examples that look almost exactly like this one.
>
> William Lee Irwin III wrote:
>
>>>This behavioral change needs to be commented on. There are some additional
>>>difficulties when memory holes are unintentionally covered by mem_map[];
>>>It is beneficial otherwise. It's likely to triplefault on such holes.
>
>
> On Thu, Jun 23, 2005 at 08:32:05PM +1000, Nick Piggin wrote:
>
>>It seems the author of this code themselves didn't really understand
>>what was going on here, so I'm buggered if I can be bothered :)
>>Remember though, PageReserved can stay around for as long as we need,
>>so this hunk can be trivially reverted if it is an immediate problem.
>
>
> This doesn't really fly. A fair number of drivers are poorly-understood
> and numerous gyrations have to be gone through to avoid breaking their
> possible assumptions until at long last clarifications are made (that
> is, if they're ever made). swsusp is not demotable to below their
> status on a whim.
>
Yep, that's why I'm going to ask some swsusp developers to have
a look at it.
I wouldn't pretend to be able to fix every bug everywhere in the
kernel myself.
Thanks,
Nick
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2005-06-23 23:21 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-06-23 7:05 [patch][rfc] 0/5: " Nick Piggin
2005-06-23 7:06 ` [patch][rfc] 1/5: comment for mm/rmap.c Nick Piggin
2005-06-23 7:06 ` [patch][rfc] 2/5: micro optimisation " Nick Piggin
2005-06-23 7:07 ` [patch][rfc] 3/5: remove atomic bitop when freeing page Nick Piggin
2005-06-23 7:07 ` [patch][rfc] 4/5: remap ZERO_PAGE mappings Nick Piggin
2005-06-23 7:08 ` [patch][rfc] 5/5: core remove PageReserved Nick Piggin
2005-06-23 9:51 ` William Lee Irwin III
2005-06-23 10:32 ` Nick Piggin
2005-06-23 22:08 ` William Lee Irwin III
2005-06-23 23:21 ` Nick Piggin [this message]
2005-06-24 0:59 ` William Lee Irwin III
2005-06-24 1:17 ` Nick Piggin
2005-06-24 1:47 ` Nick Piggin
2005-06-24 1:25 ` Nick Piggin
2005-06-24 4:50 ` Andrew Morton
2005-06-24 8:24 ` William Lee Irwin III
2005-06-26 8:41 ` Nick Piggin
2005-06-23 7:26 ` [patch][rfc] 2/5: micro optimisation for mm/rmap.c William Lee Irwin III
2005-06-23 7:33 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=42BB43FB.1060609@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbadari@us.ibm.com \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox