linux-mm.kvack.org archive mirror
From: Ingo Molnar <mingo@elte.hu>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>, Nick Piggin <nickpiggin@yahoo.com.au>,
	"Martin J. Bligh" <mbligh@mbligh.org>,
	Andrew Morton <akpm@osdl.org>, Linus Torvalds <torvalds@osdl.org>,
	kravetz@us.ibm.com, linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	lhms <lhms-devel@lists.sourceforge.net>,
	Arjan van de Ven <arjanv@infradead.org>
Subject: Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19
Date: Wed, 2 Nov 2005 09:49:46 +0100
Message-ID: <20051102084946.GA3930@elte.hu>
In-Reply-To: <1130858580.14475.98.camel@localhost>

* Dave Hansen <haveblue@us.ibm.com> wrote:

> On Tue, 2005-11-01 at 16:01 +0100, Ingo Molnar wrote:
> > so it's all about expectations: _could_ you reasonably remove a piece of 
> > RAM? Customer will say: "I have stopped all nonessential services, and 
> > free RAM is at 90%, still I cannot remove that piece of faulty RAM, fix 
> > the kernel!".
> 
> That's an excellent example.  Until we have some kind of kernel 
> remapping, breaking the 1:1 kernel virtual mapping, these pages will 
> always exist.  The easiest example of this kind of memory is kernel 
> text.

other examples are open files, dentries, inodes, kernel stacks and 
various other kernel objects, which can become embedded in a generic 
kernel memory zone anywhere, and can become referenced from other 
objects.

The C language we use for the kernel has no way to automatically 
track these links between objects, which makes general purpose memory 
unmapping very hard: each and every pointer would have to be tracked 
explicitly.

Such an 'explicit pointer tracking' approach is not only error-prone 
[the C language offers us no way to _avoid_ direct dereferencing], it's 
also clearly a maintenance nightmare. Code like:

	obj->ptr = obj2;

would have to become something like:

	obj_set(obj_deref(obj, ptr), obj2);

this is only a theoretical thing, it is very clear that such an approach 
is unreadable, unmaintainable and unworkable.
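
[ a minimal userspace sketch of what such explicit tracking could look 
  like, assuming a hypothetical handle table. handle_t, obj_new(), 
  obj_set_ptr() and obj_relocate() are made up for illustration, and the 
  obj_deref() above is simplified to a one-argument lookup. Nothing like 
  this exists in the kernel: ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle table: objects never hold raw pointers to each
 * other, only indices into this table. Not a real kernel interface. */
#define MAX_HANDLES 16

typedef int handle_t;		/* index into handle_table, -1 = none */

struct obj {
	int value;
	handle_t ptr;		/* tracked link instead of 'struct obj *' */
};

static struct obj *handle_table[MAX_HANDLES];
static int nr_handles;

/* Allocate an object and register it in the table. */
static handle_t obj_new(int value)
{
	struct obj *o;

	if (nr_handles >= MAX_HANDLES)
		return -1;
	o = malloc(sizeof(*o));
	if (!o)
		return -1;
	o->value = value;
	o->ptr = -1;
	handle_table[nr_handles] = o;
	return nr_handles++;
}

/* Every single access has to go through this lookup. */
static struct obj *obj_deref(handle_t h)
{
	return handle_table[h];
}

/* Tracked replacement for 'obj->ptr = obj2;'. */
static void obj_set_ptr(handle_t h, handle_t target)
{
	obj_deref(h)->ptr = target;
}

/* Move an object to new memory; only the table entry needs a fixup. */
static int obj_relocate(handle_t h)
{
	struct obj *old = handle_table[h];
	struct obj *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	memcpy(copy, old, sizeof(*copy));
	handle_table[h] = copy;
	free(old);
	return 0;
}

/* Link two objects, relocate the target, then follow the link again. */
static int demo_relocation(void)
{
	handle_t a = obj_new(1);
	handle_t b = obj_new(2);

	obj_set_ptr(a, b);
	if (obj_relocate(b) != 0)
		return -1;
	return obj_deref(obj_deref(a)->ptr)->value;
}
```

[ the relocation only works because nobody kept the result of 
  obj_deref() around. A single cached 'struct obj *' anywhere would 
  dangle after obj_relocate(), and C gives us no way to forbid that. ]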

fixing 1:1 mapping assumptions is a cakewalk in comparison ...

the only sane solution to make generic kernel RAM hot-removable, from a 
conceptual angle, is to use a language for the kernel that supports 
pointer-rewriting, garbage-collection and hence VM-shrinking. I.e. to 
rewrite the kernel in Java, C# or whatever other type-safe language that 
can track pointers. [But possibly not even current Java implementations 
can do this right now, because they currently use faulting methods for 
GC and do not track every pointer, which method is not suitable for 
hot-remove.]

[ C++ might work too, but that needs extensive other changes and a 
  kernel-pointer type that all other pointer types have to be inherited 
  from. No quick & easy void * pointers allowed. Such a restriction is
  possibly unenforceable and thus the solution is unmaintainable. ]

just to state the obvious: while using another programming language for 
the Linux kernel might make sense in the future, the likelihood of that 
happening anytime soon seems quite low =B-)

so i strongly believe that it's plain impossible to do memory hot-unplug 
of generic kernel RAM in a reliable and guaranteed way.

there are other 'hot-' features that might be doable though: 
memory hot-add and memory hot-replace:

- hot-add is relatively easy (still nontrivial) and with discontigmem we 
  have it supported in essence.

- hot-replace becomes possible with the breaking of 1:1 kernel mapping,
  because the totality of kernel RAM does not shrink, so it has no
  impact on the virtual side of kernel memory, it's "just" a replacement
  act on the physical side. It's still not trivial though: if the new
  memory area has a different physical offset (which is likely under
  most hw designs), all physical pointers need tracking and fixups.
  I.e. DMA has to be tracked (iommu-alike approach) or silenced, and
  pagetables may need fixups. Also, if the swapped module involves the
  kernel image itself then "interesting" per-arch things have to be
  done. But in any case, this is a much more limited change than what
  hot-remove of generic kernel RAM necessitates. Hot-replace is what
  fault tolerant systems would need.
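
[ a toy model of the hot-replace case, assuming the 1:1 mapping has 
  already been replaced by a lookup table. phys_mem, vpage_to_frame, 
  toy_pa(), toy_va() and toy_hot_replace() are hypothetical names for 
  illustration only; DMA, pagetable fixups and the per-arch details 
  mentioned above are not modelled: ]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 64		/* tiny pages keep the toy small */
#define NR_FRAMES 8		/* simulated physical frames */
#define NR_VPAGES 4		/* simulated kernel virtual pages */

static unsigned char phys_mem[NR_FRAMES][PAGE_SIZE];

/* virtual page -> physical frame; stands in for the fixed 1:1 offset */
static int vpage_to_frame[NR_VPAGES] = { 0, 1, 2, 3 };

/* Table-lookup stand-in for __pa(): virtual page to physical frame. */
static int toy_pa(int vpage)
{
	return vpage_to_frame[vpage];
}

/* All accesses go through the virtual page number. */
static unsigned char *toy_va(int vpage)
{
	return phys_mem[toy_pa(vpage)];
}

/* Copy the contents to the new frame, then fix up the one mapping. */
static void toy_hot_replace(int vpage, int new_frame)
{
	memcpy(phys_mem[new_frame], phys_mem[toy_pa(vpage)], PAGE_SIZE);
	vpage_to_frame[vpage] = new_frame;
}

/* Returns 1 if the data survives the replacement of its frame. */
static int demo_hot_replace(void)
{
	int old_frame;

	toy_va(2)[0] = 42;
	old_frame = toy_pa(2);
	toy_hot_replace(2, 6);
	memset(phys_mem[old_frame], 0, PAGE_SIZE);	/* old module is gone */

	return toy_va(2)[0] == 42 && toy_pa(2) == 6 && toy_pa(2) != old_frame;
}
```

[ the virtual side never changes here, which is why the total amount of 
  kernel RAM has to stay the same. Anything that remembered the old 
  physical frame number (a device doing DMA, for example) would still 
  need the tracking or silencing described above. ]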

reliable hot-remove of generic kernel RAM is plain impossible even in a 
fully virtualized solution. It's impossible even with maximum hardware 
help. We simply don't have the means to fix up live kernel pointers still 
linked into the removed region, under the C programming model.

the hurdles towards a reliable solution are so incredibly high, that
other solutions _have_ to be considered: restrict the type of RAM that
can be removed, and put it into a separate zone. That solves things
easily: no kernel pointers will be allowed in those zones. It becomes
similar to highmem: various kernel caches can opt-in to be included in
that type of RAM, and the complexity (and maintenance impact) of the
approach can thus be nicely scaled.
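
[ a minimal sketch of the opt-in idea, assuming a toy allocator with 
  two zones. ZONE_KERNEL, ZONE_REMOVABLE, TOY_REMOVABLE_OK and 
  toy_alloc_page() are hypothetical names, not existing kernel 
  interfaces: ]

```c
#include <assert.h>
#include <stddef.h>

enum toy_zone { ZONE_KERNEL, ZONE_REMOVABLE, NR_ZONES };

#define TOY_REMOVABLE_OK 0x1	/* hypothetical opt-in flag */
#define PAGES_PER_ZONE   4

static int zone_used[NR_ZONES];

/* Pick a zone for one page; returns the zone, or -1 if out of memory. */
static int toy_alloc_page(unsigned int flags)
{
	/* only callers that opted in may be placed in the removable zone */
	if ((flags & TOY_REMOVABLE_OK) &&
	    zone_used[ZONE_REMOVABLE] < PAGES_PER_ZONE) {
		zone_used[ZONE_REMOVABLE]++;
		return ZONE_REMOVABLE;
	}
	if (zone_used[ZONE_KERNEL] < PAGES_PER_ZONE) {
		zone_used[ZONE_KERNEL]++;
		return ZONE_KERNEL;
	}
	/* never fall back to the removable zone without the opt-in */
	return -1;
}

/* Returns how many plain allocations ended up in the removable zone. */
static int demo_zones(void)
{
	int i, leaked = 0;

	/* plain kernel allocations: more requests than ZONE_KERNEL holds */
	for (i = 0; i < PAGES_PER_ZONE + 2; i++)
		if (toy_alloc_page(0) == ZONE_REMOVABLE)
			leaked++;

	/* an opted-in cache lands in the removable zone */
	if (toy_alloc_page(TOY_REMOVABLE_OK) != ZONE_REMOVABLE)
		return -1;
	return leaked;
}
```

[ the price is visible in the sketch: plain allocations fail once 
  ZONE_KERNEL is full even though the removable zone still has free 
  pages. That is the same tradeoff highmem makes. ]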

> > > There is also no precedent in existing UNIXes for a 100% solution.
> > 
> > does this have any relevance to the point, other than to prove that it's 
> > a hard problem that we should not pretend to be able to solve, without 
> > seeing a clear path towards a solution?
> 
> Agreed.  It is a hard problem.  One that some other UNIXes have not
> fully solved.
> 
> Here are the steps that I think we need to take.  Do you see any holes
> in their coverage?  Anything that seems infeasible?
> 
> 1. Fragmentation avoidance
>    * by itself, increases likelihood of having an area of memory
>      which might be easily removed
>    * very small (if any) performance overhead
>    * other potential in-kernel users
>    * creates infrastructure to enforce the "hotpluggability" of any
>      particular area of memory.
> 2. Driver APIs
>    * Require that drivers specifically request areas which must
>      retain constant physical addresses
>    * Driver must relinquish control of such areas upon request
>    * Can be worked around by hypervisors
> 3. Break 1:1 Kernel Virtual/Physical Mapping 
>    * In any large area of physical memory we wish to remove, there will
>      likely be very, very few straggler pages, which can not easily be
>      freed.
>    * Kernel will transparently move the contents of these physical pages
>      to new pages, keeping constant virtual addresses.
>    * Negative TLB overhead, as in-kernel large page mappings are broken
>      down into smaller pages.
>    * __{p,v}a() become more expensive, likely a table lookup
> 
> I've already done (3) on a limited basis, in the early days of memory 
> hotplug.  Not the remapping, just breaking the 1:1 assumptions.  It 
> wasn't too horribly painful.

i don't see the most fundamental problem listed: live kernel pointers 
pointing into a generic kernel RAM zone. Removing the 1:1 mapping and 
making the kernel VM space fully virtual will not solve that problem!

let's face it: removal of generic kernel RAM is a hard, and essentially 
unsolvable problem under the current Linux kernel model. It's not just 
the VM itself and 1:1 mappings (which is a nontrivial problem but which 
we can and probably should solve), it boils down to the fundamental 
choice of using C as the language of the kernel!

really, once you accept that, the path out of this mess becomes 'easy': 
we _have to_ compromise on the feature side! And the moment we give up 
the notion of 'generic kernel RAM' and focus on the hot-removability of 
a limited-functionality zone, the complexity of the solution becomes 
three orders of magnitude smaller. No fragmentation avoidance necessary.  
No 'have to handle dozens of very hard problems to become 99% 
functional' issues. Once you make that zone an opt-in thing, it becomes 
much better from a development dynamics point of view as well.

i believe that it's also easier from an emotional point of view: our 
choice to use the C language forces us to abandon the idea of 
hot-removable generic kernel RAM. This is not some borderline decision 
where different people have different judgement - this is a hard, 
almost-mathematical fact that is forced upon us by the laws of physics 
(and/or whatever deity you might believe in). The same kind of law that 
makes comparison-based sorting faster than O(N*log(N)) impossible. No 
amount of hacking will get us past that wall.

	Ingo


