linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Toshi Kani <toshi.kani@hp.com>
To: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	lenb@kernel.org, akpm@linux-foundation.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, bhelgaas@google.com,
	isimatu.yasuaki@jp.fujitsu.com, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, guohanjun@huawei.com, yinghai@kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com
Subject: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
Date: Mon, 04 Feb 2013 09:02:46 -0700	[thread overview]
Message-ID: <1359993766.23410.105.camel@misato.fc.hp.com> (raw)
In-Reply-To: <5192355.CsKHU8mj3W@vostro.rjw.lan>

On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
  :
> > > Yes, but those are just remove events and we can only see how destructive they
> > > were after the removal.  The point is to be able to figure out whether or not
> > > we *want* to do the removal in the first place.
> > 
> > Yes, but, you will always race if you try to test to see if you can shut
> > down a device and then trying to do it.  So walking the bus ahead of
> > time isn't a good idea.
> >
> > And, we really don't have a viable way to recover if disconnect() fails,
> > do we.  What do we do in that situation, restore the other devices we
> > disconnected successfully?  How do we remember/know what they were?
> > 
> > PCI hotplug almost had this same problem until the designers finally
> > realized that they just had to accept the fact that removing a PCI
> > device could either happen by:
> > 	- a user yanking out the device, at which time the OS better
> > 	  clean up properly no matter what happens
> > 	- the user asked nicely to remove a device, and the OS can take
> > 	  as long as it wants to complete that action, including
> > 	  stalling for noticable amounts of time before eventually,
> > 	  always letting the action succeed.
> > 
> > I think the second thing is what you have to do here.  If a user tells
> > the OS it wants to remove these devices, you better do it.  If you
> > can't, because memory is being used by someone else, either move them
> > off, or just hope that nothing bad happens, before the user gets
> > frustrated and yanks out the CPU/memory module themselves physically :)
> 
> Well, that we can't help, but sometimes users really *want* the OS to tell them
> if it is safe to unplug something at this particualr time (think about the
> Windows' "safe remove" feature for USB sticks, for example; that came out of
> users' demand AFAIR).
> 
> So in my opinion it would be good to give them an option to do "safe eject" or
> "forcible eject", whichever they prefer.

For system device hot-plug, it always needs to be "safe eject".  This
feature will be implemented on mission critical servers, which are
managed by professional IT folks.  Crashing a server causes serious
money to the business.

A user yanking out a system device won't happen, and it immediately
crashes the system if it is done.  So, we have nothing to do with this
case.  The 2nd case can hang the operation, waiting forever to proceed,
which is still a serious issue for enterprise customers.


> > > Say you have a computing node which signals a hardware problem in a processor
> > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > may want to eject that package, but you don't want to kill the system this
> > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > That may be costly, however (maybe weeks of computations), so it should be
> > > avoided if possible, but not at the expense of crashing the box if the eject
> > > doesn't work out.
> > 
> > These same "situations" came up for PCI hotplug, and I still say the
> > same resolution there holds true, as described above.  The user wants to
> > remove something, so let them do it.  They always know best, and get mad
> > at us if we think otherwise :)
> 
> Well, not necessarily.  Users sometimes really don't know what they are doing
> and want us to give them a hint.  My opinion is that if we can give them a
> hint, there's no reason not to.
> 
> > What does the ACPI spec say about this type of thing?  Surely the same
> > people that did the PCI Hotplug spec were consulted when doing this part
> > of the spec, right?  Yeah, I know, I can dream...
> 
> It's not very specific (as usual), but it gives hints. :-)
> 
> For example, there is the _OST method (Section 6.3.5 of ACPI 5) that we are
> supposed to use to notify the platform of ejection failures and there are
> status codes like "0x81: Device in use by application" or "0x82: Device busy"
> that can be used in there.  So definitely the authors took ejection failures
> for software-related reasons into consideration.

That is correct.  Also, ACPI spec deliberately does not define
implementation details, so we defined DIG64 hotplug spec below (which I
contributed to the spec.)

http://www.dig64.org/home/DIG64_HPPF_R1_0.pdf

For example, Figure 2 in page 14 states memory hot-remove flow.  The
operation needs to either succeed or fail.  Crash or hang is not an
option.


Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-02-04 16:12 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
2013-01-11 21:23   ` Rafael J. Wysocki
2013-01-14 15:33     ` Toshi Kani
2013-01-14 18:48       ` Rafael J. Wysocki
2013-01-14 19:02         ` Toshi Kani
2013-01-30  4:48           ` Greg KH
2013-01-31  1:15             ` Toshi Kani
2013-01-31  5:24               ` Greg KH
2013-01-31 14:42                 ` Toshi Kani
2013-01-30  4:53   ` Greg KH
2013-01-31  1:46     ` Toshi Kani
2013-01-30  4:58   ` Greg KH
2013-01-31  2:57     ` Toshi Kani
2013-01-31 20:54       ` Rafael J. Wysocki
2013-02-01  1:32         ` Toshi Kani
2013-02-01  7:30           ` Greg KH
2013-02-01 20:40             ` Toshi Kani
2013-02-01 22:21               ` Rafael J. Wysocki
2013-02-01 23:12                 ` Toshi Kani
2013-02-02 15:01               ` Greg KH
2013-02-04  0:28                 ` Toshi Kani
2013-02-04 12:46                   ` Greg KH
2013-02-04 16:46                     ` Toshi Kani
2013-02-04 19:45                       ` Rafael J. Wysocki
2013-02-04 20:59                         ` Toshi Kani
2013-02-04 23:23                           ` Rafael J. Wysocki
2013-02-04 23:33                             ` Toshi Kani
2013-02-01  7:23         ` Greg KH
2013-02-01 22:12           ` Rafael J. Wysocki
2013-02-02 14:58             ` Greg KH
2013-02-02 20:15               ` Rafael J. Wysocki
2013-02-02 22:18                 ` [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework) Rafael J. Wysocki
2013-02-04  1:24                   ` Greg KH
2013-02-04 12:34                     ` Rafael J. Wysocki
2013-02-03 20:44                 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Rafael J. Wysocki
2013-02-04 12:48                   ` Greg KH
2013-02-04 14:21                     ` Rafael J. Wysocki
2013-02-04 14:33                       ` Greg KH
2013-02-04 20:07                         ` Rafael J. Wysocki
2013-02-04 22:13                           ` Toshi Kani
2013-02-04 23:52                             ` Rafael J. Wysocki
2013-02-05  0:04                               ` Greg KH
2013-02-05  1:02                                 ` Rafael J. Wysocki
2013-02-05 11:11                                 ` Rafael J. Wysocki
2013-02-05 18:39                                   ` Greg KH
2013-02-05 21:13                                     ` Rafael J. Wysocki
2013-02-05  0:55                               ` Toshi Kani
2013-02-04 16:19                       ` Toshi Kani
2013-02-04 19:43                         ` Rafael J. Wysocki
2013-02-04  1:23                 ` Greg KH
2013-02-04 13:41                   ` Rafael J. Wysocki
2013-02-04 16:02                     ` Toshi Kani [this message]
2013-02-04 19:48                       ` Rafael J. Wysocki
2013-02-04 19:46                         ` Toshi Kani
2013-02-04 20:12                           ` Rafael J. Wysocki
2013-02-04 20:34                             ` Toshi Kani
2013-02-04 23:19                               ` Rafael J. Wysocki
2013-01-10 23:40 ` [RFC PATCH v2 02/12] ACPI: " Toshi Kani
2013-01-11 21:25   ` Rafael J. Wysocki
2013-01-14 15:53     ` Toshi Kani
2013-01-14 18:47       ` Rafael J. Wysocki
2013-01-14 18:42         ` Toshi Kani
2013-01-14 19:07           ` Rafael J. Wysocki
2013-01-14 19:21             ` Toshi Kani
2013-01-30  4:51               ` Greg KH
2013-01-31  1:38                 ` Toshi Kani
2013-01-14 19:21             ` Greg KH
2013-01-14 19:29               ` Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 03/12] drivers/base: Add " Toshi Kani
2013-01-30  4:54   ` Greg KH
2013-01-31  1:48     ` Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 04/12] cpu: Add cpu hotplug handlers Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 05/12] mm: Add memory " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 06/12] ACPI: Add ACPI bus " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 07/12] ACPI: Add ACPI resource hotplug handler Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 08/12] ACPI: Update processor driver for hotplug framework Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 09/12] ACPI: Update memory " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 10/12] ACPI: Update container " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 11/12] cpu: Update sysfs cpu/online " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 12/12] ACPI: Update sysfs eject " Toshi Kani
2013-01-17  0:50 ` [RFC PATCH v2 00/12] System device hot-plug framework Rafael J. Wysocki
2013-01-17 17:59   ` Toshi Kani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1359993766.23410.105.camel@misato.fc.hp.com \
    --to=toshi.kani@hp.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhelgaas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=guohanjun@huawei.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=jiang.liu@huawei.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=rjw@sisk.pl \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=wency@cn.fujitsu.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox