* [Ksummit-discuss] [topic] Richer internal block API
@ 2014-05-29 17:49 Daniel Phillips
2014-05-29 18:13 ` Greg KH
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2014-05-29 17:49 UTC (permalink / raw)
To: ksummit-discuss, NeilBrown
Hi Neil,
This will be my annual proposal to open a general discussion about
improving the internal block API, to be capable of doing all the things
that the ZFS crowd claim are impossible without rampantly violating
filesystem/raid layering. Attacking this in a storage-specific venue
would also be good, however I view this issue as being at least as
central as a number of topics already raised for general consideration.
Full disclosure dept: I have an agenda. I want to add the equivalent of
Raidz etc to Tux3 without reimplementing a logical volume manager in the
filesystem.
Regards,
Daniel
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-29 17:49 [Ksummit-discuss] [topic] Richer internal block API Daniel Phillips
@ 2014-05-29 18:13 ` Greg KH
2014-05-29 18:13 ` Daniel Phillips
2014-05-30 9:56 ` Lukáš Czerner
0 siblings, 2 replies; 12+ messages in thread
From: Greg KH @ 2014-05-29 18:13 UTC (permalink / raw)
To: Daniel Phillips; +Cc: ksummit-discuss
On Thu, May 29, 2014 at 10:49:13AM -0700, Daniel Phillips wrote:
> Hi Neil,
>
> This will be my annual proposal to open a general discussion about improving
> the internal block API, to be capable of doing all the things that the ZFS
> crowd claim are impossible without rampantly violating filesystem/raid
> layering. Attacking this in a storage-specific venue would also be good,
> however I view this issue as being at least as central as a number of topics
> already raised for general consideration.
Why didn't you bring this up at the filesystem summit a few months ago?
That's the best place for it, not at the kernel summit.
> Full disclosure dept: I have an agenda. I want to add the equivalent of
> Raidz etc to Tux3 without reimplementing a logical volume manager in the
> filesystem.
Like btrfs is doing? :)
greg k-h
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-29 18:13 ` Greg KH
@ 2014-05-29 18:13 ` Daniel Phillips
2014-05-29 18:23 ` Greg KH
2014-05-30 9:56 ` Lukáš Czerner
1 sibling, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2014-05-29 18:13 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss
On 05/29/2014 11:13 AM, Greg KH wrote:
> On Thu, May 29, 2014 at 10:49:13AM -0700, Daniel Phillips wrote:
>> Hi Neil,
>>
>> This will be my annual proposal to open a general discussion about improving
>> the internal block API, to be capable of doing all the things that the ZFS
>> crowd claim are impossible without rampantly violating filesystem/raid
>> layering. Attacking this in a storage-specific venue would also be good,
>> however I view this issue as being at least as central as a number of topics
>> already raised for general consideration.
> Why didn't you bring this up at the filesystem summit a few months ago?
> That's the best place for it, not at the kernel summit.
Sorry, I did not have time to participate this year. I wonder though,
why power management is regarded as a summit-worthy topic, but core
functionality of the block layer is not.
>
>> Full disclosure dept: I have an agenda. I want to add the equivalent of
>> Raidz etc to Tux3 without reimplementing a logical volume manager in the
>> filesystem.
> Like btrfs is doing? :)
>
> greg k-h
Not like btrfs is doing, the opposite really.
Regards,
Daniel
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-29 18:13 ` Daniel Phillips
@ 2014-05-29 18:23 ` Greg KH
2014-05-29 18:43 ` Daniel Phillips
0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2014-05-29 18:23 UTC (permalink / raw)
To: Daniel Phillips; +Cc: ksummit-discuss
On Thu, May 29, 2014 at 11:13:25AM -0700, Daniel Phillips wrote:
> On 05/29/2014 11:13 AM, Greg KH wrote:
> >On Thu, May 29, 2014 at 10:49:13AM -0700, Daniel Phillips wrote:
> >>Hi Neil,
> >>
> >>This will be my annual proposal to open a general discussion about improving
> >>the internal block API, to be capable of doing all the things that the ZFS
> >>crowd claim are impossible without rampantly violating filesystem/raid
> >>layering. Attacking this in a storage-specific venue would also be good,
> >>however I view this issue as being at least as central as a number of topics
> >>already raised for general consideration.
> >Why didn't you bring this up at the filesystem summit a few months ago?
> >That's the best place for it, not at the kernel summit.
> Sorry, I did not have time to participate this year. I wonder though, why
> power management is regarded as a summit-worthy topic, but core
> functionality of the block layer is not.
power management covers the whole tree, the block layer is "just" the
block layer.
> >>Full disclosure dept: I have an agenda. I want to add the equivalent of
> >>Raidz etc to Tux3 without reimplementing a logical volume manager in the
> >>filesystem.
> >Like btrfs is doing? :)
> >
> >greg k-h
> Not like btrfs is doing, the opposite really.
Good, post patches then :)
greg k-h
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-29 18:23 ` Greg KH
@ 2014-05-29 18:43 ` Daniel Phillips
2014-05-29 23:43 ` Greg KH
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2014-05-29 18:43 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss
On 05/29/2014 11:23 AM, Greg KH wrote:
> On Thu, May 29, 2014 at 11:13:25AM -0700, Daniel Phillips wrote:
>> ...I wonder though, why
>> power management is regarded as a summit-worthy topic, but core
>> functionality of the block layer is not.
> power management covers the whole tree, the block layer is "just" the
> block layer.
Power management does not cover more of the tree than the block layer
plus memory management plus filesystem plus vfs do, all of which are
impacted, and all of which raise user visible API questions.
>
>>>> Full disclosure dept: I have an agenda. I want to add the equivalent of
>>>> Raidz etc to Tux3 without reimplementing a logical volume manager in the
>>>> filesystem.
>>> Like btrfs is doing? :)
>>>
>>> greg k-h
>> Not like btrfs is doing, the opposite really.
> Good, post patches then :)
>
> greg k-h
Is that a recommendation to develop a core API extension in a vacuum?
Regards,
Daniel
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-29 18:43 ` Daniel Phillips
@ 2014-05-29 23:43 ` Greg KH
2014-05-31 22:44 ` Daniel Phillips
0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2014-05-29 23:43 UTC (permalink / raw)
To: Daniel Phillips; +Cc: ksummit-discuss
On Thu, May 29, 2014 at 11:43:09AM -0700, Daniel Phillips wrote:
> >> Not like btrfs is doing, the opposite really.
> > Good, post patches then :)
> >
> Is that a recommendation to develop a core API extension in a vacuum?
No, do it like any other core API change: post patches that explain
what you want to do, and people will review them.
Come on, you know how this all works, we don't have to have meetings in
order to do design decisions that are "large".
greg k-h
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-29 18:13 ` Greg KH
2014-05-29 18:13 ` Daniel Phillips
@ 2014-05-30 9:56 ` Lukáš Czerner
1 sibling, 0 replies; 12+ messages in thread
From: Lukáš Czerner @ 2014-05-30 9:56 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss
On Thu, 29 May 2014, Greg KH wrote:
> Date: Thu, 29 May 2014 11:13:19 -0700
> From: Greg KH <greg@kroah.com>
> To: Daniel Phillips <d.phillips@partner.samsung.com>
> Cc: ksummit-discuss@lists.linuxfoundation.org
> Subject: Re: [Ksummit-discuss] [topic] Richer internal block API
>
> On Thu, May 29, 2014 at 10:49:13AM -0700, Daniel Phillips wrote:
> > Hi Neil,
> >
> > This will be my annual proposal to open a general discussion about improving
> > the internal block API, to be capable of doing all the things that the ZFS
> > crowd claim are impossible without rampantly violating filesystem/raid
> > layering. Attacking this in a storage-specific venue would also be good,
> > however I view this issue as being at least as central as a number of topics
> > already raised for general consideration.
>
> Why didn't you bring this up at the filesystem summit a few months ago?
> That's the best place for it, not at the kernel summit.
Actually we've sort-of started the discussion about this topic at
LSF. Dave Chinner was the one who brought this up. The only problem
was that his idea was at a really early stage, and I suppose it still
is because I have not heard about it since then.
But I agree that this kind of discussion is better suited to LSF
than to the kernel summit, since it's much more targeted at block vs.
file system interactions.
>
> > Full disclosure dept: I have an agenda. I want to add the equivalent of
> > Raidz etc to Tux3 without reimplementing a logical volume manager in the
> > filesystem.
>
> Like btrfs is doing? :)
Well, we want exactly what btrfs is _not_ doing :)
-Lukas
>
> greg k-h
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-29 23:43 ` Greg KH
@ 2014-05-31 22:44 ` Daniel Phillips
2014-06-01 2:34 ` Greg KH
2014-06-01 4:31 ` NeilBrown
0 siblings, 2 replies; 12+ messages in thread
From: Daniel Phillips @ 2014-05-31 22:44 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss
On 05/29/2014 04:43 PM, Greg KH wrote:
> ...you know how this all works, we don't have to have meetings in
> order to do design decisions that are "large".
Perhaps there is something wrong with that approach. Certainly in
regards to how to bridge the gap between what we now have for logical
volume support, and what we should have, or what BSD has, that approach
is demonstrably a perennial failure. After all these years, we still
have dm and md as separate islands, no usable snapshotting block device,
and roughly zero interaction between filesystems and volume managers.
The larger issue would be, why is there no design process in Linux for
large design issues? Maybe that is the core topic that is really missing.
Regards,
Daniel
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-31 22:44 ` Daniel Phillips
@ 2014-06-01 2:34 ` Greg KH
2014-06-02 17:33 ` Martin K. Petersen
2014-06-01 4:31 ` NeilBrown
1 sibling, 1 reply; 12+ messages in thread
From: Greg KH @ 2014-06-01 2:34 UTC (permalink / raw)
To: Daniel Phillips; +Cc: ksummit-discuss
On Sat, May 31, 2014 at 03:44:52PM -0700, Daniel Phillips wrote:
> On 05/29/2014 04:43 PM, Greg KH wrote:
> >...you know how this all works, we don't have to have meetings in order to
> >do design decisions that are "large".
>
>
> Perhaps there is something wrong with that approach. Certainly in regards to
> how to bridge the gap between what we now have for logical volume support,
> and what we should have, or what BSD has, that approach is demonstrably a
> perennial failure. After all these years, we still have dm and md as
> separate islands, no usable snapshotting block device, and roughly zero
> interaction between filesystems and volume managers.
People have talked about this for a very long time. I've seen
Neil give numerous presentations about this for what, a decade now?
It must not be important enough for anyone to actually do the work. Or,
more likely, no one has been able to convince a company to sponsor the
work. So perhaps it isn't that major a thing that needs to be
done?
> The larger issue would be, why is there no design process in Linux for
> large design issues?
There is, an "evolutionary" process. If you take a look at a 4 or 5
year old kernel, major things have happened. It's just that if you are
in the middle of it all, it doesn't look like "large" things have
changed.
thanks,
greg k-h
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-05-31 22:44 ` Daniel Phillips
2014-06-01 2:34 ` Greg KH
@ 2014-06-01 4:31 ` NeilBrown
1 sibling, 0 replies; 12+ messages in thread
From: NeilBrown @ 2014-06-01 4:31 UTC (permalink / raw)
To: Daniel Phillips; +Cc: ksummit-discuss
On Sat, 31 May 2014 15:44:52 -0700 Daniel Phillips
<d.phillips@partner.samsung.com> wrote:
> On 05/29/2014 04:43 PM, Greg KH wrote:
> > ...you know how this all works, we don't have to have meetings in
> > order to do design decisions that are "large".
>
>
> Perhaps there is something wrong with that approach. Certainly in
> regards to how to bridge the gap between what we now have for logical
> volume support, and what we should have, or what BSD has, that approach
> is demonstrably a perennial failure. After all these years, we still
> have dm and md as separate islands, no usable snapshotting block device,
> and roughly zero interaction between filesystems and volume managers.
dm-raid.c is a bridge between those islands.
Does dm-thin.c not provide usable snapshots? I admit I haven't looked in
detail.
> The larger issue would be, why is there no design process in Linux for
> large design issues? Maybe that is the core topic that is really missing.
What sort of "design process" do you imagine? Something like IETF? While it
certainly has had some successes, I don't see its process as conducive to
quality.
The design rule for Linux is simple: show me the code.
If it passes review, it goes in. If it doesn't, you should know why and
can try again.
You can certainly start with a design proposal if you like, and you might get
valuable feedback from that. The more concrete your design, the easier it is
to respond to, so the quality of the responses you get will be higher.
But there is no way to escape the fact that, for a "big design" which affects
multiple subsystems, you will probably need to develop several prototypes
before you find something that works well. Be ready to discard and try again.
Like Greg said - it is "evolutionary" and evolution isn't just "survival of
the fittest", it is also "death to the weak".
NeilBrown
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-06-01 2:34 ` Greg KH
@ 2014-06-02 17:33 ` Martin K. Petersen
2014-06-02 18:10 ` James Bottomley
0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2014-06-02 17:33 UTC (permalink / raw)
To: Greg KH; +Cc: ksummit-discuss
>>>>> "Greg" == Greg KH <greg@kroah.com> writes:
>> Perhaps there is something wrong with that approach. Certainly in
>> regards to how to bridge the gap between what we now have for logical
>> volume support, and what we should have, or what BSD has, that
>> approach is demonstrably a perennial failure. After all these years,
>> we still have dm and md as separate islands, no usable snapshotting
>> block device, and roughly zero interaction between filesystems and
>> volume managers.
Greg> People have talked about this for a very long time.
Lots of talking, indeed. But I think the main problem is that there's
nothing (or very little) to see here. Move along :)
Either you let the filesystem explicitly manage RAID and snapshots (like
btrfs) or you let DM or MD do it behind the filesystem's back. What's
the point of introducing a new interface to do something that we already
have?
That doesn't mean that there isn't merit to the "given this cookie, do
you happen to have another copy?" call we have discussed in the past.
Somebody just needs to do it. But I honestly think that btrfs is a much
better approach to that whole thing...
--
Martin K. Petersen Oracle Linux Engineering
* Re: [Ksummit-discuss] [topic] Richer internal block API
2014-06-02 17:33 ` Martin K. Petersen
@ 2014-06-02 18:10 ` James Bottomley
0 siblings, 0 replies; 12+ messages in thread
From: James Bottomley @ 2014-06-02 18:10 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: ksummit-discuss
On Mon, 2014-06-02 at 13:33 -0400, Martin K. Petersen wrote:
> >>>>> "Greg" == Greg KH <greg@kroah.com> writes:
>
> >> Perhaps there is something wrong with that approach. Certainly in
> >> regards to how to bridge the gap between what we now have for logical
> >> volume support, and what we should have, or what BSD has, that
> >> approach is demonstrably a perennial failure. After all these years,
> >> we still have dm and md as separate islands, no usable snapshotting
> >> block device, and roughly zero interaction between filesystems and
> >> volume managers.
>
> Greg> People have talked about this for a very long time.
Agreed; KS would never be the right venue for this, it's an LSF topic.
> Lots of talking, indeed. But I think the main problem is that there's
> nothing (or very little) to see here. Move along :)
>
> Either you let the filesystem explicitly manage RAID and snapshots (like
> btrfs) or you let DM or MD do it behind the filesystem's back. What's
> the point of introducing a new interface to do something that we already
> have?
>
> That doesn't mean that there isn't merit to the "given this cookie, do
> you happen to have another copy?" call we have discussed in the past.
> Somebody just needs to do it. But I honestly think that btrfs is a much
> better approach to that whole thing...
We actually tried RAID unification between btrfs and dm and md a long
time ago. We did make some progress with dm and md, but the use
paradigm of btrfs is just a bit too different and it couldn't be made to
work without making a huge mess. What's happening now is that we're
looking at the token and descriptor APIs (mostly for copy offload) and
if we find a good one we could revisit the issue and see if there are
other things it might support.
When I was a kid, I used to love architecture (in the software sense)
because it looked like blueprinting the perfect edifice in advance and
then just putting the bricks in. Now that I'm older, I far prefer
having a set of abstractions that make an outline and being guided by
how the pieces fit together because that leaves you open to things the
perfect architecture approach forces you to ignore and it fits well with
the Linux code and use case requirements.
I'm sure when the use case finally arrives we'll be able to refactor
around it, but I don't think it's quite here yet.
James
Thread overview: 12+ messages
2014-05-29 17:49 [Ksummit-discuss] [topic] Richer internal block API Daniel Phillips
2014-05-29 18:13 ` Greg KH
2014-05-29 18:13 ` Daniel Phillips
2014-05-29 18:23 ` Greg KH
2014-05-29 18:43 ` Daniel Phillips
2014-05-29 23:43 ` Greg KH
2014-05-31 22:44 ` Daniel Phillips
2014-06-01 2:34 ` Greg KH
2014-06-02 17:33 ` Martin K. Petersen
2014-06-02 18:10 ` James Bottomley
2014-06-01 4:31 ` NeilBrown
2014-05-30 9:56 ` Lukáš Czerner