linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
@ 2007-06-06 16:07 Badari Pulavarty
  2007-06-06 17:02 ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-06 16:07 UTC (permalink / raw)
  To: Eric W. Biederman, Andrew Morton; +Cc: linux-mm, lkml

Hi Eric,

Your recent cleanup to shm code, namely

[PATCH] shm: make sysv ipc shared memory use stacked files

took away one of the debugging feature for shm segments.
Originally, shmid were forced to be the inode numbers and
they show up in /proc/pid/maps for the process which mapped
this shared memory segments (vma listing). That way, its easy
to find out who all mapped this shared memory segment. Your
patchset, took away the inode# setting. So, we can't easily
match the shmem segments to /proc/pid/maps easily. (It was
really useful in tracking down a customer problem recently). 
Is this done deliberately ? Anything wrong in setting this back ?

Comments ?

Thanks,
Badari

Without patch:
--------------

# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 884737     db2inst1  767        33554432   13

# grep 884737 /proc/*/maps
#

With patch:
-----------

# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 884737     db2inst1  767        33554432   13

# grep 884737 /proc/*/maps
/proc/11110/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11111/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11112/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11113/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11114/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11115/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11116/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11117/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11118/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11121/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11122/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11124/maps:4000389c000-4000589c000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11575/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)



Here is the patch.

"ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
memory (shmid). It was useful in debugging, but its changed recently. 
This patch sets inode number to shared memory id to match /proc/pid/maps.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

Index: linux-2.6.22-rc4/ipc/shm.c
===================================================================
--- linux-2.6.22-rc4.orig/ipc/shm.c	2007-06-04 17:57:25.000000000 -0700
+++ linux-2.6.22-rc4/ipc/shm.c	2007-06-06 08:23:57.000000000 -0700
@@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace 
 	shp->shm_nattch = 0;
 	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
 	shp->shm_file = file;
+	file->f_dentry->d_inode->i_ino = shp->id;
 
 	ns->shm_tot += numpages;
 	shm_unlock(shp);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-06 16:07 [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid Badari Pulavarty
@ 2007-06-06 17:02 ` Eric W. Biederman
  2007-06-06 17:37   ` Badari Pulavarty
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2007-06-06 17:02 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, linux-mm, lkml

Badari Pulavarty <pbadari@us.ibm.com> writes:

> Hi Eric,
>
> Your recent cleanup to shm code, namely
>
> [PATCH] shm: make sysv ipc shared memory use stacked files
>
> took away one of the debugging feature for shm segments.
> Originally, shmid were forced to be the inode numbers and
> they show up in /proc/pid/maps for the process which mapped
> this shared memory segments (vma listing). That way, its easy
> to find out who all mapped this shared memory segment. Your
> patchset, took away the inode# setting. So, we can't easily
> match the shmem segments to /proc/pid/maps easily. (It was
> really useful in tracking down a customer problem recently). 
> Is this done deliberately ? Anything wrong in setting this back ?
>
> Comments ?
>
> Thanks,
> Badari
>
> Without patch:
> --------------
>
> # ipcs -m
>
> ------ Shared Memory Segments --------
> key        shmid      owner      perms      bytes      nattch     status
> 0x00000000 884737     db2inst1  767        33554432   13
>
> # grep 884737 /proc/*/maps
> #
>
> With patch:
> -----------
>
> # ipcs -m
>
> ------ Shared Memory Segments --------
> key        shmid      owner      perms      bytes      nattch     status
> 0x00000000 884737     db2inst1  767        33554432   13
>
> # grep 884737 /proc/*/maps
> /proc/11110/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11111/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11112/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11113/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11114/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11115/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11116/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11117/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11118/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11121/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11122/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11124/maps:4000389c000-4000589c000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11575/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
>
>
>
> Here is the patch.
>
> "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
> memory (shmid). It was useful in debugging, but its changed recently. 
> This patch sets inode number to shared memory id to match /proc/pid/maps.

Theoretically it makes the stacked file concept more brittle, because
it means the lower layers can't care about their inode number.

We do need something to tie these things together.

So I suspect what makes most sense is to simply rename the dentry
SYSVID<segmentid>

That should give you the necessary information while not doing something
that is a long term maintenance problem.

Do you think you can cook up a patch to that effect?
Otherwise I will see if I can.

In practice I'm not really against your change, but I would prefer
to leave the code in a state where someone can reimplement hugetlbfs
or shmfs and we simply don't care.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-06 17:02 ` Eric W. Biederman
@ 2007-06-06 17:37   ` Badari Pulavarty
  2007-06-06 18:24     ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-06 17:37 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, linux-mm, lkml

On Wed, 2007-06-06 at 11:02 -0600, Eric W. Biederman wrote:
> Badari Pulavarty <pbadari@us.ibm.com> writes:
> 
> > Hi Eric,
> >
> > Your recent cleanup to shm code, namely
> >
> > [PATCH] shm: make sysv ipc shared memory use stacked files
> >
> > took away one of the debugging feature for shm segments.
> > Originally, shmid were forced to be the inode numbers and
> > they show up in /proc/pid/maps for the process which mapped
> > this shared memory segments (vma listing). That way, its easy
> > to find out who all mapped this shared memory segment. Your
> > patchset, took away the inode# setting. So, we can't easily
> > match the shmem segments to /proc/pid/maps easily. (It was
> > really useful in tracking down a customer problem recently). 
> > Is this done deliberately ? Anything wrong in setting this back ?
> >
> > Comments ?
> >
> > Thanks,
> > Badari
> >
> > Without patch:
> > --------------
> >
> > # ipcs -m
> >
> > ------ Shared Memory Segments --------
> > key        shmid      owner      perms      bytes      nattch     status
> > 0x00000000 884737     db2inst1  767        33554432   13
> >
> > # grep 884737 /proc/*/maps
> > #
> >
> > With patch:
> > -----------
> >
> > # ipcs -m
> >
> > ------ Shared Memory Segments --------
> > key        shmid      owner      perms      bytes      nattch     status
> > 0x00000000 884737     db2inst1  767        33554432   13
> >
> > # grep 884737 /proc/*/maps
> > /proc/11110/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11111/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11112/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11113/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11114/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11115/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11116/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11117/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11118/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11121/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11122/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11124/maps:4000389c000-4000589c000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11575/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> >
> >
> >
> > Here is the patch.
> >
> > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
> > memory (shmid). It was useful in debugging, but its changed recently. 
> > This patch sets inode number to shared memory id to match /proc/pid/maps.
> 
> Theoretically it makes the stacked file concept more brittle, because
> it means the lower layers can't care about their inode number.
> 
> We do need something to tie these things together.
> 
> So I suspect what makes most sense is to simply rename the dentry
> SYSVID<segmentid>

Yep. Currently, we use part of "key" as the dentry name. For example,

# ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x083d0d74 851968     db2inst1  767        33554432   13

# grep 83d0d74 /proc/*/maps
/proc/11110/maps:40004724000-40006724000 rw-s 00000000 00:08 851968  /SYSV083d0d74 (deleted)
/proc/11111/maps:40004724000-40006724000 rw-s 00000000 00:08 851968  /SYSV083d0d74 (deleted)
/proc/11112/maps:40004724000-40006724000 rw-s 00000000 00:08 851968  /SYSV083d0d74 (deleted)
/proc/11113/maps:40004724000-40006724000 rw-s 00000000 00:08 851968  /SYSV083d0d74 (deleted)
..

The issue is with the ones with key = 0x0000000, like following:

# ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 884737     db2inst1  767        33554432   13
0x00000000 950275     db2fenc1  701        23052288   13

There is no unique way to identify them easily :(

I guess, like you suggested, we can change the dentry name to use shmid
instead of the portions of the "key" to make it unique. I think, I can 
work out a patch for this.


> 
> That should give you the necessary information while not doing something
> that is a long term maintenance problem.
> 
> Do you think you can cook up a patch to that effect?
> Otherwise I will see if I can.
> 
> In practice I'm not really against your change, but I would prefer
> to leave the code in a state where someone can reimplement hugetlbfs
> or shmfs and we simply don't care.

Thanks for your suggestion.

Thanks,
Badari


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-06 17:37   ` Badari Pulavarty
@ 2007-06-06 18:24     ` Eric W. Biederman
  0 siblings, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2007-06-06 18:24 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, linux-mm, lkml

Badari Pulavarty <pbadari@us.ibm.com> writes:
>
> ------ Shared Memory Segments --------
> key        shmid      owner      perms      bytes      nattch     status
> 0x00000000 884737     db2inst1  767        33554432   13
> 0x00000000 950275     db2fenc1  701        23052288   13
>
> There is no unique way to identify them easily :(
>
> I guess, like you suggested, we can change the dentry name to use shmid
> instead of the portions of the "key" to make it unique. I think, I can 
> work out a patch for this.

Thanks.  That should be more robust.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-08  3:45       ` Eric W. Biederman
  2007-06-08  4:41         ` Albert Cahalan
@ 2007-06-08 16:07         ` Badari Pulavarty
  1 sibling, 0 replies; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-08 16:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Albert Cahalan, Andrew Morton, linux-kernel,
	linux-mm, torvalds


Eric W. Biederman wrote:

>
>At this point given that we actually have a small user space dependency
>and the fact that after I have reviewed the code it looks harmless to
>change the inode number of those inodes, in both cases they are just
>anonymous inodes generated with new_inode, and anything that we wrap
>is likely to be equally so.
>
>So it looks to me like we need to do three things:
>- Fix the inode number
>
Okay. its already done.

>
>- Fix the name on the hugetlbfs dentry to hold the key
>
I don't see need for doing this for hugetlbfs inodes. Currently, they 
don't base their
name on "key" + basing on the "key" is kind of useless anyway (its not 
unique).

>
>- Add a big fat comment that user space programs depend on this
>  behavior of both the dentry name and the inode number.
>
I don't think, the user-space can depend on the dentry-name. It can only 
depend
on inode# to match shmid. (since key is not unique esp. for key=0x00000000).

BTW, I agree that shmid is not unique even without namespaces as its 
based on
seq# and we wrap seq#.

Thanks,
Badari




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-08  5:55           ` Eric W. Biederman
@ 2007-06-08  6:51             ` Albert Cahalan
  0 siblings, 0 replies; 25+ messages in thread
From: Albert Cahalan @ 2007-06-08  6:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Andrew Morton, linux-kernel, linux-mm, pbadari,
	torvalds

On 6/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "Albert Cahalan" <acahalan@gmail.com> writes:
> > On 6/7/07, Eric W. Biederman <ebiederm@xmission.com> wrote:

> >> So it looks to me like we need to do three things:
> >> - Fix the inode number
> >> - Fix the name on the hugetlbfs dentry to hold the key
> >> - Add a big fat comment that user space programs depend on this
> >>   behavior of both the dentry name and the inode number.
> >
> > Assuming that this proposed fix goes in:
> >
> > Since the inode number is the shmid, and this is a number
> > that the kernel randomly chooses AFAIK, there should be
> > no need to have different shm segments sharing the same
> > inode number.
>
> Where we run into inode number confusion is that all of
> these shm segments are actually files on a tmpfs filesystem
> somewhere, and by making the inode number the shmid we loose
> the tmpfs inode number.  So it is possible we get tmpfs inode
> number conflicts.  However the inode number is not used for
> anything, and the files are not visible in any other way except
> as shm segments so it doesn't matter.

Eh, the kernel choses both shmid and tmpfs inode number.
You could set a high bit in one or the other.

> There is another case with ipc namespaces where we ultimately need
> to support duplicate shmids on the same machine (so migration
> is a possibility).  However by and large the user space
> processes with duplicate ids should be invisible to each other.

On the bright side, this only screws up people who get the
crazy idea that processes can be migrated.

> > The situation with the key is a bit more disturbing, though
> > we already hit that anyway when IPC_PRIVATE is used.
> > (why anybody would NOT use IPC_PRIVATE is a mystery)
> > So having the key in the name doesn't make things worse.
>
> Having "SYSV" in the name appears mandatory.  Otherwise you
> don't even know it is a shm file. Although I may be confused.

It's mandatory for a different reason: to satisfy parsers.

It is nearly useless for identifying shm files. Look what I can do:
    touch /SYSV00000000
    touch '/SYSV00000000 (deleted)'

(so pmap creates a shm, looks for the address in /proc/self/maps,
determines the device major/minor in use, and then uses that)

> Hmm.  Thinking about this I have just realized that we may want
> to approach this a little differently.  Currently I am reusing
> the dentry and inode structure that hugetlbfs and tmpfs return
> me, and simply have a distinct struct file for each shm mapping.
>
> There is a little more cost but it may actually make sense to have
> a dentry and inode that is specific to shm.c so we can do whatever
> we need to without adding requirements to the normal tmpfs or hugtlb
> code.

Piggybacking on tmpfs has always seemed a bit dirty to me.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-08  4:41         ` Albert Cahalan
@ 2007-06-08  5:55           ` Eric W. Biederman
  2007-06-08  6:51             ` Albert Cahalan
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2007-06-08  5:55 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Serge E. Hallyn, Andrew Morton, linux-kernel, linux-mm, pbadari,
	torvalds

"Albert Cahalan" <acahalan@gmail.com> writes:

> On 6/7/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> So it looks to me like we need to do three things:
>> - Fix the inode number
>> - Fix the name on the hugetlbfs dentry to hold the key
>> - Add a big fat comment that user space programs depend on this
>>   behavior of both the dentry name and the inode number.
>
> Assuming that this proposed fix goes in:
>
> Since the inode number is the shmid, and this is a number
> that the kernel randomly chooses AFAIK, there should be
> no need to have different shm segments sharing the same
> inode number.

Where we run into inode number confusion is that all of
these shm segments are actually files on a tmpfs filesystem
somewhere, and by making the inode number the shmid we loose
the tmpfs inode number.  So it is possible we get tmpfs inode
number conflicts.  However the inode number is not used for
anything, and the files are not visible in any other way except
as shm segments so it doesn't matter.

There is another case with ipc namespaces where we ultimately need
to support duplicate shmids on the same machine (so migration
is a possibility).  However by and large the user space
processes with duplicate ids should be invisible to each other.

> The situation with the key is a bit more disturbing, though
> we already hit that anyway when IPC_PRIVATE is used.
> (why anybody would NOT use IPC_PRIVATE is a mystery)
> So having the key in the name doesn't make things worse.

Having "SYSV" in the name appears mandatory.  Otherwise you
don't even know it is a shm file. Although I may be confused.

> I have some concern about the device minor number.
> This should be the same for all shm mappings; I do not
> know if the behavior changed.

So I haven't changed anything here.  But I haven't really
looked either.

I don't have a clue if hugetlbfs files use the same device minor
number as tmpfs files.

Hmm.  Thinking about this I have just realized that we may want
to approach this a little differently.  Currently I am reusing
the dentry and inode structure that hugetlbfs and tmpfs return
me, and simply have a distinct struct file for each shm mapping.

There is a little more cost but it may actually make sense to have
a dentry and inode that is specific to shm.c so we can do whatever
we need to without adding requirements to the normal tmpfs or hugtlb
code.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-08  3:45       ` Eric W. Biederman
@ 2007-06-08  4:41         ` Albert Cahalan
  2007-06-08  5:55           ` Eric W. Biederman
  2007-06-08 16:07         ` Badari Pulavarty
  1 sibling, 1 reply; 25+ messages in thread
From: Albert Cahalan @ 2007-06-08  4:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Andrew Morton, linux-kernel, linux-mm, pbadari,
	torvalds

On 6/7/07, Eric W. Biederman <ebiederm@xmission.com> wrote:

> So it looks to me like we need to do three things:
> - Fix the inode number
> - Fix the name on the hugetlbfs dentry to hold the key
> - Add a big fat comment that user space programs depend on this
>   behavior of both the dentry name and the inode number.

Assuming that this proposed fix goes in:

Since the inode number is the shmid, and this is a number
that the kernel randomly chooses AFAIK, there should be
no need to have different shm segments sharing the same
inode number.

The situation with the key is a bit more disturbing, though
we already hit that anyway when IPC_PRIVATE is used.
(why anybody would NOT use IPC_PRIVATE is a mystery)
So having the key in the name doesn't make things worse.

I have some concern about the device minor number.
This should be the same for all shm mappings; I do not
know if the behavior changed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 16:20     ` Serge E. Hallyn
@ 2007-06-08  3:45       ` Eric W. Biederman
  2007-06-08  4:41         ` Albert Cahalan
  2007-06-08 16:07         ` Badari Pulavarty
  0 siblings, 2 replies; 25+ messages in thread
From: Eric W. Biederman @ 2007-06-08  3:45 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Albert Cahalan, Andrew Morton, linux-kernel, linux-mm, pbadari, torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Ok, so IIUC the problem was that inode->i_ino was being set to the id,
> and the id can be the same for different things in two namespaces.

There is nothing preventing inode number collisions in this code even
without multiple namespaces, and even when it was functioning
correctly.  However as it does not seem possible to find these files
through normal filesystem operations that does not seem to be a problem.

> So aside from not using the id as inode->i_ino, an alternative is to use
> a separate superblock, spearate mqeueue fs, for each ipc ns.
>
> I haven't looked at that enough to see whether it's feasible, i.e. I 
> don't know what else mqueue fs is used for.  Eric, does that sound
> reasonable to you?

At this point given that we actually have a small user space dependency
and the fact that after I have reviewed the code it looks harmless to
change the inode number of those inodes, in both cases they are just
anonymous inodes generated with new_inode, and anything that we wrap
is likely to be equally so.

So it looks to me like we need to do three things:
- Fix the inode number
- Fix the name on the hugetlbfs dentry to hold the key
- Add a big fat comment that user space programs depend on this
  behavior of both the dentry name and the inode number.

So Badari it looks like your original patch plus a little bit is
what we need.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 22:22                     ` Serge E. Hallyn
@ 2007-06-07 23:57                       ` Badari Pulavarty
  0 siblings, 0 replies; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-07 23:57 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andrew Morton, Albert Cahalan, lkml, linux-mm, ebiederm, torvalds


Serge E. Hallyn wrote:

>Quoting Serge E. Hallyn (serge@hallyn.com):
>
>>Quoting Badari Pulavarty (pbadari@us.ibm.com):
>>
>>>On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
>>>
>>>>Quoting Badari Pulavarty (pbadari@us.ibm.com):
>>>>
>>>>>On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
>>>>>
>>>>>>On Thu, 07 Jun 2007 10:06:37 -0700
>>>>>>Badari Pulavarty <pbadari@us.ibm.com> wrote:
>>>>>>
>>>>>>>On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
>>>>>>>
>>>>>>>>On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
>>>>>>>>
>>>>>>>>>BTW, I agree with Eric that its would be nice to use shmid as part
>>>>>>>>>of name instead of forcing to be as inode number. It should be
>>>>>>>>>possible for pmap to workout shmid from "key" or name. Isn't it ?
>>>>>>>>>
>>>>>>>>It is not at all nice.
>>>>>>>>
>>>>>>>>1. it's incompatible ABI breakage
>>>>>>>>2. where will you put the key then, in the inode? :-)
>>>>>>>>
>>>>>>>Nope. Currently "key" is part of the name (but its not unique).
>>>>>>>
>>>>>>>>Changing to "SYSVID%d" is no good either. Look, people
>>>>>>>>are ***parsing*** this stuff in /proc. The /proc filesystem
>>>>>>>>is not some random sandbox to be playing in.
>>>>>>>>
>>>>>>>>Before you go messing with it, note that the device number
>>>>>>>>also matters. (it's per-boot dynamic, but that's OK)
>>>>>>>>That's how one knows that /SYSV00000000 is not just
>>>>>>>>a regular file; sadly these didn't get a non-/ prefix.
>>>>>>>>(and no you can't fix that now; it's way too late)
>>>>>>>>
>>>>>>>>Next time you feel like breaking an ABI, mind putting
>>>>>>>>"LET'S BREAK AN ABI!" in the subject of your email?
>>>>>>>>
>>>>>>>I am not breaking ABI. Its already broken in the current
>>>>>>>mainline. I am trying to fix it by putting back the ino#
>>>>>>>as shmid. Eric had a suggestion that, instead of depending
>>>>>>>on the inode# to be shmid, we could embed shmid into name
>>>>>>>(instead of "key" which is currently not unique).
>>>>>>>
>>>>>>>>BTW, I suspect this kind of thing also breaks:
>>>>>>>>a. fuser, lsof, and other resource usage display tools
>>>>>>>>b. various obscure emulators (similar to valgrind)
>>>>>>>>
>>>>>>>If you strongly feel that "old" behaviour needs to be retained, 
>>>>>>>
>>>>>>yup, we should put it back.  The change was, afaik, accidental.
>>>>>>
>>>>>>>here is the patch I originally suggested.
>>>>>>>
>>>>>>Confused.  Will this one-liner fix all the userspace breakage to which
>>>>>>Albert refers?
>>>>>>
>>>>>Yes. Albert, please correct me if I am wrong.
>>>>>
>>>>It will, but could lead to two different inodes with the same i_ino,
>>>>right?
>>>>
>>>Only if we generate same ID in two different namespaces. Is it currently
>>>possible ? 
>>>
>>Should be nothing stopping it.
>>
>
>(just to be more certain, a quick test showed I can get id 0 for
>different keys, and different ids for the same key 0xff, in different
>ipc namespaces)
>
Funny. I played with it and decided that it can happen :)

Thanks,
Badari



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 22:08                   ` Serge E. Hallyn
  2007-06-07 22:21                     ` Badari Pulavarty
@ 2007-06-07 22:22                     ` Serge E. Hallyn
  2007-06-07 23:57                       ` Badari Pulavarty
  1 sibling, 1 reply; 25+ messages in thread
From: Serge E. Hallyn @ 2007-06-07 22:22 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Badari Pulavarty, Andrew Morton, Albert Cahalan, lkml, linux-mm,
	ebiederm, torvalds

Quoting Serge E. Hallyn (serge@hallyn.com):
> Quoting Badari Pulavarty (pbadari@us.ibm.com):
> > On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> > > Quoting Badari Pulavarty (pbadari@us.ibm.com):
> > > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > > > 
> > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > > > On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > > > > > 
> > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > > > 
> > > > > > > It is not at all nice.
> > > > > > > 
> > > > > > > 1. it's incompatible ABI breakage
> > > > > > > 2. where will you put the key then, in the inode? :-)
> > > > > > 
> > > > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > > > 
> > > > > > > 
> > > > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > > > is not some random sandbox to be playing in.
> > > > > > > 
> > > > > > > Before you go messing with it, note that the device number
> > > > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > > > That's how one knows that /SYSV00000000 is not just
> > > > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > > > (and no you can't fix that now; it's way too late)
> > > > > > > 
> > > > > > > Next time you feel like breaking an ABI, mind putting
> > > > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > > > 
> > > > > > I am not breaking ABI. Its already broken in the current
> > > > > > mainline. I am trying to fix it by putting back the ino#
> > > > > > as shmid. Eric had a suggestion that, instead of depending
> > > > > > on the inode# to be shmid, we could embed shmid into name
> > > > > > (instead of "key" which is currently not unique).
> > > > > > 
> > > > > > > BTW, I suspect this kind of thing also breaks:
> > > > > > > a. fuser, lsof, and other resource usage display tools
> > > > > > > b. various obscure emulators (similar to valgrind)
> > > > > > 
> > > > > > If you strongly feel that "old" behaviour needs to be retained, 
> > > > > 
> > > > > yup, we should put it back.  The change was, afaik, accidental.
> > > > > 
> > > > > > here is the patch I originally suggested.
> > > > > 
> > > > > Confused.  Will this one-liner fix all the userspace breakage to which
> > > > > Albert refers?
> > > > 
> > > > Yes. Albert, please correct me if I am wrong.
> > > 
> > > It will, but could lead to two different inodes with the same i_ino,
> > > right?
> > 
> > Only if we generate same ID in two different namespaces. Is it currently
> > possible ? 
> 
> Should be nothing stopping it.

(just to be more certain, a quick test showed I can get id 0 for
different keys, and different ids for the same key 0xff, in different
ipc namespaces)

-serge

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 22:08                   ` Serge E. Hallyn
@ 2007-06-07 22:21                     ` Badari Pulavarty
  2007-06-07 22:22                     ` Serge E. Hallyn
  1 sibling, 0 replies; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-07 22:21 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andrew Morton, Albert Cahalan, lkml, linux-mm, ebiederm, torvalds


Serge E. Hallyn wrote:

>Quoting Badari Pulavarty (pbadari@us.ibm.com):
>
>>On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
>>
>>>Quoting Badari Pulavarty (pbadari@us.ibm.com):
>>>
>>>>On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
>>>>
>>>>>On Thu, 07 Jun 2007 10:06:37 -0700
>>>>>Badari Pulavarty <pbadari@us.ibm.com> wrote:
>>>>>
>>>>>>On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
>>>>>>
>>>>>>>On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
>>>>>>>
>>>>>>>>BTW, I agree with Eric that its would be nice to use shmid as part
>>>>>>>>of name instead of forcing to be as inode number. It should be
>>>>>>>>possible for pmap to workout shmid from "key" or name. Isn't it ?
>>>>>>>>
>>>>>>>It is not at all nice.
>>>>>>>
>>>>>>>1. it's incompatible ABI breakage
>>>>>>>2. where will you put the key then, in the inode? :-)
>>>>>>>
>>>>>>Nope. Currently "key" is part of the name (but its not unique).
>>>>>>
>>>>>>>Changing to "SYSVID%d" is no good either. Look, people
>>>>>>>are ***parsing*** this stuff in /proc. The /proc filesystem
>>>>>>>is not some random sandbox to be playing in.
>>>>>>>
>>>>>>>Before you go messing with it, note that the device number
>>>>>>>also matters. (it's per-boot dynamic, but that's OK)
>>>>>>>That's how one knows that /SYSV00000000 is not just
>>>>>>>a regular file; sadly these didn't get a non-/ prefix.
>>>>>>>(and no you can't fix that now; it's way too late)
>>>>>>>
>>>>>>>Next time you feel like breaking an ABI, mind putting
>>>>>>>"LET'S BREAK AN ABI!" in the subject of your email?
>>>>>>>
>>>>>>I am not breaking ABI. Its already broken in the current
>>>>>>mainline. I am trying to fix it by putting back the ino#
>>>>>>as shmid. Eric had a suggestion that, instead of depending
>>>>>>on the inode# to be shmid, we could embed shmid into name
>>>>>>(instead of "key" which is currently not unique).
>>>>>>
>>>>>>>BTW, I suspect this kind of thing also breaks:
>>>>>>>a. fuser, lsof, and other resource usage display tools
>>>>>>>b. various obscure emulators (similar to valgrind)
>>>>>>>
>>>>>>If you strongly feel that "old" behaviour needs to be retained, 
>>>>>>
>>>>>yup, we should put it back.  The change was, afaik, accidental.
>>>>>
>>>>>>here is the patch I originally suggested.
>>>>>>
>>>>>Confused.  Will this one-liner fix all the userspace breakage to which
>>>>>Albert refers?
>>>>>
>>>>Yes. Albert, please correct me if I am wrong.
>>>>
>>>It will, but could lead to two different inodes with the same i_ino,
>>>right?
>>>
>>Only if we generate same ID in two different namespaces. Is it currently
>>possible ? 
>>
>
>Should be nothing stopping it.
>
>But like I say we never find the inode based on i_ino, and don't hash
>the inode, so it might be ok.
>
Correct. We might end up with same shmid - which mean same inode# shows 
up in /proc/pid/maps.
If we don't unshare pid namespace or look from parent namespace - we 
will end up seeing same
shmid/inode# in different /proc/pid/maps, even though they are 
different. But I guess its okay..

Thanks,
Badari



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 21:16                 ` Badari Pulavarty
@ 2007-06-07 22:08                   ` Serge E. Hallyn
  2007-06-07 22:21                     ` Badari Pulavarty
  2007-06-07 22:22                     ` Serge E. Hallyn
  0 siblings, 2 replies; 25+ messages in thread
From: Serge E. Hallyn @ 2007-06-07 22:08 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Serge E. Hallyn, Andrew Morton, Albert Cahalan, lkml, linux-mm,
	ebiederm, torvalds

Quoting Badari Pulavarty (pbadari@us.ibm.com):
> On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> > Quoting Badari Pulavarty (pbadari@us.ibm.com):
> > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > > 
> > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > > On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > > > > 
> > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > > 
> > > > > > It is not at all nice.
> > > > > > 
> > > > > > 1. it's incompatible ABI breakage
> > > > > > 2. where will you put the key then, in the inode? :-)
> > > > > 
> > > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > > 
> > > > > > 
> > > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > > is not some random sandbox to be playing in.
> > > > > > 
> > > > > > Before you go messing with it, note that the device number
> > > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > > That's how one knows that /SYSV00000000 is not just
> > > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > > (and no you can't fix that now; it's way too late)
> > > > > > 
> > > > > > Next time you feel like breaking an ABI, mind putting
> > > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > > 
> > > > > I am not breaking ABI. Its already broken in the current
> > > > > mainline. I am trying to fix it by putting back the ino#
> > > > > as shmid. Eric had a suggestion that, instead of depending
> > > > > on the inode# to be shmid, we could embed shmid into name
> > > > > (instead of "key" which is currently not unique).
> > > > > 
> > > > > > BTW, I suspect this kind of thing also breaks:
> > > > > > a. fuser, lsof, and other resource usage display tools
> > > > > > b. various obscure emulators (similar to valgrind)
> > > > > 
> > > > > If you strongly feel that "old" behaviour needs to be retained, 
> > > > 
> > > > yup, we should put it back.  The change was, afaik, accidental.
> > > > 
> > > > > here is the patch I originally suggested.
> > > > 
> > > > Confused.  Will this one-liner fix all the userspace breakage to which
> > > > Albert refers?
> > > 
> > > Yes. Albert, please correct me if I am wrong.
> > 
> > It will, but could lead to two different inodes with the same i_ino,
> > right?
> 
> Only if we generate same ID in two different namespaces. Is it currently
> possible ? 

Should be nothing stopping it.

But like I say we never find the inode based on i_ino, and don't hash
the inode, so it might be ok.

-serge

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 20:37               ` Serge E. Hallyn
  2007-06-07 20:51                 ` Serge E. Hallyn
@ 2007-06-07 21:16                 ` Badari Pulavarty
  2007-06-07 22:08                   ` Serge E. Hallyn
  1 sibling, 1 reply; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-07 21:16 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andrew Morton, Albert Cahalan, lkml, linux-mm, ebiederm, torvalds

On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> Quoting Badari Pulavarty (pbadari@us.ibm.com):
> > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > 
> > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > > > 
> > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > 
> > > > > It is not at all nice.
> > > > > 
> > > > > 1. it's incompatible ABI breakage
> > > > > 2. where will you put the key then, in the inode? :-)
> > > > 
> > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > 
> > > > > 
> > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > is not some random sandbox to be playing in.
> > > > > 
> > > > > Before you go messing with it, note that the device number
> > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > That's how one knows that /SYSV00000000 is not just
> > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > (and no you can't fix that now; it's way too late)
> > > > > 
> > > > > Next time you feel like breaking an ABI, mind putting
> > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > 
> > > > I am not breaking ABI. Its already broken in the current
> > > > mainline. I am trying to fix it by putting back the ino#
> > > > as shmid. Eric had a suggestion that, instead of depending
> > > > on the inode# to be shmid, we could embed shmid into name
> > > > (instead of "key" which is currently not unique).
> > > > 
> > > > > BTW, I suspect this kind of thing also breaks:
> > > > > a. fuser, lsof, and other resource usage display tools
> > > > > b. various obscure emulators (similar to valgrind)
> > > > 
> > > > If you strongly feel that "old" behaviour needs to be retained, 
> > > 
> > > yup, we should put it back.  The change was, afaik, accidental.
> > > 
> > > > here is the patch I originally suggested.
> > > 
> > > Confused.  Will this one-liner fix all the userspace breakage to which
> > > Albert refers?
> > 
> > Yes. Albert, please correct me if I am wrong.
> 
> It will, but could lead to two different inodes with the same i_ino,
> right?

Only if we generate same ID in two different namespaces. Is it currently
possible ? 

Thanks,
Badari


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 20:37               ` Serge E. Hallyn
@ 2007-06-07 20:51                 ` Serge E. Hallyn
  2007-06-07 21:16                 ` Badari Pulavarty
  1 sibling, 0 replies; 25+ messages in thread
From: Serge E. Hallyn @ 2007-06-07 20:51 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Badari Pulavarty, Andrew Morton, Albert Cahalan, lkml, linux-mm,
	ebiederm, torvalds

Quoting Serge E. Hallyn (serge@hallyn.com):
> Quoting Badari Pulavarty (pbadari@us.ibm.com):
> > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > 
> > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > > > 
> > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > 
> > > > > It is not at all nice.
> > > > > 
> > > > > 1. it's incompatible ABI breakage
> > > > > 2. where will you put the key then, in the inode? :-)
> > > > 
> > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > 
> > > > > 
> > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > is not some random sandbox to be playing in.
> > > > > 
> > > > > Before you go messing with it, note that the device number
> > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > That's how one knows that /SYSV00000000 is not just
> > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > (and no you can't fix that now; it's way too late)
> > > > > 
> > > > > Next time you feel like breaking an ABI, mind putting
> > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > 
> > > > I am not breaking ABI. Its already broken in the current
> > > > mainline. I am trying to fix it by putting back the ino#
> > > > as shmid. Eric had a suggestion that, instead of depending
> > > > on the inode# to be shmid, we could embed shmid into name
> > > > (instead of "key" which is currently not unique).
> > > > 
> > > > > BTW, I suspect this kind of thing also breaks:
> > > > > a. fuser, lsof, and other resource usage display tools
> > > > > b. various obscure emulators (similar to valgrind)
> > > > 
> > > > If you strongly feel that "old" behaviour needs to be retained, 
> > > 
> > > yup, we should put it back.  The change was, afaik, accidental.
> > > 
> > > > here is the patch I originally suggested.
> > > 
> > > Confused.  Will this one-liner fix all the userspace breakage to which
> > > Albert refers?
> > 
> > Yes. Albert, please correct me if I am wrong.
> 
> It will, but could lead to two different inodes with the same i_ino,
> right?

Well I guess it's not *technically* a problem since these inodes are
never hashed.

-serge

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 19:59             ` Badari Pulavarty
@ 2007-06-07 20:37               ` Serge E. Hallyn
  2007-06-07 20:51                 ` Serge E. Hallyn
  2007-06-07 21:16                 ` Badari Pulavarty
  0 siblings, 2 replies; 25+ messages in thread
From: Serge E. Hallyn @ 2007-06-07 20:37 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Andrew Morton, Albert Cahalan, lkml, linux-mm, ebiederm, torvalds

Quoting Badari Pulavarty (pbadari@us.ibm.com):
> On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > On Thu, 07 Jun 2007 10:06:37 -0700
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > 
> > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > > 
> > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > of name instead of forcing to be as inode number. It should be
> > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > 
> > > > It is not at all nice.
> > > > 
> > > > 1. it's incompatible ABI breakage
> > > > 2. where will you put the key then, in the inode? :-)
> > > 
> > > Nope. Currently "key" is part of the name (but its not unique).
> > > 
> > > > 
> > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > is not some random sandbox to be playing in.
> > > > 
> > > > Before you go messing with it, note that the device number
> > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > That's how one knows that /SYSV00000000 is not just
> > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > (and no you can't fix that now; it's way too late)
> > > > 
> > > > Next time you feel like breaking an ABI, mind putting
> > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > 
> > > I am not breaking ABI. Its already broken in the current
> > > mainline. I am trying to fix it by putting back the ino#
> > > as shmid. Eric had a suggestion that, instead of depending
> > > on the inode# to be shmid, we could embed shmid into name
> > > (instead of "key" which is currently not unique).
> > > 
> > > > BTW, I suspect this kind of thing also breaks:
> > > > a. fuser, lsof, and other resource usage display tools
> > > > b. various obscure emulators (similar to valgrind)
> > > 
> > > If you strongly feel that "old" behaviour needs to be retained, 
> > 
> > yup, we should put it back.  The change was, afaik, accidental.
> > 
> > > here is the patch I originally suggested.
> > 
> > Confused.  Will this one-liner fix all the userspace breakage to which
> > Albert refers?
> 
> Yes. Albert, please correct me if I am wrong.

It will, but could lead to two different inodes with the same i_ino,
right?

thanks,
-serge

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 19:48           ` Andrew Morton
@ 2007-06-07 19:59             ` Badari Pulavarty
  2007-06-07 20:37               ` Serge E. Hallyn
  0 siblings, 1 reply; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-07 19:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Albert Cahalan, lkml, linux-mm, ebiederm, torvalds

On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> On Thu, 07 Jun 2007 10:06:37 -0700
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> 
> > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > 
> > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > of name instead of forcing to be as inode number. It should be
> > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > 
> > > It is not at all nice.
> > > 
> > > 1. it's incompatible ABI breakage
> > > 2. where will you put the key then, in the inode? :-)
> > 
> > Nope. Currently "key" is part of the name (but its not unique).
> > 
> > > 
> > > Changing to "SYSVID%d" is no good either. Look, people
> > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > is not some random sandbox to be playing in.
> > > 
> > > Before you go messing with it, note that the device number
> > > also matters. (it's per-boot dynamic, but that's OK)
> > > That's how one knows that /SYSV00000000 is not just
> > > a regular file; sadly these didn't get a non-/ prefix.
> > > (and no you can't fix that now; it's way too late)
> > > 
> > > Next time you feel like breaking an ABI, mind putting
> > > "LET'S BREAK AN ABI!" in the subject of your email?
> > 
> > I am not breaking ABI. Its already broken in the current
> > mainline. I am trying to fix it by putting back the ino#
> > as shmid. Eric had a suggestion that, instead of depending
> > on the inode# to be shmid, we could embed shmid into name
> > (instead of "key" which is currently not unique).
> > 
> > > BTW, I suspect this kind of thing also breaks:
> > > a. fuser, lsof, and other resource usage display tools
> > > b. various obscure emulators (similar to valgrind)
> > 
> > If you strongly feel that "old" behaviour needs to be retained, 
> 
> yup, we should put it back.  The change was, afaik, accidental.
> 
> > here is the patch I originally suggested.
> 
> Confused.  Will this one-liner fix all the userspace breakage to which
> Albert refers?

Yes. Albert, please correct me if I am wrong.

Thanks,
Badari


> > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
> > memory (shmid). It was useful in debugging, but its changed recently. 
> > This patch sets inode number to shared memory id to match /proc/pid/maps.
> > 
> > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> > 
> > Index: linux-2.6.22-rc4/ipc/shm.c
> > ===================================================================
> > --- linux-2.6.22-rc4.orig/ipc/shm.c	2007-06-04 17:57:25.000000000 -0700
> > +++ linux-2.6.22-rc4/ipc/shm.c	2007-06-06 08:23:57.000000000 -0700
> > @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace 
> >  	shp->shm_nattch = 0;
> >  	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
> >  	shp->shm_file = file;
> > +	file->f_dentry->d_inode->i_ino = shp->id;
> >  
> >  	ns->shm_tot += numpages;
> >  	shm_unlock(shp);
> > 
> > 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 17:06         ` Badari Pulavarty
@ 2007-06-07 19:48           ` Andrew Morton
  2007-06-07 19:59             ` Badari Pulavarty
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-06-07 19:48 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Albert Cahalan, lkml, linux-mm, ebiederm, torvalds

On Thu, 07 Jun 2007 10:06:37 -0700
Badari Pulavarty <pbadari@us.ibm.com> wrote:

> On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > 
> > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > of name instead of forcing to be as inode number. It should be
> > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > 
> > It is not at all nice.
> > 
> > 1. it's incompatible ABI breakage
> > 2. where will you put the key then, in the inode? :-)
> 
> Nope. Currently "key" is part of the name (but its not unique).
> 
> > 
> > Changing to "SYSVID%d" is no good either. Look, people
> > are ***parsing*** this stuff in /proc. The /proc filesystem
> > is not some random sandbox to be playing in.
> > 
> > Before you go messing with it, note that the device number
> > also matters. (it's per-boot dynamic, but that's OK)
> > That's how one knows that /SYSV00000000 is not just
> > a regular file; sadly these didn't get a non-/ prefix.
> > (and no you can't fix that now; it's way too late)
> > 
> > Next time you feel like breaking an ABI, mind putting
> > "LET'S BREAK AN ABI!" in the subject of your email?
> 
> I am not breaking ABI. Its already broken in the current
> mainline. I am trying to fix it by putting back the ino#
> as shmid. Eric had a suggestion that, instead of depending
> on the inode# to be shmid, we could embed shmid into name
> (instead of "key" which is currently not unique).
> 
> > BTW, I suspect this kind of thing also breaks:
> > a. fuser, lsof, and other resource usage display tools
> > b. various obscure emulators (similar to valgrind)
> 
> If you strongly feel that "old" behaviour needs to be retained, 

yup, we should put it back.  The change was, afaik, accidental.

> here is the patch I originally suggested.

Confused.  Will this one-liner fix all the userspace breakage to which
Albert refers?

> Thanks,
> Badari
> 
> "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
> memory (shmid). It was useful in debugging, but its changed recently. 
> This patch sets inode number to shared memory id to match /proc/pid/maps.
> 
> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> 
> Index: linux-2.6.22-rc4/ipc/shm.c
> ===================================================================
> --- linux-2.6.22-rc4.orig/ipc/shm.c	2007-06-04 17:57:25.000000000 -0700
> +++ linux-2.6.22-rc4/ipc/shm.c	2007-06-06 08:23:57.000000000 -0700
> @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace 
>  	shp->shm_nattch = 0;
>  	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
>  	shp->shm_file = file;
> +	file->f_dentry->d_inode->i_ino = shp->id;
>  
>  	ns->shm_tot += numpages;
>  	shm_unlock(shp);
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 16:43       ` Albert Cahalan
@ 2007-06-07 17:06         ` Badari Pulavarty
  2007-06-07 19:48           ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-07 17:06 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: Andrew Morton, lkml, linux-mm, ebiederm, torvalds

On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:
> 
> > BTW, I agree with Eric that its would be nice to use shmid as part
> > of name instead of forcing to be as inode number. It should be
> > possible for pmap to workout shmid from "key" or name. Isn't it ?
> 
> It is not at all nice.
> 
> 1. it's incompatible ABI breakage
> 2. where will you put the key then, in the inode? :-)

Nope. Currently "key" is part of the name (but its not unique).

> 
> Changing to "SYSVID%d" is no good either. Look, people
> are ***parsing*** this stuff in /proc. The /proc filesystem
> is not some random sandbox to be playing in.
> 
> Before you go messing with it, note that the device number
> also matters. (it's per-boot dynamic, but that's OK)
> That's how one knows that /SYSV00000000 is not just
> a regular file; sadly these didn't get a non-/ prefix.
> (and no you can't fix that now; it's way too late)
> 
> Next time you feel like breaking an ABI, mind putting
> "LET'S BREAK AN ABI!" in the subject of your email?

I am not breaking ABI. Its already broken in the current
mainline. I am trying to fix it by putting back the ino#
as shmid. Eric had a suggestion that, instead of depending
on the inode# to be shmid, we could embed shmid into name
(instead of "key" which is currently not unique).

> BTW, I suspect this kind of thing also breaks:
> a. fuser, lsof, and other resource usage display tools
> b. various obscure emulators (similar to valgrind)

If you strongly feel that "old" behaviour needs to be retained, 
here is the patch I originally suggested.

Thanks,
Badari

"ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
memory (shmid). It was useful in debugging, but its changed recently. 
This patch sets inode number to shared memory id to match /proc/pid/maps.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

Index: linux-2.6.22-rc4/ipc/shm.c
===================================================================
--- linux-2.6.22-rc4.orig/ipc/shm.c	2007-06-04 17:57:25.000000000 -0700
+++ linux-2.6.22-rc4/ipc/shm.c	2007-06-06 08:23:57.000000000 -0700
@@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace 
 	shp->shm_nattch = 0;
 	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
 	shp->shm_file = file;
+	file->f_dentry->d_inode->i_ino = shp->id;
 
 	ns->shm_tot += numpages;
 	shm_unlock(shp);



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07 16:23     ` Badari Pulavarty
@ 2007-06-07 16:43       ` Albert Cahalan
  2007-06-07 17:06         ` Badari Pulavarty
  0 siblings, 1 reply; 25+ messages in thread
From: Albert Cahalan @ 2007-06-07 16:43 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, lkml, linux-mm, ebiederm, torvalds

On 6/7/07, Badari Pulavarty <pbadari@us.ibm.com> wrote:

> BTW, I agree with Eric that its would be nice to use shmid as part
> of name instead of forcing to be as inode number. It should be
> possible for pmap to workout shmid from "key" or name. Isn't it ?

It is not at all nice.

1. it's incompatible ABI breakage
2. where will you put the key then, in the inode? :-)

Changing to "SYSVID%d" is no good either. Look, people
are ***parsing*** this stuff in /proc. The /proc filesystem
is not some random sandbox to be playing in.

Before you go messing with it, note that the device number
also matters. (it's per-boot dynamic, but that's OK)
That's how one knows that /SYSV00000000 is not just
a regular file; sadly these didn't get a non-/ prefix.
(and no you can't fix that now; it's way too late)

Next time you feel like breaking an ABI, mind putting
"LET'S BREAK AN ABI!" in the subject of your email?

BTW, I suspect this kind of thing also breaks:
a. fuser, lsof, and other resource usage display tools
b. various obscure emulators (similar to valgrind)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07  4:53   ` Albert Cahalan
  2007-06-07 16:20     ` Serge E. Hallyn
@ 2007-06-07 16:23     ` Badari Pulavarty
  2007-06-07 16:43       ` Albert Cahalan
  1 sibling, 1 reply; 25+ messages in thread
From: Badari Pulavarty @ 2007-06-07 16:23 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: Andrew Morton, lkml, linux-mm, ebiederm, torvalds

On Thu, 2007-06-07 at 00:53 -0400, Albert Cahalan wrote:
> On 6/6/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <acahalan@gmail.com> wrote:
> > > Eric W. Biederman writes:
> > > > Badari Pulavarty <pbadari@us.ibm.com> writes:
> > >
> > > >> Your recent cleanup to shm code, namely
> > > >>
> > > >> [PATCH] shm: make sysv ipc shared memory use stacked files
> > > >>
> > > >> took away one of the debugging feature for shm segments.
> > > >> Originally, shmid were forced to be the inode numbers and
> > > >> they show up in /proc/pid/maps for the process which mapped
> > > >> this shared memory segments (vma listing). That way, its easy
> > > >> to find out who all mapped this shared memory segment. Your
> > > >> patchset, took away the inode# setting. So, we can't easily
> > > >> match the shmem segments to /proc/pid/maps easily. (It was
> > > >> really useful in tracking down a customer problem recently).
> > > >> Is this done deliberately ? Anything wrong in setting this back ?
> > > >
> > > > Theoretically it makes the stacked file concept more brittle,
> > > > because it means the lower layers can't care about their inode
> > > > number.
> > > >
> > > > We do need something to tie these things together.
> > > >
> > > > So I suspect what makes most sense is to simply rename the
> > > > dentry SYSVID<segmentid>
> > >
> > > Please stop breaking things in /proc. The pmap command relys
> > > on the old behavior.
> >
> > What effect did this change have upon the pmap command?  Details, please.
> >
> > > It's time to revert.
> >
> > Probably true, but we'd need to understand what the impact was.
> 
> Very simply, pmap reports the shmid.
> 
> albert 0 ~$ pmap `pidof X` | egrep -2 shmid
> 30050000  16384K rw-s-  /dev/fb0
> 31050000    152K rw---    [ anon ]
> 31076000    384K rw-s-    [ shmid=0x3f428000 ]
> 310d6000    384K rw-s-    [ shmid=0x3f430001 ]
> 31136000    384K rw-s-    [ shmid=0x3f438002 ]
> 31196000    384K rw-s-    [ shmid=0x3f440003 ]
> 311f6000    384K rw-s-    [ shmid=0x3f448004 ]
> 31256000    384K rw-s-    [ shmid=0x3f450005 ]
> 312b6000    384K rw-s-    [ shmid=0x3f460006 ]
> 31316000    384K rw-s-    [ shmid=0x3f870007 ]
> 31491000    140K r----  /usr/share/fonts/type1/gsfonts/n021003l.pfb
> 3150e000   9496K rw---    [ anon ]

pmap seems to get shmid from "ino#" field of /proc/pid/map.
Its already broken in current mainline.

But, the breakage is not due to namespaces or container effort :(
Its due to noble effort from Eric to clean up the shm code,
take out the hacks to handle hugetlbfs and make the code
more streamlined and readable.

If we really really want old behaviour, we need my one line
patch to force shmid as inode# :(

BTW, I agree with Eric that its would be nice to use shmid as part
of name instead of forcing to be as inode number. It should be
possible for pmap to workout shmid from "key" or name. Isn't it ?

Andrew/Linus, its up to you to figure out if its worth breaking.
Here is the patch to base dentry-name on shmid - so we don't
need to use ino# to identify shmid.

Thanks,
Badari

Instead of basing dentry name on the shm "key", base it on
"shmid" - so it shows up clearly in /proc/pid/maps. Earlier
we were forcing ino# to match shmid.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Index: linux-2.6.22-rc4/ipc/shm.c
===================================================================
--- linux-2.6.22-rc4.orig/ipc/shm.c	2007-06-04 17:57:25.000000000 -0700
+++ linux-2.6.22-rc4/ipc/shm.c	2007-06-06 13:43:36.000000000 -0700
@@ -364,6 +364,14 @@ static int newseg (struct ipc_namespace 
 		return error;
 	}
 
+	error = -ENOSPC;
+	id = shm_addid(ns, shp);
+	if(id == -1)
+		goto no_id;
+
+	/* Build an id, so we can use it for filename */
+	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
+
 	if (shmflg & SHM_HUGETLB) {
 		/* hugetlb_zero_setup takes care of mlock user accounting */
 		file = hugetlb_zero_setup(size);
@@ -377,34 +385,28 @@ static int newseg (struct ipc_namespace 
 		if  ((shmflg & SHM_NORESERVE) &&
 				sysctl_overcommit_memory != OVERCOMMIT_NEVER)
 			acctflag = 0;
-		sprintf (name, "SYSV%08x", key);
+		sprintf (name, "SYSVID%d", shp->id);
 		file = shmem_file_setup(name, size, acctflag);
 	}
 	error = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto no_file;
 
-	error = -ENOSPC;
-	id = shm_addid(ns, shp);
-	if(id == -1) 
-		goto no_id;
-
 	shp->shm_cprid = current->tgid;
 	shp->shm_lprid = 0;
 	shp->shm_atim = shp->shm_dtim = 0;
 	shp->shm_ctim = get_seconds();
 	shp->shm_segsz = size;
 	shp->shm_nattch = 0;
-	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
 	shp->shm_file = file;
 
 	ns->shm_tot += numpages;
 	shm_unlock(shp);
 	return shp->id;
 
-no_id:
-	fput(file);
 no_file:
+	shm_rmid(ns, shp->id);
+no_id:
 	security_shm_free(shp);
 	ipc_rcu_putref(shp);
 	return error;


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07  4:53   ` Albert Cahalan
@ 2007-06-07 16:20     ` Serge E. Hallyn
  2007-06-08  3:45       ` Eric W. Biederman
  2007-06-07 16:23     ` Badari Pulavarty
  1 sibling, 1 reply; 25+ messages in thread
From: Serge E. Hallyn @ 2007-06-07 16:20 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Andrew Morton, linux-kernel, linux-mm, ebiederm, pbadari, torvalds

Quoting Albert Cahalan (acahalan@gmail.com):
> On 6/6/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> >On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <acahalan@gmail.com> 
> >wrote:
> >> Eric W. Biederman writes:
> >> > Badari Pulavarty <pbadari@us.ibm.com> writes:
> >>
> >> >> Your recent cleanup to shm code, namely
> >> >>
> >> >> [PATCH] shm: make sysv ipc shared memory use stacked files
> >> >>
> >> >> took away one of the debugging feature for shm segments.
> >> >> Originally, shmid were forced to be the inode numbers and
> >> >> they show up in /proc/pid/maps for the process which mapped
> >> >> this shared memory segments (vma listing). That way, its easy
> >> >> to find out who all mapped this shared memory segment. Your
> >> >> patchset, took away the inode# setting. So, we can't easily
> >> >> match the shmem segments to /proc/pid/maps easily. (It was
> >> >> really useful in tracking down a customer problem recently).
> >> >> Is this done deliberately ? Anything wrong in setting this back ?
> >> >
> >> > Theoretically it makes the stacked file concept more brittle,
> >> > because it means the lower layers can't care about their inode
> >> > number.
> >> >
> >> > We do need something to tie these things together.
> >> >
> >> > So I suspect what makes most sense is to simply rename the
> >> > dentry SYSVID<segmentid>
> >>
> >> Please stop breaking things in /proc. The pmap command relys
> >> on the old behavior.
> >
> >What effect did this change have upon the pmap command?  Details, please.
> >
> >> It's time to revert.
> >
> >Probably true, but we'd need to understand what the impact was.
> 
> Very simply, pmap reports the shmid.
> 
> albert 0 ~$ pmap `pidof X` | egrep -2 shmid
> 30050000  16384K rw-s-  /dev/fb0
> 31050000    152K rw---    [ anon ]
> 31076000    384K rw-s-    [ shmid=0x3f428000 ]
> 310d6000    384K rw-s-    [ shmid=0x3f430001 ]
> 31136000    384K rw-s-    [ shmid=0x3f438002 ]
> 31196000    384K rw-s-    [ shmid=0x3f440003 ]
> 311f6000    384K rw-s-    [ shmid=0x3f448004 ]
> 31256000    384K rw-s-    [ shmid=0x3f450005 ]
> 312b6000    384K rw-s-    [ shmid=0x3f460006 ]
> 31316000    384K rw-s-    [ shmid=0x3f870007 ]
> 31491000    140K r----  /usr/share/fonts/type1/gsfonts/n021003l.pfb
> 3150e000   9496K rw---    [ anon ]

Ok, so IIUC the problem was that inode->i_ino was being set to the id,
and the id can be the same for different things in two namespaces.

So aside from not using the id as inode->i_ino, an alternative is to use
a separate superblock, spearate mqeueue fs, for each ipc ns.

I haven't looked at that enough to see whether it's feasible, i.e. I 
don't know what else mqueue fs is used for.  Eric, does that sound
reasonable to you?

thanks,
-serge

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07  3:44 ` Andrew Morton
@ 2007-06-07  4:53   ` Albert Cahalan
  2007-06-07 16:20     ` Serge E. Hallyn
  2007-06-07 16:23     ` Badari Pulavarty
  0 siblings, 2 replies; 25+ messages in thread
From: Albert Cahalan @ 2007-06-07  4:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, ebiederm, pbadari, torvalds

On 6/6/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <acahalan@gmail.com> wrote:
> > Eric W. Biederman writes:
> > > Badari Pulavarty <pbadari@us.ibm.com> writes:
> >
> > >> Your recent cleanup to shm code, namely
> > >>
> > >> [PATCH] shm: make sysv ipc shared memory use stacked files
> > >>
> > >> took away one of the debugging feature for shm segments.
> > >> Originally, shmid were forced to be the inode numbers and
> > >> they show up in /proc/pid/maps for the process which mapped
> > >> this shared memory segments (vma listing). That way, its easy
> > >> to find out who all mapped this shared memory segment. Your
> > >> patchset, took away the inode# setting. So, we can't easily
> > >> match the shmem segments to /proc/pid/maps easily. (It was
> > >> really useful in tracking down a customer problem recently).
> > >> Is this done deliberately ? Anything wrong in setting this back ?
> > >
> > > Theoretically it makes the stacked file concept more brittle,
> > > because it means the lower layers can't care about their inode
> > > number.
> > >
> > > We do need something to tie these things together.
> > >
> > > So I suspect what makes most sense is to simply rename the
> > > dentry SYSVID<segmentid>
> >
> > Please stop breaking things in /proc. The pmap command relys
> > on the old behavior.
>
> What effect did this change have upon the pmap command?  Details, please.
>
> > It's time to revert.
>
> Probably true, but we'd need to understand what the impact was.

Very simply, pmap reports the shmid.

albert 0 ~$ pmap `pidof X` | egrep -2 shmid
30050000  16384K rw-s-  /dev/fb0
31050000    152K rw---    [ anon ]
31076000    384K rw-s-    [ shmid=0x3f428000 ]
310d6000    384K rw-s-    [ shmid=0x3f430001 ]
31136000    384K rw-s-    [ shmid=0x3f438002 ]
31196000    384K rw-s-    [ shmid=0x3f440003 ]
311f6000    384K rw-s-    [ shmid=0x3f448004 ]
31256000    384K rw-s-    [ shmid=0x3f450005 ]
312b6000    384K rw-s-    [ shmid=0x3f460006 ]
31316000    384K rw-s-    [ shmid=0x3f870007 ]
31491000    140K r----  /usr/share/fonts/type1/gsfonts/n021003l.pfb
3150e000   9496K rw---    [ anon ]

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
  2007-06-07  3:27 Albert Cahalan
@ 2007-06-07  3:44 ` Andrew Morton
  2007-06-07  4:53   ` Albert Cahalan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-06-07  3:44 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel, linux-mm, ebiederm, pbadari, torvalds

On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <acahalan@gmail.com> wrote:

> Eric W. Biederman writes:
> > Badari Pulavarty <pbadari@us.ibm.com> writes:
> 
> >> Your recent cleanup to shm code, namely
> >>
> >> [PATCH] shm: make sysv ipc shared memory use stacked files
> >>
> >> took away one of the debugging feature for shm segments.
> >> Originally, shmid were forced to be the inode numbers and
> >> they show up in /proc/pid/maps for the process which mapped
> >> this shared memory segments (vma listing). That way, its easy
> >> to find out who all mapped this shared memory segment. Your
> >> patchset, took away the inode# setting. So, we can't easily
> >> match the shmem segments to /proc/pid/maps easily. (It was
> >> really useful in tracking down a customer problem recently).
> >> Is this done deliberately ? Anything wrong in setting this back ?
> >
> > Theoretically it makes the stacked file concept more brittle,
> > because it means the lower layers can't care about their inode
> > number.
> >
> > We do need something to tie these things together.
> >
> > So I suspect what makes most sense is to simply rename the
> > dentry SYSVID<segmentid>
> 
> Please stop breaking things in /proc. The pmap command relys
> on the old behavior.

What effect did this change have upon the pmap command?  Details, please.

> It's time to revert.

Probably true, but we'd need to understand what the impact was.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
@ 2007-06-07  3:27 Albert Cahalan
  2007-06-07  3:44 ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Albert Cahalan @ 2007-06-07  3:27 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm, ebiederm, pbadari, torvalds

Eric W. Biederman writes:
> Badari Pulavarty <pbadari@us.ibm.com> writes:

>> Your recent cleanup to shm code, namely
>>
>> [PATCH] shm: make sysv ipc shared memory use stacked files
>>
>> took away one of the debugging feature for shm segments.
>> Originally, shmid were forced to be the inode numbers and
>> they show up in /proc/pid/maps for the process which mapped
>> this shared memory segments (vma listing). That way, its easy
>> to find out who all mapped this shared memory segment. Your
>> patchset, took away the inode# setting. So, we can't easily
>> match the shmem segments to /proc/pid/maps easily. (It was
>> really useful in tracking down a customer problem recently).
>> Is this done deliberately ? Anything wrong in setting this back ?
>
> Theoretically it makes the stacked file concept more brittle,
> because it means the lower layers can't care about their inode
> number.
>
> We do need something to tie these things together.
>
> So I suspect what makes most sense is to simply rename the
> dentry SYSVID<segmentid>

Please stop breaking things in /proc. The pmap command relys
on the old behavior. It's time to revert. Put back the segment ID
where it belongs, and leave the key where it belongs too.

Containers are NOT worth breaking our ABIs left and right.
We don't need to leap off that bridge just because Solaris did,
unless you can explain why complexity and bloat are desirable.
We already have SE Linux, chroot, KVM, and several more!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2007-06-08 16:07 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-06 16:07 [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid Badari Pulavarty
2007-06-06 17:02 ` Eric W. Biederman
2007-06-06 17:37   ` Badari Pulavarty
2007-06-06 18:24     ` Eric W. Biederman
2007-06-07  3:27 Albert Cahalan
2007-06-07  3:44 ` Andrew Morton
2007-06-07  4:53   ` Albert Cahalan
2007-06-07 16:20     ` Serge E. Hallyn
2007-06-08  3:45       ` Eric W. Biederman
2007-06-08  4:41         ` Albert Cahalan
2007-06-08  5:55           ` Eric W. Biederman
2007-06-08  6:51             ` Albert Cahalan
2007-06-08 16:07         ` Badari Pulavarty
2007-06-07 16:23     ` Badari Pulavarty
2007-06-07 16:43       ` Albert Cahalan
2007-06-07 17:06         ` Badari Pulavarty
2007-06-07 19:48           ` Andrew Morton
2007-06-07 19:59             ` Badari Pulavarty
2007-06-07 20:37               ` Serge E. Hallyn
2007-06-07 20:51                 ` Serge E. Hallyn
2007-06-07 21:16                 ` Badari Pulavarty
2007-06-07 22:08                   ` Serge E. Hallyn
2007-06-07 22:21                     ` Badari Pulavarty
2007-06-07 22:22                     ` Serge E. Hallyn
2007-06-07 23:57                       ` Badari Pulavarty

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox