From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f179.google.com (mail-pd0-f179.google.com [209.85.192.179]) by kanga.kvack.org (Postfix) with ESMTP id 35C956B0253 for ; Mon, 27 Jul 2015 15:33:47 -0400 (EDT) Received: by pdjr16 with SMTP id r16so57849110pdj.3 for ; Mon, 27 Jul 2015 12:33:46 -0700 (PDT) Received: from mail-pa0-x22e.google.com (mail-pa0-x22e.google.com. [2607:f8b0:400e:c03::22e]) by mx.google.com with ESMTPS id xy9si3696538pac.198.2015.07.27.12.33.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 Jul 2015 12:33:46 -0700 (PDT) Received: by pacan13 with SMTP id an13so57267976pac.1 for ; Mon, 27 Jul 2015 12:33:45 -0700 (PDT) Date: Mon, 27 Jul 2015 12:32:53 -0700 (PDT) From: Hugh Dickins Subject: Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments. In-Reply-To: <1437741275-5388-1-git-send-email-sds@tycho.nsa.gov> Message-ID: References: <1437741275-5388-1-git-send-email-sds@tycho.nsa.gov> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Stephen Smalley Cc: mstevens@fedoraproject.org, linux-kernel@vger.kernel.org, nyc@holomorphy.com, hughd@google.com, akpm@linux-foundation.org, manfred@colorfullife.com, dave@stgolabs.net, linux-mm@kvack.org, wagi@monom.org, prarit@redhat.com, torvalds@linux-foundation.org, david@fromorbit.com, esandeen@redhat.com, eparis@redhat.com, selinux@tycho.nsa.gov, paul@paul-moore.com, linux-security-module@vger.kernel.org On Fri, 24 Jul 2015, Stephen Smalley wrote: > The shm implementation internally uses shmem or hugetlbfs inodes > for shm segments. As these inodes are never directly exposed to > userspace and only accessed through the shm operations which are > already hooked by security modules, mark the inodes with the > S_PRIVATE flag so that inode security initialization and permission > checking is skipped. > > This was motivated by the following lockdep warning: > Jul 22 14:36:40 fc23 kernel: > ====================================================== > Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking > dependency detected ] > Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 > Tainted: G W > Jul 22 14:36:40 fc23 kernel: > ------------------------------------------------------- > Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock: > Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+++++.}, at: > [] shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock: > Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++++++}, at: > [] SyS_shmdt+0x4b/0x180 > Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock. > Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in > reverse order) is: > Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++++++}: > Jul 22 14:36:40 fc23 kernel: [] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [] __might_fault+0x7a/0xa0 > Jul 22 14:36:40 fc23 kernel: [] filldir+0x9e/0x130 > Jul 22 14:36:40 fc23 kernel: [] > xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] > Jul 22 14:36:40 fc23 kernel: [] > xfs_readdir+0x1b4/0x330 [xfs] > Jul 22 14:36:40 fc23 kernel: [] > xfs_file_readdir+0x2b/0x30 [xfs] > Jul 22 14:36:40 fc23 kernel: [] iterate_dir+0x97/0x130 > Jul 22 14:36:40 fc23 kernel: [] SyS_getdents+0x91/0x120 > Jul 22 14:36:40 fc23 kernel: [] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012-> #2 (&xfs_dir_ilock_class){++++.+}: > Jul 22 14:36:40 fc23 kernel: [] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [] > down_read_nested+0x57/0xa0 > Jul 22 14:36:40 fc23 kernel: [] > xfs_ilock+0x167/0x350 [xfs] > Jul 22 14:36:40 fc23 kernel: [] > xfs_ilock_attr_map_shared+0x38/0x50 [xfs] > Jul 22 14:36:40 fc23 kernel: [] > xfs_attr_get+0xbd/0x190 [xfs] > Jul 22 14:36:40 fc23 kernel: [] > xfs_xattr_get+0x3d/0x70 [xfs] > Jul 22 14:36:40 fc23 kernel: [] > generic_getxattr+0x4f/0x70 > Jul 22 14:36:40 fc23 kernel: [] > inode_doinit_with_dentry+0x162/0x670 > Jul 22 14:36:40 fc23 kernel: [] > sb_finish_set_opts+0xd9/0x230 > Jul 22 14:36:40 fc23 kernel: [] > selinux_set_mnt_opts+0x35c/0x660 > Jul 22 14:36:40 fc23 kernel: [] > superblock_doinit+0x77/0xf0 > Jul 22 14:36:40 fc23 kernel: [] > delayed_superblock_init+0x10/0x20 > Jul 22 14:36:40 fc23 kernel: [] > iterate_supers+0xb3/0x110 > Jul 22 14:36:40 fc23 kernel: [] > selinux_complete_init+0x2f/0x40 > Jul 22 14:36:40 fc23 kernel: [] > security_load_policy+0x103/0x600 > Jul 22 14:36:40 fc23 kernel: [] > sel_write_load+0xc1/0x750 > Jul 22 14:36:40 fc23 kernel: [] __vfs_write+0x37/0x100 > Jul 22 14:36:40 fc23 kernel: [] vfs_write+0xa9/0x1a0 > Jul 22 14:36:40 fc23 kernel: [] SyS_write+0x58/0xd0 > Jul 22 14:36:40 fc23 kernel: [] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012-> #1 (&isec->lock){+.+.+.}: > Jul 22 14:36:40 fc23 kernel: [] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [] > mutex_lock_nested+0x7f/0x3e0 > Jul 22 14:36:40 fc23 kernel: [] > inode_doinit_with_dentry+0xb9/0x670 > Jul 22 14:36:40 fc23 kernel: [] > selinux_d_instantiate+0x1c/0x20 > Jul 22 14:36:40 fc23 kernel: [] > security_d_instantiate+0x36/0x60 > Jul 22 14:36:40 fc23 kernel: [] d_instantiate+0x54/0x70 > Jul 22 14:36:40 fc23 kernel: [] > __shmem_file_setup+0xdc/0x240 > Jul 22 14:36:40 fc23 kernel: [] > shmem_file_setup+0x10/0x20 > Jul 22 14:36:40 fc23 kernel: [] newseg+0x290/0x3a0 > Jul 22 14:36:40 fc23 kernel: [] ipcget+0x208/0x2d0 > Jul 22 14:36:40 fc23 kernel: [] SyS_shmget+0x54/0x70 > Jul 22 14:36:40 fc23 kernel: [] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012-> #0 (&ids->rwsem){+++++.}: > Jul 22 14:36:40 fc23 kernel: [] > __lock_acquire+0x1a78/0x1d00 > Jul 22 14:36:40 fc23 kernel: [] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [] down_write+0x5a/0xc0 > Jul 22 14:36:40 fc23 kernel: [] shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [] remove_vma+0x45/0x80 > Jul 22 14:36:40 fc23 kernel: [] do_munmap+0x2b0/0x460 > Jul 22 14:36:40 fc23 kernel: [] SyS_shmdt+0xb5/0x180 > Jul 22 14:36:40 fc23 kernel: [] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012other info that might help us debug this: > Jul 22 14:36:40 fc23 kernel: Chain exists of:#012 &ids->rwsem --> > &xfs_dir_ilock_class --> &mm->mmap_sem > Jul 22 14:36:40 fc23 kernel: Possible unsafe locking scenario: > Jul 22 14:36:40 fc23 kernel: CPU0 CPU1 > Jul 22 14:36:40 fc23 kernel: ---- ---- > Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem); > Jul 22 14:36:40 fc23 kernel: > lock(&xfs_dir_ilock_class); > Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem); > Jul 22 14:36:40 fc23 kernel: lock(&ids->rwsem); > Jul 22 14:36:40 fc23 kernel: #012 *** DEADLOCK *** > Jul 22 14:36:40 fc23 kernel: 1 lock held by httpd/1597: > Jul 22 14:36:40 fc23 kernel: #0: (&mm->mmap_sem){++++++}, at: > [] SyS_shmdt+0x4b/0x180 > Jul 22 14:36:40 fc23 kernel: #012stack backtrace: > Jul 22 14:36:40 fc23 kernel: CPU: 7 PID: 1597 Comm: httpd Tainted: G > W 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 > Jul 22 14:36:40 fc23 kernel: Hardware name: VMware, Inc. VMware > Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 > 05/20/2014 > Jul 22 14:36:40 fc23 kernel: 0000000000000000 000000006cb6fe9d > ffff88019ff07c58 ffffffff81868175 > Jul 22 14:36:40 fc23 kernel: 0000000000000000 ffffffff82aea390 > ffff88019ff07ca8 ffffffff81105903 > Jul 22 14:36:40 fc23 kernel: ffff88019ff07c78 ffff88019ff07d08 > 0000000000000001 ffff8800b75108f0 > Jul 22 14:36:40 fc23 kernel: Call Trace: > Jul 22 14:36:40 fc23 kernel: [] dump_stack+0x4c/0x65 > Jul 22 14:36:40 fc23 kernel: [] print_circular_bug+0x1e3/0x250 > Jul 22 14:36:40 fc23 kernel: [] __lock_acquire+0x1a78/0x1d00 > Jul 22 14:36:40 fc23 kernel: [] ? unlink_file_vma+0x33/0x60 > Jul 22 14:36:40 fc23 kernel: [] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [] ? shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [] down_write+0x5a/0xc0 > Jul 22 14:36:40 fc23 kernel: [] ? shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [] shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [] remove_vma+0x45/0x80 > Jul 22 14:36:40 fc23 kernel: [] do_munmap+0x2b0/0x460 > Jul 22 14:36:40 fc23 kernel: [] ? SyS_shmdt+0x4b/0x180 > Jul 22 14:36:40 fc23 kernel: [] SyS_shmdt+0xb5/0x180 > Jul 22 14:36:40 fc23 kernel: [] > entry_SYSCALL_64_fastpath+0x12/0x76 > > Reported-by: Morten Stevens > Signed-off-by: Stephen Smalley Acked-by: Hugh Dickins but with one reservation below... > --- > This version only differs in the patch description, which restores > the original lockdep trace from Morten Stevens. It was unfortunately > mangled in the prior version. > > fs/hugetlbfs/inode.c | 2 ++ > ipc/shm.c | 2 +- > mm/shmem.c | 4 ++-- > 3 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > index 0cf74df..973c24c 100644 > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size, > inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0); > if (!inode) > goto out_dentry; > + if (creat_flags == HUGETLB_SHMFS_INODE) > + inode->i_flags |= S_PRIVATE; I wonder if you would do better just to set S_PRIVATE unconditionally there. hugetlb_file_setup() has two callsites, neither of which exposes an fd. One of them is shm.c's newseg(), which is getting us into the lockdep trouble that you're fixing here. The other is mmap.c's mmap_pgoff(). Now I don't think that will ever get into lockdep trouble (no mutex or rwsem has been taken at that point), but might your change above introduce (perhaps now or perhaps in future) an inconsistency between how SElinux checks are applied to a SHM area, and how they are applied to a MAP_ANONYMOUS|MAP_HUGETLB area, and how they are applied to a straight MAP_ANONYMOUS area? I think your patch as it stands brings SHM into line with MAP_ANONYMOUS, but leaves MAP_ANONYMOUS|MAP_HUGETLB going the old way. Perhaps an anomaly would appear when mprotect() is used? It's up to you: I think your patch is okay as is, but I just wonder if it has a surprise in store for the future. Hugh > > file = ERR_PTR(-ENOMEM); > if (hugetlb_reserve_pages(inode, 0, > diff --git a/ipc/shm.c b/ipc/shm.c > index 06e5cf2..4aef24d 100644 > --- a/ipc/shm.c > +++ b/ipc/shm.c > @@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params) > if ((shmflg & SHM_NORESERVE) && > sysctl_overcommit_memory != OVERCOMMIT_NEVER) > acctflag = VM_NORESERVE; > - file = shmem_file_setup(name, size, acctflag); > + file = shmem_kernel_file_setup(name, size, acctflag); > } > error = PTR_ERR(file); > if (IS_ERR(file)) > diff --git a/mm/shmem.c b/mm/shmem.c > index 4caf8ed..dbe0c1e 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3363,8 +3363,8 @@ put_path: > * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be > * kernel internal. There will be NO LSM permission checks against the > * underlying inode. So users of this interface must do LSM checks at a > - * higher layer. The one user is the big_key implementation. LSM checks > - * are provided at the key level rather than the inode level. > + * higher layer. The users are the big_key and shm implementations. LSM > + * checks are provided at the key or shm level rather than the inode. > * @name: name for dentry (to be seen in /proc//maps > * @size: size to be set for the file > * @flags: VM_NORESERVE suppresses pre-accounting of the entire object size > -- > 2.1.0 > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org