From: Sedat Dilek <sedat.dilek@gmail.com>
To: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Manfred Spraul <manfred@colorfullife.com>
Subject: Re: [PATCH 00/11] sysv ipc shared mem optimizations
Date: Wed, 26 Jun 2013 10:08:00 +0200 [thread overview]
Message-ID: <CA+icZUVKoEvRfLAOoF-inJyHHGdyMh9sSXDT7SYatRBt_F7iug@mail.gmail.com> (raw)
In-Reply-To: <CA+icZUW=+DJVn=-mSUKF6dQkCAdZvn-u=AemQWR2FNWH34=+3w@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1379 bytes --]
On Wed, Jun 26, 2013 at 1:55 AM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> Hi,
>
> I have tested the patchset "sysv ipc shared mem optimizations" on top
> of next-20130618.
>
> My typical rebuild with fakeroot & 'make deb-pkg' was fine.
>
> Further tests done with LPT-full (20130503): IPC and SYSCALLS
> test-cases ran successfully.
>
> I am attaching the tarball I have sent already to Davidlohr which contains:
>
> 35070 Jun 26 00:37 3.10.0-rc6-next20130618-1-iniza-small.patch
> 114002 Jun 26 00:48 config-3.10.0-rc6-next20130618-1-iniza-small
> 84489 Jun 26 00:55 dmesg_3.10.0-rc6-next20130618-1-iniza-small.txt
> 38996 Jun 26 00:57 runltp-f-ipc_3.10.0-rc6-next20130618-1-iniza-small_dash.txt
> 760276 Jun 26 01:12
> runltp-f-syscalls_3.10.0-rc6-next20130618-1-iniza-small_dash.txt
>
> NOTES:
> 1. 09/11 needed a small refresh as v2 (attached).
> 2. [ PATCH] ipc,msq: fix race in msgrcv(2) (as v2) applied on top of
> all (attached).
>
> Please feel free to add my Tested-by to the whole series.
>
I have re-tested this patchset also against next-20130624 (09/11
original fits here, 08/11 needs to be cleanpatch-ed).
( In addition I still need the ipc-msg-next fix mentioned above which
is now in akpm's mmots. )
- Sedat -
[1] http://ozlabs.org/~akpm/mmots/broken-out/ipcmsg-shorten-critical-region-in-msgrcv-fix-race-in-msgrcv2.patch
> Regards,
> - Sedat -
[-- Attachment #2: 3.10.0-rc7-next20130624-4-iniza-small.patch --]
[-- Type: application/octet-stream, Size: 35558 bytes --]
Davidlohr Bueso (12):
ipc,msq: fix race in msgrcv(2)
ipc,shm: introduce lockless functions to obtain the ipc object
ipc,shm: shorten critical region in shmctl_down
ipc: drop ipcctl_pre_down
ipc,shm: introduce shmctl_nolock
ipc,shm: make shmctl_nolock lockless
ipc,shm: shorten critical region for shmctl
ipc,shm: cleanup do_shmat pasta
ipc,shm: shorten critical region for shmat
ipc: rename ids->rw_mutex
ipc,msg: drop msg_unlock
ipc: document general ipc locking scheme
Sedat Dilek (7):
kbuild: deb-pkg: Try to determine distribution
kbuild: deb-pkg: Bump year in debian/copyright file
kbuild: deb-pkg: Update git repository URL in debian/copyright file
Merge tag 'next-20130624' of git://git.kernel.org/.../next/linux-next into Linux-Next-v20130624
Merge branch 'deb-pkg-3.10-fixes' into 3.10.0-rc7-next20130624-1-iniza-small
Merge branch 'ipc-msg-next-fixes-from-akpm-mmots' into 3.10.0-rc7-next20130624-4-iniza-small
Merge branch 'sysv-ipc-shm-optimizations-next20130624-testing' into 3.10.0-rc7-next20130624-4-iniza-small
include/linux/ipc_namespace.h | 2 +-
ipc/msg.c | 36 +++----
ipc/namespace.c | 4 +-
ipc/sem.c | 24 ++---
ipc/shm.c | 239 ++++++++++++++++++++++++++----------------
ipc/util.c | 57 +++++-----
ipc/util.h | 7 +-
scripts/package/builddeb | 19 +++-
8 files changed, 220 insertions(+), 168 deletions(-)
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index c4d870b..19c19a5 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -22,7 +22,7 @@ struct ipc_ids {
int in_use;
unsigned short seq;
unsigned short seq_max;
- struct rw_semaphore rw_mutex;
+ struct rw_semaphore rwsem;
struct idr ipcs_idr;
int next_id;
};
diff --git a/ipc/msg.c b/ipc/msg.c
index a1cf70e..14d64f8 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -70,8 +70,6 @@ struct msg_sender {
#define msg_ids(ns) ((ns)->ids[IPC_MSG_IDS])
-#define msg_unlock(msq) ipc_unlock(&(msq)->q_perm)
-
static void freeque(struct ipc_namespace *, struct kern_ipc_perm *);
static int newque(struct ipc_namespace *, struct ipc_params *);
#ifdef CONFIG_PROC_FS
@@ -172,7 +170,7 @@ static inline void msg_rmid(struct ipc_namespace *ns, struct msg_queue *s)
* @ns: namespace
* @params: ptr to the structure that contains the key and msgflg
*
- * Called with msg_ids.rw_mutex held (writer)
+ * Called with msg_ids.rwsem held (writer)
*/
static int newque(struct ipc_namespace *ns, struct ipc_params *params)
{
@@ -259,8 +257,8 @@ static void expunge_all(struct msg_queue *msq, int res)
* removes the message queue from message queue ID IDR, and cleans up all the
* messages associated with this queue.
*
- * msg_ids.rw_mutex (writer) and the spinlock for this message queue are held
- * before freeque() is called. msg_ids.rw_mutex remains locked on exit.
+ * msg_ids.rwsem (writer) and the spinlock for this message queue are held
+ * before freeque() is called. msg_ids.rwsem remains locked on exit.
*/
static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
{
@@ -270,7 +268,8 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
expunge_all(msq, -EIDRM);
ss_wakeup(&msq->q_senders, 1);
msg_rmid(ns, msq);
- msg_unlock(msq);
+ ipc_unlock_object(&msq->q_perm);
+ rcu_read_unlock();
list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
atomic_dec(&ns->msg_hdrs);
@@ -282,7 +281,7 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
}
/*
- * Called with msg_ids.rw_mutex and ipcp locked.
+ * Called with msg_ids.rwsem and ipcp locked.
*/
static inline int msg_security(struct kern_ipc_perm *ipcp, int msgflg)
{
@@ -386,9 +385,9 @@ copy_msqid_from_user(struct msqid64_ds *out, void __user *buf, int version)
}
/*
- * This function handles some msgctl commands which require the rw_mutex
+ * This function handles some msgctl commands which require the rwsem
* to be held in write mode.
- * NOTE: no locks must be held, the rw_mutex is taken inside this function.
+ * NOTE: no locks must be held, the rwsem is taken inside this function.
*/
static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
struct msqid_ds __user *buf, int version)
@@ -403,7 +402,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
return -EFAULT;
}
- down_write(&msg_ids(ns).rw_mutex);
+ down_write(&msg_ids(ns).rwsem);
rcu_read_lock();
ipcp = ipcctl_pre_down_nolock(ns, &msg_ids(ns), msqid, cmd,
@@ -459,7 +458,7 @@ out_unlock0:
out_unlock1:
rcu_read_unlock();
out_up:
- up_write(&msg_ids(ns).rw_mutex);
+ up_write(&msg_ids(ns).rwsem);
return err;
}
@@ -494,7 +493,7 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
msginfo.msgmnb = ns->msg_ctlmnb;
msginfo.msgssz = MSGSSZ;
msginfo.msgseg = MSGSEG;
- down_read(&msg_ids(ns).rw_mutex);
+ down_read(&msg_ids(ns).rwsem);
if (cmd == MSG_INFO) {
msginfo.msgpool = msg_ids(ns).in_use;
msginfo.msgmap = atomic_read(&ns->msg_hdrs);
@@ -505,7 +504,7 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
msginfo.msgtql = MSGTQL;
}
max_id = ipc_get_maxid(&msg_ids(ns));
- up_read(&msg_ids(ns).rw_mutex);
+ up_read(&msg_ids(ns).rwsem);
if (copy_to_user(buf, &msginfo, sizeof(struct msginfo)))
return -EFAULT;
return (max_id < 0) ? 0 : max_id;
@@ -895,6 +894,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
if (ipcperms(ns, &msq->q_perm, S_IRUGO))
goto out_unlock1;
+ ipc_lock_object(&msq->q_perm);
msg = find_msg(msq, &msgtyp, mode);
if (!IS_ERR(msg)) {
/*
@@ -903,7 +903,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
*/
if ((bufsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) {
msg = ERR_PTR(-E2BIG);
- goto out_unlock1;
+ goto out_unlock0;
}
/*
* If we are copying, then do not unlink message and do
@@ -911,10 +911,9 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
*/
if (msgflg & MSG_COPY) {
msg = copy_msg(msg, copy);
- goto out_unlock1;
+ goto out_unlock0;
}
- ipc_lock_object(&msq->q_perm);
list_del(&msg->m_list);
msq->q_qnum--;
msq->q_rtime = get_seconds();
@@ -930,10 +929,9 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
/* No message waiting. Wait for a message */
if (msgflg & IPC_NOWAIT) {
msg = ERR_PTR(-ENOMSG);
- goto out_unlock1;
+ goto out_unlock0;
}
- ipc_lock_object(&msq->q_perm);
list_add_tail(&msr_d.r_list, &msq->q_receivers);
msr_d.r_tsk = current;
msr_d.r_msgtype = msgtyp;
@@ -957,7 +955,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
* Prior to destruction, expunge_all(-EIRDM) changes r_msg.
* Thus if r_msg is -EAGAIN, then the queue not yet destroyed.
* rcu_read_lock() prevents preemption between reading r_msg
- * and the spin_lock() inside ipc_lock_by_ptr().
+ * and acquiring the q_perm.lock in ipc_lock_object().
*/
rcu_read_lock();
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 7ee61bf..67dc744 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -81,7 +81,7 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
int next_id;
int total, in_use;
- down_write(&ids->rw_mutex);
+ down_write(&ids->rwsem);
in_use = ids->in_use;
@@ -93,7 +93,7 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
free(ns, perm);
total++;
}
- up_write(&ids->rw_mutex);
+ up_write(&ids->rwsem);
}
static void free_ipc_ns(struct ipc_namespace *ns)
diff --git a/ipc/sem.c b/ipc/sem.c
index 4108889..69b6a21 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -322,7 +322,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
}
/*
- * sem_lock_(check_) routines are called in the paths where the rw_mutex
+ * sem_lock_(check_) routines are called in the paths where the rwsem
* is not held.
*
* The caller holds the RCU read lock.
@@ -426,7 +426,7 @@ static inline void sem_rmid(struct ipc_namespace *ns, struct sem_array *s)
* @ns: namespace
* @params: ptr to the structure that contains key, semflg and nsems
*
- * Called with sem_ids.rw_mutex held (as a writer)
+ * Called with sem_ids.rwsem held (as a writer)
*/
static int newary(struct ipc_namespace *ns, struct ipc_params *params)
@@ -492,7 +492,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
/*
- * Called with sem_ids.rw_mutex and ipcp locked.
+ * Called with sem_ids.rwsem and ipcp locked.
*/
static inline int sem_security(struct kern_ipc_perm *ipcp, int semflg)
{
@@ -503,7 +503,7 @@ static inline int sem_security(struct kern_ipc_perm *ipcp, int semflg)
}
/*
- * Called with sem_ids.rw_mutex and ipcp locked.
+ * Called with sem_ids.rwsem and ipcp locked.
*/
static inline int sem_more_checks(struct kern_ipc_perm *ipcp,
struct ipc_params *params)
@@ -994,8 +994,8 @@ static int count_semzcnt (struct sem_array * sma, ushort semnum)
return semzcnt;
}
-/* Free a semaphore set. freeary() is called with sem_ids.rw_mutex locked
- * as a writer and the spinlock for this semaphore set hold. sem_ids.rw_mutex
+/* Free a semaphore set. freeary() is called with sem_ids.rwsem locked
+ * as a writer and the spinlock for this semaphore set hold. sem_ids.rwsem
* remains locked on exit.
*/
static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
@@ -1116,7 +1116,7 @@ static int semctl_nolock(struct ipc_namespace *ns, int semid,
seminfo.semmnu = SEMMNU;
seminfo.semmap = SEMMAP;
seminfo.semume = SEMUME;
- down_read(&sem_ids(ns).rw_mutex);
+ down_read(&sem_ids(ns).rwsem);
if (cmd == SEM_INFO) {
seminfo.semusz = sem_ids(ns).in_use;
seminfo.semaem = ns->used_sems;
@@ -1125,7 +1125,7 @@ static int semctl_nolock(struct ipc_namespace *ns, int semid,
seminfo.semaem = SEMAEM;
}
max_id = ipc_get_maxid(&sem_ids(ns));
- up_read(&sem_ids(ns).rw_mutex);
+ up_read(&sem_ids(ns).rwsem);
if (copy_to_user(p, &seminfo, sizeof(struct seminfo)))
return -EFAULT;
return (max_id < 0) ? 0: max_id;
@@ -1431,9 +1431,9 @@ copy_semid_from_user(struct semid64_ds *out, void __user *buf, int version)
}
/*
- * This function handles some semctl commands which require the rw_mutex
+ * This function handles some semctl commands which require the rwsem
* to be held in write mode.
- * NOTE: no locks must be held, the rw_mutex is taken inside this function.
+ * NOTE: no locks must be held, the rwsem is taken inside this function.
*/
static int semctl_down(struct ipc_namespace *ns, int semid,
int cmd, int version, void __user *p)
@@ -1448,7 +1448,7 @@ static int semctl_down(struct ipc_namespace *ns, int semid,
return -EFAULT;
}
- down_write(&sem_ids(ns).rw_mutex);
+ down_write(&sem_ids(ns).rwsem);
rcu_read_lock();
ipcp = ipcctl_pre_down_nolock(ns, &sem_ids(ns), semid, cmd,
@@ -1487,7 +1487,7 @@ out_unlock0:
out_unlock1:
rcu_read_unlock();
out_up:
- up_write(&sem_ids(ns).rw_mutex);
+ up_write(&sem_ids(ns).rwsem);
return err;
}
diff --git a/ipc/shm.c b/ipc/shm.c
index c6b4ad5..9017786 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -19,6 +19,9 @@
* namespaces support
* OpenVZ, SWsoft Inc.
* Pavel Emelianov <xemul@openvz.org>
+ *
+ * Better ipc lock (kern_ipc_perm.lock) handling
+ * Davidlohr Bueso <davidlohr.bueso@hp.com>, June 2013.
*/
#include <linux/slab.h>
@@ -80,8 +83,8 @@ void shm_init_ns(struct ipc_namespace *ns)
}
/*
- * Called with shm_ids.rw_mutex (writer) and the shp structure locked.
- * Only shm_ids.rw_mutex remains locked on exit.
+ * Called with shm_ids.rwsem (writer) and the shp structure locked.
+ * Only shm_ids.rwsem remains locked on exit.
*/
static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
{
@@ -124,8 +127,28 @@ void __init shm_init (void)
IPC_SHM_IDS, sysvipc_shm_proc_show);
}
+static inline struct shmid_kernel *shm_obtain_object(struct ipc_namespace *ns, int id)
+{
+ struct kern_ipc_perm *ipcp = ipc_obtain_object(&shm_ids(ns), id);
+
+ if (IS_ERR(ipcp))
+ return ERR_CAST(ipcp);
+
+ return container_of(ipcp, struct shmid_kernel, shm_perm);
+}
+
+static inline struct shmid_kernel *shm_obtain_object_check(struct ipc_namespace *ns, int id)
+{
+ struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&shm_ids(ns), id);
+
+ if (IS_ERR(ipcp))
+ return ERR_CAST(ipcp);
+
+ return container_of(ipcp, struct shmid_kernel, shm_perm);
+}
+
/*
- * shm_lock_(check_) routines are called in the paths where the rw_mutex
+ * shm_lock_(check_) routines are called in the paths where the rwsem
* is not necessarily held.
*/
static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
@@ -182,7 +205,7 @@ static void shm_open(struct vm_area_struct *vma)
* @ns: namespace
* @shp: struct to free
*
- * It has to be called with shp and shm_ids.rw_mutex (writer) locked,
+ * It has to be called with shp and shm_ids.rwsem (writer) locked,
* but returns with shp unlocked and freed.
*/
static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
@@ -230,7 +253,7 @@ static void shm_close(struct vm_area_struct *vma)
struct shmid_kernel *shp;
struct ipc_namespace *ns = sfd->ns;
- down_write(&shm_ids(ns).rw_mutex);
+ down_write(&shm_ids(ns).rwsem);
/* remove from the list of attaches of the shm segment */
shp = shm_lock(ns, sfd->id);
BUG_ON(IS_ERR(shp));
@@ -241,10 +264,10 @@ static void shm_close(struct vm_area_struct *vma)
shm_destroy(ns, shp);
else
shm_unlock(shp);
- up_write(&shm_ids(ns).rw_mutex);
+ up_write(&shm_ids(ns).rwsem);
}
-/* Called with ns->shm_ids(ns).rw_mutex locked */
+/* Called with ns->shm_ids(ns).rwsem locked */
static int shm_try_destroy_current(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
@@ -275,7 +298,7 @@ static int shm_try_destroy_current(int id, void *p, void *data)
return 0;
}
-/* Called with ns->shm_ids(ns).rw_mutex locked */
+/* Called with ns->shm_ids(ns).rwsem locked */
static int shm_try_destroy_orphaned(int id, void *p, void *data)
{
struct ipc_namespace *ns = data;
@@ -286,7 +309,7 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
* We want to destroy segments without users and with already
* exit'ed originating process.
*
- * As shp->* are changed under rw_mutex, it's safe to skip shp locking.
+ * As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
if (shp->shm_creator != NULL)
return 0;
@@ -300,10 +323,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
void shm_destroy_orphaned(struct ipc_namespace *ns)
{
- down_write(&shm_ids(ns).rw_mutex);
+ down_write(&shm_ids(ns).rwsem);
if (shm_ids(ns).in_use)
idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_orphaned, ns);
- up_write(&shm_ids(ns).rw_mutex);
+ up_write(&shm_ids(ns).rwsem);
}
@@ -315,10 +338,10 @@ void exit_shm(struct task_struct *task)
return;
/* Destroy all already created segments, but not mapped yet */
- down_write(&shm_ids(ns).rw_mutex);
+ down_write(&shm_ids(ns).rwsem);
if (shm_ids(ns).in_use)
idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
- up_write(&shm_ids(ns).rw_mutex);
+ up_write(&shm_ids(ns).rwsem);
}
static int shm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
@@ -452,7 +475,7 @@ static const struct vm_operations_struct shm_vm_ops = {
* @ns: namespace
* @params: ptr to the structure that contains key, size and shmflg
*
- * Called with shm_ids.rw_mutex held as a writer.
+ * Called with shm_ids.rwsem held as a writer.
*/
static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
@@ -560,7 +583,7 @@ no_file:
}
/*
- * Called with shm_ids.rw_mutex and ipcp locked.
+ * Called with shm_ids.rwsem and ipcp locked.
*/
static inline int shm_security(struct kern_ipc_perm *ipcp, int shmflg)
{
@@ -571,7 +594,7 @@ static inline int shm_security(struct kern_ipc_perm *ipcp, int shmflg)
}
/*
- * Called with shm_ids.rw_mutex and ipcp locked.
+ * Called with shm_ids.rwsem and ipcp locked.
*/
static inline int shm_more_checks(struct kern_ipc_perm *ipcp,
struct ipc_params *params)
@@ -684,7 +707,7 @@ static inline unsigned long copy_shminfo_to_user(void __user *buf, struct shminf
/*
* Calculate and add used RSS and swap pages of a shm.
- * Called with shm_ids.rw_mutex held as a reader
+ * Called with shm_ids.rwsem held as a reader
*/
static void shm_add_rss_swap(struct shmid_kernel *shp,
unsigned long *rss_add, unsigned long *swp_add)
@@ -711,7 +734,7 @@ static void shm_add_rss_swap(struct shmid_kernel *shp,
}
/*
- * Called with shm_ids.rw_mutex held as a reader
+ * Called with shm_ids.rwsem held as a reader
*/
static void shm_get_stat(struct ipc_namespace *ns, unsigned long *rss,
unsigned long *swp)
@@ -740,9 +763,9 @@ static void shm_get_stat(struct ipc_namespace *ns, unsigned long *rss,
}
/*
- * This function handles some shmctl commands which require the rw_mutex
+ * This function handles some shmctl commands which require the rwsem
* to be held in write mode.
- * NOTE: no locks must be held, the rw_mutex is taken inside this function.
+ * NOTE: no locks must be held, the rwsem is taken inside this function.
*/
static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
struct shmid_ds __user *buf, int version)
@@ -757,14 +780,13 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
return -EFAULT;
}
- down_write(&shm_ids(ns).rw_mutex);
+ down_write(&shm_ids(ns).rwsem);
rcu_read_lock();
- ipcp = ipcctl_pre_down(ns, &shm_ids(ns), shmid, cmd,
- &shmid64.shm_perm, 0);
+ ipcp = ipcctl_pre_down_nolock(ns, &shm_ids(ns), shmid, cmd,
+ &shmid64.shm_perm, 0);
if (IS_ERR(ipcp)) {
err = PTR_ERR(ipcp);
- /* the ipc lock is not held upon failure */
goto out_unlock1;
}
@@ -772,14 +794,16 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
err = security_shm_shmctl(shp, cmd);
if (err)
- goto out_unlock0;
+ goto out_unlock1;
switch (cmd) {
case IPC_RMID:
+ ipc_lock_object(&shp->shm_perm);
/* do_shm_rmid unlocks the ipc object and rcu */
do_shm_rmid(ns, ipcp);
goto out_up;
case IPC_SET:
+ ipc_lock_object(&shp->shm_perm);
err = ipc_update_perm(&shmid64.shm_perm, ipcp);
if (err)
goto out_unlock0;
@@ -787,6 +811,7 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
break;
default:
err = -EINVAL;
+ goto out_unlock1;
}
out_unlock0:
@@ -794,33 +819,28 @@ out_unlock0:
out_unlock1:
rcu_read_unlock();
out_up:
- up_write(&shm_ids(ns).rw_mutex);
+ up_write(&shm_ids(ns).rwsem);
return err;
}
-SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
+static int shmctl_nolock(struct ipc_namespace *ns, int shmid,
+ int cmd, int version, void __user *buf)
{
+ int err;
struct shmid_kernel *shp;
- int err, version;
- struct ipc_namespace *ns;
- if (cmd < 0 || shmid < 0) {
- err = -EINVAL;
- goto out;
+ /* preliminary security checks for *_INFO */
+ if (cmd == IPC_INFO || cmd == SHM_INFO) {
+ err = security_shm_shmctl(NULL, cmd);
+ if (err)
+ return err;
}
- version = ipc_parse_version(&cmd);
- ns = current->nsproxy->ipc_ns;
-
- switch (cmd) { /* replace with proc interface ? */
+ switch (cmd) {
case IPC_INFO:
{
struct shminfo64 shminfo;
- err = security_shm_shmctl(NULL, cmd);
- if (err)
- return err;
-
memset(&shminfo, 0, sizeof(shminfo));
shminfo.shmmni = shminfo.shmseg = ns->shm_ctlmni;
shminfo.shmmax = ns->shm_ctlmax;
@@ -830,9 +850,9 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
if(copy_shminfo_to_user (buf, &shminfo, version))
return -EFAULT;
- down_read(&shm_ids(ns).rw_mutex);
+ down_read(&shm_ids(ns).rwsem);
err = ipc_get_maxid(&shm_ids(ns));
- up_read(&shm_ids(ns).rw_mutex);
+ up_read(&shm_ids(ns).rwsem);
if(err<0)
err = 0;
@@ -842,19 +862,15 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
{
struct shm_info shm_info;
- err = security_shm_shmctl(NULL, cmd);
- if (err)
- return err;
-
memset(&shm_info, 0, sizeof(shm_info));
- down_read(&shm_ids(ns).rw_mutex);
+ down_read(&shm_ids(ns).rwsem);
shm_info.used_ids = shm_ids(ns).in_use;
shm_get_stat (ns, &shm_info.shm_rss, &shm_info.shm_swp);
shm_info.shm_tot = ns->shm_tot;
shm_info.swap_attempts = 0;
shm_info.swap_successes = 0;
err = ipc_get_maxid(&shm_ids(ns));
- up_read(&shm_ids(ns).rw_mutex);
+ up_read(&shm_ids(ns).rwsem);
if (copy_to_user(buf, &shm_info, sizeof(shm_info))) {
err = -EFAULT;
goto out;
@@ -869,27 +885,31 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
struct shmid64_ds tbuf;
int result;
+ rcu_read_lock();
if (cmd == SHM_STAT) {
- shp = shm_lock(ns, shmid);
+ shp = shm_obtain_object(ns, shmid);
if (IS_ERR(shp)) {
err = PTR_ERR(shp);
- goto out;
+ goto out_unlock;
}
result = shp->shm_perm.id;
} else {
- shp = shm_lock_check(ns, shmid);
+ shp = shm_obtain_object_check(ns, shmid);
if (IS_ERR(shp)) {
err = PTR_ERR(shp);
- goto out;
+ goto out_unlock;
}
result = 0;
}
+
err = -EACCES;
if (ipcperms(ns, &shp->shm_perm, S_IRUGO))
goto out_unlock;
+
err = security_shm_shmctl(shp, cmd);
if (err)
goto out_unlock;
+
memset(&tbuf, 0, sizeof(tbuf));
kernel_to_ipc64_perm(&shp->shm_perm, &tbuf.shm_perm);
tbuf.shm_segsz = shp->shm_segsz;
@@ -899,43 +919,76 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
tbuf.shm_cpid = shp->shm_cprid;
tbuf.shm_lpid = shp->shm_lprid;
tbuf.shm_nattch = shp->shm_nattch;
- shm_unlock(shp);
- if(copy_shmid_to_user (buf, &tbuf, version))
+ rcu_read_unlock();
+
+ if (copy_shmid_to_user (buf, &tbuf, version))
err = -EFAULT;
else
err = result;
goto out;
}
+ default:
+ return -EINVAL;
+ }
+
+out_unlock:
+ rcu_read_unlock();
+out:
+ return err;
+}
+
+SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
+{
+ struct shmid_kernel *shp;
+ int err, version;
+ struct ipc_namespace *ns;
+
+ if (cmd < 0 || shmid < 0)
+ return -EINVAL;
+
+ version = ipc_parse_version(&cmd);
+ ns = current->nsproxy->ipc_ns;
+
+ switch (cmd) {
+ case IPC_INFO:
+ case SHM_INFO:
+ case SHM_STAT:
+ case IPC_STAT:
+ return shmctl_nolock(ns, shmid, cmd, version, buf);
+ case IPC_RMID:
+ case IPC_SET:
+ return shmctl_down(ns, shmid, cmd, buf, version);
case SHM_LOCK:
case SHM_UNLOCK:
{
struct file *shm_file;
- shp = shm_lock_check(ns, shmid);
+ rcu_read_lock();
+ shp = shm_obtain_object_check(ns, shmid);
if (IS_ERR(shp)) {
err = PTR_ERR(shp);
- goto out;
+ goto out_unlock1;
}
audit_ipc_obj(&(shp->shm_perm));
+ err = security_shm_shmctl(shp, cmd);
+ if (err)
+ goto out_unlock1;
+ ipc_lock_object(&shp->shm_perm);
if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
kuid_t euid = current_euid();
err = -EPERM;
if (!uid_eq(euid, shp->shm_perm.uid) &&
!uid_eq(euid, shp->shm_perm.cuid))
- goto out_unlock;
+ goto out_unlock0;
if (cmd == SHM_LOCK && !rlimit(RLIMIT_MEMLOCK))
- goto out_unlock;
+ goto out_unlock0;
}
- err = security_shm_shmctl(shp, cmd);
- if (err)
- goto out_unlock;
-
shm_file = shp->shm_file;
if (is_file_hugepages(shm_file))
- goto out_unlock;
+ goto out_unlock0;
if (cmd == SHM_LOCK) {
struct user_struct *user = current_user();
@@ -944,32 +997,31 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
shp->shm_perm.mode |= SHM_LOCKED;
shp->mlock_user = user;
}
- goto out_unlock;
+ goto out_unlock0;
}
/* SHM_UNLOCK */
if (!(shp->shm_perm.mode & SHM_LOCKED))
- goto out_unlock;
+ goto out_unlock0;
shmem_lock(shm_file, 0, shp->mlock_user);
shp->shm_perm.mode &= ~SHM_LOCKED;
shp->mlock_user = NULL;
get_file(shm_file);
- shm_unlock(shp);
+ ipc_unlock_object(&shp->shm_perm);
+ rcu_read_unlock();
shmem_unlock_mapping(shm_file->f_mapping);
+
fput(shm_file);
- goto out;
- }
- case IPC_RMID:
- case IPC_SET:
- err = shmctl_down(ns, shmid, cmd, buf, version);
return err;
+ }
default:
return -EINVAL;
}
-out_unlock:
- shm_unlock(shp);
-out:
+out_unlock0:
+ ipc_unlock_object(&shp->shm_perm);
+out_unlock1:
+ rcu_read_unlock();
return err;
}
@@ -1037,7 +1089,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
* additional creator id...
*/
ns = current->nsproxy->ipc_ns;
- shp = shm_lock_check(ns, shmid);
+ rcu_read_lock();
+ shp = shm_obtain_object_check(ns, shmid);
if (IS_ERR(shp)) {
err = PTR_ERR(shp);
goto out;
@@ -1051,24 +1104,31 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
if (err)
goto out_unlock;
+ ipc_lock_object(&shp->shm_perm);
path = shp->shm_file->f_path;
path_get(&path);
shp->shm_nattch++;
size = i_size_read(path.dentry->d_inode);
- shm_unlock(shp);
+ ipc_unlock_object(&shp->shm_perm);
+ rcu_read_unlock();
err = -ENOMEM;
sfd = kzalloc(sizeof(*sfd), GFP_KERNEL);
- if (!sfd)
- goto out_put_dentry;
+ if (!sfd) {
+ path_put(&path);
+ goto out_nattch;
+ }
file = alloc_file(&path, f_mode,
is_file_hugepages(shp->shm_file) ?
&shm_file_operations_huge :
&shm_file_operations);
err = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_free;
+ if (IS_ERR(file)) {
+ kfree(sfd);
+ path_put(&path);
+ goto out_nattch;
+ }
file->private_data = sfd;
file->f_mapping = shp->shm_file->f_mapping;
@@ -1094,7 +1154,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
addr > current->mm->start_stack - size - PAGE_SIZE * 5)
goto invalid;
}
-
+
addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate);
*raddr = addr;
err = 0;
@@ -1109,7 +1169,7 @@ out_fput:
fput(file);
out_nattch:
- down_write(&shm_ids(ns).rw_mutex);
+ down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
BUG_ON(IS_ERR(shp));
shp->shm_nattch--;
@@ -1117,20 +1177,13 @@ out_nattch:
shm_destroy(ns, shp);
else
shm_unlock(shp);
- up_write(&shm_ids(ns).rw_mutex);
-
-out:
+ up_write(&shm_ids(ns).rwsem);
return err;
out_unlock:
- shm_unlock(shp);
- goto out;
-
-out_free:
- kfree(sfd);
-out_put_dentry:
- path_put(&path);
- goto out_nattch;
+ rcu_read_unlock();
+out:
+ return err;
}
SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
diff --git a/ipc/util.c b/ipc/util.c
index 4704223..69ee3c1 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -15,6 +15,14 @@
* Jun 2006 - namespaces ssupport
* OpenVZ, SWsoft Inc.
* Pavel Emelianov <xemul@openvz.org>
+ *
+ * General sysv ipc locking scheme:
+ * when doing ipc id lookups, take the ids->rwsem
+ * rcu_read_lock()
+ * obtain the ipc object (kern_ipc_perm)
+ * perform security, capabilities, auditing and permission checks, etc.
+ * acquire the ipc lock (kern_ipc_perm.lock) throught ipc_lock_object()
+ * perform data updates (ie: SET, RMID, LOCK/UNLOCK commands)
*/
#include <linux/mm.h>
@@ -119,7 +127,7 @@ __initcall(ipc_init);
void ipc_init_ids(struct ipc_ids *ids)
{
- init_rwsem(&ids->rw_mutex);
+ init_rwsem(&ids->rwsem);
ids->in_use = 0;
ids->seq = 0;
@@ -174,7 +182,7 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
* @ids: Identifier set
* @key: The key to find
*
- * Requires ipc_ids.rw_mutex locked.
+ * Requires ipc_ids.rwsem locked.
* Returns the LOCKED pointer to the ipc structure if found or NULL
* if not.
* If key is found ipc points to the owning ipc structure
@@ -208,7 +216,7 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
* ipc_get_maxid - get the last assigned id
* @ids: IPC identifier set
*
- * Called with ipc_ids.rw_mutex held.
+ * Called with ipc_ids.rwsem held.
*/
int ipc_get_maxid(struct ipc_ids *ids)
@@ -246,7 +254,7 @@ int ipc_get_maxid(struct ipc_ids *ids)
* is returned. The 'new' entry is returned in a locked state on success.
* On failure the entry is not locked and a negative err-code is returned.
*
- * Called with writer ipc_ids.rw_mutex held.
+ * Called with writer ipc_ids.rwsem held.
*/
int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
{
@@ -312,9 +320,9 @@ static int ipcget_new(struct ipc_namespace *ns, struct ipc_ids *ids,
{
int err;
- down_write(&ids->rw_mutex);
+ down_write(&ids->rwsem);
err = ops->getnew(ns, params);
- up_write(&ids->rw_mutex);
+ up_write(&ids->rwsem);
return err;
}
@@ -331,7 +339,7 @@ static int ipcget_new(struct ipc_namespace *ns, struct ipc_ids *ids,
*
* On success, the IPC id is returned.
*
- * It is called with ipc_ids.rw_mutex and ipcp->lock held.
+ * It is called with ipc_ids.rwsem and ipcp->lock held.
*/
static int ipc_check_perms(struct ipc_namespace *ns,
struct kern_ipc_perm *ipcp,
@@ -376,7 +384,7 @@ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
* Take the lock as a writer since we are potentially going to add
* a new entry + read locks are not "upgradable"
*/
- down_write(&ids->rw_mutex);
+ down_write(&ids->rwsem);
ipcp = ipc_findkey(ids, params->key);
if (ipcp == NULL) {
/* key not used */
@@ -402,7 +410,7 @@ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
}
ipc_unlock(ipcp);
}
- up_write(&ids->rw_mutex);
+ up_write(&ids->rwsem);
return err;
}
@@ -413,7 +421,7 @@ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
* @ids: IPC identifier set
* @ipcp: ipc perm structure containing the identifier to remove
*
- * ipc_ids.rw_mutex (as a writer) and the spinlock for this ID are held
+ * ipc_ids.rwsem (as a writer) and the spinlock for this ID are held
* before this function is called, and remain locked on the exit.
*/
@@ -621,7 +629,7 @@ struct kern_ipc_perm *ipc_obtain_object(struct ipc_ids *ids, int id)
}
/**
- * ipc_lock - Lock an ipc structure without rw_mutex held
+ * ipc_lock - Lock an ipc structure without rwsem held
* @ids: IPC identifier set
* @id: ipc id to look for
*
@@ -746,26 +754,10 @@ int ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)
* It must be called without any lock held and
* - retrieves the ipc with the given id in the given table.
* - performs some audit and permission check, depending on the given cmd
- * - returns the ipc with the ipc lock held in case of success
- * or an err-code without any lock held otherwise.
+ * - returns a pointer to the ipc object or otherwise, the corresponding error.
*
- * Call holding the both the rw_mutex and the rcu read lock.
+ * Call holding the both the rwsem and the rcu read lock.
*/
-struct kern_ipc_perm *ipcctl_pre_down(struct ipc_namespace *ns,
- struct ipc_ids *ids, int id, int cmd,
- struct ipc64_perm *perm, int extra_perm)
-{
- struct kern_ipc_perm *ipcp;
-
- ipcp = ipcctl_pre_down_nolock(ns, ids, id, cmd, perm, extra_perm);
- if (IS_ERR(ipcp))
- goto out;
-
- spin_lock(&ipcp->lock);
-out:
- return ipcp;
-}
-
struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
struct ipc_ids *ids, int id, int cmd,
struct ipc64_perm *perm, int extra_perm)
@@ -782,8 +774,7 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
audit_ipc_obj(ipcp);
if (cmd == IPC_SET)
- audit_ipc_set_perm(extra_perm, perm->uid,
- perm->gid, perm->mode);
+ audit_ipc_set_perm(extra_perm, perm->uid, perm->gid, perm->mode);
euid = current_euid();
if (uid_eq(euid, ipcp->cuid) || uid_eq(euid, ipcp->uid) ||
@@ -884,7 +875,7 @@ static void *sysvipc_proc_start(struct seq_file *s, loff_t *pos)
* Take the lock - this will be released by the corresponding
* call to stop().
*/
- down_read(&ids->rw_mutex);
+ down_read(&ids->rwsem);
/* pos < 0 is invalid */
if (*pos < 0)
@@ -911,7 +902,7 @@ static void sysvipc_proc_stop(struct seq_file *s, void *it)
ids = &iter->ns->ids[iface->ids];
/* Release the lock we took in start() */
- up_read(&ids->rw_mutex);
+ up_read(&ids->rwsem);
}
static int sysvipc_proc_show(struct seq_file *s, void *it)
diff --git a/ipc/util.h b/ipc/util.h
index b6a6a88..0a362ff 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -94,10 +94,10 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
#define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
-/* must be called with ids->rw_mutex acquired for writing */
+/* must be called with ids->rwsem acquired for writing */
int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
-/* must be called with ids->rw_mutex acquired for reading */
+/* must be called with ids->rwsem acquired for reading */
int ipc_get_maxid(struct ipc_ids *);
/* must be called with both locks acquired. */
@@ -131,9 +131,6 @@ int ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out);
struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
struct ipc_ids *ids, int id, int cmd,
struct ipc64_perm *perm, int extra_perm);
-struct kern_ipc_perm *ipcctl_pre_down(struct ipc_namespace *ns,
- struct ipc_ids *ids, int id, int cmd,
- struct ipc64_perm *perm, int extra_perm);
#ifndef CONFIG_ARCH_WANT_IPC_PARSE_VERSION
/* On IA-64, we always use the "64-bit version" of the IPC structures. */
diff --git a/scripts/package/builddeb b/scripts/package/builddeb
index acb8650..7d7c9d8 100644
--- a/scripts/package/builddeb
+++ b/scripts/package/builddeb
@@ -172,9 +172,22 @@ else
fi
maintainer="$name <$email>"
+# Try to determine distribution
+if [ -e $(which lsb_release) ]; then
+ codename=$(lsb_release --codename --short)
+ if [ "$codename" != "" ]; then
+ distribution=$codename
+ else
+ distribution="UNRELEASED"
+ echo "WARNING: The distribution could NOT be determined!"
+ fi
+else
+ echo "HINT: Install lsb_release binary, this helps to identify your distribution!"
+fi
+
# Generate a simple changelog template
cat <<EOF > debian/changelog
-linux-upstream ($packageversion) unstable; urgency=low
+linux-upstream ($packageversion) $distribution; urgency=low
* Custom built Linux kernel.
@@ -188,10 +201,10 @@ This is a packacked upstream version of the Linux kernel.
The sources may be found at most Linux ftp sites, including:
ftp://ftp.kernel.org/pub/linux/kernel
-Copyright: 1991 - 2009 Linus Torvalds and others.
+Copyright: 1991 - 2013 Linus Torvalds and others.
The git repository for mainline kernel development is at:
-git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
next prev parent reply other threads:[~2013-06-26 8:08 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1372197144-13729-1-git-send-email-davidlohr.bueso@hp.com>
2013-06-25 23:55 ` Sedat Dilek
2013-06-26 8:08 ` Sedat Dilek [this message]
2013-06-28 10:10 ` Sedat Dilek
2013-06-19 1:18 Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CA+icZUVKoEvRfLAOoF-inJyHHGdyMh9sSXDT7SYatRBt_F7iug@mail.gmail.com \
--to=sedat.dilek@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=davidlohr.bueso@hp.com \
--cc=linux-mm@kvack.org \
--cc=manfred@colorfullife.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox