linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Topi Miettinen <toiwoton@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Tony Luck" <tony.luck@intel.com>,
	"Fenghua Yu" <fenghua.yu@intel.com>,
	"Alexander Graf" <agraf@suse.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Paul Mackerras" <paulus@samba.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, "Doug Ledford" <dledford@redhat.com>,
	"Sean Hefty" <sean.hefty@intel.com>,
	"Hal Rosenstock" <hal.rosenstock@gmail.com>,
	"Mike Marciniszyn" <mike.marciniszyn@intel.com>,
	"Dennis Dalessandro" <dennis.dalessandro@intel.com>,
	"Christian Benvenuti" <benve@cisco.com>,
	"Dave Goodell" <dgoodell@cisco.com>,
	"Sudeep Dutt" <sudeep.dutt@intel.com>,
	"Ashutosh Dixit" <ashutosh.dixit@intel.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Tejun Heo" <tj@kernel.org>, "Li Zefan" <lizefan@huawei.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Balbir Singh" <bsingharora@gmail.com>,
	"Markus Elfring" <elfring@users.sourceforge.net>,
	"David S. Miller" <davem@davemloft.net>,
	"Nicolas Dichtel" <nicolas.dichtel@6wind.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Konstantin Khlebnikov" <koct9i@gmail.com>,
	"Jiri Slaby" <jslaby@suse.cz>,
	"Cyrill Gorcunov" <gorcunov@openvz.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Dan Carpenter" <dan.carpenter@oracle.com>,
	"Michael Kerrisk" <mtk.manpages@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Marcus Gelderie" <redmnic@gmail.com>,
	"Vladimir Davydov" <vdavydov@virtuozzo.com>,
	"Joe Perches" <joe@perches.com>,
	"Frederic Weisbecker" <fweisbec@gmail.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Andi Kleen" <ak@linux.intel.com>,
	"Oleg Nesterov" <oleg@redhat.com>, "Stas Sergeev" <stsp@list.ru>,
	"Amanieu d'Antras" <amanieu@gmail.com>,
	"Richard Weinberger" <richard@nod.at>,
	"Wang Xiaoqiang" <wangxq10@lzu.edu.cn>,
	"Helge Deller" <deller@gmx.de>,
	"Mateusz Guzik" <mguzik@redhat.com>,
	"Alex Thorlton" <athorlton@sgi.com>,
	"Ben Segall" <bsegall@google.com>,
	"John Stultz" <john.stultz@linaro.org>,
	"Rik van Riel" <riel@redhat.com>,
	"Eric B Munson" <emunson@akamai.com>,
	"Alexey Klimov" <klimov.linux@gmail.com>,
	"Chen Gang" <gang.chen.5i5j@gmail.com>,
	"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
	"David Rientjes" <rientjes@google.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Alexander Kuleshov" <kuleshovmail@gmail.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	"open list:IA64 (Itanium) PLATFORM" <linux-ia64@vger.kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC"
	<kvm-ppc@vger.kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@vger.kernel.org>,
	"open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)"
	<linuxppc-dev@lists.ozlabs.org>,
	"open list:INFINIBAND SUBSYSTEM" <linux-rdma@vger.kernel.org>,
	"open list:FILESYSTEMS (VFS and infrastructure)"
	<linux-fsdevel@vger.kernel.org>,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	"open list:BPF (Safe dynamic programs and tools)"
	<netdev@vger.kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: [PATCH 00/14] Present useful limits to user (v2)
Date: Wed, 3 Aug 2016 18:20:20 +0000	[thread overview]
Message-ID: <5808f9b5-6558-458b-0487-d20ecea0e903@gmail.com> (raw)
In-Reply-To: <1468578983-28229-1-git-send-email-toiwoton@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2742 bytes --]

Hello,

I'm trying the systemtap approach and it looks promising. The script is
annotating strace-like output with capability, device access and RLIMIT
information. In the end there's a summary. Here's sample output from
wpa_supplicant run:

mprotect(0x7efebf140000, 16384, PROT_READ) = 0 [DATA 548864 -> 573440]
[AS 44986368 -> 45002752]
brk(0x55d9611f8000) = 94392125718528 missing
[Capabilities=CAP_SYS_ADMIN] [AS 45002752 -> 45010944]
open(0x55d960716462, O_RDWR) = 3 [DeviceAllow=/dev/char/1:3 rw ]
open("/dev/random", O_RDONLY|O_NONBLOCK) = 3 [DeviceAllow=/dev/char/1:8 r ]
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 4
[RestrictAddressFamilies=AF_UNIX] [NOFILE 3 -> 4]
open("/etc/wpa_supplicant.conf", O_RDONLY) = 5 [NOFILE 4 -> 5]
socket(PF_NETLINK, SOCK_RAW, 0) = 5 [RestrictAddressFamilies=AF_NETLINK]
socket(PF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, 16) = 6
[RestrictAddressFamilies=AF_NETLINK] [NOFILE 5 -> 6]
socket(PF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, 16) = 7
[RestrictAddressFamilies=AF_NETLINK] [NOFILE 6 -> 7]
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 8
[RestrictAddressFamilies=AF_INET] [NOFILE 7 -> 8]
open("/dev/rfkill", O_RDONLY) = 9 [DeviceAllow=/dev/char/10:58 r ]
[NOFILE 8 -> 9]
socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 10
[RestrictAddressFamilies=AF_UNIX] [NOFILE 9 -> 10]
sendmsg(6, 0x7ffc778f35b0, 0x0) = 36 [Capabilities=CAP_NET_ADMIN]

Summary:
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW
Consider also missing CapabilityBoundingSet=CAP_SYS_ADMIN
DeviceAllow=/dev/char/1:3 rw
DeviceAllow=/dev/char/1:8 r
DeviceAllow=/dev/char/10:58 r
DeviceAllow=/dev/char/1:9 r
LimitFSIZE=0
LimitDATA=577536
LimitSTACK=139264
LimitCORE=0
LimitNOFILE=15
LimitAS=45146112
LimitNPROC=171
LimitMEMLOCK=0
LimitSIGPENDING=0
LimitMSGQUEUE=0
LimitNICE=0
LimitRTPRIO=0
RestrictAddressFamilies=AF_UNIX AF_INET AF_NETLINK AF_PACKET
MemoryDenyWriteExecute=true

Some values are not correct. NPROC is wrong because staprun needs to be
run as root instead of the separate privileged user for wpa_supplicant
and that messes user process count. DATA/AS/STACK seems to be a bit off.
I can easily use this as systemd service configuration drop-in otherwise.

Now, the relevant part for the kernel is that I'd like to analyze error
paths better, so the system calls would be also annotated when there's a
failure when a RLIMIT is too tight. It would be easier to insert probes
if there was only one path for RLIMIT checks. Would it be OK to make the
function task_rlimit() a full check against the limit and also make it a
non-inlined function, just for improved probing purposes?

There's already error analysis for the capabilities, but there are some
false positive hits (like brk() complaining about missing CAP_SYS_ADMIN
above).

-Topi


[-- Attachment #2: strace.stp --]
[-- Type: text/plain, Size: 19078 bytes --]

#! /bin/sh

# suppress some run-time errors here for cleaner output
//bin/true && exec stap --suppress-handler-errors --skip-badvars $0 ${1+"$@"}

/*
 * Compile:
 * stap -p4 -DSTP_NO_OVERLOAD -m strace
 * Run:
 * /usr/bin/staprun -R -c "/sbin/wpa_supplicant -u -O /run/wpa_supplicant -c /etc/wpa_supplicant.conf -i wlan0" -w /root/strace.ko only_capability_use=1 timestamp=0
 */

/* configuration options; set these with stap -G */
global follow_fork = 0   /* -Gfollow_fork=1 means trace descendant processes too */
global timestamp = 1     /* -Gtimestamp=0 means don't print a syscall timestamp */
global elapsed_time = 0  /* -Gelapsed_time=1 means print a syscall duration too */
global only_capability_use = 0 /* -Gonly_capability_use=1 means print only when capabilities are used */
global thread_argstr%
global thread_time%

global syscalls_nonreturn[2]
global capnames[64]
global used_caps
global missing_caps
global all_used_caps
global all_missing_caps
global accessed_devices[1000]
global all_accessed_devices[1000]
global highwatermark_fsize
global highwatermark_data
global highwatermark_stack
global highwatermark_core
global highwatermark_nproc
global highwatermark_nofile
global highwatermark_memlock
global highwatermark_as
global highwatermark_sigpending
global highwatermark_msgqueue
global highwatermark_nice
global highwatermark_rtprio
global old_highwatermark_fsize
global old_highwatermark_data
global old_highwatermark_stack
global old_highwatermark_core
global old_highwatermark_nproc
global old_highwatermark_nofile
global old_highwatermark_memlock
global old_highwatermark_as
global old_highwatermark_sigpending
global old_highwatermark_msgqueue
global old_highwatermark_nice
global old_highwatermark_rtprio
global afnames[64]
global used_afs
global missing_afs
global all_used_afs
global all_missing_afs
global no_memory_deny_write_execute
global all_memory_deny_write_execute = "true"
global print_syscall


probe begin 
  {
    /* list those syscalls that never .return */
    syscalls_nonreturn["exit"]=1
    syscalls_nonreturn["exit_group"]=1

    // grep '#define CAP_.*[0-9]+$' /usr/src/linux-headers*/include/uapi/linux/capability.h | awk '{ print "capnames[" $3 "] = \"" $2 "\";" }'
    capnames[0] = "CAP_CHOWN";
    capnames[1] = "CAP_DAC_OVERRIDE";
    capnames[2] = "CAP_DAC_READ_SEARCH";
    capnames[3] = "CAP_FOWNER";
    capnames[4] = "CAP_FSETID";
    capnames[5] = "CAP_KILL";
    capnames[6] = "CAP_SETGID";
    capnames[7] = "CAP_SETUID";
    capnames[8] = "CAP_SETPCAP";
    capnames[9] = "CAP_LINUX_IMMUTABLE";
    capnames[10] = "CAP_NET_BIND_SERVICE";
    capnames[11] = "CAP_NET_BROADCAST";
    capnames[12] = "CAP_NET_ADMIN";
    capnames[13] = "CAP_NET_RAW";
    capnames[14] = "CAP_IPC_LOCK";
    capnames[15] = "CAP_IPC_OWNER";
    capnames[16] = "CAP_SYS_MODULE";
    capnames[17] = "CAP_SYS_RAWIO";
    capnames[18] = "CAP_SYS_CHROOT";
    capnames[19] = "CAP_SYS_PTRACE";
    capnames[20] = "CAP_SYS_PACCT";
    capnames[21] = "CAP_SYS_ADMIN";
    capnames[22] = "CAP_SYS_BOOT";
    capnames[23] = "CAP_SYS_NICE";
    capnames[24] = "CAP_SYS_RESOURCE";
    capnames[25] = "CAP_SYS_TIME";
    capnames[26] = "CAP_SYS_TTY_CONFIG";
    capnames[27] = "CAP_MKNOD";
    capnames[28] = "CAP_LEASE";
    capnames[29] = "CAP_AUDIT_WRITE";
    capnames[30] = "CAP_AUDIT_CONTROL";
    capnames[31] = "CAP_SETFCAP";
    capnames[32] = "CAP_MAC_OVERRIDE";
    capnames[33] = "CAP_MAC_ADMIN";
    capnames[34] = "CAP_SYSLOG";
    capnames[35] = "CAP_WAKE_ALARM";
    capnames[36] = "CAP_BLOCK_SUSPEND";
    capnames[37] = "CAP_AUDIT_READ";

    //grep '#define AF_.*' /usr/src/linux-headers-*/include/linux/socket.h | awk '{ print "afnames[" $3 "] = \"" $2 "\"" }'
    afnames[0] = "AF_UNSPEC"
    afnames[1] = "AF_UNIX"
    afnames[2] = "AF_INET"
    afnames[3] = "AF_AX25"
    afnames[4] = "AF_IPX"
    afnames[5] = "AF_APPLETALK"
    afnames[6] = "AF_NETROM"
    afnames[7] = "AF_BRIDGE"
    afnames[8] = "AF_ATMPVC"
    afnames[9] = "AF_X25"
    afnames[10] = "AF_INET6"
    afnames[11] = "AF_ROSE"
    afnames[12] = "AF_DECnet"
    afnames[13] = "AF_NETBEUI"
    afnames[14] = "AF_SECURITY"
    afnames[15] = "AF_KEY"
    afnames[16] = "AF_NETLINK"
    afnames[17] = "AF_PACKET"
    afnames[18] = "AF_ASH"
    afnames[19] = "AF_ECONET"
    afnames[20] = "AF_ATMSVC"
    afnames[21] = "AF_RDS"
    afnames[22] = "AF_SNA"
    afnames[23] = "AF_IRDA"
    afnames[24] = "AF_PPPOX"
    afnames[25] = "AF_WANPIPE"
    afnames[26] = "AF_LLC"
    afnames[27] = "AF_IB"
    afnames[28] = "AF_MPLS"
    afnames[29] = "AF_CAN"
    afnames[30] = "AF_TIPC"
    afnames[31] = "AF_BLUETOOTH"
    afnames[32] = "AF_IUCV"
    afnames[33] = "AF_RXRPC"
    afnames[34] = "AF_ISDN"
    afnames[35] = "AF_PHONET"
    afnames[36] = "AF_IEEE802154"
    afnames[37] = "AF_CAIF"
    afnames[38] = "AF_ALG"
    afnames[39] = "AF_NFC"
    afnames[40] = "AF_VSOCK"
    afnames[41] = "AF_KCM"
  }



function filter_p()
  {
    if (target() == 0) return 0; /* system-wide */
    if (!follow_fork && pid() != target()) return 1; /* single-process */
    if (follow_fork && !target_set_pid(pid())) return 1; /* multi-process */
    return 0;
  }

function caps_to_str(caps)
  {
    str = ""
    for (i = 0; i < 37; i++) # CAP_LAST_CAP
      if (caps & (1 << i)) {
        str .= capnames[i]
	if ((caps & ~((1 << (i + 1)) - 1)) != 0)
	  str .= " "
      }
    return str
  }

function dev_to_str(type, dev, access)
  {
    devs = "/dev/"
    if (type == 1) # DEV_BLOCK
      devs .= "block"
    else
      devs .= "char"
    devs .= sprintf("/%d:%d ", dev >> 32, dev & 0xffffffff)
    if (access & 2) # ACC_READ
      devs .= "r"
    if (access & 4) # ACC_WRITE
      devs .= "w"
    if (access & 1) # ACC_MKNOD
      devs .= "m"
    return devs
  }

function afs_to_str(afs)
  {
    str = ""
    for (i = 0; i < 42; i++) # MAX_AF
      if (afs & (1 << i)) {
        str .= afnames[i]
	if ((afs & ~((1 << (i + 1)) - 1)) != 0)
	  str .= " "
      }
    return str
  }

/* Capabilities */
probe kernel.function("cap_capable@security/commoncap.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && $audit)
      used_caps |= 1 << $cap;
    else
      missing_caps |= 1 << $cap;
  }

/* Devices */
probe kernel.function("__devcgroup_check_permission@security/device_cgroup.c").return
  {
    if (filter_p()) next;

    if ($return == 0)
      accessed_devices[$type, $major << 32 | $minor] |= $access
  }

/* RLIMIT_FSIZE */
probe kernel.function("inode_newsize_ok@fs/attr.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && highwatermark_fsize < $offset)
      highwatermark_fsize = $offset
  }

/* RLIMIT_DATA */
probe kernel.function("prctl_set_mm@kernel/sys.c").return
  {
    if (filter_p()) next;

    if ($return == 0 && highwatermark_data < $prctl_map->end_data - $prctl_map->start_data) {
      highwatermark_data = $prctl_map->end_data - $prctl_map->start_data
      print_syscall = 1
    }
  }

probe kernel.function("do_brk@mm/mmap.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return > 0 && highwatermark_data < task->mm->data_vm << 12) { # PAGE_SHIFT
      highwatermark_data = task->mm->data_vm << 12
      print_syscall = 1
    }
    if ($return > 0 && highwatermark_as < task->mm->total_vm << 12) {
      highwatermark_as = task->mm->total_vm << 12
      print_syscall = 1
    }
  }

/* also RLIMIT_STACK and RLIMIT_MEMLOCK */
probe kernel.function("vm_stat_account@mm/mmap.c").return
  {
    if (filter_p()) next;

    if (highwatermark_data < $mm->data_vm << 12) { # PAGE_SHIFT
      highwatermark_data = $mm->data_vm << 12
      print_syscall = 1
    }
    if (highwatermark_stack < $mm->stack_vm << 12) {
      highwatermark_stack = $mm->stack_vm << 12
      print_syscall = 1
    }
    if (highwatermark_memlock < atomic_long_read(&$mm->locked_vm) << 12) {
      highwatermark_memlock = atomic_long_read(&$mm->locked_vm) << 12
      print_syscall = 1
    }
    if (highwatermark_as < $mm->total_vm << 12) {
      highwatermark_as = $mm->total_vm << 12
      print_syscall = 1
    }
  }

/* RLIMIT_CORE */
probe kernel.function("dump_emit@fs/coredump.c").return
  {
    if (filter_p()) next;

    if (highwatermark_core < $cprm->written) {
      highwatermark_core = $cprm->written
      print_syscall = 1
    }
  }

/* RLIMIT_NPROC */
probe kernel.function("commit_creds@kernel/cred.c").return
  {
    if (filter_p()) next;

    if (highwatermark_nproc < atomic_read(&$new->user->processes)) {
      highwatermark_nproc = atomic_read(&$new->user->processes)
      print_syscall = 1
    }
  }

probe kernel.function("copy_process@kernel/fork.c").return
  {
    if (filter_p()) next;
    printf("return %d\n", $return);
    try {
    if (($return > 0 || $return < -1000) && $return->real_cred && $return->real_cred->user)
      printf("good return %d\n", $return);
      if (highwatermark_nproc < atomic_read(&$return->real_cred->user->processes)) {
	highwatermark_nproc = atomic_read(&$return->real_cred->user->processes)
	print_syscall = 1
      }
    } catch {}
  }

/* RLIMIT_NOFILE */
probe kernel.function("__alloc_fd@fs/file.c").return
  {
    if (filter_p()) next;

    if (($return >= 0 || $return < -1000) && highwatermark_nofile < $return) {
      highwatermark_nofile = $return
      print_syscall = 1
    }
  }

probe kernel.function("do_dup2@fs/file.c").return
  {
    if (filter_p()) next;

    if (($return >= 0 || $return < -1000) && highwatermark_nofile < $return) {
      highwatermark_nofile = $return
      print_syscall = 1
    }
  }

/* RLIMIT_MEMLOCK */
probe kernel.function("sys_bpf@kernel/bpf/syscall.c").return
  {
    if (filter_p()) next;

    task = task_current()
    user = task->real_cred->user
    if ($return == 0 && highwatermark_memlock < atomic_long_read(&user->locked_vm) << 12) { # PAGE_SHIFT
      highwatermark_memlock = atomic_long_read(&user->locked_vm) << 12
      print_syscall = 1
    }
  }

probe kernel.function("perf_mmap@kernel/events/core.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return == 0 && highwatermark_memlock < task->mm->pinned_vm << 12) { # PAGE_SHIFT
      highwatermark_memlock = task->mm->pinned_vm << 12
      print_syscall = 1
    }
  }

probe kernel.function("do_mlock@mm/mlock.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return == 0 && highwatermark_memlock < task->mm->locked_vm << 12) { # PAGE_SHIFT
      highwatermark_memlock = task->mm->locked_vm << 12
      print_syscall = 1
    }
  }

probe kernel.function("sys_mlockall@mm/mlock.c").return
  {
    if (filter_p()) next;

    task = task_current()
    if ($return == 0 && highwatermark_memlock < task->mm->total_vm << 12) { # PAGE_SHIFT
      highwatermark_memlock = task->mm->total_vm << 12
      print_syscall = 1
    }
  }

/* RLIMIT_SIGPENDING */
probe kernel.function("__sigqueue_alloc@kernel/signal.c").return
  {
    if (filter_p()) next;

    task = task_current()
    user = task->real_cred->user
    if ($return == 0 && highwatermark_sigpending < atomic_read(&user->sigpending)) {
      highwatermark_sigpending = atomic_read(&user->sigpending)
      print_syscall = 1
    }
  }

/* RLIMIT_MSGGQUEUE */
probe kernel.function("mqueue_get_inode@ipc/mqueue.c").return
  {
    if (filter_p()) next;

    task = task_current()
    user = task->real_cred->user
    if ($return == 0 && highwatermark_msgqueue < user->mq_bytes) {
      highwatermark_msgqueue = user->mq_bytes
      print_syscall = 1
    }
  }

/* RLIMIT_NICE */
probe kernel.function("set_user_nice@kernel/sched/core.c").return
  {
    if (filter_p()) next;

    if (highwatermark_nice < $nice) {
      highwatermark_nice = $nice
      print_syscall = 1
    }
  }

/* RLIMIT_RTPRIO */
probe kernel.function("__sched_setscheduler@kernel/sched/core.c").return
  {
    if (filter_p()) next;

    if (highwatermark_rtprio < $attr->sched_priority) {
      highwatermark_rtprio = $attr->sched_priority
      print_syscall = 1
    }
  }

/* socket address families */
probe kernel.function("__sock_create@net/socket.c").return
  {
    if (filter_p()) next;

    if ($return == 0) {
      used_afs |= 1 << $family
      print_syscall = 1
    } else if ($return == 93) { # EPROTONOSUPPORT
      missing_afs |= 1 << $family
      print_syscall = 1
    }
  }

/* mmap flags */
probe kernel.function("do_mmap@mm/mmap.c").return
  {
    if (filter_p()) next;

    if (($return >= 0 || $return < -1000) && ($flags & (2 | 4)) == (2 | 4)) { # PROT_WRITE | PROT_EXEC
      no_memory_deny_write_execute = 1
      print_syscall = 1
    }
  }

/* system call printing */
probe nd_syscall.* 
  {
    # TODO: filter out apparently-nested syscalls (that are implemented
    # in terms of each other within the kernel); PR6762

    if (filter_p()) next;

    thread_argstr[tid()]=argstr
    if (timestamp || elapsed_time)
      thread_time[tid()]=gettimeofday_us()

    if (name in syscalls_nonreturn)
      report(name,argstr,"")
  }

probe nd_syscall.*.return
  {
    if (filter_p()) next;

    report(name,thread_argstr[tid()],retstr)
  }

function report(syscall_name, syscall_argstr, syscall_retstr)
  {
    if (timestamp || elapsed_time)
      {
        now = gettimeofday_us()
        then = thread_time[tid()]

        if (timestamp)
          prefix=sprintf("%s.%06d ", ctime(then/1000000), then%1000000)

        if (elapsed_time && (now>then)) {
          diff = now-then
          suffix=sprintf(" <%d.%06d>", diff/1000000, diff%1000000)
        }

        delete thread_time[tid()]
      }

    /* add a thread-id string in lots of cases, except if
       stap strace.stp -c SINGLE_THREADED_CMD */
    if (tid() != target()) {
      prefix .= sprintf("%s[%d] ", execname(), tid())
    }

    if (used_caps) {
       suffix .= " [Capabilities=" . caps_to_str(used_caps) . "]"
       all_used_caps |= used_caps
       print_syscall = 1
    }		       
    if (missing_caps) {
       suffix .= " missing [Capabilities=" . caps_to_str(missing_caps) . "]"
       all_missing_caps |= missing_caps
       print_syscall = 1
    }		       

    foreach ([type, dev] in accessed_devices) {
      devs .= dev_to_str(type, dev, accessed_devices[type, dev]) . " "
      if (has_devs == 0) {
        has_devs = 1
	print_syscall = 1
	devs = " [DeviceAllow=" . devs
      }
      all_accessed_devices[type, dev] = accessed_devices[type, dev];
    }
    if (has_devs) {
      devs .= "]"
      suffix .= devs
    }

    if (used_afs) {
      suffix .= " [RestrictAddressFamilies=" . afs_to_str(used_afs) . "]"
      all_used_afs |= used_afs
      print_syscall = 1
    }		       
    if (missing_afs) {
      suffix .= " missing [RestrictAddressFamilies=" . afs_to_str(missing_afs) . "]"
      all_missing_afs |= missing_afs
      print_syscall = 1
    }		       

    if (no_memory_deny_write_execute) {
      suffix .= " [MemoryDenyWriteExecute=false]"
      all_memory_deny_write_execute = "false"
    }		       

    if (highwatermark_fsize > old_highwatermark_fsize) {
      suffix .= sprintf(" [FSIZE %d -> %d]", old_highwatermark_fsize, highwatermark_fsize)
      old_highwatermark_fsize = highwatermark_fsize
    }
    if (highwatermark_data > old_highwatermark_data) {
      suffix .= sprintf(" [DATA %d -> %d]", old_highwatermark_data, highwatermark_data)
      old_highwatermark_data = highwatermark_data
    }
    if (highwatermark_stack > old_highwatermark_stack) {
      suffix .= sprintf(" [STACK %d -> %d]", old_highwatermark_stack, highwatermark_stack)
      old_highwatermark_stack = highwatermark_stack
    }
    if (highwatermark_core > old_highwatermark_core) {
      suffix .= sprintf(" [CORE %d -> %d]", old_highwatermark_core, highwatermark_core)
      old_highwatermark_core = highwatermark_core
    }
    if (highwatermark_nofile > old_highwatermark_nofile) {
      suffix .= sprintf(" [NOFILE %d -> %d]", old_highwatermark_nofile, highwatermark_nofile)
      old_highwatermark_nofile = highwatermark_nofile
    }
    if (highwatermark_as > old_highwatermark_as) {
      suffix .= sprintf(" [AS %d -> %d]", old_highwatermark_as, highwatermark_as)
      old_highwatermark_as = highwatermark_as
    }
    if (highwatermark_nproc > old_highwatermark_nproc) {
      suffix .= sprintf(" [NPROC %d -> %d]", old_highwatermark_nproc, highwatermark_nproc)
      old_highwatermark_nproc = highwatermark_nproc
    }
    if (highwatermark_memlock > old_highwatermark_memlock) {
      suffix .= sprintf(" [MEMLOCK %d -> %d]", old_highwatermark_memlock, highwatermark_memlock)
      old_highwatermark_memlock = highwatermark_memlock
    }
    if (highwatermark_sigpending > old_highwatermark_sigpending) {
      suffix .= sprintf(" [SIGPENDING %d -> %d]", old_highwatermark_sigpending, highwatermark_sigpending)
      old_highwatermark_sigpending = highwatermark_sigpending
    }
    if (highwatermark_msgqueue > old_highwatermark_msgqueue) {
      suffix .= sprintf(" [MSGQUEUE %d -> %d]", old_highwatermark_msgqueue, highwatermark_msgqueue)
      old_highwatermark_msgqueue = highwatermark_msgqueue
    }
    if (highwatermark_nice > old_highwatermark_nice) {
      suffix .= sprintf(" [NICE %d -> %d]", old_highwatermark_nice, highwatermark_nice)
      old_highwatermark_nice = highwatermark_nice
    }
    if (highwatermark_rtprio > old_highwatermark_rtprio) {
      suffix .= sprintf(" [RTPRIO %d -> %d]", old_highwatermark_rtprio, highwatermark_rtprio)
      old_highwatermark_rtprio = highwatermark_rtprio
    }

    if (!only_capability_use || print_syscall)
        printf("%s%s(%s) = %s%s\n",
             prefix, 
             syscall_name, syscall_argstr, syscall_retstr,
	     suffix)

    used_caps = 0
    missing_caps = 0
    used_afs = 0
    print_syscall = 0
    no_memory_deny_write_execute = 0
    delete accessed_devices

    delete thread_argstr[tid()]
  }

probe end
  {
    printf("\nSummary:\n")
    printf("CapabilityBoundingSet=%s\n", caps_to_str(all_used_caps))
    if (all_missing_caps)
	    printf("Consider also missing CapabilityBoundingSet=%s\n", caps_to_str(all_missing_caps))
    foreach ([type, dev] in all_accessed_devices)
      printf("DeviceAllow=%s\n", dev_to_str(type, dev, all_accessed_devices[type, dev]))
    printf("LimitFSIZE=%d\n", highwatermark_fsize)
    printf("LimitDATA=%d\n", highwatermark_data)
    printf("LimitSTACK=%d\n", highwatermark_stack)
    printf("LimitCORE=%d\n", highwatermark_core)
    printf("LimitNOFILE=%d\n", highwatermark_nofile)
    printf("LimitAS=%d\n", highwatermark_as)
    printf("LimitNPROC=%d\n", highwatermark_nproc)
    printf("LimitMEMLOCK=%d\n", highwatermark_memlock)
    printf("LimitSIGPENDING=%d\n", highwatermark_sigpending)
    printf("LimitMSGQUEUE=%d\n", highwatermark_msgqueue)
    printf("LimitNICE=%d\n", highwatermark_nice)
    printf("LimitRTPRIO=%d\n", highwatermark_rtprio)
    printf("RestrictAddressFamilies=%s\n", afs_to_str(all_used_afs))
    if (all_missing_afs)
	    printf("Consider also missing RestrictAddressFamilies=%s\n", afs_to_str(all_missing_afs))
    printf("MemoryDenyWriteExecute=%s\n", all_memory_deny_write_execute)
  }

      parent reply	other threads:[~2016-08-03 18:21 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-15 10:35 Topi Miettinen
2016-07-15 10:35 ` [PATCH 04/14] resource limits: track highwater mark of VM data segment Topi Miettinen
2016-07-15 10:35 ` [PATCH 05/14] resource limits: track highwater mark of stack size Topi Miettinen
2016-07-15 10:35 ` [PATCH 09/14] resource limits: track highwater mark of locked memory Topi Miettinen
2016-07-15 15:14   ` Oleg Nesterov
2016-07-15 17:39     ` Topi Miettinen
2016-07-18 15:38       ` Oleg Nesterov
2016-07-15 10:35 ` [PATCH 10/14] resource limits: track highwater mark of address space size Topi Miettinen
2016-07-15 12:43 ` [PATCH 00/14] Present useful limits to user (v2) Peter Zijlstra
2016-07-15 13:52   ` Topi Miettinen
2016-07-15 13:59     ` Peter Zijlstra
2016-07-15 16:57       ` Topi Miettinen
2016-07-15 20:54       ` H. Peter Anvin
2016-07-15 20:54       ` H. Peter Anvin
2016-07-15 13:04 ` Balbir Singh
2016-07-15 16:35   ` Topi Miettinen
2016-07-18 22:05     ` Doug Ledford
2016-07-19 16:53       ` Topi Miettinen
2016-07-15 14:19 ` Richard Weinberger
2016-07-15 17:19   ` Topi Miettinen
2016-07-18 21:25   ` Doug Ledford
2016-08-03 18:20 ` Topi Miettinen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5808f9b5-6558-458b-0487-d20ecea0e903@gmail.com \
    --to=toiwoton@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=acme@kernel.org \
    --cc=agraf@suse.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=amanieu@gmail.com \
    --cc=aryabinin@virtuozzo.com \
    --cc=ashutosh.dixit@intel.com \
    --cc=ast@kernel.org \
    --cc=athorlton@sgi.com \
    --cc=benh@kernel.crashing.org \
    --cc=benve@cisco.com \
    --cc=bsegall@google.com \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=dan.carpenter@oracle.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=dennis.dalessandro@intel.com \
    --cc=dgoodell@cisco.com \
    --cc=dledford@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=elfring@users.sourceforge.net \
    --cc=emunson@akamai.com \
    --cc=fenghua.yu@intel.com \
    --cc=fweisbec@gmail.com \
    --cc=gang.chen.5i5j@gmail.com \
    --cc=gorcunov@openvz.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hal.rosenstock@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=joe@perches.com \
    --cc=john.stultz@linaro.org \
    --cc=jslaby@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=klimov.linux@gmail.com \
    --cc=koct9i@gmail.com \
    --cc=kuleshovmail@gmail.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lizefan@huawei.com \
    --cc=mguzik@redhat.com \
    --cc=mhocko@suse.com \
    --cc=mike.marciniszyn@intel.com \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=mtk.manpages@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.dichtel@6wind.com \
    --cc=oleg@redhat.com \
    --cc=paulus@samba.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=redmnic@gmail.com \
    --cc=richard@nod.at \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=rkrcmar@redhat.com \
    --cc=sean.hefty@intel.com \
    --cc=stsp@list.ru \
    --cc=sudeep.dutt@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=vbabka@suse.cz \
    --cc=vdavydov@virtuozzo.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wangxq10@lzu.edu.cn \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox