From: Arnd Bergmann
Date: Fri, 28 Aug 2020 19:40:08 +0200
Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
To: Minchan Kim
Cc: Andrew Morton, LKML, Christian Brauner, linux-mm, Linux API, Oleksandr Natalenko, Suren Baghdasaryan, Tim Murray, Sandeep Patil, Sonny Rao, Brian Geffon, Michal Hocko, Johannes Weiner, Shakeel Butt, John Dias, Joel Fernandes, Jann Horn, alexander.h.duyck@linux.intel.com, SeongJae Park, David Rientjes, Arjun Roy, Vlastimil Babka, Christian Brauner, Daniel Colascione, Jens Axboe, Kirill Tkhai, SeongJae Park, linux-man
References: <20200622192900.22757-1-minchan@kernel.org> <20200622192900.22757-4-minchan@kernel.org>
In-Reply-To: <20200622192900.22757-4-minchan@kernel.org>

On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim wrote:
> So finally, the API is as follows,
>
>     ssize_t process_madvise(int pidfd, const struct iovec *iovec,
>               unsigned long vlen, int advice, unsigned int flags);

I had not followed the discussion earlier and only now came across
the syscall in linux-next. Sorry for stirring things up this late.

> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 94bf4958d114..8f959d90338a 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -364,6 +364,7 @@
>  440	common	watch_mount		sys_watch_mount
>  441	common	watch_sb		sys_watch_sb
>  442	common	fsinfo			sys_fsinfo
> +443	64	process_madvise		sys_process_madvise
>
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> @@ -407,3 +408,4 @@
>  545	x32	execveat		compat_sys_execveat
>  546	x32	preadv2			compat_sys_preadv64v2
>  547	x32	pwritev2		compat_sys_pwritev64v2
> +548	x32	process_madvise		compat_sys_process_madvise

I think we should not add any new x32-specific syscalls. Instead I think
the compat_sys_process_madvise/sys_process_madvise can be
merged into one.

> +	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> +	if (IS_ERR_OR_NULL(mm)) {
> +		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> +		goto release_task;
> +	}

Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile,
and I would try to avoid that. Can mm_access() be changed to
itself return ERR_PTR(-ESRCH) instead of NULL to improve its
calling conventions? I see there are only three other callers.

> +	ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
> +	if (ret >= 0) {
> +		ret = do_process_madvise(pidfd, &iter, behavior, flags);
> +		kfree(iov);
> +	}
> +	return ret;
> +}
> +
> +#ifdef CONFIG_COMPAT
...
> +
> +	ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
> +				&iov, &iter);
> +	if (ret >= 0) {
> +		ret = do_process_madvise(pidfd, &iter, behavior, flags);
> +		kfree(iov);
> +	}

Every syscall that passes an iovec seems to do this.
If we make import_iovec() handle both cases directly, this syscall
and a number of others can be simplified, and you avoid the x32
entry point I mentioned above.

Something like (untested)

index dad8d0cfaaf7..0de4ddff24c1 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
 {
 	ssize_t n;
 	struct iovec *p;
-	n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
+
+	if (in_compat_syscall())
+		n = compat_rw_copy_check_uvector(type, uvector, nr_segs,
+						 fast_segs, *iov, &p);
+	else
+		n = rw_copy_check_uvector(type, uvector, nr_segs,
+					  fast_segs, *iov, &p);
 	if (n < 0) {
 		if (p != *iov)
 			kfree(p);

      Arnd
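
For readers who want to see the shape of the idea outside the kernel, here is a minimal userspace sketch of a single import routine that accepts either a native or a 32-bit iovec array and always produces native entries. It is only an illustration under stated assumptions: struct compat_iovec_sketch, compat_mode_sketch, sum_lengths_sketch() and import_iovec_sketch() are hypothetical names, the boolean stands in for in_compat_syscall(), and none of the kernel's user-copy or access checks are modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical 32-bit layout, standing in for the kernel's compat iovec. */
struct compat_iovec_sketch {
	uint32_t iov_base;	/* 32-bit user pointer value */
	uint32_t iov_len;	/* 32-bit segment length */
};

/*
 * Hypothetical stand-in for in_compat_syscall(). The kernel derives this
 * from the current task; here it is a plain flag set by the caller.
 */
static bool compat_mode_sketch;

/* Sum the segment lengths, returning -EINVAL if the total would overflow. */
static ssize_t sum_lengths_sketch(const struct iovec *iov, size_t nr_segs)
{
	ssize_t total = 0;

	for (size_t i = 0; i < nr_segs; i++) {
		if (iov[i].iov_len > (size_t)(SSIZE_MAX - total))
			return -EINVAL;
		total += (ssize_t)iov[i].iov_len;
	}
	return total;
}

/*
 * One entry point for both layouts: pick the source format once, then
 * hand the caller native struct iovec entries either way.
 * Returns the total byte count or a negative errno value.
 */
static ssize_t import_iovec_sketch(const void *uvector, size_t nr_segs,
				   struct iovec *dst)
{
	assert(uvector != NULL && dst != NULL);

	if (compat_mode_sketch) {
		const struct compat_iovec_sketch *c = uvector;

		/* Widen each 32-bit entry to the native layout. */
		for (size_t i = 0; i < nr_segs; i++) {
			dst[i].iov_base = (void *)(uintptr_t)c[i].iov_base;
			dst[i].iov_len = c[i].iov_len;
		}
	} else {
		const struct iovec *n = uvector;

		/* Native entries are copied unchanged. */
		for (size_t i = 0; i < nr_segs; i++)
			dst[i] = n[i];
	}
	return sum_lengths_sketch(dst, nr_segs);
}
```

With the dispatch inside the import routine, a caller needs only one code path after the call, which is the property that would let the separate compat entry point go away.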