References: <20220915142913.2213336-1-chao.p.peng@linux.intel.com>
 <20220915142913.2213336-2-chao.p.peng@linux.intel.com>
 <20220930162301.i226o523teuikygq@box.shutemov.name>
In-Reply-To: <20220930162301.i226o523teuikygq@box.shutemov.name>
From: Fuad Tabba
Date: Mon, 3 Oct 2022 08:33:13 +0100
Subject: Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd
To: "Kirill A . Shutemov"
Cc: Chao Peng, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
 linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
 Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton,
 "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
 "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
 luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
 ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com,
 dhildenb@redhat.com, Quentin Perret, Michael Roth, mhocko@suse.com,
 Muchun Song, wei.w.wang@intel.com

Hi

On Fri, Sep 30, 2022 at 5:23 PM Kirill A . Shutemov wrote:
>
> On Fri, Sep 30, 2022 at 05:14:00PM +0100, Fuad Tabba wrote:
> > Hi,
> >
> > <...>
> >
> > > diff --git a/mm/memfd_inaccessible.c b/mm/memfd_inaccessible.c
> > > new file mode 100644
> > > index 000000000000..2d33cbdd9282
> > > --- /dev/null
> > > +++ b/mm/memfd_inaccessible.c
> > > @@ -0,0 +1,219 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#include "linux/sbitmap.h"
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +
> > > +struct inaccessible_data {
> > > +	struct mutex lock;
> > > +	struct file *memfd;
> > > +	struct list_head notifiers;
> > > +};
> > > +
> > > +static void inaccessible_notifier_invalidate(struct inaccessible_data *data,
> > > +					     pgoff_t start, pgoff_t end)
> > > +{
> > > +	struct inaccessible_notifier *notifier;
> > > +
> > > +	mutex_lock(&data->lock);
> > > +	list_for_each_entry(notifier, &data->notifiers, list) {
> > > +		notifier->ops->invalidate(notifier, start, end);
> > > +	}
> > > +	mutex_unlock(&data->lock);
> > > +}
> > > +
> > > +static int inaccessible_release(struct inode *inode, struct file *file)
> > > +{
> > > +	struct inaccessible_data *data = inode->i_mapping->private_data;
> > > +
> > > +	fput(data->memfd);
> > > +	kfree(data);
> > > +	return 0;
> > > +}
> > > +
> > > +static long inaccessible_fallocate(struct file *file, int mode,
> > > +				  loff_t offset, loff_t len)
> > > +{
> > > +	struct inaccessible_data *data = file->f_mapping->private_data;
> > > +	struct file *memfd = data->memfd;
> > > +	int ret;
> > > +
> > > +	if (mode & FALLOC_FL_PUNCH_HOLE) {
> > > +		if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> > > +			return -EINVAL;
> > > +	}
> > > +
> > > +	ret = memfd->f_op->fallocate(memfd, mode, offset, len);
> >
> > I think that shmem_file_operations.fallocate is only set if
> > CONFIG_TMPFS is enabled (shmem.c). Should there be a check at
> > initialization that fallocate is set, or maybe a config dependency, or
> > can we count on it always being enabled?
>
> It is already there:
>
> 	config MEMFD_CREATE
> 		def_bool TMPFS || HUGETLBFS
>
> And we reject inaccessible memfd_create() for HUGETLBFS.
>
> But if we go with a separate syscall, yes, we need the dependency.

I missed that, thanks.

>
> > > +	inaccessible_notifier_invalidate(data, offset, offset + len);
> > > +	return ret;
> > > +}
> > > +
> > <...>
> >
> > > +void inaccessible_register_notifier(struct file *file,
> > > +				    struct inaccessible_notifier *notifier)
> > > +{
> > > +	struct inaccessible_data *data = file->f_mapping->private_data;
> > > +
> > > +	mutex_lock(&data->lock);
> > > +	list_add(&notifier->list, &data->notifiers);
> > > +	mutex_unlock(&data->lock);
> > > +}
> > > +EXPORT_SYMBOL_GPL(inaccessible_register_notifier);
> >
> > If the memfd wasn't marked as inaccessible, or more generally
> > speaking, if the file isn't a memfd_inaccessible file, this ends up
> > accessing an uninitialized pointer for the notifier list. Should there
> > be a check for that here, and have this function return an error if
> > that's not the case?
>
> I think it is "don't do that" category. inaccessible_register_notifier()
> caller has to know what file it operates on, no?

The thing is, you could oops the kernel from userspace. For that, all you
have to do is call memfd_create() without MFD_INACCESSIBLE, followed by
KVM_SET_USER_MEMORY_REGION using that fd as the private_fd.

I ran into this using my port of this patch series to arm64.

Cheers,
/fuad

> --
> Kiryl Shutsemau / Kirill A. Shutemov
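To make the suggested check concrete, here is a rough userspace model (not kernel code) of an inaccessible_register_notifier() that returns an error instead of dereferencing the private data of a file it does not own. It assumes the file is identified by comparing file->f_op against the inaccessible file's own file_operations, a common kernel idiom. The int return value, the inaccessible_fops/shmem_fops names, the singly linked list, and the use of file->private_data in place of file->f_mapping->private_data are all hypothetical simplifications for illustration, not the patch's API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model: only the fields needed to show the check. */
struct file_operations {
	const char *name;
};

struct list_node {
	struct list_node *next;
};

struct inaccessible_notifier {
	struct list_node list;
};

struct inaccessible_data {
	struct list_node notifiers;	/* head of a singly linked list */
};

struct file {
	const struct file_operations *f_op;
	/* Stands in for f_mapping->private_data; only valid for our files. */
	void *private_data;
};

/* Stand-ins for the real fops tables; only the pointer identity matters. */
const struct file_operations inaccessible_fops = { "inaccessible" };
const struct file_operations shmem_fops = { "shmem" };

/*
 * Sketch of the suggested fix: refuse files that were not created as
 * inaccessible memfds instead of trusting their private data.
 */
int inaccessible_register_notifier(struct file *file,
				   struct inaccessible_notifier *notifier)
{
	struct inaccessible_data *data;

	assert(notifier != NULL);

	/* A plain memfd has different fops, so its private data is not ours. */
	if (file->f_op != &inaccessible_fops)
		return -EINVAL;

	data = file->private_data;
	/* The kernel version would hold data->lock around this update. */
	notifier->list.next = data->notifiers.next;
	data->notifiers.next = &notifier->list;
	return 0;
}
```

With this shape, the KVM side could presumably propagate -EINVAL from KVM_SET_USER_MEMORY_REGION when userspace passes a plain memfd as private_fd, rather than the kernel oopsing.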