From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A487DC433E7 for ; Mon, 31 Aug 2020 13:48:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0DD9420719 for ; Mon, 31 Aug 2020 13:48:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=mykernel.net header.i=cgxu519@mykernel.net header.b="foKzUdRc" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0DD9420719 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=mykernel.net Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 676CD6B0002; Mon, 31 Aug 2020 09:48:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 623D46B0003; Mon, 31 Aug 2020 09:48:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4C1B16B0037; Mon, 31 Aug 2020 09:48:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id 1B7876B0002 for ; Mon, 31 Aug 2020 09:48:08 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id C7664181AEF15 for ; Mon, 31 Aug 2020 13:48:07 +0000 (UTC) X-FDA: 77210992614.11.basin56_631037927090 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 893D4180F8B81 for ; Mon, 31 Aug 2020 13:48:07 +0000 (UTC) X-HE-Tag: basin56_631037927090 X-Filterd-Recvd-Size: 8608 Received: from sender2-pp-o92.zoho.com.cn (sender2-pp-o92.zoho.com.cn [163.53.93.251]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Mon, 31 Aug 2020 13:48:02 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; t=1598881632; cv=none; d=zoho.com.cn; s=zohoarc; b=Gja5Sfv9l2Kt8jPPMttMIcKNIVl3SuJvuvh/1uoKrIZ2mlY5ClKaJ2sIL2BUYQ43C8O0OBOMIA0OM8wSnI7fHu0SjqOrMd9qTTiIKQqcvRDZX1LaLw8UELwwToy75zhzp18wbq+xMgRK2OA+Io858U+vZIRQShNoAbW2Bw5LcFU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com.cn; s=zohoarc; t=1598881632; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:To; bh=eX9Ln3ZLMpVj4uDsRbZpaJVlp7X48vtTXzB/H1V5WIs=; b=T7ZLLNbr+xfAjKbfdCVao5zEiSjdIJUYo/w+GJDJnZz7HNYA3SnTe6Una8NkfcZTFbCocYrZVLMmyta29YFLmx2f3/0/a26uVS7vBz15tjUmqKarwxNbFdqbS7RK6Wm13oKLi8r2/DkhcvJ6bz+63mXDFWl3PRQoiiCPaK7Otew= ARC-Authentication-Results: i=1; mx.zoho.com.cn; dkim=pass header.i=mykernel.net; spf=pass smtp.mailfrom=cgxu519@mykernel.net; dmarc=pass header.from= header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1598881632; s=zohomail; d=mykernel.net; i=cgxu519@mykernel.net; h=Subject:To:Cc:References:From:Message-ID:Date:MIME-Version:In-Reply-To:Content-Type:Content-Transfer-Encoding; bh=eX9Ln3ZLMpVj4uDsRbZpaJVlp7X48vtTXzB/H1V5WIs=; b=foKzUdRcaUpoILOEOVWiwpVhyCehg4VGUP2m8pPaN/g3KrqfjicwPWgRDzz90F0N 8zDa8XWQk2Fo7BKXWqSO6kLBJcmNhgMErlL+FYG+T2IbYJewdYoH9trsal+ROXbdSTS zjNImGt9fbg2AQuGfbQ3/pF9sK4xWIwOjLBnvdRg= Received: from [10.0.0.2] (113.88.135.106 [113.88.135.106]) by mx.zoho.com.cn with SMTPS id 1598881629062550.3876028156856; Mon, 31 Aug 2020 21:47:09 +0800 (CST) Subject: Re: [RFC PATCH 3/3] ovl: implement stacked mmap for shared map To: Amir Goldstein Cc: overlayfs , Linux MM , Miklos Szeredi , Andrew Morton , Ritesh Harjani References: <20200829095101.25350-1-cgxu519@mykernel.net> <20200829095101.25350-4-cgxu519@mykernel.net> From: cgxu Message-ID: Date: Mon, 31 Aug 2020 21:47:07 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-ZohoCNMailClient: External X-Rspamd-Queue-Id: 893D4180F8B81 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 8/30/20 7:33 PM, Amir Goldstein wrote: > On Sat, Aug 29, 2020 at 12:51 PM Chengguang Xu wrote: >> >> Implement stacked mmap for shared map to keep data >> consistency. >> >> Signed-off-by: Chengguang Xu >> --- >> fs/overlayfs/file.c | 120 +++++++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 114 insertions(+), 6 deletions(-) >> >> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c >> index 14ab5344a918..db5ab200d984 100644 >> --- a/fs/overlayfs/file.c >> +++ b/fs/overlayfs/file.c >> @@ -21,9 +21,17 @@ struct ovl_aio_req { >> struct fd fd; >> }; >> >> +static vm_fault_t ovl_fault(struct vm_fault *vmf); >> +static vm_fault_t ovl_page_mkwrite(struct vm_fault *vmf); >> + >> +static const struct vm_operations_struct ovl_vm_ops = { >> + .fault = ovl_fault, >> + .page_mkwrite = ovl_page_mkwrite, >> +}; >> + > > Interesting direction, not sure if this is workable. > I don't know enough about mm to say. > > But what about the rest of the operations? > Did you go over them and decide that overlay doesn't need to implement them? > I doubt it, but if you did, please document that. I did some check for rest of them, IIUC ->fault will be enough for this special case (shared read-only mmap with no upper), I will remove ->page_mkwrite in v2. # I do not consider support ->huge_fault in current stage due to many fs cannot support DAX properly. BTW, do you know who should I add to CC list for further deep review of this code? fadevel-list? > >> struct ovl_file_entry { >> struct file *realfile; >> - void *vm_ops; >> + const struct vm_operations_struct *vm_ops; >> }; >> >> struct file *ovl_get_realfile(struct file *file) >> @@ -40,14 +48,15 @@ void ovl_set_realfile(struct file *file, struct file *realfile) >> ofe->realfile = realfile; >> } >> >> -void *ovl_get_real_vmops(struct file *file) >> +const struct vm_operations_struct *ovl_get_real_vmops(struct file *file) >> { >> struct ovl_file_entry *ofe = file->private_data; >> >> return ofe->vm_ops; >> } >> >> -void ovl_set_real_vmops(struct file *file, void *vm_ops) >> +void ovl_set_real_vmops(struct file *file, >> + const struct vm_operations_struct *vm_ops) >> { >> struct ovl_file_entry *ofe = file->private_data; >> >> @@ -493,11 +502,104 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync) >> return ret; >> } >> >> +vm_fault_t ovl_fault(struct vm_fault *vmf) >> +{ >> + struct vm_area_struct *vma = vmf->vma; >> + struct file *file = vma->vm_file; >> + struct file *realfile; >> + struct file *fpin, *tmp; >> + struct inode *inode = file_inode(file); >> + struct inode *realinode; >> + const struct cred *old_cred; >> + bool retry_allowed; >> + vm_fault_t ret; >> + int err = 0; >> + >> + if (fault_flag_check(vmf, FAULT_FLAG_TRIED)) { >> + realfile = ovl_get_realfile(file); >> + >> + if (!ovl_has_upperdata(inode) || >> + realfile->f_inode != ovl_inode_upper(inode) || >> + !realfile->f_op->mmap) >> + return VM_FAULT_SIGBUS; >> + >> + if (!ovl_get_real_vmops(file)) { >> + old_cred = ovl_override_creds(inode->i_sb); >> + err = call_mmap(realfile, vma); >> + revert_creds(old_cred); >> + >> + vma->vm_file = file; >> + if (err) { >> + vma->vm_ops = &ovl_vm_ops; >> + return VM_FAULT_SIGBUS; >> + } >> + ovl_set_real_vmops(file, vma->vm_ops); >> + vma->vm_ops = &ovl_vm_ops; >> + } >> + >> + retry_allowed = fault_flag_check(vmf, FAULT_FLAG_ALLOW_RETRY); >> + if (retry_allowed) >> + vma->vm_flags &= ~FAULT_FLAG_ALLOW_RETRY; >> + vma->vm_file = realfile; >> + ret = ovl_get_real_vmops(file)->fault(vmf); >> + vma->vm_file = file; >> + if (retry_allowed) >> + vma->vm_flags |= FAULT_FLAG_ALLOW_RETRY; >> + return ret; >> + >> + } else { >> + fpin = maybe_unlock_mmap_for_io(vmf, NULL); >> + if (!fpin) >> + return VM_FAULT_SIGBUS; >> + >> + ret = VM_FAULT_RETRY; >> + if (!ovl_has_upperdata(inode)) { >> + err = ovl_copy_up_with_data(file->f_path.dentry); >> + if (err) >> + goto out; >> + } >> + >> + realinode = ovl_inode_realdata(inode); >> + realfile = ovl_open_realfile(file, realinode); >> + if (IS_ERR(realfile)) >> + goto out; >> + >> + tmp = ovl_get_realfile(file); >> + ovl_set_realfile(file, realfile); >> + fput(tmp); >> + >> +out: >> + fput(fpin); >> + return ret; >> + } >> +} > > > Please add some documentation to explain the method used. > Do we need to retry if real_vmops are already set? > Good catch, actually retry is not needed in that case. Basically, we unlock(mmap_lock)->copy-up->open when detecting no upper inode then retry fault operation. However, we need to check fault retry flag carefully for avoiding endless retry. I'll add more explanation in v2. --- cgxu