Date: Fri, 4 Jun 2021 17:29:11 +0300
From: "Kirill A. Shutemov"
To: Sean Christopherson
Cc: "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
 Jim Mattson, David Rientjes, "Edgecombe, Rick P", "Kleen, Andi",
 "Yamahata, Isaku", Erdem Aktas, Steve Rutherford, Peter Gonda,
 David Hildenbrand, Chao Peng, x86@kernel.org, kvm@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Message-ID: <20210604142911.vbbucf4ten7e5khf@box>
References: <20210419185354.v3rgandtrel7bzjj@box>
 <20210419225755.nsrtjfvfcqscyb6m@box.shutemov.name>
 <20210521123148.a3t4uh4iezm6ax47@box>
 <20210531200712.qjxghakcaj4s6ara@box.shutemov.name>
 <20210602233353.gxq35yxluhas5knp@box>

On Thu, Jun 03, 2021 at 07:46:52PM +0000, Sean Christopherson wrote:
> In other words, I would expect the code to look something like:
> 
> 	if (PageGuest()) {
> 		if (!(flags & FOLL_GUEST)) {
> 			pte_unmap_unlock(ptep, ptl);
> 			return NULL;
> 		}
> 	} else if ((flags & FOLL_NUMA) && pte_protnone(pte)) {
> 		goto no_page;
> 	}

Okay, looks good. The updated patch is below; I fixed a few more bugs
along the way. The branch is also updated and rebased to v5.12.

> Yeah, and I'm saying we should explicitly disallow mapping PageGuest() into
> shared memory, and then the KVM code that manually kmaps() PageGuest() memory
> to avoid copy_{to,from}_user() failure goes away.

The manual kmap trick is not needed for TDX: it is only required for pure
KVM, where we handle instruction emulation in the host.

------------------------------8<------------------------------------------
From: "Kirill A. Shutemov"
Date: Fri, 16 Apr 2021 01:30:48 +0300
Subject: [PATCH] mm: Introduce guest-only pages

PageGuest() pages are only allowed to be used as guest memory. Userspace
is not allowed to read from or write to such pages.

On page fault, PageGuest() pages produce PROT_NONE page table entries,
so a read or write there triggers SIGBUS. Access to such pages via
syscall leads to -EIO.

The new mprotect(2) flag PROT_GUEST translates to VM_GUEST. Any page
fault in a VM_GUEST VMA produces a PageGuest() page. Only shared
tmpfs/shmem mappings are supported.

GUP normally fails on such pages. KVM will use the new FOLL_GUEST flag
to access them.

Signed-off-by: Kirill A. Shutemov
---
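Not part of the patch, but for reviewers' convenience: a minimal, untested
userspace sketch of how the interface described above is meant to be
driven -- back a region with shared shmem memory, then convert it with the
new PROT_GUEST mprotect(2) flag. Only PROT_GUEST itself and the
SIGBUS/-EIO behaviour come from this patch; memfd_create() is just one
convenient way to get a shmem mapping, and the names and sizes below are
made up:

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#ifndef PROT_GUEST
	#define PROT_GUEST 0x04000000		/* matches mman-common.h below */
	#endif

	int main(void)
	{
		size_t len = 2UL << 20;
		int fd = memfd_create("guest-mem", 0);	/* shmem-backed */
		void *mem;

		ftruncate(fd, len);
		mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

		/* Still ordinary shared memory at this point. */
		memset(mem, 0, len);

		/*
		 * Convert the range to guest-only. Afterwards any load or
		 * store from userspace is expected to raise SIGBUS, and
		 * read(2)/write(2) on the fd should fail with -EIO; only a
		 * GUP caller passing FOLL_GUEST (i.e. KVM) may map the pages.
		 */
		if (mprotect(mem, len, PROT_READ | PROT_WRITE | PROT_GUEST))
			perror("mprotect(PROT_GUEST)");

		return 0;
	}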
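The kernel-facing side of the same contract, again purely illustrative:
the helper below is hypothetical and only FOLL_GUEST itself comes from the
patch. A consumer such as KVM opts in by passing FOLL_GUEST, while every
GUP user that does not pass it keeps failing on PageGuest() pages:

	/* Hypothetical in-kernel helper, not part of this patch. */
	static struct page *grab_guest_page(unsigned long hva, bool write)
	{
		unsigned int flags = FOLL_GUEST | (write ? FOLL_WRITE : 0);
		struct page *page;

		if (get_user_pages_unlocked(hva, 1, &page, flags) != 1)
			return NULL;
		return page;
	}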
 include/linux/mm.h                     |  5 ++-
 include/linux/mman.h                   |  7 +++-
 include/linux/page-flags.h             |  8 ++++
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/gup.c                               | 36 ++++++++++++++----
 mm/huge_memory.c                       | 51 +++++++++++++++++++-------
 mm/memory.c                            | 28 ++++++++++----
 mm/mprotect.c                          | 18 ++++++++-
 mm/shmem.c                             | 12 ++++++
 9 files changed, 133 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ba434287387..8e679f4d0f21 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -362,6 +362,8 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_GROWSUP	VM_NONE
 #endif
 
+#define VM_GUEST	VM_HIGH_ARCH_4
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
 
@@ -413,7 +415,7 @@ extern unsigned int kobjsize(const void *objp);
 #ifndef VM_ARCH_CLEAR
 # define VM_ARCH_CLEAR	VM_NONE
 #endif
-#define VM_FLAGS_CLEAR	(ARCH_VM_PKEY_FLAGS | VM_ARCH_CLEAR)
+#define VM_FLAGS_CLEAR	(ARCH_VM_PKEY_FLAGS | VM_GUEST | VM_ARCH_CLEAR)
 
 /*
  * mapping from the currently active vm_flags protection bits (the
@@ -2793,6 +2795,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_GUEST	0x100000 /* allow access to guest-only pages */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 629cefc4ecba..204e03d7787c 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -103,7 +103,9 @@ static inline void vm_unacct_memory(long pages)
  */
 static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
 {
-	return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
+	int allowed;
+	allowed = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_GUEST;
+	return (prot & ~allowed) == 0;
 }
 #define arch_validate_prot arch_validate_prot
 #endif
@@ -140,7 +142,8 @@ calc_vm_prot_bits(unsigned long prot, unsigned long pkey)
 {
 	return _calc_vm_trans(prot, PROT_READ,  VM_READ ) |
 	       _calc_vm_trans(prot, PROT_WRITE, VM_WRITE) |
-	       _calc_vm_trans(prot, PROT_EXEC,  VM_EXEC) |
+	       _calc_vm_trans(prot, PROT_EXEC,  VM_EXEC ) |
+	       _calc_vm_trans(prot, PROT_GUEST, VM_GUEST) |
 	       arch_calc_vm_prot_bits(prot, pkey);
 }
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..4bac0371f5c9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -436,6 +436,14 @@ PAGEFLAG_FALSE(HWPoison)
 #define __PG_HWPOISON 0
 #endif
 
+#if defined(CONFIG_64BIT) && defined(CONFIG_HAVE_KVM_PROTECTED_MEMORY)
+PAGEFLAG(Guest, arch_2, PF_HEAD)
+TESTSCFLAG(Guest, arch_2, PF_HEAD)
+#else
+PAGEFLAG_FALSE(Guest)
+TESTSCFLAG_FALSE(Guest)
+#endif
+
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
 TESTPAGEFLAG(Young, young, PF_ANY)
 SETPAGEFLAG(Young, young, PF_ANY)
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d429be..c4d985d22b49 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -16,6 +16,7 @@
 #define PROT_NONE	0x0		/* page can not be accessed */
 #define PROT_GROWSDOWN	0x01000000	/* mprotect flag: extend change to start of growsdown vma */
 #define PROT_GROWSUP	0x02000000	/* mprotect flag: extend change to end of growsup vma */
+#define PROT_GUEST	0x04000000	/* KVM guest memory */
 
 /* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE	0x0f		/* Mask for type of mapping */
diff --git a/mm/gup.c b/mm/gup.c
index ef7d2da9f03f..2d2d57f70e1f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -356,10 +356,22 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
  * FOLL_FORCE can write to even unwritable pte's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+static inline bool can_follow_write_pte(struct vm_area_struct *vma,
+					pte_t pte, unsigned int flags)
 {
-	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+	if (pte_write(pte))
+		return true;
+
+	if ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte))
+		return true;
+
+	if (!(flags & FOLL_GUEST))
+		return false;
+
+	if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
+		return false;
+
+	return PageGuest(pte_page(pte));
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -400,14 +412,20 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			migration_entry_wait(mm, pmd, address);
 			goto retry;
 		}
-	if ((flags & FOLL_NUMA) && pte_protnone(pte))
+
+	page = vm_normal_page(vma, address, pte);
+	if (page && PageGuest(page)) {
+		if (!(flags & FOLL_GUEST))
+			goto no_page;
+	} else if ((flags & FOLL_NUMA) && pte_protnone(pte)) {
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
+	}
+
+	if ((flags & FOLL_WRITE) && !can_follow_write_pte(vma, pte, flags)) {
 		pte_unmap_unlock(ptep, ptl);
 		return NULL;
 	}
 
-	page = vm_normal_page(vma, address, pte);
 	if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
 		/*
 		 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
@@ -571,8 +589,12 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (likely(!pmd_trans_huge(pmdval)))
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 
-	if ((flags & FOLL_NUMA) && pmd_protnone(pmdval))
+	if (PageGuest(pmd_page(pmdval))) {
+		if (!(flags & FOLL_GUEST))
+			return no_page_table(vma, flags);
+	} else if ((flags & FOLL_NUMA) && pmd_protnone(pmdval)) {
 		return no_page_table(vma, flags);
+	}
 
 retry_locked:
 	ptl = pmd_lock(mm, pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ae907a9c2050..c430a52a3b7f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1336,10 +1336,22 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
  * FOLL_FORCE can write to even unwritable pmd's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+static inline bool can_follow_write_pmd(struct vm_area_struct *vma,
+					pmd_t pmd, unsigned int flags)
 {
-	return pmd_write(pmd) ||
-	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	if (pmd_write(pmd))
+		return true;
+
+	if ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd))
+		return true;
+
+	if (!(flags & FOLL_GUEST))
+		return false;
+
+	if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
+		return false;
+
+	return PageGuest(pmd_page(pmd));
 }
 
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1352,20 +1364,30 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
-		goto out;
+	if (!pmd_present(*pmd))
+		return NULL;
+
+	page = pmd_page(*pmd);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+
+	if (PageGuest(page)) {
+		if (!(flags & FOLL_GUEST))
+			return NULL;
+	} else if ((flags & FOLL_NUMA) && pmd_protnone(*pmd)) {
+		/*
+		 * Full NUMA hinting faults to serialise migration in fault
+		 * paths
+		 */
+		return NULL;
+	}
+
+	if (flags & FOLL_WRITE && !can_follow_write_pmd(vma, *pmd, flags))
+		return NULL;
 
 	/* Avoid dumping huge zero page */
 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
 		return ERR_PTR(-EFAULT);
 
-	/* Full NUMA hinting faults to serialise migration in fault paths */
-	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
-		goto out;
-
-	page = pmd_page(*pmd);
-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
-
 	if (!try_grab_page(page, flags))
 		return ERR_PTR(-ENOMEM);
 
@@ -1408,7 +1430,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
 	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
 
-out:
 	return page;
 }
 
@@ -1426,6 +1447,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	bool was_writable;
 	int flags = 0;
 
+	page = pmd_page(pmd);
+	if (PageGuest(page))
+		return VM_FAULT_SIGBUS;
+
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
 	if (unlikely(!pmd_same(pmd, *vmf->pmd)))
 		goto out_unlock;
diff --git a/mm/memory.c b/mm/memory.c
index 550405fc3b5e..d588220feabf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3703,9 +3703,13 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	for (i = 0; i < HPAGE_PMD_NR; i++)
 		flush_icache_page(vma, page + i);
 
-	entry = mk_huge_pmd(page, vma->vm_page_prot);
-	if (write)
-		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+	if (PageGuest(page)) {
+		entry = mk_huge_pmd(page, PAGE_NONE);
+	} else {
+		entry = mk_huge_pmd(page, vma->vm_page_prot);
+		if (write)
+			entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+	}
 
 	add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
 	page_add_file_rmap(page, true);
@@ -3741,13 +3745,17 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 	pte_t entry;
 
 	flush_icache_page(vma, page);
-	entry = mk_pte(page, vma->vm_page_prot);
+	if (PageGuest(page)) {
+		entry = mk_pte(page, PAGE_NONE);
+	} else {
+		entry = mk_pte(page, vma->vm_page_prot);
 
-	if (prefault && arch_wants_old_prefaulted_pte())
-		entry = pte_mkold(entry);
+		if (prefault && arch_wants_old_prefaulted_pte())
+			entry = pte_mkold(entry);
 
-	if (write)
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		if (write)
+			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	}
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
@@ -4105,6 +4113,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	bool was_writable = pte_savedwrite(vmf->orig_pte);
 	int flags = 0;
 
+	page = pte_page(vmf->orig_pte);
+	if (PageGuest(page))
+		return VM_FAULT_SIGBUS;
+
 	/*
 	 * The "pte" at this point cannot be used safely without
 	 * validation through pte_unmap_same(). It's of NUMA type but
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 94188df1ee55..aecba46af544 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -484,8 +484,12 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	dirty_accountable = vma_wants_writenotify(vma, vma->vm_page_prot);
 	vma_set_page_prot(vma);
 
-	change_protection(vma, start, end, vma->vm_page_prot,
-			  dirty_accountable ? MM_CP_DIRTY_ACCT : 0);
+	if (vma->vm_flags & VM_GUEST) {
+		zap_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+	} else {
+		change_protection(vma, start, end, vma->vm_page_prot,
+				  dirty_accountable ? MM_CP_DIRTY_ACCT : 0);
+	}
 
 	/*
 	 * Private VM_LOCKED VMA becoming writable: trigger COW to avoid major
@@ -603,6 +607,16 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			goto out;
 		}
 
+		if ((newflags & (VM_GUEST|VM_SHARED)) == VM_GUEST) {
+			error = -EINVAL;
+			goto out;
+		}
+
+		if ((newflags & VM_GUEST) && !vma_is_shmem(vma)) {
+			error = -EINVAL;
+			goto out;
+		}
+
 		/* Allow architectures to sanity-check the new flags */
 		if (!arch_validate_flags(newflags)) {
 			error = -EINVAL;
diff --git a/mm/shmem.c b/mm/shmem.c
index b2db4ed0fbc7..0f44f2fac06c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1835,6 +1835,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	if (page && sgp == SGP_WRITE)
 		mark_page_accessed(page);
 
+	if (page && PageGuest(page) && sgp != SGP_CACHE) {
+		error = -EIO;
+		goto unlock;
+	}
+
 	/* fallocated page? */
 	if (page && !PageUptodate(page)) {
 		if (sgp != SGP_READ)
@@ -2117,6 +2122,13 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 				  gfp, vma, vmf, &ret);
 	if (err)
 		return vmf_error(err);
+
+	if ((vmf->vma->vm_flags & VM_GUEST) && !TestSetPageGuest(vmf->page)) {
+		struct page *head = compound_head(vmf->page);
+		try_to_unmap(head, TTU_IGNORE_MLOCK);
+		set_page_dirty(head);
+	}
+
 	return ret;
 }
-- 
 Kirill A. Shutemov