From: Sumit Semwal
To: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Alexey Dobriyan, Jonathan Corbet
Cc: Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross, Alexey Gladkov, Matthew Wilcox, Jason Gunthorpe, "Kirill A. Shutemov", Michel Lespinasse, Michal Koutný, Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu, Mathieu Desnoyers, John Hubbard, Mike Christie, Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner, Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro, Thomas Cedeno, linux-fsdevel@vger.kernel.org, John Stultz, Pekka Enberg, Dave Hansen, Peter Zijlstra, Ingo Molnar, Oleg Nesterov, "Eric W. Biederman", Jan Glauber, Rob Landley, Cyrill Gorcunov, "Serge E. Hallyn", David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner, Minchan Kim, Sumit Semwal
Subject: [PATCH v6 1/3] mm: rearrange madvise code to allow for reuse
Date: Tue, 1 Sep 2020 14:48:59 +0530
Message-Id: <20200901091901.19779-2-sumit.semwal@linaro.org>
In-Reply-To: <20200901091901.19779-1-sumit.semwal@linaro.org>
References: <20200901091901.19779-1-sumit.semwal@linaro.org>

From: Colin Cross

Refactor the madvise syscall to allow for parts of it to be reused by
a prctl syscall that affects vmas.
Move the code that walks vmas in a virtual address range into a function
that takes a function pointer as a parameter. The only caller for now is
sys_madvise, which uses it to call madvise_vma_behavior on each vma, but
the next patch will add an additional caller.

Move handling all vma behaviors inside madvise_behavior, and rename it to
madvise_vma_behavior.

Move the code that updates the flags on a vma, including splitting or
merging the vma as necessary, into a new function called
madvise_update_vma. The next patch will add support for updating a new
anon_name field as well.

Signed-off-by: Colin Cross
Cc: Pekka Enberg
Cc: Dave Hansen
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Jan Glauber
Cc: John Stultz
Cc: Rob Landley
Cc: Cyrill Gorcunov
Cc: Kees Cook
Cc: "Serge E. Hallyn"
Cc: David Rientjes
Cc: Al Viro
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Michel Lespinasse
Cc: Tang Chen
Cc: Robin Holt
Cc: Shaohua Li
Cc: Sasha Levin
Cc: Johannes Weiner
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Sumit Semwal
[sumits: rebased over v5.9-rc3]
---
 mm/madvise.c | 312 +++++++++++++++++++++++++++------------------------
 1 file changed, 168 insertions(+), 144 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index dd1d43cf026d..84482c21b029 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -60,76 +60,20 @@ static int madvise_need_mmap_write(int behavior)
 }
 
 /*
- * We can potentially split a vm area into separate
- * areas, each area with its own behavior.
+ * Update the vm_flags on a region of a vma, splitting it or merging it as
+ * necessary.  Must be called with mmap_sem held for writing.
  */
-static long madvise_behavior(struct vm_area_struct *vma,
-		struct vm_area_struct **prev,
-		unsigned long start, unsigned long end, int behavior)
+static int madvise_update_vma(struct vm_area_struct *vma,
+			      struct vm_area_struct **prev, unsigned long start,
+			      unsigned long end, unsigned long new_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	int error = 0;
+	int error;
 	pgoff_t pgoff;
-	unsigned long new_flags = vma->vm_flags;
-
-	switch (behavior) {
-	case MADV_NORMAL:
-		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
-		break;
-	case MADV_SEQUENTIAL:
-		new_flags = (new_flags & ~VM_RAND_READ) | VM_SEQ_READ;
-		break;
-	case MADV_RANDOM:
-		new_flags = (new_flags & ~VM_SEQ_READ) | VM_RAND_READ;
-		break;
-	case MADV_DONTFORK:
-		new_flags |= VM_DONTCOPY;
-		break;
-	case MADV_DOFORK:
-		if (vma->vm_flags & VM_IO) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags &= ~VM_DONTCOPY;
-		break;
-	case MADV_WIPEONFORK:
-		/* MADV_WIPEONFORK is only supported on anonymous memory. */
-		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags |= VM_WIPEONFORK;
-		break;
-	case MADV_KEEPONFORK:
-		new_flags &= ~VM_WIPEONFORK;
-		break;
-	case MADV_DONTDUMP:
-		new_flags |= VM_DONTDUMP;
-		break;
-	case MADV_DODUMP:
-		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags &= ~VM_DONTDUMP;
-		break;
-	case MADV_MERGEABLE:
-	case MADV_UNMERGEABLE:
-		error = ksm_madvise(vma, start, end, behavior, &new_flags);
-		if (error)
-			goto out_convert_errno;
-		break;
-	case MADV_HUGEPAGE:
-	case MADV_NOHUGEPAGE:
-		error = hugepage_madvise(vma, &new_flags, behavior);
-		if (error)
-			goto out_convert_errno;
-		break;
-	}
 
 	if (new_flags == vma->vm_flags) {
 		*prev = vma;
-		goto out;
+		return 0;
 	}
 
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
@@ -146,21 +90,21 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	if (start != vma->vm_start) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count)) {
 			error = -ENOMEM;
-			goto out;
+			return error;
 		}
 		error = __split_vma(mm, vma, start, 1);
 		if (error)
-			goto out_convert_errno;
+			return error;
 	}
 
 	if (end != vma->vm_end) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count)) {
 			error = -ENOMEM;
-			goto out;
+			return error;
 		}
 		error = __split_vma(mm, vma, end, 0);
 		if (error)
-			goto out_convert_errno;
+			return error;
 	}
 
 success:
@@ -169,15 +113,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	 */
 	vma->vm_flags = new_flags;
 
-out_convert_errno:
-	/*
-	 * madvise() returns EAGAIN if kernel resources, such as
-	 * slab, are temporarily unavailable.
-	 */
-	if (error == -ENOMEM)
-		error = -EAGAIN;
-out:
-	return error;
+	return 0;
 }
 
 #ifdef CONFIG_SWAP
@@ -862,6 +798,93 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }
 
+/*
+ * Apply a madvise behavior to a region of a vma.  madvise_update_vma
+ * will handle splitting a vm area into separate areas, each area with its own
+ * behavior.
+ */
+static int madvise_vma_behavior(struct vm_area_struct *vma,
+				struct vm_area_struct **prev,
+				unsigned long start, unsigned long end,
+				unsigned long behavior)
+{
+	int error = 0;
+	unsigned long new_flags = vma->vm_flags;
+
+	switch (behavior) {
+	case MADV_REMOVE:
+		return madvise_remove(vma, prev, start, end);
+	case MADV_WILLNEED:
+		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
+	case MADV_PAGEOUT:
+		return madvise_pageout(vma, prev, start, end);
+	case MADV_FREE:
+	case MADV_DONTNEED:
+		return madvise_dontneed_free(vma, prev, start, end, behavior);
+	case MADV_NORMAL:
+		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
+		break;
+	case MADV_SEQUENTIAL:
+		new_flags = (new_flags & ~VM_RAND_READ) | VM_SEQ_READ;
+		break;
+	case MADV_RANDOM:
+		new_flags = (new_flags & ~VM_SEQ_READ) | VM_RAND_READ;
+		break;
+	case MADV_DONTFORK:
+		new_flags |= VM_DONTCOPY;
+		break;
+	case MADV_DOFORK:
+		if (vma->vm_flags & VM_IO) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_DONTCOPY;
+		break;
+	case MADV_WIPEONFORK:
+		/* MADV_WIPEONFORK is only supported on anonymous memory. */
+		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags |= VM_WIPEONFORK;
+		break;
+	case MADV_KEEPONFORK:
+		new_flags &= ~VM_WIPEONFORK;
+		break;
+	case MADV_DONTDUMP:
+		new_flags |= VM_DONTDUMP;
+		break;
+	case MADV_DODUMP:
+		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_DONTDUMP;
+		break;
+	case MADV_MERGEABLE:
+	case MADV_UNMERGEABLE:
+		error = ksm_madvise(vma, start, end, behavior, &new_flags);
+		if (error)
+			goto out;
+		break;
+	case MADV_HUGEPAGE:
+	case MADV_NOHUGEPAGE:
+		error = hugepage_madvise(vma, &new_flags, behavior);
+		if (error)
+			goto out;
+		break;
+	}
+
+	error = madvise_update_vma(vma, prev, start, end, new_flags);
+
+out:
+	if (error == -ENOMEM)
+		error = -EAGAIN;
+	return error;
+}
+
 #ifdef CONFIG_MEMORY_FAILURE
 /*
  * Error injection support for memory error handling.
@@ -931,27 +954,6 @@ static int madvise_inject_error(int behavior,
 }
 #endif
 
-static long
-madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
-		unsigned long start, unsigned long end, int behavior)
-{
-	switch (behavior) {
-	case MADV_REMOVE:
-		return madvise_remove(vma, prev, start, end);
-	case MADV_WILLNEED:
-		return madvise_willneed(vma, prev, start, end);
-	case MADV_COLD:
-		return madvise_cold(vma, prev, start, end);
-	case MADV_PAGEOUT:
-		return madvise_pageout(vma, prev, start, end);
-	case MADV_FREE:
-	case MADV_DONTNEED:
-		return madvise_dontneed_free(vma, prev, start, end, behavior);
-	default:
-		return madvise_behavior(vma, prev, start, end, behavior);
-	}
-}
-
 static bool
 madvise_behavior_valid(int behavior)
 {
@@ -990,6 +992,73 @@ madvise_behavior_valid(int behavior)
 	}
 }
 
+/*
+ * Walk the vmas in range [start,end), and call the visit function on each one.
+ * The visit function will get start and end parameters that cover the overlap
+ * between the current vma and the original range.  Any unmapped regions in the
+ * original range will result in this function returning -ENOMEM while still
+ * calling the visit function on all of the existing vmas in the range.
+ * Must be called with the mmap_lock held for reading or writing.
+ */
+static
+int madvise_walk_vmas(unsigned long start, unsigned long end,
+		      unsigned long arg,
+		      int (*visit)(struct vm_area_struct *vma,
+				   struct vm_area_struct **prev, unsigned long start,
+				   unsigned long end, unsigned long arg))
+{
+	struct vm_area_struct *vma;
+	struct vm_area_struct *prev;
+	unsigned long tmp;
+	int unmapped_error = 0;
+
+	/*
+	 * If the interval [start,end) covers some unmapped address
+	 * ranges, just ignore them, but return -ENOMEM at the end.
+	 * - different from the way of handling in mlock etc.
+	 */
+	vma = find_vma_prev(current->mm, start, &prev);
+	if (vma && start > vma->vm_start)
+		prev = vma;
+
+	for (;;) {
+		int error;
+
+		/* Still start < end. */
+		if (!vma)
+			return -ENOMEM;
+
+		/* Here start < (end|vma->vm_end). */
+		if (start < vma->vm_start) {
+			unmapped_error = -ENOMEM;
+			start = vma->vm_start;
+			if (start >= end)
+				break;
+		}
+
+		/* Here vma->vm_start <= start < (end|vma->vm_end) */
+		tmp = vma->vm_end;
+		if (end < tmp)
+			tmp = end;
+
+		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
+		error = visit(vma, &prev, start, tmp, arg);
+		if (error)
+			return error;
+		start = tmp;
+		if (prev && start < prev->vm_end)
+			start = prev->vm_end;
+		if (start >= end)
+			break;
+		if (prev)
+			vma = prev->vm_next;
+		else	/* madvise_remove dropped mmap_lock */
+			vma = find_vma(current->mm, start);
+	}
+
+	return unmapped_error;
+}
+
 /*
  * The madvise(2) system call.
  *
@@ -1053,9 +1122,7 @@ madvise_behavior_valid(int behavior)
  */
 int do_madvise(unsigned long start, size_t len_in, int behavior)
 {
-	unsigned long end, tmp;
-	struct vm_area_struct *vma, *prev;
-	int unmapped_error = 0;
+	unsigned long end;
 	int error = -EINVAL;
 	int write;
 	size_t len;
@@ -1112,51 +1179,8 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
 		mmap_read_lock(current->mm);
 	}
 
-	/*
-	 * If the interval [start,end) covers some unmapped address
-	 * ranges, just ignore them, but return -ENOMEM at the end.
-	 * - different from the way of handling in mlock etc.
-	 */
-	vma = find_vma_prev(current->mm, start, &prev);
-	if (vma && start > vma->vm_start)
-		prev = vma;
-
 	blk_start_plug(&plug);
-	for (;;) {
-		/* Still start < end. */
-		error = -ENOMEM;
-		if (!vma)
-			goto out;
-
-		/* Here start < (end|vma->vm_end). */
-		if (start < vma->vm_start) {
-			unmapped_error = -ENOMEM;
-			start = vma->vm_start;
-			if (start >= end)
-				goto out;
-		}
-
-		/* Here vma->vm_start <= start < (end|vma->vm_end) */
-		tmp = vma->vm_end;
-		if (end < tmp)
-			tmp = end;
-
-		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
-		error = madvise_vma(vma, &prev, start, tmp, behavior);
-		if (error)
-			goto out;
-		start = tmp;
-		if (prev && start < prev->vm_end)
-			start = prev->vm_end;
-		error = unmapped_error;
-		if (start >= end)
-			goto out;
-		if (prev)
-			vma = prev->vm_next;
-		else	/* madvise_remove dropped mmap_lock */
-			vma = find_vma(current->mm, start);
-	}
-out:
+	error = madvise_walk_vmas(start, end, behavior, madvise_vma_behavior);
 	blk_finish_plug(&plug);
 	if (write)
 		mmap_write_unlock(current->mm);
-- 
2.28.0