From: Sumit Semwal
To: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Alexey Dobriyan, Jonathan Corbet
Cc: Mauro Carvalho Chehab, Kees Cook, Michal Hocko, Colin Cross, Alexey Gladkov,
 Matthew Wilcox, Jason Gunthorpe, "Kirill A . Shutemov", Michel Lespinasse,
 Michal Koutný, Song Liu, Huang Ying, Vlastimil Babka, Yang Shi, chenqiwu,
 Mathieu Desnoyers, John Hubbard, Mike Christie, Bart Van Assche, Amit Pundir,
 Thomas Gleixner, Christian Brauner, Daniel Jordan, Adrian Reber, Nicolas Viennot,
 Al Viro, linux-fsdevel@vger.kernel.org, John Stultz, Pekka Enberg, Dave Hansen,
 Peter Zijlstra, Ingo Molnar, Oleg Nesterov, "Eric W. Biederman", Jan Glauber,
 Rob Landley, Cyrill Gorcunov, "Serge E. Hallyn", David Rientjes, Hugh Dickins,
 Rik van Riel, Mel Gorman, Tang Chen, Robin Holt, Shaohua Li, Sasha Levin,
 Johannes Weiner, Minchan Kim, Sumit Semwal
Subject: [PATCH v7 1/3] mm: rearrange madvise code to allow for reuse
Date: Tue, 1 Sep 2020 21:44:57 +0530
Message-Id: <20200901161459.11772-2-sumit.semwal@linaro.org>
In-Reply-To: <20200901161459.11772-1-sumit.semwal@linaro.org>
References: <20200901161459.11772-1-sumit.semwal@linaro.org>

From: Colin Cross

Refactor the madvise syscall to allow for parts of it to be reused by a
prctl syscall that affects vmas.

Move the code that walks vmas in a virtual address range into a function
that takes a function pointer as a parameter. The only caller for now is
sys_madvise, which uses it to call madvise_vma_behavior on each vma, but
the next patch will add an additional caller.

Move the handling of all vma behaviors into madvise_behavior, and rename
it to madvise_vma_behavior.

Move the code that updates the flags on a vma, including splitting or
merging the vma as necessary, into a new function called
madvise_update_vma. The next patch will add support for updating a new
anon_name field as well.

Signed-off-by: Colin Cross
Cc: Pekka Enberg
Cc: Dave Hansen
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Jan Glauber
Cc: John Stultz
Cc: Rob Landley
Cc: Cyrill Gorcunov
Cc: Kees Cook
Cc: "Serge E. Hallyn"
Cc: David Rientjes
Cc: Al Viro
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Michel Lespinasse
Cc: Tang Chen
Cc: Robin Holt
Cc: Shaohua Li
Cc: Sasha Levin
Cc: Johannes Weiner
Cc: Minchan Kim
Signed-off-by: Andrew Morton
[sumits: rebased over v5.9-rc3]
Signed-off-by: Sumit Semwal
---
 mm/madvise.c | 312 +++++++++++++++++++++++++++------------------------
 1 file changed, 168 insertions(+), 144 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index dd1d43cf026d..84482c21b029 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -60,76 +60,20 @@ static int madvise_need_mmap_write(int behavior)
 }
 
 /*
- * We can potentially split a vm area into separate
- * areas, each area with its own behavior.
+ * Update the vm_flags on a region of a vma, splitting it or merging it as
+ * necessary. Must be called with mmap_lock held for writing.
  */
-static long madvise_behavior(struct vm_area_struct *vma,
-		     struct vm_area_struct **prev,
-		     unsigned long start, unsigned long end, int behavior)
+static int madvise_update_vma(struct vm_area_struct *vma,
+			      struct vm_area_struct **prev, unsigned long start,
+			      unsigned long end, unsigned long new_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	int error = 0;
+	int error;
 	pgoff_t pgoff;
-	unsigned long new_flags = vma->vm_flags;
-
-	switch (behavior) {
-	case MADV_NORMAL:
-		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
-		break;
-	case MADV_SEQUENTIAL:
-		new_flags = (new_flags & ~VM_RAND_READ) | VM_SEQ_READ;
-		break;
-	case MADV_RANDOM:
-		new_flags = (new_flags & ~VM_SEQ_READ) | VM_RAND_READ;
-		break;
-	case MADV_DONTFORK:
-		new_flags |= VM_DONTCOPY;
-		break;
-	case MADV_DOFORK:
-		if (vma->vm_flags & VM_IO) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags &= ~VM_DONTCOPY;
-		break;
-	case MADV_WIPEONFORK:
-		/* MADV_WIPEONFORK is only supported on anonymous memory. */
-		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags |= VM_WIPEONFORK;
-		break;
-	case MADV_KEEPONFORK:
-		new_flags &= ~VM_WIPEONFORK;
-		break;
-	case MADV_DONTDUMP:
-		new_flags |= VM_DONTDUMP;
-		break;
-	case MADV_DODUMP:
-		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
-			error = -EINVAL;
-			goto out;
-		}
-		new_flags &= ~VM_DONTDUMP;
-		break;
-	case MADV_MERGEABLE:
-	case MADV_UNMERGEABLE:
-		error = ksm_madvise(vma, start, end, behavior, &new_flags);
-		if (error)
-			goto out_convert_errno;
-		break;
-	case MADV_HUGEPAGE:
-	case MADV_NOHUGEPAGE:
-		error = hugepage_madvise(vma, &new_flags, behavior);
-		if (error)
-			goto out_convert_errno;
-		break;
-	}
 
 	if (new_flags == vma->vm_flags) {
 		*prev = vma;
-		goto out;
+		return 0;
 	}
 
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
@@ -146,21 +90,21 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	if (start != vma->vm_start) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count)) {
 			error = -ENOMEM;
-			goto out;
+			return error;
 		}
 		error = __split_vma(mm, vma, start, 1);
 		if (error)
-			goto out_convert_errno;
+			return error;
 	}
 
 	if (end != vma->vm_end) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count)) {
 			error = -ENOMEM;
-			goto out;
+			return error;
 		}
 		error = __split_vma(mm, vma, end, 0);
 		if (error)
-			goto out_convert_errno;
+			return error;
 	}
 
 success:
@@ -169,15 +113,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	 */
 	vma->vm_flags = new_flags;
 
-out_convert_errno:
-	/*
-	 * madvise() returns EAGAIN if kernel resources, such as
-	 * slab, are temporarily unavailable.
-	 */
-	if (error == -ENOMEM)
-		error = -EAGAIN;
-out:
-	return error;
+	return 0;
 }
 
 #ifdef CONFIG_SWAP
@@ -862,6 +798,93 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }
 
+/*
+ * Apply an madvise behavior to a region of a vma. madvise_update_vma
+ * will handle splitting a vm area into separate areas, each area with its own
+ * behavior.
+ */
+static int madvise_vma_behavior(struct vm_area_struct *vma,
+				struct vm_area_struct **prev,
+				unsigned long start, unsigned long end,
+				unsigned long behavior)
+{
+	int error = 0;
+	unsigned long new_flags = vma->vm_flags;
+
+	switch (behavior) {
+	case MADV_REMOVE:
+		return madvise_remove(vma, prev, start, end);
+	case MADV_WILLNEED:
+		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
+	case MADV_PAGEOUT:
+		return madvise_pageout(vma, prev, start, end);
+	case MADV_FREE:
+	case MADV_DONTNEED:
+		return madvise_dontneed_free(vma, prev, start, end, behavior);
+	case MADV_NORMAL:
+		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
+		break;
+	case MADV_SEQUENTIAL:
+		new_flags = (new_flags & ~VM_RAND_READ) | VM_SEQ_READ;
+		break;
+	case MADV_RANDOM:
+		new_flags = (new_flags & ~VM_SEQ_READ) | VM_RAND_READ;
+		break;
+	case MADV_DONTFORK:
+		new_flags |= VM_DONTCOPY;
+		break;
+	case MADV_DOFORK:
+		if (vma->vm_flags & VM_IO) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_DONTCOPY;
+		break;
+	case MADV_WIPEONFORK:
+		/* MADV_WIPEONFORK is only supported on anonymous memory. */
+		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags |= VM_WIPEONFORK;
+		break;
+	case MADV_KEEPONFORK:
+		new_flags &= ~VM_WIPEONFORK;
+		break;
+	case MADV_DONTDUMP:
+		new_flags |= VM_DONTDUMP;
+		break;
+	case MADV_DODUMP:
+		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_DONTDUMP;
+		break;
+	case MADV_MERGEABLE:
+	case MADV_UNMERGEABLE:
+		error = ksm_madvise(vma, start, end, behavior, &new_flags);
+		if (error)
+			goto out;
+		break;
+	case MADV_HUGEPAGE:
+	case MADV_NOHUGEPAGE:
+		error = hugepage_madvise(vma, &new_flags, behavior);
+		if (error)
+			goto out;
+		break;
+	}
+
+	error = madvise_update_vma(vma, prev, start, end, new_flags);
+
+out:
+	if (error == -ENOMEM)
+		error = -EAGAIN;
+	return error;
+}
+
 #ifdef CONFIG_MEMORY_FAILURE
 /*
  * Error injection support for memory error handling.
@@ -931,27 +954,6 @@ static int madvise_inject_error(int behavior,
 }
 #endif
 
-static long
-madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
-		unsigned long start, unsigned long end, int behavior)
-{
-	switch (behavior) {
-	case MADV_REMOVE:
-		return madvise_remove(vma, prev, start, end);
-	case MADV_WILLNEED:
-		return madvise_willneed(vma, prev, start, end);
-	case MADV_COLD:
-		return madvise_cold(vma, prev, start, end);
-	case MADV_PAGEOUT:
-		return madvise_pageout(vma, prev, start, end);
-	case MADV_FREE:
-	case MADV_DONTNEED:
-		return madvise_dontneed_free(vma, prev, start, end, behavior);
-	default:
-		return madvise_behavior(vma, prev, start, end, behavior);
-	}
-}
-
 static bool
 madvise_behavior_valid(int behavior)
 {
@@ -990,6 +992,73 @@ madvise_behavior_valid(int behavior)
 	}
 }
 
+/*
+ * Walk the vmas in range [start,end), and call the visit function on each one.
+ * The visit function will get start and end parameters that cover the overlap
+ * between the current vma and the original range. Any unmapped regions in the
+ * original range will result in this function returning -ENOMEM while still
+ * calling the visit function on all of the existing vmas in the range.
+ * Must be called with the mmap_lock held for reading or writing.
+ */
+static
+int madvise_walk_vmas(unsigned long start, unsigned long end,
+		      unsigned long arg,
+		      int (*visit)(struct vm_area_struct *vma,
+				   struct vm_area_struct **prev, unsigned long start,
+				   unsigned long end, unsigned long arg))
+{
+	struct vm_area_struct *vma;
+	struct vm_area_struct *prev;
+	unsigned long tmp;
+	int unmapped_error = 0;
+
+	/*
+	 * If the interval [start,end) covers some unmapped address
+	 * ranges, just ignore them, but return -ENOMEM at the end.
+	 * - different from the way of handling in mlock etc.
+	 */
+	vma = find_vma_prev(current->mm, start, &prev);
+	if (vma && start > vma->vm_start)
+		prev = vma;
+
+	for (;;) {
+		int error;
+
+		/* Still start < end. */
+		if (!vma)
+			return -ENOMEM;
+
+		/* Here start < (end|vma->vm_end). */
+		if (start < vma->vm_start) {
+			unmapped_error = -ENOMEM;
+			start = vma->vm_start;
+			if (start >= end)
+				break;
+		}
+
+		/* Here vma->vm_start <= start < (end|vma->vm_end) */
+		tmp = vma->vm_end;
+		if (end < tmp)
+			tmp = end;
+
+		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
+		error = visit(vma, &prev, start, tmp, arg);
+		if (error)
+			return error;
+		start = tmp;
+		if (prev && start < prev->vm_end)
+			start = prev->vm_end;
+		if (start >= end)
+			break;
+		if (prev)
+			vma = prev->vm_next;
+		else /* madvise_remove dropped mmap_lock */
+			vma = find_vma(current->mm, start);
+	}
+
+	return unmapped_error;
+}
+
 /*
  * The madvise(2) system call.
  *
@@ -1053,9 +1122,7 @@ madvise_behavior_valid(int behavior)
  */
 int do_madvise(unsigned long start, size_t len_in, int behavior)
 {
-	unsigned long end, tmp;
-	struct vm_area_struct *vma, *prev;
-	int unmapped_error = 0;
+	unsigned long end;
 	int error = -EINVAL;
 	int write;
 	size_t len;
@@ -1112,51 +1179,8 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
 		mmap_read_lock(current->mm);
 	}
 
-	/*
-	 * If the interval [start,end) covers some unmapped address
-	 * ranges, just ignore them, but return -ENOMEM at the end.
-	 * - different from the way of handling in mlock etc.
-	 */
-	vma = find_vma_prev(current->mm, start, &prev);
-	if (vma && start > vma->vm_start)
-		prev = vma;
-
 	blk_start_plug(&plug);
-	for (;;) {
-		/* Still start < end. */
-		error = -ENOMEM;
-		if (!vma)
-			goto out;
-
-		/* Here start < (end|vma->vm_end). */
-		if (start < vma->vm_start) {
-			unmapped_error = -ENOMEM;
-			start = vma->vm_start;
-			if (start >= end)
-				goto out;
-		}
-
-		/* Here vma->vm_start <= start < (end|vma->vm_end) */
-		tmp = vma->vm_end;
-		if (end < tmp)
-			tmp = end;
-
-		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
-		error = madvise_vma(vma, &prev, start, tmp, behavior);
-		if (error)
-			goto out;
-		start = tmp;
-		if (prev && start < prev->vm_end)
-			start = prev->vm_end;
-		error = unmapped_error;
-		if (start >= end)
-			goto out;
-		if (prev)
-			vma = prev->vm_next;
-		else /* madvise_remove dropped mmap_lock */
-			vma = find_vma(current->mm, start);
-	}
-out:
+	error = madvise_walk_vmas(start, end, behavior, madvise_vma_behavior);
 	blk_finish_plug(&plug);
 	if (write)
 		mmap_write_unlock(current->mm);
-- 
2.28.0
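The semantics of madvise_walk_vmas() in the patch above can be modelled in user space. The sketch below is only an illustration under stated assumptions: a hypothetical sorted, non-overlapping `struct range` array stands in for the vma list, and the hypothetical `walk_ranges()` omits the `prev` tracking, vma merging and locking that the kernel function has to deal with. What it keeps is the contract from the comment: each callback sees only the overlap of one mapping with [start, end), a hole is remembered and reported as -ENOMEM only after the remaining mappings are visited, and a callback error stops the walk at once.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct: a half-open interval
 * [start, end). Arrays of these are assumed sorted and non-overlapping. */
struct range {
	unsigned long start, end;
};

typedef int (*visit_fn)(const struct range *r, unsigned long start,
			unsigned long end, unsigned long arg);

/* Hypothetical model of madvise_walk_vmas(): call visit() on the overlap
 * of each range with [start, end); arg is passed through untouched. */
static int walk_ranges(const struct range *r, size_t n, unsigned long start,
		       unsigned long end, unsigned long arg, visit_fn visit)
{
	int unmapped_error = 0;
	size_t i = 0;

	assert(start <= end);	/* callers pass a valid range */

	/* Skip ranges that end at or before start, like find_vma(). */
	while (i < n && r[i].end <= start)
		i++;

	while (start < end) {
		unsigned long tmp;
		int error;

		if (i == n)
			return -ENOMEM;	/* nothing mapped at or above start */
		if (start < r[i].start) {
			unmapped_error = -ENOMEM;	/* hole: note it, keep going */
			start = r[i].start;
			if (start >= end)
				break;
		}
		tmp = r[i].end < end ? r[i].end : end;
		error = visit(&r[i], start, tmp, arg);
		if (error)
			return error;	/* callback failure ends the walk */
		start = tmp;
		i++;
	}
	return unmapped_error;
}

/* Example callback: records each visited sub-range. */
static unsigned long visited[8][2];
static size_t nvisited;

static int record_visit(const struct range *r, unsigned long start,
			unsigned long end, unsigned long arg)
{
	(void)r;
	(void)arg;
	if (nvisited < 8) {
		visited[nvisited][0] = start;
		visited[nvisited][1] = end;
	}
	nvisited++;
	return 0;
}
```

With mappings [0x1000, 0x3000) and [0x4000, 0x6000), walking [0x2000, 0x5000) calls record_visit() for [0x2000, 0x3000) and [0x4000, 0x5000), then returns -ENOMEM because of the hole at [0x3000, 0x4000). Passing the per-vma work as a function pointer is what lets the walk be shared: do_madvise() passes madvise_vma_behavior, and the commit message says the next patch adds another caller.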
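The flag-only cases of madvise_vma_behavior() amount to a pure function from the vma's current flags to the new ones, followed by the -ENOMEM to -EAGAIN conversion at its out: label. The sketch below covers five of those cases. The flag bits `F_*`, the `enum advice` values and the functions `advice_to_flags()` and `convert_errno()` are hypothetical stand-ins chosen for illustration; they are not the kernel's VM_* or MADV_* values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for VM_SEQ_READ, VM_RAND_READ,
 * VM_DONTCOPY and VM_IO; the real kernel values differ. */
#define F_SEQ_READ	0x1UL
#define F_RAND_READ	0x2UL
#define F_DONTCOPY	0x4UL
#define F_IO		0x8UL

/* Hypothetical advice codes standing in for a subset of MADV_*. */
enum advice { ADV_NORMAL, ADV_SEQUENTIAL, ADV_RANDOM, ADV_DONTFORK, ADV_DOFORK };

/* Compute the new flags for one advice value, following the
 * corresponding cases of madvise_vma_behavior(). Returns 0 and stores
 * the result in *new_flags, or returns a negative errno. */
static int advice_to_flags(enum advice adv, unsigned long flags,
			   unsigned long *new_flags)
{
	unsigned long nf = flags;

	assert(new_flags != NULL);

	switch (adv) {
	case ADV_NORMAL:
		nf &= ~(F_RAND_READ | F_SEQ_READ);	/* clear both hints */
		break;
	case ADV_SEQUENTIAL:
		nf = (nf & ~F_RAND_READ) | F_SEQ_READ;
		break;
	case ADV_RANDOM:
		nf = (nf & ~F_SEQ_READ) | F_RAND_READ;
		break;
	case ADV_DONTFORK:
		nf |= F_DONTCOPY;
		break;
	case ADV_DOFORK:
		if (flags & F_IO)
			return -EINVAL;	/* mirrors the VM_IO check for MADV_DOFORK */
		nf &= ~F_DONTCOPY;
		break;
	}
	*new_flags = nf;
	return 0;
}

/* madvise() reports temporarily unavailable kernel resources as EAGAIN,
 * so an -ENOMEM from the vma update is converted on the way out. */
static int convert_errno(int error)
{
	return error == -ENOMEM ? -EAGAIN : error;
}
```

For instance, ADV_SEQUENTIAL applied to flags holding only F_RAND_READ yields F_SEQ_READ, while ADV_DOFORK on a mapping with F_IO set fails with -EINVAL. Computing the flags first and applying them separately mirrors the split this patch introduces between madvise_vma_behavior() and madvise_update_vma().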
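madvise_update_vma() splits a vma whenever the advised range does not begin or end on the vma's own boundaries, so that only the middle piece receives the new flags. It refuses each split with -ENOMEM once the map count has reached sysctl_max_map_count. A minimal model of just that step is sketched below. The `struct area` type and `update_area()` function are hypothetical, and the vma_merge() attempt that the real function makes before splitting is deliberately left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical simplified mapping: [start, end) plus a flag word. */
struct area {
	unsigned long start, end, flags;
};

/* Model of the split step in madvise_update_vma(): give [start, end)
 * new flags inside *a, cutting off a head and/or tail piece that keeps
 * the old flags. Writes the resulting pieces to out[] and returns their
 * count, or returns -ENOMEM if a split would need a mapping beyond
 * max_count. Merging with neighbours is not modelled. */
static int update_area(const struct area *a, unsigned long start,
		       unsigned long end, unsigned long new_flags,
		       int map_count, int max_count, struct area out[3])
{
	int n = 0;

	/* The advised range must lie inside this area. */
	assert(a->start <= start && start < end && end <= a->end);

	if (new_flags == a->flags) {
		out[n++] = *a;	/* nothing changes, so no split */
		return n;
	}
	if (start != a->start) {
		if (map_count >= max_count)
			return -ENOMEM;
		out[n++] = (struct area){ a->start, start, a->flags };	/* head */
		map_count++;	/* each split adds one mapping */
	}
	out[n++] = (struct area){ start, end, new_flags };	/* advised part */
	if (end != a->end) {
		if (map_count >= max_count)
			return -ENOMEM;
		out[n++] = (struct area){ end, a->end, a->flags };	/* tail */
	}
	return n;
}
```

Advising [0x2000, 0x3000) inside an area spanning [0x1000, 0x5000) yields three pieces with only the middle one carrying the new flags, whereas advising the whole area changes its flags in place without a split. As in the kernel, an -ENOMEM from this step would be turned into -EAGAIN by the caller.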