Date: Tue, 8 Mar 2022 17:24:19 -0700
Subject: Re: [PATCH v3] mm/oom: do not oom reap task with an unresolved robust futex
To: Michal Hocko
Cc: Waiman Long, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, jsavitz@redhat.com, peterz@infradead.org,
 tglx@linutronix.de, mingo@redhat.com, dvhart@infradead.org,
 dave@stgolabs.net, andrealmeid@collabora.com
References: <20220114180135.83308-1-npache@redhat.com>
 <43a6c470-9fc2-6195-9a25-5321d17540e5@redhat.com>
 <118fc685-c68d-614f-006a-7d5487302122@redhat.com>
 <7f1ba14f-34e8-5f05-53b7-c12913693df8@redhat.com>
From: Nico Pache

On 3/3/22 00:48, Michal Hocko wrote:
> On Wed 02-03-22 12:26:45, Nico Pache wrote:
>>
>>
>> On 3/2/22 09:24, Michal Hocko wrote:
>>> Sorry, this has slipped through cracks.
>>>
>>> On Mon 14-02-22 15:39:31, Nico Pache wrote:
>>> [...]
>>>> We've recently been discussing the following if statement in __oom_reap_task_mm:
>>>> 	if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
>>>>
>>>> Given the comment above it, and some of the upstream discussion the original
>>>> RFC, we are struggling to see why this should be a `||` and not an `&&`. If we
>>>> only want to reap anon memory and reaping shared memory can be dangerous is this
>>>> statement incorrect?
>>>>
>>>> We have a patch queued up to make this change, but wanted to get your opinion on
>>>> why this was originally designed this way in case we are missing something.
>>>
>>> I do not really see why this would be wrong. Private file backed
>>> mappings can contain a reapable memory as well. I do not see how this
>>> would solve the futex issue
>> We were basing our discussion around the following comment:
>> /*
>>  * Only anonymous pages have a good chance to be dropped
>>  * without additional steps which we cannot afford as we
>>  * are OOM already.
>>  *
>>  * We do not even care about fs backed pages because all
>>  * which are reclaimable have already been reclaimed and
>>  * we do not want to block exit_mmap by keeping mm ref
>>  * count elevated without a good reason.
>>  */
>>
>> So changing to an && would align the functionality with this comment by ignoring
>> fs backed pages, and additionally it prevents shared mappings from being reaped.
>> We have tested this change and found we can no longer reproduce the issue. In
>> our case we allocate the mutex on a MAP_SHARED|MAP_ANONYMOUS mmap so the if-
>> statement in question would no longer return true after the && change.
>>
>> If it is the case that private fs backed pages matter perhaps we want something
>> like this:
>> 	if ((vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED))
>> 	||(!vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED)))
>>
>> or more simply:
>> 	if(!(vma->vm_flags & VM_SHARED))
>>
>> to exclude all VM_SHARED mappings.
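
The conditions quoted above can be compared side by side in a small
userspace sketch. This is only a truth-table model, not kernel code:
struct vma_model, the reap_* helpers and print_table() are hypothetical
names made up for illustration, and the two predicates are treated as
independent booleans, which is an assumption of the sketch rather than a
statement about which combinations a real VMA can have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a VMA: only the two properties under discussion. */
struct vma_model {
	bool anonymous;	/* models the result of vma_is_anonymous(vma) */
	bool shared;	/* models vma->vm_flags & VM_SHARED */
};

/* The condition as quoted from __oom_reap_task_mm. */
static bool reap_or(struct vma_model v)
{
	return v.anonymous || !v.shared;
}

/* The proposed && variant. */
static bool reap_and(struct vma_model v)
{
	return v.anonymous && !v.shared;
}

/* The long form suggested for keeping private fs backed mappings. */
static bool reap_long(struct vma_model v)
{
	return (v.anonymous && !v.shared) || (!v.anonymous && !v.shared);
}

/* The short form: exclude every VM_SHARED mapping. */
static bool reap_not_shared(struct vma_model v)
{
	return !v.shared;
}

/*
 * Print one row per input combination and return how many rows the
 * || and && variants disagree on. The long and short forms are
 * asserted to agree on every row.
 */
static int print_table(void)
{
	int differ = 0;

	printf("anon shared |  ||  &&  long  !shared\n");
	for (int a = 0; a <= 1; a++) {
		for (int s = 0; s <= 1; s++) {
			struct vma_model v = { .anonymous = a, .shared = s };

			assert(reap_long(v) == reap_not_shared(v));
			if (reap_or(v) != reap_and(v))
				differ++;
			printf("%4d %6d | %3d %3d %5d %8d\n", a, s,
			       reap_or(v), reap_and(v),
			       reap_long(v), reap_not_shared(v));
		}
	}
	return differ;
}
```

Calling print_table() from a main() prints the four rows. Under this
model the || and && forms disagree on two of them, while the long form
and the plain !VM_SHARED check agree on all four.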
>
> I would have to think about that some more but I do not really see how
> this is related to the futex issue. In other words what kind of problem
> does this solve?
>
We had a misunderstanding of what vma_is_anonymous actually checks for...
It returns true if the VMA is PRIVATE|ANONYMOUS. We may follow up with a
patch to change the name of this function, or at least add a comment at
the top of the function to be more descriptive.

Furthermore, we ended up being able to reproduce this issue on the kernel
with the && change.

We have also found the actual cause of the issue, and we'll post that fix.
It's related to the glibc allocation done for pthreads, as we discussed
earlier in this thread. The mapping that stores the futex robust list is
in userspace, and a race occurs between oom_reap_task_mm and the exit
path that handles the futex cleanup.

Cheers,
-- Nico