[RFC 00/13] make direct compaction more deterministic

Re: [RFC 12/13] mm, compaction: more reliably increase direct compaction priority

Michal Hocko
On Mon 16-05-16 11:27:56, Vlastimil Babka wrote:
> On 05/16/2016 10:14 AM, Michal Hocko wrote:
> > On Mon 16-05-16 09:31:44, Vlastimil Babka wrote:
[...]
> > > Also my understanding of the initial compaction priorities is to lower the
> > > latency if fragmentation is just light and there's enough memory. Once we
> > > start struggling, I don't see much point in not switching to the full
> > > compaction priority quickly.
> >
> > That is true, but why compact when there are high-order pages that are
> > just hidden by the watermark check?
>
> Compaction should skip such zone regardless of priority.

The point I've tried to raise is that we shouldn't conflate the purpose
of the two. Reclaim is here primarily to get us over the watermarks,
while compaction is here to form high-order pages. If we mix the two
together, the distinction is blurred, which, I believe, will lead to
more complicated code in the end. I might be wrong here of course, but
let's try to keep compaction as free of watermark checks as possible.
--
Michal Hocko
SUSE Labs

Re: [RFC 13/13] mm, compaction: fix and improve watermark handling

Michal Hocko
In reply to this post by Vlastimil Babka
On Mon 16-05-16 11:50:22, Vlastimil Babka wrote:

> On 05/16/2016 11:25 AM, Michal Hocko wrote:
> > On Tue 10-05-16 09:36:03, Vlastimil Babka wrote:
> > > Compaction has been using watermark checks when deciding whether it was
> > > successful, and whether compaction is at all suitable. There are a few
> > > problems with these checks.
> > >
> > > - __compact_finished() uses low watermark in a check that has to pass if
> > >    the direct compaction is to finish and allocation should succeed. This is
> > >    too pessimistic, as the allocation will typically use min watermark. It
> > >    may happen that during compaction, we drop below the low watermark (due to
> > >    parallel activity), but still form the target high-order page. By checking
> > >    against low watermark, we might needlessly continue compaction. After this
> > >    patch, the check uses direct compactor's alloc_flags to determine the
> > >    watermark, which is effectively the min watermark.
> >
> > OK, this makes some sense. It would be great if we could have at least
> > some clarification of why the low wmark was used previously. Perhaps
> > Mel can remember?
> >
> > > - __compaction_suitable has the same issue in the check whether the allocation
> > >    is already supposed to succeed and we don't need to compact. Fix it the same
> > >    way.
> > >
> > > - __compaction_suitable() then checks the low watermark plus a (2 << order) gap
> > >    to decide if there's enough free memory to perform compaction. This check
> >
> > And this was a real head scratcher when I started looking into the
> > compaction recently. Why do we need to be above the low watermark to
> > even start compaction?
>
> Hmm, above you said you're fine with the low wmark (maybe after clarification).
> I don't know why it was used; I can only guess.

Yes I can imagine this would be a good backoff for costly orders without
__GFP_REPEAT.

> > Compaction uses additional memory only for a short
> > period of time and then releases the already migrated pages.
>
> As for the 2 << order gap: I can imagine that e.g. order-5 compaction (32
> pages) isolates 20 pages for migration and starts looking for free pages. It
> collects 19 free pages and then reaches an order-4 free page. Splitting that
> page to collect it would result in 19+16=35 pages isolated, thus exceeding
> the 1 << order gap, and failing. With a 2 << order gap, the chances of this
> happening are reduced.

OK, fair enough, but that sounds like a case which is not worth
optimizing for and introducing subtle code.

[...]

> > > - __isolate_free_page uses a low watermark check to decide if a free page
> > >    can be isolated. It also doesn't use ALLOC_CMA, so add it for the same
> > >    reasons.
> >
> > Why do we check the watermark at all? What would happen if this obscure
> > if (!is_migrate_isolate(mt)) were gone? I remember I put some tracing
> > there and it never hit for me, even when I was testing close to OOM
> > conditions. Maybe an earlier check bailed out, but this code path looks
> > really obscure, so it should either get a large fat comment or die.
>
> The check is there so that compaction doesn't exhaust memory below reserves
> during its work, just like any other non-privileged allocation.

Hmm. OK, this is a fair point. I would expect that the reclaim preceding
the compaction would compensate for the temporarily used memory, but it
is true that a) we might be in the optimistic async compaction which
happens _before_ the reclaim, and b) the reclaim might not be effective
enough, so some throttling is indeed appropriate.
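
For reference, the check being discussed sits in __isolate_free_page();
roughly condensed (not the exact code), it does:

	mt = get_pageblock_migratetype(page);
	if (!is_migrate_isolate(mt)) {
		/* obey watermarks as if the page was being allocated */
		watermark = low_wmark_pages(zone) + (1 << order);
		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
			return 0;
	}

so isolating free pages simply fails once it would drop the zone below
the low watermark plus the isolated amount.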

I guess you do not want to rely on throttling only at the beginning of
the compaction because it would be too racy, which is true. So I guess
it would indeed be safer to check the watermark both when we attempt to
compact and when we isolate free pages. Can we at least use a common
helper so that we know those checks are done the same way?
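
Something like the following, say (a hypothetical sketch; the helper
name and placement are my invention, but it builds only on the existing
min_wmark_pages() / zone_watermark_ok() interfaces):

	/*
	 * One watermark check shared by compaction_suitable() and
	 * __isolate_free_page(), so the two can never diverge. The gap
	 * covers the pages compaction temporarily isolates.
	 */
	static bool compaction_wmark_ok(struct zone *zone, unsigned int order,
					int classzone_idx,
					unsigned int alloc_flags)
	{
		unsigned long watermark = min_wmark_pages(zone) +
					  (2UL << order);

		return zone_watermark_ok(zone, 0, watermark, classzone_idx,
					 alloc_flags | ALLOC_CMA);
	}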
 
Thanks!
--
Michal Hocko
SUSE Labs

Re: [RFC 00/13] make direct compaction more deterministic

Michal Hocko
In reply to this post by Vlastimil Babka
Btw. I think the first three patches are nice cleanups and easy enough,
so I would vote for merging them earlier.
--
Michal Hocko
SUSE Labs

Re: [RFC 00/13] make direct compaction more deterministic

Vlastimil Babka
On 05/17/2016 10:01 PM, Michal Hocko wrote:
> Btw. I think the first three patches are nice cleanups and easy enough,
> so I would vote for merging them earlier.

I wouldn't mind if patches 1-3 (note: the second version of patch 2 was
posted as a reply!) went to mmotm now, but the merge window is already
open, so it's unlikely to make 4.7 anyway?

Re: [RFC 06/13] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations

Vlastimil Babka
In reply to this post by Michal Hocko
On 05/13/2016 02:05 PM, Michal Hocko wrote:

> On Fri 13-05-16 10:23:31, Vlastimil Babka wrote:
>> On 05/12/2016 06:20 PM, Michal Hocko wrote:
>>> On Tue 10-05-16 09:35:56, Vlastimil Babka wrote:
>>> [...]
>>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>>> index 570383a41853..0cb09714d960 100644
>>>> --- a/include/linux/gfp.h
>>>> +++ b/include/linux/gfp.h
>>>> @@ -256,8 +256,7 @@ struct vm_area_struct;
>>>>    #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
>>>>    #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
>>>>    #define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
>>>> - __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
>>>> - ~__GFP_RECLAIM)
>>>> + __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
>>>
>>> I am not sure this is the right thing to do. I think we should keep
>>> __GFP_NORETRY and clear it where we want a stronger semantic. This is
>>> just too subtle to assume that all callsites are doing the right thing.
>>
>> That would complicate alloc_hugepage_direct_gfpmask() a bit, but if you
>> think it's worth it, I can turn the default around, OK.
>
> Hmm, on the other hand it is true that GFP_TRANSHUGE clears both
> reclaim flags by default and then overrides that. This is just too
> ugly. Can we make GFP_TRANSHUGE only define the flags we care about and
> then tweak those that should go away at the callsites that matter, now
> that we do not rely on is_thp_gfp_mask?
 
So the following patch attempts what you suggest, if I understand you
correctly: GFP_TRANSHUGE includes all possible flags, and then they are
removed as needed. I don't really think it helps code readability,
though. IMHO it's simpler to define GFP_TRANSHUGE as a minimal subset
and only add flags on top. You call the resulting #define ugly, but IMHO
it's better to have the ugliness in a single place rather than at
multiple usage sites (see the diff below).

Note that this also affects the printk stuff. With GFP_TRANSHUGE
including all possible flags, it's unlikely that printk will ever print
"GFP_TRANSHUGE", since most likely one or more flags will always be
missing.
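
(Roughly, the tracepoint/printk side recognizes the composite name only
when every bit of the composite mask is present; sketching the effect,
not the exact printer code:

	bool print_composite =
		(gfp_flags & GFP_TRANSHUGE) == GFP_TRANSHUGE;

so as soon as a caller clears e.g. __GFP_DIRECT_RECLAIM or __GFP_NORETRY
from the all-inclusive definition, the individual __GFP_* names would be
printed instead.)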

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..e1998eb5c37f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -256,8 +256,7 @@ struct vm_area_struct;
 #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
 #define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
- __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
- ~__GFP_RECLAIM)
+ __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87f09dc986ab..370fbd3b24dd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -216,7 +216,8 @@ struct page *get_huge_zero_page(void)
  if (likely(atomic_inc_not_zero(&huge_zero_refcount)))
  return READ_ONCE(huge_zero_page);
 
- zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE,
+ zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO)
+ & ~(__GFP_MOVABLE | __GFP_NORETRY),
  HPAGE_PMD_ORDER);
  if (!zero_page) {
  count_vm_event(THP_ZERO_PAGE_ALLOC_FAILED);
@@ -882,9 +883,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 }
 
 /*
- * If THP is set to always then directly reclaim/compact as necessary
- * If set to defer then do no reclaim and defer to khugepaged
+ * If THP defrag is set to always then directly reclaim/compact as necessary
+ * If set to defer then do only background reclaim/compact and defer to khugepaged
  * If set to madvise and the VMA is flagged then directly reclaim/compact
+ * When direct reclaim/compact is allowed, try a bit harder for flagged VMA's
  */
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
@@ -896,15 +898,21 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
  else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
  reclaim_flags = __GFP_KSWAPD_RECLAIM;
  else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
- reclaim_flags = __GFP_DIRECT_RECLAIM;
+ reclaim_flags = __GFP_DIRECT_RECLAIM |
+ ((vma->vm_flags & VM_HUGEPAGE) ? 0 : __GFP_NORETRY);
 
- return GFP_TRANSHUGE | reclaim_flags;
+ return (GFP_TRANSHUGE & ~(__GFP_RECLAIM | __GFP_NORETRY)) | reclaim_flags;
 }
 
 /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
 static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 {
- return GFP_TRANSHUGE | (khugepaged_defrag() ? __GFP_DIRECT_RECLAIM : 0);
+ /*
+ * We don't want kswapd reclaim, and if khugepaged/defrag is disabled
+ * we disable also direct reclaim. If we do direct reclaim, do retry.
+ */
+ return GFP_TRANSHUGE & ~(khugepaged_defrag() ?
+ (__GFP_KSWAPD_RECLAIM | __GFP_NORETRY) : __GFP_RECLAIM);
 }
 
 /* Caller must hold page table lock. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0cee863397e4..4a34187827ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3619,11 +3619,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
  /*
  * Looks like reclaim/compaction is worth trying, but
  * sync compaction could be very expensive, so keep
- * using async compaction, unless it's khugepaged
- * trying to collapse.
+ * using async compaction.
  */
- if (!(current->flags & PF_KTHREAD))
- migration_mode = MIGRATE_ASYNC;
+ migration_mode = MIGRATE_ASYNC;
  }
  }
 
--
2.8.2

Re: [RFC 11/13] mm, compaction: add the ultimate direct compaction priority

Vlastimil Babka
In reply to this post by Vlastimil Babka
On 05/16/2016 09:17 AM, Vlastimil Babka wrote:
>>> Wouldn't it be better to pull the prio check into compaction_deferred
>>> directly? There are more callers and I am not really sure all of them
>>> would behave consistently.
> I'll check, thanks.

Hm, so the other callers of compaction_deferred() are in contexts where
there's no direct compaction priority set. They would have to pass
something like DEF_COMPACT_PRIORITY. That starts getting subtle, so I'd
rather not go that way.
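
To illustrate the subtlety (a hypothetical sketch of the declined
variant, assuming this series' enum compact_priority, where a
numerically lower value means higher priority):

	/* priority-aware deferral; callers without a direct compaction
	 * context would have to pass DEF_COMPACT_PRIORITY explicitly */
	static bool compaction_deferred_prio(struct zone *zone,
					     unsigned int order,
					     enum compact_priority prio)
	{
		if (prio < DEF_COMPACT_PRIORITY)
			return false;	/* elevated priorities never defer */
		return compaction_deferred(zone, order);
	}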

Re: [RFC 13/13] mm, compaction: fix and improve watermark handling

Mel Gorman
In reply to this post by Michal Hocko
On Mon, May 16, 2016 at 11:25:05AM +0200, Michal Hocko wrote:

> On Tue 10-05-16 09:36:03, Vlastimil Babka wrote:
> > Compaction has been using watermark checks when deciding whether it was
> > successful, and whether compaction is at all suitable. There are a few problems
> > with these checks.
> >
> > - __compact_finished() uses low watermark in a check that has to pass if
> >   the direct compaction is to finish and allocation should succeed. This is
> >   too pessimistic, as the allocation will typically use min watermark. It
> >   may happen that during compaction, we drop below the low watermark (due to
> >   parallel activity), but still form the target high-order page. By checking
> >   against low watermark, we might needlessly continue compaction. After this
> >   patch, the check uses direct compactor's alloc_flags to determine the
> >   watermark, which is effectively the min watermark.
>
> OK, this makes some sense. It would be great if we could have at least
> some clarification of why the low wmark was used previously. Perhaps
> Mel can remember?
>

Two reasons -- first, it was a very rough estimate of whether enough
pages are free for compaction to have any chance. Secondly, it was to
minimise the risk that compaction would isolate so many pages that the
zone was completely depleted. This was a concern during the initial
prototype of compaction.

> > - __compaction_suitable() then checks the low watermark plus a (2 << order) gap
> >   to decide if there's enough free memory to perform compaction. This check
>
> And this was a real head scratcher when I started looking into the
> compaction recently. Why do we need to be above the low watermark to
> even start compaction? Compaction uses additional memory only for a short
> period of time and then releases the already migrated pages.
>

Simply minimising the risk that compaction would deplete the entire
zone. Sure, it hands pages back shortly afterwards. At the time of the
initial prototype, page migration was severely broken and the system was
constantly crashing. The cautious checks were left in place after page
migration was fixed as there wasn't a compelling reason to remove them
at the time.

--
Mel Gorman
SUSE Labs

Re: [RFC 13/13] mm, compaction: fix and improve watermark handling

Michal Hocko
On Wed 18-05-16 14:50:04, Mel Gorman wrote:

> On Mon, May 16, 2016 at 11:25:05AM +0200, Michal Hocko wrote:
> > On Tue 10-05-16 09:36:03, Vlastimil Babka wrote:
> > > Compaction has been using watermark checks when deciding whether it was
> > > successful, and whether compaction is at all suitable. There are a few problems
> > > with these checks.
> > >
> > > - __compact_finished() uses low watermark in a check that has to pass if
> > >   the direct compaction is to finish and allocation should succeed. This is
> > >   too pessimistic, as the allocation will typically use min watermark. It
> > >   may happen that during compaction, we drop below the low watermark (due to
> > >   parallel activity), but still form the target high-order page. By checking
> > >   against low watermark, we might needlessly continue compaction. After this
> > >   patch, the check uses direct compactor's alloc_flags to determine the
> > >   watermark, which is effectively the min watermark.
> >
> > OK, this makes some sense. It would be great if we could have at least
> > some clarification of why the low wmark was used previously. Perhaps
> > Mel can remember?
> >
>
> Two reasons -- first, it was a very rough estimate of whether enough
> pages are free for compaction to have any chance. Secondly, it was to
> minimise the risk that compaction would isolate so many pages that the
> zone was completely depleted. This was a concern during the initial
> prototype of compaction.
>
> > > - __compaction_suitable() then checks the low watermark plus a (2 << order) gap
> > >   to decide if there's enough free memory to perform compaction. This check
> >
> > And this was a real head scratcher when I started looking into the
> > compaction recently. Why do we need to be above the low watermark to
> > even start compaction? Compaction uses additional memory only for a short
> > period of time and then releases the already migrated pages.
> >
>
> Simply minimising the risk that compaction would deplete the entire
> zone. Sure, it hands pages back shortly afterwards. At the time of the
> initial prototype, page migration was severely broken and the system was
> constantly crashing. The cautious checks were left in place after page
> migration was fixed as there wasn't a compelling reason to remove them
> at the time.

OK, then moving to min_wmark + bias from low_wmark should work, right?
This would at least remove the discrepancy between the reclaim and
compaction thresholds to some degree, which is good IMHO.
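
Concretely, I would expect the gate in __compaction_suitable() to become
something like this (just a sketch of the direction, not a tested
patch):

	/* account for the pages compaction temporarily isolates */
	watermark = min_wmark_pages(zone) + (2UL << order);
	if (!zone_watermark_ok(zone, 0, watermark, classzone_idx,
			       alloc_flags))
		return COMPACT_SKIPPED;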

Thanks!
--
Michal Hocko
SUSE Labs

Re: [RFC 13/13] mm, compaction: fix and improve watermark handling

Mel Gorman
On Wed, May 18, 2016 at 04:27:53PM +0200, Michal Hocko wrote:

> > > > - __compaction_suitable() then checks the low watermark plus a (2 << order) gap
> > > >   to decide if there's enough free memory to perform compaction. This check
> > >
> > > And this was a real head scratcher when I started looking into the
> > > compaction recently. Why do we need to be above the low watermark to
> > > even start compaction? Compaction uses additional memory only for a short
> > > period of time and then releases the already migrated pages.
> > >
> >
> > Simply minimising the risk that compaction would deplete the entire
> > zone. Sure, it hands pages back shortly afterwards. At the time of the
> > initial prototype, page migration was severely broken and the system was
> > constantly crashing. The cautious checks were left in place after page
> > migration was fixed as there wasn't a compelling reason to remove them
> > at the time.
>
> OK, then moving to min_wmark + bias from low_wmark should work, right?

Yes. I did recall there was another reason, but it's marginal. I didn't
want compaction isolating free pages to artificially push a process into
direct reclaim, but given that we are likely under memory pressure at
that time anyway, it's unlikely that compaction is the sole reason
processes are entering direct reclaim.

--
Mel Gorman
SUSE Labs

Re: [RFC 06/13] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations

Michal Hocko
In reply to this post by Vlastimil Babka
On Wed 18-05-16 13:59:53, Vlastimil Babka wrote:

> On 05/13/2016 02:05 PM, Michal Hocko wrote:
> > On Fri 13-05-16 10:23:31, Vlastimil Babka wrote:
> >> On 05/12/2016 06:20 PM, Michal Hocko wrote:
> >>> On Tue 10-05-16 09:35:56, Vlastimil Babka wrote:
> >>> [...]
> >>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> >>>> index 570383a41853..0cb09714d960 100644
> >>>> --- a/include/linux/gfp.h
> >>>> +++ b/include/linux/gfp.h
> >>>> @@ -256,8 +256,7 @@ struct vm_area_struct;
> >>>>    #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
> >>>>    #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
> >>>>    #define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> >>>> - __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
> >>>> - ~__GFP_RECLAIM)
> >>>> + __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
> >>>
> >>> I am not sure this is the right thing to do. I think we should keep
> >>> __GFP_NORETRY and clear it where we want a stronger semantic. This is
> >>> just too subtle to assume that all callsites are doing the right thing.
> >>
> >> That would complicate alloc_hugepage_direct_gfpmask() a bit, but if you
> >> think it's worth it, I can turn the default around, OK.
> >
> > Hmm, on the other hand it is true that GFP_TRANSHUGE clears both
> > reclaim flags by default and then overrides that. This is just too
> > ugly. Can we make GFP_TRANSHUGE only define the flags we care about and
> > then tweak those that should go away at the callsites that matter, now
> > that we do not rely on is_thp_gfp_mask?
>  
> So the following patch attempts what you suggest, if I understand you
> correctly: GFP_TRANSHUGE includes all possible flags, and then they are
> removed as needed. I don't really think it helps code readability,
> though.

Yeah, it is ugly as _hell_. I do not think this deserves too much time
to discuss, as the flag is mostly internal, but one last proposal would
be to define the different THP allocation contexts explicitly. Some
callers would still need some additional meddling, but maybe it would be
slightly better to read. Dunno. Anyway, if you think this is not really
an improvement, then I won't insist on any change to your original patch.
---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..e7926b466107 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -255,9 +255,14 @@ struct vm_area_struct;
 #define GFP_DMA32 __GFP_DMA32
 #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
-#define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+
+/* Optimistic or latency sensitive THP allocation - page fault path */
+#define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
  __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
  ~__GFP_RECLAIM)
+/* More serious THP allocation request - kcompactd */
+#define GFP_TRANSHUGE ((GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) & \
+ ~__GFP_NORETRY)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a4d4c807d92..937b89c6c0aa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -216,7 +216,7 @@ struct page *get_huge_zero_page(void)
  if (likely(atomic_inc_not_zero(&huge_zero_refcount)))
  return READ_ONCE(huge_zero_page);
 
- zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE,
+ zero_page = alloc_pages((GFP_TRANSHUGE_LIGHT | __GFP_ZERO) & ~__GFP_MOVABLE,
  HPAGE_PMD_ORDER);
  if (!zero_page) {
  count_vm_event(THP_ZERO_PAGE_ALLOC_FAILED);
@@ -888,23 +888,31 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
  */
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
- gfp_t reclaim_flags = 0;
+ gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT;
 
  if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags) &&
     (vma->vm_flags & VM_HUGEPAGE))
- reclaim_flags = __GFP_DIRECT_RECLAIM;
+ gfp_mask = GFP_TRANSHUGE;
  else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
- reclaim_flags = __GFP_KSWAPD_RECLAIM;
- else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
- reclaim_flags = __GFP_DIRECT_RECLAIM;
+ gfp_mask |= __GFP_KSWAPD_RECLAIM;
+ else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags)) {
+ if (vma->vm_flags & VM_HUGEPAGE)
+ gfp_mask = GFP_TRANSHUGE;
+ else
+ gfp_mask = GFP_TRANSHUGE | __GFP_NORETRY;
+ }
 
- return GFP_TRANSHUGE | reclaim_flags;
+ return gfp_mask;
 }
 
 /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
 static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 {
- return GFP_TRANSHUGE | (khugepaged_defrag() ? __GFP_DIRECT_RECLAIM : 0);
+ gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT;
+ if (khugepaged_defrag())
+ gfp_mask = GFP_TRANSHUGE;
+
+ return gfp_mask;
 }
 
 /* Caller must hold page table lock. */
diff --git a/mm/migrate.c b/mm/migrate.c
index 53ab6398e7a2..1cd5c8c18343 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1771,7 +1771,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
  goto out_dropref;
 
  new_page = alloc_pages_node(node,
- (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+ (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE) & ~__GFP_RECLAIM,
  HPAGE_PMD_ORDER);
  if (!new_page)
  goto out_fail;
--
Michal Hocko
SUSE Labs

Re: [RFC 06/13] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations

Vlastimil Babka
On 05/18/2016 05:24 PM, Michal Hocko wrote:

>>  
>> So the following patch attempts what you suggest, if I understand you
>> correctly: GFP_TRANSHUGE includes all possible flags, and then they are
>> removed as needed. I don't really think it helps code readability,
>> though.
>
> Yeah, it is ugly as _hell_. I do not think this deserves too much time
> to discuss, as the flag is mostly internal, but one last proposal would
> be to define the different THP allocation contexts explicitly. Some
> callers would still need some additional meddling, but maybe it would be
> slightly better to read. Dunno. Anyway, if you think this is not really
> an improvement, then I won't insist on any change to your original patch.
> ---
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 570383a41853..e7926b466107 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -255,9 +255,14 @@ struct vm_area_struct;
>   #define GFP_DMA32 __GFP_DMA32
>   #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
>   #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
> -#define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> +
> +/* Optimistic or latency sensitive THP allocation - page fault path */
> +#define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
>   __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
>   ~__GFP_RECLAIM)
> +/* More serious THP allocation request - kcompactd */
> +#define GFP_TRANSHUGE ((GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) & \
> + ~__GFP_NORETRY)

[...]

OK, I took the core idea and arrived at the following. I think it could
work, and the amount of further per-site modifications to GFP_TRANSHUGE*
is reduced, but if anyone thinks it's overkill to have two
GFP_TRANSHUGE*, I will just return to the original patch.

From 48ddb10e96fd9741a9eb3be9672c13589db7239a Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[hidden email]>
Date: Wed, 4 May 2016 13:40:03 +0200
Subject: [PATCH] mm, thp: remove __GFP_NORETRY from khugepaged and madvised
 allocations

After the previous patch, we can distinguish costly allocations that should be
really lightweight, such as THP page faults, with __GFP_NORETRY. This means we
don't need to recognize khugepaged allocations via PF_KTHREAD anymore. We can
also change THP page faults in areas where madvise(MADV_HUGEPAGE) was used to
try as hard as khugepaged, as the process has indicated that it benefits from
THPs and is willing to pay some initial latency costs.

We can also make the flags handling less cryptic by distinguishing
GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding __GFP_NORETRY
or __GFP_KSWAPD_RECLAIM is done where needed.

The patch effectively changes the current GFP_TRANSHUGE users as follows:

* get_huge_zero_page() - the zero page lifetime should be relatively long and
  it's shared by multiple users, so it's worth spending some effort on it.
  We use GFP_TRANSHUGE, and __GFP_NORETRY is not added. This also restores
  direct reclaim to this allocation, which was unintentionally removed by
  commit e4a49efe4e7e ("mm: thp: set THP defrag by default to madvise and add
  a stall-free defrag option")

* alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency is not
  an issue. So if khugepaged "defrag" is enabled (the default), do reclaim
  via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the PF_KTHREAD check
  from page alloc.
  As a side-effect, khugepaged will now no longer check if the initial
  compaction was deferred or contended. This is OK, as khugepaged sleep times
  between collapse attempts are long enough to prevent noticeable disruption,
  so we should allow it to spend some effort.

* migrate_misplaced_transhuge_page() - already was masking out __GFP_RECLAIM,
  so just convert to GFP_TRANSHUGE_LIGHT which is equivalent.

* alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise) are
  now allocating without __GFP_NORETRY. Other vma's keep using __GFP_NORETRY
  if direct reclaim/compaction is at all allowed (by default it's allowed only
  for madvised vma's). The rest is conversion to GFP_TRANSHUGE(_LIGHT).

Signed-off-by: Vlastimil Babka <[hidden email]>
---
 include/linux/gfp.h            | 14 ++++++++------
 include/trace/events/mmflags.h |  1 +
 mm/huge_memory.c               | 27 +++++++++++++++------------
 mm/migrate.c                   |  2 +-
 mm/page_alloc.c                |  6 ++----
 tools/perf/builtin-kmem.c      |  1 +
 6 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..1dfca27df492 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -238,9 +238,11 @@ struct vm_area_struct;
  *   are expected to be movable via page reclaim or page migration. Typically,
  *   pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.
  *
- * GFP_TRANSHUGE is used for THP allocations. They are compound allocations
- *   that will fail quickly if memory is not available and will not wake
- *   kswapd on failure.
+ * GFP_TRANSHUGE and GFP_TRANSHUGE_LIGHT are used for THP allocations. They are
+ *   compound allocations that will generally fail quickly if memory is not
+ *   available and will not wake kswapd/kcompactd on failure. The _LIGHT
+ *   version does not attempt reclaim/compaction at all and is by default used
+ *   in page fault path, while the non-light is used by khugepaged.
  */
 #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -255,9 +257,9 @@ struct vm_area_struct;
 #define GFP_DMA32 __GFP_DMA32
 #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
-#define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
- __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
- ~__GFP_RECLAIM)
+#define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+ __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
+#define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 43cedbf0c759..5a81ab48a2fb 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -11,6 +11,7 @@
 
 #define __def_gfpflag_names \
  {(unsigned long)GFP_TRANSHUGE, "GFP_TRANSHUGE"}, \
+ {(unsigned long)GFP_TRANSHUGE_LIGHT, "GFP_TRANSHUGE_LIGHT"}, \
  {(unsigned long)GFP_HIGHUSER_MOVABLE, "GFP_HIGHUSER_MOVABLE"},\
  {(unsigned long)GFP_HIGHUSER, "GFP_HIGHUSER"}, \
  {(unsigned long)GFP_USER, "GFP_USER"}, \
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87f09dc986ab..aa87db8c7f8f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -882,29 +882,32 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 }
 
 /*
- * If THP is set to always then directly reclaim/compact as necessary
- * If set to defer then do no reclaim and defer to khugepaged
+ * If THP defrag is set to always then directly reclaim/compact as necessary
+ * If set to defer then do only background reclaim/compact and defer to khugepaged
  * If set to madvise and the VMA is flagged then directly reclaim/compact
+ * When direct reclaim/compact is allowed, don't retry except for flagged VMA's
  */
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
- gfp_t reclaim_flags = 0;
+ bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
 
- if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags) &&
-    (vma->vm_flags & VM_HUGEPAGE))
- reclaim_flags = __GFP_DIRECT_RECLAIM;
- else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
- reclaim_flags = __GFP_KSWAPD_RECLAIM;
- else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
- reclaim_flags = __GFP_DIRECT_RECLAIM;
+ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
+ &transparent_hugepage_flags) && vma_madvised)
+ return GFP_TRANSHUGE;
+ else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
+ &transparent_hugepage_flags))
+ return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
+ else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
+ &transparent_hugepage_flags))
+ return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
 
- return GFP_TRANSHUGE | reclaim_flags;
+ return GFP_TRANSHUGE_LIGHT;
 }
 
 /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
 static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 {
- return GFP_TRANSHUGE | (khugepaged_defrag() ? __GFP_DIRECT_RECLAIM : 0);
+ return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
 }
 
 /* Caller must hold page table lock. */
diff --git a/mm/migrate.c b/mm/migrate.c
index 53ab6398e7a2..bc82c56fa3af 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1771,7 +1771,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
  goto out_dropref;
 
  new_page = alloc_pages_node(node,
- (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+ (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
  HPAGE_PMD_ORDER);
  if (!new_page)
  goto out_fail;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0cee863397e4..4a34187827ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3619,11 +3619,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
  /*
  * Looks like reclaim/compaction is worth trying, but
  * sync compaction could be very expensive, so keep
- * using async compaction, unless it's khugepaged
- * trying to collapse.
+ * using async compaction.
  */
- if (!(current->flags & PF_KTHREAD))
- migration_mode = MIGRATE_ASYNC;
+ migration_mode = MIGRATE_ASYNC;
  }
  }
 
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 5da5a9511cef..7fde754b344d 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -608,6 +608,7 @@ static const struct {
  const char *compact;
 } gfp_compact_table[] = {
  { "GFP_TRANSHUGE", "THP" },
+ { "GFP_TRANSHUGE_LIGHT", "THL" },
  { "GFP_HIGHUSER_MOVABLE", "HUM" },
  { "GFP_HIGHUSER", "HU" },
  { "GFP_USER", "U" },
--
2.8.2

Re: [RFC 06/13] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations

Michal Hocko
On Fri 20-05-16 15:57:08, Vlastimil Babka wrote:
[...]

> From: Vlastimil Babka <[hidden email]>
> Date: Wed, 4 May 2016 13:40:03 +0200
> Subject: [PATCH] mm, thp: remove __GFP_NORETRY from khugepaged and madvised
>  allocations
>
> After the previous patch, we can distinguish costly allocations that should be
> really lightweight, such as THP page faults, with __GFP_NORETRY. This means we
> don't need to recognize khugepaged allocations via PF_KTHREAD anymore. We can
> also change THP page faults in areas where madvise(MADV_HUGEPAGE) was used to
> try as hard as khugepaged, as the process has indicated that it benefits from
> THPs and is willing to pay some initial latency costs.
>
> We can also make the flags handling less cryptic by distinguishing
> GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
> GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding __GFP_NORETRY
> or __GFP_KSWAPD_RECLAIM is done where needed.
>
> The patch effectively changes the current GFP_TRANSHUGE users as follows:
>
> * get_huge_zero_page() - the zero page lifetime should be relatively long and
>   it's shared by multiple users, so it's worth spending some effort on it.
>   We use GFP_TRANSHUGE, and __GFP_NORETRY is not added. This also restores
>   direct reclaim to this allocation, which was unintentionally removed by
>   commit e4a49efe4e7e ("mm: thp: set THP defrag by default to madvise and add
>   a stall-free defrag option")
>
> * alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency is not
>   an issue. So if khugepaged "defrag" is enabled (the default), do reclaim
>   via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the PF_KTHREAD check
>   from page alloc.
>   As a side-effect, khugepaged will now no longer check if the initial
>   compaction was deferred or contended. This is OK, as khugepaged sleep times
>   between collapse attempts are long enough to prevent noticeable disruption,
>   so we should allow it to spend some effort.
>
> * migrate_misplaced_transhuge_page() - already was masking out __GFP_RECLAIM,
>   so just convert to GFP_TRANSHUGE_LIGHT which is equivalent.
>
> * alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise) are
>   now allocating without __GFP_NORETRY. Other vma's keep using __GFP_NORETRY
>   if direct reclaim/compaction is at all allowed (by default it's allowed only
>   for madvised vma's). The rest is conversion to GFP_TRANSHUGE(_LIGHT).
>
> Signed-off-by: Vlastimil Babka <[hidden email]>

I like it more than the previous approach.

Acked-by: Michal Hocko <[hidden email]>

Thanks!

> ---
>  include/linux/gfp.h            | 14 ++++++++------
>  include/trace/events/mmflags.h |  1 +
>  mm/huge_memory.c               | 27 +++++++++++++++------------
>  mm/migrate.c                   |  2 +-
>  mm/page_alloc.c                |  6 ++----
>  tools/perf/builtin-kmem.c      |  1 +
>  6 files changed, 28 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 570383a41853..1dfca27df492 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -238,9 +238,11 @@ struct vm_area_struct;
>   *   are expected to be movable via page reclaim or page migration. Typically,
>   *   pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.
>   *
> - * GFP_TRANSHUGE is used for THP allocations. They are compound allocations
> - *   that will fail quickly if memory is not available and will not wake
> - *   kswapd on failure.
> + * GFP_TRANSHUGE and GFP_TRANSHUGE_LIGHT are used for THP allocations. They are
> + *   compound allocations that will generally fail quickly if memory is not
> + *   available and will not wake kswapd/kcompactd on failure. The _LIGHT
> + *   version does not attempt reclaim/compaction at all and is by default used
> + *   in page fault path, while the non-light is used by khugepaged.
>   */
>  #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
>  #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
> @@ -255,9 +257,9 @@ struct vm_area_struct;
>  #define GFP_DMA32 __GFP_DMA32
>  #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
>  #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
> -#define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> - __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
> - ~__GFP_RECLAIM)
> +#define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> + __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
> +#define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
>  
>  /* Convert GFP flags to their corresponding migrate type */
>  #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index 43cedbf0c759..5a81ab48a2fb 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -11,6 +11,7 @@
>  
>  #define __def_gfpflag_names \
>   {(unsigned long)GFP_TRANSHUGE, "GFP_TRANSHUGE"}, \
> + {(unsigned long)GFP_TRANSHUGE_LIGHT, "GFP_TRANSHUGE_LIGHT"}, \
>   {(unsigned long)GFP_HIGHUSER_MOVABLE, "GFP_HIGHUSER_MOVABLE"},\
>   {(unsigned long)GFP_HIGHUSER, "GFP_HIGHUSER"}, \
>   {(unsigned long)GFP_USER, "GFP_USER"}, \
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 87f09dc986ab..aa87db8c7f8f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -882,29 +882,32 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  }
>  
>  /*
> - * If THP is set to always then directly reclaim/compact as necessary
> - * If set to defer then do no reclaim and defer to khugepaged
> + * If THP defrag is set to always then directly reclaim/compact as necessary
> + * If set to defer then do only background reclaim/compact and defer to khugepaged
>   * If set to madvise and the VMA is flagged then directly reclaim/compact
> + * When direct reclaim/compact is allowed, don't retry except for flagged VMA's
>   */
>  static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
>  {
> - gfp_t reclaim_flags = 0;
> + bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
>  
> - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags) &&
> -    (vma->vm_flags & VM_HUGEPAGE))
> - reclaim_flags = __GFP_DIRECT_RECLAIM;
> - else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
> - reclaim_flags = __GFP_KSWAPD_RECLAIM;
> - else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
> - reclaim_flags = __GFP_DIRECT_RECLAIM;
> + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
> + &transparent_hugepage_flags) && vma_madvised)
> + return GFP_TRANSHUGE;
> + else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
> + &transparent_hugepage_flags))
> + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
> + else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
> + &transparent_hugepage_flags))
> + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
>  
> - return GFP_TRANSHUGE | reclaim_flags;
> + return GFP_TRANSHUGE_LIGHT;
>  }
>  
>  /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
>  static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
>  {
> - return GFP_TRANSHUGE | (khugepaged_defrag() ? __GFP_DIRECT_RECLAIM : 0);
> + return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
>  }
>  
>  /* Caller must hold page table lock. */
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 53ab6398e7a2..bc82c56fa3af 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1771,7 +1771,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>   goto out_dropref;
>  
>   new_page = alloc_pages_node(node,
> - (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
> + (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>   HPAGE_PMD_ORDER);
>   if (!new_page)
>   goto out_fail;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0cee863397e4..4a34187827ca 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3619,11 +3619,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>   /*
>   * Looks like reclaim/compaction is worth trying, but
>   * sync compaction could be very expensive, so keep
> - * using async compaction, unless it's khugepaged
> - * trying to collapse.
> + * using async compaction.
>   */
> - if (!(current->flags & PF_KTHREAD))
> - migration_mode = MIGRATE_ASYNC;
> + migration_mode = MIGRATE_ASYNC;
>   }
>   }
>  
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 5da5a9511cef..7fde754b344d 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -608,6 +608,7 @@ static const struct {
>   const char *compact;
>  } gfp_compact_table[] = {
>   { "GFP_TRANSHUGE", "THP" },
> + { "GFP_TRANSHUGE_LIGHT", "THL" },
>   { "GFP_HIGHUSER_MOVABLE", "HUM" },
>   { "GFP_HIGHUSER", "HU" },
>   { "GFP_USER", "U" },
> --
> 2.8.2
>
>

--
Michal Hocko
SUSE Labs