[PATCH 00/28] Optimise page alloc/free fast paths v3

Mel Gorman-4
There were no further responses to the last series but I kept going and
added a few more small bits. Most are basic micro-optimisations.  The last
two patches weaken debugging checks to improve performance at the cost of
delayed detection of some use-after-free and memory corruption bugs. If
they make people uncomfortable, they can be dropped and the rest of the
series stands on its own.

Changelog since v2
o Add more micro-optimisations
o Weaken debugging checks in favour of speed

Changelog since v1
o Fix an unused variable warning
o Throw in a few optimisations in the bulk pcp free path
o Rebase to 4.6-rc3

Another year, another round of page allocator optimisations focusing this
time on the alloc and free fast paths. This should be of help to workloads
that are allocator-intensive from kernel space where the cost of zeroing
is not necessarily incurred.

The series is motivated by the observation that page alloc microbenchmarks
on multiple machines regressed between 3.12.44 and 4.4. Second, there were
discussions before LSF/MM about the possibility of adding another page
allocator, which is potentially hazardous, but a patch series improving
performance is better than whining.

After the series is applied, there are still hazards.  In the free paths,
the debugging checks and page zone/pageblock lookups dominate but there
was no obvious solution to that. In the alloc path, the major contributors
are dealing with zonelists, new page preparation, the fair zone allocation
and numerous statistic updates. The fair zone allocator is removed by the
per-node LRU series if that gets merged, so it's not a major concern at
the moment.

On normal userspace benchmarks, there is little impact as the zeroing cost
is significant, but the improvement is still visible

aim9
                               4.6.0-rc3             4.6.0-rc3
                                 vanilla         deferalloc-v3
Min      page_test   828693.33 (  0.00%)   887060.00 (  7.04%)
Min      brk_test   4847266.67 (  0.00%)  4966266.67 (  2.45%)
Min      exec_test     1271.00 (  0.00%)     1275.67 (  0.37%)
Min      fork_test    12371.75 (  0.00%)    12380.00 (  0.07%)

The overall impact on a page allocator microbenchmark for a range of orders
and number of pages allocated in a batch is

                                          4.6.0-rc3                  4.6.0-rc3
                                             vanilla            deferalloc-v3r7
Min      alloc-odr0-1               428.00 (  0.00%)           316.00 ( 26.17%)
Min      alloc-odr0-2               314.00 (  0.00%)           231.00 ( 26.43%)
Min      alloc-odr0-4               256.00 (  0.00%)           192.00 ( 25.00%)
Min      alloc-odr0-8               222.00 (  0.00%)           166.00 ( 25.23%)
Min      alloc-odr0-16              207.00 (  0.00%)           154.00 ( 25.60%)
Min      alloc-odr0-32              197.00 (  0.00%)           148.00 ( 24.87%)
Min      alloc-odr0-64              193.00 (  0.00%)           144.00 ( 25.39%)
Min      alloc-odr0-128             191.00 (  0.00%)           143.00 ( 25.13%)
Min      alloc-odr0-256             203.00 (  0.00%)           153.00 ( 24.63%)
Min      alloc-odr0-512             212.00 (  0.00%)           165.00 ( 22.17%)
Min      alloc-odr0-1024            221.00 (  0.00%)           172.00 ( 22.17%)
Min      alloc-odr0-2048            225.00 (  0.00%)           179.00 ( 20.44%)
Min      alloc-odr0-4096            232.00 (  0.00%)           185.00 ( 20.26%)
Min      alloc-odr0-8192            235.00 (  0.00%)           187.00 ( 20.43%)
Min      alloc-odr0-16384           236.00 (  0.00%)           188.00 ( 20.34%)
Min      alloc-odr1-1               519.00 (  0.00%)           450.00 ( 13.29%)
Min      alloc-odr1-2               391.00 (  0.00%)           336.00 ( 14.07%)
Min      alloc-odr1-4               313.00 (  0.00%)           268.00 ( 14.38%)
Min      alloc-odr1-8               277.00 (  0.00%)           235.00 ( 15.16%)
Min      alloc-odr1-16              256.00 (  0.00%)           218.00 ( 14.84%)
Min      alloc-odr1-32              252.00 (  0.00%)           212.00 ( 15.87%)
Min      alloc-odr1-64              244.00 (  0.00%)           206.00 ( 15.57%)
Min      alloc-odr1-128             244.00 (  0.00%)           207.00 ( 15.16%)
Min      alloc-odr1-256             243.00 (  0.00%)           207.00 ( 14.81%)
Min      alloc-odr1-512             245.00 (  0.00%)           209.00 ( 14.69%)
Min      alloc-odr1-1024            248.00 (  0.00%)           214.00 ( 13.71%)
Min      alloc-odr1-2048            253.00 (  0.00%)           220.00 ( 13.04%)
Min      alloc-odr1-4096            258.00 (  0.00%)           224.00 ( 13.18%)
Min      alloc-odr1-8192            261.00 (  0.00%)           229.00 ( 12.26%)
Min      alloc-odr2-1               560.00 (  0.00%)           753.00 (-34.46%)
Min      alloc-odr2-2               424.00 (  0.00%)           351.00 ( 17.22%)
Min      alloc-odr2-4               339.00 (  0.00%)           393.00 (-15.93%)
Min      alloc-odr2-8               298.00 (  0.00%)           246.00 ( 17.45%)
Min      alloc-odr2-16              276.00 (  0.00%)           227.00 ( 17.75%)
Min      alloc-odr2-32              271.00 (  0.00%)           221.00 ( 18.45%)
Min      alloc-odr2-64              264.00 (  0.00%)           217.00 ( 17.80%)
Min      alloc-odr2-128             264.00 (  0.00%)           217.00 ( 17.80%)
Min      alloc-odr2-256             264.00 (  0.00%)           218.00 ( 17.42%)
Min      alloc-odr2-512             269.00 (  0.00%)           223.00 ( 17.10%)
Min      alloc-odr2-1024            279.00 (  0.00%)           230.00 ( 17.56%)
Min      alloc-odr2-2048            283.00 (  0.00%)           235.00 ( 16.96%)
Min      alloc-odr2-4096            285.00 (  0.00%)           239.00 ( 16.14%)
Min      alloc-odr3-1               629.00 (  0.00%)           505.00 ( 19.71%)
Min      alloc-odr3-2               472.00 (  0.00%)           374.00 ( 20.76%)
Min      alloc-odr3-4               383.00 (  0.00%)           301.00 ( 21.41%)
Min      alloc-odr3-8               341.00 (  0.00%)           266.00 ( 21.99%)
Min      alloc-odr3-16              316.00 (  0.00%)           248.00 ( 21.52%)
Min      alloc-odr3-32              308.00 (  0.00%)           241.00 ( 21.75%)
Min      alloc-odr3-64              305.00 (  0.00%)           241.00 ( 20.98%)
Min      alloc-odr3-128             308.00 (  0.00%)           244.00 ( 20.78%)
Min      alloc-odr3-256             317.00 (  0.00%)           249.00 ( 21.45%)
Min      alloc-odr3-512             327.00 (  0.00%)           256.00 ( 21.71%)
Min      alloc-odr3-1024            331.00 (  0.00%)           261.00 ( 21.15%)
Min      alloc-odr3-2048            333.00 (  0.00%)           266.00 ( 20.12%)
Min      alloc-odr4-1               767.00 (  0.00%)           572.00 ( 25.42%)
Min      alloc-odr4-2               578.00 (  0.00%)           429.00 ( 25.78%)
Min      alloc-odr4-4               474.00 (  0.00%)           346.00 ( 27.00%)
Min      alloc-odr4-8               422.00 (  0.00%)           310.00 ( 26.54%)
Min      alloc-odr4-16              399.00 (  0.00%)           295.00 ( 26.07%)
Min      alloc-odr4-32              392.00 (  0.00%)           293.00 ( 25.26%)
Min      alloc-odr4-64              394.00 (  0.00%)           293.00 ( 25.63%)
Min      alloc-odr4-128             405.00 (  0.00%)           305.00 ( 24.69%)
Min      alloc-odr4-256             417.00 (  0.00%)           319.00 ( 23.50%)
Min      alloc-odr4-512             425.00 (  0.00%)           326.00 ( 23.29%)
Min      alloc-odr4-1024            426.00 (  0.00%)           329.00 ( 22.77%)
Min      free-odr0-1                216.00 (  0.00%)           178.00 ( 17.59%)
Min      free-odr0-2                152.00 (  0.00%)           125.00 ( 17.76%)
Min      free-odr0-4                120.00 (  0.00%)            99.00 ( 17.50%)
Min      free-odr0-8                106.00 (  0.00%)            85.00 ( 19.81%)
Min      free-odr0-16                97.00 (  0.00%)            80.00 ( 17.53%)
Min      free-odr0-32                92.00 (  0.00%)            76.00 ( 17.39%)
Min      free-odr0-64                89.00 (  0.00%)            74.00 ( 16.85%)
Min      free-odr0-128               89.00 (  0.00%)            73.00 ( 17.98%)
Min      free-odr0-256              107.00 (  0.00%)            90.00 ( 15.89%)
Min      free-odr0-512              117.00 (  0.00%)           108.00 (  7.69%)
Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
Min      free-odr0-2048             132.00 (  0.00%)           125.00 (  5.30%)
Min      free-odr0-4096             135.00 (  0.00%)           130.00 (  3.70%)
Min      free-odr0-8192             137.00 (  0.00%)           130.00 (  5.11%)
Min      free-odr0-16384            137.00 (  0.00%)           131.00 (  4.38%)
Min      free-odr1-1                318.00 (  0.00%)           289.00 (  9.12%)
Min      free-odr1-2                228.00 (  0.00%)           207.00 (  9.21%)
Min      free-odr1-4                182.00 (  0.00%)           165.00 (  9.34%)
Min      free-odr1-8                163.00 (  0.00%)           146.00 ( 10.43%)
Min      free-odr1-16               151.00 (  0.00%)           135.00 ( 10.60%)
Min      free-odr1-32               146.00 (  0.00%)           129.00 ( 11.64%)
Min      free-odr1-64               145.00 (  0.00%)           130.00 ( 10.34%)
Min      free-odr1-128              148.00 (  0.00%)           134.00 (  9.46%)
Min      free-odr1-256              148.00 (  0.00%)           137.00 (  7.43%)
Min      free-odr1-512              151.00 (  0.00%)           140.00 (  7.28%)
Min      free-odr1-1024             154.00 (  0.00%)           143.00 (  7.14%)
Min      free-odr1-2048             156.00 (  0.00%)           144.00 (  7.69%)
Min      free-odr1-4096             156.00 (  0.00%)           142.00 (  8.97%)
Min      free-odr1-8192             156.00 (  0.00%)           140.00 ( 10.26%)
Min      free-odr2-1                361.00 (  0.00%)           457.00 (-26.59%)
Min      free-odr2-2                258.00 (  0.00%)           224.00 ( 13.18%)
Min      free-odr2-4                208.00 (  0.00%)           223.00 ( -7.21%)
Min      free-odr2-8                185.00 (  0.00%)           160.00 ( 13.51%)
Min      free-odr2-16               173.00 (  0.00%)           149.00 ( 13.87%)
Min      free-odr2-32               166.00 (  0.00%)           145.00 ( 12.65%)
Min      free-odr2-64               166.00 (  0.00%)           146.00 ( 12.05%)
Min      free-odr2-128              169.00 (  0.00%)           148.00 ( 12.43%)
Min      free-odr2-256              170.00 (  0.00%)           152.00 ( 10.59%)
Min      free-odr2-512              177.00 (  0.00%)           156.00 ( 11.86%)
Min      free-odr2-1024             182.00 (  0.00%)           162.00 ( 10.99%)
Min      free-odr2-2048             181.00 (  0.00%)           160.00 ( 11.60%)
Min      free-odr2-4096             180.00 (  0.00%)           159.00 ( 11.67%)
Min      free-odr3-1                431.00 (  0.00%)           367.00 ( 14.85%)
Min      free-odr3-2                306.00 (  0.00%)           259.00 ( 15.36%)
Min      free-odr3-4                249.00 (  0.00%)           208.00 ( 16.47%)
Min      free-odr3-8                224.00 (  0.00%)           186.00 ( 16.96%)
Min      free-odr3-16               208.00 (  0.00%)           176.00 ( 15.38%)
Min      free-odr3-32               206.00 (  0.00%)           174.00 ( 15.53%)
Min      free-odr3-64               210.00 (  0.00%)           178.00 ( 15.24%)
Min      free-odr3-128              215.00 (  0.00%)           182.00 ( 15.35%)
Min      free-odr3-256              224.00 (  0.00%)           189.00 ( 15.62%)
Min      free-odr3-512              232.00 (  0.00%)           195.00 ( 15.95%)
Min      free-odr3-1024             230.00 (  0.00%)           195.00 ( 15.22%)
Min      free-odr3-2048             229.00 (  0.00%)           193.00 ( 15.72%)
Min      free-odr4-1                561.00 (  0.00%)           439.00 ( 21.75%)
Min      free-odr4-2                418.00 (  0.00%)           318.00 ( 23.92%)
Min      free-odr4-4                339.00 (  0.00%)           269.00 ( 20.65%)
Min      free-odr4-8                299.00 (  0.00%)           239.00 ( 20.07%)
Min      free-odr4-16               289.00 (  0.00%)           234.00 ( 19.03%)
Min      free-odr4-32               291.00 (  0.00%)           235.00 ( 19.24%)
Min      free-odr4-64               298.00 (  0.00%)           238.00 ( 20.13%)
Min      free-odr4-128              308.00 (  0.00%)           251.00 ( 18.51%)
Min      free-odr4-256              321.00 (  0.00%)           267.00 ( 16.82%)
Min      free-odr4-512              327.00 (  0.00%)           269.00 ( 17.74%)
Min      free-odr4-1024             326.00 (  0.00%)           271.00 ( 16.87%)
Min      total-odr0-1               644.00 (  0.00%)           494.00 ( 23.29%)
Min      total-odr0-2               466.00 (  0.00%)           356.00 ( 23.61%)
Min      total-odr0-4               376.00 (  0.00%)           291.00 ( 22.61%)
Min      total-odr0-8               328.00 (  0.00%)           251.00 ( 23.48%)
Min      total-odr0-16              304.00 (  0.00%)           234.00 ( 23.03%)
Min      total-odr0-32              289.00 (  0.00%)           224.00 ( 22.49%)
Min      total-odr0-64              282.00 (  0.00%)           218.00 ( 22.70%)
Min      total-odr0-128             280.00 (  0.00%)           216.00 ( 22.86%)
Min      total-odr0-256             310.00 (  0.00%)           243.00 ( 21.61%)
Min      total-odr0-512             329.00 (  0.00%)           273.00 ( 17.02%)
Min      total-odr0-1024            346.00 (  0.00%)           290.00 ( 16.18%)
Min      total-odr0-2048            357.00 (  0.00%)           304.00 ( 14.85%)
Min      total-odr0-4096            367.00 (  0.00%)           315.00 ( 14.17%)
Min      total-odr0-8192            372.00 (  0.00%)           317.00 ( 14.78%)
Min      total-odr0-16384           373.00 (  0.00%)           319.00 ( 14.48%)
Min      total-odr1-1               838.00 (  0.00%)           739.00 ( 11.81%)
Min      total-odr1-2               619.00 (  0.00%)           543.00 ( 12.28%)
Min      total-odr1-4               495.00 (  0.00%)           433.00 ( 12.53%)
Min      total-odr1-8               440.00 (  0.00%)           382.00 ( 13.18%)
Min      total-odr1-16              407.00 (  0.00%)           353.00 ( 13.27%)
Min      total-odr1-32              398.00 (  0.00%)           341.00 ( 14.32%)
Min      total-odr1-64              389.00 (  0.00%)           336.00 ( 13.62%)
Min      total-odr1-128             392.00 (  0.00%)           341.00 ( 13.01%)
Min      total-odr1-256             391.00 (  0.00%)           344.00 ( 12.02%)
Min      total-odr1-512             396.00 (  0.00%)           349.00 ( 11.87%)
Min      total-odr1-1024            402.00 (  0.00%)           357.00 ( 11.19%)
Min      total-odr1-2048            409.00 (  0.00%)           364.00 ( 11.00%)
Min      total-odr1-4096            414.00 (  0.00%)           366.00 ( 11.59%)
Min      total-odr1-8192            417.00 (  0.00%)           369.00 ( 11.51%)
Min      total-odr2-1               921.00 (  0.00%)          1210.00 (-31.38%)
Min      total-odr2-2               682.00 (  0.00%)           576.00 ( 15.54%)
Min      total-odr2-4               547.00 (  0.00%)           616.00 (-12.61%)
Min      total-odr2-8               483.00 (  0.00%)           406.00 ( 15.94%)
Min      total-odr2-16              449.00 (  0.00%)           376.00 ( 16.26%)
Min      total-odr2-32              437.00 (  0.00%)           366.00 ( 16.25%)
Min      total-odr2-64              431.00 (  0.00%)           363.00 ( 15.78%)
Min      total-odr2-128             433.00 (  0.00%)           365.00 ( 15.70%)
Min      total-odr2-256             434.00 (  0.00%)           371.00 ( 14.52%)
Min      total-odr2-512             446.00 (  0.00%)           379.00 ( 15.02%)
Min      total-odr2-1024            461.00 (  0.00%)           392.00 ( 14.97%)
Min      total-odr2-2048            464.00 (  0.00%)           395.00 ( 14.87%)
Min      total-odr2-4096            465.00 (  0.00%)           398.00 ( 14.41%)
Min      total-odr3-1              1060.00 (  0.00%)           872.00 ( 17.74%)
Min      total-odr3-2               778.00 (  0.00%)           633.00 ( 18.64%)
Min      total-odr3-4               632.00 (  0.00%)           510.00 ( 19.30%)
Min      total-odr3-8               565.00 (  0.00%)           452.00 ( 20.00%)
Min      total-odr3-16              524.00 (  0.00%)           424.00 ( 19.08%)
Min      total-odr3-32              514.00 (  0.00%)           415.00 ( 19.26%)
Min      total-odr3-64              515.00 (  0.00%)           419.00 ( 18.64%)
Min      total-odr3-128             523.00 (  0.00%)           426.00 ( 18.55%)
Min      total-odr3-256             541.00 (  0.00%)           438.00 ( 19.04%)
Min      total-odr3-512             559.00 (  0.00%)           451.00 ( 19.32%)
Min      total-odr3-1024            561.00 (  0.00%)           456.00 ( 18.72%)
Min      total-odr3-2048            562.00 (  0.00%)           459.00 ( 18.33%)
Min      total-odr4-1              1328.00 (  0.00%)          1011.00 ( 23.87%)
Min      total-odr4-2               997.00 (  0.00%)           747.00 ( 25.08%)
Min      total-odr4-4               813.00 (  0.00%)           615.00 ( 24.35%)
Min      total-odr4-8               721.00 (  0.00%)           550.00 ( 23.72%)
Min      total-odr4-16              689.00 (  0.00%)           529.00 ( 23.22%)
Min      total-odr4-32              683.00 (  0.00%)           528.00 ( 22.69%)
Min      total-odr4-64              692.00 (  0.00%)           531.00 ( 23.27%)
Min      total-odr4-128             713.00 (  0.00%)           556.00 ( 22.02%)
Min      total-odr4-256             738.00 (  0.00%)           586.00 ( 20.60%)
Min      total-odr4-512             753.00 (  0.00%)           595.00 ( 20.98%)
Min      total-odr4-1024            752.00 (  0.00%)           600.00 ( 20.21%)
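
The harness itself is not included in this posting. For anyone unfamiliar
with the report format, the rows are named <op>-odr<order>-<batch>, i.e.
the order of the allocation and the number of pages allocated or freed per
batch. A minimal kernel-module-style sketch of the same idea, using
illustrative names rather than the actual harness, would be;

/*
 * Illustrative sketch only, not the real microbenchmark. Time a batch of
 * order-N allocations followed by the matching frees, mirroring the
 * alloc-odrN-M / free-odrN-M rows above.
 */
#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/timex.h>
#include <linux/printk.h>

static void time_alloc_free(unsigned int order, unsigned int batch)
{
	struct page **pages;
	cycles_t start, alloc_cycles, free_cycles;
	unsigned int i;

	pages = kmalloc_array(batch, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return;

	/* time the allocation side of the batch */
	start = get_cycles();
	for (i = 0; i < batch; i++)
		pages[i] = alloc_pages(GFP_KERNEL, order);
	alloc_cycles = get_cycles() - start;

	/* time the free side of the batch */
	start = get_cycles();
	for (i = 0; i < batch; i++)
		if (pages[i])
			__free_pages(pages[i], order);
	free_cycles = get_cycles() - start;

	pr_info("odr%u-%u: alloc %llu free %llu cycles\n", order, batch,
		(unsigned long long)alloc_cycles,
		(unsigned long long)free_cycles);
	kfree(pages);
}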

 fs/buffer.c                |  10 +-
 include/linux/compaction.h |   6 +-
 include/linux/cpuset.h     |  42 ++-
 include/linux/mm.h         |   5 +-
 include/linux/mmzone.h     |  41 ++-
 include/linux/page-flags.h |   7 +-
 include/linux/vmstat.h     |   2 -
 kernel/cpuset.c            |  14 +-
 mm/compaction.c            |  16 +-
 mm/internal.h              |   7 +-
 mm/mempolicy.c             |  19 +-
 mm/mmzone.c                |   2 +-
 mm/page_alloc.c            | 836 +++++++++++++++++++++++++++------------------
 mm/page_owner.c            |   2 +-
 mm/vmstat.c                |  27 +-
 15 files changed, 602 insertions(+), 434 deletions(-)

--
2.6.4

[PATCH 01/28] mm, page_alloc: Only check PageCompound for high-order pages

Mel Gorman-4
Order-0 pages by definition cannot be compound, so avoid the check in the
fast path for those pages.

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59de90d5d3a3..5d205bcfe10d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1024,24 +1024,33 @@ void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
 
 static bool free_pages_prepare(struct page *page, unsigned int order)
 {
- bool compound = PageCompound(page);
- int i, bad = 0;
+ int bad = 0;
 
  VM_BUG_ON_PAGE(PageTail(page), page);
- VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
 
  trace_mm_page_free(page, order);
  kmemcheck_free_shadow(page, order);
  kasan_free_pages(page, order);
 
+ /*
+ * Check tail pages before head page information is cleared to
+ * avoid checking PageCompound for order-0 pages.
+ */
+ if (order) {
+ bool compound = PageCompound(page);
+ int i;
+
+ VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
+
+ for (i = 1; i < (1 << order); i++) {
+ if (compound)
+ bad += free_tail_pages_check(page, page + i);
+ bad += free_pages_check(page + i);
+ }
+ }
  if (PageAnon(page))
  page->mapping = NULL;
  bad += free_pages_check(page);
- for (i = 1; i < (1 << order); i++) {
- if (compound)
- bad += free_tail_pages_check(page, page + i);
- bad += free_pages_check(page + i);
- }
  if (bad)
  return false;
 
--
2.6.4

[PATCH 02/28] mm, page_alloc: Use new PageAnonHead helper in the free page fast path

Mel Gorman-4
The PageAnon check always resolves compound_head but this is a relatively
expensive lookup if the caller already knows the page is a head page. This
patch creates a helper and uses it in the page free path, which only
operates on head pages.
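
For context, compound_head() at this point looks roughly like the following
(paraphrased from include/linux/page-flags.h); the extra load and branch on
page->compound_head are what the new helper avoids when the caller already
holds the head page;

static inline struct page *compound_head(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	/* tail pages store the head pointer with bit 0 set */
	if (unlikely(head & 1))
		return (struct page *)(head - 1);
	return page;
}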

With this patch and "Only check PageCompound for high-order pages", the
performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                             vanilla           nocompound-v1r20
Min      alloc-odr0-1               425.00 (  0.00%)           417.00 (  1.88%)
Min      alloc-odr0-2               313.00 (  0.00%)           308.00 (  1.60%)
Min      alloc-odr0-4               257.00 (  0.00%)           253.00 (  1.56%)
Min      alloc-odr0-8               224.00 (  0.00%)           221.00 (  1.34%)
Min      alloc-odr0-16              208.00 (  0.00%)           205.00 (  1.44%)
Min      alloc-odr0-32              199.00 (  0.00%)           199.00 (  0.00%)
Min      alloc-odr0-64              195.00 (  0.00%)           193.00 (  1.03%)
Min      alloc-odr0-128             192.00 (  0.00%)           191.00 (  0.52%)
Min      alloc-odr0-256             204.00 (  0.00%)           200.00 (  1.96%)
Min      alloc-odr0-512             213.00 (  0.00%)           212.00 (  0.47%)
Min      alloc-odr0-1024            219.00 (  0.00%)           219.00 (  0.00%)
Min      alloc-odr0-2048            225.00 (  0.00%)           225.00 (  0.00%)
Min      alloc-odr0-4096            230.00 (  0.00%)           231.00 ( -0.43%)
Min      alloc-odr0-8192            235.00 (  0.00%)           234.00 (  0.43%)
Min      alloc-odr0-16384           235.00 (  0.00%)           234.00 (  0.43%)
Min      free-odr0-1                215.00 (  0.00%)           191.00 ( 11.16%)
Min      free-odr0-2                152.00 (  0.00%)           136.00 ( 10.53%)
Min      free-odr0-4                119.00 (  0.00%)           107.00 ( 10.08%)
Min      free-odr0-8                106.00 (  0.00%)            96.00 (  9.43%)
Min      free-odr0-16                97.00 (  0.00%)            87.00 ( 10.31%)
Min      free-odr0-32                91.00 (  0.00%)            83.00 (  8.79%)
Min      free-odr0-64                89.00 (  0.00%)            81.00 (  8.99%)
Min      free-odr0-128               88.00 (  0.00%)            80.00 (  9.09%)
Min      free-odr0-256              106.00 (  0.00%)            95.00 ( 10.38%)
Min      free-odr0-512              116.00 (  0.00%)           111.00 (  4.31%)
Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
Min      free-odr0-2048             133.00 (  0.00%)           126.00 (  5.26%)
Min      free-odr0-4096             136.00 (  0.00%)           130.00 (  4.41%)
Min      free-odr0-8192             138.00 (  0.00%)           130.00 (  5.80%)
Min      free-odr0-16384            137.00 (  0.00%)           130.00 (  5.11%)

There is a sizable boost to the free path performance. While there is an
apparent boost on the allocation side, it's likely a coincidence or due to
the patches slightly reducing cache footprint.

Signed-off-by: Mel Gorman <[hidden email]>
---
 include/linux/page-flags.h | 7 ++++++-
 mm/page_alloc.c            | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f4ed4f1b0c77..ccd04ee1ba2d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -371,10 +371,15 @@ PAGEFLAG(Idle, idle, PF_ANY)
 #define PAGE_MAPPING_KSM 2
 #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
 
+static __always_inline int PageAnonHead(struct page *page)
+{
+ return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
 static __always_inline int PageAnon(struct page *page)
 {
  page = compound_head(page);
- return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+ return PageAnonHead(page);
 }
 
 #ifdef CONFIG_KSM
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d205bcfe10d..6812de41f698 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1048,7 +1048,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
  bad += free_pages_check(page + i);
  }
  }
- if (PageAnon(page))
+ if (PageAnonHead(page))
  page->mapping = NULL;
  bad += free_pages_check(page);
  if (bad)
--
2.6.4

[PATCH 03/28] mm, page_alloc: Reduce branches in zone_statistics

Mel Gorman-4
zone_statistics has more branches than it really needs to take an
unlikely GFP flag into account. Reduce the number of branches and annotate
the check of the unlikely flag.

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                    nocompound-v1r10           statbranch-v1r10
Min      alloc-odr0-1               417.00 (  0.00%)           419.00 ( -0.48%)
Min      alloc-odr0-2               308.00 (  0.00%)           305.00 (  0.97%)
Min      alloc-odr0-4               253.00 (  0.00%)           250.00 (  1.19%)
Min      alloc-odr0-8               221.00 (  0.00%)           219.00 (  0.90%)
Min      alloc-odr0-16              205.00 (  0.00%)           203.00 (  0.98%)
Min      alloc-odr0-32              199.00 (  0.00%)           195.00 (  2.01%)
Min      alloc-odr0-64              193.00 (  0.00%)           191.00 (  1.04%)
Min      alloc-odr0-128             191.00 (  0.00%)           189.00 (  1.05%)
Min      alloc-odr0-256             200.00 (  0.00%)           198.00 (  1.00%)
Min      alloc-odr0-512             212.00 (  0.00%)           210.00 (  0.94%)
Min      alloc-odr0-1024            219.00 (  0.00%)           216.00 (  1.37%)
Min      alloc-odr0-2048            225.00 (  0.00%)           221.00 (  1.78%)
Min      alloc-odr0-4096            231.00 (  0.00%)           227.00 (  1.73%)
Min      alloc-odr0-8192            234.00 (  0.00%)           232.00 (  0.85%)
Min      alloc-odr0-16384           234.00 (  0.00%)           232.00 (  0.85%)

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/vmstat.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5e4300482897..2e58ead9bcf5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -581,17 +581,21 @@ void drain_zonestat(struct zone *zone, struct per_cpu_pageset *pset)
  */
 void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags)
 {
- if (z->zone_pgdat == preferred_zone->zone_pgdat) {
+ int local_nid = numa_node_id();
+ enum zone_stat_item local_stat = NUMA_LOCAL;
+
+ if (unlikely(flags & __GFP_OTHER_NODE)) {
+ local_stat = NUMA_OTHER;
+ local_nid = preferred_zone->node;
+ }
+
+ if (z->node == local_nid) {
  __inc_zone_state(z, NUMA_HIT);
+ __inc_zone_state(z, local_stat);
  } else {
  __inc_zone_state(z, NUMA_MISS);
  __inc_zone_state(preferred_zone, NUMA_FOREIGN);
  }
- if (z->node == ((flags & __GFP_OTHER_NODE) ?
- preferred_zone->node : numa_node_id()))
- __inc_zone_state(z, NUMA_LOCAL);
- else
- __inc_zone_state(z, NUMA_OTHER);
 }
 
 /*
--
2.6.4

[PATCH 04/28] mm, page_alloc: Inline zone_statistics

Mel Gorman-4
zone_statistics has one call-site but it's a public function. Make
it static and inline.

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                    statbranch-v1r20           statinline-v1r20
Min      alloc-odr0-1               419.00 (  0.00%)           412.00 (  1.67%)
Min      alloc-odr0-2               305.00 (  0.00%)           301.00 (  1.31%)
Min      alloc-odr0-4               250.00 (  0.00%)           247.00 (  1.20%)
Min      alloc-odr0-8               219.00 (  0.00%)           215.00 (  1.83%)
Min      alloc-odr0-16              203.00 (  0.00%)           199.00 (  1.97%)
Min      alloc-odr0-32              195.00 (  0.00%)           191.00 (  2.05%)
Min      alloc-odr0-64              191.00 (  0.00%)           187.00 (  2.09%)
Min      alloc-odr0-128             189.00 (  0.00%)           185.00 (  2.12%)
Min      alloc-odr0-256             198.00 (  0.00%)           193.00 (  2.53%)
Min      alloc-odr0-512             210.00 (  0.00%)           207.00 (  1.43%)
Min      alloc-odr0-1024            216.00 (  0.00%)           213.00 (  1.39%)
Min      alloc-odr0-2048            221.00 (  0.00%)           220.00 (  0.45%)
Min      alloc-odr0-4096            227.00 (  0.00%)           226.00 (  0.44%)
Min      alloc-odr0-8192            232.00 (  0.00%)           229.00 (  1.29%)
Min      alloc-odr0-16384           232.00 (  0.00%)           229.00 (  1.29%)

Signed-off-by: Mel Gorman <[hidden email]>
---
 include/linux/vmstat.h |  2 --
 mm/page_alloc.c        | 31 +++++++++++++++++++++++++++++++
 mm/vmstat.c            | 29 -----------------------------
 3 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 73fae8c4a5fb..152d26b7f972 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -163,12 +163,10 @@ static inline unsigned long zone_page_state_snapshot(struct zone *zone,
 #ifdef CONFIG_NUMA
 
 extern unsigned long node_page_state(int node, enum zone_stat_item item);
-extern void zone_statistics(struct zone *, struct zone *, gfp_t gfp);
 
 #else
 
 #define node_page_state(node, item) global_page_state(item)
-#define zone_statistics(_zl, _z, gfp) do { } while (0)
 
 #endif /* CONFIG_NUMA */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6812de41f698..b56c2b2911a2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2352,6 +2352,37 @@ int split_free_page(struct page *page)
 }
 
 /*
+ * Update NUMA hit/miss statistics
+ *
+ * Must be called with interrupts disabled.
+ *
+ * When __GFP_OTHER_NODE is set assume the node of the preferred
+ * zone is the local node. This is useful for daemons who allocate
+ * memory on behalf of other processes.
+ */
+static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
+ gfp_t flags)
+{
+#ifdef CONFIG_NUMA
+ int local_nid = numa_node_id();
+ enum zone_stat_item local_stat = NUMA_LOCAL;
+
+ if (unlikely(flags & __GFP_OTHER_NODE)) {
+ local_stat = NUMA_OTHER;
+ local_nid = preferred_zone->node;
+ }
+
+ if (z->node == local_nid) {
+ __inc_zone_state(z, NUMA_HIT);
+ __inc_zone_state(z, local_stat);
+ } else {
+ __inc_zone_state(z, NUMA_MISS);
+ __inc_zone_state(preferred_zone, NUMA_FOREIGN);
+ }
+#endif
+}
+
+/*
  * Allocate a page from the given zone. Use pcplists for order-0 allocations.
  */
 static inline
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 2e58ead9bcf5..a4bda11eac8d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -570,35 +570,6 @@ void drain_zonestat(struct zone *zone, struct per_cpu_pageset *pset)
 
 #ifdef CONFIG_NUMA
 /*
- * zonelist = the list of zones passed to the allocator
- * z    = the zone from which the allocation occurred.
- *
- * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
- */
-void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags)
-{
- int local_nid = numa_node_id();
- enum zone_stat_item local_stat = NUMA_LOCAL;
-
- if (unlikely(flags & __GFP_OTHER_NODE)) {
- local_stat = NUMA_OTHER;
- local_nid = preferred_zone->node;
- }
-
- if (z->node == local_nid) {
- __inc_zone_state(z, NUMA_HIT);
- __inc_zone_state(z, local_stat);
- } else {
- __inc_zone_state(z, NUMA_MISS);
- __inc_zone_state(preferred_zone, NUMA_FOREIGN);
- }
-}
-
-/*
  * Determine the per node value of a stat item.
  */
 unsigned long node_page_state(int node, enum zone_stat_item item)
--
2.6.4

[PATCH 05/28] mm, page_alloc: Inline the fast path of the zonelist iterator

Mel Gorman-4
The page allocator iterates through a zonelist for zones that match
the addressing limitations and nodemask of the caller but many allocations
will not be restricted. Despite this, there is always function call
overhead which builds up.

This patch inlines the optimistic basic case and only calls the
iterator function for the complex case. A hindrance was the fact that
cpuset_current_mems_allowed is used in the fastpath as the allowed nodemask
even though all nodes are allowed on most systems. The patch handles this
by only considering cpuset_current_mems_allowed if a cpuset exists. As well
as being faster in the fast-path, this removes some junk in the slowpath.
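
For context, the zonelist walk in the fast path expands to roughly the
following (simplified from include/linux/mmzone.h as it stands in this
series), so next_zones_zonelist() is reached on every iteration and the
inlined common case avoids a function call whenever the allocation is
unconstrained;

/* Simplified sketch of the existing iterator, not part of this patch */
#define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
	for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone);    \
		zone;                                                       \
		z = next_zones_zonelist(++z, highidx, nodemask),            \
			zone = zonelist_zone(z))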

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                    statinline-v1r20              optiter-v1r20
Min      alloc-odr0-1               412.00 (  0.00%)           382.00 (  7.28%)
Min      alloc-odr0-2               301.00 (  0.00%)           282.00 (  6.31%)
Min      alloc-odr0-4               247.00 (  0.00%)           233.00 (  5.67%)
Min      alloc-odr0-8               215.00 (  0.00%)           203.00 (  5.58%)
Min      alloc-odr0-16              199.00 (  0.00%)           188.00 (  5.53%)
Min      alloc-odr0-32              191.00 (  0.00%)           182.00 (  4.71%)
Min      alloc-odr0-64              187.00 (  0.00%)           177.00 (  5.35%)
Min      alloc-odr0-128             185.00 (  0.00%)           175.00 (  5.41%)
Min      alloc-odr0-256             193.00 (  0.00%)           184.00 (  4.66%)
Min      alloc-odr0-512             207.00 (  0.00%)           197.00 (  4.83%)
Min      alloc-odr0-1024            213.00 (  0.00%)           203.00 (  4.69%)
Min      alloc-odr0-2048            220.00 (  0.00%)           209.00 (  5.00%)
Min      alloc-odr0-4096            226.00 (  0.00%)           214.00 (  5.31%)
Min      alloc-odr0-8192            229.00 (  0.00%)           218.00 (  4.80%)
Min      alloc-odr0-16384           229.00 (  0.00%)           219.00 (  4.37%)

perf indicated that next_zones_zonelist disappeared in the profile and
__next_zones_zonelist did not appear. This is expected as the micro-benchmark
would hit the inlined fast-path every time.

Signed-off-by: Mel Gorman <[hidden email]>
---
 include/linux/mmzone.h | 13 +++++++++++--
 mm/mmzone.c            |  2 +-
 mm/page_alloc.c        | 26 +++++++++-----------------
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c60df9257cc7..0c4d5ebb3849 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -922,6 +922,10 @@ static inline int zonelist_node_idx(struct zoneref *zoneref)
 #endif /* CONFIG_NUMA */
 }
 
+struct zoneref *__next_zones_zonelist(struct zoneref *z,
+ enum zone_type highest_zoneidx,
+ nodemask_t *nodes);
+
 /**
  * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
  * @z - The cursor used as a starting point for the search
@@ -934,9 +938,14 @@ static inline int zonelist_node_idx(struct zoneref *zoneref)
  * being examined. It should be advanced by one before calling
  * next_zones_zonelist again.
  */
-struct zoneref *next_zones_zonelist(struct zoneref *z,
+static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
  enum zone_type highest_zoneidx,
- nodemask_t *nodes);
+ nodemask_t *nodes)
+{
+ if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx))
+ return z;
+ return __next_zones_zonelist(z, highest_zoneidx, nodes);
+}
 
 /**
  * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 52687fb4de6f..5652be858e5e 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -52,7 +52,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
 }
 
 /* Returns the next zone at or below highest_zoneidx in a zonelist */
-struct zoneref *next_zones_zonelist(struct zoneref *z,
+struct zoneref *__next_zones_zonelist(struct zoneref *z,
  enum zone_type highest_zoneidx,
  nodemask_t *nodes)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b56c2b2911a2..e9acc0b0f787 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3193,17 +3193,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
  */
  alloc_flags = gfp_to_alloc_flags(gfp_mask);
 
- /*
- * Find the true preferred zone if the allocation is unconstrained by
- * cpusets.
- */
- if (!(alloc_flags & ALLOC_CPUSET) && !ac->nodemask) {
- struct zoneref *preferred_zoneref;
- preferred_zoneref = first_zones_zonelist(ac->zonelist,
- ac->high_zoneidx, NULL, &ac->preferred_zone);
- ac->classzone_idx = zonelist_zone_idx(preferred_zoneref);
- }
-
  /* This is the last chance, in general, before the goto nopage. */
  page = get_page_from_freelist(gfp_mask, order,
  alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
@@ -3359,14 +3348,21 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  struct zoneref *preferred_zoneref;
  struct page *page = NULL;
  unsigned int cpuset_mems_cookie;
- int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
+ int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
  gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
  struct alloc_context ac = {
  .high_zoneidx = gfp_zone(gfp_mask),
+ .zonelist = zonelist,
  .nodemask = nodemask,
  .migratetype = gfpflags_to_migratetype(gfp_mask),
  };
 
+ if (cpusets_enabled()) {
+ alloc_flags |= ALLOC_CPUSET;
+ if (!ac.nodemask)
+ ac.nodemask = &cpuset_current_mems_allowed;
+ }
+
  gfp_mask &= gfp_allowed_mask;
 
  lockdep_trace_alloc(gfp_mask);
@@ -3390,16 +3386,12 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 retry_cpuset:
  cpuset_mems_cookie = read_mems_allowed_begin();
 
- /* We set it here, as __alloc_pages_slowpath might have changed it */
- ac.zonelist = zonelist;
-
  /* Dirty zone balancing only done in the fast path */
  ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
 
  /* The preferred zone is used for statistics later */
  preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
- ac.nodemask ? : &cpuset_current_mems_allowed,
- &ac.preferred_zone);
+ ac.nodemask, &ac.preferred_zone);
  if (!ac.preferred_zone)
  goto out;
  ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
--
2.6.4

[PATCH 06/28] mm, page_alloc: Use __dec_zone_state for order-0 page allocation

Mel Gorman-4
__dec_zone_state is cheaper to use for removing an order-0 page as it
has fewer conditions to check.

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                       optiter-v1r20              decstat-v1r20
Min      alloc-odr0-1               382.00 (  0.00%)           381.00 (  0.26%)
Min      alloc-odr0-2               282.00 (  0.00%)           275.00 (  2.48%)
Min      alloc-odr0-4               233.00 (  0.00%)           229.00 (  1.72%)
Min      alloc-odr0-8               203.00 (  0.00%)           199.00 (  1.97%)
Min      alloc-odr0-16              188.00 (  0.00%)           186.00 (  1.06%)
Min      alloc-odr0-32              182.00 (  0.00%)           179.00 (  1.65%)
Min      alloc-odr0-64              177.00 (  0.00%)           174.00 (  1.69%)
Min      alloc-odr0-128             175.00 (  0.00%)           172.00 (  1.71%)
Min      alloc-odr0-256             184.00 (  0.00%)           181.00 (  1.63%)
Min      alloc-odr0-512             197.00 (  0.00%)           193.00 (  2.03%)
Min      alloc-odr0-1024            203.00 (  0.00%)           201.00 (  0.99%)
Min      alloc-odr0-2048            209.00 (  0.00%)           206.00 (  1.44%)
Min      alloc-odr0-4096            214.00 (  0.00%)           212.00 (  0.93%)
Min      alloc-odr0-8192            218.00 (  0.00%)           215.00 (  1.38%)
Min      alloc-odr0-16384           219.00 (  0.00%)           216.00 (  1.37%)

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e9acc0b0f787..ab16560b76e6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2414,6 +2414,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
  else
  page = list_first_entry(list, struct page, lru);
 
+ __dec_zone_state(zone, NR_ALLOC_BATCH);
  list_del(&page->lru);
  pcp->count--;
  } else {
@@ -2435,11 +2436,11 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
  spin_unlock(&zone->lock);
  if (!page)
  goto failed;
+ __mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
  __mod_zone_freepage_state(zone, -(1 << order),
   get_pcppage_migratetype(page));
  }
 
- __mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
  if (atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]) <= 0 &&
     !test_bit(ZONE_FAIR_DEPLETED, &zone->flags))
  set_bit(ZONE_FAIR_DEPLETED, &zone->flags);
--
2.6.4

[PATCH 07/28] mm, page_alloc: Avoid unnecessary zone lookups during pageblock operations

Mel Gorman-4
Pageblocks have an associated bitmap to store migrate types and whether
the pageblock should be skipped during compaction. The bitmap may be
associated with a memory section or a zone but the zone is looked up
unconditionally. The compiler should optimise this away automatically so
in many cases this is only a cosmetic patch.

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ab16560b76e6..d00847bb1612 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6759,23 +6759,23 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
-static inline unsigned long *get_pageblock_bitmap(struct zone *zone,
+static inline unsigned long *get_pageblock_bitmap(struct page *page,
  unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
  return __pfn_to_section(pfn)->pageblock_flags;
 #else
- return zone->pageblock_flags;
+ return page_zone(page)->pageblock_flags;
 #endif /* CONFIG_SPARSEMEM */
 }
 
-static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn)
+static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
  pfn &= (PAGES_PER_SECTION-1);
  return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
 #else
- pfn = pfn - round_down(zone->zone_start_pfn, pageblock_nr_pages);
+ pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages);
  return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
 #endif /* CONFIG_SPARSEMEM */
 }
@@ -6793,14 +6793,12 @@ unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
  unsigned long end_bitidx,
  unsigned long mask)
 {
- struct zone *zone;
  unsigned long *bitmap;
  unsigned long bitidx, word_bitidx;
  unsigned long word;
 
- zone = page_zone(page);
- bitmap = get_pageblock_bitmap(zone, pfn);
- bitidx = pfn_to_bitidx(zone, pfn);
+ bitmap = get_pageblock_bitmap(page, pfn);
+ bitidx = pfn_to_bitidx(page, pfn);
  word_bitidx = bitidx / BITS_PER_LONG;
  bitidx &= (BITS_PER_LONG-1);
 
@@ -6822,20 +6820,18 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
  unsigned long end_bitidx,
  unsigned long mask)
 {
- struct zone *zone;
  unsigned long *bitmap;
  unsigned long bitidx, word_bitidx;
  unsigned long old_word, word;
 
  BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
 
- zone = page_zone(page);
- bitmap = get_pageblock_bitmap(zone, pfn);
- bitidx = pfn_to_bitidx(zone, pfn);
+ bitmap = get_pageblock_bitmap(page, pfn);
+ bitidx = pfn_to_bitidx(page, pfn);
  word_bitidx = bitidx / BITS_PER_LONG;
  bitidx &= (BITS_PER_LONG-1);
 
- VM_BUG_ON_PAGE(!zone_spans_pfn(zone, pfn), page);
+ VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
 
  bitidx += end_bitidx;
  mask <<= (BITS_PER_LONG - bitidx - 1);
--
2.6.4

[PATCH 08/28] mm, page_alloc: Convert alloc_flags to unsigned

Mel Gorman-4
alloc_flags is a bitmask of flags but it is signed, which does not
necessarily generate the best code depending on the compiler. Even
without an impact, it makes more sense for this to be unsigned.

Signed-off-by: Mel Gorman <[hidden email]>
---
 include/linux/compaction.h |  6 +++---
 include/linux/mmzone.h     |  3 ++-
 mm/compaction.c            | 12 +++++++-----
 mm/internal.h              |  2 +-
 mm/page_alloc.c            | 26 ++++++++++++++------------
 5 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index d7c8de583a23..242b660f64e6 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -39,12 +39,12 @@ extern int sysctl_compact_unevictable_allowed;
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
- int alloc_flags, const struct alloc_context *ac,
- enum migrate_mode mode, int *contended);
+ unsigned int alloc_flags, const struct alloc_context *ac,
+ enum migrate_mode mode, int *contended);
 extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern unsigned long compaction_suitable(struct zone *zone, int order,
- int alloc_flags, int classzone_idx);
+ unsigned int alloc_flags, int classzone_idx);
 
 extern void defer_compaction(struct zone *zone, int order);
 extern bool compaction_deferred(struct zone *zone, int order);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0c4d5ebb3849..f49bb9add372 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -747,7 +747,8 @@ extern struct mutex zonelists_mutex;
 void build_all_zonelists(pg_data_t *pgdat, struct zone *zone);
 void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx);
 bool zone_watermark_ok(struct zone *z, unsigned int order,
- unsigned long mark, int classzone_idx, int alloc_flags);
+ unsigned long mark, int classzone_idx,
+ unsigned int alloc_flags);
 bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
  unsigned long mark, int classzone_idx);
 enum memmap_context {
diff --git a/mm/compaction.c b/mm/compaction.c
index ccf97b02b85f..244bb669b5a6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1259,7 +1259,8 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
  *   COMPACT_CONTINUE - If compaction should run now
  */
 static unsigned long __compaction_suitable(struct zone *zone, int order,
- int alloc_flags, int classzone_idx)
+ unsigned int alloc_flags,
+ int classzone_idx)
 {
  int fragindex;
  unsigned long watermark;
@@ -1304,7 +1305,8 @@ static unsigned long __compaction_suitable(struct zone *zone, int order,
 }
 
 unsigned long compaction_suitable(struct zone *zone, int order,
- int alloc_flags, int classzone_idx)
+ unsigned int alloc_flags,
+ int classzone_idx)
 {
  unsigned long ret;
 
@@ -1464,7 +1466,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 static unsigned long compact_zone_order(struct zone *zone, int order,
  gfp_t gfp_mask, enum migrate_mode mode, int *contended,
- int alloc_flags, int classzone_idx)
+ unsigned int alloc_flags, int classzone_idx)
 {
  unsigned long ret;
  struct compact_control cc = {
@@ -1505,8 +1507,8 @@ int sysctl_extfrag_threshold = 500;
  * This is the main entry point for direct page compaction.
  */
 unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
- int alloc_flags, const struct alloc_context *ac,
- enum migrate_mode mode, int *contended)
+ unsigned int alloc_flags, const struct alloc_context *ac,
+ enum migrate_mode mode, int *contended)
 {
  int may_enter_fs = gfp_mask & __GFP_FS;
  int may_perform_io = gfp_mask & __GFP_IO;
diff --git a/mm/internal.h b/mm/internal.h
index b79abb6721cf..f6d0a5875ec4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -175,7 +175,7 @@ struct compact_control {
  bool direct_compaction; /* False from kcompactd or /proc/... */
  int order; /* order a direct compactor needs */
  const gfp_t gfp_mask; /* gfp mask of a direct compactor */
- const int alloc_flags; /* alloc flags of a direct compactor */
+ const unsigned int alloc_flags; /* alloc flags of a direct compactor */
  const int classzone_idx; /* zone index of a direct compactor */
  struct zone *zone;
  int contended; /* Signal need_sched() or lock
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d00847bb1612..4bce6298dd07 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1526,7 +1526,7 @@ static inline bool free_pages_prezeroed(bool poisoned)
 }
 
 static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
- int alloc_flags)
+ unsigned int alloc_flags)
 {
  int i;
  bool poisoned = true;
@@ -2388,7 +2388,8 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
 static inline
 struct page *buffered_rmqueue(struct zone *preferred_zone,
  struct zone *zone, unsigned int order,
- gfp_t gfp_flags, int alloc_flags, int migratetype)
+ gfp_t gfp_flags, unsigned int alloc_flags,
+ int migratetype)
 {
  unsigned long flags;
  struct page *page;
@@ -2542,12 +2543,13 @@ static inline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
  * to check in the allocation paths if no pages are free.
  */
 static bool __zone_watermark_ok(struct zone *z, unsigned int order,
- unsigned long mark, int classzone_idx, int alloc_flags,
+ unsigned long mark, int classzone_idx,
+ unsigned int alloc_flags,
  long free_pages)
 {
  long min = mark;
  int o;
- const int alloc_harder = (alloc_flags & ALLOC_HARDER);
+ const bool alloc_harder = (alloc_flags & ALLOC_HARDER);
 
  /* free_pages may go negative - that's OK */
  free_pages -= (1 << order) - 1;
@@ -2610,7 +2612,7 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 }
 
 bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
-      int classzone_idx, int alloc_flags)
+      int classzone_idx, unsigned int alloc_flags)
 {
  return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
  zone_page_state(z, NR_FREE_PAGES));
@@ -2958,7 +2960,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 /* Try memory compaction for high-order allocations before reclaim */
 static struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
- int alloc_flags, const struct alloc_context *ac,
+ unsigned int alloc_flags, const struct alloc_context *ac,
  enum migrate_mode mode, int *contended_compaction,
  bool *deferred_compaction)
 {
@@ -3014,7 +3016,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 #else
 static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
- int alloc_flags, const struct alloc_context *ac,
+ unsigned int alloc_flags, const struct alloc_context *ac,
  enum migrate_mode mode, int *contended_compaction,
  bool *deferred_compaction)
 {
@@ -3054,7 +3056,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 /* The really slow allocator path where we enter direct reclaim */
 static inline struct page *
 __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
- int alloc_flags, const struct alloc_context *ac,
+ unsigned int alloc_flags, const struct alloc_context *ac,
  unsigned long *did_some_progress)
 {
  struct page *page = NULL;
@@ -3093,10 +3095,10 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
  wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone));
 }
 
-static inline int
+static inline unsigned int
 gfp_to_alloc_flags(gfp_t gfp_mask)
 {
- int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
+ unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
  /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
  BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
@@ -3157,7 +3159,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 {
  bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
  struct page *page = NULL;
- int alloc_flags;
+ unsigned int alloc_flags;
  unsigned long pages_reclaimed = 0;
  unsigned long did_some_progress;
  enum migrate_mode migration_mode = MIGRATE_ASYNC;
@@ -3349,7 +3351,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  struct zoneref *preferred_zoneref;
  struct page *page = NULL;
  unsigned int cpuset_mems_cookie;
- int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
+ unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
  gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
  struct alloc_context ac = {
  .high_zoneidx = gfp_zone(gfp_mask),
--
2.6.4

[PATCH 09/28] mm, page_alloc: Convert nr_fair_skipped to bool

Mel Gorman-4
The number of zones skipped due to a zone expiring its fair zone allocation
quota is irrelevant. Convert nr_fair_skipped to a bool.

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4bce6298dd07..e778485a64c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2677,7 +2677,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  struct zoneref *z;
  struct page *page = NULL;
  struct zone *zone;
- int nr_fair_skipped = 0;
+ bool fair_skipped = false;
  bool zonelist_rescan;
 
 zonelist_scan:
@@ -2705,7 +2705,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  if (!zone_local(ac->preferred_zone, zone))
  break;
  if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) {
- nr_fair_skipped++;
+ fair_skipped = true;
  continue;
  }
  }
@@ -2798,7 +2798,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  */
  if (alloc_flags & ALLOC_FAIR) {
  alloc_flags &= ~ALLOC_FAIR;
- if (nr_fair_skipped) {
+ if (fair_skipped) {
  zonelist_rescan = true;
  reset_alloc_batches(ac->preferred_zone);
  }
--
2.6.4

[PATCH 10/28] mm, page_alloc: Remove unnecessary local variable in get_page_from_freelist

Mel Gorman-4
zonelist here is a copy of a struct field that is used once. Ditch it.

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e778485a64c1..313db1c43839 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2673,7 +2673,6 @@ static struct page *
 get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  const struct alloc_context *ac)
 {
- struct zonelist *zonelist = ac->zonelist;
  struct zoneref *z;
  struct page *page = NULL;
  struct zone *zone;
@@ -2687,7 +2686,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  * Scan zonelist, looking for a zone with enough free.
  * See also __cpuset_node_allowed() comment in kernel/cpuset.c.
  */
- for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
+ for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
  ac->nodemask) {
  unsigned long mark;
 
--
2.6.4

[PATCH 11/28] mm, page_alloc: Remove unnecessary initialisation in get_page_from_freelist

Mel Gorman-4
See subject: the page local in get_page_from_freelist() is always assigned
before use, so the NULL initialisation is unnecessary and the declaration
can move into the loop.

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 313db1c43839..f5ddb342c967 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2674,7 +2674,6 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  const struct alloc_context *ac)
 {
  struct zoneref *z;
- struct page *page = NULL;
  struct zone *zone;
  bool fair_skipped;
  bool zonelist_rescan;
@@ -2688,6 +2687,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  */
  for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
  ac->nodemask) {
+ struct page *page;
  unsigned long mark;
 
  if (cpusets_enabled() &&
--
2.6.4


[PATCH 13/28] mm, page_alloc: Remove redundant check for empty zonelist

Mel Gorman-4
In reply to this post by Mel Gorman-4
A check is made for an empty zonelist early in the page allocator fast path
but it's unnecessary. When get_page_from_freelist() is called with an empty
zonelist, it returns NULL immediately anyway. Removing the early check is
slower for machines with memoryless nodes, but that is a corner case that
can live with the overhead.

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df03ccc7f07c..21aaef6ddd7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3374,14 +3374,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  if (should_fail_alloc_page(gfp_mask, order))
  return NULL;
 
- /*
- * Check the zones suitable for the gfp_mask contain at least one
- * valid zone. It's possible to have an empty zonelist as a result
- * of __GFP_THISNODE and a memoryless node
- */
- if (unlikely(!zonelist->_zonerefs->zone))
- return NULL;
-
  if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
  alloc_flags |= ALLOC_CMA;
 
@@ -3394,8 +3386,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  /* The preferred zone is used for statistics later */
  preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
  ac.nodemask, &ac.preferred_zone);
- if (!ac.preferred_zone)
- goto out;
  ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
 
  /* First allocation attempt */
@@ -3418,7 +3408,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 
  trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype);
 
-out:
  /*
  * When updating a task's mems_allowed, it is possible to race with
  * parallel threads in such a way that an allocation can fail while
--
2.6.4


[PATCH 14/28] mm, page_alloc: Simplify last cpupid reset

Mel Gorman-4
The current reset unnecessarily clears the existing bits and performs
pointless calculations; the reset value has every bit in the field set, so
OR-ing the mask in directly is equivalent.
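
For illustration only, a minimal userspace sketch (with made-up shift/mask
values standing in for LAST_CPUPID_SHIFT/MASK/PGSHIFT) of why the plain OR
is equivalent: the value being written is all ones within the field, so
clearing it first buys nothing.

#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real LAST_CPUPID_* constants. */
#define CPUPID_SHIFT	8
#define CPUPID_MASK	((1UL << CPUPID_SHIFT) - 1)
#define CPUPID_PGSHIFT	16

static unsigned long reset_old(unsigned long flags)
{
	unsigned long cpupid = (1UL << CPUPID_SHIFT) - 1;

	flags &= ~(CPUPID_MASK << CPUPID_PGSHIFT);
	flags |= (cpupid & CPUPID_MASK) << CPUPID_PGSHIFT;
	return flags;
}

static unsigned long reset_new(unsigned long flags)
{
	/* The reset value is all ones in the field, so OR is enough. */
	return flags | (CPUPID_MASK << CPUPID_PGSHIFT);
}

int main(void)
{
	unsigned long flags = 0x12345678UL;

	assert(reset_old(flags) == reset_new(flags));
	printf("old %#lx == new %#lx\n", reset_old(flags), reset_new(flags));
	return 0;
}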

Signed-off-by: Mel Gorman <[hidden email]>
---
 include/linux/mm.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ffcff53e3b2b..60656db00abd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -837,10 +837,7 @@ extern int page_cpupid_xchg_last(struct page *page, int cpupid);
 
 static inline void page_cpupid_reset_last(struct page *page)
 {
- int cpupid = (1 << LAST_CPUPID_SHIFT) - 1;
-
- page->flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
- page->flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
+ page->flags |= LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT;
 }
 #endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */
 #else /* !CONFIG_NUMA_BALANCING */
--
2.6.4


[PATCH 15/28] mm, page_alloc: Move might_sleep_if check to the allocator slowpath

Mel Gorman-4
In reply to this post by Mel Gorman-4
There is a debugging check for callers that specify __GFP_DIRECT_RECLAIM
from a context that cannot sleep. Triggering this is almost certainly a bug,
but it's also overhead in the fast path. Move the check to the slow path.
It will be harder to trigger, as it is only evaluated once watermarks are
depleted, but it is still evaluated in the path that can actually sleep.
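
Not the kernel code, just a sketch of the structure (invented helper names,
a plain assert() in place of might_sleep_if()): the debugging check lives
only in the slow path, so a successful fast-path allocation never pays for it.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define GFP_DIRECT_RECLAIM	0x1u	/* hypothetical flag bit */

static bool in_atomic_context;		/* stands in for the real context check */

static int alloc_slowpath(unsigned int gfp_mask)
{
	/* Debug check moved here: only evaluated when the fast path fails. */
	if (gfp_mask & GFP_DIRECT_RECLAIM)
		assert(!in_atomic_context);

	return 42;	/* pretend reclaim succeeded and a page was found */
}

static int alloc_fastpath(unsigned int gfp_mask, bool fastpath_succeeds)
{
	if (fastpath_succeeds)
		return 1;	/* no debug overhead in the common case */

	return alloc_slowpath(gfp_mask);
}

int main(void)
{
	printf("fast: %d\n", alloc_fastpath(GFP_DIRECT_RECLAIM, true));
	printf("slow: %d\n", alloc_fastpath(GFP_DIRECT_RECLAIM, false));
	return 0;
}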

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 21aaef6ddd7a..9ef2f4ab9ca5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3176,6 +3176,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
  return NULL;
  }
 
+ might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
+
  /*
  * We also sanity check to catch abuse of atomic reserves being used by
  * callers that are not in atomic context.
@@ -3369,8 +3371,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 
  lockdep_trace_alloc(gfp_mask);
 
- might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
-
  if (should_fail_alloc_page(gfp_mask, order))
  return NULL;
 
--
2.6.4


[PATCH 16/28] mm, page_alloc: Move __GFP_HARDWALL modifications out of the fastpath

Mel Gorman-4
In reply to this post by Mel Gorman-4
__GFP_HARDWALL only has meaning in the context of cpusets but the fast path
always applies the flag on the first attempt. Move the manipulations into
the cpuset paths where they will be masked by a static branch in the common
case.
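
As a sketch of the idea only (hypothetical flag values and helpers;
cpusets_enabled() in the kernel is backed by a static branch, modelled here
as a plain function): the flag manipulation is only performed when the cpuset
predicate is true, and the caller's mask is restored before a retry.

#include <stdbool.h>
#include <stdio.h>

#define GFP_HARDWALL	0x4u	/* hypothetical flag bit */

/* Stand-in for the kernel's static-branch-backed cpusets_enabled(). */
static bool cpusets_enabled(void)
{
	return false;
}

static int try_alloc(unsigned int mask)
{
	printf("allocating with mask %#x\n", mask);
	return 0;	/* pretend the attempt failed */
}

int main(void)
{
	unsigned int gfp_mask = 0x10u;		/* caller's flags, arbitrary */
	unsigned int alloc_mask = gfp_mask;
	int retries = 1;

retry:
	if (cpusets_enabled())
		alloc_mask |= GFP_HARDWALL;	/* only paid when cpusets are active */

	if (!try_alloc(alloc_mask) && retries--) {
		alloc_mask = gfp_mask;		/* drop any cpuset adjustment before retrying */
		goto retry;
	}
	return 0;
}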

With the other micro-optimisations in this series combined, the impact on
a page allocator microbenchmark is

                                           4.6.0-rc2                  4.6.0-rc2
                                       decstat-v1r20                micro-v1r20
Min      alloc-odr0-1               381.00 (  0.00%)           377.00 (  1.05%)
Min      alloc-odr0-2               275.00 (  0.00%)           273.00 (  0.73%)
Min      alloc-odr0-4               229.00 (  0.00%)           226.00 (  1.31%)
Min      alloc-odr0-8               199.00 (  0.00%)           196.00 (  1.51%)
Min      alloc-odr0-16              186.00 (  0.00%)           183.00 (  1.61%)
Min      alloc-odr0-32              179.00 (  0.00%)           175.00 (  2.23%)
Min      alloc-odr0-64              174.00 (  0.00%)           172.00 (  1.15%)
Min      alloc-odr0-128             172.00 (  0.00%)           170.00 (  1.16%)
Min      alloc-odr0-256             181.00 (  0.00%)           183.00 ( -1.10%)
Min      alloc-odr0-512             193.00 (  0.00%)           191.00 (  1.04%)
Min      alloc-odr0-1024            201.00 (  0.00%)           199.00 (  1.00%)
Min      alloc-odr0-2048            206.00 (  0.00%)           204.00 (  0.97%)
Min      alloc-odr0-4096            212.00 (  0.00%)           210.00 (  0.94%)
Min      alloc-odr0-8192            215.00 (  0.00%)           213.00 (  0.93%)
Min      alloc-odr0-16384           216.00 (  0.00%)           214.00 (  0.93%)

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ef2f4ab9ca5..4a364e318873 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3353,7 +3353,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  struct page *page;
  unsigned int cpuset_mems_cookie;
  unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
- gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
+ gfp_t alloc_mask = gfp_mask; /* The gfp_t that was actually used for allocation */
  struct alloc_context ac = {
  .high_zoneidx = gfp_zone(gfp_mask),
  .zonelist = zonelist,
@@ -3362,6 +3362,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  };
 
  if (cpusets_enabled()) {
+ alloc_mask |= __GFP_HARDWALL;
  alloc_flags |= ALLOC_CPUSET;
  if (!ac.nodemask)
  ac.nodemask = &cpuset_current_mems_allowed;
@@ -3389,7 +3390,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
 
  /* First allocation attempt */
- alloc_mask = gfp_mask|__GFP_HARDWALL;
  page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
  if (unlikely(!page)) {
  /*
@@ -3414,8 +3414,10 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  * the mask is being updated. If a page allocation is about to fail,
  * check if the cpuset changed during allocation and if so, retry.
  */
- if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie)))
+ if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) {
+ alloc_mask = gfp_mask;
  goto retry_cpuset;
+ }
 
  return page;
 }
--
2.6.4


[PATCH 17/28] mm, page_alloc: Check once if a zone has isolated pageblocks

Mel Gorman-4
In reply to this post by Mel Gorman-4
When bulk freeing pages from the per-cpu lists, the zone is checked for
isolated pageblocks on every page released. This patch checks it once per
drain instead. Technically this is race-prone, but so is the existing code.
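
The change is simply hoisting a loop-invariant lookup out of the free loop;
a generic sketch of the pattern (nothing here is kernel API):

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for has_isolate_pageblock(zone); assume it is not free to call. */
static bool zone_has_isolated_blocks(void)
{
	return false;
}

static int drain_list(int npages)
{
	/* Evaluated once per drain instead of once per freed page. */
	bool isolated = zone_has_isolated_blocks();
	int slow_lookups = 0;

	for (int i = 0; i < npages; i++) {
		if (isolated)
			slow_lookups++;	/* rare case: re-read the pageblock type */
		/* ... free the page ... */
	}
	return slow_lookups;
}

int main(void)
{
	printf("slow-path lookups: %d\n", drain_list(32));
	return 0;
}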

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4a364e318873..835a1c434832 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -831,6 +831,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
  int batch_free = 0;
  int to_free = count;
  unsigned long nr_scanned;
+ bool isolated_pageblocks = has_isolate_pageblock(zone);
 
  spin_lock(&zone->lock);
  nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
@@ -870,7 +871,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
  /* MIGRATE_ISOLATE page should not go to pcplists */
  VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
  /* Pageblock could have been isolated meanwhile */
- if (unlikely(has_isolate_pageblock(zone)))
+ if (unlikely(isolated_pageblocks))
  mt = get_pageblock_migratetype(page);
 
  __free_one_page(page, page_to_pfn(page), zone, 0, mt);
--
2.6.4


[PATCH 18/28] mm, page_alloc: Shorten the page allocator fast path

Mel Gorman-4
In reply to this post by Mel Gorman-4
The page allocator fast path checks the page pointer multiple times
unnecessarily. This patch avoids all the slow-path checks if the first
allocation attempt succeeds.
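
A sketch of the restructuring only (toy helpers, not the allocator): success
on the first attempt jumps straight to the shared tail, skipping the
slow-path setup entirely.

#include <stdbool.h>
#include <stdio.h>

static int fast_attempt(bool succeed)	{ return succeed ? 1 : 0; }
static int slow_attempt(void)		{ return 2; }
static void trace_alloc(int page)	{ printf("allocated %d\n", page); }

static int alloc(bool fast_ok)
{
	int page;

	page = fast_attempt(fast_ok);
	if (page)
		goto out;	/* common case: skip all slow-path work */

	/* slow-path preparation would go here */
	page = slow_attempt();

out:
	trace_alloc(page);	/* shared tail: tracing, kmemcheck, ... */
	return page;
}

int main(void)
{
	alloc(true);
	alloc(false);
	return 0;
}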

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 835a1c434832..7a5f6ff4ea06 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3392,22 +3392,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 
  /* First allocation attempt */
  page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
- if (unlikely(!page)) {
- /*
- * Runtime PM, block IO and its error handling path
- * can deadlock because I/O on the device might not
- * complete.
- */
- alloc_mask = memalloc_noio_flags(gfp_mask);
- ac.spread_dirty_pages = false;
-
- page = __alloc_pages_slowpath(alloc_mask, order, &ac);
- }
+ if (likely(page))
+ goto out;
 
- if (kmemcheck_enabled && page)
- kmemcheck_pagealloc_alloc(page, order, gfp_mask);
+ /*
+ * Runtime PM, block IO and its error handling path can deadlock
+ * because I/O on the device might not complete.
+ */
+ alloc_mask = memalloc_noio_flags(gfp_mask);
+ ac.spread_dirty_pages = false;
 
- trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype);
+ page = __alloc_pages_slowpath(alloc_mask, order, &ac);
 
  /*
  * When updating a task's mems_allowed, it is possible to race with
@@ -3420,6 +3415,12 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
  goto retry_cpuset;
  }
 
+out:
+ if (kmemcheck_enabled && page)
+ kmemcheck_pagealloc_alloc(page, order, gfp_mask);
+
+ trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype);
+
  return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
--
2.6.4


[PATCH 19/28] mm, page_alloc: Reduce cost of fair zone allocation policy retry

Mel Gorman-4
In reply to this post by Mel Gorman-4
The fair zone allocation policy is not without cost, but that cost can be
reduced slightly. This patch removes an unnecessary local variable, checks
the likely conditions of the fair zone policy first, uses a bool instead of
a flags check and falls through when a remote node is encountered instead
of doing a full restart. The benefit is marginal but it's there:

                                           4.6.0-rc2                  4.6.0-rc2
                                       decstat-v1r20              optfair-v1r20
Min      alloc-odr0-1               377.00 (  0.00%)           380.00 ( -0.80%)
Min      alloc-odr0-2               273.00 (  0.00%)           273.00 (  0.00%)
Min      alloc-odr0-4               226.00 (  0.00%)           227.00 ( -0.44%)
Min      alloc-odr0-8               196.00 (  0.00%)           196.00 (  0.00%)
Min      alloc-odr0-16              183.00 (  0.00%)           183.00 (  0.00%)
Min      alloc-odr0-32              175.00 (  0.00%)           173.00 (  1.14%)
Min      alloc-odr0-64              172.00 (  0.00%)           169.00 (  1.74%)
Min      alloc-odr0-128             170.00 (  0.00%)           169.00 (  0.59%)
Min      alloc-odr0-256             183.00 (  0.00%)           180.00 (  1.64%)
Min      alloc-odr0-512             191.00 (  0.00%)           190.00 (  0.52%)
Min      alloc-odr0-1024            199.00 (  0.00%)           198.00 (  0.50%)
Min      alloc-odr0-2048            204.00 (  0.00%)           204.00 (  0.00%)
Min      alloc-odr0-4096            210.00 (  0.00%)           209.00 (  0.48%)
Min      alloc-odr0-8192            213.00 (  0.00%)           213.00 (  0.00%)
Min      alloc-odr0-16384           214.00 (  0.00%)           214.00 (  0.00%)

The benefit is marginal at best, but one of the most important gains,
avoiding a second full zonelist search when falling back to another node,
is not exercised by this particular test, so the improvement in some corner
cases is understated.
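
To make the new control flow easier to follow, here is a stripped-down,
stand-alone sketch of the apply_fair/fair_skipped handling (toy zone
structure and helpers, not the kernel's): depleted local zones are skipped
and remembered; hitting a remote zone either triggers a batch reset and a
single rescan (if something was skipped) or simply disables the policy for
the rest of the scan.

#include <stdbool.h>
#include <stdio.h>

#define NR_ZONES 4

/* Toy model: "depleted" = fair batch ran out, "local" = on the preferred
 * node, "usable" = an allocation from this zone would succeed. */
struct zone {
	bool depleted;
	bool local;
	bool usable;
};

static void reset_batches(struct zone *zones, int n)
{
	for (int i = 0; i < n; i++)
		zones[i].depleted = false;
}

static int get_zone(struct zone *zones, int n, bool use_fair)
{
	bool apply_fair = use_fair;
	bool fair_skipped = false;

scan:
	for (int i = 0; i < n; i++) {
		struct zone *z = &zones[i];

		if (apply_fair) {
			if (z->depleted) {
				fair_skipped = true;	/* remember, keep scanning */
				continue;
			}
			if (!z->local) {
				if (fair_skipped)
					goto reset_fair;	/* retry local zones first */
				apply_fair = false;		/* fall through to remote zones */
			}
		}
		if (z->usable)
			return i;
	}

	if (fair_skipped) {
reset_fair:
		apply_fair = false;
		fair_skipped = false;
		reset_batches(zones, n);
		goto scan;	/* at most one rescan */
	}
	return -1;
}

int main(void)
{
	struct zone zones[NR_ZONES] = {
		{ .depleted = true,  .local = true,  .usable = true },
		{ .depleted = false, .local = false, .usable = true },
	};

	printf("picked zone %d\n", get_zone(zones, NR_ZONES, true));
	return 0;
}

The goto into the if-block mirrors the reset_fair label placement in the
diff below, so a depleted-then-remote scan does one batch reset and rescan
rather than a full restart plus a second pass.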

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a5f6ff4ea06..98b443c97be6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2676,12 +2676,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 {
  struct zoneref *z;
  struct zone *zone;
- bool fair_skipped;
- bool zonelist_rescan;
+ bool fair_skipped = false;
+ bool apply_fair = (alloc_flags & ALLOC_FAIR);
 
 zonelist_scan:
- zonelist_rescan = false;
-
  /*
  * Scan zonelist, looking for a zone with enough free.
  * See also __cpuset_node_allowed() comment in kernel/cpuset.c.
@@ -2701,13 +2699,16 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  * page was allocated in should have no effect on the
  * time the page has in memory before being reclaimed.
  */
- if (alloc_flags & ALLOC_FAIR) {
- if (!zone_local(ac->preferred_zone, zone))
- break;
+ if (apply_fair) {
  if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) {
  fair_skipped = true;
  continue;
  }
+ if (!zone_local(ac->preferred_zone, zone)) {
+ if (fair_skipped)
+ goto reset_fair;
+ apply_fair = false;
+ }
  }
  /*
  * When allocating a page cache page for writing, we
@@ -2796,18 +2797,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  * include remote zones now, before entering the slowpath and waking
  * kswapd: prefer spilling to a remote zone over swapping locally.
  */
- if (alloc_flags & ALLOC_FAIR) {
- alloc_flags &= ~ALLOC_FAIR;
- if (fair_skipped) {
- zonelist_rescan = true;
- reset_alloc_batches(ac->preferred_zone);
- }
- if (nr_online_nodes > 1)
- zonelist_rescan = true;
- }
-
- if (zonelist_rescan)
+ if (fair_skipped) {
+reset_fair:
+ apply_fair = false;
+ fair_skipped = false;
+ reset_alloc_batches(ac->preferred_zone);
  goto zonelist_scan;
+ }
 
  return NULL;
 }
--
2.6.4


[PATCH 20/28] mm, page_alloc: Shortcut watermark checks for order-0 pages

Mel Gorman-4
In reply to this post by Mel Gorman-4
Watermarks have to be checked on every allocation, taking into account the
number of pages being allocated and whether reserves can be accessed. The
reserves only matter if memory is limited, and the free_pages adjustment
only applies to high-order pages. This patch adds a shortcut for order-0
pages that avoids a number of calculations when there is plenty of free
memory, yielding the following performance difference in a page allocator
microbenchmark:

                                           4.6.0-rc2                  4.6.0-rc2
                                       optfair-v1r20             fastmark-v1r20
Min      alloc-odr0-1               380.00 (  0.00%)           364.00 (  4.21%)
Min      alloc-odr0-2               273.00 (  0.00%)           262.00 (  4.03%)
Min      alloc-odr0-4               227.00 (  0.00%)           214.00 (  5.73%)
Min      alloc-odr0-8               196.00 (  0.00%)           186.00 (  5.10%)
Min      alloc-odr0-16              183.00 (  0.00%)           173.00 (  5.46%)
Min      alloc-odr0-32              173.00 (  0.00%)           165.00 (  4.62%)
Min      alloc-odr0-64              169.00 (  0.00%)           161.00 (  4.73%)
Min      alloc-odr0-128             169.00 (  0.00%)           159.00 (  5.92%)
Min      alloc-odr0-256             180.00 (  0.00%)           168.00 (  6.67%)
Min      alloc-odr0-512             190.00 (  0.00%)           180.00 (  5.26%)
Min      alloc-odr0-1024            198.00 (  0.00%)           190.00 (  4.04%)
Min      alloc-odr0-2048            204.00 (  0.00%)           196.00 (  3.92%)
Min      alloc-odr0-4096            209.00 (  0.00%)           202.00 (  3.35%)
Min      alloc-odr0-8192            213.00 (  0.00%)           206.00 (  3.29%)
Min      alloc-odr0-16384           214.00 (  0.00%)           206.00 (  3.74%)
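
A toy userspace rendering of the shortcut above (simplified numbers and a
fake "full" check, not the kernel's __zone_watermark_ok()): for order-0, if
the usable free pages comfortably clear the mark plus the lowmem reserve,
the expensive order-aware check is skipped entirely.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in: the real check walks free-area orders and reserve levels. */
static bool watermark_full_check(long free, long mark, long reserve, int order)
{
	return free - (1L << order) >= mark + reserve;
}

static bool watermark_fast(long free, long cma_free, long mark, long reserve,
			   int order, bool can_use_cma)
{
	long usable = free - (can_use_cma ? 0 : cma_free);

	/* Order-0 shortcut: plenty of usable memory, skip the full check. */
	if (!order && usable > mark + reserve)
		return true;

	return watermark_full_check(usable, mark, reserve, order);
}

int main(void)
{
	printf("order-0, lots free: %d\n",
	       watermark_fast(10000, 100, 500, 50, 0, false));
	printf("order-3, tight:     %d\n",
	       watermark_fast(600, 100, 500, 50, 3, false));
	return 0;
}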

Signed-off-by: Mel Gorman <[hidden email]>
---
 mm/page_alloc.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 98b443c97be6..8923d74b1707 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2619,6 +2619,32 @@ bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
  zone_page_state(z, NR_FREE_PAGES));
 }
 
+static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
+ unsigned long mark, int classzone_idx, unsigned int alloc_flags)
+{
+ long free_pages = zone_page_state(z, NR_FREE_PAGES);
+ long cma_pages = 0;
+
+#ifdef CONFIG_CMA
+ /* If allocation can't use CMA areas don't use free CMA pages */
+ if (!(alloc_flags & ALLOC_CMA))
+ cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES);
+#endif
+
+ /*
+ * Fast check for order-0 only. If this fails then the reserves
+ * need to be calculated. There is a corner case where the check
+ * passes but only the high-order atomic reserve are free. If
+ * the caller is !atomic then it'll uselessly search the free
+ * list. That corner case is then slower but it is harmless.
+ */
+ if (!order && (free_pages - cma_pages) > mark + z->lowmem_reserve[classzone_idx])
+ return true;
+
+ return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
+ free_pages);
+}
+
 bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
  unsigned long mark, int classzone_idx)
 {
@@ -2740,7 +2766,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
  continue;
 
  mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
- if (!zone_watermark_ok(zone, order, mark,
+ if (!zone_watermark_fast(zone, order, mark,
        ac->classzone_idx, alloc_flags)) {
  int ret;
 
--
2.6.4
