Context bound vs Implicit evidence: Performance
In a recent pull request review at work I suggested using a context bound to declare effect capabilities instead of implicit values, as this is what I see most often in OSS projects and it has also been my preference for a while. It makes the code look nicer, even though the two approaches are equivalent: context bound constraints get translated into implicit values at compile time.
Context bound
def p1[F[_]: Applicative: Console]: F[Unit] =
  Console[F].putStrLn("a") *>
    Console[F].putStrLn("b") *>
    Console[F].putStrLn("c")
Implicit values
def p2[F[_]](implicit ev: Applicative[F], c: Console[F]): F[Unit] =
  c.putStrLn("a") *>
    c.putStrLn("b") *>
    c.putStrLn("c")
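For reference, the Console capability used throughout is just a small tagless-final algebra. Here's a minimal sketch of what the examples assume (the exact definition in the repository may differ slightly):
trait Console[F[_]] {
  def putStrLn(a: Any): F[Unit]
}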
Every time we call Console[F], we are invoking the “summoner” method, normally defined as follows:
object Console {
  def apply[F[_]](implicit ev: Console[F]): Console[F] = ev
}
So the implicit value gets resolved at compile time only once, but I got asked a very good question:
Is there any performance penalty introduced by the summoner?
And I thought I knew the answer… but I wasn't completely sure, so it was time to examine the JVM bytecode and see the differences!
JVM bytecode
Here's the bytecode (as shown by javap) generated for both p1 and p2. Let's take a look at the differences.
Context bound program
public <F extends java.lang.Object> F p1(cats.Applicative<F>, com.github.gvolpe.Console<F>);
descriptor: (Lcats/Applicative;Lcom/github/gvolpe/Console;)Ljava/lang/Object;
flags: ACC_PUBLIC
Code:
stack=4, locals=3, args_size=3
0: getstatic #56 // Field cats/implicits$.MODULE$:Lcats/implicits$;
3: getstatic #56 // Field cats/implicits$.MODULE$:Lcats/implicits$;
6: getstatic #61 // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
9: aload_2
10: invokevirtual #65 // Method com/github/gvolpe/Console$.apply:(Lcom/github/gvolpe/Console;)Lcom/github/gvolpe/Console;
13: ldc #67 // String a
15: invokeinterface #73, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
20: aload_1
21: invokevirtual #77 // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
24: getstatic #61 // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
27: aload_2
28: invokevirtual #65 // Method com/github/gvolpe/Console$.apply:(Lcom/github/gvolpe/Console;)Lcom/github/gvolpe/Console;
31: ldc #79 // String b
33: invokeinterface #73, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
38: invokeinterface #82, 2 // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
43: aload_1
44: invokevirtual #77 // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
47: getstatic #61 // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
50: aload_2
51: invokevirtual #65 // Method com/github/gvolpe/Console$.apply:(Lcom/github/gvolpe/Console;)Lcom/github/gvolpe/Console;
54: ldc #84 // String c
56: invokeinterface #73, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
61: invokeinterface #82, 2 // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
66: areturn
LineNumberTable:
line 10: 0
line 11: 24
line 10: 43
line 12: 47
LocalVariableTable:
Start Length Slot Name Signature
0 67 0 this Lcom/github/gvolpe/summoner$;
0 67 1 evidence$1 Lcats/Applicative;
0 67 2 evidence$2 Lcom/github/gvolpe/Console;
Signature: #49 // <F:Ljava/lang/Object;>(Lcats/Applicative<TF;>;Lcom/github/gvolpe/Console<TF;>;)TF;
MethodParameters:
Name Flags
evidence$1 final
evidence$2 final
Implicit values program
public <F extends java.lang.Object> F p2(cats.Applicative<F>, com.github.gvolpe.Console<F>);
descriptor: (Lcats/Applicative;Lcom/github/gvolpe/Console;)Ljava/lang/Object;
flags: ACC_PUBLIC
Code:
stack=4, locals=3, args_size=3
0: getstatic #56 // Field cats/implicits$.MODULE$:Lcats/implicits$;
3: getstatic #56 // Field cats/implicits$.MODULE$:Lcats/implicits$;
6: aload_2
7: ldc #90 // String 1
9: invokeinterface #73, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
14: aload_1
15: invokevirtual #77 // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
18: aload_2
19: ldc #92 // String 2
21: invokeinterface #73, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
26: invokeinterface #82, 2 // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
31: aload_1
32: invokevirtual #77 // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
35: aload_2
36: ldc #94 // String 3
38: invokeinterface #73, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
43: invokeinterface #82, 2 // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
48: areturn
LineNumberTable:
line 15: 0
line 16: 18
line 15: 31
line 17: 35
LocalVariableTable:
Start Length Slot Name Signature
0 49 0 this Lcom/github/gvolpe/summoner$;
0 49 1 evidence$3 Lcats/Applicative;
0 49 2 c Lcom/github/gvolpe/Console;
Signature: #49 // <F:Ljava/lang/Object;>(Lcats/Applicative<TF;>;Lcom/github/gvolpe/Console<TF;>;)TF;
MethodParameters:
Name Flags
ev final
c final
So the bytecode generated for the context bound approach has a few extra calls to getstatic and invokevirtual, but what does this actually mean? Here are the definitions given by Wikipedia:
getstatic: get a static field value of a class, where the field is identified by a field reference in the constant pool index (indexbyte1 << 8 + indexbyte2)
invokevirtual: invoke a virtual method on object objectref and put the result on the stack (might be void); the method is identified by a method reference index in the constant pool (indexbyte1 << 8 + indexbyte2)
So, is it slower? How can we know? There’s only one way…
Benchmark it all!
When you are not sure about a performance question or issue, benchmark your code. Benchmarking is not easy, but fortunately on the JVM we have a fantastic tool: the Java Microbenchmark Harness, or JMH for short.
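The jmh:run task used below comes from the sbt-jmh plugin; wiring it into a project looks roughly like this (the version shown is illustrative, pick a recent release):
// project/plugins.sbt
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.3.4")

// build.sbt
enablePlugins(JmhPlugin)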
Here are the simple benchmarks I wrote, calling each method a thousand times via replicateA and measuring the throughput:
import cats.Id
import cats.implicits._
import org.openjdk.jmh.annotations._

class benchmarks {

  @Benchmark
  @BenchmarkMode(Array(Mode.Throughput))
  def contextBoundSummoner(): Unit = p1[Id].replicateA(1000).void

  @Benchmark
  @BenchmarkMode(Array(Mode.Throughput))
  def evidenceSummoner(): Unit = p2[Id].replicateA(1000).void

}
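Running the programs with F = Id requires an implicit Console[Id] instance in scope. A no-op instance along these lines would do for benchmarking purposes (this is an assumption; the instance in the actual repository may be defined differently):
implicit val consoleId: Console[Id] = new Console[Id] {
  def putStrLn(a: Any): Id[Unit] = () // discard the output, we only care about the summoning overhead
}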
And these are the results, running with 20 iterations, 5 warm-up iterations, 1 fork and 1 thread:
sbt> jmh:run -i 20 -wi 5 -f1 -t1
[info] Benchmark Mode Cnt Score Error Units
[info] contextBoundSummoner thrpt 20 15777.375 ± 593.111 ops/s
[info] evidenceSummoner thrpt 20 17302.136 ± 442.127 ops/s
Conclusion
The difference is small enough not to be a performance concern, so I would still recommend the context bound approach, but remember to benchmark and thoroughly analyze your code before jumping to conclusions!
Update
After publishing this post on Twitter I got great feedback and some suggestions, so here's an update.
Macro-based summoner: imp
Chris Birchall shared this interesting macro-based project named imp, by Erik Osheim, and I ran the same analysis with it.
First of all, I changed the summoner accordingly to use the summon macro.
object Console {
  import imp.summon
  import language.experimental.macros

  def apply[F[_]: Console]: Console[F] = macro summon[Console[F]]
}
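If you want to try this at home, imp is only needed at compile time, so it can be pulled in with provided scope, roughly like this (coordinates and version are illustrative, double-check the imp project page):
// build.sbt
libraryDependencies += "org.spire-math" %% "imp" % "0.3.1" % "provided"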
And here are the results of the benchmarks, running 20 iterations like before:
sbt> jmh:run -i 20 -wi 5 -f1 -t1
[info] Benchmark Mode Cnt Score Error Units
[info] contextBoundSummoner thrpt 20 14881.788 ± 626.458 ops/s
[info] evidenceSummoner thrpt 20 15039.118 ± 411.016 ops/s
The macro-based solution is as fast as it claims to be: the scores are now almost identical, and that's because the JVM bytecode generated for both methods is exactly the same! Indeed, running the benchmarks a few more times gives similar results, and sometimes the winner is the classic evidenceSummoner. So we can safely claim that both methods, p1 and p2, are now exactly the same as far as the JVM is concerned.
FWIW, someone else has run benchmarks on imp before; they're slightly different, though.
@inline final
Pavel Khamutou suggested adding the @inline annotation to the summoner, and I also made it final, so here's how it looks:
object Console {
  @inline final def apply[F[_]](implicit ev: Console[F]): Console[F] = ev
}
Unfortunately, the generated JVM bytecode was the same as without the annotation, so the benchmark results were very similar to the first ones.
-opt:l:inline & -opt-inline-from:** compiler flags
Kaidax suggested turning on the inliner compiler flags, as described in this Lightbend blog post. At first I didn't see any change, but after /u/zzyzzyxx pointed out on Reddit that I was doing it wrong (thanks!), I tried once again and the bytecode did change.
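For reference, these are the two flags from the section title as they would appear in a Scala 2.12 build.sbt (the broad ** pattern allows inlining from anywhere on the classpath; narrow it down for real projects):
// build.sbt
scalacOptions ++= Seq(
  "-opt:l:inline",       // enable the Scala 2.12 optimizer's inliner
  "-opt-inline-from:**"  // allow inlining from any class on the classpath
)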
The invokevirtual calls to the summoner (Console$.apply) have been removed, and a few extra instructions (null checks on the inlined module reference) have been added.
public <F extends java.lang.Object> F p1(cats.Applicative<F>, com.github.gvolpe.Console<F>);
descriptor: (Lcats/Applicative;Lcom/github/gvolpe/Console;)Ljava/lang/Object;
flags: ACC_PUBLIC
Code:
stack=4, locals=3, args_size=3
0: getstatic #56 // Field cats/implicits$.MODULE$:Lcats/implicits$;
3: getstatic #56 // Field cats/implicits$.MODULE$:Lcats/implicits$;
6: getstatic #61 // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
9: ifnonnull 14
12: aconst_null
13: athrow
14: aload_2
15: ldc #63 // String a
17: invokeinterface #69, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
22: aload_1
23: invokevirtual #73 // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
26: getstatic #61 // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
29: ifnonnull 34
32: aconst_null
33: athrow
34: aload_2
35: ldc #75 // String b
37: invokeinterface #69, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
42: invokeinterface #78, 2 // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
47: aload_1
48: invokevirtual #73 // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
51: getstatic #61 // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
54: ifnonnull 59
57: aconst_null
58: athrow
59: aload_2
60: ldc #80 // String c
62: invokeinterface #69, 2 // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
67: invokeinterface #78, 2 // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
72: areturn
StackMapTable: number_of_entries = 3
frame_type = 255 /* full_frame */
offset_delta = 14
locals = [ class com/github/gvolpe/summoner$, class cats/Applicative, class com/github/gvolpe/Console ]
stack = [ class cats/implicits$, class cats/implicits$ ]
frame_type = 255 /* full_frame */
offset_delta = 19
locals = [ class com/github/gvolpe/summoner$, class cats/Applicative, class com/github/gvolpe/Console ]
stack = [ class cats/implicits$, class cats/Apply$Ops ]
frame_type = 88 /* same_locals_1_stack_item */
stack = [ class cats/Apply$Ops ]
LineNumberTable:
line 10: 0
line 34: 14
line 10: 14
line 11: 26
line 34: 34
line 11: 34
line 10: 47
line 12: 51
line 34: 59
line 12: 59
LocalVariableTable:
Start Length Slot Name Signature
0 73 0 this Lcom/github/gvolpe/summoner$;
0 73 1 evidence$1 Lcats/Applicative;
0 73 2 evidence$2 Lcom/github/gvolpe/Console;
Signature: #49 // <F:Ljava/lang/Object;>(Lcats/Applicative<TF;>;Lcom/github/gvolpe/Console<TF;>;)TF;
MethodParameters:
Name Flags
evidence$1 final
evidence$2 final
The benchmark results show that it has indeed been optimized:
sbt> jmh:run -i 20 -wi 5 -f1 -t1
[info] Benchmark Mode Cnt Score Error Units
[info] contextBoundSummoner thrpt 20 16330.873 ± 462.765 ops/s
[info] evidenceSummoner thrpt 20 15768.175 ± 587.291 ops/s
Benchmarking machine
The benchmarks were run on an Ubuntu 18.04 LTS machine with 16 GB RAM and an Intel® Core™ i7-8550U CPU @ 1.80GHz × 8, running Oracle™ Java 8:
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Source code
Try it out yourself: https://github.com/gvolpe/summoner-benchmarks
Conclusion #2
The conclusion remains the same: context bound constraints are my favorite and, as demonstrated, they have very little overhead. The macro-based solution is interesting, but if you really care about that level of performance, maybe the JVM isn't what you're looking for? :)
Thank you all for your amazing feedback!