Guava + Hadoop: the Bad Combination

...that is saved with a shade of green

Posted by Tariq Abughofa on October 15, 2019 · 3 mins read

The Guava library is a well-known Java library developed by Google and used in many large projects such as Hadoop, Spark, the Elasticsearch driver, and many others.

If you have used Spark or Hadoop, there is a good chance that you have run into trouble with the Guava library. If any of your dependencies depends on Guava, you might hit a run-time error that looks something like this:

java.lang.NoSuchMethodError: com.google.common.util.concurrent.RateLimiter.acquire(I)D

This looks like a method that somehow disappeared from the Guava library. What actually happened is that a different version of Guava (coming from Spark/Hadoop) took precedence over the version your dependency was built against, and in that version the method signature (in our case that of acquire) is different.

If you look at the dependency tree of Spark (in our case v2.4.4), you see a list that looks like this:
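The tail of the error message, `(I)D`, is a JVM method descriptor: one `int` parameter and a `double` return value. The JVM links a call by method name plus this full descriptor, so a Guava version whose acquire has a different parameter or return type will not match. A minimal sketch of decoding such a descriptor with the JDK's `java.lang.invoke.MethodType` (the class name `DescriptorDemo` and the helper `describe` are made up for illustration):

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Turns a JVM method descriptor such as "(I)D" into a readable signature.
    static String describe(String descriptor) {
        // A null loader means "use the system class loader";
        // primitive types such as int and double need no loading at all.
        MethodType type = MethodType.fromMethodDescriptorString(descriptor, null);
        return type.toString();
    }

    public static void main(String[] args) {
        // Descriptor copied from the NoSuchMethodError above.
        System.out.println(describe("(I)D")); // prints "(int)double"
    }
}
```

Reading the descriptor this way tells you exactly which overload the calling code was compiled against, which you can then compare with the Guava version that ends up on the run-time classpath.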

+- org.apache.spark:spark-core_2.11:jar:2.4.4:compile
...
|  +- org.apache.curator:curator-recipes:jar:2.6.0:compile
|  |  +- org.apache.curator:curator-framework:jar:2.6.0:compile
|  |  \- com.google.guava:guava:jar:16.0.1:compile

The dependency tree shows that Spark depends on Guava 16.0.1, which is a very old version of the library (the latest released version at the time of writing this article is 28.1). Any dependency you use is likely to use a newer version and hence produce a conflict at run-time. You can always go to your Maven dependency list and add something like:

<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>28.1-jre</version>
</dependency>

This tells Maven to use the right version, but it won’t work. At run-time, Spark will override this version with the one it uses. How can we fix this dependency conflict then? The shade plugin comes to the rescue here. When you “shade” a library, the plugin rewrites the package namespace in every usage of the shaded library to a different pattern, makes the same change inside the shaded library itself, and includes the shaded library within the produced jar. All you have to do is add these lines to the plugins section of your Maven pom.xml file:

<plugin>
   <groupId>org.apache.maven.plugins</groupId>
   <artifactId>maven-shade-plugin</artifactId>
   <version>3.2.1</version>
   <executions>
     <execution>
       <phase>package</phase>
       <goals>
         <goal>shade</goal>
       </goals>
       <configuration>
         <relocations>
           <relocation>
             <pattern>com.google.common</pattern>
             <shadedPattern>shaded.com.google.common</shadedPattern>
           </relocation>
         </relocations>
         <artifactSet>
           <includes>
             <include>com.google.guava:guava</include>
           </includes>
         </artifactSet>
       </configuration>
     </execution>
   </executions>
</plugin>

This addition tells the shade plugin to relocate the package pattern com.google.common within Guava to shaded.com.google.common. You can replace this pattern with whichever package is causing your conflict (in our example above the conflict was in com.google.common.util.concurrent.RateLimiter).

After packaging, you will find that the jar is significantly larger, since the relocated classes are now included in it instead of being provided at run-time like the other dependencies. It is a small trade-off for getting rid of the conflict.
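To make the relocation rule concrete, here is a toy model of the name mapping only. It is not how the shade plugin is implemented (the plugin rewrites compiled class files); the class `RelocationSketch` and its `relocate` method are hypothetical and simply mirror the `pattern` and `shadedPattern` values from the configuration above.

```java
public class RelocationSketch {
    // Maps a fully qualified class name to its relocated name when it
    // lives under `pattern`; any other name is returned unchanged.
    static String relocate(String className, String pattern, String shadedPattern) {
        boolean underPattern =
                className.equals(pattern) || className.startsWith(pattern + ".");
        if (underPattern) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className;
    }

    public static void main(String[] args) {
        String pattern = "com.google.common";
        String shaded = "shaded.com.google.common";

        // A Guava class is moved under the shaded prefix.
        System.out.println(relocate(
                "com.google.common.util.concurrent.RateLimiter", pattern, shaded));

        // A class outside the pattern is left alone.
        System.out.println(relocate("org.apache.spark.SparkContext", pattern, shaded));
    }
}
```

Because both the Guava classes and every reference to them inside your jar get the new prefix, your code no longer shares a class name with the Guava copy that Spark brings along.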
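If you want to confirm which jar a class is actually loaded from at run-time, one option is to ask the JVM for the class's code source. The sketch below uses only JDK calls; the class name `ClassOrigin` and the helper `whereIs` are made up for this example, and the shaded class name assumes the `shadedPattern` used above.

```java
import java.security.CodeSource;

public class ClassOrigin {
    // Returns the location (usually a jar URL) a class was loaded from,
    // or a short note when the class is missing or comes from the JDK itself.
    static String whereIs(String className) {
        try {
            // initialize=false: we only want to locate the class, not run its static init.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null) {
                return "JDK (no code source)";
            }
            return String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // The original Guava class: on a cluster this may point at the Spark/Hadoop copy.
        System.out.println(whereIs("com.google.common.util.concurrent.RateLimiter"));
        // The relocated class: expected to point at your own shaded jar, if shading worked.
        System.out.println(whereIs("shaded.com.google.common.util.concurrent.RateLimiter"));
    }
}
```

Running this inside the same job that failed is the most useful check, since the classpath on your laptop and on the cluster can differ.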

