Sunday, February 16, 2014

Working around the sun.nio.cs.FastCharsetProvider bottleneck

The CharsetProvider is the class that, given a String representation of a Charset, gives you the corresponding Charset object.
This is typically used when you call new String(bytes, "UTF-8") or Charset.forName("UTF-8").
Looking closer at the Charset class shows that it actually uses two levels of caches:

private static Charset lookup2(String charsetName) {
        Object[] a;
        if ((a = cache2) != null && charsetName.equals(a[0])) {
            cache2 = cache1;
            cache1 = a;
            return (Charset)a[1];
        }

        Charset cs;
        if ((cs = standardProvider.charsetForName(charsetName)) != null ||
            (cs = lookupExtendedCharset(charsetName))           != null ||
            (cs = lookupViaProviders(charsetName))              != null) {
            cache(charsetName, cs);
            return cs;
        }

        /* Only need to check the name if we didn't find a charset for it */
        checkName(charsetName);
        return null;
    }

and if it cannot find your charset in the cache, it will use the standardProvider, which is a sun.nio.cs.StandardCharsets that extends sun.nio.cs.FastCharsetProvider, whose implementation is synchronized, as you can see:

public final Charset charsetForName(String charsetName) {
        synchronized (this) {
            return lookup(canonicalize(charsetName));
        }
    }
So if you are unlucky and use more than two different encodings, you will enter this synchronized block and create a contention point in your application, as other people have discussed here and here, and also in this Java ticket.
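A rough sketch of how to trigger the problem (the class and method names below are illustrative, and this is not a rigorous benchmark): with three or more charset names in rotation, every Charset.forName() call misses the two-entry cache in lookup2 and falls through to the synchronized provider lookup.

```java
import java.nio.charset.Charset;
import java.util.concurrent.atomic.AtomicLong;

public class CharsetContentionDemo {

    /** Each thread cycles through the given charset names via Charset.forName(). */
    static long runLookups(final String[] names, final int threads, final int perThread)
            throws InterruptedException {
        final AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        // with 3+ names, every call misses Charset's two-entry
                        // cache and hits the synchronized provider lookup
                        Charset cs = Charset.forName(names[i % names.length]);
                        if (cs != null) {
                            total.incrementAndGet();
                        }
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        String[] names = { "UTF-8", "ISO-8859-1", "US-ASCII" };
        long start = System.nanoTime();
        long lookups = runLookups(names, 4, 100000);
        long ms = (System.nanoTime() - start) / 1000000L;
        System.out.println(lookups + " lookups in " + ms + " ms");
    }
}
```

Profiling a run like this under contention is what shows the threads blocked in FastCharsetProvider.charsetForName.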

To prevent this issue from happening, since Java 1.6 we can pass a Charset object directly in our code. But given all the libraries that you are using, you will have a hard time patching all of them, as mentioned in this very good post.
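A minimal sketch of that approach (class and method names here are illustrative): resolve the Charset once and reuse it, so the per-call forName() lookup disappears. On Java 7+ the java.nio.charset.StandardCharsets.UTF_8 constant can replace the forName() call entirely.

```java
import java.nio.charset.Charset;

public class CachedCharset {

    // resolved once at class load time; no per-call Charset.forName() lookup
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    public static byte[] encode(String s) {
        // Charset overload of getBytes(), available since Java 1.6
        return s.getBytes(UTF_8);
    }

    public static String decode(byte[] bytes) {
        // Charset overload of the String constructor, also since Java 1.6
        return new String(bytes, UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("no provider lookup here")));
    }
}
```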

Or, we could just patch Java at the source, then use whatever version of the libraries and of Java that we want, and apply this patch to old systems as well.

package sandbox;

import java.lang.reflect.Field;
import java.nio.charset.Charset;
import java.nio.charset.spi.CharsetProvider;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import com.google.common.collect.ImmutableMap;

import org.cliffc.high_scale_lib.NonBlockingHashMap;


/**
 * NonBlockingCharsetProvider to work around the contention point on
 * {@link CharsetProvider#charsetForName(String)}
 * 
 * @author Leo Lewis
 * @see java.nio.charset.spi.CharsetProvider
 * @see java.nio.charset.Charset
 */
public class NonBlockingCharsetProvider extends CharsetProvider {

 /** parent charset provider, used to populate the cache */
 private CharsetProvider parent;

 /** whether to populate the cache lazily */
 private boolean lazyInit;

 /** cache of charsets, keyed by name */
 private Map<String, Charset> cache;

 /**
  * @param parent
  *            parent charset provider
  * @param lazyInit
  *            if lazy init, populate the cache when the application needs a
  *            charset, otherwise populate it from the parent in the
  *            constructor. If lazy init, will use a ConcurrentMap as it might
  *            be changed and iterated concurrently, otherwise will use a
  *            Guava ImmutableMap
  */
 public NonBlockingCharsetProvider(final CharsetProvider parent, final boolean lazyInit) {
  this.parent = parent;
  this.lazyInit = lazyInit;
  if (!lazyInit) {
   Map<String, Charset> tmp = new HashMap<>();
   Iterator<Charset> it = parent.charsets();
   while (it.hasNext()) {
    Charset charset = it.next();
    tmp.put(charset.name(), charset);
   }
   cache = ImmutableMap.copyOf(tmp);
  } else {
   cache = new NonBlockingHashMap<>();
  }
 }

 public Charset charsetForName(final String name) {
  // if not lazyInit, the value should already be in the cache
  if (lazyInit && !cache.containsKey(name)) {
   // no lock here, so we might call the parent several times and put the
   // entry into the cache more than once; it doesn't matter, as the cache
   // will be populated eventually and we won't call the parent anymore
   Charset charset = parent.charsetForName(name);
   // NonBlockingHashMap does not accept null values, so only cache hits
   if (charset != null) {
    cache.put(name, charset);
   }
  }
  return cache.get(name);
 }

 public Iterator<Charset> charsets() {
  if (lazyInit) {
   return parent.charsets();
  }
  return cache.values().iterator();
 }

 /**
  * Saved so that we can reinstall or set up the provider several times
  */
 private static CharsetProvider standardProvider;

 /**
  * Replace the CharsetProvider in the Charset class by an instance of this
  * {@link NonBlockingCharsetProvider}
  * 
  * @param lazyInit
  *            see
  *            {@link NonBlockingCharsetProvider#NonBlockingCharsetProvider(CharsetProvider, boolean)}
  */
 public static void setUp(boolean lazyInit) throws Exception {
  Field field = Charset.class.getDeclaredField("standardProvider");
  field.setAccessible(true);
  if (standardProvider == null) {
   standardProvider = (CharsetProvider) field.get(null);
  }
  NonBlockingCharsetProvider nonBlocking = new NonBlockingCharsetProvider(standardProvider, lazyInit);
  field.set(null, nonBlocking);
 }

 /**
  * Restore the default Java provider
  * 
  * @throws Exception
  */
 public static void uninstall() throws Exception {
  if (standardProvider != null) {
   Field field = Charset.class.getDeclaredField("standardProvider");
   field.setAccessible(true);
   field.set(null, standardProvider);
  }
 }
}
Call NonBlockingCharsetProvider.setUp(lazyInit) to replace the Java provider, using reflection, by this non-blocking one.
It provides two modes: a lazy one that fetches a value from the parent when needed and puts it into a concurrent non-blocking hashmap (better than the standard ConcurrentHashMap), and a non-lazy one that fetches all the parent's values at initialization and serves them from a thread-safe Guava ImmutableMap. Performance is pretty close in both modes; the difference is whether you want to copy all the Charsets supported by the JRE into the cache, or only the ones your application actually uses.

Et voila!

Source code is on Github
Benchmark source as well