Tuesday, January 21, 2014

Java - speed up high throughput String processing

In one of my projects I had to process huge amounts of textual data. Millions of strings per second were read from input files, processed (split, compared, mapped), then concatenated and written to an output file.

Obviously I used StringBuilder with an appropriate initial size, buffered reading and writing, and all the other standard Java tools for fast string processing (if you are aware of a better way to do this, please do let me know). Still, I was not satisfied with the performance: GC was kicking in too often, even though I had minimized unnecessary object creation.

Then I had an idea: what if I reuse StringBuilder objects instead of creating them hundreds of thousands of times per second just to perform string concatenation before writing output? I searched the Internet first, just to check whether someone else had done it. Naturally, I found many debates about whether it is good programming practice or not, whether it will confuse the hell out of the JIT, etc. I decided I had to try it myself.

After making this small change, the throughput of my application increased by 30%, GC cycles were shorter, and CPU usage was lower.

Even though it looks like bad programming practice, it helps.
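For illustration, here is a minimal sketch of what reusing a builder can look like. It is not the code from my project: the class name RecordJoiner, the join method and the initial capacity of 256 are made up for this example. The only point is that setLength(0) resets the builder without throwing away its internal buffer, so each record costs one new String instead of a new StringBuilder plus a new String.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, invented for this example.
final class RecordJoiner {
    // One builder per instance, reused for every record.
    // Not thread-safe: use one RecordJoiner per thread.
    private final StringBuilder buffer = new StringBuilder(256);

    String join(List<String> fields, char separator) {
        buffer.setLength(0); // reset the length, keep the allocated buffer
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                buffer.append(separator);
            }
            buffer.append(fields.get(i));
        }
        return buffer.toString(); // copies the characters into a new String
    }

    public static void main(String[] args) {
        RecordJoiner joiner = new RecordJoiner();
        System.out.println(joiner.join(Arrays.asList("id", "name", "city"), ';')); // id;name;city
        System.out.println(joiner.join(Arrays.asList("1", "Ann"), ';'));           // 1;Ann
    }
}
```

Keep in mind that a reused builder never shrinks on its own, so one unusually long record leaves a large buffer behind for the lifetime of the joiner.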

Saturday, January 11, 2014

Infinispan (6.0.0.Final) and putAll performance

In one of my projects I had to load over a million key-value pairs into an Infinispan cache at application startup (local cache, no transactions).

At first I used individual put(K,V) invocations, and populating the cache took around 2 minutes. I tried to find out online whether the putAll(Map<K,V>) method should be faster than individual put invocations, but could not find any information in the Infinispan documentation or elsewhere.

After switching from put() to putAll(), I was able to load the same pairs in about 20 seconds.

So, in case you are interested, putAll() is faster for inserts than individual put() invocations, at least for a local, non-transactional cache in 6.0.0.Final. I guess it would be great if this was clearly documented.
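If you do not want to hold all pairs in one giant temporary map, a variation is to feed putAll() in chunks. The sketch below is only an illustration and not what I measured above: BulkLoader, loadInChunks and the chunk size are hypothetical names and values. It is written against java.util.Map (Infinispan's Cache interface extends ConcurrentMap), and a ConcurrentHashMap stands in for the cache so the example runs without Infinispan on the classpath.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, invented for this example.
final class BulkLoader {
    // Copies all entries from source into cache, calling putAll once per chunk.
    static <K, V> void loadInChunks(Map<K, V> cache,
                                    Iterable<Map.Entry<K, V>> source,
                                    int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Map<K, V> chunk = new HashMap<>();
        for (Map.Entry<K, V> entry : source) {
            chunk.put(entry.getKey(), entry.getValue());
            if (chunk.size() >= chunkSize) {
                cache.putAll(chunk); // one bulk call instead of chunkSize put() calls
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            cache.putAll(chunk); // flush the last, partially filled chunk
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> pairs = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) {
            pairs.put("key" + i, i);
        }
        // Stand-in for the real cache (assumption: any java.util.Map works here).
        Map<String, Integer> cache = new ConcurrentHashMap<>();
        loadInChunks(cache, pairs.entrySet(), 4);
        System.out.println(cache.size()); // 10
    }
}
```

Whether chunking is faster or slower than one big putAll() will depend on your data and configuration, so measure it before relying on it.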

Monday, January 6, 2014

BoneCP and Oracle JDBC program name (v$session.program)

For some reason the Oracle JDBC driver (thin) does not support client info, so it is not possible to set v$session.program that way. I struggled for a good half hour to figure out how to set v$session.program on Oracle 11g using the BoneCP (version 0.8.0.RELEASE) connection pool.

Here is what worked for me:

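import java.util.Properties;
import com.jolbox.bonecp.BoneCPDataSource;

BoneCPDataSource dataSource = new BoneCPDataSource();
Properties clientInfoProperties = new Properties();
// BoneCP hands these properties to the JDBC driver when it opens connections
clientInfoProperties.put("v$session.program", "SomeProgramNamePassedToOracle");
dataSource.setDriverProperties(clientInfoProperties);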

and now if you do

select program from v$session;

in sqlplus you will be able to identify your session easily.

enjoy