Direct Bulk FloatBuffer.put is slow

From Android Game Development Wiki

The Problem

When working with OpenGL ES on Android, developers rely heavily on direct ByteBuffers to transfer data to the GPU. The most common use is uploading vertex data, which is mostly composed of float values that encode vertex attributes such as position, color or texture coordinates.

A common scenario is a dynamic Vertex Array or Vertex Buffer Object that gets updated each frame with new vertex data. One would usually assemble the vertex data in a float[] array first, then transfer it to a direct ByteBuffer by calling FloatBuffer.put(array) on a FloatBuffer view of that direct ByteBuffer.

// setup code
float[] vertexData = ...
// Float.SIZE is in bits, so length * Float.SIZE / 8 is the size in bytes
ByteBuffer byteBuffer = ByteBuffer.allocateDirect( vertexData.length * Float.SIZE / 8 );
byteBuffer.order( ByteOrder.nativeOrder() );
FloatBuffer floatBuffer = byteBuffer.asFloatBuffer();
 
// executed each frame
... manipulate vertexData array ...
floatBuffer.clear();   // reset the position, otherwise the second put() overflows the buffer
floatBuffer.put( vertexData );
gl.glColorPointer( 4, GL10.GL_FLOAT, 16, byteBuffer );

The call to floatBuffer.put() takes considerable time. It turns out to be no faster than iterating over all the floats in the array and inserting each one individually with the non-bulk FloatBuffer.put() method. The reason is that the bulk FloatBuffer.put() implementation currently used on Android, which is taken from the Apache Harmony project, itself iterates over the passed array.
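To make the comparison concrete, here is a minimal sketch that fills one direct buffer with the bulk put and a second one with a hand-written element loop, then checks that both end up with the same contents. The class and method names (BulkPutEquivalence, fillBulk, fillLoop, sameContents) are made up for this illustration. This is not the Harmony source, only a picture of what "iterating over the passed array" amounts to.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class BulkPutEquivalence {
    // Allocates a direct, native-order buffer and returns a float view of it.
    static FloatBuffer newDirectFloatBuffer(int numFloats) {
        return ByteBuffer.allocateDirect(numFloats * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Bulk variant: one call, but on the affected Android versions it
    // reportedly loops over the array internally.
    static void fillBulk(FloatBuffer dst, float[] src) {
        dst.clear();
        dst.put(src);
        dst.flip();
    }

    // Non-bulk variant: the explicit loop the bulk call is said to be equivalent to.
    static void fillLoop(FloatBuffer dst, float[] src) {
        dst.clear();
        for (int i = 0; i < src.length; i++) {
            dst.put(src[i]);
        }
        dst.flip();
    }

    // Fills two buffers, one per variant, and compares their contents.
    static boolean sameContents(int numFloats) {
        float[] src = new float[numFloats];
        for (int i = 0; i < numFloats; i++) {
            src[i] = i * 0.5f;
        }
        FloatBuffer a = newDirectFloatBuffer(numFloats);
        FloatBuffer b = newDirectFloatBuffer(numFloats);
        fillBulk(a, src);
        fillLoop(b, src);
        return a.equals(b); // compares the remaining elements of both buffers
    }
}
```

Both variants produce identical buffers, so on an affected device the bulk call buys nothing over the loop.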

The problem was first reported on the Android bug tracker at [1].

The problem does not affect the bulk put() methods of direct versions of ByteBuffer, ShortBuffer or IntBuffer!

The Solution

The first solution to the problem was created by the initial bug reporter. Noticing that IntBuffer.put( int[] src ) was not affected by the problem, he devised a method based on converting the float array to an int array via Float.floatToIntBits(). He also created a diagram depicting the performance difference between FloatBuffer.put() in bulk and non-bulk mode, IntBuffer.put() and his conversion method.

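Figure (Bulkput.png): performance of FloatBuffer.put() in bulk and non-bulk mode, IntBuffer.put() and the conversion method.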

The following code snippet demonstrates the conversion method:

ByteBuffer buffer = ByteBuffer.allocateDirect( 1024*1024 * Float.SIZE / 8 );
buffer.order(ByteOrder.nativeOrder());                  
IntBuffer intBuffer = buffer.asIntBuffer();
 
float[] floatArray = new float[1024*1024];
int[] intArray = new int[1024*1024];
 
for( int i = 0; i < floatArray.length; i++ )
	intArray[i] = Float.floatToIntBits(floatArray[i]);
intBuffer.put(intArray);
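In a per-frame setting one would probably not want to allocate the int array on every call. The following sketch wraps the conversion method in a small helper that keeps a scratch array around. The class name FloatToIntCopier and its methods are hypothetical. It also swaps in Float.floatToRawIntBits() for Float.floatToIntBits(): the raw variant copies the bit pattern unchanged instead of collapsing NaN values to one canonical pattern, which seems the better fit for a plain memory transfer, but that choice is an assumption and not part of the original method.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

final class FloatToIntCopier {
    // Scratch array reused between calls; it only grows when needed.
    private int[] scratch = new int[0];

    // Copies count floats from src (starting at offset) to the start of dst.
    // dst is expected to be an int view of a direct, native-order ByteBuffer.
    void copy(float[] src, int offset, int count, IntBuffer dst) {
        if (scratch.length < count) {
            scratch = new int[count];
        }
        for (int i = 0; i < count; i++) {
            scratch[i] = Float.floatToRawIntBits(src[offset + i]);
        }
        dst.clear();
        dst.put(scratch, 0, count); // the bulk int put is not affected by the problem
        dst.flip();
    }

    // Copies values through the helper and reads them back as floats.
    static boolean roundTrips(float[] values) {
        ByteBuffer bytes = ByteBuffer.allocateDirect(values.length * 4)
                .order(ByteOrder.nativeOrder());
        new FloatToIntCopier().copy(values, 0, values.length, bytes.asIntBuffer());
        FloatBuffer floats = bytes.asFloatBuffer();
        for (int i = 0; i < values.length; i++) {
            if (Float.floatToRawIntBits(floats.get(i)) != Float.floatToRawIntBits(values[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Because the int view and the float view share the same underlying bytes, the data can be handed to OpenGL ES through the ByteBuffer exactly as before.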

Even faster with JNI

Another method uses native code to get an even greater speedup. The following class and native code should be used with care though as no error checking is performed.

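// Java side. The package is implied by the name of the native function below.
package com.badlogic.gdx.utils;

import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

public class BufferUtils
{
   /**
    * Copies numFloats floats from the src array, starting at offset,
    * to the start of the given buffer. The buffer is assumed to be a
    * direct buffer. The method will crash if that is not the case!
    * The position of the buffer is set to 0 and the limit is set
    * according to numFloats, depending on the buffer type: for a
    * FloatBuffer it is numFloats, for a ByteBuffer it is numFloats * 4.
    * This will only work with ByteBuffers or FloatBuffers!
    * 
    * @param src the source array
    * @param dst the buffer, must be either a ByteBuffer or a FloatBuffer
    * @param numFloats the number of floats to copy from src
    * @param offset the offset in floats from which to start copying from src
    */
   public static void copy( float[] src, Buffer dst, int numFloats, int offset )
   {
      copyJni( src, dst, numFloats, offset );
      dst.position(0);
 
      if( dst instanceof ByteBuffer )
         dst.limit(numFloats << 2);
      else if( dst instanceof FloatBuffer )
         dst.limit(numFloats);
   }

   // Implemented by the native function below. The shared library
   // containing it has to be loaded before the first call.
   private native static void copyJni( float[] src, Buffer dst, int numFloats, int offset );
}

// Native side (C++)
#include <jni.h>
#include <string.h>

// extern "C" keeps the symbol name unmangled so the VM can find it
extern "C" JNIEXPORT void JNICALL Java_com_badlogic_gdx_utils_BufferUtils_copyJni___3FLjava_nio_Buffer_2II
  (JNIEnv *env, jclass, jfloatArray src, jobject dst, jint numFloats, jint offset )
{
   float* pDst = (float*)env->GetDirectBufferAddress( dst );
   float* pSrc = (float*)env->GetPrimitiveArrayCritical(src, 0);
 
   // pSrc is a float*, so the offset is added in floats, not bytes;
   // only the byte count passed to memcpy is scaled by 4
   memcpy( pDst, pSrc + offset, numFloats << 2 );
 
   env->ReleasePrimitiveArrayCritical(src, pSrc, 0);
}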
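Since the native copy performs no error checking, one option is to validate the arguments on the Java side before handing them to native code. The sketch below shows what such a check could look like. The class CopyArgs and its check method are hypothetical and not part of the code above; they only encode the preconditions stated in the documentation comment (direct buffer, ByteBuffer or FloatBuffer, enough room on both sides).

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

final class CopyArgs {
    // Throws if a native copy with these arguments would read or write
    // out of bounds, or if dst is not a direct ByteBuffer or FloatBuffer.
    static void check(float[] src, Buffer dst, int numFloats, int offset) {
        if (src == null || dst == null) {
            throw new NullPointerException("src and dst must not be null");
        }

        boolean direct;
        int capacityInFloats;
        if (dst instanceof ByteBuffer) {
            direct = ((ByteBuffer) dst).isDirect();
            capacityInFloats = dst.capacity() / 4; // capacity is in bytes
        } else if (dst instanceof FloatBuffer) {
            direct = ((FloatBuffer) dst).isDirect();
            capacityInFloats = dst.capacity();     // capacity is in floats
        } else {
            throw new IllegalArgumentException("dst must be a ByteBuffer or a FloatBuffer");
        }
        if (!direct) {
            throw new IllegalArgumentException("dst must be a direct buffer");
        }

        // Written as a subtraction so that offset + numFloats cannot overflow.
        if (numFloats < 0 || offset < 0 || numFloats > src.length - offset) {
            throw new IndexOutOfBoundsException("offset/numFloats outside of src");
        }
        if (numFloats > capacityInFloats) {
            throw new IllegalArgumentException("dst is too small for numFloats floats");
        }
    }
}
```

Calling CopyArgs.check( src, dst, numFloats, offset ) as the first statement of BufferUtils.copy() would turn the crashes described above into ordinary exceptions, at the cost of a few comparisons per call.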

This method was benchmarked against the four methods shown in the diagram above. The source code can be found at http://code.google.com/p/libgdx/source/browse/trunk/gdx-tests-android/src/com/badlogic/gdx/MicroBenchmarks.java. Here are the timings for the benchmark executed on an HTC Hero with Android 1.5:

  • FloatBuffer.put( float value ): 40.774 secs
  • FloatBuffer.put( int index, float value ): 42.710 secs
  • FloatBuffer.put( float[] values ): 41.109 secs
  • IntBuffer.put( int[] values ): 12.59 secs
  • BufferUtils.copy(): 0.14 secs

The same benchmark produced the following timings on a Nexus One with Android 2.2:

  • FloatBuffer.put( float value ): 6.876 secs
  • FloatBuffer.put( int index, float value ): 7.006 secs
  • FloatBuffer.put( float[] values ): 6.800 secs
  • IntBuffer.put( int[] values ): 1.479 secs
  • BufferUtils.copy(): 0.067 secs

As stated on the bug tracker, this problem will be solved in the Gingerbread release of Android (Android 2.3). On older Android versions, one of the above workarounds has to be used.

Sample Code

The Android Game Development Wiki samples project contains a very simple benchmark that demonstrates the run-time of the different methods. You can find the source code at http://code.google.com/p/agd-wiki-samples/source/browse/trunk/src/com/badlogic/agdwikisamples/BulkPutBenchmark.java. The code tests how long each method takes to transfer 4 MB worth of float values from a float[] array to a direct ByteBuffer. Note that the benchmark uses a JNI function to perform the native memory copy. The JNI code is located at http://code.google.com/p/agd-wiki-samples/source/browse/trunk/jni/BulkPutBenchmark.cpp. The timings might fluctuate slightly over multiple runs, but the relative differences stay the same.
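For readers who only want to see the shape of such a measurement, here is a rough, pure-Java timing sketch. It is not the BulkPutBenchmark source linked above: the class name BulkPutTiming, its method names and the run count are assumptions made for this example, and it leaves out the JNI variant because that needs a native library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

final class BulkPutTiming {
    // Times `runs` transfers using one FloatBuffer.put(float) call per element.
    static long timeSingleFloat(FloatBuffer dst, float[] src, int runs) {
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            dst.clear();
            for (int i = 0; i < src.length; i++) {
                dst.put(src[i]);
            }
        }
        return System.nanoTime() - start;
    }

    // Times `runs` transfers using the bulk FloatBuffer.put(float[]).
    static long timeBulkFloat(FloatBuffer dst, float[] src, int runs) {
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            dst.clear();
            dst.put(src);
        }
        return System.nanoTime() - start;
    }

    // Times `runs` transfers using the float-to-int conversion method.
    // tmp must be at least as long as src.
    static long timeIntConversion(IntBuffer dst, float[] src, int[] tmp, int runs) {
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            dst.clear();
            for (int i = 0; i < src.length; i++) {
                tmp[i] = Float.floatToIntBits(src[i]);
            }
            dst.put(tmp, 0, src.length);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int numFloats = 1024 * 1024; // 4 MB worth of float values
        int runs = 10;               // arbitrary run count for this sketch
        float[] src = new float[numFloats];
        int[] tmp = new int[numFloats];

        ByteBuffer bytes = ByteBuffer.allocateDirect(numFloats * 4)
                .order(ByteOrder.nativeOrder());
        FloatBuffer floats = bytes.asFloatBuffer();
        IntBuffer ints = bytes.asIntBuffer();

        System.out.println("single put:     " + timeSingleFloat(floats, src, runs) / 1e6 + " ms");
        System.out.println("bulk float put: " + timeBulkFloat(floats, src, runs) / 1e6 + " ms");
        System.out.println("int conversion: " + timeIntConversion(ints, src, tmp, runs) / 1e6 + " ms");
    }
}
```

On a desktop JVM the three numbers will most likely look nothing like the device timings listed earlier, since the slow path is specific to the Android implementation. The sketch is only useful when run on the device in question.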

External Links

Bug report on the Android bug tracker
