Renderscript

edit2: See more insightful comments by Jason Sams in the comments
edit: See insightful comments by Romain Guy

Today more details about Renderscript emerged, along with the 3.0 SDK. Naturally i was all giggly inside to get my hands on something official. I have no idea what to make of it. Note that i only read through the (sparse and incomplete) docs so far and looked over the samples provided in the new SDK. This is by no means a well-grounded, scientific analysis of Renderscript. Just some random thoughts, observations and first impressions.

  • Renderscript is a way to write native code that is either run on the CPU or GPU.
  • It is only available on Android 3.0 and thus far clearly targeted at tablets only. edit: Romain Guy pointed out that this is in fact wrong: it's been available since Eclair to some extent.
  • It can be used for either compute or graphical stuff. Why it is called Renderscript in that case is a little beyond me, but i guess it has historical reasons.
  • The language used is C99 with a couple of extensions, mainly stuff similar to what you can find in GLSL: matrices, vectors etc.
  • The Renderscripts are compiled to an intermediate format (probably LLVM IR? Couldn't check yet) which is then JIT compiled on the device the first time the Renderscript is invoked. If it is indeed Dalvik bytecode and the Dalvik JIT is used i wonder what benefit this approach has. If it's not Dalvik bytecode i wonder why there are now two JIT compilers. Given that Renderscript should also be able to target GPUs at some point (i doubt that's the case yet, the docs are unspecific about this…) i assume it's LLVM, which has a GLSL backend. edit: It's indeed LLVM based. Sweet. In any case it's pretty neat technically.
  • Structs and functions you define in your Renderscript are reflected to Java classes and methods (see the sketch after this list). It's a little involved for obvious reasons. Reminds me a lot of JNA. The Java bindings allow you to interact with the native Renderscript, passing data to it and issuing function invocations.
  • If the Renderscript is a graphical one you additionally have a special SurfaceView (RSSurfaceView) to which a specific method of your Renderscript (root()?) renders. There's a native API available to Renderscripts that is used for rendering. Additionally you can set the vertex and fragment shaders along with some states for blending and similar things. It's utterly confusing as it is a mixture of fixed-function and shader-based graphics programming.
  • The SDK 10 page says Renderscript has its own shading language, whereas the docs say GLSL is used in custom vertex/fragment programs. I guess whoever wrote the SDK 10 page considers Renderscript itself a shading language? In the end it just exposes a specialized graphics API on top of GLES 2.0 plus some intrinsic types for linear algebra.
  • There doesn’t seem to be any way to debug Renderscripts (and that includes the CPU backend as well, for the GPU it of course makes sense…).
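
To make the reflection point above a little more concrete, here's a rough sketch of what using the reflected bindings of a trivial compute script might look like, modeled after the HelloCompute sample in the SDK. The script name (mono.rs), the global gFactor, the bitmaps and the allocations are hypothetical, and the exact generated signatures may differ:

    // Hypothetical Java-side usage of the bindings reflected from a compute
    // script "mono.rs" that defines a float global gFactor and a per-element
    // root() kernel. Class and method names (ScriptC_mono, set_gFactor,
    // forEach_root) follow the reflection scheme of the SDK samples.
    RenderScript rs = RenderScript.create(context);
    ScriptC_mono script = new ScriptC_mono(rs, getResources(), R.raw.mono);

    Allocation in = Allocation.createFromBitmap(rs, srcBitmap,
            Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
    Allocation out = Allocation.createTyped(rs, in.getType());

    script.set_gFactor(0.7f);     // script global reflected as a setter
    script.forEach_root(in, out); // run root() over every element of 'in'
    out.copyTo(dstBitmap);        // copy the result back into a Bitmap

The nice part is that ScriptC_mono and its setters are generated for you at build time, which is what makes this feel a lot like JNA and a lot less like hand-written JNI.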

Seriously, i don't know what to make of it, and i don't just mean that i don't fully grasp it yet. The samples in the SDK are either trivial and not really poster children for when you should use Renderscript, or they beat you over the head with the heaviest stick they could find. The notion of being able to just drop .rs/.rsh files into my Java package structure to get native methods is great, as opposed to writing JNI bridges. How they want to pull off a GPU backend is a little beyond me, but i give them the benefit of the doubt. In its current state the C99 + extensions doesn't seem to translate well to the GPU (see OpenCL for a standardized solution). I don't mind though, running computationally heavy code on the CPU would make me a happy coder monkey just as well. So, here's a list of things that i think will make Renderscript only useful for a select few. Again, take this with a grain of salt, i probably don't know what i'm talking about:

  • The documentation is severely lacking at this point. From experience i know that documenting something you've worked on for months (years?) is damn hard, especially with an uninitiated target audience in mind. In their current state no normal application developer can hope to understand what's going on or how to assess whether to use Renderscript or not. Compute Renderscripts are probably a little easier to pick up. Graphical Renderscripts on the other hand demand such intimate knowledge of GLES 2.0 (at least that's my impression, seeing how you have to take care of vertex attributes, uniforms, constants, samplers etc.) that most app developers will probably never get to use that new shiny feature of Honeycomb.
  • The fixed-function/shader hybrid is really, really, really strange to someone who's used to writing straight GLES 1.x or 2.0. It's hard to connect the dots and relate the Renderscript graphics API to what will happen in the shader. While there are a shitton of helper classes and builders that should probably ease the pain, they make matters even worse. Documentation is pretty sparse in this case again. I don't consider myself an expert, but i think i know my way around GLES well enough to assess this issue.
  • There doesn't seem to be a way to debug things, even when running on the CPU. That will make it even harder for newcomers to get into Renderscript. The general trial-and-error, debug-the-hell-out-of-shit approach just won't work. It gets even worse when writing graphical Renderscripts, i assume. The classic "wtf, why is my screen black" expression will be found on a lot of faces, i guess. Not that this is any better when using straight GLES, of course. But the layer of obfuscation will make it hard to look up solutions to common problems.
  • I hope there will be a community around Renderscript. Again, i'm far from being an expert, but to build that community you either have to have a very accessible API/framework/programming model or top-notch documentation. At this point i don't see either of those two things. A community could somewhat lower the impact of the documentation issue. But it's really kind of a chicken-and-egg problem. Without good docs not a lot of people will become knowledgeable experts, and without such experts there can't be a community. Again, take this with a grain of salt. Reading through the docs and examples for a few hours certainly doesn't allow me to make an ultimate statement about this. It's just an impression i get from reading all this.
  • People will take the easiest path to achieve the result they want. With graphical Renderscripts that’s probably using the fixed-function shaders provided as defaults and only invoking the “simple” drawing functions of the native graphics API. Seeing things like this
    makes me really really nervous. Unless there's heavy batching performed it will be hard for these methods to let people achieve the performance they expect from a specialized DSL like Renderscript, which is advertised to solve performance problems with regards to rendering.
  • Hardware fragmentation with regards to GPUs. If the current situation is any indication then people will cry tears of anger after finding out that all the time taken to learn Renderscript did not pay off because some manufacturer couldn't get its act together and adhere to the GLES 2.0 spec. Since Renderscript is primarily targeted at tablets, and most of those seem to sport an Nvidia Tegra, i can imagine that it might be less of a problem. As soon as Renderscript enters the handset space it will be a different story. Any phone advertised as supporting GLES 2.0 becomes a target then.
  • So, i wonder what's the benefit of using Renderscript over custom native code written via the NDK in conjunction with GLES 2.0 (also exposed in the NDK). Ignoring the possibility for Renderscript to run on the GPU, i don't see any reason why you couldn't just create a standard Android app with a native layer doing all the heavy lifting. Benefits: debugging support, 3rd-party libraries, greater control over what's going on under the hood, less obfuscation. edit: derp, looks like Renderscript will automatically distribute computations over multiple cores. That's of course a plus! Of course, the graphics API and linear algebra intrinsics of Renderscript are very nice and make life easier (to some extent). But why not provide those to the NDK? Also, if you want easier access to native functions, why not make that work with the NDK in some way? Include the NDK toolchain with ADT. Get rid of JNI and brew your own thing. Make it easy for us to write general native code (including debugging).

In terms of compute operations i have one word: OpenCL. It's a standard. It's being adopted by ATI, Nvidia and even Apple (not sure about iOS, but i think i read about that somewhere). It has a few limitations of course, but overall it would be a valid alternative to compute Renderscripts.

All in all i think Renderscript is an impressive technical feat. But while the demos based on it are really nice (the YouTube app etc.) i just don't see regular app developers getting a hold of it. Grain of salt, yadda yadda. Go check out the examples. And hope that your emulator can render them :p

This makes me wonder: does the emulator now support GLES 2.0?

New Camera Classes in libgdx

I rewrote the OrthographicCamera and PerspectiveCamera classes this week for profit and success. The reason was that the old classes were c&p jobs from old projects of mine and were pretty nasty. I replaced the crap with new, shiny and easy-to-use classes and fixed a few bugs along the way.

So what is a camera? Here's what a camera might look like in a scene:

A camera is defined by a position in 3D space, a direction given as a unit length vector and its “up” vector, again given as a unit length vector. Imagine an arrow coming out of the top of your head pointing towards the sky. That’s the “up” vector. Now tilt your head to the left or right. Can you imagine how it changes direction? Together with the direction vector the “up” vector allows us to tell OpenGL how our camera is orientated. The position vector tells it where in our 3D world the camera is located.

Position and orientation are only one part of the puzzle. The second crucial attribute of a camera is its view frustum. In the image above you can see a pyramid with its top cut off (where the eye is). That's a view frustum. Anything inside this frustum will be visible on screen. The frustum is delimited by six so-called clipping planes: near, far, left, right, top and bottom. In the image above those are just the sides of the pyramid. The near clipping plane has a special role: you can think of it as the surface where the picture the camera takes is generated. This process, transforming a 3D point onto a 2D plane, is called projection. In general you'll only work with two types of projection: orthographic and perspective projection.

Orthographic projection is mostly used for 2D graphics. It does not matter how far away an object is from the camera, it will always have the same size on the screen. Perspective projection is what we are used to in the real world: the farther an object is away from our eyes the smaller it gets.

The funny thing about those projection types is that in terms of our camera attributes all they do is change the shape of the view frustum. In case of a perspective projection the view frustum looks like the pyramid above. In case of an orthographic projection the view frustum is a box. The actual projection process is pretty simple: from each point of an object we draw a line to the camera and calculate where it hits the near clipping plane of the frustum (that's of course not entirely correct and very simplified, but for our purposes it will do). Here's how that looks for both types of projection:

Can you see why objects get smaller after projection in case of a perspective projection? Or why they stay the same size if you use an orthographic projection? The only difference is the shape of the frustum! In OpenGL you always work in 3D, no matter whether you use SpriteBatch or draw text. The secret is that you just pretend the z-axis doesn't exist while using an orthographic camera. Here's a 2D sprite in 3D space:

So forget about there being a difference between 2D and 3D. There is no difference. Here’s the view frustum you’d use if you draw via SpriteBatch without setting any matrices:

It’s just a freaking box! All we do is let our sprites move in the x/y plane, ignoring the z-axis, keeping up the illusion that we are actually working in 2D space.
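
To make that concrete, here's a small sketch of a pixel-perfect 2D setup using the new OrthographicCamera together with SpriteBatch (more on the camera classes below). The texture and the exact values are just placeholders:

    // Create an ortho camera whose viewport matches the screen in pixels.
    OrthographicCamera cam = new OrthographicCamera(Gdx.graphics.getWidth(),
                                                    Gdx.graphics.getHeight());
    // The camera's position marks the center of its viewport, so shift it
    // to put the world origin in the bottom-left corner of the screen.
    cam.position.set(Gdx.graphics.getWidth() / 2f, Gdx.graphics.getHeight() / 2f, 0);

    SpriteBatch batch = new SpriteBatch();

    // in render():
    cam.update();
    batch.setProjectionMatrix(cam.combined); // hand the box frustum to the batch
    batch.begin();
    batch.draw(texture, 0, 0);               // x/y in pixels, z simply doesn't exist
    batch.end();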

A perspective camera has two attributes that define its projection: the field of view and the aspect ratio. The field of view is an angle that defines how “open” the frustum is:

The aspect ratio is the ratio between the width and the height of the viewport. The viewport is the rectangular area to which the image the camera "takes" will be rendered. So if you have a screen with 480×320 pixels your aspect ratio is 480 / 320.

An orthographic camera’s projection is merely defined by its viewport dimensions as can be seen in the image above (the ortho view frustum box one :)).

Now that you (somewhat) know how cameras work, let's check out the new camera classes. There's three of those suckers: the Camera class, which is a base class, the OrthographicCamera class and the PerspectiveCamera class. The last two derive from the Camera class and thus share the same members and attributes. Let's look at the Camera class:
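
Here's a condensed sketch of what the Camera class roughly looks like, reconstructed from the description below; it's not the verbatim source, so field order, defaults and bodies may differ:

    // Condensed sketch of the Camera base class, not the verbatim source.
    public abstract class Camera {
        // position and orientation
        public final Vector3 position = new Vector3();
        public final Vector3 direction = new Vector3(0, 0, -1); // looks down -z by default
        public final Vector3 up = new Vector3(0, 1, 0);

        // matrices, mostly interesting for GLES 2.0 users
        public final Matrix4 projection = new Matrix4();
        public final Matrix4 view = new Matrix4();
        public final Matrix4 combined = new Matrix4();          // projection * view
        public final Matrix4 invProjectionView = new Matrix4(); // inverse of combined

        // clipping planes and viewport
        public float near = 1;
        public float far = 100;
        public float viewportWidth = 0;
        public float viewportHeight = 0;

        // the view frustum, usable for culling
        public final Frustum frustum = new Frustum();

        public abstract void update();                        // recalculates the matrices
        public void apply(GL10 gl) { /* ... */ }              // sets GL_PROJECTION/GL_MODELVIEW
        public void lookAt(float x, float y, float z) { /* ... */ }
        public void rotate(float angle, float axisX, float axisY, float axisZ) { /* ... */ }
        public void translate(float x, float y, float z) { /* ... */ }
        public void unproject(Vector3 vec) { /* ... */ }      // window -> world coordinates
        public void project(Vector3 vec) { /* ... */ }        // world -> window coordinates
        public Ray getPickRay(float x, float y) { /* ... */ return null; }
    }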

The first three public members are the position, direction and up vector of our camera. They default to standard values so that the camera is located at the origin, looking down the negative z-axis. You can access and manipulate those as you please.

Next we have a shitton of matrices. Those will only really interest you if you work with OpenGL ES 2.0. The first two hold the projection and (model-)view matrix. The third one holds those two multiplied together, and the final matrix is the inverse of the combined matrix. That one is usually used to do things like picking and so on. As i said, you are unlikely to touch those yourself.

Next we have the near and far clipping plane distances to the camera as well as the viewport width and height. The near and far clipping planes must always satisfy 0 <= near < far. A setter would be nice to assert this, but i decided to expose those bastards without safeguards. By default the near clipping plane will be 1 unit away from the camera's position. For ortho cameras you'll usually set near to 0 (the OrthographicCamera does this automatically on construction!). The viewport width and height are used to calculate the aspect ratio for the PerspectiveCamera and to define the box view frustum for an orthographic camera.

The final member is the Frustum. It's composed of the 6 Planes of the view frustum of a camera. This Frustum can be used for so-called culling: checking whether an object is within the view frustum or not. In case it isn't, you don't have to draw it! The frustum has a couple of methods you can use to determine whether a BoundingBox, sphere or point is in the frustum and thus visible. See the javadocs.
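
A typical culling loop might look like this (just a sketch; GameObject and its bounds field are made up for the example):

    cam.update();
    for (GameObject obj : objects) {
        // obj.bounds is assumed to be a BoundingBox in world coordinates
        if (cam.frustum.boundsInFrustum(obj.bounds)) {
            obj.render(); // only draw what's actually inside the frustum
        }
        // for spherical objects: cam.frustum.sphereInFrustum(center, radius)
    }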

Next we have a couple of nice methods. The update() method will recalculate the matrices of the camera. Call it after you have changed any of the attributes of the camera, like its position or near/far clipping planes and so on. The apply() method will set the GL_PROJECTION and GL_MODELVIEW matrix according to the camera's matrices. That won't work with OpenGL ES 2.0 of course, but you shader-loving guys probably know what to do instead 🙂
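
In a GLES 1.x render loop that boils down to something like this (a sketch; with GLES 2.0 you'd pass the combined matrix to your shader as a uniform instead):

    // in render(), after moving the camera around:
    cam.update();        // recompute projection, view, combined, invProjectionView
    cam.apply(Gdx.gl10); // load GL_PROJECTION and GL_MODELVIEW for GLES 1.x
    // ... issue your draw calls here ...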

Next we have a couple of methods that allow you to let the camera look at a specific point in space, rotate it around an axis by some angle and move it by some amount. Those are just little helper functions; you could achieve the same by directly manipulating the position/direction/up vectors of the camera.
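
For example, moving the camera while keeping it aimed at the origin looks like this (again just a sketch):

    cam.translate(0, 0, -0.5f); // move half a unit along the negative z-axis
    cam.lookAt(0, 0, 0);        // keep pointing at the origin
    cam.update();               // don't forget to rebuild the matrices
    // cam.rotate(45, 0, 1, 0) would instead spin the view 45 degrees around the y-axis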

Finally we have some methods that are needed if we want to do more advanced stuff. The unproject() method will take a point in window coordinates (or screen coordinates) and generate a 3D point out of it. That's what OrthographicCamera.screenToWorld() did in the old class. It works like gluUnproject. The x and y coordinates could be touch coordinates; the z-coordinate of the parameter you pass is usually a value between 0 and 1, where 0 generates a point on the near clipping plane and 1 generates a point on the far clipping plane. The project() method does the opposite: it takes a point in the 3D world and transforms it to a 2D point on screen. The getPickRay() method will return a Ray for ray picking. Think of it as a stick in your 3D world, starting at your camera's position. You usually pass in touch coordinates and then use the Ray with the Intersector class to test whether it hit some geometry/object in your world.

So that's what both camera classes share. Neat huh? Let's have a look at the OrthographicCamera:
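
Again a condensed sketch rather than the verbatim source (details may differ):

    public class OrthographicCamera extends Camera {
        public float zoom = 1; // the additional zoom factor

        public OrthographicCamera(float viewportWidth, float viewportHeight) {
            this.viewportWidth = viewportWidth;
            this.viewportHeight = viewportHeight;
            this.near = 0; // ortho cameras get a near plane of 0 on construction
            update();
        }

        @Override
        public void update() { /* builds the box frustum from viewport, zoom, near and far */ }
    }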

Yeah, that's it. It has one additional member that lets you define a zoom factor. The constructor takes the viewport width and height you want the camera to have. If you want pixel-perfect rendering you just specify Graphics.getWidth()/getHeight() here (or whatever you have). If you want to use it in conjunction with, say, Box2D you'd probably use a different unit scale, say meters (e.g. 42, 32). You can of course also use that camera in 3D (remember: it is actually working in 3D!), just as in a CAD program for example. The PerspectiveCamera is equally simple:
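
The corresponding sketch for the PerspectiveCamera (again not verbatim):

    public class PerspectiveCamera extends Camera {
        public float fieldOfView = 67; // field of view in degrees

        public PerspectiveCamera(float fieldOfView, float viewportWidth, float viewportHeight) {
            this.fieldOfView = fieldOfView;
            this.viewportWidth = viewportWidth;
            this.viewportHeight = viewportHeight;
            update();
        }

        @Override
        public void update() { /* builds the pyramid frustum from fieldOfView and the aspect ratio */ }
    }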

It also only has a single additional member, the field of view given in degrees. The aspect ratio is calculated from the camera's viewport width and height (which you also have to set in the constructor). You can find a couple of tests in SVN that show some of the features, like picking or using the project method:

  • CullTest: shows you how to perform culling
  • PickingTest: shows you how to perform picking (on spheres); see the sketch below the list for the general idea
  • ProjectTest: shows you how to transform a 3D point to 2D and use that to render a 2D image with SpriteBatch on top of a 3D object. It’s like anchoring a 2D element to the 3D object
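
The core of such a picking test looks roughly like this (a sketch; the sphere's center, radius and the intersection vector are made up, Intersector is the real libgdx helper class):

    if (Gdx.input.justTouched()) {
        // turn the touch coordinates into a ray shooting into the scene
        Ray pickRay = cam.getPickRay(Gdx.input.getX(), Gdx.input.getY());
        // test the ray against a sphere; 'intersection' receives the hit point
        if (Intersector.intersectRaySphere(pickRay, sphereCenter, sphereRadius, intersection)) {
            // we hit the sphere, do something with it
        }
    }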

Yay, tl;dr: we have new camera classes…