One of our first assignments at The Guildhall was to fill a large area with randomly generated voxel terrain (similar to the terrain in the game Minecraft). The goal of the exercise was to learn how to render thousands and thousands of triangles efficiently using OpenGL. We also had to come up with a way to create plausible mountains and valleys for our terrain using both dirt and stone blocks. Stone blocks, being heavier, needed to be more common at the lower levels of the terrain, with dirt and air becoming more common at the higher levels. Finally, the demo had to use multithreading to speed up terrain generation.
The finished demo allows the user to fly around a world filled with randomly generated terrain using the keyboard and mouse. Left-clicking a block on the terrain will remove the block from the chunk. Right-clicking places a dirt block on the side of the currently highlighted block.
Perhaps the hardest (and most rewarding) part of the problem to solve was the terrain generation itself. In the same way that a 2D image is composed of many individual pixels, a voxel terrain is essentially a 3D image composed of many individual voxels (i.e. three-dimensional pixels). In a game like Minecraft, these voxels make up the entire world, and are generated randomly as the player walks outward from the world origin to give the illusion that the world continues on indefinitely in every direction. Each voxel is made up of some "material" (such as dirt, stone, brick, or air), and can be manipulated by players in the world. Most importantly, players can place blocks of material into empty voxels in the world to create structures. Players can also remove blocks to sculpt the terrain to their liking.
Generating realistic-looking terrain using voxels can be rather tricky. There are several requirements that the randomly generated terrain must fulfill in order to be believable:
The terrain must have one continuous shape. Hills, valleys, and caves are acceptable, but floating pieces of dirt are not.
The terrain should mimic the structure of the Earth's crust. The surface should be composed almost entirely of dirt, with denser materials (such as stone) found underneath the ground.
No two areas of the world should look exactly the same. The hills and valleys generated for one area of the world should never be repeated in another.
To generate the 3D terrain, I first used Perlin noise to create a three-dimensional field of fuzzy, organic-looking blobs. To visualize the result, I created a small demo application in Processing that outputs 2D noise similar to what the demo produces.
Perlin noise can be thought of as a noise algorithm that generates random grayscale values between 0.0 and 1.0 and lays them out on a grid of some fixed size (e.g. 64 x 64). These values are then interpolated to produce a blurry, somewhat blocky noise pattern. (See the image above.) The process is then repeated with smaller and smaller grid sizes (e.g. 32 x 32, then 16 x 16 and so on), down to a grid size of 1 x 1 pixel. When all of the noise patterns are blended together, they combine to form a cloudy, organic texture (as seen in the second image above). The thing that makes Perlin noise special, however, is that it is continuous. Given a point in space (2D or 3D) and a random seed, the noise value for that point will always be the same, and is guaranteed to blend nicely with the values around it.
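The layering described above can be sketched in a few lines of Python. This is a simplified value-noise variant (it interpolates random corner values rather than Perlin's gradients), but it demonstrates the same idea of blending successively finer grids; the grid sizes and weights here are illustrative, not the demo's actual parameters.

```python
import random

def layered_noise_2d(width, height, seed, cells=(64, 32, 16, 8, 4, 2, 1)):
    """Blend bilinearly interpolated random grids of shrinking cell size.

    Returns a width x height image of values in [0.0, 1.0]. The same
    seed always produces the same image, mirroring the continuity
    property described in the text.
    """
    rng = random.Random(seed)
    image = [[0.0] * width for _ in range(height)]
    total_weight = 0.0
    for cell in cells:
        # One random value per grid corner (+2 so interpolation never
        # reads past the edge).
        gw, gh = width // cell + 2, height // cell + 2
        grid = [[rng.random() for _ in range(gw)] for _ in range(gh)]
        weight = cell  # coarser layers contribute more to the blend
        total_weight += weight
        for y in range(height):
            for x in range(width):
                gx, gy = x / cell, y / cell
                x0, y0 = int(gx), int(gy)
                fx, fy = gx - x0, gy - y0
                top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
                bot = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
                image[y][x] += weight * (top * (1 - fy) + bot * fy)
    # Normalize back into [0.0, 1.0].
    return [[v / total_weight for v in row] for row in image]
```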
To generate the terrain in the demo, I sample the Perlin noise function at the position of each voxel and get back a value between 0.0 and 1.0. This value becomes the "density" value for each voxel. Any voxel with a density value of less than 0.2 is filled with air (i.e. empty). Blocks with a density between 0.2 and 0.3 are filled with dirt, and anything greater than 0.3 is filled with stone. The result is that air blocks always blend smoothly into dirt blocks, which in turn blend into stone blocks as the density increases.
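The density-to-material mapping reduces to a threshold function. The 0.2 and 0.3 cutoffs come from the text; the material constants in this Python sketch are illustrative.

```python
# Illustrative material codes; the demo's actual representation may differ.
AIR, DIRT, STONE = 0, 1, 2

def material_for_density(density: float) -> int:
    """Map a noise density in [0.0, 1.0] to a block material.

    Below 0.2 -> air, 0.2 to 0.3 -> dirt, above 0.3 -> stone,
    matching the thresholds described in the text.
    """
    if density < 0.2:
        return AIR
    if density <= 0.3:
        return DIRT
    return STONE
```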
In isolation, this creates a nebulous, jumbled mess of dirt and stone that fills up the entire world. In order to create valleys and hills, the noise value must be weighted based on its height in the world, so that voxels toward the top of the world are more likely to become air blocks and voxels near the bottom are more likely to become stone. The weighting function I chose mixes 75% of the depth value (between 0.0 and 1.0) with 25% of the Perlin noise value to get rolling hills that quickly fade into solid stone toward the middle layer of the world.
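As a sketch, the height weighting might look like the following, assuming `y` runs from 0 at the bottom of the world to `world_height - 1` at the top; the demo's exact depth convention is an assumption here, but the 75/25 blend is from the text.

```python
def weighted_density(noise: float, y: int, world_height: int) -> float:
    """Blend 75% depth with 25% noise, as described in the text.

    depth is 1.0 at the bottom of the world and 0.0 at the top, so
    high voxels trend toward air and deep voxels toward solid stone.
    """
    depth = 1.0 - y / (world_height - 1)
    return 0.75 * depth + 0.25 * noise
```

With the 0.2/0.3 material thresholds, a voxel at the very top can reach at most 0.25 density, so the sky is almost entirely air, while anything near the bottom exceeds 0.3 and becomes stone regardless of the noise value.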
Generating a world filled with believable terrain is one thing. Rendering it is another entirely.
Consider that the dimensions of the world in the demo are roughly 256 x 256 x 128 voxels. In total, there are over 8 million voxels in the world. Even though a single block is represented by a mere 12 triangles (two for each of its six faces), to draw every block in the world the graphics card would have to render around one hundred million triangles to the screen each and every frame! Doing this naively would result in poor performance on even the fastest of machines.
To make matters worse, drawing each block individually would result in thousands of draw calls to the graphics card. Every draw call carries a fixed overhead: the driver must validate and submit the commands, and the card must work through each submitted batch before moving on to the next. Because of this overhead, drawing a hundred objects with ten vertices apiece is actually far slower than drawing one large object with a thousand vertices. Thus, drawing the terrain block-by-block would be unbearably slow.
Fortunately, when rendering the voxel terrain, only "solid" voxels (i.e. not air blocks) are visible. On top of that, for each solid block none of the faces need to be drawn unless that face borders a voxel filled with air (i.e. the face is "exposed" and therefore visible to the outside world). Since the majority of the blocks in the world are completely hidden, drawing only the visible faces for each block can reduce the amount of geometry drawn each frame by millions of triangles.
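A hidden-face check along these lines might look like the following Python sketch. It uses a dictionary of block positions for brevity (the demo presumably stores blocks in a flat array), and treats anything outside the dictionary as air.

```python
AIR = 0  # illustrative material code

# Offsets to the six neighboring voxels (+x, -x, +y, -y, +z, -z).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def exposed_faces(blocks, x, y, z):
    """Return the neighbor offsets whose faces are worth drawing.

    A face is emitted only when a solid block borders air; air blocks
    themselves are never drawn. `blocks` maps (x, y, z) -> material,
    and missing keys count as air.
    """
    if blocks.get((x, y, z), AIR) == AIR:
        return []  # air blocks contribute no geometry
    return [d for d in NEIGHBORS
            if blocks.get((x + d[0], y + d[1], z + d[2]), AIR) == AIR]
```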
Eliminating the overhead of rendering individual faces to the graphics card involves rendering as much geometry as possible with a single draw call. To accomplish this, sections of the terrain are grouped into a single vertex buffer object (VBO) on the graphics card and then rendered together. Theoretically, the entire world could be grouped into one VBO and rendered with a single draw call (although there are downsides to this). This optimization alone greatly increased rendering performance in the demo.
One caveat of drawing large sections of the world all at once is that the OpenGL state cannot change during the course of the draw call. The entire VBO must be rendered with the same blend mode, draw mode, and texture. Therefore, while one block can have different texture coordinates from another, the texture used to render every block in the world must be the same. This makes it harder to represent voxels made of different materials.
One way to solve this problem is to combine the textures for each material in the game into one large texture (as shown below). Grouping the individual textures into one large texture atlas allows us to index into the atlas to get a different material simply by changing the texture coordinates of each face. For example, assuming the texture for each face is laid out in a grid and that each row of the grid contains the textures for a single material, each block in the world could be given a texture coordinate offset to map it to the right material on the texture atlas.
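Indexing into a texture atlas reduces to a little arithmetic on texture coordinates. A Python sketch, assuming a 16 x 16 grid of equally sized tiles (the demo's actual atlas layout may differ):

```python
ATLAS_GRID = 16          # assumed: 16 x 16 tiles in the atlas
TILE = 1.0 / ATLAS_GRID  # width of one tile in normalized texture space

def tile_uv(row: int, col: int):
    """Return (u0, v0, u1, v1), the corners of one atlas tile.

    Following the layout described in the text, each material occupies
    a row and each face variant a column, so selecting a material is
    just an offset into the atlas.
    """
    u0, v0 = col * TILE, row * TILE
    return (u0, v0, u0 + TILE, v0 + TILE)
```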
While it is theoretically possible to use a single VBO to render out the entire world, doing so means paying a hefty penalty any time something in the world changes. Because voxels are rendered as part of a larger group of geometry, whenever a single voxel changes, the VBO for the entire section surrounding that voxel must be rebuilt. The larger the section, the longer it takes to rebuild the VBO. Storing the geometry in one VBO would mean rebuilding the geometry for the entire world and sending it to the graphics card every time a single block in the world changes. Terrible!
The only operations that can modify the world in the demo application are placing a block, changing a block, and removing a block. Since all of these operations affect a single voxel in the world, it is advantageous to keep the geometry surrounding each block as small as possible to reduce the time it takes to update the VBO. To achieve this, the world is subdivided into "chunks". Each chunk is 16 x 16 x 128 voxels in size and is drawn using a single VBO on the graphics card. The chunks are large enough to reap the rewards of drawing the geometry in one draw call, but small enough that rebuilding the VBO is fast. Hence, the simulation remains responsive when the user changes a voxel in the world.
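Finding the chunk that owns a given block is then a matter of integer division. A Python sketch using the 16 x 16 chunk footprint from the text (the demo's actual indexing scheme is an assumption):

```python
CHUNK_X, CHUNK_Z = 16, 16  # chunk footprint; each chunk spans the full 128-block height

def chunk_of(x: int, z: int):
    """Map world-space block coordinates to (chunk index, local coords).

    Floor division keeps the mapping correct for negative coordinates,
    which matters if the world extends in every direction.
    """
    return (x // CHUNK_X, z // CHUNK_Z), (x % CHUNK_X, z % CHUNK_Z)
```

When a block changes, only the VBO for its owning chunk (16 x 16 x 128 voxels) needs rebuilding, rather than the whole world.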
Another advantage of subdividing the world in this way is that it allows the world to be generated section by section. In Minecraft, each chunk is generated as needed as the player walks through the world. Whenever an area of the world that hasn't been generated yet comes into view, the server fills the area with new chunks that match seamlessly with the world that has been generated thus far (all thanks to the continuity of Perlin noise). Consequently, the player is never constrained by the extents of the world. Instead, the world can continue on indefinitely in any direction, since it is simply created in front of the player, piece by piece, for as far as he can walk.
While this demo only generates a finite world around the player, extending it to generate an infinite world in this way would not be difficult.
Due to the sheer number of blocks in a chunk, generating even a single chunk can take a long time. Doing this synchronously is a bad idea because generating a chunk blocks the update loop, causing the simulation to momentarily freeze. In a commercial game like Minecraft, where chunks are generated on the fly as the player moves around the world, this would be unacceptable and extremely frustrating for the player.
Since each chunk stands on its own, and can be generated without relying on the data in other chunks, the easiest way to improve the user experience while generating the world is to spawn multiple threads to handle the task asynchronously. Each thread is assigned a chunk to load (provided there is one that hasn't been loaded yet), which it generates in isolation using the Perlin noise method described earlier. Once finished, the thread is assigned to another ungenerated chunk. This process is repeated until every chunk in the world has been generated. Because this happens asynchronously, the player is free to move about the world while the terrain is being generated. Moreover, the time it takes to generate the terrain is significantly reduced because the task has been distributed between multiple concurrent threads.
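The worker scheme above can be sketched with a shared queue of ungenerated chunks. This Python version is purely illustrative (the demo itself is an OpenGL application), and `generate_chunk` stands in for the Perlin-noise generation step described earlier.

```python
import queue
import threading

def generate_world(chunk_coords, generate_chunk, num_workers=4):
    """Generate all chunks concurrently with a pool of worker threads.

    Each worker repeatedly takes an ungenerated chunk off the shared
    queue and fills it in, exiting once no chunks remain.
    """
    pending = queue.Queue()
    for coord in chunk_coords:
        pending.put(coord)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                coord = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to generate
            chunk = generate_chunk(coord)  # runs in isolation, no shared state
            with lock:
                results[coord] = chunk

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each chunk is generated in isolation, the workers never need to coordinate beyond pulling from the queue and storing their result.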
With geometry alone, the voxel terrain looks flat. There are no shadows or highlights, and the entire world looks uniform. To be more realistic, everything should get darker as the player descends into a cave. Certainly there is a way to do that, right?
Here, the voxels themselves turn out to be a big advantage. Lighting, and especially shadows, are normally hard to pull off, involving shaders and some complex math. Minecraft, however, demonstrated that both can be achieved simply by propagating light from voxel to voxel.
Each voxel has a light level, an integer from 0 to 16. For solid blocks, the light level is usually 0 (i.e. no light propagates from that block). For air blocks, however, the light level is usually non-zero, and light propagates to or from the surrounding blocks, depending on which block has the higher light value. To give the illusion of light falloff, the value is decreased by 1 each time the light propagates. For example, a block with a light level of 8 will fill each surrounding block with a light level of 7, unless that block's value was already greater than 7.
To represent the light level visually, it is important to realize that the amount of light hitting each face of a solid voxel can be different. When drawing the faces of a solid voxel, what matters is not the light level of the solid voxel itself but that of the surrounding air blocks (if any). Each exposed face of a solid voxel looks up the light value of the air block directly in front of it. All that is necessary to create a nice lighting effect is to draw the face with a gray tint that is darker or lighter depending on the light level of that adjacent air block. A light level of 16 results in white (i.e. no tint), whereas a light level of 0 results in dark gray (almost black). The reason that a light level of 0 is not fully black is purely for player experience: navigating an unlit cave should be difficult, but not impossible.
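The tint itself is a simple linear remap of the light level. In this Python sketch, the 0.15 floor is an illustrative choice for the "almost black" minimum, not a value taken from the demo:

```python
def face_tint(light_level: int) -> float:
    """Map a 0-16 light level to a grayscale tint in [floor, 1.0].

    The non-zero floor keeps fully unlit faces visible, so dark caves
    are hard, but not impossible, to navigate.
    """
    floor = 0.15  # illustrative minimum brightness
    return floor + (1.0 - floor) * (light_level / 16.0)
```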
To light the world up initially, there needs to be some source of illumination. To give the illusion of daylight, when each chunk is updated, the engine traces down from each voxel at the top of the chunk. To simulate direct sunlight, every air block along the way is given a light level of 16 (i.e. fully lit) until the trace hits a solid block (like dirt or stone). This essentially flood-fills all air blocks directly visible from the top of the world with light. After this step, the lighting is propagated to the surrounding blocks, filling outward with a reduced light level for as far as possible, until the last block with a light level greater than 1 has been propagated.
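Both lighting passes can be sketched together. This Python version works in 2D for brevity, so it propagates to four neighbors instead of six, but the 3D version is the same idea: a straight-down sunlight trace followed by a breadth-first flood that loses one light level per step.

```python
from collections import deque

def light_chunk(solid, width, height, max_light=16):
    """Compute per-voxel light levels for a 2D slice of the world.

    `solid[y][x]` is True for dirt/stone, and y = 0 is the top of the
    world. Pass 1 fills sky-visible air with full sunlight; pass 2
    floods that light outward, decreasing by 1 per block.
    """
    light = [[0] * width for _ in range(height)]
    frontier = deque()

    # Pass 1: trace direct sunlight straight down each column.
    for x in range(width):
        for y in range(height):
            if solid[y][x]:
                break  # sunlight stops at the first solid block
            light[y][x] = max_light
            frontier.append((x, y))

    # Pass 2: breadth-first propagation into shaded air (e.g. caves).
    while frontier:
        x, y = frontier.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and not solid[ny][nx]:
                if light[ny][nx] < light[y][x] - 1:
                    light[ny][nx] = light[y][x] - 1
                    frontier.append((nx, ny))
    return light
```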
If you have any trouble running the demo, feel free to shoot me an email at email@example.com. Enjoy!