This is a proposal for a VRML design based on the Open Inventor file format. It has been heavily influenced by the discussions of what VRML should be that have taken place on the VRML mailing list; please visit the VRML home page to get up-to-date.
This design is a subset of the Inventor file format, with compatible extensions. Because saying "the Open Inventor ASCII file format" is annoying, I will just use the phrase "Inventor" in this proposal; however, please remember that the Inventor ASCII file format and the Inventor programming interface are separate entities.
Inventor has taken many programmer-years to design and implement, and is a fairly large, very general system. VRML must be much smaller to become a success; otherwise, implementations will be either incompatible or will take too long to produce. Therefore, only the most commonly used subset of Inventor is proposed here as the basis for VRML.
I have tried to make this proposal readable; I started to write a detailed design spec and quickly got bogged down in the details of field syntax. Let's argue about the bigger issues; I will create a document describing Inventor's syntax precisely later, if it is felt necessary.
Issue: I have tried to anticipate criticisms of Inventor's design; you will see paragraphs marked Nit: throughout this document, where I point out parts of Inventor that are easy to nit-pick. I am going to assume that compatibility with Inventor, warts and all, is a desirable goal, and that minor incompatibilities should not be introduced just to make the design a bit more elegant or logical. I hope we can agree to that, and avoid nit-picking these minor details. Paragraphs marked with Issue: are larger issues that I think need to be discussed.
The Inventor group at Silicon Graphics has committed to separating the ASCII file-reading code from the rest of the Inventor library, repackaging it, and putting it in the public domain as the start of a VRML toolkit to make implementing VRML easier. The file reader will be C++ code that produces a hierarchical structure of C++ classes; a VRML implementor would need to define appropriate render() methods for these classes to implement rendering, a pick() method to implement picking, etc. Alternatively, an implementor could traverse these classes and create a completely different internal representation for the scene.
John Barrus is organizing an effort to summarize "The Inventor Mentor"; if you are unfamiliar with Inventor you should read it to get an idea of what Inventor is.
This notion of order in the scene graph may be the most controversial feature of Inventor. Most other systems attempt to attach properties to objects, with the properties affecting only that one object. In fact, an early prototype of Inventor was written that way. However, treating properties differently from geometry resulted in several problems. First, if a shape has several properties associated with it, you must still define an order in which the properties are applied. Second, there are some objects, such as lights and cameras, that act as both shapes (things that have a position in the world) and properties (things that affect the way other things look). Getting rid of the distinction between shapes and properties simplified both the implementation and the use of the library.
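For example, in this minimal sketch (using nodes described later in this document), the first Material affects only the Cube because the second Material replaces it before the Sphere is traversed:

Separator {
    Material { diffuseColor 1 0 0 }   # red
    Cube { }                          # drawn red
    Material { diffuseColor 0 0 1 }   # blue
    Sphere { }                        # drawn blue; the red Material no longer applies
}

The pieces of information that make up each node are described next.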
What kind of object it is. A node might be a cube, a sphere, a texture map, a transformation, etc.
The parameters that distinguish this node from other nodes of the same type. For example, each Sphere node might have a different radius, and different texture map nodes will certainly contain different images to use as texture maps. These parameters are called 'fields'. A node can have zero or more fields.
A name to identify this node. Being able to name nodes and refer to them elsewhere is very powerful; it allows a scene's author to give hints to applications using the scene about what is in the scene, and creates possibilities for very powerful scripting extensions. Nodes do not have to be named, but if they are named, they have only one name.
Child nodes. Object hierarchy is implemented by allowing nodes to contain other nodes. Parent nodes traverse their children in order during rendering. Nodes that may have children are referred to as "group nodes". Group nodes can have zero or more children.
The syntax chosen to represent these pieces of information is straightforward:
DEF objectname objecttype { fields children }

Only the objecttype and curly braces are required; nodes may or may not have a name, fields, and children.
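For example, a hypothetical named group node with fields and children (the node types used here are described in the following sections; the name "Snowman" is just illustrative):

DEF Snowman Separator {                  # a named group node
    Sphere { radius 2 }                  # a child whose "radius" field is set
    Translation { translation 0 2.5 0 }
    Sphere { radius 1.5 }
}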
The following sections describe the types of objects I think should be the basis of VRML, and describe details of this basic syntax.
Issue: Text3 is a much more compact representation for 3D text than IndexedFaceSet. I'd like to hear from implementors on whether they would be willing to implement it; there are also cross-platform issues if VRML allows different fonts, revolving around which fonts are available on various systems, what they are named, etc.
IndexedFaceSet supports overall, per-face, and per-vertex materials and normals. IndexedFaceSet will automatically generate normals if the user doesn't specify normals. Faces with fewer than 3 vertices will be ignored.
Here is a simple example of two IndexedFaceSets, showing some of their more advanced features (per-vertex coloring, for example):
#Inventor V2.0 ascii

# Two IndexedFaceSets each describing a cube.
# Normals are per polygon.  The first has OVERALL material binding, and
# appears all one color.
# The second has colors indexed per vertex.  This allows the colors
# to be defined in any order and then randomly accessed for each vertex.

Separator {
    Coordinate3 {
        point [ -1  1  1,  -1 -1  1,   1 -1  1,   1  1  1,
                -1  1 -1,  -1 -1 -1,   1 -1 -1,   1  1 -1 ]
    }
    Material {
        diffuseColor [ 1 0 0,  0 1 0,  0 0 1,  1 1 0 ]
    }                                       # indices 0,1,2,3
    Normal {
        vector [  0.0  0.0  1.0,   1.0  0.0  0.0,    # front and right faces
                  0.0  0.0 -1.0,  -1.0  0.0  0.0,    # back and left faces
                  0.0  1.0  0.0,   0.0 -1.0  0.0 ]   # top and bottom faces
    }
    NormalBinding   { value PER_FACE_INDEXED }
    MaterialBinding { value OVERALL }
    IndexedFaceSet {
        coordIndex [ 0, 1, 2, 3, -1,   3, 2, 6, 7, -1,    # front and right faces
                     7, 6, 5, 4, -1,   4, 5, 1, 0, -1,    # back and left faces
                     0, 3, 7, 4, -1,   1, 5, 6, 2, -1 ]   # top and bottom faces
        normalIndex [ 0, 1, 2, 3, 4, 5 ]   # Apply normals to faces, in order
    }
    Translation { translation 3 0 0 }
    MaterialBinding { value PER_VERTEX_INDEXED }
    IndexedFaceSet {
        coordIndex [ 0, 1, 2, 3, -1,   3, 2, 6, 7, -1,    # front and right faces
                     7, 6, 5, 4, -1,   4, 5, 1, 0, -1,    # back and left faces
                     0, 3, 7, 4, -1,   1, 5, 6, 2, -1 ]   # top and bottom faces
        materialIndex [ 0, 0, 1, 1, -1,    # red/green front
                        2, 2, 3, 3, -1,    # blue/yellow right
                        0, 0, 1, 1, -1,    # red/green back
                        2, 2, 3, 3, -1,    # blue/yellow left
                        0, 0, 0, 0, -1,    # red top
                        2, 2, 2, 2, -1 ]   # blue bottom
    }
}

Each face of an IndexedFaceSet is assumed to be convex by default. A ShapeHints node (see below) can be used to change this assumption to allow concave faces. However, all faces must be simple (they must not self-intersect).
If not enough normals are specified to satisfy the current normal binding, normals will automatically be generated based on the IndexedFaceSet's geometry.
If explicit texture coordinates are not specified using a TextureCoordinate2 node, then default texture coordinates will be automatically generated. A simple planar projection along one of the primary axes is used, mapping the width of the texture image onto the longest dimension of the IndexedFaceSet's bounding box, with the height of the texture image going in the direction of the next-longest dimension of its bounding box.
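For illustration, explicit texture coordinates might be given like this (a minimal sketch; it assumes a Coordinate3 with four points appears earlier in the scene, and that the face is a single quadrilateral):

TextureCoordinate2 {
    point [ 0 0,  1 0,  1 1,  0 1 ]           # S,T coordinates
}
IndexedFaceSet {
    coordIndex        [ 0, 1, 2, 3, -1 ]
    textureCoordIndex [ 0, 1, 2, 3, -1 ]      # which texture coordinate to use at each vertex
}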
ShapeHints also has a creaseAngle field used during normal generation; it is a hint to the normal generator about where sharp creases between polygons should be created (if two faces sharing an edge have a dihedral angle less than the creaseAngle, the normals will be smoothed across the edge; otherwise, the edge will appear as a sharp crease).
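For example, a ShapeHints node that allows concave faces and asks for fairly smooth automatically-generated normals might look like this sketch (field values are illustrative):

ShapeHints {
    faceType    UNKNOWN_FACE_TYPE    # faces may be concave
    creaseAngle 0.5                  # smooth edges whose dihedral angle is less than 0.5 radians (about 28 degrees)
}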
If a binary format for VRML is developed, it will be worthwhile to specify low-bandwidth alternatives to the standard Inventor Coordinate3 and Normal nodes, which store each coordinate or normal as three floating-point numbers. Lighting is usually good enough even with byte-sized normals; a ByteNormal with normal XYZ vectors with components from -127 to 127 would save a significant amount of network bandwidth. Similarly, a ShortCoordinate3 that specified vertices in the range of -32767 to 32767 (the model would need an appropriate Scale to make it reasonably sized, of course) could also save significant network bandwidth. Note that in the ASCII file format, new nodes aren't necessary-- you can just limit the precision of the ASCII numbers in your scene to a few digits of accuracy. For example, instead of specifying a normal as: [.7071067811865 .7071067811865 0], specify it as [.707 .707 0] to save bandwidth.
Nit: Yeah, I think eight is too many bindings, too. However, implementing all of the bindings is easy, since most shapes really have only two bindings (OVERALL or PER_PART), and all of the bindings used by IndexedFaceSet are useful.
Nit: PER_FACE or PER_VERTEX bindings can be done using appropriate indices and PER_FACE_INDEXED or PER_VERTEX_INDEXED bindings. I'm hesitant to get rid of them, though, because PER_FACE is more common and requiring all those indices will increase file sizes.
Inventor has a TextureCoordinateBinding node with DEFAULT, PER_VERTEX, and PER_VERTEX_INDEXED values. Because binding texture coordinates PER_VERTEX is very rare (PER_VERTEX_INDEXED is infinitely more common), I don't think this node should be part of VRML.
Separator {
    Coordinate3 {
        point [ 0 0 0,  1 0 0,  0 1 0,                   # Triangle vertices
                2 0 0,  3 0 1,  4 0 0,  5 0 1,  6 0 0 ]  # Zig-zag vertices
    }
    IndexedLineSet {
        coordIndex [ 0, 1, 2, 0, -1,
                     3, 4, 5, 6, 7 ]
    }
}

Unlike IndexedFaceSet, an IndexedLineSet will be drawn with lighting turned off if normals are not specified. Lines with fewer than 2 vertices are ignored.
Like IndexedLineSet, if normals are not specified then the points will be drawn unlighted.
Note: An IndexedPointSet primitive isn't terribly useful, because coordinates used for a PointSet aren't typically shared (unlike polygons and lines, where several polygons or line segments may meet at a vertex).
Issue: If scenes will contain large numbers of these primitives, CubeSet/SphereSet/CylinderSet/ConeSet primitives should be defined to reduce the network bandwidth of (for example) sending "Separator { Translation { translation x y z } Cube { } }" over and over.
Nit: yes, Cubes aren't really Cubes if they have different widths, heights, and depths. But a non-uniform scale can also make a sphere not a sphere.
Issue: Inventor has a Complexity node with a 0.0 to 1.0 value that can be used to control the quality of these shapes. I think complexity control should be left to the browser, which could control the complexity to get good interactive performance.
Inventor's Separator has several fields to control its caching (whether or not it should build a display list when rendering) and culling (whether or not it should draw its children, based on whether or not it is in the view volume) behavior. I propose that VRML require only the renderCulling field, since the caching fields are specific to APIs like GL that have a notion of display lists (and the default, Inventor's AUTO caching, works very well).
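For instance, a minimal sketch of a Separator that asks the browser to always perform view-volume culling on its children (renderCulling can be AUTO, ON, or OFF):

Separator {
    renderCulling ON    # skip this subgraph when it falls outside the view volume
    ...                 # children
}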
Another group that is very useful is TransformSeparator, which separates the effects of transformations inside it from the rest of the scene, but allows other properties to "leak" out. This node wasn't implemented to improve performance over Separator (on a well-implemented system Separator should do a lazy push/pop of attributes, only saving/restoring attributes that matter), but was done to allow transformations to transform lights and cameras without affecting the objects that the camera is viewing or the lights are illuminating.
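For example, something like the following sketch positions a light without affecting the geometry that comes after it; the light itself still "leaks" out and illuminates that geometry:

TransformSeparator {
    Translation { translation 0 10 0 }   # moves only the light
    PointLight { }
}
Cube { }    # lit by the light above, but not translated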
The Switch node traverses none, one, or all of its children based on its whichChild field. It is most useful in programs (for example, a scene may contain two representations of a world, with a named Switch used to switch between them), but it can be very useful for "commenting out" part of the scene.
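For example, this sketch "comments out" a subgraph; an application (or an edited file) could set whichChild to 0 to turn it back on:

Switch {
    whichChild -1    # traverse no children (the default)
    Cube { }
}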
LevelOfDetail {
    screenArea [ 10000, 100 ]
    Sphere { }    # Highest level of detail
    Cube { }      # Next level of detail
    Info { }      # Lowest level of detail
}

Issue: Will implementing this be too hard? I wouldn't mind a much simpler node that just chose a child based on how far away it is from the eye (called "DistanceSwitch", perhaps). DistanceSwitch could either switch based on the distance of the center of its children's bounding box from the eye (but then that forces implementors to be able to figure out bounding boxes for objects), or could just switch based on the distance of the point (0,0,0) in object space from the eye (this assumes objects are modelled around (0,0,0) and translated into position).
Issue: The Inventor LevelOfDetail node and Inventor's primitive shapes (Cube/Sphere/Cone/Cylinder) pay attention to the current complexity value, stored in the Complexity node (a lower complexity value causes LevelOfDetail to choose simpler levels of detail). I think it is OK for VRML to leave complexity as a global value controlled by the browser.
Issue: Inventor has two other material nodes; BaseColor is equivalent to a Material except that it only sets the diffuseColor for subsequent shapes. PackedColor is a compact form of BaseColor, with diffuse colors and transparency specified as 32-bit unsigned long values; the red, green, blue and alpha components are specified with 8 bits of precision. I don't think BaseColor adds enough functionality to justify its inclusion in VRML. However, PackedColor does use significantly less bandwidth than Material, and I think it should be included.
Inventor's Texture2 node has a 'model' field which controls how the texture image and the object's lighted color are combined. BLEND is used with greyscale and greyscale+alpha images, and uses the intensity of the texture image to control how much of the object's color is used and how much of a constant blending color (also specified in the Texture2 node) is used.

Issue: For VRML, the filename field should take a URL. What image formats should be supported? The same ones as HTML (is it just GIF?)? The SFImage field is an uncompressed, 8-bit-per-component format; should it be eliminated from VRML as too much of a bandwidth hog?
The Texture2Transform node can be used to modify a shape's texture coordinates. A Texture2Transform is a 2D version of the Transform node that transforms texture coordinates instead of vertex coordinates. It has fields that specify a 2D translation, 2D rotation, 2D scale, and a 2D center about which the transformations will be applied. Texture coordinates are either specified explicitly in a TextureCoordinate2 node or are implicitly generated by shapes. The cumulative texture transformation is applied to the texture coordinates, and the transformed texture coordinates are used to find the appropriate texel in the texture image. Note that, like regular transformations, Texture2Transform nodes have a cumulative effect.
Texture2Transforms allow the default mapping of textures onto primitive shapes to be changed. For example, you might build a house out of Cube primitives (if you didn't really care about performance!) and change the Texture2Transform so that a wallpaper texture was repeated across the walls, instead of the default mapping of the texture being repeated once across the faces of the cube.
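A sketch of that wallpaper case (the filename "wallpaper.rgb" is hypothetical, and the texture's wrap modes are assumed to be left at their REPEAT defaults):

Texture2 { filename "wallpaper.rgb" }
Texture2Transform {
    scaleFactor 4 4    # scale texture coordinates so the image repeats 4 times in S and T
}
Cube { }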
Translation has a single field that specifies an XYZ translation for subsequent objects. Note that all transformations are relative; for example:
Translation { translation 1 0 0 }
Translation { translation 3.5 2 1 }
Cube { }

will result in the cube having a total translation of (4.5, 2, 1).
Scale has a single field which specifies a relative scale. The scale will be non-uniform if the X, Y, and Z components of scaleFactor are not all the same.
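For example:

Scale { scaleFactor 2 1 1 }    # non-uniform: stretch along X only
Sphere { }                     # drawn as an ellipsoid twice as wide as it is tall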
Rotation has a single field that specifies an axis to rotate about and an angle (in radians) specifying how much right-hand rotation about that axis to apply. Nit: yes, it would have been more convenient if the angle was specified in degrees instead of radians.
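For example:

Rotation { rotation 0 1 0  1.5708 }    # about 90 degrees (pi/2 radians) about the Y axis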
MatrixTransform has a single field containing an arbitrary 4 by 4 transformation matrix, to be combined with previous transformations and applied to subsequent objects.
The Transform node combines several common transformation tasks into one convenient node. It has fields specifying a translation, rotation and scaleFactor, along with scaleOrientation and center fields for specifying what coordinate axes the scale should be applied along and about which point the scale and rotation should occur.
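For example, a single Transform might be used instead of separate Translation, Rotation and Scale nodes (a sketch; the values are illustrative):

Transform {
    translation 1 0 0
    rotation    0 0 1  0.785    # about 45 degrees about the Z axis
    scaleFactor 2 2 2
    center      0 1 0           # rotate and scale about the point (0,1,0)
}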
I don't think that the viewportMapping, nearDistance, farDistance, or aspectRatio fields need to be part of VRML. viewportMapping is almost always left at its default value of ADJUST_CAMERA. The near and far clipping planes distances are best calculated by the VRML browser and adjusted automatically. And we should assume the aspectRatio will match the window; authors that want their scenes to look squished can insert non-uniform scales.
OrthographicCamera is exactly like PerspectiveCamera, only instead of a heightAngle field to control the field-of-view, it has a height field that specifies how tall the viewing volume is, in world-space coordinates.
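Sketches of the two camera types (field values are illustrative):

PerspectiveCamera {
    position    0 0 10
    heightAngle 0.785     # vertical field of view, in radians
}
OrthographicCamera {
    position 0 0 10
    height   4            # height of the viewing volume, in world-space units
}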
Issue: This spec doesn't define any way of specifying a recommended viewing paradigm-- walk-through versus fly-through versus looking at a single object. I think the most common paradigms will be a single object (you just want to move around the object and look at it from all sides) and an immersive "room" or environment (you want to walk or fly or crawl or hop around it exploring). Smart browsers should be able to distinguish between these two cases pretty easily (using position of camera versus rest of scene, plus viewer size (focalDistance) versus rest of scene).
Issue: Are SpotLight and PointLight too hard to implement on non-GL platforms?
Info {
    string "Created by Thad Beier.
Slightly ill-behaved model: has some clockwise polygons.
Public domain."
}

Note that newlines are allowed in string fields, allowing one Info node to contain several lines of information.
Issue: Should conventions for the information inside Info nodes be established to allow browsers to interpret that information? For example, the convention for author information could be a line of the form "Author: author_name".
WWWInline {
    name "http://www.sgi.com/FreeStuff/CoolScene.vrml"
    bboxCenter 0 0 4
    bboxSize 10.5 4.5 8
}

The name field is an SFString containing the URL for the file. A 'smart' implementation can delay the retrieval of the file until it is actually rendered, instead of reading it right away; combined with LevelOfDetail, this provides an automatic mechanism for delayed loading of complicated scenes.
The bboxCenter and bboxSize fields allow an author to specify the bounding box for this WWWInline. Specifying a bounding box this way allows a browser to decide whether or not the WWWInline can be seen from the current camera location without reading the WWWInline's contents. If a bounding box is not specified, the contents of the WWWInline do have to be read to determine the WWWInline's bounding box.
WWWAnchor looks very much like WWWInline, except that it is a group node and can have children:
WWWAnchor {
    name "http://www.sgi.com/FreeStuff/CoolScene.vrml"
    Separator {
        Material { diffuseColor 0 0 .8 }
        Cube { }
    }
}

WWWAnchor is a strange node; it must somehow communicate with the browser and cause the browser to load the scene specified in its name field when a child of the WWWAnchor is picked, replacing the "current" scene that the WWWAnchor is part of. Specifying how that happens is up to the browser and implementor of WWWAnchor, as is implementing the picking code.
Issue: What happens when you nest WWWAnchors (you have WWWAnchors as children of WWWAnchors)? Suggestion: the "lowest" WWWAnchor wins.
WWWAnchor also has a "map" field that adds the object-space point on the object the user picked to the URL in the name field. This is like the image-map feature of HTML, and allows scripts to do different things based on exactly what part of an object is picked. For example, given this WWWAnchor:
WWWAnchor {
    name "http://www.foo.com/cgi-bin/pickMapper"
    map POINT
    Cube { }
}

Picking on the top of the Cube might produce the URL "http://www.foo.com/cgi-bin/pickMapper?.211,1.0,-.56".
Issue: Is this the best way of doing this?
Separator {
    Units { units FEET }
    DEF FootCube Cube { }
    Separator {
        Units { units METERS }
        DEF MeterCube Cube { }
    }
}

Applications that try to be smart about rearranging the object hierarchy will have trouble figuring out exactly what effect the second Units node will have, since its effect will change if it is moved out from under the first Units node. The rules are much simpler if a simple Scale node is used instead.
Issue: RotationXYZ allows rotation about one of the primary axes. I prefer the generality of Rotation, which allows rotation about an arbitrary axis. However, it might make sense to replace Rotation with RotationXYZ, since the general Transform node can also be used to rotate about an arbitrary axis.
For example, a Sphere has a single "radius" field which contains a single floating point value, and is written as:
Sphere { radius 1.0 }
Each node defines reasonable default values for its fields, which are used if the field does not appear as part of the node's definition.
Some fields can contain multiple values. The syntax for a multiple-valued field is a superset of the syntax for single-valued fields. The values are all enclosed in square brackets ("[]") and are separated by commas. The final value may optionally be followed by an extra comma. If a multiple-valued field has only one value, the brackets and commas may be omitted, resulting in the same syntax as single-valued fields. A multiple-valued field may also contain zero values, in which case just a set of empty brackets appears.
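For example, all of the following are legal uses of Material's multiple-valued diffuseColor field:

Material { diffuseColor [ 1 0 0, 0 1 0, 0 0 1, ] }   # three values; the trailing comma is optional
Material { diffuseColor 1 0 0 }                      # a single value may omit the brackets
Material { diffuseColor [ ] }                        # zero values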
Nit: Some of the types for fields are tied to the C programming language ("Float"; Inventor also has "Long" and "Short") and will be misleading on some machines (an Inventor SFFloat is a 32-bit floating point number, even though floats are larger or smaller on different machines).
For example, to give the name "SquareHead" to a cube:
DEF SquareHead Cube {}
Names must not start with a digit (0-9), and must not contain ASCII control characters, whitespace, or the following characters: +\'"{}
Note: The "+" character is illegal for compatibility with Inventor programs, where the characters after the "+" are used to disambiguate multiple nodes with the same name. For example, a user of an Inventor program may give two nodes the name "Joe"; when written, these might appear as "Joe+0" and "Joe+1". The other characters are illegal to make parsing easier and to leave room for future format extensions.
The DEF keyword both defines a named node, and creates a single instance of it. The USE keyword indicates that the most recently defined instance should be used again. If several nodes were given the same name, then the last DEF encountered during parsing "wins". DEF/USE is limited to a single file; there is no mechanism for USE'ing nodes that are DEF'ed in other files.
For example, rendering this scene will result in three spheres being drawn. Both Sphere nodes are named 'Joe'; the second (smaller) sphere is drawn twice:
Separator {
    DEF Joe Sphere { }
    Translation { translation 2 0 0 }
    DEF Joe Sphere { radius .2 }
    Translation { translation 2 0 0 }
    USE Joe
}
Objects that are not built-in write out a description of themselves first, which allows them to be read in and ignored by applications that don't understand them.
This description is written just after the opening curly-brace for the node, and consists of the keyword 'fields' followed by a list of the types and names of fields used by that node (to save space, fields with default values that won't be written also will not have their descriptions written). For example, if Cube was not built into the core library, it would be written like this:
Cube {
    fields [ SFFloat width, SFFloat height, SFFloat depth ]
    width 10
    height 4
    depth 3
}

By describing the types and names of the cube's fields, a parser can correctly read the new node. Field-to-field connections and engines (which I do not think should be part of VRML 1.0; see the last section of this document on Futures) require that the parser know the names and types of fields in unknown nodes; it isn't good enough to just search for matching curly-braces outside of strings and store unknown node contents as an unparsed string.
The other feature that allows easy extensibility is the ability to supply an alternate representation for objects. This is done by adding a special field named 'alternateRep' of type 'SFNode' to your new nodes. For example, if I wanted to implement a new kind of material that supported indexOfRefraction, I could also supply a regular Material as an alternate representation for applications that do not understand my RefracMaterial. In the file format, it would look like:
RefracMaterial {
    fields [ SFNode alternateRep, MFFloat indexOfRefraction, MFColor diffuseColor ]
    indexOfRefraction 0.2
    diffuseColor 0.9 0.0 0.2
    alternateRep Material { diffuseColor 0.9 0.0 0.2 }
}

Inventor uses DSOs (dynamic shared objects; DLLs, dynamic link libraries, on the Windows NT port of Inventor) to support run-time loading of the code for a new node; I can give you a RefracMaterial.so with an implementation (written in C++) of the new RefracMaterial, and existing Inventor applications will then recognize the new node and NOT use the alternateRep. However, I think it is beyond the scope of VRML to try to define a method for the dynamic loading of platform-independent code across the network, and that issue is completely independent of VRML.
Note: It would be a little more convenient if VRML shared the same identifying header as Inventor ("#Inventor V2.0 ascii"). However, in the long run I think there will be many fewer problems if it is easy to distinguish VRML files from Inventor files. It should be trivial to write a VRML to Inventor translator, and only moderately difficult to write an Inventor to VRML translator that tessellated any primitives that were not part of VRML (e.g. NURBS) into IndexedFaceSets.
Note: Comments and whitespace may not be preserved; in particular, a VRML document server may strip comments and extraneous whitespace from a VRML file before transmitting it. Info nodes should be used for persistent information like copyrights or author information.
Note: Inventor allows a series of root nodes to be parsed from a single file. This causes problems for filters that operate on Inventor files (instancing between the nodes under different roots tends to get broken as each root is worked on independently), and doesn't really add any functionality.
Issue: The consensus on the mailing list was +X right, +Y into the screen, and +Z up. Is it worth having VRML be incompatible with Inventor (it is easy for a translator to add a Rotation node...)?
DEF RecommendedViews Switch {
    whichChild 0    # Use first camera by default
    DEF DefaultView PerspectiveCamera { ... }
    DEF UnderSofaView PerspectiveCamera { ... }
    DEF TopOfChimneyView PerspectiveCamera { ... }
}

The browser could then present this list of recommended views to the user, and change the Switch value to change between them.
DEF WackyCube Cube { }
MyScriptingNode {
    fields [ SFString script ]
    script "if (....) WackyCube.width += 3;"
}

The ability to put arbitrary nodes with arbitrary fields in the file, plus the ability for them to refer to other nodes, gives the needed flexibility.
Inventor is missing a good set of engines for doing simple keyframed animated behaviors of objects. It is also missing some simple interactive nodes, such as buttons (like the WWWAnchor node, only more general). The Inventor team at Silicon Graphics will be designing and implementing these kinds of nodes and engines in the near future.
As an example of what standard Inventor can do now, see the FunWithDraggers PostScript document, which is an article Paul Isaacs wrote for the Silicon Graphics developer newsletter on how to build an interactive 3D scene using only standard Inventor nodes and the Inventor ASCII file format (no programming necessary). Here is TrackLight.iv, the file containing the interesting stuff (draggers, connections). If you're curious, here are AllRoom.iv (the main file, which references Room.iv and TrackLight.iv), and Room.iv (which is just the walls of the room).