The Storage Kit: BQuery

Derived from: BEntryList

Declared in: be/nustorage/Query.h

Library: libbe.so

Overview

A query is a means of asking the file system for a set of entries that satisfy certain criteria. As examples, you can ask for all the entries with names that start with a certain letter, or that have nodes that are bigger than a certain size, or that were modified within the last N days, and so on.

The BQuery class lets you create objects that represent specific queries. To use a BQuery you have to follow these steps:

Initialize. The first thing you have to do is initialize the object. There are two parts to the initialization: you have to set the volume that you want to query over (SetVolume()), and set the query's "criteria formula" (SetPredicate()).
Fetch. After the BQuery has been properly initialized, you invoke Fetch(). The function returns immediately while the query executes in the background.
Read. As soon as Fetch() returns, you can start reading the list of winning entries by making iterative calls to the entry-list functions GetNextRef(), GetNextEntry(), and GetNextDirents(). If you ask for entries faster than the query can deliver them, your GetNext...() call will block until the next entry arrives. The function returns an error when there are no more entries to retrieve.

The set of entries that the GetNext...() calls retrieve (for a given fetch) are called the query's "static" entries. This distinction will become useful when we speak of "live" queries, below.
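The initialize/fetch/read cycle described above can be sketched as follows. To keep the sketch buildable without the Be headers, it is written as a template and exercised against stub types; StubVolume, StubRef, StubQuery, and CollectStaticEntries() are all hypothetical names invented for this illustration, and the sketch assumes that a return value of 0 means success (as with B_NO_ERROR). In an application you would pass a BQuery, a BVolume, and entry_refs instead.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in types so the sketch builds without the Be headers. They only mimic
// the calls described above and are NOT the Storage Kit implementation.
struct StubVolume {};
struct StubRef { std::string name; };

class StubQuery {
public:
    explicit StubQuery(std::vector<std::string> names) : fNames(std::move(names)) {}
    int SetVolume(const StubVolume* volume) { fVolume = volume; return 0; }
    int SetPredicate(const char* expr) { fPredicate = expr; return 0; }
    // Fails if the volume or predicate hasn't been set.
    int Fetch() { return (fVolume != nullptr && !fPredicate.empty()) ? 0 : -1; }
    int GetNextRef(StubRef* ref) {
        if (fNext >= fNames.size())
            return -1;                      // stands in for the end-of-list error
        ref->name = fNames[fNext++];
        return 0;
    }
private:
    std::vector<std::string> fNames;
    const StubVolume* fVolume = nullptr;
    std::string fPredicate;
    std::size_t fNext = 0;
};

// Initialize, fetch, then read until GetNextRef() reports an error.
template <class Query, class Volume, class Ref>
int CollectStaticEntries(Query& query, const Volume* volume,
                         const char* predicate, std::vector<Ref>& out)
{
    int err = query.SetVolume(volume);          // step 1a: volume
    if (err != 0) return err;
    err = query.SetPredicate(predicate);        // step 1b: predicate
    if (err != 0) return err;
    err = query.Fetch();                        // step 2: start the query
    if (err != 0) return err;
    Ref ref;
    while (query.GetNextRef(&ref) == 0)         // step 3: read the static entries
        out.push_back(ref);
    return 0;
}
```

The read loop runs exactly once over the list; there is nothing to rewind, so anything you want to keep has to be copied out as you go.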

Reusing your BQuery

Want to go around again? You can, but first you have to clear the object:

Between each "fetching session," you have to invoke Clear() on your BQuery object.

Clearing erases the object's predicate, volume, target (which we'll get to later), and list of static entries--in other words, clearing gets you back to a fresh BQuery object.

And speaking of going around again, be aware that the Rewind() function, which BQuery inherits from BEntryList, is implemented to be a no-op:

You can't rewind a BQuery's list of static entries. After you've performed a fetch, you should read the entry list as quickly as possible and get on with things; you can't turn back or start over.

CountEntries() is also a no-op. This function is also defined by BEntryList. It doesn't apply to BQueries.

Live Queries

A live query is the gift that keeps on giving. After you tell a live query to fetch, you walk through the entry list (as described above), and then you wait for "query update" messages to be sent to your "target." A query update message describes a single entry that has changed so that...

it now satisfies the predicate (where it didn't use to), or...
it no longer satisfies the predicate (where it did before).

Not every BQuery is live; you have to tell it you want it to be live. To do this, all you have to do is set the object's target, through the SetTarget() function. The target is a BMessenger that identifies a BHandler/BLooper pair (as described in the SetTarget() function). Also...

Live query notifications stop when you Clear() or destroy the BQuery object.

Another important point regarding live queries is that you can start receiving updates before you're done looking at all the static entries (in other words, before you've reached the end of the GetNext...() loop). It's possible that your target could receive an "entry dropped out" update before you retrieve the entry through a GetNext...() call. If you're using live queries, you should take care in synchronizing the GetNext...() iteration with the target's message processing.
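One possible way to handle an update that arrives ahead of the static entry it refers to is to remember early removals and discard the matching static entry when it finally shows up. The sketch below is only a model of that bookkeeping: NodeKey and MatchTracker are hypothetical names, the key is assumed to be the (device, node) pair, and the class assumes that all calls come from a single thread (or are otherwise serialized by the caller).

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical key: the (device, node) pair that identifies a node.
using NodeKey = std::pair<int32_t, int64_t>;

class MatchTracker {
public:
    // Call for each entry read from the GetNext...() loop.
    void AddStatic(const NodeKey& key) {
        // If a removal already arrived for this node, the static entry is stale.
        if (fRemovedEarly.erase(key) == 0)
            fMatches.insert(key);
    }
    // Call when an update says an entry now passes the predicate.
    void AddLive(const NodeKey& key) {
        fRemovedEarly.erase(key);
        fMatches.insert(key);
    }
    // Call when an update says an entry dropped out.
    void Remove(const NodeKey& key) {
        // Unknown node while still iterating: remember it for AddStatic().
        if (fMatches.erase(key) == 0 && !fStaticDone)
            fRemovedEarly.insert(key);
    }
    // Call once the GetNext...() loop has reached the end of the list.
    void FinishStatic() {
        fStaticDone = true;
        fRemovedEarly.clear();
    }
    bool Contains(const NodeKey& key) const { return fMatches.count(key) != 0; }
private:
    std::set<NodeKey> fMatches;       // nodes currently believed to match
    std::set<NodeKey> fRemovedEarly;  // removals seen before the static entry
    bool fStaticDone = false;
};
```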

We'll look at the format of the update message in a moment; first, let's fill in some gaps.

The Predicate, Attributes, and Indices

A BQuery's predicate is a logical expression that evaluates to true or false. The "atoms" of the expression are comparisons in the form...

attribute op value

...where attribute is the name of an existing indexed attribute, op is a constant that represents a comparison operation (==, <, >, etc), and value is the value that you want to compare the attribute to.

Attributes

As mentioned above, the attribute part of a query is a name. When you tell the query to fetch, the file system looks for all nodes that have an attribute with that name and then compares the attribute's value to the appropriate value in the predicate. However...

You can only use attributes that are indexed.

To index an attribute, you call the fs_create_index() function. Furthermore...

The query mechanism only knows about attributes that were written after the index (for that attribute) was created.

Unfortunately, there's currently no way to retroactively include existing attributes in a newly created index. (Such a utility would be simple enough to write, but it would take a long time to execute since it would have to look at every file in the file system.)

Also...

Only string and numeric attributes can be queried. Although an attribute can hold any type of data (it's stored as raw bytes), the query mechanism can only perform string and numeric comparisons.

On the bright side, every file gets three attributes for free:

"name" is the name of the entry.
"size" is the size of the data portion of the entry's node. The size is a 64-bit integer, and doesn't include the node's attributes.
"last_modified" is the time the entry's node was last modified (data and attributes), measured in seconds since January 1, 1970. The modification time is recorded as a 32-bit integer.

Technically, "name", "last_modified", and "size" aren't actually attributes--you can't get them through BNode::ReadAttr(), for example. But they're always eligible as the attribute component in a query.

Values

The value part of the "attribute op value" equation is any expression that can be evaluated at the time the predicate is set. Once evaluated, the value doesn't change. For example, you can't specify another attribute as the value component in hopes of comparing, file by file, the value of one attribute to the value of another.

The value is just data. And data is data.

The type of the value should match the type of the attribute: You compare string attributes to strings; numeric attributes to numbers. You aren't prevented from comparing a string to a number (for example), but it may not give you the result you expect.
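Because the value is fixed when the predicate is set, a "modified within the last N days" query has to compute its cutoff time up front. The helper below is a minimal sketch of that arithmetic; ModifiedWithinDays() is a hypothetical name, and the caller is assumed to supply the current time in seconds since January 1, 1970.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds a predicate string that matches nodes modified
// within the last `days` days. `nowSeconds` is the current time in seconds
// since January 1, 1970, the same units that "last_modified" uses.
std::string ModifiedWithinDays(int64_t nowSeconds, int64_t days)
{
    const int64_t kSecondsPerDay = 24 * 60 * 60;
    int64_t cutoff = nowSeconds - days * kSecondsPerDay;
    // A numeric attribute is compared to a number, so the cutoff is unquoted.
    return "last_modified > " + std::to_string(cutoff);
}
```

The resulting string can be handed to SetPredicate(). Note that the cutoff doesn't advance as time passes; a long-running live query keeps comparing against the value computed here.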

Constructing a Predicate

There are two ways to construct a predicate:

You can set the predicate formula as a string through SetPredicate(), or
You can construct the predicate by "pushing" the components in Reverse Polish Notation (or "postfix") order through the PushAttr(), PushValue(), and PushOp() functions. There are seven value-pushing functions that push specific types: string, int32, uint32, int64, uint64, float, and double.

You can't combine the methods:

Pushing the predicate always takes precedence over SetPredicate(), regardless of the order in which the methods are deployed.

SetPredicate() features:

Comparison operators: = < > <= >= !=
Logical operators: || &&
Negation operator: !
Grouping: ()
String (value) wildcard: * (prefix and/or postfix only)
String (value) quoting: ' '

The following are all legitimate strings that you can pass to SetPredicate():

size < 500

(name = fido) || (size >= 500)

(! ((name = *id*) || ( 'final utterance' = 'pass the salt'))) && (last_modified > 1024563)
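If you assemble predicate strings at run time, a small formatting helper keeps the quoting and grouping consistent. The following is a sketch only: Comparison() and Or() are hypothetical names, and values that themselves contain a single quote are not handled, since no escaping rule is described here.

```cpp
#include <string>

// Hypothetical helper: formats "attribute op value", single-quoting the value
// when it contains a space. Embedded single quotes are NOT handled.
std::string Comparison(const std::string& attribute, const std::string& op,
                       const std::string& value)
{
    bool needsQuotes = value.find(' ') != std::string::npos;
    std::string quoted = needsQuotes ? "'" + value + "'" : value;
    return attribute + " " + op + " " + quoted;
}

// Hypothetical helper: joins two sub-expressions with ||, grouping each one
// in parentheses.
std::string Or(const std::string& left, const std::string& right)
{
    return "(" + left + ") || (" + right + ")";
}
```

For example, Or(Comparison("name", "=", "fido"), Comparison("size", ">=", "500")) produces the second string shown above.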

Push features:

The PushOp() function takes operator symbols, such as B_EQ (equals), B_GT (greater than), B_LT (less than), and so on. The complete list is given in the PushOp() function description.
Value strings passed as arguments to PushString() are naturally quoted, so you don't have to single-quote to embed spaces or other odd characters.
The '*' wildcard is allowed, or you can use special "contains", "begins with", and "ends with" operators.

In Reverse Polish Notation, the operator is postfixed. You then push the components from left to right. For example, this...

size < 500

...becomes...

size 500 <

The push sequence is...

   query.PushAttr("size");
   query.PushInt32(500);
   query.PushOp(B_LT);

Another example; this...

(name = fido) || (size >= 500)

...becomes...

(name fido =) (size 500 >=) ||

In code:

   query.PushAttr("name");
   query.PushString("fido");
   query.PushOp(B_EQ);
   query.PushAttr("size");
   query.PushInt32(500);
   query.PushOp(B_GE);
   query.PushOp(B_OR);

There are no grouping operators in this notation; they're not needed--grouping is implied by the order in which the components are pushed.
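To see how the push order alone determines grouping, here is a small stack-based model that turns a postfix token sequence back into a fully parenthesized infix string. It is an illustration of the notation, not a description of how BQuery converts pushed predicates internally; PostfixToInfix() is a hypothetical name, and only binary operators are modeled (the unary ! is left out).

```cpp
#include <string>
#include <vector>

// Hypothetical model of a pushed predicate: each token is an attribute name,
// a value, or one of the binary operators used in the examples above.
// Returns an empty string if the sequence is malformed.
std::string PostfixToInfix(const std::vector<std::string>& tokens)
{
    std::vector<std::string> stack;
    for (const std::string& token : tokens) {
        bool isOperator = token == "=" || token == "!=" || token == "<" ||
                          token == ">" || token == "<=" || token == ">=" ||
                          token == "&&" || token == "||";
        if (!isOperator) {
            stack.push_back(token);         // attribute or value: just push it
            continue;
        }
        if (stack.size() < 2)
            return std::string();           // operator without two operands
        // An operator consumes the two most recently pushed items.
        std::string right = stack.back(); stack.pop_back();
        std::string left = stack.back(); stack.pop_back();
        stack.push_back("(" + left + " " + token + " " + right + ")");
    }
    return stack.size() == 1 ? stack.back() : std::string();
}
```

Feeding it the seven tokens of the second example (name, fido, =, size, 500, >=, ||) yields "((name = fido) || (size >= 500))": the two comparisons are completed before the || is seen, which is why no explicit parentheses are needed when pushing.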

When you're performing a numeric comparison, the Push...() function that you choose doesn't have to exactly match the natural type of the attribute, but you can't mix integers and floating point. For example, even though "size" is a 64-bit value, you can compare it to an int32...

   query.PushAttr("size");
   query.PushInt32(2000);
   query.PushOp(B_GE);

But you can't (or shouldn't) compare it to a float...

   query.PushAttr("size");
   query.PushFloat(2000.0);
   query.PushOp(B_GE);

Query Update Messages

The BMessages that are delivered by a live query have a what field of B_QUERY_UPDATE. The rest of the message depends on what happened:

If the update is telling you that an entry has passed the predicate, the message's "opcode" field will be B_ENTRY_CREATED.

If the update is telling you that an entry has been eliminated from the query, the "opcode" field will be B_ENTRY_REMOVED.

Note that the format of the messages that a live query generates is the same as that of the similarly-opcode'd Node Monitor messages. The only difference is the what field (the what for Node Monitor messages is B_NODE_MONITOR).

Entry Created

The B_ENTRY_CREATED opcode means an entry has changed so that it now passes the query's predicate. The other fields in the message are:

"name" (B_STRING_TYPE). The name of the entry.
"directory" (B_INT64_TYPE). The ino_t (node) number of the entry's directory.
"device" (B_INT32_TYPE). The dev_t number of the device on which the entry resides.
"node" (B_INT64_TYPE). The ino_t number of the entry's node itself.

If you want to cache a reference to the entry, notice that you can create an entry_ref and a node_ref with the data in the message's fields:

   /* Create an entry_ref */
   entry_ref ref;
   const char *name;
   ...
   msg->FindInt32("device", &ref.device);
   msg->FindInt64("directory", &ref.directory);
   msg->FindString("name", &name);
   ref.set_name(name);
   
   /* Create a node_ref */
   node_ref nref;
   status_t err;
   
   ...
   msg->FindInt32("device", &nref.device);
   msg->FindInt64("node", &nref.node);

The node_ref is handy because you may want to start monitoring the node (through a call to the Node Monitor). We'll get back to this point when discussing B_ENTRY_REMOVED messages.

Entry Removed

The B_ENTRY_REMOVED opcode means an entry used to pass the predicate, but something has changed (in the entry or the entry's node) so that now it doesn't.

"directory" (B_INT64_TYPE). The ino_t (node) number for the directory in which the entry lives.
"device" (B_INT32_TYPE). The dev_t number of the device on which the entry lives.
"node" (B_INT64_TYPE). The ino_t number of the entry's node.

Notice that the B_ENTRY_REMOVED message doesn't tell you the name of the entry. This is an unfortunate oversight that will be corrected. In the meantime...

If you need to match the node in this message to an entry from a previous B_ENTRY_CREATED (or that you got from a GetNext...() invocation), you have to keep track of the entry/node yourself. However...

It's not quite that simple. The location of the entry that "contains" the node may have changed since the time that the entry passed the predicate. Follow this outline:

You set up a live query that asks for entries that have nodes larger than 500 bytes.
The query mechanism tells you (either in the static set or through a B_ENTRY_CREATED message) that "/boot/home/fido/data" satisfies the predicate.
You create an entry_ref and a node_ref to the entry, and cache them away somewhere.
The user then renames or moves the entry. The query mechanism doesn't tell you about this change--it only cares about the size of the node, not its name.
You get a B_ENTRY_REMOVED message. You create a node_ref from the message and match it to your cache--and get an out-of-date entry_ref.

To get around the lack of a "name" field, you should monitor the nodes that you receive in your initial GetNext...() calls and B_ENTRY_CREATED messages.
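The bookkeeping this implies can be sketched as a cache keyed by node rather than by entry. The model below uses plain stand-in types instead of node_ref and entry_ref so that it builds without the Be headers; NodeKey, Location, and EntryCache are hypothetical names, and the sketch assumes that your Node Monitor handler calls Moved() whenever it learns that a cached node has been renamed or moved.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-ins for the fields of node_ref and entry_ref.
using NodeKey = std::pair<int32_t, int64_t>;    // (device, node)
struct Location {
    int64_t directory;                          // ino_t of the parent directory
    std::string name;                           // entry name
};

class EntryCache {
public:
    // Call for each static entry and for each B_ENTRY_CREATED update.
    void Add(const NodeKey& node, const Location& where) { fEntries[node] = where; }

    // Call when node monitoring reports that the entry was renamed or moved.
    void Moved(const NodeKey& node, const Location& where) {
        auto it = fEntries.find(node);
        if (it != fEntries.end())
            it->second = where;
    }

    // Call for a B_ENTRY_REMOVED update. On success, `last` receives the most
    // recent known location of the entry and the node is dropped from the cache.
    bool Remove(const NodeKey& node, Location* last) {
        auto it = fEntries.find(node);
        if (it == fEntries.end())
            return false;
        *last = it->second;
        fEntries.erase(it);
        return true;
    }
private:
    std::map<NodeKey, Location> fEntries;
};
```

Because the B_ENTRY_REMOVED message carries the device and node numbers, the lookup in Remove() always has a key to work with; what keeps the cached location from going stale is the Moved() call.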

Constructor and Destructor

BQuery()


      BQuery(void)

Creates a new BQuery object. To use the object, you have to set its predicate and volume, and then tell it to Fetch(). If you want to fetch again, you have to call Clear() first (and then set the predicate and volume again).

~BQuery()


      virtual ~BQuery(void)

Destroys the BQuery. If the query is live, the query is shot dead. You stop receiving live query updates when you delete the BQuery object.

Member Functions

Fetch()


      status_t Fetch(void)

Tells the BQuery to go fetch the entries that satisfy the predicate. After you've fetched, you can retrieve the set of "static" entries through calls to GetNextEntry(), GetNextRef(), or GetNextDirents().

If you've set the BQuery's target, then this query is live. The live query update messages start rolling in when you tell the object to Fetch(). They stop when you Clear() or destroy the object.

The fetch fails if the object's predicate or volume isn't set, or if you've already fetched but haven't Clear()'d since then.

RETURN CODES

B_NO_ERROR. The fetch is running.
B_NO_INIT. The volume or predicate isn't set.
B_BAD_VALUE. The predicate is improper.
B_NOT_ALLOWED. You've already fetched; Clear() the object and start again.

SetVolume()


      status_t SetVolume(const BVolume *volume)

A query can only look in one volume at a time. This is where you set the volume that you want to look at.

RETURN CODES

B_NO_ERROR. The volume was set.
B_NOT_ALLOWED. You've already fetched; you need to Clear() before you can reset the volume.

Currently, SetVolume() doesn't complain if volume is invalid. However, the subsequent Fetch() will fail (B_NO_INIT).

SetTarget(), IsLive()


      status_t SetTarget(BMessenger target)

      bool IsLive(void) const

Sets the BQuery's target. The target identifies the BLooper/BHandler pair (a la the BInvoker target protocol) that will receive subsequent live query update messages. Calling this function declares the query to be live.

If target is NULL, the BQuery is told to be "not live". However, you can only turn off liveness (in this way) before you Fetch(). In other words, if you set the target, and then call Fetch() and then call SetTarget(NULL), the BQuery will think that it (itself) is not live, but it really is.

IsLive() tells you if the BQuery is live. The "liveness" needn't be actuated yet--live queries don't start operating until you tell the BQuery to Fetch(). The live query is killed when you delete or Clear() the BQuery object.

RETURN CODES

B_NO_ERROR. The target was set (including set to NULL).
B_BAD_VALUE. target doesn't identify a proper looper/handler pair.
B_NOT_ALLOWED. You've already Fetch()'d; you need to Clear().

Note that B_NOT_ALLOWED doesn't apply to SetTarget(NULL) after a Fetch().

Clear()


      status_t Clear(void)

Erases the BQuery's predicate, sets the volume and target to NULL, and turns off live query updates (if the query is live). You call Clear() if you want to Fetch() more than once: You have to Clear() before each Fetch() (except the first).

RETURN CODES

Clear() always returns B_NO_ERROR.

CountEntries(), Rewind()

Don't use these functions. They're no-ops for the BQuery class.

GetNextEntry(), GetNextRef(), GetNextDirents()


      virtual status_t GetNextEntry(BEntry *entry, bool traverse = false)

      virtual status_t GetNextRef(entry_ref *ref)

      virtual int32 GetNextDirents(dirent *buf, 
         size_t bufsize, 
         int32 count = INT_MAX)

These functions return the next entry in the "static" entry list. You can retrieve the entry as a BEntry, entry_ref, or dirent structure. The static entry list is the set of entries that initially satisfy the predicate; entries found by the live query mechanism are not included in this list.

When you reach the end of the entry list, the Get...() function returns an indicative value:

GetNextRef() and GetNextEntry() return B_ENTRY_NOT_FOUND.
GetNextDirents() returns 0.

You can only cycle over the list once; the Rewind() function is not defined for BQuery. See the BEntryList class for more information on these functions.

RETURN CODES

GetNextDirents() returns the number of dirents it retrieved (currently, it can only retrieve one at a time). The other two functions return these codes:

B_NO_ERROR. The entry was retrieved.
B_ENTRY_NOT_FOUND. You're at the end of the list.

SetPredicate(), GetPredicate(), PredicateLength()


      status_t SetPredicate(const char *expr)

      status_t GetPredicate(char *buf, size_t length)

      size_t PredicateLength(void)

SetPredicate() sets the BQuery's predicate as a string. Predicate strings can be simple, single comparison expressions:

   "name = fido"

Or they can be more complex:

   "((name = fid*) || (size > 500)) && (last_modified < 243567)"

For the complete rules on setting the predicate as a string, see "Constructing a Predicate."

You can also set the predicate through the Push...() functions. You can't combine the methods: pushing the predicate always takes precedence over SetPredicate(), regardless of the order in which the methods are deployed.

GetPredicate() copies the predicate into buf; length gives the length of buf, in bytes. If you want to find out how much storage you need to allocate to accommodate the predicate, call PredicateLength() first.

If you set the predicate through the Push...() functions, GetPredicate() converts the pushed construction into a string, and returns a copy of the string to you.

PredicateLength() returns the length of the predicate string, regardless of how it's created.

RETURN CODES

B_NO_ERROR. The predicate was successfully set or gotten.
B_NO_INIT. (Get) The predicate isn't set.
B_BAD_VALUE. (Get) length is shorter than the predicate's length.
B_NOT_ALLOWED. (Set) You've already Fetch()'d; you have to Clear().
B_NO_MEMORY. (Set) Not enough memory to store the predicate string.

PushAttr(), PushOp(), PushUInt32(), PushInt32(), PushUInt64(), PushInt64(), PushFloat(), PushDouble(), PushString(), query_op


      void PushAttr(const char *attr_name)

      void PushOp(query_op operator)

      void PushUInt32(uint32 value)
      void PushInt32(int32 value)
      void PushUInt64(uint64 value)
      void PushInt64(int64 value)
      void PushFloat(float value)
      void PushDouble(double value)
      void PushString(const char *string, bool case_insensitive = false)

You use these functions to construct the BQuery's predicate. They create a predicate expression by pushing attribute names, operators, and values in Reverse Polish Notation (post-fix) order.

PushAttr() pushes an attribute name.
PushOp() pushes one of the query_op operators listed below.
The rest of the functions push values of the designated types.

For details on how the push method works, see "Constructing a Predicate."

The predicate that you construct through these functions can be returned as a string through the GetPredicate() function.

The query_op constants are:

Constant Operation
B_EQ =
B_NE !=
B_GT >
B_LT <
B_GE >=
B_LE <=
B_CONTAINS string contains value ("*value*")
B_BEGINS_WITH string begins with value ("value*")
B_ENDS_WITH string ends with value ("*value")
B_AND &&
B_OR ||
B_NOT !


The Be Book, in lovely HTML, for the BeOS Preview Release.

Be is a registered trademark; BeOS, BeBox, BeWare, GeekPort, the Be logo, and the BeOS logo are trademarks of Be, Inc.

Last modified July 17, 1997.
