Posted Date: July 11, 1995
CD article first appeared on: July 1995
Presented by: Michael Mee
Michael Mee is a program manager in the Microsoft Database Group and has designed key parts of Microsoft Access and FoxPro. He currently designs future enhancements to Data Access Objects (DAO).
This document provides a cornucopia of tips and tricks for making your DAO code as fast as possible. While the code samples are written and tested in Microsoft® Access 2.0, almost all of them apply equally to Microsoft Visual Basic®.
In studying the samples, it is important to keep the following in mind:
A common thread emerges from a close study of the examples given. Having a good understanding of the differences between tables, dynasets and snapshots is a great start to correctly predicting the performance characteristics of a given situation. This is particularly critical for working with ODBC data sources.
An equally important lesson to learn from the examples is that when processing data, the time spent retrieving and updating data almost totally overwhelms other speed concerns. For example, while preparing this presentation it quickly became clear that an issue such as using a Variant data type instead of a String data type becomes almost irrelevant if a field is being fetched in the same loop. This is not to say that data retrieval is particularly slow, just that anything involving a disk drive (or even a disk cache) is at least an order of magnitude slower than an in-memory bit test to determine the sub-type of a Variant. In short, it is far better to worry about tuning an OpenRecordset statement or pre-assigning Field objects before entering a loop than about deciding whether to Dim variables as String or Variant.
Finally, the single most important thing learned from preparing this paper was that all the predictions and postulations about how fast a particular operation will be don't mean anything until you actually measure the speed! In their initial form, several of the tips below proved to be totally wrong. The lesson to take away is that, until you test your speed improvements in a real environment, you can't be sure of anything!
The two tables below list the hints that follow, in a loose order of priority. The ratings given are somewhat arbitrary and are an attempt to answer the question "Which tip should I look at applying first?" Because of the differences between ODBC and .MDB access, sometimes a rating of "1" for MDB won't be as appropriate for ODBC. Reading the detailed descriptions in the next section should help.
The first table contains tips that apply to Microsoft Access data formats and which generally apply to other ISAM formats such as Paradox and Xbase. If you never use ODBC data formats, then this is for you.
Tips for Microsoft Access data formats
Rating  Tip#  Title
1       1     Replace Find with Seek
1       2     Use table-type Recordsets instead of dynaset- or snapshot-types
1       40    Replace use of .Sort with specifying an index (or a query)
1       26    Replace use of .Filter with Seek (or a query)
1       20    Use the DB_APPENDONLY flag if you're adding records to dynasets, esp. ODBC
1       12    Always wrap transactions around your DAO code
2       13    Return to your location using bookmarks instead of using Find
2       27    Re-query recordsets instead of reopening them
2       43    Avoid refreshing collections unless you absolutely have to
2       14    In inner loops, use Field objects instead of myRS!fieldname (Q112724)
2       23    For snapshots, select just the fields you need instead of all fields
2       7     Store queries instead of using dynamic SQL text - esp. low-memory machines
2       3     Open attached tables directly by opening the database directly
2       42    Add the DB_DENYWRITE flag if no one else will be writing
3       11    Replace DAO code loops with the equivalent SQL statements - but not always
3       17    Cache tabledef and field property collection references if used many times
3       8     Parameterize queries instead of using dynamic SQL text (especially for ODBC)
3       22    Replace short memo fields with long text fields
3       15    Store infrequently updated tables in the local MDB
3       32    'Posting' queries are faster than explicit code
3       25    Try InStr() instead of Like if you're not concerned with foreign accent chars
3       28    Dim objects and Set them rather than using lengthy references
3       41    Replace floating point numbers with integral numbers
3       36    Compile modules and queries before distributing your application
3       35    Clone dynaset recordsets instead of opening new ones
3       31    Replace old ListTables (etc.) code with collection-based code
3       24    Split .mdb files that have lots of tables to minimize DAO collection initialization time
4       9     Open databases exclusive and read-only for the fastest single-user network performance
4       33    Refresh current field values with Move 0 instead of MoveNext/Previous
4       18    Speed finds by creating a temporary indexed table and seeking
4       19    Replace repeated execution of dynamic SQL with a temporary query
4       34    Abandon an Edit or AddNew with Move 0
5       21    Marginal: replace variants with specific data types
5       30    Take advantage of default collections and drop the extra name
5       29    Replace text collection lookup with ordinals (but Field object is better)
5       39    Replace Variant string functions with '$' functions if using strings
The second table contains tips that apply to ODBC databases. Since many tips apply to both data formats, there is quite a bit of overlap between the tables. If you are trying to speed up access to ODBC data in particular, this table will probably be of more interest.
Tips for ODBC data formats
Rating  Tip#  Title
1       12    Always wrap transactions around your DAO code
1       40    Replace use of .Sort with specifying an index (or a query)
1       6     With ODBC dynasets, use CacheStart, FillCache and CacheSize functionality
1       38    On ODBC data use Find only on indexed fields, otherwise open a new cursor
1       26    Replace use of .Filter with Seek (or a query)
1       10    With ODBC SQL statements, use pass-through where possible
1       20    Use the DB_APPENDONLY flag if you're adding records to dynasets, esp. ODBC
2       43    Avoid refreshing collections unless you absolutely have to
2       23    For snapshots, select just the fields you need instead of all fields
2       27    Re-query recordsets instead of reopening them
2       4     Use snapshots instead of dynasets for remote data (but beware of memos!)
2       13    Return to your location using bookmarks instead of using Find
2       7     Store queries instead of using dynamic SQL text - esp. low-memory machines
2       37    For ODBC data with OLE or memo fields use dynasets instead of snapshots
2       5     Supply the DB_FORWARDONLY option on ODBC snapshots
2       14    In inner loops, use Field objects instead of myRS!fieldname (Q112724)
3       11    Replace DAO code loops with the equivalent SQL statements - but not always
3       15    Store infrequently updated tables in the local MDB
3       17    Cache tabledef and field property collection references if used many times
3       8     Parameterize queries instead of using dynamic SQL text (especially for ODBC)
3       32    'Posting' queries are faster than explicit code
3       41    Replace floating point numbers with integral numbers
3       36    Compile modules and queries before distributing your application
3       28    Dim objects and Set them rather than using lengthy references
3       35    Clone dynaset recordsets instead of opening new ones
3       22    Replace short memo fields with long text fields
3       31    Replace old ListTables (etc.) code with collection-based code
4       33    Refresh current field values with Move 0 instead of MoveNext/Previous
4       16    Reduce your ODBC time-outs to get faster return to DAO
4       18    Speed finds by creating a temporary indexed table and seeking
4       19    Replace repeated execution of dynamic SQL with a temporary query
4       34    Abandon an Edit or AddNew with Move 0
5       30    Take advantage of default collections and drop the extra name
5       29    Replace text collection lookup with ordinals (but Field object is better)
5       21    Marginal: replace variants with specific data types
5       39    Replace Variant string functions with '$' functions if using strings
This section discusses the code found in the database FAST42.MDB. Use this paper in conjunction with the form in that database that lets you test each function.
The sample code consists of functions that directly correspond to the tips given below. All the functions are structured similarly - the slow way comes first, followed by the faster method. The functions are designed to be run with the tables and queries in the "fast42.mdb" database. This database, in turn, expects to link to nwind.mdb and to a Microsoft SQL Server installation with the Pubs database installed in order to run some of the functions.
Note that benchmarking is a fine art (or perhaps an evil science?), and the code samples do not claim to be exhaustive. They should be taken as indicative only. In particular, no accounting has been made for:
As always, test any improvements you make on the exact machine and software configuration that they will ultimately be used in (including having other programs loaded and minimized if need be).
If you select a table-type recordset with an index, you can use the Seek method to position the cursor against any criteria that are stored in that index. This will always be the fastest possible way to find a particular piece of data. Even though Find will use indexes where possible, because it is based on a dynaset or snapshot, the overhead of that mechanism will always be somewhat greater than the equivalent Seek.
Be careful about giving in to the temptation of indexing all the fields so that you can seek on any criteria. Unless the table data is purely read-only, the overhead of maintaining indexes will erase the gains of using Seek on an obscure column once in a blue moon.
Multi-field indexes can be used with Seek even if you don't want to match all the fields. The one restriction is that you must supply values for every field that precedes the fields you want to match in the index. For example, if you have an index on Part#, Cust#, and Emp#, you can search for a Cust# as long as you supply the Part#. Similarly, to find an Emp# you must supply both a Part# and a Cust#. Of course, you can also use this index to search for a Part# alone.
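For example, here is a minimal sketch of a partial-key Seek; the database, table, index, and field names are illustrative rather than taken from the sample database:

Dim db As Database, rst As Recordset
Set db = OpenDatabase("orders.mdb")
Set rst = db.OpenRecordset("Order Details", DB_OPEN_TABLE)
rst.Index = "PartCustEmp"     ' index on Part#, Cust#, Emp#
rst.Seek "=", 1234, 17        ' Part# and Cust# supplied; Emp# omitted
If Not rst.NoMatch Then
    Debug.Print rst!Qty
End If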
One of the obstacles to using Seek is that you cannot open attached tables as table-type recordsets. Tip #3 shows how you can look inside the connection information for an attached table and use that information to open the database and hence the table directly.
This tip cannot be used against ODBC data sources because they cannot be opened as tables directly. This is an inherent architectural limitation of the way client/server databases work.
Using a table-type recordset will be faster than a dynaset or snapshot for all operations it supports. The two exceptions to this are the .Filter and .Sort properties (see #26 and #40). If you don't need to join data or do complex .Find requests, then this is a great way to speed up your data access. As a general "counter rule", however, as soon as you do need joins, sorting, etc., it will be quicker to use the appropriate dynaset or snapshot (see also #4).
One of the obstacles to using table-type recordsets is that you cannot open attached tables as table-type recordsets. Tip #3 shows how you can look inside the connection information for an attached table and use that information to open the database and hence the table directly.
If the tables involved are small, then the benefits of using this technique will be small and you might choose to use dynasets or snapshots for convenience.
This tip cannot be used against ODBC data sources because they cannot be opened as tables directly. This is an inherent architectural limitation of the way client/server databases work.
Note Do not open tables in ODBC directly. This will always be slower than using an attached table. This hint is only for ISAM type databases such as .MDB files, Paradox, or Xbase files.
A common model for distributing applications is to put local data, forms, reports, etc. in an .mdb file that resides on the user's machine and to attach those tables to an .mdb on a common network location (a setup sometimes erroneously referred to as "client-server"). One limitation of this approach is that your code can no longer open the tables as table-type recordsets because attached tables can only be opened as dynasets or snapshots.
The code sample shows how you can work around this restriction in a generic fashion and get the best of both worlds by reading the Connect property to find the name of the database file and then opening the database directly. Doing this allows you to take advantage of tips #1 and #2.
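As a minimal sketch of the technique (the attached-table name is assumed, and the parsing assumes a simple .MDB connect string such as ";DATABASE=C:\DATA\NWIND.MDB"):

Dim dbLocal As Database, dbDirect As Database, rst As Recordset
Dim tdf As TableDef, strPath As String
Set dbLocal = DBEngine.Workspaces(0).Databases(0)
Set tdf = dbLocal.TableDefs("Customers")    ' the attachment
' Everything after "DATABASE=" in the connect string is the .MDB path.
strPath = Mid$(tdf.Connect, InStr(tdf.Connect, "DATABASE=") + Len("DATABASE="))
Set dbDirect = OpenDatabase(strPath)
' Open the real table directly; indexes and Seek are now available.
Set rst = dbDirect.OpenRecordset(tdf.SourceTableName, DB_OPEN_TABLE)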
Of course, if the tables involved are small, then the benefits of this will be rather small and may actually be counterproductive due to the extra overhead of having another database open, etc.
Microsoft Jet provides two kinds of query-based recordsets: dynasets, which can be updated, and snapshots, which cannot. If you don't need to update data and the recordset contains fewer than 500 records, you can reduce the time it takes the recordset to open by requesting a snapshot with the DB_OPEN_SNAPSHOT option; a snapshot is generally faster to open than a dynaset.
However, if the recordset you're using is large or contains Memo or OLE Object fields, using a dynaset is more efficient. If you use a dynaset, Jet retrieves only the primary key of each record; if you use a snapshot, it retrieves the entire record. A dynaset is also more efficient because Memo and OLE Object fields are retrieved only if they are referenced.
Rather than transferring the data in each column of the row during the fetch, Jet builds a dynaset by fetching the key values that uniquely identify each of the rows. The data from the bookmarked rows is fetched only when needed to populate the table or query datasheet view. Using key values speeds the chunking operation and minimizes network traffic, especially when browsing large tables. Data retrieval is also optimized in that Memo and OLE objects are not retrieved unless they need to be displayed.
A snapshot Recordset does not use bookmarks; instead, the data contained in each of the rows is fetched. Creating snapshot Recordset objects can be faster with small tables, but can be quite time-consuming when large tables with many columns are involved, and network traffic increases drastically. This is especially true if the table or query includes Memo fields and OLE objects. (Note, however, that .MDB snapshots containing Memo or OLE fields will not retrieve the Memo or OLE data into the snapshot until it is actually referenced.) You can further minimize the performance hit with snapshot Recordsets by ensuring you are returning only the columns you need.
Recordset objects of the Snapshot type provide bi-directional scrolling capability by default. If you only need to make a single pass through the records of a Snapshot, you can Or the DB_FORWARDONLY flag with any other existing flag in OpenRecordset()'s intOptions argument. This makes the Snapshot more efficient, because the result set is not copied to a scrollable buffer area.
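A minimal sketch, assuming an attached SQL Server table named dbo_authors from the Pubs database:

Dim db As Database, rst As Recordset
Set db = DBEngine.Workspaces(0).Databases(0)
' Forward-only: no scrollable buffer is built, so the open is cheaper.
Set rst = db.OpenRecordset("dbo_authors", DB_OPEN_SNAPSHOT, DB_FORWARDONLY)
While Not rst.EOF
    Debug.Print rst!au_lname
    rst.MoveNext        ' MovePrevious would fail on this Recordset
Wend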
Microsoft Access 2.0's CacheSize and CacheStart properties let you store all or part of the data contained in a dynaset-type Recordset in local memory. Local caching of rows dramatically speeds up operations when traversing dynaset rows bi-directionally, and shows significant improvements even when moving in a forward-only direction.
To use caching, you specify the number of rows to be stored by the value of CacheSize (Long) and the beginning row by the bookmark stored as the value of the CacheStart (String) property. Applying the FillCache method fills the cache with server data. Fetches within the cache boundary occur locally, speeding display of the cached records in a datasheet or in a continuous form. The allowable range of the value of CacheSize is between five and 1,200 records; if the size of the cache exceeds available memory, the excess records spill into a temporary disk file. Typically, you set the value of CacheSize to 100. To recover the cache memory, set CacheSize = 0.
Once you establish the cache, you need to keep moving its location to stay synchronized with the set of records you're working with. Using a cache can provide twice the performance of not specifying one. If the application requires backward scrolling within the cached region, the performance improvements will be even larger. Depending on your scenario, using a cache may be faster than using a read-only, forward-only snapshot (especially if the snapshot contains memo or long binary fields which may only be referenced occasionally).
The size of the cache you use will be determined by the application's needs. For example, if you are displaying records to the user, you might use a cache size determined by the number of rows they can see on the screen. If the code has no user interaction, then a tradeoff between local memory availability, network traffic, row size, and fetch delay can be made, usually by experimentation.
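The following sketch shows the basic pattern; the attached table and field names are assumed:

Dim db As Database, rst As Recordset, i As Integer
Set db = DBEngine.Workspaces(0).Databases(0)
Set rst = db.OpenRecordset("dbo_Orders", DB_OPEN_DYNASET)
rst.CacheSize = 100             ' hold 100 rows locally
rst.CacheStart = rst.Bookmark   ' cache begins at the current record
rst.FillCache                   ' one burst of fetches fills the cache
For i = 1 To 100                ' these moves are satisfied from the cache
    Debug.Print rst![Order ID]
    rst.MoveNext
Next i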
DAO makes it easy to create SQL statements on the fly. As a general guideline, it is better to create stored queries and to open those directly rather than use SQL text in your code. The advantages include:
The example code allows you to time a stored versus a dynamic query, although it is not really designed to help you evaluate the hit of, say, the first query compilation. Running it on a low-memory machine (e.g., 6-8 MB) should give you a good feel for the trade-offs involved.
See also tip #8 and tip #19.
Note This tip is especially important for users with limited memory. Avoiding query compilation means that quite a large part of the Jet code will not be used, leaving the memory available for your application to use. See also #7.
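As a minimal illustration of the difference (the stored query name here is assumed, not part of fast42.mdb):

Dim db As Database, rst As Recordset
Set db = DBEngine.Workspaces(0).Databases(0)
' Dynamic SQL: Jet must parse and compile the text every time.
Set rst = db.OpenRecordset("SELECT * FROM Orders WHERE Freight > 100")
rst.Close
' Stored query: compiled once, when it was saved.
Set rst = db.OpenRecordset("Expensive Orders")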
A common requirement in applications is to build a SQL string based on values that the user provides. This often leads to code like the following, which finds all authors whose names start with a given letter, firstletter:
Dim d As Database, rs As Recordset
Dim firstletter As String, sqlstr As String
Set d = OpenDatabase("biblio.mdb")
firstletter = "G"
sqlstr = "Select * from Authors where Author like """ & firstletter & "*"""
Set rs = d.OpenRecordset(sqlstr)
Debug.Print rs!Author
This code uses Basic to build a SQL string and then asks Jet to execute that SQL statement. This technique is often referred to as using dynamic SQL. Some of the reasons given for using this are:
Unfortunately, the penalties of doing this can be quite high - especially when using ODBC data. The main steps that happen are:
Another alternative to creating SQL strings dynamically is to use parameterized queries that you create ahead of time. These are queries that have variables in place of actual values in the SQL statement. Your code gathers the values for these variables from the user and then tells the engine what they are. Once the parameterized query has been saved and compiled, Jet can do the following, omitting several steps above:
This is already fewer steps than above. Moreover, if you wish to get more values from the user and run the same query with different values, you only need to repeat steps 3 and 4. This is a substantial saving, especially for client-server applications. (Note: if you're not interested in updating the results and the query uses only server data, pass-through queries might be even better.)
To do this in DAO, you use the Parameters collection on a Querydef. For example, the following code creates the query (which you would normally do once through the Microsoft Access query designer):
' This code only executed once - or done in query designer
Dim qd As QueryDef
Set qd = d.CreateQueryDef("AuthorLike", "PARAMETERS FirstLetter Text; SELECT DISTINCTROW Authors.Au_ID, Authors.Author FROM Authors WHERE ((Authors.Author Like [FirstLetter]))")
qd.Close
Once the query is created, you can set the parameters from code as follows:
Dim firstletter As String, ds As Recordset
firstletter = InputBox("Enter first letter of author's name")
Set qd = d.QueryDefs("AuthorLike")
qd!FirstLetter = firstletter & "*"
Set ds = qd.OpenRecordset()
...
qd!FirstLetter = "S*"
Set ds = qd.OpenRecordset()
As you can see, the code is shorter, easier to understand, easier to maintain - and actually runs faster as well.
If you open an .MDB file both exclusive and read-only, this will avoid the creation of the .LDB file and will also stop any locks from being placed on the database. Over a network, this will result in less network traffic and faster access because no file locks are being placed. On a local hard disk, the differences are likely to be negligible. Similarly, it may be hard to measure the differences on a lightly loaded network.
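A one-line sketch (the path is illustrative); the second argument to OpenDatabase requests exclusive access and the third requests read-only:

Dim db As Database
Set db = OpenDatabase("\\SERVER\SHARE\LOOKUP.MDB", True, True)
' No .LDB file is created and no locks are placed while it is open.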
One of the major enhancements to Microsoft Jet 2.0 was the ability to create pass-through queries. With an ordinary query, Microsoft Jet compiles and runs the query, combining the power of the server with the capabilities of the Jet database engine. With a pass-through query, you enter an SQL statement that Microsoft Jet sends directly to the server without stopping to compile the query.
Pass-through queries offer a number of advantages over other queries. Some of the more important ones are:
Full details on pass-through queries can be found by searching Microsoft Access Help for SQL: pass-through queries. See also the Developing Client/Server Applications chapter of Building Applications in the Access Developer's Toolkit.
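From DAO code, one way to get pass-through behavior is the DB_SQLPASSTHROUGH option; the sketch below assumes an ODBC data source named Pubs:

Dim db As Database, rst As Recordset
' An ODBC connect string instead of a file name opens a server connection.
Set db = OpenDatabase("", False, False, "ODBC;DSN=Pubs;DATABASE=pubs")
' The SQL text goes to the server verbatim; Jet does not compile it.
Set rst = db.OpenRecordset("SELECT au_lname FROM authors", DB_OPEN_SNAPSHOT, DB_SQLPASSTHROUGH)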
As a general rule, it is better to use a single stored query than to write the equivalent code in DAO. All the reasons given in tip #7 apply - even more so when it comes to maintainability. However, as the sample code demonstrates, there are scenarios where the DAO code will be faster than the equivalent Jet 2.0 query. In particular, if you're doing a simple single-table update, it may be quicker to use DAO code than a SQL query. This will only be true if:
In the above case, you may wish to experiment with using code instead of a SQL statement - knowing that you give up the maintainability of a stored query in doing so.
Always use transactions when you add, change, or delete a set of records using code. Because transactions buffer disk writes until they are committed, using a transaction reduces disk access to roughly once per transaction instead of once per record. This can result in dramatic improvements in performance.
It's a good idea to group data updates into logical sets of operations and then perform each set in a transaction; don't try to include all updates in one large transaction. Most operations inside a transaction are buffered in memory and then written to disk when the transaction is committed. If the transaction is too large, it uses memory that other operations may need; if it's too small, it accesses the disk more often than necessary.
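A sketch of batching commits; the batch size of 1,000 is arbitrary, and rstOrd is assumed to be an open dynaset-type Recordset with an illustrative Posted field:

Dim lngCount As Long
BeginTrans
While Not rstOrd.EOF
    rstOrd.Edit
    rstOrd!Posted = True
    rstOrd.Update
    lngCount = lngCount + 1
    If lngCount Mod 1000 = 0 Then   ' commit one logical batch...
        CommitTrans
        BeginTrans                  ' ...and start the next
    End If
    rstOrd.MoveNext
Wend
CommitTrans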
Bookmarks are the fastest way to return to a specific record. If you anticipate returning to a record again, the fastest way to do that is to store a bookmark and use it to return. The memory overhead in storing the string for the bookmark will be far less than the execution time involved in using Find to return to the record. You may even be able to anticipate user requests by caching bookmarks of some of the records they've used recently so that you can quickly return to them.
If you are using a table-type recordset then using Seek will be as fast as using a bookmark.
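A minimal sketch, assuming rst is an open dynaset-type Recordset:

Dim strMark As String
strMark = rst.Bookmark      ' remember where we are
rst.MoveLast                ' move away to do other work...
rst.Bookmark = strMark      ' ...then jump straight back; no Find needed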
To speed up iterative (looping) processes through large numbers of rows using Access Basic, declare all field references explicitly.
The following is an example of Access Basic code that can be improved by using direct Field references:
While Not rstOrd.EOF
    rstOrd.Edit
    rstOrd!Price = rstOrd!Qty * rstOrd!UnitCost
    rstOrd.Update
    rstOrd.MoveNext
Wend
In the example above, the field lookup (that is, the step where Access Basic resolves a name such as rstOrd!Price to the underlying database field) is performed for Price, Qty, and UnitCost on every pass through the While loop. In this configuration, both calculations and lookups must be performed inside the loop, which is not an efficient design.
The following is an example of Access Basic code that is more efficient:
Dim Price As Field, Qty As Field, UnitCost As Field
Set Price = rstOrd!Price
Set Qty = rstOrd!Qty
Set UnitCost = rstOrd!UnitCost
BeginTrans                  ' see tip #12
While Not rstOrd.EOF
    rstOrd.Edit
    Price = Qty * UnitCost  ' no name lookup inside the loop
    rstOrd.Update
    rstOrd.MoveNext
Wend
CommitTrans
This example runs faster because Access Basic performs the field lookup only once for each field and completes it before executing the loop. A direct reference to each of the three fields is then stored in the three field variables Price, Qty, and UnitCost. As a result, no lookup is required in the While loop, and the field values are accessed and manipulated directly.
The code examples above are illustrative. In some cases, an Update query can be a faster way to accomplish the task (see tip #11). Also, speed differences will be slight for small numbers of records. Note the use of tip #12 included in the revised example.
Often, an application contains several forms that use the same remote table-for example, as the source for a list box or combo box. If the data in the table doesn't change frequently, you can speed up form loading and reduce server load by using one of the following techniques:
There is no code example for this fragment as it depends so much on the scenario involved.
When you're using an ODBC database, such as SQL Server or ORACLE Server, there may be delays because of network traffic or heavy use of the ODBC server. Rather than waiting indefinitely, you can specify how long Microsoft Access waits before it produces an error. To make your program run faster, you can decrease this time-out so that your application regains control sooner. You might do this when you know that a server is always on a local net and should always respond quickly. Regaining control quickly can help avoid having your user reboot because the delay has become so long.
See the Access 2.0 Help on ODBCTimeout for more details on how to use this.
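A minimal sketch (the query name is assumed):

Dim db As Database, qd As QueryDef
Set db = DBEngine.Workspaces(0).Databases(0)
Set qd = db.QueryDefs("Server Lookup")
qd.ODBCTimeout = 10     ' give up after 10 seconds instead of the default 60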
Some properties of TableDef and Field objects can be relatively expensive to retrieve. User-defined properties, which can only be accessed via the Properties collection, fall into this category. Rather than referencing such properties directly in an inner loop, it is more efficient to assign them to a local variable before entering the loop and then use the variable instead.
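For example, a sketch assuming a user-defined Description property on a table named Orders:

Dim db As Database, tdf As TableDef
Dim varDesc As Variant, i As Integer
Set db = DBEngine.Workspaces(0).Databases(0)
Set tdf = db.TableDefs("Orders")
varDesc = tdf.Properties("Description")     ' one lookup, outside the loop
For i = 1 To 1000
    ' ... use varDesc here instead of tdf.Properties("Description") ...
Next i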
When you know that you are going to run queries that involve only a certain subset of data, you can create a temporary table containing just the rows and columns that you expect to search later - for example, by executing a three- or four-table join that returns only the needed columns and rows. If you add an index to the temporary table, access to it is even faster.
Snapshots are a good way to get to this data for ODBC servers, but they cannot be searched with the Seek method because the temporary result that is returned is not indexed. The find methods are unable to use an index for the same reason, and they can be slow if this is a large dynaset. Further WHERE clause refinements to the query still take server time, and if there are a large number of such queries this can be too slow.
One possible solution in this scenario is to run a make-table query to create a "permanent" table in the local database that has its own index, so that the time saved on each subsequent Seek is well worth the time it takes to build the table. The numbers below show an example of these different approaches with a small table:
The times to find data in each of these result sets are:
This does not scale very well with larger tables that are drawn from SQL Server, but you can always use pass-through to create real SQL Server temporary tables and query against those.
When tried against larger .MDB tables the results are as follows:
For a query that returns 564 records,
Searching for 342 records:
There is a 10% price to pay to get a permanent table built vs. a snapshot. However, on the lookup end, the improved performance is dramatic - almost six times as fast. Obviously the number of subsequent searches you expect to be doing is a big factor here.
The downsides to using permanent tables like this are:
Of course like all the tips given in this paper, you should carefully benchmark any solution to ensure that all the tradeoffs are taken into account.
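A sketch of the make-table approach; the attached table, criteria, and names below are illustrative:

Dim db As Database, rst As Recordset
Set db = DBEngine.Workspaces(0).Databases(0)
' Pull just the rows and columns needed into a local table...
db.Execute "SELECT [Order ID], [Customer ID] INTO LocalOrders FROM dbo_Orders WHERE [Shipped Date] Is Null"
' ...then index it so that Seek can be used.
db.Execute "CREATE INDEX CustIdx ON LocalOrders ([Customer ID])"
Set rst = db.OpenRecordset("LocalOrders", DB_OPEN_TABLE)
rst.Index = "CustIdx"
rst.Seek "=", "ALFKI"
If Not rst.NoMatch Then Debug.Print rst![Order ID]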
This tip is closely related to tip #7. If you don't wish to store lots of queries, or you can't build them until your application is running, you can still benefit from many of the advantages of stored queries by using temporary queries. Temporary queries are exactly like regular QueryDef objects, except that they are unnamed - you pass a zero-length string ("") as the name when you create them. They have the added benefit of requiring no cleanup when you've finished using them (apart from closing them if you wish to regain their resources before your application ends), and they don't appear in the queries section of the database container for any other user.
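A minimal sketch (the SQL text is illustrative):

Dim db As Database, qd As QueryDef, rst As Recordset
Set db = DBEngine.Workspaces(0).Databases(0)
' A zero-length name makes the QueryDef temporary: compiled and usable,
' but never saved in the database or visible to other users.
Set qd = db.CreateQueryDef("", "SELECT * FROM Authors WHERE Author Like 'G*'")
Set rst = qd.OpenRecordset()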
If you need to append small amounts of data to an ODBC table, open the recordset by ORing in the DB_APPENDONLY flag. This will avoid the population of the dynaset with the bookmarks of all the records in the table.
If you're trying to append a lot of data to an ODBC table that is already stored in an .MDB table, then use the technique suggested under Batching Insertions elsewhere in this document.
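A sketch of the append-only open; the attached table and field names are assumed:

Dim db As Database, rst As Recordset
Set db = DBEngine.Workspaces(0).Databases(0)
' DB_APPENDONLY: the dynaset opens without fetching existing bookmarks.
Set rst = db.OpenRecordset("dbo_OrderLog", DB_OPEN_DYNASET, DB_APPENDONLY)
rst.AddNew
rst![Log Message] = "Nightly upload complete"
rst.Update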
A common piece of wisdom is that it is quicker to use native data types than the Variant data type that was introduced in Microsoft Access 1.0 and Visual Basic 3. As this code example shows, when it comes to making DAO code faster this is a classic example of optimizing the wrong thing. Internally, all DAO code uses Variants because it is the only data type that can also contain Null values. Converting from Variants back to native data types takes time, so you're unlikely to see much benefit in most real-world applications.
Spend more time on tips such as #12 and #14 before worrying about matching data types too closely. As stated in the introduction to this paper, database operations tend to dwarf language issues by an order of magnitude, so concentrate on them first.
When designing a database it is often hard to decide just how much room a text field is going to take up. In these scenarios it is tempting to use a memo field and consider the problem solved. This is not always the correct decision. As running the code fragment shows, retrieving a memo field can take more time than retrieving the equivalent text field.
Snapshot-type recordsets are a complete copy of all the data that you've requested. As such, it makes sense to minimize copying data that you don't need, especially Memo and OLE fields from ODBC data sources. If you are using a lot of fields, it probably makes sense to use a dynaset instead.
Tip #4 explains some key differences between dynasets and snapshots, especially with respect to ODBC data.
When you first reference DAO collections such as Tabledefs, DAO has to retrieve certain basic information about each Tabledef and create a corresponding object. If you have an application with over a hundred tables, say, the time to populate this collection will become significant. A way to avoid this is to split your MDB into several MDB files that contain logically grouped sets of tables and open those databases directly from DAO if you need to access them. This will create separate Tabledef collections in each Database object. Although you will still pay the collection creation hit for each .MDB, you will now have finer granularity over when or if that hit takes place.
Of course you can avoid this hit by not referencing the tabledefs (or querydefs etc.) collections at all, but that generally isn't possible in most applications.
The Like operator is a very powerful pattern-matching tool that has been heavily optimized for the types of comparisons it does. However, some of those comparisons may not be relevant to the data that you have stored. In particular, Like knows how to correctly match single- and double-byte character equivalences (such as ß with ss), as well as simpler ones such as 'e' with 'é'. If you do not store any such characters in your data, then you might consider using other string operators such as InStr().
Although this won't work for all pattern matching requirements, those where it can be used are worth benchmarking in your application. You may even find that you can use simpler string functions such as Left() or Right() for further performance gains, although Like does recognize some of those and special case them already.
Warning Don't substitute Basic-specific string functions such as InStr() or Left() for Like in queries against ODBC data. Because many ODBC servers do not support these functions, using them may cause all data in the query to be brought to the local machine so that the operators can be applied locally. This will be much slower than having the server perform the Like operation. If in doubt, try it and see. The performance difference will be dramatic on large tables.
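Against local .MDB data (per the warning above), a hedged sketch; the table and field names are assumed:

Dim db As Database, rst As Recordset, strSQL As String
Set db = DBEngine.Workspaces(0).Databases(0)
' A plain substring test can beat the more general Like pattern matcher.
strSQL = "SELECT * FROM Customers WHERE InStr([Company Name], 'Ltd') > 0"
Set rst = db.OpenRecordset(strSQL)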
The .Filter property of a Recordset is one of the slowest ways of selecting a subset of data, unless the set of records is extremely small. A far better way to subset data is to open the table directly, specify an index that has the data sorted on the criteria you wish to use, and then seek to that location (see tip #1).
If you don't have an appropriate index, or the criteria are complex, then create a new query and specify an appropriate WHERE clause. Running this new query will be faster than creating a new recordset based on the existing recordset.
As a general rule, you should not use the .Filter property on any set of rows that have more than 100 records.
The Requery method of a dynaset- or snapshot-type Recordset is useful if you need to gather any new rows that may have been entered since you first ran the query. Using this method will be considerably faster than creating a new recordset.
Dim rstData as Recordset
Set rstData = dbsCurrent.OpenRecordset("Long Running Query")
'... time passes
rstData.Requery ' far faster than recreating the recordset
The same mechanism that makes tip #14 so effective can be used on any DAO object. In general, it is faster to replace an expression like:
dbEngine.Workspaces(0).Databases(0)
with
Dim dbCurrent As Database
Set dbCurrent = dbEngine.Workspaces(0).Databases(0)
and then use the dbCurrent variable for all future references. If you make only one reference in code, then there is nothing gained by creating a variable to store the reference, but with two or more references the overhead and extra lines of code are generally worth it.
Note that this is really only an issue for time critical portions of code. As for all optimizing, concentrate on the 10% of code that is used most, rather than worrying about rarely used routines.
There is an extremely minor performance gain to using:
rstOrders(0)
instead of
rstOrders!OrderID
However, the gain is so small as to be almost not worth mentioning, and almost certainly not worth the risk to code stability if table structures are modified. Applying tip #11 will result in far better gains than this.
Most DAO objects have several collections (e.g., Database has QueryDefs, TableDefs, and Recordsets). One of these collections is the default collection for that object and allows an abbreviated syntax. For example:
dbEngine.Workspaces(0).Databases(0)
can be shortened to
dbEngine(0)(0)
This shorter syntax is also slightly faster. Before rushing in and changing all your code, however, you might want to consider whether the loss in readability is worth the marginal increase in speed. This may be something to consider only when trying to squeeze the last drop out of some inner loop. Even then, tip #14 suggests a potentially far more effective way of doing this.
Microsoft Access 1.x shipped with some List methods that have better equivalents in Visual Basic 3.0 and Microsoft Access 2.0. In addition to the syntax being shorter, using the new syntax will often be quicker, as illustrated by the sample code. See the Access 2.0 help topic "Converting Code" for examples on how to change this old code to use the new syntax.
The Microsoft Jet database engine provides many advanced query capabilities. Many of these are overlooked when it comes time to write update code. One example of a common scenario where a single update query can replace multiple lines of code is a so-called "Posting" query.
A posting query typically consists of two tables: a table that contains the "master" data, and an "update" table of similar structure that contains changed values for existing data in the master table and new rows to be inserted. Usually you would write explicit code that walks the update table and applies the changes to the master table, either updating or appending new records as required.
The code to do this can be replaced with a single query. By taking advantage of Jet's ability to do updatable outer joins, you can join the update table to the master table and specify that the master fields be updated with the update fields. If a record doesn't exist in the master, then because it is an outer join, the new records will appear in the candidate update set and the join field will be updated - thereby creating a new record.
For example, the following query would update to an Orders table from an Order Updates table:
UPDATE Orders RIGHT JOIN [Order Updates]
ON Orders.[Order ID] = [Order Updates].[Order ID]
SET Orders.[Order ID] = [Order Updates].[Order ID],
Orders.[Customer ID] = [Order Updates].[Customer ID],
Orders.[Employee ID] = [Order Updates].[Employee ID],
Orders.[Required Date] = [Order Updates].[Required Date],
Orders.[Shipped Date] = [Order Updates].[Shipped Date],
Orders.Freight = [Order Updates].[Freight];
This query could be tweaked further to allow for blank entries in the Order Updates table for certain fields. E.g. instead of:
Orders.Freight = [Order Updates].[Freight]
you could use:
Orders.Freight = IIF(IsNull([Order Updates].Freight),
Orders.Freight, [Order Updates].Freight)
The MoveFirst, MoveNext, MovePrevious, and MoveLast methods have been a part of Microsoft Access since it was first released. Microsoft Access 2.0 introduced a complementary method called Move. While at first glance this method may seem superfluous, a closer examination reveals that it is actually quite a powerful substitute for the four existing methods and also has a couple of unique uses of its own.
Perhaps one of the most useful and yet most unintuitive uses for Move is the following statement:
rstData.Move 0
What effect does this have? In the form given above, this will force the Jet engine to re-fetch the data underlying the current record. This might be useful if the cursor position hasn't changed in a while and you wish to be sure that the most current record is going to be edited. As the code example shows, this is faster than the equivalent MoveNext and MovePrevious, and it also avoids complications with the cursor being on the last record.
See also tip #34.
As discussed in tip #33, the Move method has some interesting uses that aren't apparent at first glance.
Another use is as follows:
rstData.Edit
...
rstData.Move 0 ' Cancel the edit.
As the comment indicates, Move 0 will have a similar effect to MoveNext in that it cancels the Edit (or AddNew). However, this is achieved without the hassle of testing or trapping for EOF, as would normally be required.
The Clone method is a little-used method of a Recordset object. It is an efficient and fast way of creating a recordset that is a "copy" of another recordset - much more efficient than rerunning a query, or even reopening a table-type recordset. As a general rule, cloning a recordset is a low-cost operation.
Some examples of where you might use clones include:
Dim rst1 As Recordset, rstClone As Recordset
Set rst1 = dbsCurrent.OpenRecordset("Table1")
Set rstClone = rst1.Clone()
Debug.Print rst1!Name
rst1.Edit
rst1!Name = "Smith"
Debug.Print rst1!Name
Debug.Print rstClone!Name
...
The code above will print out the original value of Name, followed by "Smith," followed by the original name, because the recordset clone, rstClone, has access to the stored name until the Update method is called.
In the rush of shipping an application this is an easy tip to overlook. That is unfortunate, because the first impression someone gets is marred by the delays of compiling first the modules and then the queries themselves as they are used. It is particularly bad if the database is being used read-only, because then the compilation will silently happen every time anyone uses the database.
See tips #4 and #23 for more information about why this is so.
The Find methods are useful for moving to a record that meets specific criteria within a recordset. A common trap when converting code to work with an ODBC database is to replace Seek method calls with Find method calls. As a general rule, this code will be slow.
When trying to find an individual record, the best approach is to create a new recordset with a WHERE clause that contains the Find criteria. This will return all records that match the criteria. Because this is a single operation on the server, this will be as efficient as possible. It may be necessary to change forms or other visual elements to use this approach, but this is generally preferable - especially when large tables are involved.
If your program can't be modified to avoid using Find methods, then understanding exactly how they work will help you make the best use of them. Find will be fairly efficient if the following two criteria are met:
Jet recognizes the above cases and executes a query on the server that returns primary keys for all the records that match the search criteria. If matching records are found, it then moves through the keys it has stored for the recordset until it finds a match. If the local recordset is reasonably sized (say less than 500 rows), this is a relatively quick operation.
One result of the above strategy is that unsuccessful Finds will be as fast as the server allows. This isn't something you'll be able to take advantage of in all scenarios(!), but it may be useful in some.
An example of a reasonably efficient use of Find is if you want to match a lastname in a set of records (assuming lastname was indexed on the server):
RS.FindFirst "lastname = ""Smith"""
RS.FindFirst "lastname Like ""Smith*"""
Find methods will also be reasonably efficient if:
In the above case, the Find criteria are resolved by searching the data that has already been downloaded into the snapshot on the local machine. Of course, if the snapshot is not yet fully populated, the time to fetch the remaining data will be added to the search time. This means that Finds that fail will be slow, because all the records have to be fetched - unlike the dynaset case above, where no further fetching is done if the Find fails.
Find methods will usually be slow if:
The algorithm used to resolve the criteria in this case is to fetch each record and see if the criteria match until a match is found or all records have been examined.
At the risk of stating the obvious, don't be tempted to avoid using Find methods and code the equivalent loop in Basic! That is guaranteed to be the slowest technique of all.
As discussed in tip #21, this is probably not an area worth spending a lot of time on until all other data related optimizations have been explored.
The .Sort property of a Recordset is one of the slowest ways of sorting data, unless the set of records is extremely small. A far better way to sort data is to open the table directly and specify an index that has the data sorted in the order you wish to move through the table. If you can't open a table-type recordset (e.g., with attached tables [except see tip #3] or ODBC data), then create a new query and specify an ORDER BY clause. Running this new query will be faster than creating a new recordset based on the existing recordset.
As a general rule, you should not use the .Sort property on any set of rows that have more than 100 records.
This is as much a general design and query suggestion as a specific DAO suggestion, and it affects all aspects of your application. The Microsoft Access default for a Number field is Double. While this has the greatest range and floating-point accuracy of all the numeric types, it is also the most expensive to store, convert, and compute. If your numbers have a limited range or are always integral values, you should consider changing your column data types to suit. This will make DAO fetch and retrieval code marginally faster, because fewer bytes are retrieved, and will have a significant effect on numerical calculations, as integers are used in place of floating-point values.
Tip #22 has similar information regarding text versus memo data types.
If you are the only user making changes to data, then you can use the DB_DENYWRITE flag to lock an entire table or database while you make changes. Others will still be able to read the data, but no one except you will be able to write to it. Because fewer locks need to be placed to keep updates from colliding, your updates will happen faster.
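A minimal sketch (the table name is assumed):

Dim db As Database, rst As Recordset
Set db = DBEngine.Workspaces(0).Databases(0)
' Others can still read Orders while this Recordset is open, but not write.
Set rst = db.OpenRecordset("Orders", DB_OPEN_TABLE, DB_DENYWRITE)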
Only use the Refresh method when you need to. It is generally an expensive operation relative to most DAO operations and should be used sparingly.
DAO has many collections on its various objects. Because some of those collections are based on data stored in the database, the DAO collection can get out of date if someone else changes something in the database. For example, if some other user adds a table, it won't immediately appear in the Tabledefs collection. The Refresh method of each collection forces DAO to fill the collection again with the latest information in the database.
Hopefully the above tips have provided some valuable insights into how to make your DAO code run faster. In many ways, they are just the tip of the iceberg. Accessing ODBC data, in particular, is the subject of a whole other paper (e.g., Tech Ed 95's AC301 talk has a paper that concentrates on this).
Probably the best tip of all is: try it! If you have an idea that you think will make your program run faster, then try it out. If it is a big change, then try to find a way to test your assumptions before committing to a lot of work, but don't let that be a reason not to try it out. Even if it doesn't work, you will have gained valuable insights into why, and will be able to use that knowledge on the next attempt.
Finally, don't forget the resources that come with the product itself. There are several help topics on improving performance. Several sections in the manuals also have suggestions. Other supporting papers such as "White Papers" and Knowledge Base articles are available on the Microsoft Developers Network CD-ROM and the usual download locations supported by Microsoft.