Unlimited Possibilities through Code
NOTE: This article was originally part of the VistaDB.Net blog and has been moved here.
ADO.NET is a set of libraries included in the .Net framework to facilitate the communication of applications with various data storage mechanisms. These libraries form the basis for all third parties to provide data access services to users of .Net applications.
Visual Studio 2005 and 2008 did not change the data access model. In fact ADO.NET 2 is the longest running Microsoft data access technology without a major revision. I don’t know if this is going to change in .Net 4, but the stability of ADO.NET is a major reason for its adoption in VistaDB.
In this article I going to give a high level overview of the ADO.NET object model and how VistaDB supports that model. I will then follow up with more articles discussing specifics for this model.
ADO.NET at its most basic level was designed to help developers work efficiently with multi tier databases, across intranet or Internet scenarios. This is a tall order for a single framework, but I think they managed to pull it off quite well.
I often view the object model as being split into two parts – A connected and a disconnected side of the model. Even though there is no distinction between the two “sides”, it helps me to visualize how the data object was intended to be used.
The objects located in the disconnected half of the model do not communicate directly with the connected objects. This is a big deal because it means each side is isolated through interfaces to communicate. I have found this is often the most confusing part of the ADO.NET spec because users think each side should be able to directly communicate.
The ADO.NET DataSet for example does not ever communicate directly with the database. To fetch data from the database into a DataSet you must pass the DataSet into the Fill method of a connected object (The DataAdapter).
DataSets are a container class for a number of DataTable objects (stored in the Tables collection). You can think of this as an in memory representation of multiple tables, their constraints, relationships, and the actual rows.
The data stored in the DataSet is a disconnected copy of the actual database. Any changes you make to a DataSet only exist in RAM. You can quickly get a list of just the changes by calling the GetChanges method to return only modified DataRows. This allows you to submit smaller change sets back to the database.
The Merge method also lets to combine multiple change sets into a single DataSet. This is very useful in multi tier scenarios where you need to receive partial updates from lots of different systems and merge them into a single database change.
The DataSet is also unique in that you can read and write this disconnected data cache to disk. A common use is to put the DataSet into an XML document for transmission to multiple systems to merge into their local database.
The DataTable allows you to examine the actual rows of a DataSet through rows and columns collections. You can store the results of a query in a DataTable by calling the Adapters Fill method. Once the DataTable is filled the database connection is released and operates disconnected only. You can then continue to examine the data without any further communication between the DataTable and the database.
This is important for scenarios where you want to work across a network share. Caching the data on the local machine is vital to performance of the application. Whenever possible you should not be using connected objects against a network shared database, the traffic is very expensive.
Each DataTable has a collection of DataColumns associated with it. This represents a single Column in the database. But the object doesn’t actually contain any of the data stored in the DataTable. This is just a metadata storage class about the column, its constraints, types, etc. It is quite useful for finding when a single column allows nulls, has a unique constraint, etc. The autoincrement property is also implemented at the DataColumn level since you want to ensure each row has a unique value.
One interesting usage for this class is the Expression property. This allows you to define how the data for this column is calculated. This would enable you to compute the value of an items total price based upon units sold and price per unit at the DataColumn level rather than through a SQL command. The added benefit to using this scenario is that updates to the Price and Quantity columns will recalculate the total price column in RAM without having to return to the database to rerun the SQL.
Each row in a DataTable is represented in the Rows collection as a DataRow. To examine the data in a specific column use the Item property of the appropriate DataRow object. You can lookup the columns in a row through an integer index, or by the name of the Column.
The DataRow class is also where you perform updates to a DataSet. When your preparing to edit a row the BeginEdit method should be called on the DataRow object. The EndEdit method then can be used to save the changes back to the DataSet, CancelEdit provides an undo facility. This is key to how the DataGrid and other grid controls allow users to edit, commit, and cancel their changes in memory.
A key concept often missed by developers is that changes to the DataRow even when they are “AcceptChanges” calls are only committing them to the RAM copy of the database (The DataSet). You must still commit the changes back to the database in order to store them permanently.
Each DataColumn may have multiple Constraints. Conditions such as unique are applied through this class. Constraint objects are maintained through the DataTables Constraints collection.
Most tables in a single DataSet will be related in some way. The DataSet can handle mapping things like Orders to OrderDetails with a little help from the DataRelation class. This is a way to enforce referential integrity from the database, without having to make a complete round trip to the database.
Although excluded in the diagram above, this is an important class. You can use multiple DataView objects to example the same DataTable in a DataSet. This can result in massive memory savings since only one copy of the data needs to be maintained in the DataSet. You can do a lot of view type operations with the in memory DataView object.
Connected classes in ADO.NET are designed to communicate directly with the data source. Most of these classes map closely to basic data access concepts such as the Connection to the database, a Query, and QueryResults.
New in ADO.NET 2 this is an object factory that allows all .Net applications to generically load a provider without knowing much about it. Each Provider Factory includes a way to create Connections, ConnectionStringBuilders, etc.
It is this ProviderFactory object that allows VistaDB to be swapped with SQL Server at runtime through your app.config or web.config files. It is not a perfect match for communicating directly against the strongly typed classes, but it is very close.
A Connection is a representation of the actual phyiscal connection to the database. This may be through a server, or locally. You use this class to connect and disconnect from the actual database.
A connection also acts as the handle to the database for other objects like the DataAdapter and Command objects. They do not communicate directly to the database, they must go through a Connection.
This is another new class in ADO.NET 2 that simplified the process of building connection strings and remembering the options for a specific provider, it is a great utility class to save you time. Once you have built up this object you can assign the Connection class ConnectionString property to this object.
The DataAdapter is a new concept, there are no matching concepts in ADO or DAO. The DataAdapter is the bridge between your database and the disconnected ADO.NET objects. The Fill method provides an efficient mechanism to fetch the results of a query into DataSet or DataTable so you can work with the data offline (disconnected). You also use the DataAdapter to submit pending changes from the disconnected objects back to the database.
This is a complex class with lots of internal jobs to track. We will cover it in more detail in another blog post.
This class represents a question or query against a database, a call to a stored procedure, or a direct request to return the contents of a specific table. This is probably the hardest class to port your usage between database vendors as most of the calls at this level are vendor specific syntax.
Using a Command object with a database is pretty easy. You set the Connection property to an existing opened Connection object, and then specify the command you want to execute in the CommandText. You can supply SQL commands, or just the name of a stored procedure or table (although you have to then also change the CommandType property to what you wish to accomplish).
Commands have many ways to execute them, but all of them basically call ExecuteQuery under the hood. If you don’t care about the results, or only want the first row and column there are short cuts you can take (ExecuteNonQuery and ExecuteScalar).
Note that VistaDB does not support the ExecuteXML syntax at this time.
To simplify the process of putting parameters into a Command object the Parameter class allows you to quickly put parameters into a query without string concatenation or worry about SQL Injection attacks. Most developers rely heavily on parameterized queries because of the added benefit of SQL Injection protection; strings are automatically quoted to the specs of the underlying database.
There are many ways to create Parameters for Command objects, but they are mostly just different ways of doing the same thing. Some people prefer one syntax over another, but they all end up as Parameter objects on a Command object at execution in the database.
The DataReader is built as a way to retrieve and examine the rows returned in response to your query as quickly as possible. Not all of the rows have to be ready before you can start working with the data. Only a single row is exposed at a time, so the database engine can get you the first record quickly and then continue to work in the background to find the rest of the answers over time.
The data returned by a DataReader is always read only. This class was built to be a lightweight forward only, read only, way to run through data quickly (this was called a firehose cursor in ADO).
Transactions are used to ensure that multiple changes to database rows occur as a single unit of work. The Connection class has a BeginTransaction method that can be used to create a Transaction.
A definite best practice is to ensure that Transactions are placed in Using statements for rapid cleanup if they are not committed. Otherwise the objects (and any internal locks that may be needed) will remain active until the GC gets around to cleaning it up.
What about Cursors?
You will notice that I never mentioned database cursors in the above discussion. That is because they do not exist within the ADO.NET framework. ADO.NET does not support server side cursors by design. The DataSet and DataTable classes most closely resemble a cursor, and the DataReader class is a close match for a client side forward only, read only, cursor, but it is not the same thing.
©2010-2012 Jason Short. All Rights Reserved. The opinions expressed herein are my own and do not represent my employer’s views in any way.