March26

Introducing the CornerstoneDB Data Generator

What is Data Generator?

CornerstoneDB Data Generator First LookThe new CornerstoneDB Data Generation Tool is a combination of a desktop application and API that allows for generation of data for different database targets.  SQL Server, SQL CE and VistaDB are the initial database targets supported, with more planned for release later.

Data generation allows for repeatable generation of columns into a database.  For example a desktop accounting application may want some way to generate customers, deposits, withdrawals, etc.  This data may be used for application tests (Nunit or other unit testing frameworks), load testing of your application design, or even sample databases for end users.

Why data generation?

There are several other data generation tools out on the market today.  But none of them are 100% managed code, and none of them have APIs that you can use to build data generation into your own product.  A few offer an SDK that is based upon you shipping their application to end users, and paying royalties for each desktop you deploy that generator. 

There are also current no data generators that support SQL Server, VistaDB, and SQL CE (our initial goals).  Most of the SQL Server tools are built around SMO and will not work with VistaDB or SQL CE.  We are using the ADO.Net providers for our generators.  That means that generation will be slower than SMO based solutions, but we can target other databases with the same core code.

Having an API is the key

In our view data generation is pretty much worthless to anyone other than a DBA if you have to install the complete generator product in order to run it.  We have built the CornerstoneDB Data Generator from the ground up to be an API based product that enables developers to built data generation into products or utilities as needed.

Some users will only ever want to use the desktop user interface, and we will have a SKU that only includes the ability to run interactively only.  But we feel that the majority of customers will want the API access for generation within their own applications.

Shared Project Files

Shared Project FilesThe saved project from the desktop application may be saved and used with the API. The API can use these projects directly for generation inside your applications, unit tests, or anything really.  You can also programmatically edit, or build these project files with the API.  The desktop application  may then be used to tweak settings, or preview the data generation.

The ability to share these project files across both the API and user interface means you can have non technical users configure the project, and then run the generation using your own custom application.

Ask the user how much data they want

Applications have lots of different ways to use data generation capabilities.  One possible scenario that is not possible with other data generators is the ability to change the generation based upon user feedback using your own application.  Imagine a scenario where you have a business application that includes a blank database schema.

You could either through code, or preconfigured project files, ask the user if they would like the database to be filled with sample data.  Your scripts could include things like the country where the user resides (generating UK addresses for the sample rather than US ones).

Or perhaps ask them how many years of data they would like to generate.  If a company expects to use your product for 10 years, they may want to see what reports and screens will look like once that amount of data is loaded into the system.

Whether designed dynamically from the API or DataGen UI, all generation project files can be run later using either method.  This design allows users to fully customize there test data and save the project to be run from within unit tests, interactively, or even during application deployment.

Designing your test data

Each table may be customized for whether the table should be included for generation, existing data should be truncated, and the number of rows to generate.

Example Table Settings

The following options are available on a per table basis.

TableView

  1. Truncate on Insert: Do I want this table to be truncated before my test data is inserted?

  2. Generate Table: Do I want this table to be generated or not.

  3. Row Count: How many rows do I want generated for this table.

Per Column Generation Steps

Each column may also be customized for the specific type of generator being used, values, null ratios, unique values, and the most important value is the seed.  Being able to set a seed means your data generation is deterministic and can be repeated over and over.  This is great for unit testing where you always want your database to be a known state before the unit tests are run.

Example Column Settings

ColumnView

  1. Generator: Choose the Data Generator that bests suits your  schema. The drop down shows all the generators that may be used on this Data Type.

  2. Seed: Seed values are used to ensure reproducible generations.

  3. Unique Value: Do I want all values generated to be unique?

  4. Null Ratio: Do I want to some percentage of inserts to contain a null value?

Choosing The Best Column Generator

Designing your test data at a column level requires a decision be made on which Data Generator best suits your intended use.  CornerstoneDB Data Generator offers generators that tackle common data types like NVarChar, Boolean, Integer, etc.  Other more application specific generators are also available for Foreign Keys, date ranges, first names, last names, etc.

Custom generators are also possible in the system.  The FlatFileValue generator allows you to point to a flat file on disk of custom values to be used for the feed.  For example you may have a list of countries your company ships to that you want used for a country field in a table.  Other generators, and the ability to make your own generators will be included with later releases.

Some of the current built in generators include:

  • Boolean Generator
  • Byte Array Generator
  • Country Generator
  • DateTime Generator
  • Decimal Generator
  • Double Generator
  • Dummy Generator (Handles  default column values and other scenarios)
  • Name Generators
  • Flat File Value Generators
  • ForeignKey Generator
  • Number Generator
  • String Generator

Database Providers

The first version of CornerstoneDB Data Generation will handle generation to SQL Server 2005 / 2008 / 2008 R2 (including SQL Express versions), SQL CE, and VistaDB 4. Future provider plans include MySQL, and others.

Visit the Site

CornerstoneDB Data Generator Page - the main page for the CornerstoneDB Data Generator product.

Discussions are closed