What you are working on is basically ETL (extract, transform, load). At a high level you need an extract component (get the data), a transform component (map it to a known format), and a load component (take the known format and put it somewhere). If you are comfortable being tied to an RDBMS, you could use something like SQL Server SSIS packages. What I would do is create a host application that manages the common aspects of the overall process (error handling and pipeline processing), then make the specifics of the E, T, and L pluggable. A low-ceremony way to get this is to host the PowerShell runtime and create each session with common context objects that the scripts use to communicate. You get a built-in pipe-and-filter model for scripts and easy, safe extensibility. This design has worked extremely well for my team in a similar situation.
Here is a link describing this approach.
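To make the host-plus-plugins idea concrete, here is a minimal sketch in plain C#. It swaps the hosted PowerShell runtime for ordinary interfaces, so it only illustrates the shape of the design, not the PowerShell approach itself. All of the names (IExtractor, ITransformer, ILoader, EtlHost and the in-memory plugins) are hypothetical and chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contracts for the three pluggable stages.
public interface IExtractor { IEnumerable<string> Extract(); }
public interface ITransformer { KeyValuePair<string, string> Transform(string raw); }
public interface ILoader { void Load(KeyValuePair<string, string> record); }

// The host owns the common concerns: driving the pipeline and collecting errors.
public class EtlHost
{
    private readonly IExtractor _extractor;
    private readonly ITransformer _transformer;
    private readonly ILoader _loader;

    public List<string> Errors { get; private set; }

    public EtlHost(IExtractor extractor, ITransformer transformer, ILoader loader)
    {
        _extractor = extractor;
        _transformer = transformer;
        _loader = loader;
        Errors = new List<string>();
    }

    // Runs every extracted item through transform and load.
    // A failing item is recorded and skipped. Returns the number of items loaded.
    public int Run()
    {
        int loaded = 0;
        foreach (string raw in _extractor.Extract())
        {
            try
            {
                _loader.Load(_transformer.Transform(raw));
                loaded++;
            }
            catch (Exception ex)
            {
                Errors.Add("Skipped '" + raw + "': " + ex.Message);
            }
        }
        return loaded;
    }
}

// Trivial in-memory plugins, assumed only so the sketch can be exercised.
public class InMemoryExtractor : IExtractor
{
    private readonly IEnumerable<string> _items;
    public InMemoryExtractor(IEnumerable<string> items) { _items = items; }
    public IEnumerable<string> Extract() { return _items; }
}

public class KeyValueTransformer : ITransformer
{
    // Maps "key=value" text to the known format; anything else is rejected.
    public KeyValuePair<string, string> Transform(string raw)
    {
        int idx = raw.IndexOf('=');
        if (idx <= 0) throw new FormatException("expected key=value");
        return new KeyValuePair<string, string>(raw.Substring(0, idx), raw.Substring(idx + 1));
    }
}

public class InMemoryLoader : ILoader
{
    public readonly List<KeyValuePair<string, string>> Loaded = new List<KeyValuePair<string, string>>();
    public void Load(KeyValuePair<string, string> record) { Loaded.Add(record); }
}
```

Swapping a stage means handing a different implementation to EtlHost; the error handling and iteration stay in one place.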
Below is my resulting class, based on Dan's suggestion:
using System;
using System.Data;
using System.Data.Common;

namespace CustomDataAccess
{
    // Fluent helper that assembles a DataSet from existing tables or from
    // queries run through any registered ADO.NET provider.
    public class DataSetBuilder
    {
        #region Properties
        private DataSet _DataSet;
        public DataSet DataSet { get { return _DataSet; } }
        #endregion

        #region Constructors
        public DataSetBuilder()
        {
            this._DataSet = new DataSet();
        }

        public DataSetBuilder(string DataSetName)
        {
            this._DataSet = new DataSet(DataSetName);
        }

        public DataSetBuilder(DataSet DataSet)
        {
            this._DataSet = DataSet;
        }
        #endregion

        #region Public Methods
        // Adds an already populated table.
        public DataSetBuilder InsertTables(DataTable Table)
        {
            this._DataSet.Tables.Add(Table);
            return this;
        }

        // Runs one query and stores the result under TableName.
        public DataSetBuilder InsertTables(string DbProviderName, string ConnectionString, string TableName, string CommandText)
        {
            DbDataAdapter adapter = Create_Adapter(DbProviderName, ConnectionString);
            try
            {
                Fill_Adapter(adapter, TableName, CommandText);
            }
            finally
            {
                // Close the connection even if the query throws.
                adapter.SelectCommand.Connection.Close();
            }
            return this;
        }

        // Runs several queries over one connection; TableName[i] names the result of CommandText[i].
        public DataSetBuilder InsertTables(string DbProviderName, string ConnectionString, string[] TableName, string[] CommandText)
        {
            if (TableName.Length != CommandText.Length)
            {
                throw new ArgumentException("Error: Must provide a table name for each command.");
            }
            DbDataAdapter adapter = Create_Adapter(DbProviderName, ConnectionString);
            try
            {
                for (int i = 0; i < TableName.Length; i++)
                {
                    Fill_Adapter(adapter, TableName[i], CommandText[i]);
                }
            }
            finally
            {
                adapter.SelectCommand.Connection.Close();
            }
            return this;
        }

        public void AddRelations(string ParentTable, string PrimaryKey, string ChildTable, string ForeignKey, bool NestingRule)
        {
            Add_Relations(ParentTable, PrimaryKey, ChildTable, ForeignKey, NestingRule);
        }

        // Adds one relation per index; all five arrays must have the same length.
        public void AddRelations(string[] ParentTable, string[] PrimaryKey, string[] ChildTable, string[] ForeignKey, bool[] NestingRule)
        {
            if (PrimaryKey.Length != ParentTable.Length || ChildTable.Length != ParentTable.Length ||
                ForeignKey.Length != ParentTable.Length || NestingRule.Length != ParentTable.Length)
            {
                throw new ArgumentException("Error: All relation arrays must have the same length.");
            }
            for (int i = 0; i < ParentTable.Length; i++)
            {
                Add_Relations(ParentTable[i], PrimaryKey[i], ChildTable[i], ForeignKey[i], NestingRule[i]);
            }
        }
        #endregion

        #region Private Methods
        // Creates an adapter with an open connection; the caller must close it.
        private DbDataAdapter Create_Adapter(string DbProviderName, string ConnectionString)
        {
            DbProviderFactory dbFactory = DbProviderFactories.GetFactory(DbProviderName);
            DbConnection connection = dbFactory.CreateConnection();
            connection.ConnectionString = ConnectionString;
            connection.Open();
            DbCommand command = dbFactory.CreateCommand();
            command.Connection = connection;
            DbDataAdapter adapter = dbFactory.CreateDataAdapter();
            adapter.SelectCommand = command;
            return adapter;
        }

        private void Fill_Adapter(DbDataAdapter Adapter, string TableName, string CommandText)
        {
            Adapter.SelectCommand.CommandText = CommandText;
            Adapter.Fill(_DataSet, TableName);
        }

        private void Add_Relations(string ParentTable, string PrimaryKey, string ChildTable, string ForeignKey, bool NestingRule)
        {
            DataColumn pk = _DataSet.Tables[ParentTable].Columns[PrimaryKey];
            DataColumn fk = _DataSet.Tables[ChildTable].Columns[ForeignKey];
            DataRelation relation = _DataSet.Relations.Add(pk, fk);
            relation.Nested = NestingRule;
        }
        #endregion
    }
}
Best Answer
You are going overboard with fancy concepts too soon. Generics: when you see a case for them, use them, but otherwise don't worry. Factory pattern: way too much flexibility (and added confusion) for this stage.
Keep it simple. Use fundamental practices.
Try to imagine what reading XML, reading CSV, and so on have in common: things like next record or next line. Since new formats may be added, try to imagine the commonality that a yet-to-be-determined format would have with the known ones. Use this commonality to define an 'interface', or contract, that all formats must adhere to. Though they adhere to the common ground, each may have its own specific internal rules.
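As a rough illustration of such a contract, the sketch below defines a hypothetical IRecordReader with a single "next record" operation and two formats behind it. The interface name, the two reader classes and the "k=v;k=v" line format are assumptions made up for this example, and the CSV parsing is deliberately naive (no quoting or escaping).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical contract: every format only has to answer "what is the next record?"
public interface IRecordReader
{
    // Returns false once the input is exhausted.
    bool TryReadNext(out IDictionary<string, string> record);
}

// Naive CSV reader: first line is the header, cells are split on commas.
public class CsvRecordReader : IRecordReader
{
    private readonly TextReader _input;
    private readonly string[] _headers;

    public CsvRecordReader(TextReader input)
    {
        _input = input;
        string headerLine = input.ReadLine();
        _headers = headerLine == null ? new string[0] : headerLine.Split(',');
    }

    public bool TryReadNext(out IDictionary<string, string> record)
    {
        record = null;
        string line = _input.ReadLine();
        if (line == null) return false;
        string[] cells = line.Split(',');
        record = new Dictionary<string, string>();
        for (int i = 0; i < _headers.Length; i++)
        {
            // Missing trailing cells become empty strings.
            record[_headers[i]] = i < cells.Length ? cells[i] : "";
        }
        return true;
    }
}

// A second, made-up format: one record per line, written as "key=value;key=value".
public class KeyValueRecordReader : IRecordReader
{
    private readonly TextReader _input;
    public KeyValueRecordReader(TextReader input) { _input = input; }

    public bool TryReadNext(out IDictionary<string, string> record)
    {
        record = null;
        string line = _input.ReadLine();
        if (line == null) return false;
        record = new Dictionary<string, string>();
        foreach (string pair in line.Split(';'))
        {
            int idx = pair.IndexOf('=');
            if (idx > 0) record[pair.Substring(0, idx)] = pair.Substring(idx + 1);
        }
        return true;
    }
}

public static class RecordReaders
{
    // Consumer code depends only on the contract, not on the format.
    public static List<IDictionary<string, string>> ReadAll(IRecordReader reader)
    {
        var all = new List<IDictionary<string, string>>();
        IDictionary<string, string> record;
        while (reader.TryReadNext(out record)) all.Add(record);
        return all;
    }
}
```

A new format then only needs its own IRecordReader implementation; ReadAll and everything downstream stay unchanged.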
For validating the data, try to provide a way to easily plug in new or different validator code blocks. So again, try to define an interface so that each validator, responsible for a particular kind of data construction, adheres to a contract.
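One possible shape for pluggable validators is sketched below. IRecordValidator, the two sample validators and ValidationPipeline are hypothetical names, and records are assumed to be simple string dictionaries; the real contract would depend on your data objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract: a validator reports zero or more problems for a record.
public interface IRecordValidator
{
    IEnumerable<string> Validate(IDictionary<string, string> record);
}

// Fails when the field is absent or blank.
public class RequiredFieldValidator : IRecordValidator
{
    private readonly string _field;
    public RequiredFieldValidator(string field) { _field = field; }

    public IEnumerable<string> Validate(IDictionary<string, string> record)
    {
        string value;
        if (!record.TryGetValue(_field, out value) || string.IsNullOrWhiteSpace(value))
        {
            yield return "Missing required field '" + _field + "'";
        }
    }
}

// Fails when the field is present but does not parse as an integer.
public class IntegerFieldValidator : IRecordValidator
{
    private readonly string _field;
    public IntegerFieldValidator(string field) { _field = field; }

    public IEnumerable<string> Validate(IDictionary<string, string> record)
    {
        string value;
        int ignored;
        if (record.TryGetValue(_field, out value) && !int.TryParse(value, out ignored))
        {
            yield return "Field '" + _field + "' is not an integer: " + value;
        }
    }
}

// Runs whichever validators have been plugged in and collects all messages.
public class ValidationPipeline
{
    private readonly List<IRecordValidator> _validators = new List<IRecordValidator>();

    public ValidationPipeline Add(IRecordValidator validator)
    {
        _validators.Add(validator);
        return this;
    }

    public List<string> Validate(IDictionary<string, string> record)
    {
        var errors = new List<string>();
        foreach (IRecordValidator validator in _validators)
        {
            errors.AddRange(validator.Validate(record));
        }
        return errors;
    }
}
```

Adding a rule is then a matter of writing one small class and calling Add; the pipeline itself does not change.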
For creating the data constructions, you will probably be constrained more than anything by whoever designs the suggested output objects. Try to figure out what the next step for the data objects is, and whether there are any optimizations you can make by knowing the final use. For example, if you know the objects are going to be used in an interactive application, you could help the developer of that app by providing 'summations' or counts of the objects, or other kinds of derived information.
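As a small example of that kind of derived information, the sketch below counts records per value of one field. RecordSummary and CountBy are hypothetical names, and the records are again assumed to be plain string dictionaries.

```csharp
using System;
using System.Collections.Generic;

public static class RecordSummary
{
    // Counts records per distinct value of the given field.
    // Records that lack the field (or hold null) are counted under the empty string.
    public static IDictionary<string, int> CountBy(
        IEnumerable<IDictionary<string, string>> records, string field)
    {
        var counts = new Dictionary<string, int>();
        foreach (IDictionary<string, string> record in records)
        {
            string key;
            if (!record.TryGetValue(field, out key) || key == null) key = "";
            int current;
            counts.TryGetValue(key, out current); // current stays 0 for a new key
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

Shipping a summary like this alongside the objects saves the consuming application from walking the whole collection just to show totals.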
I'd say most of these are Template patterns or Strategy patterns. The whole project would be an Adapter pattern.