Excel – OleDB & mixed Excel datatypes : missing data

datasetdatatableexceloledb

I have an Excel worksheet I want to read into a datatable – all is well except for one particular column in my Excel sheet. The column, 'ProductID', is a mix of values like ########## and n#########.

I tried to let OleDB handle everything by itself automatically by reading it into a dataset/datatable, but any values in 'ProductID' like n###### are missing, ignored, and left blank. I tried manually creating my DataTable by looping through each row with a datareader, but with the exact same results.

Here's the code :

// add the column names manually to the datatable as column_1, column_2, ...
for (colnum = 0; colnum < num_columns; colnum ++){
  ds.Tables["products"].Columns.Add("column_" +colnum , System.Type.GetType("System.String")); 
}
while(myDataReader.Read()){
  // loop through each excel row adding a new respective datarow to my datatable 
  DataRow a_row = ds.Tables["products"].NewRow();
  for (col = 0; col < num_columns; col ++){
    try {  a_row[col] = rdr.GetString(col);  }
    catch {  a_row[col] = rdr.GetValue(col).ToString(); }
  }
  ds.Tables["products"].Rows.Add(a_row);
}

I don't understand why it won't let me read in values like n######. How can I do this?

Best Answer

Using .Net 4.0 and reading Excel files, I had a similar issue with OleDbDataAdapter - i.e. reading in a mixed data type on a "PartID" column in MS Excel, where a PartID value can be numeric (e.g. 561) or text (e.g. HL4354), even though the excel column was formatted as "Text".

From what I can tell, ADO.NET chooses the data type based on the majority of the values in the column (with a tie going to numeric data type). i.e. if most of the PartID's in the sample set are numeric, ADO.NET will declare the column to be numeric. Therefore ADO.Net will attempt to cast each cell to a number, which will fail for the "text" PartID values and not import those "text" PartID's.

My solution was to set the OleDbConnection connectionstring to use Extended Properties=IMEX=1;HDR=NO to indicate this is an Import and that the table(s) will not include headers. The excel file has a header row, so in this case tell ado.net not to use it. Then later in the code, remove that header row from the dataset and voilà you have mixed data type for that column.

string sql = "SELECT F1, F2, F3, F4, F5 FROM [sheet1$] WHERE F1 IS NOT NULL";

OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + PrmPathExcelFile + @";Extended Properties=""Excel 8.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text""");

OleDbCommand cmd = new OleDbCommand(sql, connection);
OleDbDataAdapter da = new OleDbDataAdapter(cmd);

DataSet ds = new DataSet();
ds.Tables.Add("xlsImport", "Excel");
da.Fill(ds, "xlsImport");

// Remove the first row (header row)
DataRow rowDel = ds.Tables["xlsImport"].Rows[0];
ds.Tables["xlsImport"].Rows.Remove(rowDel);

ds.Tables["xlsImport"].Columns[0].ColumnName = "LocationID";
ds.Tables["xlsImport"].Columns[1].ColumnName = "PartID";
ds.Tables["xlsImport"].Columns[2].ColumnName = "Qty";
ds.Tables["xlsImport"].Columns[3].ColumnName = "UserNotes";
ds.Tables["xlsImport"].Columns[4].ColumnName = "UserID";

connection.Close(); 

// now you can use LINQ to search the fields

    var data = ds.Tables["xlsImport"].AsEnumerable();
    var query = data.Where(x => x.Field<string>("LocationID") == "COOKCOUNTY").Select(x =>
                new Contact
                {
                    LocationID= x.Field<string>("LocationID"),
                    PartID = x.Field<string>("PartID"),
                    Quantity = x.Field<string>("Qty"),
                    Notes = x.Field<string>("UserNotes"),
                    UserID = x.Field<string>("UserID")
                });