C# Algorithms – Merging Multiple Rows of Data

algorithmscsorting

I need an algorithm to merge multiple rows for data, where one column is the index column and if one or more rows have the same index value merge the column values into a single row, and lastly sort the rows one the index column.

ASCII ART Example:

Starting point:

------------------------------------
| IndexCurve | Column A | Column B |
------------------------------------
| 1          | 1        |          |
------------------------------------
| 2          | 5        | 5        |
------------------------------------
| 1          |          | 1        |
------------------------------------
| 2          | 5        | 5        |
------------------------------------
| 3          | 1        |          |
------------------------------------

Outcome:

------------------------------------
| IndexCurve | Column A | Column B |
------------------------------------
| 1          | 1        | 1        |
------------------------------------
| 2          | 5        | 5        |
------------------------------------
| 3          | 1        |          |
------------------------------------

As you can see the empty columns have been merged, duplicates have been discarded and rows have been sorted on the index column. (NOTE: The column values should not add to each other.)

I have been trying to come up with an efficient algorithm for this but so far I think I have failed as i think its pretty ineffective and also its not working correctly.

Here I have some code to show you what I currently have, the first code pieces are just POCO objects to give you an idea of what I'm working with, have also included equal methods for you to see how I compare objects.

public interface IWitsmlLogCurveInfo
{
    string Mnemonic { get; set; }
    bool IndexCurve { get; set; }
}

public class WitsmlLogCurveInfo
{
    string Mnemonic { get; set; }
    bool IndexCurve { get; set; }

    protected bool Equals(WitsmlLogCurveInfo other)
    {
        return string.Equals(Mnemonic, other.Mnemonic);
    }

    public override bool Equals(object obj)
    {
        if(ReferenceEquals(null, obj))
        {
            return false;
        }
        if(ReferenceEquals(this, obj))
        {
            return true;
        }
        if(obj.GetType() != this.GetType())
        {
            return false;
        }
        return Equals((WitsmlLogCurveInfo)obj);
    }

    public override int GetHashCode()
    {
        return (Mnemonic != null ? Mnemonic.GetHashCode() : 0);
    }
}

public interface IWitsmlLogDataColumn
{
    IWitsmlLogCurveInfo WitsmlLogCurveInfo { get; set; }
    string Value { get; set; }
}

public class WitsmlLogDataColumn : IWitsmlLogDataColumn
{
    public IWitsmlLogCurveInfo WitsmlLogCurveInfo { get; set; }
    public string Value { get; set; }

    protected bool Equals(WitsmlLogDataColumn other)
    {
        return string.Equals(Value, other.Value) && string.Equals(WitsmlLogCurveInfo, other.WitsmlLogCurveInfo);
    }

    public override bool Equals(object obj)
    {
        if(ReferenceEquals(null, obj))
        {
            return false;
        }
        if(ReferenceEquals(this, obj))
        {
            return true;
        }
        if(obj.GetType() != this.GetType())
        {
            return false;
        }
        return Equals((WitsmlLogDataColumn)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Value != null ? Value.GetHashCode() : 0) * 397) ^ (WitsmlLogCurveInfo != null ? WitsmlLogCurveInfo.GetHashCode() : 0);
        }
    }
}

public interface IWitsmlLogDataRow
{
    IList<IWitsmlLogDataColumn> Columns { get; set; }
    IWitsmlLogDataColumn GetIndexColumn();
}

public class WitsmlLogDataRow : IWitsmlLogDataRow
{
    public IList<IWitsmlLogDataColumn> Columns { get; set; }

    public IWitsmlLogDataColumn GetIndexColumn()
    {
        return Columns.SingleOrDefault(x => x.WitsmlLogCurveInfo.IndexCurve);
    }

    protected bool Equals(WitsmlLogDataRow other)
    {
        return Columns.All(x => other.Columns.Contains(x));
    }

    public override bool Equals(object obj)
    {
        if(ReferenceEquals(null, obj))
        {
            return false;
        }
        if(ReferenceEquals(this, obj))
        {
            return true;
        }
        if(obj.GetType() != this.GetType())
        {
            return false;
        }
        return Equals((WitsmlLogDataRow)obj);
    }

    public override int GetHashCode()
    {
        return (Columns != null ? Columns.GetHashCode() : 0);
    }
}

public enum FrictionType
{
    Annulus,
    Sliding,
    Rotating
}

public interface IWitsmlLogFriction 
{
    FrictionType Type { get; set; }
    IList<IWitsmlLogDataRow> Rows { get; set; }
}

public class WitsmlLogFriction : IWitsmlLogFriction
{
    public FrictionType Type { get; set; }
    public IList<IWitsmlLogDataRow> Rows { get; set; }

    public override string ToString()
    {
        return string.Format("FrictionType: {0}", Type);
    }

    protected bool Equals(WitsmlLogFriction other)
    {
        return Type == other.Type && Rows.All(x => other.Rows.Contains(x));
    }

    public override bool Equals(object obj)
    {
        if(ReferenceEquals(null, obj))
        {
            return false;
        }
        if(ReferenceEquals(this, obj))
        {
            return true;
        }
        if(obj.GetType() != this.GetType())
        {
            return false;
        }
        return Equals((WitsmlLogFriction)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((int)Type * 397) ^ (Rows != null ? Rows.GetHashCode() : 0);
        }
    }
}

Here is the Algorithm Function itself:

private IList<IWitsmlLogDataRow> CombineMultipleFrictionTypes(IList<WitsmlLogFriction> witsmlLogFrictions, WitsmlLogCurveInfo indexCurve)
{
    if(indexCurve == null)
    {
        throw new NoNullAllowedException("IndexCurve can not be null!");
    }
    if(witsmlLogFrictions.Count == 1)
    {
        return witsmlLogFrictions.First().Rows;
    }
    WitsmlLogFriction witsmlLogFrictionA = witsmlLogFrictions.FirstOrDefault();
    List<IWitsmlLogDataRow> rows = new List<IWitsmlLogDataRow>();
    if (witsmlLogFrictionA == null)
    {
        return rows;
    }
    foreach(WitsmlLogFriction witsmlLogFrictionB in witsmlLogFrictions.Where(witsmlLogFriction => witsmlLogFriction.Type != witsmlLogFrictionA.Type))
    {
        foreach (IWitsmlLogDataRow witsmlLogDataRowA in witsmlLogFrictionA.Rows)
        {
            // Get index curve for row A
            IWitsmlLogDataColumn indexCurveA = witsmlLogDataRowA.GetIndexColumn();
            if (indexCurveA == null) continue;



            IWitsmlLogDataRow tempWitsmlLogDataRowA = new WitsmlLogDataRow(witsmlLogDataRowA.Columns);
            foreach(IWitsmlLogDataRow witsmlLogDataRowB in witsmlLogFrictionB.Rows)
            {
                // Get index curve for row B
                IWitsmlLogDataColumn indexCurveB = witsmlLogDataRowB.GetIndexColumn();
                if (indexCurveB == null) continue;



                // Compare index curve values
                if(indexCurveA.Equals(indexCurveB))
                {
                    // Index curve values are identical, append columns from row B onto row A, except row B's index column
                    foreach(IWitsmlLogDataColumn column in witsmlLogDataRowB.Columns.Where(column => !column.WitsmlLogCurveInfo.Equals(indexCurve)))
                    {
                        tempWitsmlLogDataRowA.Columns.Add(column);
                    }
                }
                else
                {
                    rows.Add(witsmlLogDataRowB);
                }
            }
            // Add the new row
            rows.Add(tempWitsmlLogDataRowA);
        }
    }
    switch(indexCurve.IndexType)
    {
        case LogIndexType.datetime:
            return rows.OrderBy(x => Convert.ToDateTime(x.GetIndexColumn().Value)).ToList();
        default:
            throw new ArgumentOutOfRangeException("indexCurve", "IndexType: " + indexCurve.IndexType + " is not supported.");
    }
}

The problem is I think its pretty ineffective, but it could also be the only way to do it, and it adds more data then it should. What would be a better and efficient way to solve this?

Best Answer

  1. Sort your table by index
  2. trivially merge all data with the same index (as they'll all be together), when the index changes to the next value, write the row and repeat.

You may take a hit on sorting, but generally its a common operation with lots of good existing algorithms that can efficiently sort your dataset well. After that, manipulating your sorted dataset is easy.

Related Topic