In .NET, there are two categories of types, reference types and value types.
Structs are value types and classes are reference types.
The general difference is that a reference type lives on the heap, and a value type lives inline, that is, wherever it is your variable or field is defined.
A variable containing a value type contains the entire value type value. For a struct, that means that the variable contains the entire struct, with all its fields.
A variable containing a reference type contains a pointer, or a reference to somewhere else in memory where the actual value resides.
This has one benefit, to begin with:
- value types always contains a value
- reference types can contain a null-reference, meaning that they don't refer to anything at all at the moment
Internally, reference types are implemented as pointers, and knowing that, and knowing how variable assignment works, there are other behavioral patterns:
- copying the contents of a value type variable into another variable, copies the entire contents into the new variable, making the two distinct. In other words, after the copy, changes to one won't affect the other
- copying the contents of a reference type variable into another variable, copies the reference, which means you now have two references to the same somewhere else storage of the actual data. In other words, after the copy, changing the data in one reference will appear to affect the other as well, but only because you're really just looking at the same data both places
When you declare variables or fields, here's how the two types differ:
- variable: value type lives on the stack, reference type lives on the stack as a pointer to somewhere in heap memory where the actual memory lives (though note Eric Lipperts article series: The Stack Is An Implementation Detail.)
- class/struct-field: value type lives completely inside the type, reference type lives inside the type as a pointer to somewhere in heap memory where the actual memory lives.
I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + field1.GetHashCode();
hash = hash * 23 + field2.GetHashCode();
hash = hash * 23 + field3.GetHashCode();
return hash;
}
}
As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)
This is better than the common practice of XOR
ing hashcodes for two main reasons. Suppose we have a type with two int
fields:
XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.
This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR
instead of ADD
as a combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.
// Note: Not quite FNV!
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ field1.GetHashCode();
hash = (hash * 16777619) ^ field2.GetHashCode();
hash = (hash * 16777619) ^ field3.GetHashCode();
return hash;
}
}
Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.
As per the documentation:
You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:
- You can compute the hash code from fields that are not mutable; or
- You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing
Best Answer
What's going on in there?
You quote the parameter lists for several overloads of
Add
. These are convenience methods that correspond directly to constructor overloads for theSqlParameter
class. They essentially construct the parameter object using whatever constructor has the same signature as the convenience method you called, and then callSqlParameterCollection.Add(SqlParameter)
like this:AddWithValue
is similar but takes convenience even further, also setting the value. However, it was actually introduced to resolve a framework flaw. To quote MSDN,The constructor overloads for the
SqlParameter
class are mere conveniences for setting instance properties. They shorten the code, with marginal impact on performance: the constructor may bypass setter methods and operate directly on private members. If there's a difference it won't be much.What should I do?
Note the following (from MSDN)
The default type is input. However, if you allow the size to be inferred like this and you recycle the parameter object in a loop (you did say you were concerned with performance) then the size will be set by the first value and any subsequent values that are longer will be clipped. Obviously this is significant only for variable length values such as strings.
If you are passing the same logical parameter repeatedly in a loop I recommend you create a SqlParameter object outside the loop and size it appropriately. Over-sizing a varchar is harmless, so if it's a PITA to get the exact maximum, just set it bigger than you ever expect the column to be. Because you're recycling the object rather than creating a new one for each iteration, memory consumption over the duration of the loop will likely drop even if you get a bit excited with the oversizing.
Truth be told, unless you process thousands of calls, none of this will make much difference.
AddWithValue
creates a new object, sidestepping the sizing problem. It's short and sweet and easy to understand. If you loop through thousands, use my approach. If you don't, useAddWithValue
to keep your code simple and easy to maintain.2008 was a long time ago
In the years since I wrote this, the world has changed. There are new kinds of date, and there is also a problem that didn't cross my mind until a recent problem with dates made me think about the implications of widening.
Widening and narrowing, for those unfamiliar with the terms, are qualities of data type conversions. If you assign an int to a double, there's no loss of precision because double is "wider". It's always safe to do this, so conversion is automatic. This is why you can assign an int to a double but going the other way you have to do an explicit cast - double to int is a narrowing conversion with potential loss of precision.
This can apply to strings: NVARCHAR is wider than VARCHAR, so you can assign a VARCHAR to an NVARCHAR but going the other way requires a cast. Comparison works because the VARCHAR implicitly widens to NVARCHAR, but this will interfere with the use of indexes!
C# strings are Unicode, so AddWithValue will produce an NVARCHAR parameter. At the other end, VARCHAR column values widen to NVARCHAR for comparison. This doesn't stop query execution but it prevents indexes from being used. This is bad.
What can you do about it? You have two possible solutions.
Ditching VARCHAR is probably the best idea. It's a simple change with predictable consequences and it improves your localisation story. However, you may not have this as an option.
These days I don't do a lot of direct ADO.NET. Linq2Sql is now my weapon of choice, and the act of writing this update has left me wondering how it handles this problem. I have a sudden, burning desire to check my data access code for lookup via VARCHAR columns.
2019 and the world has moved on again
Linq2Sql is not available in dotnet Core, so I find myself using Dapper. The [N]VARCHAR problem is still a thing but it's no longer so far buried. I believe one can also use ADO so things have come full circle in that regard.