Web API – Use Empty String, Null, or Remove Empty Property in API Requests/Responses

Tags: data types, web-api

When transferring an object through an API in a schemaless format such as JSON, what is the ideal way to return a non-existent string property? I know there are several different ways of doing this.

I'm sure I have used null in the past, but I don't have a good reason to give for doing so. It seems straightforward to use null when dealing with the database. But the database seems like an implementation detail that shouldn't concern the party on the other side of the API. For example, they may use a schemaless datastore that only stores properties with (non-null) values.

From a code point of view, restricting string functions to work with only one type, i.e. string (not null), makes them easier to prove correct; avoiding null is also a reason for having an Option type. So, if the code that produces the request/response doesn't use null, my guess is that the code on the other side of the API won't be forced to use null either.

I rather like the idea of using an empty string as an easy way to avoid null. One argument I have heard for null and against the empty string is that an empty string means the property exists. Although I understand the difference, I wonder whether it is just an implementation detail and whether using either null or an empty string makes any real-life difference. I also wonder whether an empty string is analogous to an empty array.

So, which is the best way of doing it that addresses those concerns? Does it depend on the format of the object being transferred (schema/schemaless)?

Best Answer

TL;DR: Remove null properties.

The first thing to bear in mind is that applications at their edges are not object-oriented (nor functional if programming in that paradigm). The JSON that you receive is not an object and should not be treated as such. It's just structured data which may (or may not) convert into an object. In general, no incoming JSON should be trusted as a business object until it is validated as such. Just the fact that it deserialized does not make it valid. Since JSON also has limited primitives compared to back-end languages, it is often worth it to make a JSON-aligned DTO for the incoming data. Then use the DTO to construct a business object (or error trying) for running the API operation.

When you look at JSON as just a transmission format, it makes more sense to omit properties that are not set. It's less to send across the wire. If your back-end language does not use nulls by default, you could probably configure your deserializer to give an error. For example, my common setup for Newtonsoft.Json translates null/missing properties to/from F# option types only and will otherwise error. This gives a natural representation of which fields are optional (those with option type).
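As a rough illustration of that setup (in Python rather than F#, and not the answer's actual Newtonsoft.Json configuration), the sketch below omits unset properties when serializing and treats a missing or null property as "not set" only for fields declared optional. The `UserDto` type and its `name`/`nickname` fields are hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class UserDto:
    # Hypothetical DTO: `name` is required, `nickname` is optional.
    name: str
    nickname: Optional[str] = None


def to_json(dto: UserDto) -> str:
    # Drop properties that are not set instead of sending null or "".
    payload = {k: v for k, v in asdict(dto).items() if v is not None}
    return json.dumps(payload)


def from_json(text: str) -> UserDto:
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # A required property that is missing or null is an error.
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' is required and must be a string")
    # For the optional field, missing and null both become None.
    nickname = data.get("nickname")
    if nickname is not None and not isinstance(nickname, str):
        raise ValueError("'nickname' must be a string when present")
    return UserDto(name=data["name"], nickname=nickname)
```

With this shape, the type declaration itself documents which fields may be absent, which is the benefit the answer attributes to option types.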

As always, generalizations only get you so far. There are probably cases where a default or null property fits better. But the key is not to look at data structures at the edge of your system as business objects. Business objects should carry business guarantees (e.g. name at least 3 characters) when successfully created. But data structures pulled off the wire have no real guarantees.
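A minimal sketch of that separation, assuming a hypothetical `Customer` business object whose only guarantee is the "name at least 3 characters" rule mentioned above. The raw mapping stands in for whatever the deserializer produced.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Customer:
    # Hypothetical business object: by convention it is only built
    # through create(), so an instance implies validation passed.
    name: str

    @staticmethod
    def create(raw: Mapping[str, Any]) -> "Customer":
        name = raw.get("name")
        if not isinstance(name, str):
            raise ValueError("name is required")
        # Business guarantee from the text: at least 3 characters.
        if len(name) < 3:
            raise ValueError("name must be at least 3 characters")
        return Customer(name=name)
```

Code past the edge of the system can then accept a `Customer` and rely on the guarantee, while the raw mapping is never passed around as if it were one.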

Related Solutions

API Security – How to Avoid Unauthorized Use of an API

You need several types of protection.

Firstly, you need to prevent Site A's key from being used on Site B.

In theory, if the key is bound to a domain, you can't depend on the referer header. But because your client is embedding a script directly, you can reasonably rely on document.location on the client side. Sending that location (or portions of it) to the server directly is unreliable, but you can use it to generate a session key:

1. Client embeds client_key in the request for the API library.
2. Server determines the host that has access to the API, if any.
3. Server picks a "salt" for a session key and sends it to the client with the library [or as part of another pre-auth exchange].
4. Client calculates a session_key using hash(document.location.host + session_salt).
5. Client uses session_key + client_key for an API call.
6. Server validates the call by looking up the client_key's host and the "salt" in the session, computing the hash, and comparing it to the provided session_key.
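The exchange above can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the unspecified hash, the function names are hypothetical, and in practice the client half would run as JavaScript in the browser rather than in Python.

```python
import hashlib
import hmac
import secrets


def new_session_salt() -> str:
    # Server side: random per-session salt, sent along with the library.
    return secrets.token_hex(16)


def session_key(host: str, salt: str) -> str:
    # Both sides compute hash(host + salt); SHA-256 is an assumed choice.
    return hashlib.sha256((host + salt).encode("utf-8")).hexdigest()


def validate_call(registered_host: str, salt: str, provided_key: str) -> bool:
    # Server side: recompute the key from the host registered for the
    # client_key, then compare it in constant time with what was sent.
    expected = session_key(registered_host, salt)
    return hmac.compare_digest(expected.encode("utf-8"),
                               provided_key.encode("utf-8"))
```

A script embedded on a different host derives a different key from the same salt, so the server-side comparison fails for it.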

Secondly, you need to impede Hacker Hank from opening the debug console or using a modified client on Site A to do whatever he wants with your API.

Note, though, that it's very difficult, if not impossible, to completely prevent Hacker Hank from abusing the API. But you can make it more difficult. The most reasonable way to impede Hank that I'm aware of is rate limiting.

- Limit the number of requests/second/session and requests/hour/session. (Spikes in activity are probably reasonable, but not sustained above-average traffic from a single client.)
- Limit the number of sessions/IP/hour.
- Limit the number of requests/IP/hour. Allow spikes, but not sustained heavy traffic from a single IP.
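One possible way to enforce limits like these is a sliding-window counter per key (a session id or an IP address). The class below is a hypothetical in-memory sketch, not something the answer prescribes; a production setup would more likely keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_seconds` for each key."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        stamps = self.events[key]
        # Forget events that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True
```

Running one limiter per rule (for example a short window and an hourly window per session, plus an hourly window per IP) and requiring all of them to allow a request tolerates brief spikes while capping sustained traffic. The actual limits are a judgment call for each API.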

Thirdly, as you're likely already doing: encrypt the traffic. Sure, the NSA will see it; but Hacker Hank is less likely to.

API Design – Should Strings or Enums Be Used for Dictionary Keys?

How is the API exposed? Through an ordinary .NET interface, or through REST or similar?

In the first case, enums are a good choice if you have a limited number of accepted values. You can check for valid values with Enum.IsDefined, you can have inline XML documentation explaining each value, and typos are caught by the compiler (and even earlier by a capable IDE).

Make sure everybody uses the values in the form MyEnum.SomeValue, and not in the form of the underlying integers. If somebody uses integers, the risk is that reordering the values within the enum, adding values in the middle of it, or removing values will change the correspondence between some values and their underlying numbers.

Also note the importance of Enum.IsDefined. The following code, and especially the last line, is perfectly valid, and will compile and run without errors. Guess what the console output will be?
```
enum Color
{
    Red = 1,
    Green = 2,
    Blue = 3,
}

void Demo(Color color)
{
    Console.WriteLine(color.ToString());
}

Demo((Color)4);
```
In the second case, the values from the enum will usually appear as numbers, and remembering what 14 or 17 means in a given context is not particularly exciting. So here, stick with meaningful string values.

Make sure you use an explicit map instead of a simple ToString. Renaming a value within the enum is a simple refactoring task and shouldn't break the code. If the string value is taken from the actual value name within the enum, a rename will change what is exposed through the API, and the code will break in a subtle way that isn't easy to debug.
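The explicit-map idea can be sketched like this, in Python rather than C# purely for illustration. The wire strings and helper names are hypothetical. Note that Python's own `Color(4)` raises a ValueError, unlike the C# cast above, so the validation here matters mainly for the string side.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


# Explicit wire names: renaming a member does not change these strings.
COLOR_TO_WIRE = {Color.RED: "red", Color.GREEN: "green", Color.BLUE: "blue"}
WIRE_TO_COLOR = {wire: color for color, wire in COLOR_TO_WIRE.items()}


def color_to_wire(color: Color) -> str:
    return COLOR_TO_WIRE[color]


def color_from_wire(value: str) -> Color:
    # Reject anything outside the map, much like an Enum.IsDefined check.
    try:
        return WIRE_TO_COLOR[value]
    except KeyError:
        raise ValueError(f"unknown color: {value!r}") from None
```

Because the strings live in the map rather than in the member names, renaming `Color.RED` is a pure refactoring and the API contract stays the same.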

See also: Is this a Best Practice with Enum in C# and the comments by MK87.
