Is there any algorithm to determine the minimum number of NAND or NOR gates with
- given number of inputs
- availability / unavailability of complemented input
required for realizing a Boolean expression? We can get an AND-OR form as prime implicants via Karnaugh maps that is minimal (as far as I know, the Quine-McCluskey algorithm obtains them deterministically). Does a similar technique exist for NAND or NOR implementations, too? At least, such technique should determine the required minimum number of NAND/NOR gates even without finding the actual diagram?
Applying De Morgan's law on the prime implicants doesn't seem deterministic,
A ⊕ B = A'B + AB' = ((A'B)'(AB')')' [5 NAND gates] A ⊕ B = (AB + A'B')' = ((ABAB+ABB') + (A'AB+A'B'))' = (AB(AB+B') + A'(AB+B'))' = ((AB+A')(AB+B'))' = (((AB)'A)'((AB)'B)')' [4 NAND gates by reusing (AB)']
You can only find the minimum number of gates in a multi-level network by solving an integer programming problem [or equivalents, see below]. This problem is NP-complete, so only practical to solve up to a dozen gates or so.
There exist approximation methods that won't give you the minimum number but are more tractable in terms of time required... These are a vast topic in themselves, basically the whole field of multi-level optimization. You can read a [free] overview here.
For small networks of NAND (up to 4 variables), the problem has been completely solved by exhaustive enumeration [or equivalent methods]. There's a fairly recent  PhD thesis by Elizabeth Ann Ernst that summarizes the ancient results and extends them. Ernst uses branch-and-bound, which improves upon the exhaustive method in practice, but not asymptotically. She also notes that other implicit enumeration methods like integer programming or CSP (constraint satisfaction, solved via SAT) perform worse in practice.
She obviously wrote some software for her method (called BESS), but I'm not sure if it's publicly available somewhere. Full text of her thesis is freely available at umich. And indeed you found the minimal expression for 2-input xor (your 2nd one obviously), the one highlighted below:
She also compared the exact results (for NANDs) with those produced by the heuristic optimizer from ABC.
There (obviously) some [larger] networks for which BESS didn't finish, but allowed an upper bound to be found (at the point where the search was abandoned). For those ABC did quite well [well with respect to the bounds found], as you can see from the 2nd graph below.