I know this is an old question, but I recently reverse-engineered the 74181 and can explain in detail why it has the functions it does.
The 74181 is a 4-bit ALU chip that provides 16 logic functions and 16 arithmetic functions on its operands A and B. Many of the logic functions are what you might expect (AND, OR, XOR), but there are also unusual ones like A OR NOT B. The arithmetic functions are even stranger. While you have A PLUS B and A MINUS B, some, such as (A OR B) PLUS (A AND NOT B), seem pretty random.
There's actually a reason for this set of operations. The logic functions provide all 16 Boolean functions f(A,B). The arithmetic functions all boil down to A PLUS f(A,B) PLUS carry-in.
Step back to see why there are 16 functions. If you have a Boolean function f(A,B) on one-bit inputs, there are 4 rows in the truth table, and each row can output 0 or 1, so there are 2^4 = 16 possible functions. Extend these to 4 bits, and you get exactly the 16 logic functions of the 74181, from the trivial 0 and 1, through expected logic like A AND B, to contrived operations like NOT A AND B.
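To make the counting concrete, here's a short Python sketch (an illustration, not 74181 circuitry) that enumerates all 16 one-bit Boolean functions by their 4-row truth tables:

```python
# Enumerate all 16 Boolean functions f(A, B) of two 1-bit inputs.
# Each function is fully described by its 4-row truth table, and each
# row can output 0 or 1, so there are 2^4 = 16 possible functions.
from itertools import product

functions = []
for table in range(16):  # each 4-bit number encodes one truth table
    # bit position 2*a + b of 'table' holds f(a, b)
    f = {(a, b): (table >> (2 * a + b)) & 1 for a, b in product((0, 1), repeat=2)}
    functions.append(f)

print(len(functions))  # 16 distinct functions, from constant 0 to constant 1
```

Applying each of these bitwise to 4-bit operands gives the chip's 16 logic operations.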
The arithmetic functions are simply these 16 functions added to A, along with the carry-in. For example, if f(A,B) = B, you get A PLUS B PLUS carry-in. If f(A,B) = NOT B, you get A PLUS NOT B PLUS carry-in, which in two's-complement arithmetic turns into A MINUS B MINUS 1 PLUS carry-in.
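A quick Python sketch (the function names are mine, not from the datasheet) shows how A PLUS NOT B PLUS carry-in turns into subtraction on 4-bit values:

```python
MASK = 0xF  # 4-bit words, as in the 74181

def add4(a, f, carry_in):
    """A PLUS f(A,B) PLUS carry-in, truncated to 4 bits."""
    return (a + f + carry_in) & MASK

# With f(A,B) = NOT B and carry-in = 1, two's complement gives subtraction:
# A + (15 - B) + 1 = A - B + 16, which is A - B mod 16.
for a in range(16):
    for b in range(16):
        assert add4(a, ~b & MASK, 1) == (a - b) & MASK
print("A PLUS NOT B PLUS 1 == A MINUS B (mod 16) for all 4-bit values")
```

With carry-in 0 the same setup yields A MINUS B MINUS 1, matching the datasheet's description.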
Other arithmetic functions take a bit more analysis. Suppose f(A,B) = NOT (A OR B). In any bit position where A is 1, f(A,B) is 0, so the addition never generates a carry and each bit of A PLUS f(A,B) is just the XOR of the two inputs. That bit is 1 in every case except where A is 0 and B is 1, so the result is A OR NOT B. Even though you're doing addition, the result is a logical function. The other strange arithmetic functions can be explained similarly.
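The identity can be checked exhaustively with a few lines of Python (an illustration, not the chip's logic):

```python
MASK = 0xF  # 4-bit words

# A PLUS NOT(A OR B) produces the purely logical result A OR NOT B,
# because A and NOT(A OR B) never share a 1 bit, so no carries occur.
for a in range(16):
    for b in range(16):
        f = ~(a | b) & MASK                       # f(A,B) = NOT (A OR B)
        assert (a + f) & MASK == (a | ~b) & MASK  # result is A OR NOT B
print("A PLUS NOT(A OR B) == A OR NOT B on all 4-bit values")
```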
One thing to note: A PLUS A gives you a left shift, but there's no way to do a right shift on the 74181.
In its implementation the 74181 has four select lines that pick which of the 16 f(A,B) functions are used. The first half of the chip's circuitry computes the four 1-bit sums of A with f(A,B). (Specifically it is creating the Generate and Propagate signals that are used for carry lookahead. This lets the 74181 work in parallel, rather than using a ripple carry.) The second half of the chip's circuitry generates all the carries in parallel and computes the final sum.
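Here's a hedged Python sketch of the generate/propagate scheme; the names are mine, and the carries are written with the lookahead recurrence unrolled one step at a time for readability (in hardware, the recurrence is expanded so all four carries come out of flat logic simultaneously):

```python
# Carry-lookahead sketch for a 4-bit sum A + F (F = f(A,B)) plus carry-in.
# Generate g_i = a_i AND f_i: bit i produces a carry by itself.
# Propagate p_i = a_i OR f_i: bit i passes an incoming carry along.
def lookahead_add(a, f, c0, width=4):
    g = [(a >> i) & (f >> i) & 1 for i in range(width)]
    p = [((a >> i) | (f >> i)) & 1 for i in range(width)]
    carries = [c0]
    for i in range(width):
        # c_{i+1} = g_i OR (p_i AND c_i); expanding this recurrence in
        # terms of g, p, and c0 is what lets hardware compute every
        # carry in parallel instead of rippling.
        carries.append(g[i] | (p[i] & carries[i]))
    s = 0
    for i in range(width):
        s |= (((a >> i) ^ (f >> i) ^ carries[i]) & 1) << i
    return s  # sum mod 16; carries[width] is the carry-out

# Sanity check against ordinary addition.
for a in range(16):
    for f in range(16):
        for c in (0, 1):
            assert lookahead_add(a, f, c) == (a + f + c) & 0xF
```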
Internally, the logic functions are implemented by performing addition with the internal carries all forced high by the M line: the chip computes A PLUS f(A,B) with every carry forced to 1. It's straightforward to see that this still generates 16 unique logic functions. However, it permutes their order, which is why the datasheet shows no obvious connection between the logic functions and the arithmetic functions.
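With every carry forced to 1, each sum bit becomes NOT(a XOR f), i.e. the XNOR of A with f(A,B). A small Python check (illustrative only) confirms that this mapping still produces all 16 Boolean functions, just in a permuted order:

```python
# With all carries forced to 1, sum bit i is a_i XOR f_i XOR 1,
# so logic mode computes NOT(A XOR f(A,B)) bitwise.  Since
# f -> NOT(A XOR f) is a bijection on the one-bit functions,
# 16 distinct selections still give 16 distinct logic functions.

def truth_table(g):
    return tuple(g(a, b) for a in (0, 1) for b in (0, 1))

# all 16 one-bit functions f(a, b), indexed by their truth table t
fs = [lambda a, b, t=t: (t >> (2 * a + b)) & 1 for t in range(16)]

logic = {truth_table(lambda a, b, f=f: 1 ^ (a ^ f(a, b))) for f in fs}
assert len(logic) == 16  # still all 16 Boolean functions, reordered
```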
If you want to understand the 74181's internals, first look at the 7483 4-bit adder, which came out two years earlier. It uses the same carry computation techniques, but is simpler to understand as it provides one function, not 32. You can think of the 74181 as the generalization of the 7483.
Best Answer
If you would like to understand how processor designers reach their 'optimal' instruction set, you might want to look at some books, for example "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson.
There are quite a lot of papers freely available on the web about computer architecture and Instruction Set Architecture (ISA) design.
I'd suggest reading the papers and watching the videos at http://riscv.org
As a concrete example, look at the RISC-V Compressed Instruction Set Manual.
Their process for the "RISC-V Compressed Instruction Set Manual" was to gather sets of recognised, representative 'benchmark' programs, in high-level languages like C, and run analysis across them. For example, they might take an existing C compiler, modify its code generation to target their prototype ISA, pass the benchmark programs through it, and analyse the results. That relies to some extent on the compiler having a rich enough model to make use of their ISA's instructions.
In the specific case of that compressed instruction set analysis, they wanted to identify which instructions were common enough that providing a compressed encoding would reduce code size significantly. However, that analysis uses techniques which have been applied before.
The compressed instruction set analysis showed that only 31 instructions accounted for the vast majority of code, so encoding those in half the bits reduced total program size by about 25%. As quite a lot of a program is data rather than instructions, that also suggests that many instructions are rarely used.
Remember, adding more instructions and expanding the ALU is not necessarily free. To maximise throughput (i.e. doing useful work) we try to balance latency (delay) through the ALU, power consumption, and overall instruction time. Adding an instruction which makes the CPU run 5% slower on every instruction, so that 0.1% of the code can run 2x faster, makes no sense.
This is a version of Amdahl's Law, which essentially says that no matter how much one part of a system is improved, if that part only accounts for 1% of a program's resources (time or space), then the improvement can only benefit the system's performance by about 1%.
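As a sketch, Amdahl's Law can be written as speedup = 1 / ((1 - p) + p/s), where p is the fraction of the workload affected and s is the local speedup; the numbers below are illustrative:

```python
# Amdahl's law: overall speedup from improving a fraction p of a
# workload by a local factor s.
def amdahl(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Making 0.1% of execution 2x faster helps almost not at all...
print(round(amdahl(0.001, 2.0), 4))   # barely above 1.0
# ...and even an enormous speedup on 1% of the work caps the overall
# gain at roughly 1%.
print(round(amdahl(0.01, 1e9), 4))
```

The same formula works in reverse: a 5% slowdown on every instruction (p = 1, s = 1/1.05) swamps any gain confined to a tiny fraction of the code.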
The vast majority of instructions executed by a program are loads, stores, simple arithmetic, jumps and conditional branches. As long as they execute well, adding a few specialised instructions to make rarely used operations faster will make virtually no difference. Worse, if the addition of specialised instructions slows the common instructions by a tiny amount, then it's likely better to remove them.
To learn more, look for older (1980s) RISC research papers. Maybe start with John L. Hennessy and David Patterson, as they were prolific publishers; Hennessy founded MIPS, and Patterson's work led to Sun's SPARC.