Intuitively, even though your constraints are relaxed, I think your problem is an instance of the bin packing problem, which is NP-hard. Perhaps you should ask the CS Theory group whether this is an instance of BPP/generalized BPP.
Your greedy algorithm runs in linearithmic time: sorting is O(n log n) and binning is O(n log log n), assuming you use heaps and that your number of bins is ~ log n. It is not optimal, though; it is easy to construct sequences that trick it.
You can do a best-fit greedy in O(n log log n): find the mean of the numbers in linear time (without sorting), and then for each number try to fit it into the bin whose current average stays as close as possible to the total average. With ~ log n bins, each such step is an O(log log n) operation with respect to n. Whether this is an appropriate solution for your problem depends on how far from optimal you can afford to be.
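A minimal Python sketch of that best-fit greedy (the function name and the fixed num_bins parameter are my own assumptions, since the bin count is left open above):

    import heapq

    def best_fit_bins(numbers, num_bins):
        """Best-fit greedy sketch: keep each bin's average near the
        global mean. Cheap, but not optimal."""
        mean = sum(numbers) / len(numbers)   # linear time, no sorting
        # heap entries: (deviation of bin sum from its target, bin index)
        heap = [(0.0, i) for i in range(num_bins)]
        bins = [[] for _ in range(num_bins)]
        heapq.heapify(heap)
        for x in numbers:
            dev, i = heapq.heappop(heap)     # bin furthest below its target
            bins[i].append(x)
            # adding x moves the bin's sum by x and its target by mean
            heapq.heappush(heap, (dev + x - mean, i))
        return bins

Each heap operation costs O(log num_bins), so with ~ log n bins the whole loop is O(n log log n).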
However, note that you can't do much better than this: whatever clever scheme you use, you still have to pick up every element and bin it, which already costs linear time.
The merge step of the normal merge sort algorithm needs at most n + m - 1 comparisons, where one list is of size n and the other list is of size m. Using this merge is the simplest approach to combining two sorted lists.
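For reference, a Python sketch of that plain merge step, which spends one comparison per emitted element until one list runs out:

    def merge(a, b):
        """Standard merge of two sorted lists: at most
        len(a) + len(b) - 1 comparisons in the worst case."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:           # one comparison per output element
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        out.extend(a[i:])              # leftovers need no comparison
        out.extend(b[j:])
        return out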
If the comparisons are too expensive, you can do one of two things: either minimize the number of comparisons or minimize the cost of each comparison.
Let's focus on minimizing the cost of a comparison. Only you can decide whether your data can be quantized; quantizing is a form of hashing that keeps the ordering. E.g. if your data is compared by name, you can take the first two chars of each name part and reduce/quantize the data element: "Klaehn, Ruediger" becomes "Kl.Ru", and since "Packer, The" becomes "Pa.Th", the ordering is preserved. You can now apply a cheaper comparison algorithm on the reduced values. But if you find another "Kl.Ru", you have a near value, and you may then switch to the more expensive approach of comparing the full elements.
If you can extract this quantized value from your data faster than comparing the full values, then this is the first thing to do: compare the quantized or hashed value first. Keep in mind that this value needs to be computed only once, so you can compute it when creating the data element.
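A Python sketch of that precomputed-key idea (the Record class is hypothetical). It uses a plain string prefix as the quantized key, which is guaranteed to preserve ordering; the "Kl.Ru" scheme above works the same way as long as the name parts are compared in the same order:

    import functools

    @functools.total_ordering
    class Record:
        """Sketch: precompute a cheap, order-preserving key once,
        compare it first, and only fall back to the expensive full
        comparison when the keys tie."""
        def __init__(self, name):
            self.name = name
            # a plain prefix preserves ordering: if a < b as strings,
            # then a[:k] <= b[:k]; "Klaehn, Ruediger" -> "Klae"
            self.key = name[:4]

        def __lt__(self, other):
            if self.key != other.key:      # cheap decision in most cases
                return self.key < other.key
            return self.name < other.name  # expensive tie-break

        def __eq__(self, other):
            return self.name == other.name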
Above I also mentioned the other way: minimizing the number of comparisons.
I had a look into the classic book TAOCP, Volume 3: Sorting and Searching (section 5.3.2, pp. 197-207), which has a full ten pages on this topic. I found two references to algorithms that need fewer than n + m - 1 comparisons.
First there is the Hwang-Lin merging algorithm, and second an improvement by Glenn K. Manacher; both are cited by TAOCP, as is an algorithm by Christen, which approaches the lower bound on the number of needed comparisons under special conditions on the lengths n and m of the lists.
The algorithm of Manacher was presented in the Journal of the ACM, Vol. 26, No. 3, pp. 434-440: "Significant Improvements to the Hwang-Lin Merging Algorithm". The list with m items and the list with n items can be of different lengths, but they must be ordered by the number of elements they contain, m <= n.
The Hwang-Lin algorithm breaks the lists to be merged into smaller sublists and compares the first element of each sublist to decide whether the remaining elements of that sublist need to be compared at all. If the first list is much smaller than the second, the chance is high that consecutive elements of the longer list can be transferred into the resulting list without comparison. If the first element of the small list is greater than the first element of a sublist of the split larger list, all elements in front of that sublist can be copied without comparison.
In Section 2 of "Average case analysis of the merging algorithm of Hwang and Lin" (Vega, Frieze, Santha) you can find pseudocode of the HL algorithm, which is a lot better than my description. And you can see why there are fewer comparisons: the algorithm uses a binary search to find the index where the element from the shorter list has to be inserted.
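Here is a much simplified Python sketch of that binary-search core (not the full Hwang-Lin algorithm, which chooses its probe distances more carefully):

    from bisect import bisect_right

    def merge_short_into_long(short, long_):
        """Binary-search where each element of the shorter list goes,
        and copy the runs of the longer list in between without
        comparing them one by one: roughly O(m log n) comparisons
        instead of n + m - 1."""
        out, pos = [], 0
        for x in short:                  # m iterations
            # O(log n) comparisons instead of a linear scan
            i = bisect_right(long_, x, pos)
            out.extend(long_[pos:i])     # block copy, no comparisons
            out.append(x)
            pos = i
        out.extend(long_[pos:])          # the rest needs no comparison
        return out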
If the lists are not interleaved like in your last example, you should in most cases end up with a remaining smaller and a remaining larger list. That is when the HL algorithm starts to perform better.
Best Answer
If you need ALL results under an arbitrary condition, say

    C(a,b) = (sha1(a~b) ^ sha1(b~a)) % 2 == 0

with ~ as concatenation, then there are n*(n-1)/2 values you need to output, and they are all independent; so no, there is no better way. But if the result of one pair tells you something about the result of another pair, then that information can be used to speed things up. Like in your example: if the condition is sum > k, then you sort, and for each number you check the others in order (forwards or backwards). On the first true result you add all the numbers still remaining. This will be O(n log n + r), with r the number of true results.

Edit: on second thought, you can use a binary search for each number a to find the first number that is larger than k - a, reducing it to O(n log n) completely.
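A Python sketch of that binary-search variant for the sum > k example (shown here as counting the true results; actually outputting them still costs r on top):

    from bisect import bisect_right

    def pairs_with_sum_greater_than(numbers, k):
        """Sort once, then for each a binary-search the first
        partner b with a + b > k: O(n log n) overall."""
        xs = sorted(numbers)                 # O(n log n)
        n = len(xs)
        count = 0
        for i, a in enumerate(xs):
            # first index j > i with xs[j] > k - a, found in O(log n)
            j = bisect_right(xs, k - a, i + 1)
            count += n - j                   # everything from j on qualifies
        return count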