(Disclaimer: I don't have information on the percentages of occurrence of various elementary string operations found in common software. The following answer is just my two cents of contribution to address some of your points.)
To give a quick example, Intel provides SIMD instructions for accelerating string operations in its SSE 4.2 instruction set (link to wikipedia article). Examples of using these instructions to build useful language-level string functions can be found on this website.
What do these instructions do?
- Given a 16-byte string fragment (either sixteen 8-bit characters or eight 16-bit characters),
- In "Equal Each" mode, it performs an exact match with another string fragment of same length.
- In "Equal Any" mode, it highlights the occurrence of characters which match a small set of characters given by another string. An example is "aeiouy", which detects the vowels in English words.
- In "Range comparison" mode, it compares each character to one or more character ranges. An example of character range is
azAZ
, in which the first pair of characters specifies the range of lower-case English alphabets, and the second pair specifies the upper-case alphabets.
- In "Equal ordered" mode, it performs a substring search.
(All examples above are taken from the above linked website, with some paraphrasing.)
Before we can discuss this topic, we need to establish some prerequisite knowledge.
It is assumed that you already have college-level introductory knowledge of:
- CPU architecture
- Digital design and synthesis, which teaches an introductory level of a hardware description language, such as Verilog or VHDL.
- And finally, a practicum project where you use the above knowledge to build something (say, a simple 16-bit ALU, a multiplier, or state-machine-based hardware logic that detects patterns in an input string), then count its cost (in logic gates and silicon area) and benchmark it.
First of all, we must revisit the various schemes for the in-memory representation of strings.
This is because the practicum in hardware design should have taught you that many things in hardware have to be "hard-wired": hardware can implement complex logic, but the wiring between those logic blocks is fixed at design time.
My knowledge in this aspect is very little. But just to give a few examples: rope, "cord", "twine", StringBuffer (StringBuilder), etc. are all legit contenders for the in-memory representation of strings.
Even in C++ alone, you still have two choices: implicit-length strings (also known as null-terminated strings), and explicit-length strings (in which the length is stored in a field of the string class, and is updated whenever the string is modified).
Finally, in some languages the designer has made the decision of making string objects immutable. That is, if one wishes to modify a string, the only way to do so is to:
- Copy the string and apply the modification on-the-fly, or
- Refer to substrings (slices) of the original immutable string and declare the modifications that one wishes to have applied. The modification isn't actually evaluated until the new result is consumed by some other code.
There is also a side question of how strings are allocated in memory.
Now you can see that, in software, a lot of wonderful (or crazy) design choices exist. There has been lots of research into how to implement these various choices in hardware. (In fact, this has been a favorite way of formulating a master's thesis for a degree in digital design.)
All this is fine, except that due to economic reasons, a hardware vendor cannot justify the cost of supporting a language/library designer's crazy ideas about what a string should be.
A hardware vendor typically has full access to every master's thesis written by every student (in digital design) in the world. Thus, a hardware vendor's decision not to include such features must be well-informed.
Now let's go back to the very basic, common-sense question: "What string operations are among the most frequently performed in typical software, and how can they benefit from hardware acceleration?"
I don't have hard figures, but my guess is that string copying verbatim is probably the #1 operation being performed.
Is string copying already accelerated by hardware? It depends on the expected lengths of strings. If the library code knows that it is copying a string of several thousand characters or more, without modification, it could easily convert the operation into a memcpy, which internally uses CPU SIMD (vectorized) instructions to perform the memory movement.
Furthermore, on modern CPU architectures there is the choice of keeping the copied string in the CPU cache (for subsequent operations) versus bypassing the cache with non-temporal stores (to avoid cache pollution).
But how often does one need to copy such long strings?
It turns out that the standard C++ library had to optimize for the opposite case:
That is, strings with lengths in the low tens occur with such high frequency that a special case, the small-string optimization, exists to minimize the overhead of memory management for these short strings. Go figure.
Best Answer
The best language I'm aware of specific to text search and processing is awk. If awk doesn't meet your needs, it's likely nothing will unless you create it yourself.
However, if you do need to make your own, you don't need to start completely from scratch for each language. You can use a tool like antlr, which can generate parsers for various target languages, or build it in one language and use each language's native interface to access it from the others.