Algorithms – Removing Occurrences of Characters in a String

algorithmsstrings

I am reading this book, Programming Interviews Exposed by John Mongan et. al and in chapter 6 they are discussing removing all instances of characters in a src string using a removal string, e.g. removeChars(string str, string remove).

They suggest to having a boolean lookup array with all values initially set to false, then loop through each character in remove setting the corresponding value in the lookup array to true. (note: this could also be a hash if the possible character set where huge like Unicode-16 or something like that or if str and remove are both relatively small… < 100 characters I suppose). You then iterate through the str with a source and destination index, copying each character only if its corresponding value in the lookup array is false…

I understand their explanation, but I don't understand their code. They have

for(src = 0; src < len; ++src){
   flags[r[src]] == true;
}

which turns the flag value at the remove string indexed at src to true…

so if you start out with PLEASE HELP as your str and LEA as your remove you will be setting in your flag table at 0,1,2... t|t|t but after that you will get an out of bounds exception because r doesn't have have anything greater than 2 in it. Even using their example you get an out of bounds exception. Is their code example unworkable?

Entire Function:

string removeChars( string str, string remove ){
   char[] s = str.toCharArray();
   char[] r = remove.toCharArray();
   bool[] flags = new bool[128]; // assumes ASCII!
   int len = s.Length;
   int src, dst;
   // Set flags for characters to be removed
   for( src = 0; src < len; ++src ){
      flags[r[src]] = true;
    }

   src = 0;
   dst = 0;
   // Now loop through all the characters,
   // copying only if they aren’t flagged
   while( src < len ){
       if( !flags[ (int)s[src] ] ){
       s[dst++] = s[src];
   }
   ++src;
   }
   return new string( s, 0, dst );
}

r comes from the remove string. So in my example the remove string has only a size of 3 while my str string has a size of 11. len is equal to the length of the str string, which would be 11. How can I loop through the r string since it is only size 3? I haven't compiled the code so I can loop through it, but just looking at it I know it won't work. I am thinking they wanted to loop through the r string… in other words they got the length of the wrong string here.

Am I right? Can someone explain this to me?

Best Answer

There's not a lot of code here to work with and it's unlikely everyone has a copy of this book, so this is a guess! It would help if you would post more code.

Using your PLEASE HELP & LEA example, I'd wager the flags array is a lookup table indexed by the ASCII value of a character, and at the end would look like:

flags['A'] -> true
flags['B'] -> false
...
flags['E'] -> true
...
flags['L'] -> true
... (everything else would be false)

Then when doing the removed copy the routine would see that

flags['P'] -> false, so copy it
flags['L'] -> true, so don't copy
flags['E'] -> true, so don't copy
etc

'P' is the ASCII value of character P.