Java Strings – How to Interpret a List of Tokens of Variable Number

java strings

I have to read a text document with data formatted as follows: (a couple of examples)

07 M W F 1400 1450 C 2004
M W F 0900 1030 EN 2036
06 M T R 1300 1350 EN 1003
17 T R 0900 1015 EN 1052

The problem I'm having is that, after splitting these strings with .split(" ", -1), I get a different number of tokens for many of them, and the position of the difference varies within the first two "sections".

The first part of the string is supposed to represent a scheduling code. It's optional, as can be seen by it being missing in the second line. The second area depicts which days of the week this schedule applies to. It could be anywhere from one to five days, but in these examples there are only 2 or 3. Then the rest are pretty much static: Start time, end time, building code and room number.

What I need to do is construct multiple objects for this schedule based on this info (one object for each day), and I'm not sure how to proceed. How can I tell, when iterating over the array of tokens, what each token represents? I thought about using a switch statement, but that would only work for the days, as there are 5 of them.

The following is the code I (tentatively) plan to use for this object.

public class TimeSlot {

  private Day day;
  private int startTime;    // # of minutes after midnight
  private int endTime;
  private Room room;
  private String slot;

  /**
   * Create an instance of TimeSlot.
   * @param slot  scheduling code for this time slot
   * @param day   day of the week
   * @param start start time of lecture represented as minutes after midnight
   * @param end   end time of lecture represented as minutes after midnight
   * @param room  room the class takes place in
   */
  public TimeSlot(String slot, Day day, int start, int end, Room room) {
    this.slot = slot;
    this.day = day;
    this.startTime = start;
    this.endTime = end;
    this.room = room;
  }
}

enum Day {
  MONDAY,
  TUESDAY,
  WEDNESDAY,
  THURSDAY,
  FRIDAY,
  SATURDAY,
  SUNDAY;
}

public class Room {

  private String buildingCode;
  private String roomNumber;

  public Room(String building, String room) {
    this.buildingCode = building;
    this.roomNumber = room;
  }

  public String buildingCode() {
    return buildingCode;
  }

  public String roomNumber() {
    return roomNumber;
  }

}

Best Answer

As much as people sometimes complain about regular expressions, this is a perfect scenario for using them. I'm not familiar with the syntax for Java myself (the relevant classes are in the java.util.regex package), but I can tell you that you're going to want the pattern:

(\d+ )?((?:[MTWRF] )+)(\d+) (\d+) (\w+) (\d+)

You can try it out in any online regex tester.

Step by step, this means:

  • (\d+ )? - If present, capture one or more digits followed by a space.
  • ((?:[MTWRF] )+) - Capture one or more repetitions of a letter from the set MTWRF followed by a space, all as a single group. The inner (?: ) is a non-capturing group. A plain ([MTWRF] )+ would keep only the last day, because a repeated capturing group retains only its final repetition.
    • If you want to enforce the 1-5 rule, replace the + with {1,5}, so it becomes ((?:[MTWRF] ){1,5}).
    • If you want to include Saturday and Sunday, add S and U inside the [] block (order doesn't matter), so it would become ((?:[UMTWRFS] )+).
  • (\d+) - Capture a series of digits. If you want to enforce four digits, replace the + with {4} so it becomes (\d{4}), or simply put in four \d characters to make (\d\d\d\d).
  • (\d+) - Capture another series of digits. See above for modifications.
  • (\w+) - Capture a series of word characters (letters, digits or underscores). Use ([A-Za-z]+) instead if the building code must be letters only.
  • (\d+) - Capture a last series of digits. See above for modifications.

You can then access each capture. If the optional first group takes no part in the match, Java's Matcher.group(1) returns null; otherwise it holds the code plus a trailing space, so trim it. The second group holds all of the day letters (for example "M W F "), so trim it and split on spaces to get the individual days. The remaining groups have no stray spaces for you to worry about.
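For reference, here is a minimal Java sketch of the approach described above, not a definitive implementation. It makes a few assumptions: the day letters are wrapped in one outer group, ((?:[MTWRF] )+), because in Java a repeated capturing group such as ([MTWRF] )+ retains only its last repetition; the times use the stricter (\d{4}) form so they can be converted to minutes after midnight; and ScheduleLineParser, ParsedSlot and toMinutes are hypothetical names standing in for the question's TimeSlot, Day and Room classes. Note that each backslash has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScheduleLineParser {

    // Optional code, one or more day letters, start, end, building, room.
    private static final Pattern LINE = Pattern.compile(
            "(\\d+ )?((?:[MTWRF] )+)(\\d{4}) (\\d{4}) (\\w+) (\\d+)");

    /** Hypothetical value holder; in the question this would be TimeSlot. */
    static final class ParsedSlot {
        final String slot;       // null when the line has no scheduling code
        final char day;          // one of M, T, W, R, F
        final int startMinutes;  // minutes after midnight
        final int endMinutes;    // minutes after midnight
        final String building;
        final String room;

        ParsedSlot(String slot, char day, int startMinutes, int endMinutes,
                   String building, String room) {
            this.slot = slot;
            this.day = day;
            this.startMinutes = startMinutes;
            this.endMinutes = endMinutes;
            this.building = building;
            this.room = room;
        }
    }

    /** Converts a four-digit "HHmm" string such as "1400" to minutes after midnight. */
    static int toMinutes(String hhmm) {
        int hours = Integer.parseInt(hhmm.substring(0, 2));
        int minutes = Integer.parseInt(hhmm.substring(2));
        return hours * 60 + minutes;
    }

    /** Returns one ParsedSlot per day letter on the line; throws if the line does not match. */
    static List<ParsedSlot> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        // group(1) is null when the optional scheduling code is absent.
        String slot = (m.group(1) == null) ? null : m.group(1).trim();
        int start = toMinutes(m.group(3));
        int end = toMinutes(m.group(4));

        List<ParsedSlot> result = new ArrayList<>();
        // group(2) looks like "M W F ", so trim it and split on single spaces.
        for (String day : m.group(2).trim().split(" ")) {
            result.add(new ParsedSlot(slot, day.charAt(0), start, end,
                                      m.group(5), m.group(6)));
        }
        return result;
    }

    public static void main(String[] args) {
        for (ParsedSlot s : parse("07 M W F 1400 1450 C 2004")) {
            System.out.println(s.slot + " " + s.day + " " + s.startMinutes
                    + "-" + s.endMinutes + " " + s.building + " " + s.room);
        }
    }
}
```

Using matches() rather than find() requires the whole line to fit the pattern, so malformed lines are rejected instead of being partially parsed. Mapping the day letter to the Day enum (for example with a switch on the char) and building a Room from the last two fields would be the remaining steps to produce real TimeSlot objects.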