
Section 5.5 String Extraction

Subsection 5.5.1 Introduction & Motivation

In many real-world programs, it is often necessary to isolate sections of a larger string—whether that means slicing off a file extension, extracting a username, or splitting a name into first and last. This process of substring extraction is fundamental to string processing and will appear frequently throughout the course.
For example, imagine you’re building a chat application that needs to process commands like "/whisper username message". To handle this properly, you’d need to extract the command name ("/whisper"), the target username, and the message content - all from different parts of the input string. Or perhaps you’re reading configuration files with key-value pairs like "database.host=localhost" where you need to separate the key from its value.
In this section, we’ll first explore Java’s built-in substring functionality to understand what it offers and how to use it effectively. Then, we’ll implement our own version in MyString. By building it ourselves, we’ll gain deeper insights into string manipulation and get valuable practice with the Design Recipe methodology.

Subsection 5.5.2 Understanding Java’s substring

Java’s String class provides the substring method in two forms:
  • substring(beginIndex) - Returns a new string containing all characters from beginIndex to the end of the string
  • substring(beginIndex, endIndex) - Returns a new string containing characters from beginIndex (inclusive) up to but not including endIndex (exclusive)
Let’s look at some examples to understand how substring works in practice:
String str = "Hello World";
System.out.println(str.substring(6));     // "World"
System.out.println(str.substring(0, 5));  // "Hello"
System.out.println(str.substring(3, 8));  // "lo Wo"
System.out.println(str.substring(3, 3));  // "" (empty string)
There are several important aspects of substring that every programmer needs to understand:
  • The first index (beginIndex) is inclusive - the character at this position is included in the result
  • The end index (endIndex) is exclusive - the character at this position is NOT included in the result
  • When beginIndex equals endIndex, the result is an empty string
  • The method creates and returns a new String object - it doesn’t modify the original
Off-by-One Errors: It’s easy to forget that endIndex is exclusive, which can lead to accidentally missing or including one extra character in the extracted portion. Careful attention to these boundaries during development can prevent subtle bugs.
Ignoring Edge Cases: When beginIndex == endIndex, the result is "". Java’s own substring throws a StringIndexOutOfBoundsException for negative or out-of-range indices instead of silently proceeding, which shows that validation is a key part of robust substring logic. Always anticipate how your code behaves for minimal or extreme index values.
Skipping Validation: For robustness, it’s important to handle errors gracefully or throw exceptions, rather than returning "" without explanation. This ensures the calling code understands that something went wrong. In production code, informative exceptions can also guide debugging and maintenance.
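The pitfalls above can be observed directly with Java’s built-in String. The following is a minimal sketch, not part of the chapter’s own code: the class name SubstringBoundsDemo and the helper trySubstring are hypothetical names introduced only for illustration. The helper catches the exception so that every case can be printed in one run.

```java
// Sketch: observing how Java's built-in String.substring treats invalid indices.
class SubstringBoundsDemo {

    // Hypothetical helper (an assumption for this example): returns the slice,
    // or a short description of the exception if the indices are rejected.
    static String trySubstring(String text, int beginIndex, int endIndex) {
        try {
            return text.substring(beginIndex, endIndex);
        } catch (IndexOutOfBoundsException e) {
            // String.substring is documented to throw IndexOutOfBoundsException;
            // the runtime type is normally StringIndexOutOfBoundsException.
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String str = "Hello";
        System.out.println(trySubstring(str, 1, 4));   // valid range: "ell"
        System.out.println(trySubstring(str, 2, 2));   // begin == end: empty string
        System.out.println(trySubstring(str, -1, 3));  // negative begin index: rejected
        System.out.println(trySubstring(str, 0, 6));   // end index past the length: rejected
        System.out.println(trySubstring(str, 3, 1));   // begin greater than end: rejected
    }
}
```

Note that the empty slice is a valid result, while the last three calls are reported as errors. The caller can therefore tell "nothing was selected" apart from "the request made no sense".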
Common real-world applications of substring include:
  • Parsing CSV files by extracting fields between commas: "alice,bob,charlie".substring(0, 5) gives "alice"
  • Processing configuration entries: "database.host=localhost".substring(14) gives "localhost" (the '=' sits at index 13, so the value starts at index 14)
  • Extracting file extensions: "document.pdf".substring(9) gives "pdf"
  • Parsing command strings: "/whisper alice hello".substring(9, 14) gives "alice"
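The examples in this list use hard-coded indices, which only work for those exact strings. In practice the indices are usually computed by searching for a delimiter first. Here is one possible sketch using Java’s built-in indexOf and lastIndexOf. The class and method names are hypothetical, and each method assumes the delimiter it looks for is actually present in the input.

```java
// Sketch: computing substring indices from delimiter positions
// instead of hard-coding them. All names here are illustrative.
class SubstringApplications {

    // Key of a "key=value" entry: everything before the first '='.
    // Assumes the entry contains '='.
    static String configKey(String entry) {
        return entry.substring(0, entry.indexOf('='));
    }

    // Value of a "key=value" entry: everything after the first '='.
    static String configValue(String entry) {
        return entry.substring(entry.indexOf('=') + 1);
    }

    // File extension: everything after the last '.'.
    // Assumes the file name contains '.'.
    static String fileExtension(String fileName) {
        return fileName.substring(fileName.lastIndexOf('.') + 1);
    }

    // Second word of a command such as "/whisper alice hello".
    // Assumes the command contains at least two spaces.
    static String commandTarget(String command) {
        int firstSpace = command.indexOf(' ');
        int secondSpace = command.indexOf(' ', firstSpace + 1);
        return command.substring(firstSpace + 1, secondSpace);
    }

    public static void main(String[] args) {
        System.out.println(configKey("database.host=localhost"));    // database.host
        System.out.println(configValue("database.host=localhost"));  // localhost
        System.out.println(fileExtension("document.pdf"));           // pdf
        System.out.println(commandTarget("/whisper alice hello"));   // alice
    }
}
```

The "+ 1" in each method skips over the delimiter itself, which is the same inclusive-begin rule described earlier.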

Subsection 5.5.3 Implementing Our Own Substring

Now that we have a solid understanding of how substring works in Java’s String class, we’re ready to implement it ourselves in MyString. Let’s work through the Design Recipe steps methodically.

Subsubsection 5.5.3.1 Data Definition & Method Signature (Steps 1 & 2)

First, let’s review our data structure. Recall that MyString maintains:
  • An internal char[] chars array with fixed length 100
  • An int usedLength field tracking how many characters are actually in use
For substring operations, we need to carefully manage these boundaries to ensure correct character extraction. Here’s our method signature:
/**
 * Returns a new MyString containing characters from beginIndex (inclusive) 
 * to endIndex (exclusive).
 * 
 * @param beginIndex starting position (inclusive)
 * @param endIndex ending position (exclusive)
 * @return new MyString with characters from range, or empty string if invalid range
 */
public MyString substring(int beginIndex, int endIndex)
Purpose: Create a new MyString containing characters from beginIndex (inclusive) up to but not including endIndex. While Java’s actual String.substring() throws exceptions for invalid bounds, our implementation will take a simpler approach:
  • Log an error message if indices are invalid
  • Return an empty string ("") in error cases
  • Focus on the core slicing logic first, leaving more sophisticated error handling for later
For example:
  • new MyString("Hello").substring(1,4) should return a new MyString containing "ell"
  • new MyString("Hello").substring(-1,4) should print an error and return ""
  • new MyString("Hello").substring(3,3) should return "" (empty string when begin equals end)
Key implementation considerations include:
  • Memory management: Creating the new result array with correct size and properly setting its usedLength
By carefully considering these aspects before implementation, we set ourselves up for a more robust and maintainable solution. Now let’s move on to concrete examples and test cases.
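For reference, the data definition recalled at the start of this subsubsection might look roughly like the sketch below. It is a reconstruction from the description (a 100-element char array plus a usedLength counter), not the course’s actual class. The name MyStringSketch, the CAPACITY constant, and the choice to truncate over-long input are all assumptions made for this example.

```java
// Sketch of the data definition described above; not the course's actual MyString.
class MyStringSketch {
    // Fixed capacity mentioned in the text. The constant name is an assumption.
    private static final int CAPACITY = 100;

    private final char[] chars = new char[CAPACITY]; // storage, only partly used
    private int usedLength;                          // number of characters in use

    MyStringSketch(String source) {
        // Assumption: input longer than the capacity is truncated.
        usedLength = Math.min(source.length(), CAPACITY);
        for (int i = 0; i < usedLength; i++) {
            chars[i] = source.charAt(i);
        }
    }

    int length() {
        return usedLength;
    }

    @Override
    public String toString() {
        // Only the first usedLength characters belong to the string.
        return new String(chars, 0, usedLength);
    }

    public static void main(String[] args) {
        MyStringSketch hello = new MyStringSketch("Hello");
        System.out.println(hello.length());  // 5
        System.out.println(hello);           // Hello
    }
}
```

Because the array is always 100 elements long, usedLength is the only thing that distinguishes real characters from unused slots. This is why substring must set it correctly on the result.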

Subsubsection 5.5.3.2 Example Test Cases (Step 3)

Before coding, we outline representative scenarios for substring slicing that will guide our implementation and validation. Thorough testing at this stage often saves time by catching bugs before they reach deployed code. The table below covers both common and corner cases:
Table 5.5.1. Test Cases for substring
Case              | Input   | begin | end | Expected           | Notes
------------------|---------|-------|-----|--------------------|--------------------------------------
Normal slice      | "Hello" | 1     | 4   | "ell"              | Indices 1..3
Single char slice | "Hello" | 2     | 3   | "l"                | Indices 2..2
Empty slice       | "Hello" | 2     | 2   | ""                 | begin == end => empty string
Whole string      | "Hello" | 0     | 5   | "Hello"            | Indices 0..4
Out of range      | "Hello" | -1    | 3   | "" (error printed) | negative index is invalid
begin > end       | "Hello" | 3     | 1   | "" (error printed) | reversed bounds are invalid
Empty at end      | "Hello" | 5     | 5   | ""                 | valid zero-length slice at string end
These examples address boundaries, empty slices, and common use cases. We will reference this table after implementing our method to ensure correctness, checking each scenario to confirm the substring logic performs as intended. By structuring our tests carefully, we create a robust safety net for catching errors early.
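One way to gain confidence in the Expected column before MyString.substring exists is to check the valid rows against Java’s built-in String.substring, since our method is meant to agree with it on valid input. The sketch below does that. The class SubstringTableCheck and its Row type are hypothetical names, and the two error rows are left out because Java throws an exception there while our version returns "".

```java
// Sketch: checking the valid rows of the test table against Java's String.substring.
class SubstringTableCheck {

    // One valid row of the table: input, begin, end, expected result.
    static final class Row {
        final String input;
        final int begin;
        final int end;
        final String expected;

        Row(String input, int begin, int end, String expected) {
            this.input = input;
            this.begin = begin;
            this.end = end;
            this.expected = expected;
        }
    }

    static final Row[] VALID_ROWS = {
        new Row("Hello", 1, 4, "ell"),    // normal slice
        new Row("Hello", 2, 3, "l"),      // single char slice
        new Row("Hello", 2, 2, ""),       // empty slice
        new Row("Hello", 0, 5, "Hello"),  // whole string
        new Row("Hello", 5, 5, ""),       // empty at end
    };

    // True if the built-in substring agrees with the row's expected value.
    static boolean matchesBuiltIn(Row row) {
        return row.input.substring(row.begin, row.end).equals(row.expected);
    }

    // Number of rows whose expected value disagrees with the built-in result.
    static int countMismatches() {
        int mismatches = 0;
        for (Row row : VALID_ROWS) {
            if (!matchesBuiltIn(row)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("Mismatches: " + countMismatches());
    }
}
```

Keeping the rows as data also makes it easy to reuse the same table later against the MyString version.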

Subsection 5.5.4 Step 4: Building the Skeleton / Template

Let’s develop our skeleton gradually, starting with a minimal outline and adding detail as we consider possible edge cases. Working incrementally lets us diagnose mistakes in isolation rather than debugging everything at once. The goal is to clarify each step, from validation to the final return:
function substring(beginIndex, endIndex):
    1. Check if indices are valid
    2. Create result
    3. Return result
This outline is too broad. Using what we learned from our test table and typical Java conventions, we first refine the validation step:
function substring(beginIndex, endIndex):
    1. Check indices:
       - Is beginIndex negative?
       - Is endIndex > length?
       - Is beginIndex > endIndex?
       If any true: print error and return empty string

    2. Create result
    3. Return result
Next, we must detail how to create and populate the result. Spelling out the copy keeps the logic transparent and makes it easy to trace how the substring is formed. The skeleton now covers copying characters into a new MyString instance:
function substring(beginIndex, endIndex):
    1. Check indices:
       - Is beginIndex negative?
       - Is endIndex > length?
       - Is beginIndex > endIndex?
       If any true: print error and return empty string

    2. Create result:
       a. Calculate how many chars = (endIndex - beginIndex)
       b. Make a new MyString
       c. Copy chars from original[begin..end-1]
          into result[0..sliceLength-1]

    3. Return result
Converting that into a code-oriented skeleton helps us keep track of each step methodically:
public MyString substring(int beginIndex, int endIndex) {
    // 1. Validation
    if (/* TODO: check indices */) {
        System.out.println(/* TODO: error message */);
        return /* TODO: empty string */;
    }

    // 2. Calculate size needed
    int newLength = /* TODO */;

    // 3. Create result and copy chars
    MyString result = /* TODO */;
    // TODO: copy loop

    // 4. Return
    return result;
}
We can then fill in the implementation incrementally, testing after each piece is added to ensure correctness. This layered approach is a hallmark of the Design Recipe, preventing us from tackling the entire method at once without checking our progress. As we proceed, each step builds a stable foundation for the next.

Subsection 5.5.5 Implementation & Testing (Step 5)

We now translate our skeleton into working code step by step, validating each component as we go. By introducing functionality gradually, we isolate potential bugs and confirm correctness sooner. This process ensures each enhancement is well understood before we move on.
public MyString substring(int beginIndex, int endIndex) {
    // 1. Just implement validation first
    if (beginIndex < 0 || 
        endIndex > this.length() || 
        beginIndex > endIndex) {
        System.out.println("Invalid substring indices: " + 
                         beginIndex + " to " + endIndex);
        return new MyString("");
    }

    // For now, just return empty on valid indices
    return new MyString("");
}

// Helper method for testing
private static void testResult(String description, String expected, String actual) {
    System.out.println(description + ":");
    System.out.println("  Expected: \"" + expected + "\"");
    System.out.println("  Actual:   \"" + actual + "\"");
    System.out.println("  " + (expected.equals(actual) ? "PASS" : "FAIL"));
    System.out.println();
}

// Test Version 1:
public static void main(String[] args) {
    MyString str = new MyString("Hello");
    
    // Should print an error and return empty:
    System.out.println("Testing negative index:");
    testResult("Negative begin index", "", 
               str.substring(-1, 3).toString());
    
    // Should return empty (Version 1 doesn't copy yet)
    testResult("Valid indices (currently empty)", "",
               str.substring(1, 4).toString());
}
Having confirmed our validation logic, we extend the method to calculate length and prepare the result object, but still skip character copying. This partial implementation allows us to confirm that the result length is adjusted properly without yet filling in the data. Iterative development like this clarifies precisely how each addition to the code changes its behavior.
public MyString substring(int beginIndex, int endIndex) {
    if (beginIndex < 0 || 
        endIndex > this.length() || 
        beginIndex > endIndex) {
        System.out.println("Invalid substring indices: " + 
                         beginIndex + " to " + endIndex);
        return new MyString("");
    }

    // 2. Calculate slice size
    int sliceLength = endIndex - beginIndex;

    // 3. Create result
    MyString result = new MyString("");
    result.usedLength = sliceLength;
    
    return result;
}

// Test Version 2:
public static void main(String[] args) {
    MyString str = new MyString("Hello");
    
    // Checking length logic
    MyString slice = str.substring(1, 4);
    testResult("Slice length check", "3", 
               String.valueOf(slice.length()));
}
Finally, we copy the characters over. This is where the core functionality of substring is realized, as we replicate the selected range from the original array into a new MyString instance. By solidifying each piece in small increments, we reduce the risk of compounding errors when implementing more complex operations.
public MyString substring(int beginIndex, int endIndex) {
    if (beginIndex < 0 || 
        endIndex > this.length() || 
        beginIndex > endIndex) {
        System.out.println("Invalid substring indices: " + 
                         beginIndex + " to " + endIndex);
        return new MyString("");
    }

    int sliceLength = endIndex - beginIndex;
    MyString result = new MyString("");
    result.usedLength = sliceLength;

    // 3. Copy characters
    for (int i = 0; i < sliceLength; i++) {
        result.chars[i] = this.chars[beginIndex + i];
    }
    
    return result;
}

// Test Version 3:
public static void main(String[] args) {
    MyString str = new MyString("Hello");
    
    // Test all from our table:
    testResult("Normal slice (1,4)", "ell", 
               str.substring(1, 4).toString());
               
    testResult("Single char (2,3)", "l", 
               str.substring(2, 3).toString());
               
    testResult("Empty slice (2,2)", "", 
               str.substring(2, 2).toString());
               
    testResult("Whole string (0,5)", "Hello", 
               str.substring(0, 5).toString());
    
    System.out.println("Testing error cases:");
    testResult("Out of range (-1,3)", "", 
               str.substring(-1, 3).toString());
               
    testResult("begin > end (3,1)", "", 
               str.substring(3, 1).toString());
               
    testResult("Empty at end (5,5)", "", 
               str.substring(5, 5).toString());
}
By evolving the substring method in stages, we isolated issues early on—such as validation—and confirmed each subsequent piece worked properly before adding more complexity. This incremental tactic greatly reduces the risk of hidden errors in final code. With each verified step, we build confidence that the final method will behave as intended under various circumstances.
Performance Note: Older Java versions (before Java 7 update 6) shared the backing character array between a string and its substrings to avoid copying. In modern Java, substring typically copies the selected characters into a new array. For our MyString, we always copy characters so that the result is independent of the original, and we accept the extra memory this costs. We’ll examine such trade-offs in a later chapter, highlighting the balance between memory usage and implementation clarity. These design decisions matter most in large-scale data processing, where efficiency and correctness must be balanced.
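The difference between the two strategies can be sketched with plain char arrays. This is only an illustration of the idea: copySlice and SharedSlice are hypothetical names, and SharedSlice is not how any particular Java version implements String. Modifying the source array after slicing makes the difference visible.

```java
import java.util.Arrays;

// Sketch: copying a slice versus sharing the original array.
class SliceStrategies {

    // Copying strategy: the slice owns a fresh array of exactly the right size.
    static char[] copySlice(char[] source, int begin, int end) {
        return Arrays.copyOfRange(source, begin, end);
    }

    // Sharing strategy (illustrative only): the slice keeps a reference to the
    // original array plus the window it covers, so no characters are copied.
    static final class SharedSlice {
        private final char[] backing;
        private final int offset;
        private final int count;

        SharedSlice(char[] backing, int begin, int end) {
            this.backing = backing;
            this.offset = begin;
            this.count = end - begin;
        }

        @Override
        public String toString() {
            return new String(backing, offset, count);
        }
    }

    public static void main(String[] args) {
        char[] original = "Hello World".toCharArray();
        char[] copied = copySlice(original, 6, 11);
        SharedSlice shared = new SharedSlice(original, 6, 11);

        original[6] = 'w'; // change the source after both slices were taken

        System.out.println(new String(copied)); // World (unaffected by the change)
        System.out.println(shared);             // world (sees the change)
    }
}
```

Sharing saves time and memory per slice, but it ties the slice to the whole original array. Copying costs more up front and keeps the two values independent, which is the behavior chosen for MyString.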

Subsection 5.5.6 Complete MyString Class

Below is the full MyString class, now featuring substring alongside previous methods. Though not production-grade (since we simply print errors rather than throwing exceptions), this version shows how the essential slicing logic works in tandem with our earlier searches, providing a cohesive approach to string manipulation. This example also illustrates how thoughtful method design can simplify future extensions or integrations within larger codebases.
We’ll also explore a practical example using CSV files. CSV (Comma-Separated Values) is a simple file format used to store tabular data, where each line represents a row and values are separated by commas. For example, a spreadsheet with names and ages might look like this:
name,age,city
alice,25,seattle
bob,30,portland
charlie,28,vancouver
To process CSV data, we need to split each line at the commas to extract individual values. Let’s see how we can use our substring and indexOf methods together to accomplish this:
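A minimal sketch of the comma-splitting idea follows. It uses Java’s built-in String so that it runs on its own. The same pattern should carry over to MyString, assuming its indexOf can search from a given start position, which is an assumption about the class rather than something shown here. The names CsvLineSplitter and splitLine are hypothetical, and the sketch ignores quoted fields and other CSV details.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: splitting one CSV line with indexOf and substring.
class CsvLineSplitter {

    // Returns the fields of a comma-separated line, in order.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        int fieldStart = 0;                            // index where the current field begins
        int commaAt = line.indexOf(',', fieldStart);   // next comma at or after fieldStart

        while (commaAt != -1) {
            // The comma index is exclusive, so the comma itself is left out.
            fields.add(line.substring(fieldStart, commaAt));
            fieldStart = commaAt + 1;                  // skip past the comma
            commaAt = line.indexOf(',', fieldStart);
        }

        // Whatever follows the last comma (or the whole line if there was none).
        fields.add(line.substring(fieldStart));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("alice,bob,charlie"));   // [alice, bob, charlie]
        System.out.println(splitLine("alice,25,seattle"));    // [alice, 25, seattle]
    }
}
```

Each field is the slice between two delimiter positions, so the exclusive end index lines up naturally with the comma’s position and no "- 1" adjustment is needed.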

Subsection 5.5.7 Reflection & Conclusion

Our incremental approach to implementing substring uncovers several important ideas: it highlights how careful, stepwise development can prevent logical missteps and off-by-one bugs that are common in string processing. Each discrete step of our approach enables thorough validation and targeted troubleshooting.
Skeleton Evolution:
  • We began with a simple three-step layout (validate indices, create result, return it). This clarified our core objectives and prevented us from writing code before establishing a plan.
  • We refined validation details by referencing our test table, ensuring all edge cases were covered. Practical examples guided our checks to make sure the code handled negative indices, out-of-range errors, and other issues gracefully.
  • We decided how to construct and populate the result array based on beginIndex and endIndex, clarifying that the end index is exclusive. This explicit coverage of indices proved vital to correct data extraction.
  • We transformed this outline into code, checking each step with small tests to confirm partial functionality before moving forward. This testing strategy minimized confusion by allowing us to verify each piece as soon as we introduced it.
Implementation Refinements:
  • Version 1: basic validation to catch index errors right away and return an empty string on invalid requests, maintaining code safety from the start.
  • Version 2: introduced new length calculations and set up the result object, confirming we handle correct sizing. This step revealed how index arithmetic translates into actual substring length.
  • Version 3: finished by copying characters to form a proper substring, meeting all valid boundary scenarios and reflecting typical Java substring behavior. At this point, the implementation aligned with our initial specification.
Testing Lessons:
  • Each version got targeted tests to confirm partial functionality quickly, ensuring that any defects could be found before proceeding. Such incremental validation is especially beneficial in larger projects, where debugging after extensive changes can be cumbersome.
  • We eventually tested against every scenario in our table, including negative indices, zero-length slices, and full-string requests. Comprehensive coverage allowed us to trust the final solution more fully.
  • We also explored how to integrate indexOf with substring for realistic tasks like splitting "key=value", emphasizing that searching and slicing are closely interlinked operations. This demonstrated how fundamental string methods can be combined for practical text manipulation.
This stepwise design and testing approach illustrates how the Design Recipe—particularly Step 4 (Skeleton) and Step 5 (Implementation & Testing)—reduces complexity and fosters confidence in the final code. By addressing validation, planning the copy routine, and testing thoroughly, we produce a robust substring method that captures the essence of Java’s slicing logic. Throughout this process, each incremental improvement serves as a checkpoint, validating our assumptions and ensuring we remain aligned with the intended functionality.

Subsection 5.5.8 Check Your Understanding

Exercises

1. Multiple-Choice: Begin & End Index Behavior.
When calling substring(int beginIndex, int endIndex) on our MyString, which statement best describes how the returned substring is determined?
  • It includes the characters at both beginIndex and endIndex, so the slice is inclusive on both ends.
    Feedback: No. Remember, we include the character at beginIndex but exclude the character at endIndex.
  • It includes the character at beginIndex but excludes the one at endIndex. If beginIndex == endIndex, the result is empty.
    Feedback: Correct! This matches Java’s standard approach: the start is inclusive, the end is exclusive.
  • Our substring automatically expands endIndex by +1 if beginIndex == endIndex, avoiding empty strings.
    Feedback: No. If beginIndex == endIndex, we produce an empty substring.
  • It always returns the entire string if beginIndex or endIndex is out of range, ignoring the invalid indices.
    Feedback: No. We return an empty string and print an error message for invalid indices.
2. Multiple-Choice: Edge Case Handling.
How does our MyString.substring method handle invalid index ranges, such as negative beginIndex or an endIndex that exceeds the string’s length?
  • We silently clamp the indices to [0..usedLength], so the caller never sees an error or an empty string.
    Feedback: No. We don’t auto-correct indices; we print an error and return empty instead.
  • We print an error message and return an empty MyString (like "") instead of throwing an exception.
    Feedback: Correct! This is our simplified approach for educational or prototype settings.
  • We throw a custom exception (InvalidSubstringBoundsException) whenever the indices are out of range.
    Feedback: No. We chose to avoid throwing exceptions to keep the code simpler. Real Java String might throw StringIndexOutOfBoundsException.
  • We repeat the substring from the last valid index instead, so calls like substring(-1,3) become substring(0,3).
    Feedback: No. We do not auto-adjust; invalid bounds simply lead to returning an empty string.
3. Multiple-Choice: Compare with Java’s substring.
Which difference between our MyString.substring and Java’s built-in String.substring is true?
  • Java’s substring always includes both beginIndex and endIndex, whereas ours excludes endIndex.
    Feedback: No. Java’s substring also excludes endIndex.
  • Java’s substring throws an exception if the indices are invalid, whereas ours prints an error message and returns an empty string.
    Feedback: Yes. The main difference is how we handle invalid bounds.
  • Java’s substring never actually copies the characters, but our method always does.
    Feedback: This was true in older Java versions (sharing backing arrays). Modern Java typically copies, though it’s an implementation detail. Our point is that we explicitly copy in MyString.
  • Java’s substring cannot handle negative indices, while our version gracefully adjusts them to 0 automatically.
    Feedback: No. We do not clamp negative indices to 0; we treat them as invalid and return empty.
4. Multiple-Choice: Off-by-One Pitfalls.
Off-by-one errors are a common pitfall in substring slicing. Which scenario below most likely results from an off-by-one error in our copying logic?
  • A substring(0, 5) on a 5-character string returns "" (empty).
    Feedback: Yes, that would indicate an off-by-one error, but check all options carefully.
  • substring(0, 5) on "Hello" returns "Hell".
    Feedback: Yes, missing the last character suggests an off-by-one in the loop boundary. But see if there’s a more correct answer that covers the essence of “most likely.”
  • Both of the following examples reveal loop boundary errors in copying:
    • substring(0,5) returns only 4 characters instead of 5 (like "Hell").
    • substring(1,4) incorrectly becomes 2 characters ("el") when it should have 3 ("ell").
    Feedback: Exactly! Missing the last character by 1 is classic off-by-one for substring loops.
  • Our method throws a NullPointerException if beginIndex is out of range.
    Feedback: No. That’s not an off-by-one error but rather a null or range-check issue.
5. Multiple-Choice: Substring & CSV Splitting.
We showed an example of extracting fields from a CSV line "alice,bob,charlie" using indexOf and substring. Which approach best describes how we split this line into three separate parts?
  • We read one character at a time and build partial results until we see a comma, ignoring substring altogether.
    Feedback: No. We specifically illustrated substring calls around the comma positions found by indexOf.
  • We skip the first comma, then treat the second comma as an error, returning an empty string for the third segment.
    Feedback: No. We handle both commas intentionally to parse out all three segments.
  • We locate each comma using indexOf, then call substring between indices to isolate each segment ("alice", "bob", "charlie").
    Feedback: Correct! We rely on commas to mark boundaries, then extract slices accordingly.
  • We repeatedly call substring(0,1) to collect characters until we guess we’ve reached a comma or the end.
    Feedback: No. That’s not the approach we described. We rely on index positions of commas, not a single-character loop.
6. Short-Answer: Handling Substring Exceptions.
In our MyString code, we print an error and return an empty string for invalid indices instead of throwing an exception. In real Java, String.substring throws a StringIndexOutOfBoundsException. Briefly explain one advantage of throwing an exception rather than silently returning an empty string in production systems.
Answer.