How to Check If a String Matches a URL Format in Java

Introduction

In this lab, you will learn how to check if a given string matches a URL format in Java using regular expressions. We will define a regex pattern specifically designed for URLs, utilize the Pattern.matches() method to test strings against this pattern, and explore how to validate common URL schemes. This hands-on exercise will guide you through the practical steps of implementing URL validation in your Java applications.


Define URL Regex Pattern

In this step, we will learn how to define a regular expression pattern in Java to match URLs. Regular expressions, often shortened to "regex" or "regexp", are sequences of characters that define a search pattern. They are extremely powerful for pattern matching and manipulation of strings.

For validating URLs, a regex pattern helps us check if a given string follows the standard structure of a URL (like http://www.example.com or https://example.org/path).

Let's create a new Java file to work with regex.

  1. Open the WebIDE. In the File Explorer on the left, make sure you are in the ~/project directory.

  2. Right-click in the empty space within the ~/project directory and select "New File".

  3. Name the new file UrlValidator.java and press Enter.

  4. The UrlValidator.java file should open in the Code Editor.

  5. Copy and paste the following Java code into the editor:

    import java.util.regex.Pattern;
    
    public class UrlValidator {
    
        public static void main(String[] args) {
            // Define a simple regex pattern for a URL
            String urlRegex = "^(http|https)://[^\\s/$.?#].[^\\s]*$";
    
            // Compile the regex pattern
            Pattern pattern = Pattern.compile(urlRegex);
    
            System.out.println("URL Regex Pattern Defined.");
        }
    }

    Let's break down the new parts of this code:

    • import java.util.regex.Pattern;: This line imports the Pattern class, which is part of Java's built-in support for regular expressions.
    • String urlRegex = "^(http|https)://[^\\s/$.?#].[^\\s]*$";: This line defines a String variable named urlRegex and assigns it our regular expression pattern.
      • ^: Matches the beginning of the string.
      • (http|https): Matches either "http" or "https".
      • ://: Matches the literal characters "://".
      • [^\\s/$.?#]: Matches exactly one character that is NOT a whitespace character (\\s), a forward slash (/), a dollar sign ($), a period (.), a question mark (?), or a hash symbol (#). This ensures that the part after :// starts with a plausible host character.
      • .: Matches any single character except a line terminator. Together with the previous element, this requires at least two characters after ://.
      • [^\\s]*: Matches zero or more characters that are NOT whitespace. This is a simplified way to match the rest of the host name, path, and query.
      • $: Matches the end of the string.
      • Note the double backslashes (\\) before s. In a Java string literal, a single backslash starts an escape sequence, so we write \\ in the source code to produce the single backslash that the regex engine reads as part of \s.
    • Pattern pattern = Pattern.compile(urlRegex);: This line compiles the regex string into a Pattern object. Compiling the pattern is more efficient if you plan to use the same pattern multiple times.
    • System.out.println("URL Regex Pattern Defined.");: This line simply prints a message to the console to indicate that the pattern has been defined.

  6. Save the file (Ctrl+S or Cmd+S).

  7. Now, let's compile this Java program. Open the Terminal at the bottom of the WebIDE. Make sure you are in the ~/project directory.

  8. Compile the code using the javac command:

    javac UrlValidator.java

    If there are no errors, the command will complete without output. A UrlValidator.class file will be created in the ~/project directory.

  9. Run the compiled program using the java command:

    java UrlValidator

    You should see the output:

    URL Regex Pattern Defined.

You have successfully defined and compiled a Java program that includes a basic regex pattern for URLs. In the next step, we will use this pattern to test if different strings are valid URLs.
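
To see the effect of the double backslashes described above, the following minimal sketch prints the pattern text as the regex engine receives it. The class name RegexEscapingSketch and the helper regexText() are illustrative names chosen for this example; they are not part of the lab files.

```java
import java.util.regex.Pattern;

class RegexEscapingSketch {

    // Hypothetical helper: returns the pattern text as seen by the regex engine.
    static String regexText() {
        // Same literal as in UrlValidator.java; "\\s" in source code becomes \s at runtime.
        String urlRegex = "^(http|https)://[^\\s/$.?#].[^\\s]*$";
        return Pattern.compile(urlRegex).pattern();
    }

    public static void main(String[] args) {
        // The printed text contains single backslashes only.
        System.out.println(regexText());
    }
}
```

Running it should print ^(http|https)://[^\s/$.?#].[^\s]*$, which is the pattern with each \\ from the source code reduced to a single backslash.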

Test URL with Pattern.matches()

In the previous step, we defined a regex pattern for URLs and compiled it into a Pattern object. Now, let's check whether different strings match that pattern using the static Pattern.matches() method.

The Pattern.matches(regex, input) method is a convenient way to check if an entire input string matches a given regular expression. It takes the regex as a string rather than a compiled Pattern object, and it compiles the regex and matches the input against it in a single step. This means the regex is compiled again on every call.
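
As a hedged illustration of the two ways to run a full-string match, the sketch below compares the one-step Pattern.matches() call with reusing a compiled Pattern through matcher(input).matches(). The class name CompiledPatternSketch and the helper isUrl() are assumptions made for this example only.

```java
import java.util.regex.Pattern;

class CompiledPatternSketch {

    // Compiled once and reused for every call to isUrl().
    private static final Pattern URL_PATTERN =
            Pattern.compile("^(http|https)://[^\\s/$.?#].[^\\s]*$");

    // Hypothetical helper: full-string match using the precompiled pattern.
    static boolean isUrl(String input) {
        return URL_PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"http://www.example.com", "ftp://invalid-url.com"};
        for (String input : inputs) {
            // One-step form: compiles the regex again on every call.
            boolean oneStep = Pattern.matches(URL_PATTERN.pattern(), input);
            // Two-step form: reuses the compiled pattern.
            boolean reused = isUrl(input);
            System.out.println(input + " -> " + oneStep + " / " + reused);
        }
    }
}
```

Both forms give the same answer for each input. The reusable form is usually the better fit when the same pattern is checked against many strings, while the one-step form keeps short examples compact.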

Let's modify our UrlValidator.java file to test some example URLs.

  1. Open the UrlValidator.java file in the WebIDE editor if it's not already open.

  2. Update the file so that it matches the following code. The new lines are the test strings and the Pattern.matches() calls that follow the pattern definition.

    import java.util.regex.Pattern;
    
    public class UrlValidator {
    
        public static void main(String[] args) {
            // Define a simple regex pattern for a URL
            String urlRegex = "^(http|https)://[^\\s/$.?#].[^\\s]*$";
    
            // Compile the regex pattern
            Pattern pattern = Pattern.compile(urlRegex);
    
            System.out.println("URL Regex Pattern Defined.");
    
            // Test some URLs
            String url1 = "http://www.example.com";
            String url2 = "https://example.org/path/to/page";
            String url3 = "ftp://invalid-url.com"; // Invalid scheme
            String url4 = "http:// example.com"; // Invalid character (space)
    
            System.out.println("\nTesting URLs:");
    
            boolean isUrl1Valid = Pattern.matches(urlRegex, url1);
            System.out.println(url1 + " is valid: " + isUrl1Valid);
    
            boolean isUrl2Valid = Pattern.matches(urlRegex, url2);
            System.out.println(url2 + " is valid: " + isUrl2Valid);
    
            boolean isUrl3Valid = Pattern.matches(urlRegex, url3);
            System.out.println(url3 + " is valid: " + isUrl3Valid);
    
            boolean isUrl4Valid = Pattern.matches(urlRegex, url4);
            System.out.println(url4 + " is valid: " + isUrl4Valid);
        }
    }

    Here's what we added:

    • We defined four String variables (url1, url2, url3, url4) containing different example strings, some valid URLs according to our simple pattern, and some invalid ones.
    • We added a print statement to make the output clearer.
    • We used Pattern.matches(urlRegex, url) for each test string. This method returns true if the entire string matches the urlRegex pattern, and false otherwise.
    • We printed the result of the validation for each URL.

  3. Save the UrlValidator.java file.

  4. Compile the modified code in the Terminal:

    javac UrlValidator.java

    Again, if compilation is successful, there will be no output.

  5. Run the compiled program:

    java UrlValidator

    You should see output similar to this:

    URL Regex Pattern Defined.
    
    Testing URLs:
    http://www.example.com is valid: true
    https://example.org/path/to/page is valid: true
    ftp://invalid-url.com is valid: false
    http:// example.com is valid: false

This output shows that our simple regex pattern correctly identified the first two strings as valid URLs (according to the pattern) and the last two as invalid.

You have now successfully used the Pattern.matches() method to test strings against a regular expression pattern in Java.
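
Because the pattern is deliberately simple, it is worth probing a few edge cases before relying on it. The sketch below is one way to do that; the class name PatternLimitsSketch, the helper matches(), and the chosen inputs are assumptions for illustration and are not part of the lab files.

```java
import java.util.regex.Pattern;

class PatternLimitsSketch {

    // Same pattern as in UrlValidator.java.
    static final String URL_REGEX = "^(http|https)://[^\\s/$.?#].[^\\s]*$";

    // Hypothetical helper: full-string match against the lab's pattern.
    static boolean matches(String input) {
        return Pattern.matches(URL_REGEX, input);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://a",            // only one character after "://"
            "http://a b",          // the bare "." also accepts a space
            "HTTP://EXAMPLE.COM",  // matching is case-sensitive by default
            "http://.com"          // the first host character may not be "."
        };
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + matches(input));
        }
    }
}
```

With this pattern, only "http://a b" matches: the unrestricted . in the second position accepts the space. The other three inputs are rejected. Results like these show why the pattern should be treated as a learning aid rather than a complete URL validator.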

Validate Common URL Schemes

In the previous steps, we defined a simple regex pattern and used Pattern.matches() to test it. Our current pattern only validates URLs starting with http or https. However, URLs can have other schemes like ftp, mailto, file, etc.

In this step, we will modify our regex pattern to include more common URL schemes. A more robust regex pattern for URLs is quite complex, but we can expand our current pattern to include a few more common schemes for demonstration purposes.
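
Before editing the lab file, it can help to look at the scheme group in isolation. The following sketch, with the illustrative class name SchemeAlternationSketch and helpers longForm() and shortForm(), compares a spelled-out alternation with an equivalent shorthand that uses s? for the optional "s". Neither helper is part of the lab files.

```java
import java.util.regex.Pattern;

class SchemeAlternationSketch {

    // Hypothetical helper: every accepted scheme is listed explicitly.
    static boolean longForm(String scheme) {
        return Pattern.matches("(http|https|ftp)", scheme);
    }

    // Hypothetical helper: "s?" makes the trailing "s" optional.
    static boolean shortForm(String scheme) {
        return Pattern.matches("(https?|ftp)", scheme);
    }

    public static void main(String[] args) {
        for (String scheme : new String[] {"http", "https", "ftp", "file"}) {
            System.out.println(scheme + " -> " + longForm(scheme) + " / " + shortForm(scheme));
        }
    }
}
```

Both forms accept http, https, and ftp and reject file. The lab keeps the spelled-out form because it is easier to read and extend.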

Let's update the UrlValidator.java file.

  1. Open the UrlValidator.java file in the WebIDE editor.

  2. Modify the urlRegex string to include ftp and mailto schemes in addition to http and https. We will also add a test case for an ftp URL.

    Replace the line:

    String urlRegex = "^(http|https)://[^\\s/$.?#].[^\\s]*$";

    with:

    String urlRegex = "^(http|https|ftp|mailto)://[^\\s/$.?#].[^\\s]*$";

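    Notice that we simply added |ftp|mailto inside the parentheses (), which form a group; the | symbol acts as an "OR" operator. The pattern will now match strings starting with http, https, ftp, or mailto followed by ://. Keep in mind that real mailto addresses are written as mailto:user@example.com, without //, so this simplified pattern will not match them.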

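  3. Update the test cases. Because ftp is now an accepted scheme, change the value of url3 to "invalid-url.com" (a string with no scheme) so that it remains an invalid example. Then add a new test case for an FTP URL after the definition of url4:

    String url5 = "ftp://ftp.example.com/files"; // Valid FTP URL

  4. Add the validation for url5 after the validation for url4: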

    boolean isUrl5Valid = Pattern.matches(urlRegex, url5);
    System.out.println(url5 + " is valid: " + isUrl5Valid);

    Your complete UrlValidator.java file should now look like this:

    import java.util.regex.Pattern;
    
    public class UrlValidator {
    
        public static void main(String[] args) {
            // Define a simple regex pattern for a URL including more schemes
            String urlRegex = "^(http|https|ftp|mailto)://[^\\s/$.?#].[^\\s]*$";
    
            // Compile the regex pattern
            Pattern pattern = Pattern.compile(urlRegex);
    
            System.out.println("URL Regex Pattern Defined.");
    
            // Test some URLs
            String url1 = "http://www.example.com";
            String url2 = "https://example.org/path/to/page";
            String url3 = "invalid-url.com"; // Invalid (missing scheme)
            String url4 = "http:// example.com"; // Invalid character (space)
            String url5 = "ftp://ftp.example.com/files"; // Valid FTP URL
    
            System.out.println("\nTesting URLs:");
    
            boolean isUrl1Valid = Pattern.matches(urlRegex, url1);
            System.out.println(url1 + " is valid: " + isUrl1Valid);
    
            boolean isUrl2Valid = Pattern.matches(urlRegex, url2);
            System.out.println(url2 + " is valid: " + isUrl2Valid);
    
            boolean isUrl3Valid = Pattern.matches(urlRegex, url3);
            System.out.println(url3 + " is valid: " + isUrl3Valid);
    
            boolean isUrl4Valid = Pattern.matches(urlRegex, url4);
            System.out.println(url4 + " is valid: " + isUrl4Valid);
    
            boolean isUrl5Valid = Pattern.matches(urlRegex, url5);
            System.out.println(url5 + " is valid: " + isUrl5Valid);
        }
    }

  5. Save the UrlValidator.java file.

  6. Compile the updated code in the Terminal:

    javac UrlValidator.java

  7. Run the compiled program:

    java UrlValidator

    You should now see output similar to this, with the FTP URL also being marked as valid:

    URL Regex Pattern Defined.
    
    Testing URLs:
    http://www.example.com is valid: true
    https://example.org/path/to/page is valid: true
    invalid-url.com is valid: false
    http:// example.com is valid: false
    ftp://ftp.example.com/files is valid: true

You have successfully modified the regex pattern to include more common URL schemes and tested the updated pattern. This demonstrates how you can adjust regex patterns to match a wider range of inputs.
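
One detail worth checking is how the extended pattern treats mailto. The sketch below compares the lab's pattern with one possible adjustment that accepts mailto: without //. The adjusted regex, the class name MailtoSchemeSketch, and both helper methods are assumptions for illustration only; this is not a complete mailto validator.

```java
import java.util.regex.Pattern;

class MailtoSchemeSketch {

    // Pattern from the lab: every scheme must be followed by "://".
    static final String LAB_REGEX =
            "^(http|https|ftp|mailto)://[^\\s/$.?#].[^\\s]*$";

    // Assumed alternative: "//" is required only for http, https, and ftp.
    static final String ADJUSTED_REGEX =
            "^((http|https|ftp)://|mailto:)[^\\s/$.?#].[^\\s]*$";

    static boolean labMatches(String input) {
        return Pattern.matches(LAB_REGEX, input);
    }

    static boolean adjustedMatches(String input) {
        return Pattern.matches(ADJUSTED_REGEX, input);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "mailto:user@example.com",
            "mailto://user@example.com",
            "ftp://ftp.example.com/files"
        };
        for (String input : inputs) {
            System.out.println(input + " -> lab: " + labMatches(input)
                    + ", adjusted: " + adjustedMatches(input));
        }
    }
}
```

The lab's pattern rejects mailto:user@example.com and accepts mailto://user@example.com, while the adjusted pattern does the opposite. Both accept the FTP URL. Separating the scheme from its delimiter in this way is one option when schemes with different syntax need to share a single pattern.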

Summary

In this lab, we learned how to define a regular expression pattern in Java for validating URLs. We created a new Java file, UrlValidator.java, imported the java.util.regex.Pattern class, defined a String variable urlRegex containing a basic pattern that matches strings starting with "http" or "https" followed by "://", and compiled it using Pattern.compile(). We then used Pattern.matches() to test several strings against the pattern, and finally extended the scheme group so that the pattern also accepts additional schemes such as ftp.