Tag Content Extractor in Java : Hacker Rank Solution : Digit Wood

In a tag-based language like XML or HTML, contents are enclosed between a start tag and an end tag like <tag>contents</tag>. Note that the corresponding end tag starts with a /.

Given a string of text in a tag-based language, parse this text and retrieve the contents enclosed within sequences of well-organized tags meeting the following criteria:

The names of the start and end tags must be the same. The HTML code <h1>Hello World</h2> is not valid, because the text starts with an h1 tag and ends with a non-matching h2 tag.
Tags can be nested, but content between nested tags is considered not valid. For example, in <h1><a>contents</a>invalid</h1>, contents is valid but invalid is not valid.
Tags can consist of any printable characters.
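
These rules map fairly directly onto a regular expression with a back-reference, which is the approach the solutions further down rely on. The sketch below is only an illustration: the class name TagRulesDemo and the helper extract are hypothetical names chosen for this example, and it runs the two example strings from the criteria above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRulesDemo {
    // <(.+)>    start tag; group 1 captures the tag name (any characters)
    // ([^<>]+)  content; group 2 must not contain '<' or '>', so nested tags are excluded
    // </\1>     end tag; the back-reference forces the same name as group 1
    static final Pattern TAG = Pattern.compile("<(.+)>([^<>]+)</\\1>");

    // Hypothetical helper: collects every piece of valid content found in one line.
    static List<String> extract(String line) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // Start and end tag names differ, so nothing is valid.
        System.out.println(extract("<h1>Hello World</h2>"));            // prints []
        // Only the innermost tag pair directly encloses plain text.
        System.out.println(extract("<h1><a>contents</a>invalid</h1>")); // prints [contents]
    }
}
```

An empty list corresponds to the case where the program has to print None.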

Input Format

The first line of input contains a single integer, N (the number of lines).
The N subsequent lines each contain a line of text.

Output Format

For each line, print the content enclosed within valid tags.
If a line contains multiple instances of valid content, print out each instance of valid content on a new line; if no valid content is found, print None.

Sample Input

4
<h1>Nayeem loves counseling</h1>
<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>
<Amee>safat codes like a ninja</amee>
<SA premium>Imtiaz has a secret crush</SA premium>

Sample Output

Nayeem loves counseling
Sanjay has no watch
So wait for a while
None
Imtiaz has a secret crush

SOLUTION : –

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution{
    public static void main(String[] args){
        
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while(testCases>0 && in.hasNextLine()){
            String line = in.nextLine();
            // nextLine() never returns a line terminator, so this split always yields one element.
            String[] lines = line.split("\n");
            for (String string : lines) {
                // Group 1 captures the tag name and group 2 the content;
                // the back-reference \1 requires the end tag to repeat group 1.
                String regex = "<(.+)>([^<>]+)</\\1>";
                Pattern pattern = Pattern.compile(regex);
                Matcher matcher = pattern.matcher(string);
                while (matcher.find()) {
                    String match = matcher.group(2);
                    System.out.println(match);
                }

                // Rewind and search again: if nothing matches, the line has no valid content.
                matcher.reset();

                if (!matcher.find()) {
                    System.out.println("None");
                }
            }
            testCases--;
        }
        in.close();
    }
}
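
To see what each capture group of this pattern holds, the short sketch below prints groups 0, 1, and 2 for two of the sample lines. The class name GroupDemo and the helper describe are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    static final Pattern TAG = Pattern.compile("<(.+)>([^<>]+)</\\1>");

    // Hypothetical helper: describes the first match in a line, or reports that there is none.
    static String describe(String line) {
        Matcher m = TAG.matcher(line);
        if (!m.find()) {
            return "no match";
        }
        // group(0) is the whole match, group(1) the tag name, group(2) the content.
        return "group 0: " + m.group(0)
             + " | group 1: " + m.group(1)
             + " | group 2: " + m.group(2);
    }

    public static void main(String[] args) {
        // A tag name may contain spaces, as long as the end tag repeats it exactly.
        System.out.println(describe("<SA premium>Imtiaz has a secret crush</SA premium>"));
        // The back-reference is case-sensitive by default, so Amee and amee do not match.
        System.out.println(describe("<Amee>safat codes like a ninja</amee>"));
    }
}
```

The second call returns "no match", which is why the third sample line produces None.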

SOLUTION : – 2 – Optimized Solution

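The first solution does work it does not need: it splits a line that is already a single line, and it resets the matcher and searches a second time just to decide whether to print None. Tracking a boolean flag inside the match loop removes both steps. Here’s the optimized version: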

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        // Compile the pattern once; it is the same for every line.
        Pattern pattern = Pattern.compile("<(.+)>([^<>]+)</\\1>");

        while (testCases-- > 0 && in.hasNextLine()) {
            String line = in.nextLine();
            Matcher matcher = pattern.matcher(line);
            
            boolean found = false;
            while (matcher.find()) {
                System.out.println(matcher.group(2));
                found = true;
            }
            
            if (!found) {
                System.out.println("None");
            }
        }
        in.close();
    }
}
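
A quick way to check this logic locally is to feed it the sample input as a string instead of reading from System.in. The sketch below is one possible harness, not part of the submitted solution: the class name SampleCheck and the method solve are hypothetical, and solve collects the output in a string rather than printing each line immediately.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SampleCheck {
    static final Pattern TAG = Pattern.compile("<(.+)>([^<>]+)</\\1>");

    // Hypothetical helper: same matching logic as the solution, but it reads
    // from a string and returns everything it would have printed.
    static String solve(String input) {
        Scanner in = new Scanner(input);
        int testCases = Integer.parseInt(in.nextLine().trim());
        StringBuilder out = new StringBuilder();
        while (testCases-- > 0 && in.hasNextLine()) {
            Matcher m = TAG.matcher(in.nextLine());
            boolean found = false;
            while (m.find()) {
                out.append(m.group(2)).append('\n');
                found = true;
            }
            if (!found) {
                out.append("None\n");
            }
        }
        in.close();
        return out.toString();
    }

    public static void main(String[] args) {
        String sampleInput = String.join("\n",
            "4",
            "<h1>Nayeem loves counseling</h1>",
            "<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>",
            "<Amee>safat codes like a ninja</amee>",
            "<SA premium>Imtiaz has a secret crush</SA premium>");
        System.out.print(solve(sampleInput));
    }
}
```

Run on the sample input, this should reproduce the five lines shown under Sample Output.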
