Channel: Java mon amour

Groovy: parse a file line by line, split, sort unique list of words

I know, in bash this would be a one-liner. However, when things become more complicated, bash code becomes hell, while Groovy stays readable.


println "Hello, welcome to the WordParser 1.0"

rootDir = "C:\\pierre\\downloads\\istdaseinmensch\\"
myfile = new File(rootDir + "Levi,_Primo_-_Ist_das_ein_Mensch.txt")

myWords = []
countWords = 0
countLines = 0

myfile.eachLine { line ->
    // skip blank lines
    if (line.trim().size() == 0) {
        return
    }
    countLines++
    // split on anything that is not a letter or digit; \p{L} keeps German umlauts inside words
    words = line.split("[^\\p{L}\\p{N}]+")
    for (theWord in words) {
        // split() yields an empty first token when the line starts with a separator
        if (theWord) {
            countWords++
            myWords.add(theWord.toLowerCase())
        }
    }
}

println "countLines=" + countLines + " countWords=" + countWords

// unique() and sort() both modify myWords in place and return it
myUniqueWords = myWords.unique().sort()

println "unique words = " + myUniqueWords.size()

// write one word per line
new File(rootDir + "out.txt").withWriter { out ->
    myUniqueWords.each {
        out.println(it)
    }
}
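
For comparison, here is a minimal sketch of the split / lowercase / unique / sort step written as a single expression. The helper name uniqueSortedWords is hypothetical, and the Unicode-aware pattern is my assumption (so that umlauts in a German text stay inside words); swap in whatever pattern suits your input.

```groovy
// Hypothetical helper: returns the sorted, de-duplicated, lower-cased
// words found in a list of lines.
def uniqueSortedWords(lines) {
    lines.collectMany { line ->
        // split on runs of non-letter, non-digit characters (assumed pattern);
        // findAll { it } drops the empty token split() produces for blank
        // lines or lines that start with a separator
        line.split("[^\\p{L}\\p{N}]+").findAll { it }*.toLowerCase()
    }.toSet().sort()   // the Set removes duplicates, sort() returns a sorted List
}

def sample = ["Hello, world", "", "  hello again"]
println uniqueSortedWords(sample)   // prints [again, hello, world]
```

To run it against a real file, something like uniqueSortedWords(new File(path).readLines()) should do, at the cost of loading the whole file into memory.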




Next: how to invoke the Google Translate REST API to get a translation for each word, and produce readable output where each word has a mouse-over hint displaying its translation.
PS: try doing this in Puppet. It will be ready by the end of time, and meanwhile most of the functions you have used will no longer be supported.