I like sharing things I’m working on. When I’ve searched for a solution, and not found one, I hope that sharing what I did find will help some other poor schmuck like me in the future. In fact, more than once I’ve done a search for a problem, and found a solution – in one of my own blog posts from years back.
I’ve been working on a Rails Security talk, and needed a way to figure out just how many lines of code are present in a default Rails application. Meaning – not just the lines of code for controllers, views, etc, but also the number of lines of code for all of the default gems. So I put together the following code:
GFILE = "Gemfile.lock" list = File.read(GFILE) gems = [] list.each_line do |line| line = line.strip break if line.include?("PLATFORMS") next if line.include?("GEM") next if line.include?("remote:") next if line.include?("specs:") next if line.empty? gems.push(line.split(' ').first) end total = 0 gems.uniq! gems.each do |gem| puts "Processing #{gem}" contents = `gem contents #{gem}`.split local = 0 contents.each do |file| output = `wc -l #{file}` amount = output.strip.split(' ').first.to_i local += amount end puts " LOC: #{local}" total += local end puts "Total Lines: #{total}"
It did what I needed, and I could have just thrown it away. But I figured someone else might need something like this, so I posted in publically. The problem? Let’s be frank – that’s some awful code. `GFILE`? `local`? When we post code, we should put our best code forward – examples of how to do things right. I didn’t post it as a draft – I posted it as a solution that worked. My good friend [J.B. Rainsberger][2] pointed out in a comment that I could do better. And he’s right! So let’s clean this up:
require 'set' GEM_FILE_TO_PROCESS = "Gemfile.lock"
The first thing was to change that horrible name to something understandable. I also included the set library, because I really wanted a `set`, not an `array` that I needed to call `uniq!` on. Next, I extracted the line checks to separate methods, based on what I interpreted them to do when I came across them:
def gem_list_finished?(line) line.include?("PLATFORMS") end def non_gem_line?(line) line.include?("GEM") || line.include?("remote:") || line.include?("specs:") || line.empty? end
I then extracted out the line counting to their own methods:
def gem_name_without_version(line) line.split(' ').first end def get_files_in_gem(gem) `gem contents #{gem}`.split end def line_count_for_file(file) output = `wc -l #{file}` output.strip.split(' ').first.to_i end
This has the handy side effect of removing the system calls (code called in backticks) away from the main logic. I also pulled out the `puts` statements into a log function:
def log(message) puts message end
Now comes the meat of it. We still read in the file, but also initialize a set to store the gems, and put our other critical variable all in one place:
gemfile_list = File.read(GEM_FILE_TO_PROCESS) gems_to_process = Set.new total_line_count = 0
Next we process the Gemfile. Note that it’s clearer now that we have conditions we stop processing, and conditions where we skip processing:
gemfile_list.each_line do |gem_line| gem_line = gem_line.strip break if gem_list_finished?(gem_line) next if non_gem_line?(gem_line) gems_to_process.add(gem_name_without_version(gem_line)) end
Finally, we walk the gems we found and get their contents. I could probably use inject here instead of initializing a variable, but I prefer it to be a little clearer:
gems_to_process.each do |gem| contents = get_files_in_gem(gem) gem_line_count = 0 contents.each do |file| gem_line_count += line_count_for_file(file) end total_line_count += gem_line_count end
All in all, it’s a much better representation of something I would want to share. Could it get better? Absolutely. I could create a model around the Gemfile, asking the lines questions instead of checking them explicitly. And I probably will – if I ever need to turn this into something more explicit. The full code is below, and at [the gist][1]:
require 'set' GEM_FILE_TO_PROCESS = "Gemfile.lock" def gem_list_finished?(line) line.include?("PLATFORMS") end def non_gem_line?(line) line.include?("GEM") || line.include?("remote:") || line.include?("specs:") || line.empty? end def gem_name_without_version(line) line.split(' ').first end def get_files_in_gem(gem) #gem contents returns the files #as a line break delimited string `gem contents #{gem}`.split end def line_count_for_file(file) output = `wc -l #{file}` #line count is the first column from #the returned value output.strip.split(' ').first.to_i end def log(message) puts message end gemfile_list = File.read(GEM_FILE_TO_PROCESS) gems_to_process = Set.new total_line_count = 0 gemfile_list.each_line do |gem_line| gem_line = gem_line.strip break if gem_list_finished?(gem_line) next if non_gem_line?(gem_line) gems_to_process.add(gem_name_without_version(gem_line)) end log "TOTAL GTP: #{gems_to_process.count}" gems_to_process.each do |gem| log "Processing #{gem}" contents = get_files_in_gem(gem) gem_line_count = 0 contents.each do |file| gem_line_count += line_count_for_file(file) end log " LOC: #{gem_line_count}" total_line_count += gem_line_count end log "Total Lines: #{total_line_count}"
So what are your thoughts? Could it be improved even more?
Nice, and well handled.
Now I wonder what would happen if you coded it by top down by intention … :)
I did the following: https://gist.github.com/jbrains/9451941
I did it in 28 commits, so if you want the play-by-play, then you look at the code revision-by-revision.
I didn’t bother with the smell of depending on puts and “, because I didn’t feel like it. :)
I just wish extracting the relevant section of Gemfile.lock were cleaner. Neither sed nor awk nor head nor tail nor cat seemed to do it any better than the procedural Ruby that you wrote initially. I don’t know if one can tell sed to do -n ‘/regex1/,/regex2/p’ while excluding the lines that match those regexes. That’s what I want.
Thanks.