Saving My Photo Library From Picturelife Using Chrome and Ruby

A while back, when I realized my photo library was becoming large and unwieldy, I started looking for a photo management service and stumbled onto Picturelife, which would not only back up my photos and videos but also use AI to organize them. I loved this idea, so I committed to the service and spent the next few months casually uploading my 200GB library to their platform over my slow, rural US internet connection. Once everything was uploaded, the service worked great and automatically uploaded any new photos from my devices.

I would log in periodically to browse and use my photos, but for the most part it was set it and forget it. A few days ago when I logged in, I noticed the CSS for the site wasn't loading and that the SSL certificate had expired. Not good. I poked around on Google and learned that Picturelife had been sold to Streamnation (whose site isn't loading at all). I asked them about it on Twitter and got no response. It looks like, in true Silicon Valley fashion, the service that sold us on being a backup is now quickly deteriorating. What's worse, I don't recall ever getting an email from them warning that the service was going to disappear.

I quickly started looking for ways to download my entire library and found Picturelife's instructions. Unfortunately, the service that prepares the images for downloading is broken, which was confirmed by the growing number of angry comments at the bottom of the page.

While the situation is looking grim, I notice a link to an API on the defunct download page. Clicking through, I'm presented with a page containing two empty select tags. Opening the inspector in Chrome, I see the following:

Blocked Assets

It looks like Chrome is blocking the JavaScript because it's not served over HTTPS. If I load the API page over plain HTTP, the page loads correctly, and more importantly, the API itself seems intact.

After toying around with the interface, I figure out I can get the id attribute of every photo in my library. The endpoint doesn't seem to cap its limit parameter, so I put in a limit high enough to cover my entire library. This way I can download the full list of ids in one fell swoop and not have to deal with pagination. Picturelife's API app is thoughtful enough to provide a cURL command.

curl http://api.picturelife.com/medias/index \
  -d limit="20000" \
  -d access_token="my_access_token"

Note the access_token. This is required to identify yourself to the API server so you can access your private information.
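Judging by how the response is consumed later on, the JSON payload contains a media array of ids and a total count. Here's a minimal sketch of pulling the ids out of a saved response with Ruby's standard json library (the field names are an assumption based on my own payload):

```ruby
require "json"

# Hypothetical response body; the "media" and "total" keys are assumptions
# based on what the API returned for my account.
raw = '{"media": ["s0niPsD1YmShdlR", "aB3xYz"], "total": 2}'

data  = JSON.parse(raw)
ids   = data.fetch("media")  # => ["s0niPsD1YmShdlR", "aB3xYz"]
total = data.fetch("total")  # => 2

puts "#{total} photos found"
```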

Now that we have a way to get a list of all the photo ids, we need a way to fetch the images themselves using those ids. To do this, I'll open up the Picturelife webpage and see if the interface has a way to download an image. Clicking on a particular image pops up a menu that does have a link to download it.

Download Original Image

Let's open up Chrome's debugger again and take a look at the network calls that take place when I click the download link. There are a lot of calls going on in the background, but when I click the link, the following calls pop up:

Image Download Request

It looks like there are two calls being made that interest us. The first is to the following URL:

http://picturelife.com/d/original/s0niPsD1YmShdlR

This is great because the end of this URL looks like one of the image ids we fetched earlier. That means we can use our list of ids to build a URL for each image. To test the theory, I grab a different id from my list, swap it into the end of the URL, and paste it into my browser. When I do, the original full-sized image is downloaded and displayed. We're on the right path, but it seems the original URL redirects to another server. That's something we'll have to deal with later, but it's OK for now.

Now we have a plan: grab the list of ids from the API, use those ids to build a URL for each image, and then download them all.
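As a quick sanity check of the plan, turning ids into download URLs is just string interpolation against the /d/original/ path we saw in the inspector:

```ruby
# Build a download URL for each image id using the /d/original/ path
# observed in Chrome's network tab.
def download_urls(image_ids)
  image_ids.map { |id| "http://picturelife.com/d/original/#{id}" }
end

urls = download_urls(["s0niPsD1YmShdlR", "aB3xYz"])
urls.first # => "http://picturelife.com/d/original/s0niPsD1YmShdlR"
```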

To do this I'll write a Ruby gem, using Bundler to provide a starting template. Even though I'm going to be cowboy coding, I still like to set up automated tests so I can quickly check my work, and to use multiple files to keep concepts manageable for my human brain. I decided to cowboy code this gem because it won't be used in production, it will be used by me exactly once, and I need it as quickly as possible (before Picturelife dies completely).

At this point I'm not going to cover every step in detail; instead I'll go through the process quickly and provide the full source at the end.

My first step was to create something that could fetch all the image ids from the Picturelife API. I decided to use httparty for this. With that library, I came up with the following Ruby class.

require "httparty"

module PicturelifeExporter
  class API
    include HTTParty
    base_uri 'http://api.picturelife.com'
    format :json

    def initialize(access_token)
      @access_token = access_token
    end

    def all_images(per_page:)
      per_page = Integer(per_page)

      page = 0
      total_pages = 0
      ids = [] # named ids so it doesn't shadow the image_ids method below
      while page <= total_pages
        image_data = image_ids(per_page: per_page, page: page)
        total_pages = image_data.fetch(:total_pages)
        page += 1
        ids += image_data.fetch(:ids)
      end

      return ids
    end

    def image_ids(per_page:, page:)
      per_page = Integer(per_page)
      page = Integer(page)
      offset = page * per_page

      image_data =
        self.class.get("/medias/index/",
                       query: {
                         access_token: @access_token,
                         ids_only: true,
                         limit: per_page,
                         offset: offset
                       })

      ids = image_data.fetch("media")
      total_pages = Integer(image_data.fetch("total") / per_page)
      return { ids: ids, total_pages: total_pages }
    end
  end
end

With what we discovered earlier about the API, it's straightforward to grab the list of ids. The class does require the access_token found on the Picturelife API page. I also implemented pagination in case it's needed, but in most cases you can simply pass a per_page value larger than the number of photos you have.
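For reference, the pagination is plain limit/offset arithmetic; this sketch shows the same page-to-offset math the class performs internally:

```ruby
# Zero-based page numbers map to offsets by multiplying by the page size,
# mirroring the offset calculation in the API class above.
def page_offset(page:, per_page:)
  Integer(page) * Integer(per_page)
end

page_offset(page: 0, per_page: 500) # => 0
page_offset(page: 3, per_page: 500) # => 1500
```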

Now that we have an array of image ids, let's build an interface that accepts a URL and a destination directory, then downloads the image to that directory using the id as the filename.


require 'net/http'
require 'ruby-progressbar'

module PicturelifeExporter
  module FileDownloader

    def self.download(url:, destination:, limit: 10, cookie:)
      progressbar = ProgressBar.create
      uri = URI(url)

      raise ArgumentError, 'too many HTTP redirects' if limit == 0

      req = Net::HTTP::Get.new(uri)
      req['Cookie'] = cookie

      begin
        response = Net::HTTP.start(uri.hostname) do |http|
          http.request req do |response|
            case response
            when Net::HTTPRedirection then
              location = response['location']
              warn "redirected to #{location}"
              download(url: location,
                       destination: destination,
                       limit: limit - 1,
                       cookie: cookie)
            when Net::HTTPSuccess then
              destination += extract_extension(response.uri.path)
              total_request_size = response.content_length
              puts "Saving File: #{destination}"
              bar = ProgressBar.create(title: "Items",
                                       starting_at: 0,
                                       format: '%a %B %p%% %r KB/sec',
                                       rate_scale: lambda { |rate| rate / 1024 },
                                       total: total_request_size)
              File.open destination, 'wb' do |io|
                response.read_body do |chunk|
                  bar.progress += chunk.length
                  io.write chunk
                end
              end
            else
              response.value
            end
          end
        end
      rescue StandardError => ex
        puts "Error Downloading File: #{ex.class} - #{ex.message}"
      end
    end

    # `private` has no effect on methods defined with `def self.`, so mark
    # the helper private explicitly.
    def self.extract_extension(file_name)
      File.extname(file_name)
    end
    private_class_method :extract_extension
  end
end

There's a lot going on here, and admittedly the code isn't as clear as it could be, given that I was in a rush to download my images. Let's break it down:

progressbar = ProgressBar.create
uri = URI(url)

raise ArgumentError, 'too many HTTP redirects' if limit == 0

req = Net::HTTP::Get.new(uri)
req['Cookie'] = cookie

Here we first instantiate a progress bar so we can track the progress of each file download. From there we take the url input and cast it to a Ruby URI object. After ensuring we're not stuck in a lengthy redirect situation, we create the request object using Ruby's Net::HTTP library. One important thing to note is that we send a cookie when downloading the file. Picturelife requires it to verify that you have permission to download the file. I extracted the cookie by examining the redirect we followed earlier: after entering the URL that leads to the original image download, the cookie information is included in the request.

Inspecting Cookies

You'll notice there's a lot of Optimizely and Google Analytics data in there for A/B testing and analytics tracking. We're only interested in the last bit, which starts with _pl_session_id=. We need to include that in our request header, which we do on the last line of the code segment above.
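Rather than pasting the whole header into the code, you can pluck just the session pair out of the string copied from the inspector. A sketch (the surrounding cookie names here are made up for illustration, but _pl_session_id is the one Picturelife uses):

```ruby
# Extract the _pl_session_id pair from a raw Cookie header string copied
# out of Chrome's inspector; the other cookie names are illustrative only.
def session_cookie(raw_cookie_header)
  raw_cookie_header[/_pl_session_id=[^;]+/]
end

raw = "optimizelyEndUserId=oeu123; _ga=GA1.2.555; _pl_session_id=abc123"
session_cookie(raw) # => "_pl_session_id=abc123"
```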

On to actually making and handling the request:

  begin
    response = Net::HTTP.start(uri.hostname) do |http|
      http.request req do |response|
        case response
        when Net::HTTPRedirection then
          location = response['location']
          warn "redirected to #{location}"
          download(url: location,
                   destination: destination,
                   limit: limit - 1,
                   cookie: cookie)
        when Net::HTTPSuccess then
          destination += extract_extension(response.uri.path)
          total_request_size = response.content_length
          puts "Saving File: #{destination}"
          bar = ProgressBar.create(title: "Items",
                                   starting_at: 0,
                                   format: '%a %B %p%% %r KB/sec',
                                   rate_scale: lambda { |rate| rate / 1024 },
                                   total: total_request_size)
          File.open destination, 'wb' do |io|
            response.read_body do |chunk|
              bar.progress += chunk.length
              io.write chunk
            end
          end
        else
          response.value
        end
      end
    end
  rescue StandardError => ex
    puts "Error Downloading File: #{ex.class} - #{ex.message}"
  end

The first thing I did was surround the whole request in a begin/rescue block so errors are printed and a failed download doesn't immediately terminate the program. I noticed that some images have already begun to disappear from Picturelife's servers, so I've received 404 errors when trying to download some ids. From there, I make the actual request. Every request is redirected, so I added a case statement that looks for redirects and simply calls the download method recursively with the redirect URL. Once we get a success, we find the extension of the file, append it to the destination path, and stream the file to disk, updating the progress bar as each chunk arrives.

Lastly, we put all these components together to make a list of all the image ids and then download them.

module PicturelifeExporter
  def self.download_images(destination:, cookie:, access_token:)
    api = API.new(access_token)
    image_info = api.image_ids(per_page: 20000, page: 0)
    image_ids = image_info.fetch(:ids)
    total_images = image_ids.size

    image_ids.each_with_index do |image_id, index|
      image_number = index + 1
      puts "Downloading image #{image_number} / #{total_images}"
      download_image(image_id: image_id,
                     destination: destination,
                     cookie: cookie
                    )
    end
  end

  def self.download_image(image_id:, destination:, cookie:)
    url = URI("http://picturelife.com/d/original/#{image_id}")
    PicturelifeExporter::FileDownloader.
      download(url: url,
               destination: "#{destination}#{image_id}",
               cookie: cookie)
  end
end

This reduces downloading all the photos to a single method call: download_images. It requires a destination, the cookie value we fetched from the browser, and the access token for the API. With those, we instantiate the API class we built, fetch the image ids, and then download each image to the destination.

Lastly, I wrapped the whole thing in a Thor CLI, which you can browse in the source code. This makes the app runnable directly from the command line. You can find the gem here. Keep in mind this was hacked together quickly, so there are likely to be issues that aren't addressed.