Saving My Photo Library From Picturelife Using Chrome and Ruby
A while back, when I realized my photo library was becoming large and unwieldy, I decided to look for a photo management service and stumbled onto Picturelife, which would not only back up my photos and videos but also use AI to organize them. I loved this idea, so I committed to the service and spent the next few months casually uploading my 200GB library onto their platform over my slow, rural, US internet connection. Once the upload finished, the service worked great and automatically backed up any new photos I stored on my devices.
I would log in periodically to browse and use my photos, but for the most part it was set it and forget it. A few days ago when I logged in, I noticed the CSS for the site wasn't loading and that the SSL certificate had expired. Not good. I poked around on Google and discovered that Picturelife had been sold to Streamnation (whose site isn't loading at all). I questioned them on Twitter and got no response. It looks like, in true Silicon Valley fashion, the service that sold us on being a backup is now quickly deteriorating. What's worse is that I don't recall ever getting an email from them warning that the service was going to disappear.
I quickly started looking for ways to download my entire library and found Picturelife's instructions. Unfortunately, the service that prepares the images for downloading is broken, which was confirmed by the growing number of angry comments at the bottom of the page.
While the situation is looking grim, I notice a link to an API on the defunct download page. Clicking through, I'm presented with a page containing two empty select tags. Opening the inspector in Chrome, I see the following:
It looks like Chrome is blocking the JavaScript because it's not served over HTTPS. If I load the API page over regular HTTP, it loads correctly, and more importantly the API itself seems intact.
After toying around with the interface, I figure out I can get my entire list of photos by their Picturelife id attribute. There doesn't seem to be a cap on the limit parameter, so I put in a limit high enough to cover my entire library. This way I can download a list of all my ids in one fell swoop and not have to deal with pagination. Picturelife's API app is thoughtful enough to provide a cURL command:
curl http://api.picturelife.com/medias/index \
-d limit="20000" \
-d access_token="my_access_token"
Note the access_token. This is required to identify yourself to the API server so you can get your private information.
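For reference, here's a minimal sketch of pulling ids out of the JSON the API hands back. The exact response shape is an assumption on my part, inferred from the media and total fields the exporter class reads later; the sample ids are made up.

```ruby
require "json"

# Hypothetical response body: the field names match what the exporter reads,
# but the overall shape is an assumption.
body = '{"total": 3, "media": ["s0niPsD1YmShdlR", "aBcDeFgHiJkLmNo", "pQrStUvWxYz0123"]}'

data  = JSON.parse(body)
ids   = data.fetch("media") # array of photo ids
total = data.fetch("total") # total number of photos in the library

puts "#{total} photos: #{ids.inspect}"
```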
Now that we have a way to get a list of all the photo ids, we'll need a way to fetch our images using those ids. To do this, I'll open up the Picturelife webpage and see if the interface has a way to download an image. When I click on a particular image, a menu pops up that does have a link to download it.
Let's open up Chrome's debugger again and take a look at the network calls that take place when I click the download link. There are a lot of calls going on in the background, but when I click the link, the following calls pop up:
It looks like there are two calls being made that interest us. The first is to the following URL:
http://picturelife.com/d/original/s0niPsD1YmShdlR
This is great because the end of this URL looks like the image ids that we fetched earlier. This means we can now use that list of ids to create a list of URLs where we can download our images. To test my theory, I'll grab a different id from my list, replace the end of the URL, and see what happens when I paste it into my browser. When I do this, the original full-sized image is downloaded and displayed. We're on the right path, but it seems like the original URL is a redirect to another server. This is something we'll have to deal with later, but it's OK for now.
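Stitching the two discoveries together is just string interpolation. Here's a quick sketch mapping ids to download URLs; the second id below is made up, matching the format of the real one above.

```ruby
# Sample ids: the first is from the URL above, the second is a placeholder.
image_ids = ["s0niPsD1YmShdlR", "aBcDeFgHiJkLmNo"]

# Build one download URL per id, following the pattern we spotted.
download_urls = image_ids.map do |id|
  "http://picturelife.com/d/original/#{id}"
end

puts download_urls
```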
Now we have a plan: grab a list of ids from the API, then use those ids to create a URL for each image. We can then proceed to download all of our images.
To do this I will write a Ruby gem and use Bundler to provide a starting template. Even though I'm going to be cowboy coding, I still like to set up automated tests so I can quickly check my work, and to use multiple files to keep concepts manageable by my human brain. I decided to cowboy code this gem because it won't be used in production, will be used by me only once, and is something I need as quickly as possible (before Picturelife completely dies).
At this point I'm not going to include every step in detail, but go through the process quickly and provide the entire source at the end.
My first step was to create something that could fetch all the image ids from the Picturelife API. I decided to use httparty for this, and came up with the following Ruby class:
require "httparty"

module PicturelifeExporter
  class API
    include HTTParty

    base_uri 'http://api.picturelife.com'
    format :json

    def initialize(access_token)
      @access_token = access_token
    end

    def all_images(per_page:)
      per_page = Integer(per_page)
      page = 0
      total_pages = 0
      image_ids = []

      while(page <= total_pages) do
        image_data = image_ids(per_page: per_page, page: page)
        total_pages = image_data.fetch(:total_pages)
        page += 1
        image_ids += image_data.fetch(:ids)
      end

      return image_ids
    end

    def image_ids(per_page:, page:)
      per_page = Integer(per_page)
      page = Integer(page)
      offset = page * per_page

      image_data =
        self.class.
          get("/medias/index/",
              { query:
                { access_token: @access_token,
                  ids_only: true,
                  limit: per_page,
                  offset: offset
                }
              })

      ids = image_data.fetch("media")
      total_pages = Integer(image_data.fetch("total") / per_page)

      return { ids: ids, total_pages: total_pages }
    end
  end
end
With the information about how to use the API that we discovered earlier, it's straightforward to grab a list of ids. The class does require the access_token found on the Picturelife API page. I also implemented pagination in case it was needed, but in most cases you can simply put in a per_page value that is larger than the number of photos you have.
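The pagination itself is plain offset math: the API takes a limit and an offset, so a zero-indexed page number translates directly. A sketch of the arithmetic the class uses, with a made-up library size:

```ruby
per_page = 500
total    = 1700 # pretend library size

# The offset for a given zero-indexed page, as computed in image_ids:
offset_for = ->(page) { page * per_page }

# The last page index uses integer division, matching the class:
total_pages = total / per_page # => 3, so the loop fetches pages 0 through 3

puts offset_for.call(3) # offset of the final page
```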
Now that we have an array of image ids, let's make an interface that accepts a URL and a destination directory, then downloads the image to the supplied directory with the id as the filename.
require 'net/http'
require 'ruby-progressbar'

module PicturelifeExporter
  module FileDownloader
    def self.download(url:, destination:, limit: 10, cookie:)
      progressbar = ProgressBar.create
      uri = URI(url)

      raise ArgumentError, 'too many HTTP redirects' if limit == 0

      req = Net::HTTP::Get.new(uri)
      req['Cookie'] = cookie

      begin
        response = Net::HTTP.start(uri.hostname) do |http|
          http.request req do |response|
            case response
            when Net::HTTPRedirection then
              location = response['location']
              warn "redirected to #{location}"
              download(url: location,
                       destination: destination,
                       limit: limit - 1,
                       cookie: cookie)
            when Net::HTTPSuccess then
              destination += extract_extension(response.uri.path)
              total_request_size = response.content_length

              puts "Saving File: #{destination}"
              bar = ProgressBar.create(title: "Items",
                                       starting_at: 0,
                                       format: '%a %B %p%% %r KB/sec',
                                       rate_scale: lambda { |rate| rate / 1024 },
                                       total: total_request_size)

              # Binary mode, so images aren't corrupted on platforms that
              # translate line endings.
              File.open destination, 'wb' do |io|
                response.read_body do |chunk|
                  bar.progress += chunk.length
                  io.write chunk
                end
              end
            else
              response.value
            end
          end
        end
      rescue StandardError => ex
        puts "Error Downloading File: #{ex.class} - #{ex.message}"
      end
    end

    def self.extract_extension(file_name)
      File.extname(file_name)
    end
    private_class_method :extract_extension
  end
end
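One detail worth calling out before the walkthrough: the extract_extension helper just delegates to Ruby's File.extname, which is what lets us tack the correct extension onto the destination once the final, post-redirect URL is known. The path below is hypothetical, matching the id format we saw earlier.

```ruby
# File.extname returns the extension, dot included, from a path.
path = "/d/original/s0niPsD1YmShdlR.jpg" # hypothetical post-redirect path

extension = File.extname(path)
puts extension # => ".jpg"

# A bare id with no dot yields an empty string, so nothing gets appended.
puts File.extname("/d/original/s0niPsD1YmShdlR").inspect
```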
So there is a lot going on here, and admittedly the code isn't as clear as it could be, considering I was in a rush to download my images. Let's break it down:
progressbar = ProgressBar.create
uri = URI(url)
raise ArgumentError, 'too many HTTP redirects' if limit == 0
req = Net::HTTP::Get.new(uri)
req['Cookie'] = cookie
Here we are first instantiating a progress bar just so we can track the progress of each file download. From there, we take the url input and cast it into a Ruby URI object. After ensuring that we're not stuck in a lengthy redirect chain, we create the request object using Ruby's Net::HTTP library. One important thing to note here is that we send a cookie when downloading the file. Picturelife requires it to make sure you have permission to download the file. I extracted the cookie by examining the redirect we followed earlier: after entering the URL that leads to the download of the original image, cookie information is included in the request.
You'll notice there's a lot of Optimizely and Google Analytics stuff for A/B testing and analytics tracking. We're interested in the last bit, which starts with _pl_session_id=. We'll need to include that in our header, which we do on the last line of this code segment.
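Concretely, the header value is just that name=value pair copied out of the browser. Something like this, where the session value is obviously a placeholder; yours comes from the inspector:

```ruby
# Placeholder session value: copy the real one from Chrome's inspector.
session_id = "abc123placeholder"

# This string is what gets assigned to req['Cookie'] in the downloader.
cookie = "_pl_session_id=#{session_id}"

puts cookie
```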
On to actually making and handling the request:
begin
  response = Net::HTTP.start(uri.hostname) do |http|
    http.request req do |response|
      case response
      when Net::HTTPRedirection then
        location = response['location']
        warn "redirected to #{location}"
        download(url: location,
                 destination: destination,
                 limit: limit - 1,
                 cookie: cookie)
      when Net::HTTPSuccess then
        destination += extract_extension(response.uri.path)
        total_request_size = response.content_length

        puts "Saving File: #{destination}"
        bar = ProgressBar.create(title: "Items",
                                 starting_at: 0,
                                 format: '%a %B %p%% %r KB/sec',
                                 rate_scale: lambda { |rate| rate / 1024 },
                                 total: total_request_size)

        File.open destination, 'wb' do |io|
          response.read_body do |chunk|
            bar.progress += chunk.length
            io.write chunk
          end
        end
      else
        response.value
      end
    end
  end
rescue StandardError => ex
  puts "Error Downloading File: #{ex.class} - #{ex.message}"
end
The first thing I did was to surround the whole request in a begin/rescue block so we can output errors, and so a failed download won't immediately terminate the program. I noticed that some images have already begun to disappear from Picturelife's servers, so I've received 404 errors when trying to download some ids. From there, I went ahead and made the actual request. All requests are redirected, so I added a case statement that looks for redirects and simply calls the download method recursively with the new redirected URL. Once a success response is reached, we find the extension of the given file, append it to the file destination, and then stream the file to the destination. You'll notice that we also update the progress bar along the way.
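Stripped of the HTTP machinery, that redirect handling reduces to a simple recursive pattern: decrement the limit, follow the Location header, and bail out once the limit hits zero. A sketch, where the fake redirect table and hostnames stand in for the real servers:

```ruby
# Fake "server": maps a URL to its redirect target, or nil for a final hit.
# Both hostnames here are placeholders for illustration.
REDIRECTS = {
  "http://picturelife.com/d/original/abc" => "http://cdn.example.com/abc.jpg",
  "http://cdn.example.com/abc.jpg"        => nil
}

def follow(url, limit: 10)
  # Same guard as the downloader: refuse to loop forever.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  target = REDIRECTS[url]
  return url if target.nil?        # success: no further redirect

  follow(target, limit: limit - 1) # recurse with a decremented limit
end

puts follow("http://picturelife.com/d/original/abc")
```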
Lastly, we put all these components together to make a list of all the image ids and then download them.
module PicturelifeExporter
  def self.download_images(destination:, cookie:, access_token:)
    api = API.new(access_token)
    image_info = api.image_ids(per_page: 20000, page: 0)
    image_ids = image_info.fetch(:ids)
    total_images = image_ids.size

    image_ids.each_with_index do |image_id, index|
      image_number = index + 1
      puts "Downloading image #{image_number} / #{total_images}"
      download_image(image_id: image_id,
                     destination: destination,
                     cookie: cookie)
    end
  end

  def self.download_image(image_id:, destination:, cookie:)
    url = URI("http://picturelife.com/d/original/#{image_id}")

    PicturelifeExporter::FileDownloader.
      download(url: url,
               destination: "#{destination}#{image_id}",
               cookie: cookie)
  end
end
This reduces downloading all the photos to a single method call: download_images. It requires a destination, the cookie value we fetched from the browser, and the access token for the API. With those, we instantiate the API class we built and fetch the image ids. Then for each id we download the image to the destination.
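One small gotcha worth noting: download_image concatenates the destination and the id directly, with no path separator in between, so the destination you pass in needs a trailing slash. A quick illustration with a made-up directory:

```ruby
destination = "/tmp/picturelife/" # note the trailing slash
image_id    = "s0niPsD1YmShdlR"

# This mirrors the interpolation inside download_image.
path = "#{destination}#{image_id}"
puts path # => "/tmp/picturelife/s0niPsD1YmShdlR"
```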
Lastly, I wrapped the whole thing in a Thor CLI, which makes the app runnable directly from the command line; you can browse it in the source code. You can find the gem here. Keep in mind this was hacked together quickly, so there are likely issues that haven't been addressed.