
Datameer Blog

How open data can lead to better parking in San Francisco

By on April 13, 2012

If you have ever tried to park in downtown San Francisco during work hours (or in any big city, for that matter), you’ll know what I am talking about: be prepared to circle around for hours. The good news is that the city of San Francisco has released, among other things, an API to monitor garage parking information on http://www.sfpark.org. In their own words: “SFpark works by collecting and distributing real-time information about where parking is available so drivers can quickly find open spaces.”

This is done via real-time sensors.

As an enthusiastic data analyst, I realized I could use Datameer to get more insight into my parking problem. Sure, SFpark already gives you real-time information about parking (on the home page of http://sfpark.org/), and analysts have already taken a look at pricing.

What I was looking for was a little different: an overall perspective of parking availability and change over a long period of time.

Datameer allows you to bring in data from a variety of sources. In this case, SFpark’s API returns JSON data, so we used the built-in adapter for JSON. We can easily schedule this import to run automatically, say every 30 minutes, to start building our time series analysis.
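Datameer's built-in adapter handles the scheduling for you; outside Datameer, the same idea can be sketched as a simple polling loop in Python. This is a minimal sketch, not Datameer's implementation: the endpoint URL is a placeholder (substitute the SFpark endpoint you actually use) and the function names are hypothetical. Each payload is stored together with the time it was fetched, which is what turns repeated snapshots into a time series.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List

# Hypothetical placeholder: substitute the real SFpark endpoint you use.
API_URL = "https://example.org/sfpark/availability.json"
INTERVAL_SECONDS = 30 * 60  # poll every 30 minutes


def fetch_json(url: str = API_URL) -> Any:
    """Download and decode one JSON document from the API."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(fetch: Callable[[], Any], runs: int,
         interval: float = INTERVAL_SECONDS,
         sleep: Callable[[float], None] = time.sleep,
         clock: Callable[[], float] = time.time) -> List[Dict[str, Any]]:
    """Call `fetch` `runs` times, `interval` seconds apart, keeping each
    payload with the Unix time it was retrieved."""
    snapshots = []
    for i in range(runs):
        snapshots.append({"fetched_at": clock(), "payload": fetch()})
        if i < runs - 1:  # no need to wait after the last fetch
            sleep(interval)
    return snapshots
```

Calling `poll(fetch_json, runs=48)` would collect snapshots at 30-minute intervals for roughly a day. The `sleep` and `clock` parameters are injectable so the loop can be tested without waiting or touching the network.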

An interesting new import feature is that you can now partition the data the same way Hive does, but in an easy, user-friendly way, using this form:

You can then work with a subset of the data; in this case, the partitions are set at day granularity. More information about how to set up partitions is available here.
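Hive lays out a partitioned table as one directory per partition value, named `key=value` (for example `date=2012-02-22`). The sketch below illustrates that idea, not Datameer's internals: it assumes snapshots shaped like `{"fetched_at": unix_time, ...}` and groups them by UTC day under Hive-style keys, so a single day can be read on its own. The helper names are hypothetical, and a real analysis would probably partition by San Francisco local time rather than UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List


def partition_key(fetched_at: float) -> str:
    """Hive-style partition name for the UTC day of a snapshot, e.g. 'date=2012-02-22'."""
    day = datetime.fromtimestamp(fetched_at, tz=timezone.utc).date()
    return f"date={day.isoformat()}"


def partition_by_day(snapshots: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group snapshots into day partitions so one day can be processed alone."""
    parts: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for snap in snapshots:
        parts[partition_key(snap["fetched_at"])].append(snap)
    return dict(parts)
```

Working with one day's subset is then just a lookup such as `partition_by_day(snapshots)["date=2012-02-22"]`.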

Once the data is imported, we can easily deconstruct the JSON with our set of JSON functions. The complete way to do this, and to deal with JSON data downloaded from the web in general, is described in detail in our video section, but basically we end up with something like this:
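Datameer's JSON functions do this inside the workbook. As a rough illustration of the same flattening step in plain Python, the sketch below turns one nested response into flat rows, one per garage. The payload shape (a top-level `AVL` list whose records carry `NAME`, `OCC` and `OPER`, read here as garage name, occupied spaces and operational spaces) is an assumption for illustration, and the garage names in the sample are made up; check the shape against the actual SFpark response before relying on it.

```python
import json
from typing import Any, Dict, List

# Hypothetical payload shape: a top-level "AVL" list with one record per
# garage; "OCC" = occupied spaces, "OPER" = operational (usable) spaces.
SAMPLE = """{"AVL": [
  {"NAME": "Garage A", "OCC": "120", "OPER": "400"},
  {"NAME": "Garage B", "OCC": "95", "OPER": "100"}
]}"""


def flatten(payload: Dict[str, Any], fetched_at: float) -> List[Dict[str, Any]]:
    """Turn one nested API response into flat rows, one per garage."""
    rows = []
    for rec in payload.get("AVL", []):
        occupied = int(rec["OCC"])  # counts may arrive as strings
        total = int(rec["OPER"])
        rows.append({
            "fetched_at": fetched_at,
            "garage": rec["NAME"],
            "occupied": occupied,
            "total": total,
            "available": total - occupied,
        })
    return rows


rows = flatten(json.loads(SAMPLE), fetched_at=0.0)
```

Each row now has the same flat columns, which is the form the later aggregations work on.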

We can now build an analysis and visualize the data to infer some statistics, for example at what time of day the garages in downtown San Francisco are most full (this was run over a 3-month period in 2012):

It appears from this graph that if you work there, the earlier you arrive the better, because the garages fill up pretty quickly. People seem to start leaving around 2pm (with a maximum availability of 32%), so the general trend seems to be to work early in the day, perhaps because of all the financial institutions in that area.
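The time-of-day statistic can be sketched as pooled availability per hour: for every snapshot row, add its free and total spaces to the bucket for its local hour, then divide. This sketch assumes flat rows with `fetched_at`, `occupied` and `total` fields, and it uses a fixed UTC-8 offset for Pacific time, which ignores daylight saving; both are simplifications, not a description of how the chart above was computed.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable

# Pacific Standard Time as a fixed offset; this simplification ignores
# daylight saving time, which a real analysis should handle.
PST = timezone(timedelta(hours=-8))


def availability_by_hour(rows: Iterable[Dict[str, Any]],
                         tz: timezone = PST) -> Dict[int, float]:
    """Percentage of spaces free for each local hour of the day, pooled over
    all garages and days: 100 * sum(free) / sum(total)."""
    free: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for r in rows:
        hour = datetime.fromtimestamp(r["fetched_at"], tz=tz).hour
        free[hour] += r["total"] - r["occupied"]
        total[hour] += r["total"]
    # Keys come out in hour order, ready for plotting.
    return {h: 100.0 * free[h] / total[h] for h in sorted(total) if total[h] > 0}
```

Pooling (summing before dividing) weights large garages more heavily than averaging each garage's percentage would; which one you want depends on whether the question is "how many spaces are free" or "how full is a typical garage".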

After 5pm (17:00), general availability is around 42%, and it gets easier to park from then on.

San Francisco has around 440,000 parking spaces in total. Does the day of the month make a difference in parking availability? This graph suggests that it doesn’t:

On weekends (Feb 26 and Mar 3), the spaces are mostly unoccupied, whereas on weekdays around 15,000 spaces are occupied on average. One outlier stands out: over 20,000 spaces were occupied on Feb 22. We’re not sure what happened that day (please tell us if you know!).
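The per-day view can be sketched the same way: sum occupied spaces across all garages for each snapshot, average those totals within each local day, and flag Saturdays and Sundays so weekend days stand out. The row fields and the fixed UTC-8 offset are assumptions for illustration; with this layout, an outlier like Feb 22 would show up as an unusually high daily average.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone
from statistics import mean
from typing import Any, Dict, Iterable, List, Tuple

PST = timezone(timedelta(hours=-8))  # fixed offset, ignores daylight saving


def daily_occupancy(rows: Iterable[Dict[str, Any]],
                    tz: timezone = PST) -> Dict[date, Tuple[float, bool]]:
    """For each local day: the average (over snapshots) of total occupied
    spaces across all garages, and True if the day is a Saturday or Sunday."""
    per_snapshot: Dict[float, int] = defaultdict(int)
    for r in rows:
        per_snapshot[r["fetched_at"]] += r["occupied"]  # city-wide total at that instant
    per_day: Dict[date, List[int]] = defaultdict(list)
    for ts, occupied in per_snapshot.items():
        per_day[datetime.fromtimestamp(ts, tz=tz).date()].append(occupied)
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return {d: (mean(v), d.weekday() >= 5) for d, v in sorted(per_day.items())}
```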

Let’s see whether there is any drastic difference in garage occupancy over this range of days, per garage or area:

The number of available spaces seems fairly evenly distributed across the garages in the dashboard, except for the Golden Gateway garage.

Let’s add a sort to see the most occupied garages and areas, the ones to avoid:

Overall, the Leavenworth area and the south Embarcadero seem to be pretty bad areas for parking.
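The sorted view can be sketched as an occupancy rate per garage over the whole period, ranked highest first. The `key` parameter lets the same function rank by a hypothetical `area` field instead of `garage`; both field names, like the rest of the row shape, are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def most_occupied(rows: Iterable[Dict[str, Any]], key: str = "garage",
                  top: int = 5) -> List[Tuple[str, float]]:
    """Rank garages (or areas, via `key`) by occupancy rate over the whole
    period, 100 * sum(occupied) / sum(total), highest first."""
    occ: Dict[str, int] = defaultdict(int)
    tot: Dict[str, int] = defaultdict(int)
    for r in rows:
        occ[r[key]] += r["occupied"]
        tot[r[key]] += r["total"]
    rates = [(name, 100.0 * occ[name] / tot[name]) for name in tot if tot[name] > 0]
    rates.sort(key=lambda item: item[1], reverse=True)  # most occupied first
    return rates[:top]
```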

What if you want to see the results of this study for a particular timeframe only, say on a per-day basis, without changing the analysis? You can simply enable partitioning of the result set, as demonstrated here, with this nifty sunburst to control the partition level:

This analysis is refreshed with the latest data on a continuous basis, so let us know if you want to see the latest results as they come in.

This study could be taken further by looking at the prices each garage charges and choosing the lowest-priced garage, taking its availability into account.
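A first cut at that extension could look like the sketch below: keep only garages with at least a minimum number of open spaces, then take the cheapest. The `price_per_hour` and `available` fields are hypothetical, since price data is not part of this analysis, and a real version would also need to decide which price applies at which time.

```python
from typing import Any, Dict, Iterable, Optional


def cheapest_available(garages: Iterable[Dict[str, Any]],
                       min_free: int = 1) -> Optional[Dict[str, Any]]:
    """Return the lowest-priced garage with at least `min_free` open spaces,
    or None if no garage qualifies."""
    candidates = [g for g in garages if g["available"] >= min_free]
    return min(candidates, key=lambda g: g["price_per_hour"], default=None)
```

Raising `min_free` trades price for a better chance of actually finding a spot by the time you arrive.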
