EsheleD Marketing & Technology

8 Feb 2012

The BugSense hybrid app: experiences using Clojure on Google App Engine

Today’s post comes to us from Jon Vlachogiannis and Panos Papadopoulos, founders of BugSense, a mobile error analytics service. We hope you find their insights on using Clojure on Google App Engine informative.


BugSense is a cross-platform error analytics infrastructure for mobile devices. BugSense uses Google App Engine to power its backend, processing more than 1.6 million daily errors, generated by more than 45 million devices around the world. Chances are one of the applications installed on your smart phone (like SoundCloud or Trulia) is already using BugSense.

The Problem
Lots of our clients want to optimize and protect their mobile apps (through code obfuscation) using ProGuard. ProGuard creates more compact code, resulting in faster transfer across networks, faster loading, and smaller memory footprints. On top of that it makes programs and libraries harder to reverse-engineer.

However, because the Android Market doesn't automatically de-obfuscate stack traces from ProGuard-ed apps, developers who want to analyse errors from their apps must get the stack trace from the Market, format it, and run ProGuard locally. The whole process for just a single error could take more than 3 minutes, so we decided to add support for ProGuard to BugSense to make debugging easier and faster.

The Solution: Clojure and Python
The main data-serving portion of our app is written in Python, our language of choice, but ProGuard is an open source project in Java. For easier development, we ported parts of ProGuard to Clojure, a dynamic language belonging to the Lisp family that runs on the JVM. This allows us to “beat the averages” by exploiting all the great features that a Lisp language offers (such as macros and exploratory programming). Using Clojure and having access to a vast number of Java libraries assisted us in tackling the difficult problem of de-obfuscation, with great results.
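To give a feel for what de-obfuscation involves, here is a minimal Python sketch. It is not the BugSense Clojure port. It assumes a ProGuard-style mapping file in which each class line reads `original.Name -> obfuscated:`, restores only class names (member mappings are ignored), and the function names are hypothetical.

```python
import re


def parse_class_mappings(mapping_text):
    """Return {obfuscated_class: original_class} from ProGuard-style mapping text."""
    classes = {}
    for line in mapping_text.splitlines():
        # Class lines are unindented and end with ':'; member lines are indented.
        if not line or line[0].isspace() or not line.endswith(":"):
            continue
        original, _, obfuscated = line[:-1].partition(" -> ")
        classes[obfuscated.strip()] = original.strip()
    return classes


# Matches Java stack frames such as "    at a.b.c(Unknown Source)".
FRAME = re.compile(r"^(\s*at\s+)([\w.$]+)\.([\w$<>]+)(\(.*\))$")


def deobfuscate(trace, classes):
    """Replace obfuscated class names in stack frames; other lines pass through."""
    restored = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match and match.group(2) in classes:
            prefix, cls, method, location = match.groups()
            line = prefix + classes[cls] + "." + method + location
        restored.append(line)
    return "\n".join(restored)
```

A full tool would also map method and field names back, which needs the indented member lines and, for overloaded methods, line-number information.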

Once we were done, we deployed using AppEngineMagic and now it's trivial (one click) for our users to de-obfuscate their stacktraces. Now we have the best of two worlds: Python for serving data and Java/Clojure for doing calculations, all in the same Google App Engine application. And it scales automatically and runs even faster than running ProGuard on your laptop!



Practically, that means that we can have a heterogeneous app on Google App Engine so that we can keep programming in our favourite language, Python, but still harness the tremendous wealth of Java libraries using Clojure. Running a hybrid app on App Engine is trivial since both parts share the same resources: task queues, Datastore, and memcache.

However, because our app is implemented in multiple languages, we need to start two different local instances (one for Python and one for Clojure). We use a combination of mocks for both of the instances in order to emulate the hybrid app and their interaction in a local environment for development and testing.

Google App Engine, a success factor
We started as a two-developer startup and our product rapidly became popular across the world. Building on Google App Engine helped us focus on product development and forget about infrastructure and administration, thus enabling us to focus more on our customers' needs. (And sleep tight at night.) Furthermore it helped us to keep costs low and iterate quickly.

To learn more about BugSense, check out our website. If you have comments or questions about this post or just want to reach out directly, you can find us at +jonromero or +bugsense.

1 Feb 2012

App Engine 1.6.2 Released

Some of you may think of dragons as ferocious, treasure-hoarding, fire-breathing monsters. But the App Engine team is embracing the dragon as a symbol of fortune and good luck, and we are excited to announce our first release in the Year of the Dragon.


Experimental Datastore Backup/Restore
Using the Datastore Admin functionality in the Admin Console, you can now use the experimental Datastore Backup/Restore tool to back up your Datastore to Blobstore. You can also select a backup to restore from. The Datastore Backup/Restore feature runs as a MapReduce within your application and counts against your Instance, Datastore Ops, and Storage quotas.

Django® + Cloud SQL
For Python fans of Google’s Cloud SQL (currently available in limited preview), the long awaited out-of-the-box support for the Django framework has arrived and is now available as an experimental feature. Now you can easily use Cloud SQL within the Django framework as you would use any other SQL database.

...And More
Additional features available in 1.6.2 include:

  • Channel API: Developers can now specify how long a channel token will last until it expires, with the default remaining two hours. Channel API quota is now measured both in calls to create a channel and the number of hours of channel time requested.
  • Task Queues: A new X-Appengine-TaskETA header has been added which can be used to measure task delivery latency.
  • Blobstore: The Python API for the Blobstore now provides asynchronous API calls for creating upload URLs and fetching and deleting data.
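As a rough illustration of the Task Queues item above, the sketch below computes delivery latency from the new header. It assumes the header value is the task's target execution time in seconds since the Unix epoch; the function name and the plain-dict `headers` argument are hypothetical stand-ins for whatever your framework provides.

```python
import time


def task_delivery_latency(headers, now=None):
    """Seconds between a task's target execution time (ETA) and `now`.

    Assumption: the X-Appengine-TaskETA header holds seconds since the Unix
    epoch. Returns None when the header is absent, for example on a request
    that did not come from a task queue.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    eta = lowered.get("x-appengine-tasketa")
    if eta is None:
        return None
    if now is None:
        now = time.time()
    return now - float(eta)
```

A handler could log this value on every task request to spot queues that are falling behind.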

The full list of our features and bug fixes can be found on our release notes (Java, Python). Join in the discussion about this release and all things App Engine related in our Google Group.


Posted by The App Engine Team

27 Jan 2012

My summer with the Google App Engine Team

Today’s post is contributed by our Summer 2011 team intern, Chris Bunch. Chris did some great work on our Logs and MapReduce APIs and is also the first “App Engine Triple Crown” winner for developing the Experimental Logs Reader API in Python, Java and Go simultaneously.

Four years ago, I was a brand-new Ph.D. student at the University of California, Santa Barbara and when our research group (the RACELab) heard about Google App Engine, we were intrigued. We thought it presented a new model that enabled apps to scale the right way without severely constricting the types of programs users would write.

But we wanted to experiment with the core functionality of App Engine: the APIs, the scheduler, etc., and so we built AppScale, an open-source implementation of the Google App Engine APIs that allows users to deploy applications written in Python, Java, and Go to the infrastructure of their choice.

Wherever possible, we implement support for the App Engine APIs with alternative open-source technologies. We’ve added support for nine different databases, database-agnostic transactions, a REST interface that users of any programming language can communicate with (via an App Engine app), and the ability to run high performance computing programs over the whole thing and talk to it from your App Engine app. And here’s my favorite part - it all deploys automatically! You don’t need to tell it what block size you want for the distributed file system, or the size of the read buffers: we configure the necessary services automatically. Since AppScale is completely open source, if you don’t like the defaults, change them!

After creating our own system to run Google App Engine apps, I wanted to see how Google does it. Therefore, I decided to become an intern on the App Engine team and see if I could give them (and by extension, the App Engine community) something amazing over the summer. I started off with some work on the MapReduce API, making the sample app much easier to use and prettier all around. I also made a YouTube video showing how it all works and how easy it is to run MapReduce jobs over App Engine.

I then looked at a recurring question that App Engine users encounter: “How can I get my logging information for my application to answer data analytic questions?” It was an excellent problem to tackle, as we have users who want to be able to determine application-specific queries that Google Analytics or the Admin Console don’t answer. Currently users have to use appcfg to grab all their application’s data to a remote machine and run some analysis script over it.

To solve this problem, I created the Logs API, which gives applications programmatic access to their logs from within App Engine itself. Applications can use it to query small numbers of logs within a single request, and they can utilize the Pipeline, MapReduce, or Backends APIs if they have lots of logs they want to analyze. Logs contain both request-level information (e.g., the URL accessed, the HTTP response code returned) and logging info generated by the application (the logging module in Python, the Logger class in Java, and the logging methods that Go’s appengine package provides). The Logs API is available for use as of App Engine 1.6.1 by programmers using the Python, Java, or Go runtimes, in both the production environment and the local SDK.

I had a great time putting the Logs API together, and had a unique experience interning with the App Engine team. Programming in Python, Java, and Go on a daily basis was an exciting new challenge, and I loved it! 




Interested in interning with the App Engine team? Check out google.com/students for more information on internships.

20 Jan 2012

Google Cloud Storage: concurrency controls and deeper App Engine integration

Cross posted from the Google Code Blog 

Google Cloud Storage is a robust, high-performance service that enables developers and businesses to use Google’s infrastructure to store and serve their data. Today, we’re announcing a new feature that gives you greater control over concurrent writes to the same object, and the availability of an App Engine Files API that makes it easier to read and write data from Java App Engine applications.

Write concurrency control

A number of our customers have asked us for greater control over concurrent writes, in order to implement features like strongly consistent write operations and distributed locking semantics in the cloud. In response to your feedback, we’re announcing the release of version-based concurrency control. Every time you update an object, it gets assigned a 32-bit, monotonically increasing sequence number. This version number is returned as a header with every GET or HEAD request. You can then use a conditional write operation to manage concurrent updates to the object (for example, when you want read-modify-write semantics). This feature is currently experimental.
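The read-modify-write pattern described above can be sketched in a few lines of Python. This is an in-memory stand-in, not the Google Cloud Storage API: `VersionedStore`, `put_if_version`, and `read_modify_write` are hypothetical names, and the version counter simply mimics the increasing sequence number.

```python
class VersionedStore:
    """In-memory stand-in for an object store with version-based concurrency control."""

    def __init__(self):
        self._objects = {}  # name -> (data, version)

    def get(self, name):
        """Return (data, version); a missing object reads as (None, 0)."""
        return self._objects.get(name, (None, 0))

    def put_if_version(self, name, data, expected_version):
        """Conditional write: succeed only if the stored version is unchanged."""
        _, current = self.get(name)
        if current != expected_version:
            return False  # another writer updated the object first
        self._objects[name] = (data, current + 1)
        return True


def read_modify_write(store, name, update, max_attempts=5):
    """Apply update(data) atomically by retrying whenever the version has moved."""
    for _ in range(max_attempts):
        data, version = store.get(name)
        if store.put_if_version(name, update(data), version):
            return True
    return False
```

Because a stale writer's conditional write is rejected rather than silently applied, no update is lost; the loser re-reads the object and tries again.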


App Engine Files API for Java applications

Last fall, we announced the ability to read and write your Google Cloud Storage data using the App Engine Files API for Python applications. Today, we’re making the Files API available to Java App Engine applications too. This feature is currently experimental, and we’ll continue to enhance it in the months to come.

As always, we welcome your feedback in our discussion group. If you haven’t tried Google Cloud Storage yet, you can sign up and get started here.

18 Jan 2012

Happy New Year from the App Engine team

Happy New Year! As we return from our New Year's celebrations, brush the dust off our workstations and gear up for our first release of 2012, we thought it would be fun to take a look back at improvements we have made and what developers have accomplished with App Engine in 2011.


Let’s start with the features and functionality we added last year:




Best of all, with your continued support we accomplished our goal of graduating from preview and became a full-fledged Google product.

We’ve seen excellent growth and adoption over the past year, with businesses like Pulse, Evite and Best Buy choosing App Engine for their applications. Even St. James’s Palace chose App Engine to host the Royal Wedding site. We had so much fun collaborating with 17 of the world’s most renowned museums for the Google Art Project and with other Googlers building iGoogle gadgets and Doodles on App Engine. We’ve added more than 1 million registered applications and have more than 150,000 active developers on the App Engine platform generating more than 5 billion page hits per day.

Back in our first blog post in 2008, we asked you to “start your engines” and what a ride we’ve taken. Thank you for making 2011 our best year yet and here’s to making 2012 even better!



Posted by Peter Magnusson, Engineering Director

6 Jan 2012

Happy Birthday High Replication Datastore: 1 year, 100,000 apps, 0% downtime

Once upon a time, the only way to store persistent data in App Engine was to use the Master/Slave Datastore. Although it was a transactional, massively scalable, fully managed, distributed storage system running on Google’s world-class infrastructure, its availability was tied to the availability of a single datacenter, and when you’re serving hundreds of thousands of applications, relying on any single datacenter is simply not sufficient. One year ago today we unveiled a new offering that was specifically designed to address this weakness: the High Replication Datastore (HRD). Still transactional, still massively scalable, still fully managed, still running on Google’s world-class infrastructure, but with the ability to withstand multiple datacenter outages and no planned downtime!


By the time Google I/O came around last May, HRD was performing beautifully and our customers were happy, so we took the next step and made HRD the default option for all new App Engine applications.

In June we made HRD available in our SDK so that customers could easily experiment with the new consistency guarantees (Paxos on your laptop!), and we launched the first version of our migration tool to make it easy to move your apps from Master/Slave to HRD.

In October we released XG Transactions, our first HRD-only feature, which allows users to transact across entity groups.

In November we brought App Engine out of preview and added a 99.95% SLA for HRD applications.

In our most recent release we launched an updated version of our HRD migration tool that ties the duration of the read-only period to your write rate, rather than the size of your dataset. This makes your migration quick, simple, and easy to plan for regardless of how much data you have. One App Engine customer recently migrated over 500 GB of Datastore data with only a 10-minute read-only period!

Throughout all this, HRD has had no system-wide downtime (planned or unplanned) and has grown to serve over 3 billion requests per day. Needless to say it’s been a phenomenal year.

We realize that moving data requires planning, testing, coordination, and a strong stomach. However, we believe strongly that HRD provides a fundamentally better service than Master/Slave, and we encourage all our customers to migrate to HRD. Over the coming months you can expect to see further improvements to our migration tools (Blob migrations are on the way!), more HRD-only features like Full Text Search, and of course, more 9s than you can shake a stick at.

Posted by Max Ross, Datastore Tech Lead

25 Dec 2011

Simple development of App Engine apps using Cloud SQL – Introducing Google Plugin for Eclipse 2.5

Since we added SQL support to App Engine in the form of Google Cloud SQL, the Google Plugin for Eclipse (GPE) team has been working hard on improving the developer experience for developing App Engine apps that can use a Cloud SQL instance as the backing database.

We are pleased to announce the availability of Google Plugin for Eclipse 2.5. GPE 2.5 simplifies app development by eliminating the need for manual tasks like copying Cloud JDBC drivers, setting classpaths, typing in JDBC URLs or filling in JVM arguments for connecting to local/remote database instances.

GPE 2.5 provides support for:
  • Configuring Cloud SQL/MySQL instances
  • Auto-completion for JDBC URLs
  • Creating database connections in Eclipse database development perspective
  • OAuth 2.0 for authentication

Configuring Cloud SQL/MySQL instances
App Engine provides a local development environment in which you can develop and test your application before deploying to App Engine. With GPE 2.5, you now have the ability to configure your local development server to use a local MySQL instance or a Cloud SQL instance for testing. When you choose to deploy your app, it will use the configured Cloud SQL instance for App Engine.

Auto-completion for JDBC URLs
GPE 2.5 supports auto-completion for JDBC URLs, and quick-fix suggestions for incorrect JDBC URLs.


Creating database connections in Eclipse database development perspective
The Eclipse database development perspective can be used to configure database connections, browse the schema and execute SQL statements on your database.

Using GPE 2.5, database connections are automatically configured in the Eclipse database development perspective for the Development SQL instance and the App Engine SQL instance.


You can also choose to manually create a new database connection for a Cloud SQL instance. In GPE 2.5, we have added a new connection profile for Cloud SQL.


OAuth 2.0 for authentication
GPE 2.5 now uses OAuth 2.0 (earlier versions used OAuth 1.0) to securely access Google services (including Cloud SQL) from GPE. OAuth 2.0 is the latest version of the OAuth protocol, focussing on simplicity of client development.

Can’t wait to get started?
Download GPE here and write your first App Engine and Cloud SQL application using GPE by following the instructions here.

We hope GPE 2.5 will make cloud application development using App Engine and Cloud SQL a breeze. We always love to hear your feedback and the GPE group is a great place to share your thoughts.

Posted on behalf of the Google Plugin for Eclipse Team

25 Dec 2011

App Engine 1.6.1 Released

We have one more release this year to make our developers merry, and while some members of our team enjoy the summer sunshine down under, we’ll be taking a short winter break from releases. Don’t worry, we’ll be back to our normal schedule in January, but we couldn’t resist tempting you with some new features that will keep you up tinkering well past midnight on January 1st.


Platform Changes

  • Frontend Instance Classes - For applications that need more CPU and/or memory to serve requests, we’ve introduced two larger frontend instance classes. Before today, all apps were allocated a fixed instance size no matter what the app was computing in its requests. Now, apps that need more computing power can upgrade the size of their instances.
  • High Replication Datastore (HRD) Migration Tool Has Graduated - The HRD migration tool is now a fully supported feature. The tool allows you to easily migrate your data, limits the downtime required to complete the migration, and also allows you to choose its precise time. Every app can now start the new year off right, improving their uptime and reliability by migrating to HRD!


New APIs

  • Conversion API (Experimental) - Converting between formats within your application can be a pain, but with the experimental Conversion API you can now easily convert between PDF, HTML, text and images. Generating PDF invoices from HTML, displaying PDF menus as HTML or extracting text from images using OCR is now as simple as an API call.
  • Logs Reader API (Experimental) - Want to summarize latency by handler? Summarize request statistics by user? The new logs reader API allows you to programmatically access your logs to build reports, gather statistics, and analyze requests to your heart’s content.


Read the full release notes for Java and Python to get all the details on 1.6.1. We always love to hear what you think, so keep the feedback on our groups coming. App Engine releases will resume again with our regular schedule around the end of January.



Posted by The App Engine Team

25 Dec 2011

Whentotweet.com – Twitter analytics for the masses

Our post today comes from Stefan and Niklas of Whentotweet.com, a nifty site that recommends the best time of day to tweet based on your followers’ habits.


Twitter handles an amazing number of Tweets - over 200 million tweets are sent per day.

We saw that many Twitter users were tweeting interesting content but much of it was lost in the constant stream of tweets.

Whentotweet.com is born

While there were many tools for corporate Twitter users that performed deep analytics and provided insight into their tweets, there were none that answered the most basic question: what time of the day are my followers actually using Twitter?

And so the idea behind Whentotweet was born. In its current form, Whentotweet analyzes when your followers tweet and gives you a personalized recommendation of the best time of day to tweet to reach as many as possible.

Given the massive amount of data we needed to analyze, we knew it would be a huge engineering challenge to build what we wanted using the tools we had used previously. We also wanted to make sure we could offer at least a basic product for free. Not only did we need to process massive amounts of data - we also needed a way to do it without a second mortgage on our houses!

The Technology Used

As we went over the alternatives we started to sketch different ways of hosting our application. We had previous experience building web sites and knew that traditional cloud hosting would be expensive and difficult to manage for the kind of computing that we needed. After some quick back-of-the-envelope calculations it seemed clear that Google App Engine would give us both the kind of pricing we needed and a way to scale. We decided to write a quick test application to test our assumptions.

The test application blew our minds. Apart from proving our initial assumptions around pricing and scale we started appreciating the quick deploys. On previous projects we were used to one deploy per month. Almost immediately we shifted our schedule to one or sometimes several deploys per day to push new code to customers.

The main APIs that Whentotweet relies on are Google App Engine's task queues and Datastore. Whenever a new user requests a report it is added as a task. A typical report requires a huge number of interactions with external sites. By breaking down each external interaction into separate tasks in different queues it became easy to make sure we kept a steady rate of API calls to external sites without risking that a huge influx of users would break our API limits.

The initial task then spawns new tasks until finally one of the tasks decides that the report is complete and tweets a summary of the result and a link to a more detailed report. Whentotweet uses a "fail fast" technique so whenever any request fails, internal or external, the task terminates and puts itself back on the queue.
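The fan-out and "fail fast" behaviour described above might look roughly like this in plain Python. It is an in-memory sketch, not Whentotweet's code or the App Engine Task Queue API: `run_queue`, the `handler` callback, and the `max_attempts` cap are all assumptions made for illustration.

```python
from collections import deque


def run_queue(initial_tasks, handler, max_attempts=3):
    """Drain a queue of tasks, re-queueing any task whose handler raises.

    handler(task) returns a list of follow-up tasks to enqueue. Any exception
    ends the task immediately ("fail fast") and puts it back on the queue,
    until max_attempts is reached and the task is dropped.
    """
    queue = deque((task, 1) for task in initial_tasks)
    completed, dropped = [], []
    while queue:
        task, attempt = queue.popleft()
        try:
            follow_ups = handler(task)
        except Exception:
            if attempt < max_attempts:
                queue.append((task, attempt + 1))  # back on the queue
            else:
                dropped.append(task)
            continue
        completed.append(task)
        queue.extend((follow_up, 1) for follow_up in follow_ups)
    return completed, dropped
```

Keeping each external call in its own small task means a failure only repeats that one call, not the whole report.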

The Datastore saves a finished or ongoing analysis. Sometimes a single analysis will be updated several times a second by tasks as they finish and store their results.

The Result

After a few weeks of intense coding, we were ready to test our code on a small Twitter account with fewer than three hundred followers. The results came back in just a few minutes.

After verifying that everything had actually worked as well as we thought, we decided to try another account. This time one of the largest Twitter accounts on the planet: @techcrunch. Handling a Twitter account with over a million followers took the application one week. But after the analysis started, Whentotweet would quietly work in the background without us having to lift a finger.

Whentotweet got off to a better start than we imagined. During the initial launch thousands of people tested it on their Twitter accounts.

After a while blog posts appeared, recommending Whentotweet as an invaluable Twitter tool. Each post would generate a sudden huge spike in traffic. Sometimes, a blog owner would mail us and ask if we were ready for the sudden increase in traffic this would bring. But Whentotweet was built to scale and even massive sites such as Mashable.com didn't slow it down. The most amazing thing is that we didn't need to write a single extra line of code to handle these massive variations in load. Instead, as soon as we wrapped our heads around the tools in the App Engine toolbox we knew that Whentotweet would easily scale. App Engine forced us to think outside the box and avoid the fallacies of traditional hosting that create bottlenecks.

Currently, over 38,000 people have tried Whentotweet and we see from the user feedback that they love it. Give it a try at: www.whentotweet.com

- Niklas Agevik (@niklas_a) and Stefan Ålund (@stefan_alund) of Whentotweet.com

25 Dec 2011

Scaling with the Kindle Fire

Today’s blog post comes to us from Greg Bayer of Pulse, a popular news reading application for iPhone, iPad and Android devices. Pulse has used Google App Engine as a core part of their infrastructure for over a year and they recently celebrated a significant launch. We hope you find their experiences and tips on scaling useful.





As part of the much anticipated Kindle Fire launch, Pulse was announced as one of the only preloaded apps. When you first un-box the Fire, Pulse will be there waiting for you on the home row, next to Facebook and IMDB!

Scale
The Kindle Fire is projected to sell over five million units this quarter alone. This means that those of us who work on backend infrastructure at Pulse have had to prepare for nearly doubling our user-base in a very short period. We also need to be ready for spikes in load due to press events and the holiday season.

Architecture
As I’ve discussed previously on the Pulse Engineering Blog, Pulse’s infrastructure has been designed with scalability in mind from the beginning. We’ve built our web site and client APIs on top of Google App Engine, which has allowed us to grow steadily from 10s to many 1000s of requests per second, without needing to re-architect our systems.

While restrictive in some ways, we’ve found App Engine’s frontend serving instances (running Python in our case) to be extremely scalable, with minimal operational support from our team. We’ve also found the datastore, memcache, and task queue facilities to be equally scalable.

Pulse’s backend infrastructure provides many critical services to our native applications and web site. For example, we cache and serve optimized feed and image data for each source in our catalog. This allows us to minimize latency and data transfer and is especially important to providing an exceptional user experience on limited mobile connections. Providing this service for millions of users requires us to serve hundreds of millions of requests per day. As with any well designed App Engine app, the vast majority of these requests are served out of memcache and never hit the datastore. Another useful technique we use is to set public cache control headers wherever possible, to allow Google’s edge cache (shown as cached requests on the graph below) and ISP / mobile carrier caches to serve unchanged content directly to users.
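The two caching techniques mentioned above can be sketched as follows. This is an illustrative Python outline rather than Pulse's code: a plain dict stands in for memcache, `load` stands in for a datastore read, and both function names are hypothetical.

```python
def cached_fetch(cache, key, load):
    """Cache-aside read: return (value, hit).

    `cache` is any dict-like object standing in for memcache; `load(key)`
    stands in for the slower datastore read and is only called on a miss.
    """
    value = cache.get(key)
    if value is not None:
        return value, True
    value = load(key)
    cache[key] = value
    return value, False


def public_cache_headers(max_age_seconds):
    """Response headers that let shared caches (edge, ISP, carrier) reuse the body."""
    return {"Cache-Control": "public, max-age=%d" % max_age_seconds}
```

Only the first request for a key pays for the datastore read; with the public header set, many repeat requests may never reach the application at all.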




Costs
Based on App Engine’s projected billing statements leading up to the recent pricing changes, we were concerned that our costs might increase significantly. To prepare for these changes and the expected additional load from Kindle Fire users, we invested some time in diagnosing and reducing these costs. In most cases, the increases turned out to be an indicator of inefficiencies in our code and/or in the App Engine scheduler. With a little optimization, we have reduced these costs dramatically.

The new tuning sliders for the scheduler make it possible to rein in overly aggressive instance allocation. In the old pricing structure, idle instance time wasn’t charged for at all, so these inefficiencies were usually ignored. Now App Engine charges for all instance time by default. However, any time App Engine runs more idle instances than you’ve allowed, those hours are free. This acts as a hint to the scheduler, helping it reduce unneeded idle instances. By doing some testing to find the optimal cost vs spike latency tolerance and setting the sliders to those levels, we were able to reduce our frontend instance costs to near original levels. Our heavy usage of memcache (which is still free!) also helps keep our instance hours down.



Since datastore operations used to be charged under the umbrella of CPU hours, it was difficult to know the cost of these operations under the old pricing structure. This meant it was easy to miss application inefficiencies, especially for write-heavy workloads where additional indexes can have a multiplicative effect on costs. In our case, the new datastore write operations metric led us to notice some inefficiencies in our design and a tendency to overuse indexes. We are now working to minimize the number of indexes our queries rely on, and this has started to reduce our write costs.

Preparing for the Kindle Fire Launch
We took a few additional steps to prepare for the expected load increase and spikes associated with the Fire’s launch. First, we contacted App Engine’s support team to warn them of the expected increase. This is recommended for any app at or near 10,000 requests per second (to make sure your application is correctly provisioned). We also signed up for a Premier account which gets us additional support and simpler billing.

Architecturally, we decided to split our load across three primary applications, each serving different use cases. While this makes it harder to access data across these applications, those same boundaries serve to isolate potential load-related problems and make tuning simpler. In our case, we were able to divide certain parts of our infrastructure, where cross application data access was less important and load would be significant. Until App Engine provides more visibility into and control of memcache eviction policies, this approach also helps prevent lower priority data from evicting critical data.

I’m hopeful that in the near future such division of services will not be required. Individually tunable load isolation zones and memcache controls would certainly make it a lot more appealing to have everything in a single application. Until then, this technique works quite well, and helps to simplify how we think about scaling.

To learn more about Pulse, check out our website! If you have comments or questions about this post or just want to reach out directly, you can find me @gregbayer.
