How i sent 300k emails through Github’s API in a matter of minutes

To all watchers of the libgdx repository: i’m terribly sorry and hope i didn’t interfere with your work in any way.

This is meant as a cautionary tale about using Github’s API on a repository with quite a few watchers (460 in this case).

Earlier this year we migrated our code from Google Code to Github. We didn’t have a good migration plan for the 1200 or so issues back then, so we kept them on Google Code. We now have about 1700 issues on the tracker.

Today i finally wanted to tackle the issue tracker migration, using a Python script i found on Github. The script requires one to specify a Github user account that owns the repository the issues will get migrated to. I did a dry run on a fork of the main repo using my Github account, fixed up some issues in the script, and validated things to the best of my abilities. Things looked good.

Then i ran it on the main repository. Luckily i was watching our IRC channel. After about 4 minutes, people started to scream. They each received 789 e-mails from Github. Every single issue i migrated, and every single comment of each issue triggered an e-mail notification to all watchers of the main repository.

This wasn’t apparent to me during the dry runs, as i used my own Github account. The script posts all issues/comments with the user account i supplied, so naturally, i did not get any notification mails.

I stopped the script after 130 issues (4 minutes), and immediately started sending out apologies and a mail to Github support, to which i haven’t received an answer yet. I sent roughly 300k mails through their servers in a matter of minutes. If i hadn’t watched IRC, i’d have sent out about 4 million mails to 460 people within an hour.
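
As a rough sanity check on those figures, the arithmetic can be sketched in Python. The inputs are the numbers quoted in the post; the projection rests on two assumptions that the post does not state: that the remaining issues carry the same average number of comments as the first 130, and that the script would have kept its observed pace.

```python
# Figures quoted in the post.
WATCHERS = 460
EMAILS_PER_WATCHER = 789   # mails each watcher had received when the script was stopped
ISSUES_MIGRATED = 130
MINUTES_ELAPSED = 4
TOTAL_ISSUES = 1700

# Mails sent before the script was stopped.
sent_so_far = WATCHERS * EMAILS_PER_WATCHER

# Assumption: the remaining issues have the same average number of comments.
emails_per_issue = EMAILS_PER_WATCHER / ISSUES_MIGRATED
projected_per_watcher = emails_per_issue * TOTAL_ISSUES
projected_total = projected_per_watcher * WATCHERS

# Assumption: the script keeps its observed pace of 130 issues per 4 minutes.
projected_minutes = TOTAL_ISSUES / (ISSUES_MIGRATED / MINUTES_ELAPSED)

print(f"sent so far:           {sent_so_far:,}")              # 362,940
print(f"projected per watcher: {projected_per_watcher:,.0f}")  # about 10,318
print(f"projected total:       {projected_total:,.0f}")        # about 4.7 million
print(f"projected duration:    {projected_minutes:.0f} min")   # about 52 minutes
```

Under these assumptions the totals land in the same ballpark as the "roughly 300k" and "about 4 million within an hour" figures above.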

Let me assure you that i’m extremely sorry about this incident. I know that things like this can interrupt daily workflows quite a bit, even if getting rid of those mails is not a Herculean task. I’d be rather upset if a repo maintainer pulled something like this on me. Please accept my deepest apologies.

The lesson for Github API users: think hard about the implications of automating tasks through the Github API if you have more than a few watchers.
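
One way to act on this is a pre-flight check that estimates the notification volume before a bulk run starts. The following is a minimal sketch, not part of the migration script: the function names `estimate_notifications` and `check_bulk_run` and the `limit` default are hypothetical, the watcher count has to be looked up by hand, and the model (one mail per watcher per created issue and per comment, with the posting account excluded) is simply the behavior described in this post.

```python
def estimate_notifications(watchers, issues, comments):
    """Estimate notification mails for a bulk import.

    Assumptions (taken from the incident described in the post):
    every watcher gets one mail per created issue and one per comment,
    and the posting account is itself a watcher that receives nothing.
    """
    recipients = max(watchers - 1, 0)
    return recipients * (issues + comments)


def check_bulk_run(watchers, issues, comments, limit=1000):
    """Return the estimate, or raise if it exceeds `limit` (an arbitrary default)."""
    estimate = estimate_notifications(watchers, issues, comments)
    if estimate > limit:
        raise RuntimeError(
            f"bulk run would trigger about {estimate:,} notification mails "
            f"(limit is {limit:,}); aborting"
        )
    return estimate
```

With a fork watched only by the posting account, `check_bulk_run(1, 130, 659)` returns 0, which matches why the dry run showed nothing. With 460 watchers the same call raises before anything is posted.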

The lesson for Github/API designers: consider safeguarding against such issues in your API, in case other idiots like me pull off something similar in the future.

26 thoughts on “How i sent 300k emails through Github’s API in a matter of minutes”

  1. This is why you should read the scripts you use carefully, line by line, when doing huge tasks like these. Also proper grammar is sent, not send.

  2. I did indeed read the script and modified it according to my needs. Nothing indicated that it would trigger an e-mail flood.

    Thanks for the grammar correction, fixed.

  3. Oh, that’s weird then! I’m not sure how the Github API works, so I wouldn’t really know, but I kind of assumed it would be pretty obvious in the source. Can you link the script source?

    NP! 🙂

  4. Github sends those e-mails on their side. It’s not something you can disable. Watchers decide whether they want to receive notifications.

  5. The script is linked in the blog post (on mobile, so not linking again :)). It uses the Python Github API, which does not make it obvious what side effects its methods have.

  6. I don’t think most of us will have to worry about this problem, but it is good to know 😀
    I thought it might have had something to do with why the site was down yesterday: I saw an overview of the post in the RSS feed, but the site wouldn’t let me see it.

    Let us know if they disable anything due to the high traffic!

  7. As I was bulk-archiving the libGDX GitHub emails yesterday, my only thought was – damn, Mario is one productive developer – I wish I could get that much done in one weekend!

  8. I mailed them about the email notification problem just last Friday and suggested a digest email as a solution. The response was that the feature request has been put on their list, so I hope the team notices it. Actually, I am curious why they don’t care about how often mails are sent; as far as I know, this costs a lot.

  9. This reminds me of our migration to Tumblr. All the users who had turned on auto-tweet per post were flooded with tweets when we started the migration.

    Recently I saw an addition to the Tumblr API: they have added a tweet on/off option, which works best when migrating content from a different network to Tumblr.

    I guess GitHub should also come up with such an option in the API (notification on or off).

  10. I think that you don’t have to apologize for that. People save a lot of time with your work, so they can spare some time for deleting some e-mails.

  11. I have more respect for a developer who can publicly admit his failings than one who quietly sweeps them under the rug, or tries to justify them. Good job, and thanks again for all the great work on the engine.

  12. Haha, well, lessons learnt I guess. I have done something similar in the past: we launched a web app which built up a following of several hundred thousand users. There was a cron job that had been set to run once a month, and we had forgotten to take it out of maintenance mode. The morning after the weekend it went off, I came into the office to find half a million emails in my inbox, and the same for three other developers.

  13. and where’s the failing? does it look like his fault? it’s github’s fault.. the mails are not from him…

  14. Github’s behavior was to send emails, and was functioning as designed. The developer’s lack of knowledge is what caused this, so yes, the failure is on him, and is why he apologized to Github.

    That’s what being a developer is about though. Can’t create without making mistakes.

  15. That’s true. Actually, I turned off the email notifications because the normal development of libgdx sends a lot xD So I missed this, phew!
