Breaking things every six weeks

Attention conservation notice: 900 words of inside baseball about Mozilla. No security content whatsoever.

The Mozilla Project has been taking a whole lot of flak recently over its new rapid release cycle, in which there is a new major version of Firefox (and Thunderbird) every six weeks, and it potentially breaks all your extensions. Especially the big complicated extensions like Firebug that people cannot live without. One might reasonably ask, what the hell? Why would any software development team in their right mind—especially a team developing a critical piece of system infrastructure, which is what Web browsers are these days, like it or not—inflict unpredictable breakage on all their users at six-week intervals?

The first thing to know is that Firefox’s core developers are really focused on making the Web better. If we weren’t, we would be hacking on something other than a Web browser. The old release cycle was way too slow for us to do that effectively; as Jono Xia describes in his blog post It’s Not About the Version Numbers, anything we did might not get out to end users for over a year. When David Baron fixed visited-link history sniffing, he patched Firefox first—but Chrome and Safari shipped the change before we did.

You should read Jono’s post now. I’ll be here when you get back.

Shipping new versions of the browser every six weeks is clearly a better way to improve the Web rapidly than shipping a new version only once a year or so. But what’s stopping the Mozilla team from shipping a new batch of under-the-hood improvements to the Web every six weeks without breaking anything? Why do we need to break things?

Well, we tried not breaking things for ten years, give or take, and it didn’t work. The second thing to know is that the core browser (Gecko) suffers from enormous technical debt. Like any large, 15-year-old piece of software, we have code in abundance that was written under too much time pressure to get it right, was written so long ago that nobody remembers how it works, isn’t comprehensively tested, or any combination of the above. We also have major components that reasonably seemed like good ideas at the time, but have since proven to be a hindrance (XUL, XBL, XPConnect, etc). We have other major components that should have been recognized as bad ideas at the time, but weren’t (XPCOM, NSPR, etc). And we have code for which there is no excuse at all (Firefox still had code using the infamous Mork file format until just this summer, and I understand it’s still live in Thunderbird).

It gets worse: many of the bugs can’t be fixed without breaking stuff. For example, take bug 234856. That’s a seven-year-old display glitch. What could possibly be an excuse for not fixing a simple display glitch for seven years? Well, the root cause of that bug (described more clearly in bug 643041, where the actual fix is posted) is an error in an XPCOM interface that, until we decided we weren’t going to do this anymore (post-FF4), was frozen—it could not be changed even though it was wrong, precisely so that extensions could depend on it not changing. There are thousands of XPCOM interfaces, and extensions can use all of them. That’s a great strength: it lets Firefox extensions do far more than, say, Chrome extensions can. That’s also a huge problem for people trying to make the core better. (Only about 200 of these interfaces were permanently frozen, but pre-FF4 we tried to avoid changing even the un-frozen ones as much as possible.) You’ll notice that the change in bug 643041 makes it easier to write extensions that manipulate SSL certificates, because now there’s just one nsIX509Cert interface, not three. But taking away nsIX509Cert2 and nsIX509Cert3 breaks code that was using them.
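
To make the nsIX509Cert example concrete, here is a toy model in TypeScript, chosen because it is close to the JavaScript most extensions are written in. It is a sketch under stated assumptions, not the real XPCOM or PSM API: the only real details it borrows are the interface names and the fact that a failed QueryInterface reports NS_ERROR_NO_INTERFACE. The `extraInfo` property and all function names are hypothetical. Code that hard-codes nsIX509Cert2 breaks once that interface is removed; code that probes for interfaces keeps working.

```typescript
// Toy model of XPCOM-style interface lookup. NOT the real XPCOM API:
// only the interface names and the NS_ERROR_NO_INTERFACE code are borrowed.
const NO_INTERFACE = "NS_ERROR_NO_INTERFACE";

interface Cert {
  // Loosely mimics QueryInterface: returns the object if it implements
  // the named interface, otherwise throws NO_INTERFACE.
  queryInterface(iid: string): Cert;
  // Hypothetical property standing in for whatever the extension needs.
  readonly extraInfo: string;
}

// A certificate after the consolidation: the extra functionality lives in
// nsIX509Cert, and nsIX509Cert2/nsIX509Cert3 no longer exist.
function makeConsolidatedCert(extraInfo: string): Cert {
  const supported = ["nsISupports", "nsIX509Cert"];
  const cert: Cert = {
    extraInfo: extraInfo,
    queryInterface(iid: string): Cert {
      if (supported.indexOf(iid) >= 0) return cert;
      throw new Error(NO_INTERFACE);
    },
  };
  return cert;
}

// Old extension code that assumes nsIX509Cert2 will always be there.
function legacyGetInfo(cert: Cert): string {
  return cert.queryInterface("nsIX509Cert2").extraInfo;
}

// More defensive code: try the old interface, fall back to the consolidated
// one, and rethrow any error that is not NO_INTERFACE.
function detectingGetInfo(cert: Cert): string | undefined {
  const candidates = ["nsIX509Cert2", "nsIX509Cert"];
  for (const iid of candidates) {
    try {
      return cert.queryInterface(iid).extraInfo;
    } catch (e) {
      if ((e as Error).message !== NO_INTERFACE) throw e;
    }
  }
  return undefined;
}

// Usage:
const cert = makeConsolidatedCert("example data");
try {
  legacyGetInfo(cert);
} catch (e) {
  console.log("legacy code broke:", (e as Error).message);
}
console.log("detecting code got:", detectingGetInfo(cert));
```

The defensive version costs a few lines, but it only fails when the functionality is genuinely gone, not when an interface merely changes its name.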

Some bugs can’t even be fixed without breaking Web sites. Any time Gecko doesn’t do the same thing WebKit and/or IE do, we (and the WebKit and IE people) want to make that difference go away, but to do that, at least one of the three has to change, and there may be sites out there relying on the behavior that just got taken away. In some cases, even adding features breaks the Web. For instance, if you write <element onevent="do_something()"> directly in your HTML, when the event fires, the JavaScript interpreter will try to call a method of element’s DOM API named do_something before it tries to call a global function with that name. This means that adding a DOM method to any HTML element can break websites that happen to use the same name for a global function. (This is not a problem if you assign a function to element.onevent from a <script>.)
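
The lookup order described above can be modeled in a few lines of TypeScript. This is a simplified sketch, not how any browser engine implements it: the function and variable names are made up, and it leaves out the form owner and the document, which real browsers also place in an inline handler’s scope ahead of the global object.

```typescript
// Toy model of identifier lookup inside an inline handler such as
// onclick="do_something()". Only the element and the global scope are kept,
// which is enough to show the shadowing effect.
type Scope = { [key: string]: unknown };

function resolveInHandler(id: string, element: Scope, globalScope: Scope): unknown {
  // Much like a `with` statement: the first scope that has the property
  // wins, whether the property is its own or inherited.
  const chain = [element, globalScope];
  for (const scope of chain) {
    if (id in scope) return scope[id];
  }
  throw new ReferenceError(id + " is not defined");
}

function runHandler(id: string, element: Scope, globalScope: Scope): string {
  const target = resolveInHandler(id, element, globalScope);
  if (typeof target !== "function") {
    throw new TypeError(id + " is not a function");
  }
  return (target as () => string)();
}

// Usage: the site defines a global do_something().
const siteGlobals: Scope = { do_something: () => "site's global function" };

// Before: the element has no member with that name, so the global is called.
const elementBefore: Scope = {};
console.log(runHandler("do_something", elementBefore, siteGlobals));

// After: a browser update adds a DOM method with the same name, which now
// shadows the site's function even though the site itself never changed.
const elementAfter: Scope = { do_something: () => "new DOM method" };
console.log(runHandler("do_something", elementAfter, siteGlobals));
```

Assigning the handler from a script sidesteps this, because the function body is then compiled in the script’s own scope rather than with the element in front of the globals.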

This is why Mozilla core developers can seem so callous to the needs of the developers who build extensions and websites on Gecko. We know that we depend on both groups for our continued relevance: a browser is no use at all with no websites to browse, and without extensions there is not much reason to pick one browser over another. But we feel that right now it is more important to fix the problems with our existing platform than to provide stability. In the long run, we will have a better platform for both groups to work with. And in the long run, stability will come back. There are many bugs to fix first, but there are not infinitely many bugs, even if it seems like it sometimes. Having said that, there are some things we could be doing right now to make extension and website developers’ lives better … but I’m going to save them for the next post. 900 words is enough.

Note to commenters: I know lots of people are unhappy with the UX changes post-FF3.6, but let’s keep this to discussion of API breakage, please.

Responses to “Breaking things every six weeks”

  1. Philipp

    Thanks Zack, this is one of the best blog posts on this topic to date. Too bad Mozilla was not able to communicate this message much earlier. I’m absolutely sure the people who decided on the rapid release process did not make this change without considering the alternatives and weighing its problems and benefits. But I got the feeling many had a hard time explaining their reasoning.

  2. DaveG

    Part of the problem with extension breakages is a “do as we say, not as we do” sort of thing. For the web, we tell people to check for features and not individual browsers and versions. With extensions, we can do nothing but check for individual browsers and versions. AMO’s plan of running a compatibility check on addons and remotely bumping max versions if possible is a start, but one of these days we really need a way for that to be done automatically and reliably in the client.

    Oh, and binary components need to go away entirely. Aside from their problems they just can’t keep up with needing to be re-compiled for every major version. This keeps coming up in discussions but as of yet no one has made the decision to pull the trigger and begin phasing support for them out.

    1. Zack Weinberg
      With extensions, we can do nothing but check for individual browsers and versions.

      That’s a really good point. It’s not even as if it would be hard to do feature detection. XPCOM, much as I dislike it, is designed to let you do interface detection, and all the standard JS techniques apply. But we have this maxVersion thing getting in the way.

      Oh, and binary components need to go away entirely.

      Absolutely. That’s gonna be near the top of my list of things to change in the next post.
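
      The maxVersion problem can be illustrated with a small TypeScript sketch. It is hypothetical throughout: the real add-on manager uses Mozilla’s own version-comparison rules, which accept formats such as "5.0b3" that this toy comparator does not, and the API object with its openTab member is invented. It contrasts a declared-version-range check, which rejects an add-on on the next release whether or not anything it uses changed, with a feature check that only asks whether the needed function exists.

```typescript
// Hypothetical sketch: a simplified version-range check versus a feature
// check. Only dotted integers with an optional "*" segment are handled;
// Mozilla's real comparator understands many more version formats.

function versionParts(v: string): number[] {
  // "*" matches anything in that position, so treat it as +Infinity.
  return v.split(".").map((p) => (p === "*" ? Infinity : parseInt(p, 10)));
}

function compareVersions(a: string, b: string): number {
  const pa = versionParts(a);
  const pb = versionParts(b);
  const n = Math.max(pa.length, pb.length);
  for (let i = 0; i < n; i++) {
    // Missing trailing segments count as 0, so "5" equals "5.0".
    const x = i < pa.length ? pa[i] : 0;
    const y = i < pb.length ? pb[i] : 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Version gating: is the running app inside the add-on's declared range?
function withinDeclaredRange(app: string, min: string, max: string): boolean {
  return compareVersions(app, min) >= 0 && compareVersions(app, max) <= 0;
}

// Feature detection: does the API the add-on needs actually exist?
function hasFeature(api: Record<string, unknown>, member: string): boolean {
  return typeof api[member] === "function";
}

// Usage: an add-on that declared maxVersion "5.*" and only calls openTab().
const appApi: Record<string, unknown> = { openTab: () => undefined };
console.log(withinDeclaredRange("5.0", "4.0", "5.*")); // true: inside the range
console.log(withinDeclaredRange("6.0", "4.0", "5.*")); // false: rejected on 6.0
console.log(hasFeature(appApi, "openTab")); // true: the API is still there
```

      The range check has to be bumped by someone (the add-on author, or AMO’s remote bump) every six weeks, while the feature check keeps passing until the function the add-on relies on actually disappears.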

      1. Lozzy

        A couple of things I’m curious about (not an extension dev though)

        Dave seemed to believe that feature detection in extensions was not possible, full stop, yet you’re saying that it is … or is that just XPCOM? Having something similar to the feature detection web devs get definitely seems to be the way forward, though.

        Also, do you have any prediction on what the reaction will be from devs who use binary components? Will they welcome someone finally pulling that trigger?

        I know I’m going to be hurt a little by it, since one of my critical extensions is a gestures extension with binary components which has been all but dropped by the developer, so I’ve just been relying on the compatibility reporter to get me through and hoping that there are no changes that break it. I know I’ll probably have to move on from it eventually, but it’s just such an excellent extension.

  3. Jeffrey

    …without extensions there is not much reason to pick one browser over another.

    Extensions are mostly for power-users. Most users don’t even know or care about extensions.

    Really, what bothers most users about updates are big interface changes and additional features which they have to evaluate and get used to. Even the well-intentioned Mozilla Foundation might make changes that users don’t agree with. As long as the changes are evolutionary and not revolutionary most users won’t have a problem. Fixes for speed, stability and compatibility are safe bets.

    Firefox has been increasingly imposing UI changes and features on users. The transition from Firefox 1 to Firefox 2 was okay but both Firefox 3 and Firefox 4 imposed some changes on users that kept them from upgrading until website compatibility forced them to.

    1. Zack Weinberg

      I do not wish to discuss any of the recent UI changes to Firefox here, only the impact of the rapid release cycle on extension developers and web site maintainers.

  4. Robert Kaiser

    I disagree that any of XUL, XBL, XPConnect, XPCOM, NSPR, even mork, were anything like bad ideas. In fact, some of those were absolutely brilliant, and we are still striving to make standard web technologies match the power of some of those, esp. XUL and XBL - and still have a number of miles to go on our way there. Of course, we might have gone too far or along some bad paths in XPCOM and using RDF, but that doesn’t make them bad ideas altogether. Some of those were awesome steps in enabling compatibility cross-platform and -language. But, as you mention, a lot of those technologies are more than 12-13 years old and might not be completely in line with current practices or platforms any more - and I agree that we can and need to improve to match up better again.

    Let’s not focus on making the past look like we messed up, let’s focus on learning from what was good and bad and improving the future. Let’s not focus on breaking things, let’s focus on enabling. Let’s not focus on how we suck, but how we can rock.

    At Mozilla, we’re not bemoaning the past, we’re taking advantage of knowing about it today to build the future.

    1. Zack Weinberg
      I disagree that any of XUL, XBL, XPConnect, XPCOM, NSPR, even mork, were anything like bad ideas.

      Dude, have you seen what a mork file looks like? ;-)

      But seriously … I’ll grant that we don’t have a good replacement for XUL, XBL, XPConnect, XPCOM, NSS, or NSPR right now. I’ll even grant that you can currently do things with some of them that you can’t do with anything else. That doesn’t make any of them good designs, and in particular I stand by the assertion that XPCOM has always been a Bad Thing and should have been recognized as such on day one. Something like NSPR was needed back in the 1990s, as the Unix wars raged, but it is horribly overdesigned for what it does (count the number of abstractions you go through to get from PR_Listen() to listen() some time), and the need for it is much reduced now.

      Let’s not focus on making the past look like we messed up, let’s focus on learning from what was good and bad and improving the future.

      We can’t learn from our mistakes if we don’t acknowledge that they were mistakes.

      1. Dan

        We most certainly can learn from many of the things you’re calling mistakes without acknowledging that they were mistakes.

        A lot of design decisions etc. made sense when they were made, in an era which is dramatically different from today. Different constraints on resources (cpu, memory, disk space, programmer time), different compilers and OS environments, different web formats, different ideas about the web’s future development, et cetera et cetera ad infinitum. New design ideas, new algorithms, and all kinds of other information have come to light since those days as well.

        In particular, making a truly cross-platform application is not as scary a proposition these days as it was in the days when nglayout was new. You’ve already mentioned how unix api consolidation makes nspr not as necessary as it was in the 90s. I don’t know enough about moz internals to know all of how this affects things, but I imagine a lot of things are simplified by the disappearance of the two platforms which were dominant in the pre-1.0 days; Windows 9x and pre-OSX MacOS were odd beasts in a lot of ways. C++ compiler interop was also a huge headache; even gcc didn’t really get things right until 3.4 in 2004. Cross-platform UI toolkits were basically a joke, and the closest thing to a cross-platform drawing api was a long-forgotten idiosyncrasy of the NeXT platform (Display PostScript).

        Designing a way to write as much of the browser as possible in XML etc was not only an interesting pre-AJAX idea for how the web could become a development platform but also a survival mechanism in a very cross-platform-hostile world.

  5. tom jones

    no frozen APIs is a bitter pill.. 6 week release cycle is another bitter pill..

    why did they have to be taken together? could unfrozen APIs with, like, a 6-month release cycle bring most of the benefit with a lot less bitterness?

  6. Mook

    I suppose the abandonment of freezing interfaces was good, but only when viewed as stopping a charade. My pet bug for that is bug 269323, filed Nov 11, 2004 and WONTFIXed July 25, 2010. It concerns attempting to freeze an interface that hasn’t changed (other than comments) since Feb 24, 2003 (up to current mozilla-central). Most people attempting to use it really just wanted the OS-native toplevel window handle. Simply put, not nearly enough interfaces were frozen to be useful - and it’s not even as if it was impossible to simply stop supporting that interface completely; nsIEnumerator (!= nsISimpleEnumerator) pretty much died without any problems because the API was horrible for JS consumers.

    The avoidance of interface changes only happened on release branches, never trunk - on trunk, interfaces can (and do!) change at will. The only people that tried to avoid interface changing were doing that because binary extensions were using them - and that was really a symptom of not freezing interfaces early enough, since people were obviously finding them useful enough to try to use externally.

    Not that it matters, anyway - the declaration to discard the concept of frozen interfaces happened in Firefox 4 (Gecko 2), but rapid releases actually started with Firefox 5 (Gecko 5). The release scheduling has nothing to do with interfaces, and tying the two together is just muddying the waters.

    Binary components are being removed due to perceived (and actual) crashiness; however, there seems to have been little work on the Mozilla side to try to help external developers make them less crashy. Instead, it’s just outright banning with no adequate replacement. (JS-ctypes is still rather poorly documented, and has few users in the tree other than over-complicated and condensed test cases.) Much like the rapid release cycle before silent updates, this seems to be a case of putting the cart before the horse.

    If addons are for power users, then 85% of all Firefox users are power users. Regardless of what market you want to be in, that’s a pretty big chunk of the actual user base :)

    I think the reason Mozilla is having trouble catching up to WebKit is two-fold: one, the obvious problems with a codebase from the mid-90s that wanted to be compatible with a whole bunch of systems that don’t really do C++; and two, a consistent problem in getting external contributors. The first part results in the second somewhat, of course - but also, reviews are too slow to maintain momentum, and core developers all work on the same product for the same company, with no regard for external developers, and are therefore alienating people with frequent breakage. I don’t find it particularly surprising (given past and present attitudes) that the only modules with an owner who didn’t get there via employment at MoCo are xptcall (timeless) and jsd (timeless again, and as far as I know he isn’t involved in the jsd2 work).

    Even with all the complaining, of course, I’m likely to stick around - it’s just been frustrating seeing what seems to be a reality distortion field around people involved in MoCo/MoFo with little acknowledgement of the pain of the people who don’t ship Firefox. I certainly didn’t intend to single you out - this post’s just sort of been the straw that broke the camel’s back. If you can think of better venues where feedback would actually be listened to, I’d appreciate any information on that.

    1. Gavin Sharp

      Most of the modules have had their owners since before MoCo even existed, so your point about ownership having been obtained via MoCo employment is demonstrably false.

      The pain of people who don’t ship Firefox exists because the majority of the Mozilla project believes that Firefox is the best tool we have to have an impact on the Web, and to fulfill Mozilla’s mission. It’s not a MoCo vs. the community issue at all.

      1. Mook

        I don’t think it matters to my point whether the entity was called Netscape, Mozilla Foundation, or Mozilla Corporation; the constant here is that external groups - and over the last decade or so there have been a few - have not been able to become central enough to the project.

        I think we disagree on what the Mozilla project is. I think you’re counting the people you currently know about contributing, and I’m thinking about various people who have tried to contribute.

        Anyway, I see a post in .planning that looks interesting…

  7. Jonathan

    Very soundly argued blog post, probably the best I’ve seen on the topic. Another thing you might want to mention is that addon developers who embrace the rapid release cycle will be able to drop backwards compatibility and release new versions of their addons for new versions of Firefox / Thunderbird only. Having tried it, it’s actually pretty awesome.

  8. Arpad Borsos

    Other things that should have been recognized as bad ideas: NSPR. How true. I never understood why it was necessary to recreate the types (bool for example) that are already in the stdlib.

    I also did some refactoring some time ago and remember one of the changes was rejected because it broke a frozen interface. Or wait, it was not frozen but extensions depended on it? Hm, no matter. It’s good to hear that some of that cruft is finally being cleaned up.

    1. Michael Kaply

      Other things that should have been recognized as bad ideas: NSPR.

      You really don’t understand how wrong you are.

      The cross platform nature of the Netscape code (including NSPR) allowed us to port it to OS/2 with incredible ease. The focus back then was on cross platform (as was already pointed out). It was even ported to BeOS.

      It’s easy to look at things now and say they were bad decisions, but unless you were there involved in making the decisions, you don’t have a clue.

      You do realize these technology decisions were made 15 years ago, right? The world was a very different place.

      1. Arpad Borsos

        I agree it may have been the best choice 15 years ago. And I haven’t been around for that long.

        But now, 15 years later, we have C99, C++11 and POSIX. It’s time to let the past be past and move on to the future ;-)

        Today these old remnants are slowing down progress, and having worked on a few cleanups in widget/ I know what a pain OS/2 and BeOS are and I will be happy to see them removed.

  9. Scott Baker

    I’m afraid I’ll have to disagree.

    I believe that this new rapid release cycle will be the thing that kills Firefox (it may already have). I’m a power user, and I hang out with a lot of other power users, and as was correctly stated, we live for the extensions while typical users probably don’t care. Keep in mind that it was extensions, to a large extent, that won Firefox all its initial recognition. However, due to the rapid release, extension developers can’t keep up, and the majority of the power users I have spoken with have already been forced to do one of two things:

    a) Turn off automatic updates (where I’m at for now)
    b) Switch to Chrome

    Power users are the ones friends and family come to for advice, the ones that install software on people’s machines who otherwise would be running whichever browser came on the thing when they bought the package from whichever vendor they got it from. Power users are your best marketing resource - and Mozilla has made it very clear to most power users that they simply don’t care about us - their own software ideals trump our needs. They have told their extension developers, who spent hours of their time building great things on top of an API, that their time is less valuable than Mozilla’s and that they’ll just have to keep giving their time to suit the same ideals or bugger off.

    This is not how you make a better web, it’s how you shoot yourself in the foot and promote people continuing to use browsers with perhaps known security flaws so they can actually get their work done.

    I love Firefox - it’s been my favorite browser. Sadly though, unless this changes, I can’t see continuing to use it, and I find that really sad. If there are a ton of bugs and significant platform changes are required to solve them - push a major release that breaks the API and then give people time to recover. Extension developers will understand that - what they won’t understand is doing that every few weeks, indefinitely.

    I’m probably sending this plea to deaf ears - which is also sad, but I would love to have Firefox back, I would love to see Mozilla come to its senses.

  10. James Napolitano

    The second thing to know is that the core browser (Gecko) suffers from enormous technical debt. Like any large, 15-year-old piece of software, we have code in abundance that was written under too much time pressure to get it right, was written so long ago that nobody remembers how it works, isn’t comprehensively tested, or any combination of the above.

    Can anyone provide more info about this? I know over the years there have been a large number of rewrites (dropping old gfx code for cairo/thebes, replacing the html parser, roc’s compositor work, etc.) and outright code removals (e.g. old plugin technologies like javaxpcom), with more planned. So, how many areas of the codebase (or what fraction of the codebase) would still be old cruft that no one knows how they work and rarely touch? As for the rest of the code, any thoughts on its overall quality or design? There’s also been lots of deCOMtamination work for years; any idea how much is left to be done? (I get the impression that it’s a lot!)

    We also have major components that reasonably seemed like good ideas at the time, but have since proven to be a hindrance (XUL, XBL, XPConnect, etc). We have other major components that should have been recognized as bad ideas at the time, but weren’t (XPCOM, NSPR, etc). And we have code for which there is no excuse at all.

    I suppose this (combined with the above) is part of why Mozilla didn’t push its XULRunner platform as much as some were calling for. IIUC, the underlying technologies were flawed and/or needed significant work to overhaul, refactor, and document. For instance, remote XUL had to be disabled due to years of security problems; fixing all of them would have essentially required rewriting the XUL code, which would have taken away from efforts to support new standards like XBL2. Thus it didn’t make sense to encourage more people to build on top of all these old technologies. This would have required more resources to support them and would have made it harder to make needed changes to Gecko. However, this wasn’t clearly stated, so people were left wondering why on earth Mozilla had this great platform it wasn’t furthering; it seemed like such a waste. (Another reason, of course, was that Mozilla wanted people building on top of the standard web platform.)

    1. Boris

      Well, obvious cruft that still needs fixing/removing:

      • Replacing XBL1 with a saner component model now that we have a better idea of what that would be like.
      • Replacing the old somewhat-insane XUL flexbox model with CSS3 flexbox.
      • Rearchitecting block layout.
      • Rearchitecting inline layout.
      • Changing the method signatures on nsISupports (e.g. changing the ABI/API of QueryInterface).
      • Finishing changes to how we store the DOM tree.
      • Better DOM bindings.

      That’s in addition to things like IonMonkey, changes to align better with html5/webapps and so forth.

      In brief, you name it, it probably needs improvement. This is not exactly isolated to Gecko. ;)

  11. Ron
    And in the long run, stability will come back.

    The problem is, because Mozilla has been rapidly pushing things users don’t want at them, your users will not be coming back. Once you lose them, they’re gone.

  12. lo

    Hi!

    So what’s the deal with new releases updating the major version number?

    Why jump from 4.0 to 5.0, instead of simply using minor version numbers?

    It’s obviously not my main concern over this issue, but it seems like a stupid marketing trick stolen off Chrome and I had to ask.

    Thanks!

  13. Petter

    Any Firefox criticism is systematically being ignored.

    They don’t care about 3rd party developers, power users, or regular users for that matter.

    They have endless excuses as to why each and every opinion of disagreement does not count.

    In short, it’s their way or highway, and they’ve made it, and their arrogance, very clear.

    I invite everyone to do their share to reduce Firefox market share.

    Tell friends, colleagues, family – and insist they uninstall Firefox or better yet, do the honors for them. It’s all the more users for competition. IE8/9 are actually quite usable.

    The time has come to regain liberty from evil tyranny, the tyranny that is Mozilla.

    In the end, Mozilla will learn they’ve taken the long path of digging their own grave.

    Pride goeth before destruction, and an haughty spirit before a fall.

  14. Mozilla Arrogance

    So, the love for software candy, i.e. rapid release is going to HELP create an environment of dependable long-term software standards for the web? It’s going to improve the existence of strong QA test barriers? NOT! It’s no longer do-it-right-the-first-time now it’s fix it as you go, tweak it as you go, change it for no reason, wow, what progress.