Friday, June 17, 2005

Genunix.Org is Alive

Photo: equipment with the N2120 staged for the camera only

Take a look at GenUnix.Org. There's not much content there now, beyond a mirror of the OpenSolaris launch files and some video from the first OpenSolaris User Group meeting, but that'll change in the future. Cyril Plisko has an operational Subversion (SVN) source repository hosted at the site.

How Genunix.Org got started (Part 1 of 2)

Early in May, I got the idea to host an OpenSolaris Community/Mirror site. The first step was to leave a message for Paul Vixie of Internet Systems Consortium (ISC) - because I know that they currently host a bunch of other successful Open Source projects. I wanted to add OpenSolaris to that list.

Within a week I had been contacted by Peter Losher and we got an OK to proceed. I could hardly believe it - access to a clean one gigabit connection to the internet with the rackspace, power, cooling and bandwidth sponsored by ISC.

Next I needed to scrounge up some equipment. We (at Logical Approach) decided to sponsor the site with a maxed-out V20Z: two 146 gigabyte drives, 8 gigabytes of memory and two AMD 252 (2.6GHz) Opteron processors. This would ensure that a site would go online and indicate our commitment to this project. However, I was reluctant to bring up the site, to support the upcoming launch of OpenSolaris, with just one server. I wanted high performance ... but also realized that high reliability and high availability were primary requirements.

So I put together a generic technical spec - generic in that it described the basic architectural building blocks of the site, but did not specify vendor specific part numbers or detailed configuration. The spec also broke down the equipment into two procurement phases, which were called a Starter System Configuration and an Enhanced System Configuration. This would allow the site to go online with the starter config and, later, to be expanded to the enhanced config. Here is what the top level generic spec looked like:

Starter System Configuration Overview
  1. Server Load Balancer (aka Application Switch) standalone appliance with:
     • 12 * gigabit ethernet ports configured as:
       - 2 * optical ports to connect to the ISC infrastructure
       - 10 * copper UTP ports to connect to the web servers
     • 2 * A/C power supplies
  2. Four 1U dual AMD Opteron based rackmount servers, each configured with:
     • 2 * AMD Opteron 252 (2.6GHz) CPUs
     • 8 GB RAM
     • 2 * 146 GB U320 SCSI disk drives
     • 2 * built-in copper gigabit ethernet ports
     • 1 * dual-port gigabit ethernet expansion card

Enhanced System Configuration Overview
  1. One Fibre Channel (FC) SAN disk subsystem configured with:
     • 12 * 146 GB Fibre Channel 3.5" disk drives
     • 2 * RAID controllers with 1 GB cache each and battery backup
     • 4 * 2 Gb/sec FC host ports
     • 2 * A/C power supplies
  2. Four Fibre Channel host adapters, each with:
     • PCI 64-bit low profile form factor
     • 2 Gb/sec optical LC connectors
     • 2m optical cable

As you can tell, the reliability/availability comes from using a Server Load Balancer (SLB), aka Application Switch, to load balance incoming requests across multiple backend servers. The load balancer issues periodic health checks and, assuming all 4 servers are healthy, the requests will be distributed according to the selected load balancing algorithm to the available servers in the defined pool.

The real beauty of this approach is that you can also do scheduled maintenance on any of the servers by "telling" the SLB to take a particular server out of the available pool. You wait until all active sessions expire on the server, then disconnect it. Now you are free to upgrade or repair it. Let's assume you're upgrading the Operating System. After you've completed the upgrade, you have plenty of time to test exhaustively, because the other servers in the pool are serving your client requests. When you're satisfied that the upgraded server is ready for production, simply tell the SLB to put it back into the pool. Your user community experiences no impact and is completely unaware that you've just upgraded a server.
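
The health-check, drain and restore cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the N2120 (or any real load balancer) is implemented or configured: the class and method names (Backend, Pool, drain, restore and so on) are made up for this example, round-robin is just one possible load balancing algorithm, and the health probe is whatever function you pass in.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True       # result of the most recent health check
    in_rotation: bool = True   # False once an operator drains the server
    active_sessions: int = 0   # sessions currently pinned to this server


class Pool:
    """Hypothetical round-robin pool that skips unhealthy or drained servers."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._cursor = 0

    def _find(self, name):
        return next(b for b in self.backends if b.name == name)

    def run_health_checks(self, probe):
        # probe is any callable taking a Backend and returning True/False
        for b in self.backends:
            b.healthy = bool(probe(b))

    def pick(self):
        # Try each server at most once, starting at the round-robin cursor.
        n = len(self.backends)
        for offset in range(n):
            b = self.backends[(self._cursor + offset) % n]
            if b.healthy and b.in_rotation:
                self._cursor = (self._cursor + offset + 1) % n
                b.active_sessions += 1
                return b
        raise RuntimeError("no backend available")

    def release(self, backend):
        # Called when a client session ends.
        backend.active_sessions = max(0, backend.active_sessions - 1)

    def drain(self, name):
        # Stop sending NEW sessions; existing ones are left to finish.
        self._find(name).in_rotation = False

    def safe_to_disconnect(self, name):
        b = self._find(name)
        return not b.in_rotation and b.active_sessions == 0

    def restore(self, name):
        # Put the upgraded and tested server back into the pool.
        self._find(name).in_rotation = True
```

With four servers named web1 to web4, calling pool.drain("web2") means later pick() calls cycle through web1, web3 and web4 only; once safe_to_disconnect("web2") returns True the server can be pulled for its upgrade, and restore("web2") returns it to service.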

This architecture is also cost effective - because you consider each server as a throwaway server. I don't mean this literally. Each server can have a single power supply or a single SCSI bus, or non-mirrored disks - because if it fails, it will have little impact on the service you're providing. This is in stark contrast to using high end (read expensive) servers with multiple power supplies, multiple disk subsystem buses and mirrored disk drives.
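
A little arithmetic shows why cheap servers behind a load balancer can still add up to a reliable service. The sketch below is a back-of-the-envelope model, not a measurement: it assumes servers fail independently of each other, it ignores the load balancer itself as a possible point of failure, and the 99% per-server availability figure is invented purely for illustration. The function name pool_availability is also just a name chosen for this example.

```python
from math import comb


def pool_availability(server_availability, servers, needed=1):
    """Probability that at least `needed` of `servers` servers are up.

    Assumes every server is up with the same probability and that
    failures are independent (a simplifying assumption).
    """
    a = server_availability
    return sum(
        comb(servers, k) * a**k * (1 - a) ** (servers - k)
        for k in range(needed, servers + 1)
    )


# Hypothetical figure: each "throwaway" server is up 99% of the time.
one_server = pool_availability(0.99, servers=1)
four_servers = pool_availability(0.99, servers=4)            # any 1 of 4 up
four_need_two = pool_availability(0.99, servers=4, needed=2)  # at least 2 of 4 up
```

Under these assumptions the chance that all four servers are down at once is 0.01 to the fourth power, so the pool as a whole is unavailable far less often than any single server. The needed parameter lets you ask the stricter question of how often enough servers are up to carry the load.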

Next the generic spec was translated into a detailed vendor specific specification, including a parts list. Of course I preferred that Sun would provide hardware sponsorship - so there was a little Sun bias in the original generic spec. For the servers, I really wanted to use the Sun V20Z - it's an awesome server based on the AMD Opteron processor and runs Solaris based applications with impressive speed and efficiency.

I ran the spec by the other members of the Community Advisory Board (CAB) as a sanity check. No feedback = good news. Next I presented it to Jim Grisanzio and Stephen Harpster. Initially I got a No - for various reasons. Then Simon Phipps (also a CAB member) told me to forward the proposal to John Fowler.

In the meantime, I was busy upgrading Logical's V20Z with the required new CPUs, expanded memory capacity and a couple of 146 GB disk drives. Unfortunately the new CPUs were not compatible with the existing motherboard or Voltage Regulator Modules (VRMs). The V20Z uses separate VRMs for the CPU and the memory. The Sun 252 processor upgrade kits came with the required VRMs - so that was not an issue. But the included documentation indicated the requirement for a revision K2.5 motherboard, or, in Sun's terminology, the Super FRU Chassis assembly, where FRU means Field Replaceable Unit. Since this was a Sun supplied upgrade, I called Sun's tech support and explained the issue. In less than an hour I had a case number and was told that a replacement motherboard would be dispatched.

It takes about one hour of careful work to strip your existing motherboard and "transplant" the parts [1] to the replacement, and then about 10 minutes to install the new CPU, heatsink, CPU and memory VRMs. It helps if you are comfortable working on PC hardware - if not, I'd recommend that you find someone who is. Two (big) advantages of the updated motherboard are (IMHO) quieter speed controlled fans and support for DDR400 memory parts (with the upgraded CPU).

On June 1 an email arrived with the news I had been awaiting. Bill Channel now had my request, via John Fowler, for Hardware sponsorship and he was ready to get started on making this happen! :)

The hardware was scheduled for delivery on Monday June 6th.

Note [1]: CDROM/floppy assembly, SCSI backplane, PCI risers, Power Supply, SCSI backplane cable assemblies, daughter board with keyboard/mouse connectors, memory, disk drive(s).

Continued in Part II.

How Genunix.Org got started (part 2 of 2)

Photos: Peter Losher, Al Hopper, Ben Rockwood

It's 16 hours before the public launch of OpenSolaris as I write this paragraph and I'm getting really excited, but I'm also really tired. I've been working furiously to try to get a community run OpenSolaris site online in time to support launch. The actual hardware didn't arrive at Logical Approach until late afternoon on Monday the 6th of June. The site hardware consists of 4 Sun V20Z servers in a maxed-out configuration - two 146 gigabyte drives, 8 gigabytes of memory and two AMD 252 (2.6GHz) Opteron processors. Three of the V20Zs and an N2120 Application Switch (aka Server Load Balancer (SLB)) were sponsored by Sun; the 4th V20Z was contributed by our company, Logical Approach.

Now if I didn't have a day job at Logical - having this hardware arrive on Monday afternoon, with a scheduled install in an internet co-location facility 1,700 miles away on the Friday of the same week, would be doable. A push, yes, but doable. But, unfortunately, we have our customers to look after and that week was pretty busy around here - aside from the ongoing Community Advisory Board (CAB) activity and trying to keep up with the OpenSolaris Pilot program and mailing lists that were becoming increasingly noisy as we approached launch.

We put off the planned Friday hardware install in Palo Alto, until Saturday, shipped out 3 of the machines for overnight delivery on Thursday and then shipped out the 4th V20Z and the Application Switch on Friday for Saturday (next day) AM delivery. You don't want to know what our FedEx bill looked like!

But everything didn't go smoothly. In fact we got stymied by the Application switch configuration process. Now, you may already know this, but Load Balancers, as a class of tech toys, are complex devices. That is just the nature of the beast. But, unfortunately, moving from one load balancer to another is like moving from one country to another foreign one; almost everything you learned previously is instantly obsoleted - including the language. The terminology changes completely. The menu system changes; the order of configuration setup steps changes. In short, you may even feel that your previous experience (with Foundry Networks ServerIron and Alteon Websystems (now subsumed by Nortel)) seems more like a curse than a blessing. Again, that's the nature of the beast.

So I raised my hand for help (from Sun) on Friday around midday (Central Time). Yep .... that's a great time to ask for help! And finding the right person at Sun can be daunting, especially within such a large organization. It turns out that the N2120 Application switch is a product Sun acquired when they bought Nauticus Networks. It also happens, that the right person to help configure the switch was off getting married. How inconsiderate of him (just kidding)!

So we shipped the switch in a state of less than digital nirvana, overnight to Palo Alto, and I was on the first flight on Saturday morning, departing from DFW (Dallas/Fort Worth) for San Francisco. The flight was great - the fun started after the plane landed. I had to check a roller-board type case, because it contained hand tools that would have been confiscated if I had tried to bring it onboard as carry-on baggage. It also contained a bunch of CAT-5 and CAT-6 ethernet cables - so it would very likely be given close scrutiny and hand checked.

After the flight landed at SFO, it took about 40 minutes for that bag to make it to the baggage area! Now you know why everyone uses carry-on baggage and all the storage space inside the cabin gets exhausted on most flights. Next up: Budget car rental. The first thing that was easy to see was that the entire car rental area at the airport was mobbed out. I stood in line for about 30 minutes, got the paperwork done and then the lady helping me had an argument with someone responsible for getting cleaned cars ready for pickup in the nearby parking garage. She slammed down the phone angrily. Another woman cut across me and spoke sternly to her, asking "Why do I have to wait for my car?". The answer - because it had to be moved and cleaned. The woman looked at me and said "I don't know why I have to wait..." and hurried away still muttering. I guess she was in a rush to continue waiting. I asked if it would help if I could get a dirty car. "No" came the response, along with a look that told me to quit while I was still ahead. In the meantime I gave my ISC contact, Peter Losher, a heads up message that I was going to be late at the CoLo. He had already begun his drive there from his office, after I told him I was in the car rental line. I was driving out of the car rental garage about 50 minutes after first getting in line! How is that for service! :(

So I get to the CoLo in record time. It's possible that I may have exceeded the legal speed limit on the drive there. Luckily I didn't have a law enforcement officer confirm whether I did or did not. So I'll admit to the possibility only! :) I arrived at the CoLo close to 1:30 PM - a far cry from the 11:00 AM planned time. Here I met up with Ben Rockwood, the other person crazy enough to try to make the community site happen on such a tight deadline.

So what happened with the App switch, I hear you ask? Well, help didn't arrive in time and we shipped it at the last possible moment on Friday evening. Luckily, and thanks to FedEx, everything made it to the CoLo and was waiting for us when we arrived. We immediately got down to work. The racks were too shallow for this current generation of 1U gear - which Peter rightly says "grew in depth" to make up for what it lost in height! We had to mount the equipment on shelves. We had to move some shelves to get the ones deep enough for the cabinet we were assigned. When it was time to mount the App switch, we could not find any more deep shelves, so we had no option but to mount it, on an available shallow shelf, in a neighbouring rack. Before we did that, I prevailed on everyone present to mount a shallow bracket in the same rack as the V20Zs and temporarily mount the App switch, with a human holding up the other end, so that we could get a group shot of all the equipment front panels. In the pictures you see, the 3rd person is actually holding up the rear of the App switch so that the other 2 people (person in front of the camera and person behind the camera) can snap a photo with all the equipment front panels visible.

We also had a funny incident, where Ben, in his rush to get the job done, applied a mounting rail to the wrong side of a V20Z. This was before we figured out that we had to mount the servers on shelves. Since it was mounted backwards, its locking mechanism locked the rail in place and then became inaccessible. It proved very difficult to remove. This was one of those embarrassing moments that anyone would rather forget. Ben's ultimate solution, however, was ingenious. He removed the end tab from his tape measure and slid the tape down the narrow space between the rail and the computer case, until he was able to poke at the locking mechanism and release it. Later that day, after using the tape measure to measure rack spacing, the spring loaded retraction mechanism gobbled up the tape and it disappeared inside the case forever - since it no longer had the tab which would normally prevent that happening!

We ended up being practically thrown out of the CoLo by 4:00 PM - our work far from complete. This was unfortunate, because I had planned on spending considerable time at the CoLo, but Peter was supposed to be somewhere else at 3:00 PM and we had to leave.

The CoLo cage is a difficult environment to work in; it's noisy and you're constantly bombarded with hot and very dry air. We yelled at each other the entire time and I ended up dehydrated - because of the dry air and not drinking much water that day. We also missed a meal - and that took its toll on us all. Especially since we'd all been working crazy hours that week. Peter was in the worst shape - he was still recovering from a really bad case of the flu.

Without the App switch, the original, carefully designed site architecture had to be discarded. Also, we pretty much had to redesign it on the fly - so that if App switch config help arrived on Monday, we would still be able to make use of its load balancing capabilities. So some of the server ports were connected up to the App switch and others were connected to Peter's new HP 2824 gigabit switch. The IP addressing plan was obsoleted - since the App switch has NAT (Network Address Translation) facilities, and without it, we had no NAT capability. There were other features provided by the App switch which were also part of the design, which I won't go into. We barely had enough time to configure routable addresses on the SP (Service Processor) management ports and set them up. That was to be our point of entry into the system to make the other changes that were required without the App switch - since the servers were configured per the original design (with the App switch present) while they were on the bench.

On Day 2 (Sunday) of the install I prevailed on Peter to allow me to work in the CoLo, but had to promise him it would only be an hour, maximum. I made the best use of the hour, doing a lot of tidying up work that should have been completed the day before. We also discussed the ISC network topology and peering and made plans to have ISC host our DNS records, at least initially. Also we got the App switch console wired into a terminal server. But setting up remote access to the terminal server, and hence accessing the App switch console port, would be deferred until later that day. Also populating the DNS data would be left until Peter was in a much more hospitable work environment - his office.

Both Ben and I assumed we would have as many hours as we needed at the CoLo - but that was not to be, because the policies in place demanded that we be chaperoned. This was a big factor in the issues that plagued us later. We were simply too rushed and didn't have time to check, then double check, our work.

After leaving the CoLo I jumped an earlier flight and ended up back in the DFW area around 11:00 PM. Perfect. On the way home from the airport, I know I exceeded the speed limit. I had a police officer catch up with me about a mile after he first saw the car, and tell me my exact speed at the time: 82 MPH.

The following morning I started the work of finishing up the machine configs and making the changes mandated by the lack of the App switch. I hit a "minor" (yeah right!) problem. I could connect to the V20Z SP (Service Processor), but I could not see a console login! I could not access the Operating System. Initially I took this in stride. Knowing that I had only played with the SP facility on the V20Z previously - it was just a case of reading the manual and figuring it out, or so I thought.

Meantime there were other fires to extinguish. I was still seeing DNS errors - our new domain was not resolvable. Also I didn't know how to access the App switch console port remotely. A couple of calls/emails to Peter and he assured me that all would be well with DNS within 30 minutes and that we'd be able to get to the App console port soon. In the meantime I had an email from a gentleman at Sun offering support for the App switch. He emailed very early that morning (around 7:00 AM). But we didn't have console access to the App switch until around noon. In the meantime I had sent him a scaled back App switch specification and details of our (modified) addressing scheme. I told him that I was looking to gain access to the servers that were already (physically) connected to the App switch, but had been unable to figure out how to put the required switch ports in the correct VLAN, and enable them.

His first issue was not being able to resolve the name for the unix box that ultimately gains us console access to the switch serial port. Then, after he got the IP address, he could not connect to it (no route to host) from his office. Obviously their office internet access (DNS and routing) was totally foobarred. Last I heard he was working the issue (fire up a GPRS cell modem on his laptop or go home & use his home DSL), but it turns out he was unable to do either. So, I've been hacking on the App switch since about 4:00 PM (CDT) while Ben Rockwood took a fresh look at resolving the "no console access via the SP" issue - using any resources he could find online. We've probably hit an SP bug, in that the factory config (BIOS, SP code, OS etc.) won't allow console access out of the box.

On the App switch, I got to the point where I could ping one of the private interfaces on the servers, but the switch does not appear to have an SSH client or a telnet client. The App switch specialist was able to confirm that it does not have SSH client capability. I came to the realization that it didn't have a telnet client all on my own, after I had achieved the ability to ping a server and went looking for the telnet command in the menu system. It does, however, have an SSH daemon and a telnet daemon. How strange. So by 6:00 PM Central, after 2 hours (wasted) hacking on the App switch config, I realized that access to the servers via the App switch had reached a dead end.

Meanwhile Ben is trying every trick he can think of to get around the SP to console login issue and making use of every resource he could think of, on the 'net. By about 8:15 PM or so this avenue was not yielding any results. So I talked with Ben and then reported the bad news to the OpenSolaris Pilot community and several people (via a CC list) that I had made promises to. It was bitterly disappointing.

Now it's 12:35 AM (CDT) on Tues the 14th: OpenSolaris Launch Day. Stories have already been posted and I got a call about 15 minutes ago from Ben, who says he has been successful in making arrangements to gain physical access to the CoLo. Pretty amazing. I was about to turn in ... but he'll need some help to test (now that's a novel idea, isn't it!) from the outside. So I'm working this blog while waiting and then we'll see what we can get done.

1:00 AM (CDT): I get a call from Ben and we're in business thanks to his heroic efforts and the incredible co-operation of the ISC folks. So now we _start_ working the machines.

4:15 AM (CDT) and I just emailed Cyril Plisko and let him know that the SVN repository zone and logins are ready for his use - as promised. He has his own zone on the machine, called svn, and the fully qualified hostname is He just logged in and I'm checking with Ben R to see if there is anything he needs help with.

4:30 AM: I get some sleep while Ben continues to finish up the "starter" site.

8:15 AM (CDT) - One hour and 45 minutes before launch: I send an email to Derek Cicero telling him that we're ready for content. The subject line reads: "Rabbit emerges out of hat". After receiving a reply with a URL on where to get the content, I wake up Ben (by phone) and send him Derek's URL. Minutes later, there's content available on! :)

9:00 AM (CDT): I get on the Sun launch conference call and help out where possible. It was a great opportunity to "live" the launch event. I continued to watch our site ( and do some cleanup & further testing on it. The press releases fly and the file downloads begin to fly, everything goes very, very smoothly.

10:00 AM (CDT) Official Launch Time: Looking at /. (slashdot) I see that OpenSolaris gets a comparatively easy introduction to the world. Nothing much of a dreaded Linux jihad emerges - thankfully.

Later in the morning I settle into my "Day Job" and interact a little on IRC and the OpenSolaris mailing list. It was your typical Monday. BTW: I hate Mondays!

So let's summarize what went wrong
  • not setting realistic expectations and timelines
  • not anticipating air travel hassles
  • budgeting a 36 hour window of CoLo time; getting 3 1/2 hours
  • lack of testing
  • working a project while fatigued. Fatigue will allow you to make silly errors and not catch them. I know this from my pilot training.
  • delegating a simple task that looks like it can be done easily/quickly (DNS) and then failing to recognize that it's not a viable strategy.
  • using non "mainstream" equipment that you're not familiar with.
  • assuming, that if you can "talk" to the SP, you can get a console login.
  • and did I mention lack of testing?
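
Several of the items above (DNS, console access, lack of testing) are the kind of thing a small pre-flight script can catch before you leave the CoLo. What follows is a minimal sketch of that idea, not the procedure we actually used: the function names are invented for this example, and the hostname site.example and ports 80 and 22 are placeholders to be replaced with whatever your own site needs.

```python
import socket


def dns_resolves(hostname, resolver=socket.gethostbyname):
    """Return True if hostname resolves. resolver can be swapped out for testing."""
    try:
        resolver(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable
        return False


def run_checks(checks):
    """checks is a list of (label, zero-argument callable); returns failed labels."""
    return [label for label, check in checks if not check()]


if __name__ == "__main__":
    # Placeholder names and ports; substitute your own.
    failed = run_checks([
        ("DNS for site.example", lambda: dns_resolves("site.example")),
        ("web port on site.example", lambda: tcp_port_open("site.example", 80)),
        ("ssh port on site.example", lambda: tcp_port_open("site.example", 22)),
    ])
    print("all checks passed" if not failed else "FAILED: " + ", ".join(failed))
```

The point is less the code than the habit: write the list of things that must work from the outside, and run it from the outside, before the cage door closes behind you.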

And let's summarize what went right
  • we pushed out 90 gigabytes of content on launch day.
  • we saw incredible cooperation from many, many exhausted individuals.
  • we saw Sun Nack, Ack and then deliver on hardware sponsorship in 18 days. That's pretty incredible for a company as large as Sun.
  • we got incredible co-operation from Peter and the folks at ISC.
In particular Peter worked really hard while still recovering from a bad dose of the 'flu and dealt with other disruptive events, like his laptop disk drive going on the blink and having to be replaced just before he went out of town on an important install.

There were many heroes in this tale. I've already mentioned some and I apologize to those I have left out. PS: send me email if you want something included and I'll update this blog.

Saturday, January 22, 2005

Linus must step aside

Have you noticed that company founders or leaders often give up their pivotal role in a company that they have founded or been instrumental in leading? They either step aside or are forced out. Why is this? Because, in order for the company to continue to make progress and grow, they need to step aside. The smart ones recognize this - the dumb ones..... oh well.

So let's examine this phenomenon. The people we are talking about usually share the following character traits: They are brilliant, very talented, visionary and very demanding to work for. These character traits are what make them different and allow them to create a company or product that others are incapable of. Those are the upsides. But there are corresponding downsides.

They are usually a royal pain in the ask to work with. Highly opinionated, very judgmental and apt to be very stubborn. They can also be inflexible. Again the smart ones recognize their weaknesses and surround themselves with other talented folk who help to balance out their personalities. It's not uncommon to find company partners with very different personalities and styles. And the dumb ones ... well no one can possibly be as smart or as talented as they are, and there's nothing wrong with their personalities anyway - so why even bother "playing well with others"!

So here's my point: Linus Torvalds must step aside and let Linux flourish. Linux has reached the personal limitations of Linus - its creator and mentor. It's currently limited by the mental boundaries and personality of its founder. Oh and yes - it appears that Linus does not recognize this problem and does not understand that he has to step aside.

Let's take a look at a serious limitation of Linux (the OS) which is a direct result of the limitations of Linus: Lack of a stable kernel API. According to Linus - having rigid APIs would limit the creativity of the kernel developers. Well ... yes it would, but it would also bring some discipline to the kernel code and it would allow a driver developer to deliver a device driver that does not have to be re-written every time the APIs change. It would also stop hundreds of developers from constantly rewriting and retesting their code every time the APIs change. It would also force the kernel developers to think with their minds and not with their keyboards!

But is this doable? Can the kernel APIs remain stable and not stifle developer creativity? Answer: Yes and yes. Look at Solaris 10 and the DTrace facility. Over 40,000 tracepoints in the kernel with negligible impact on performance, and yet, the tens of thousands of lines of code that I've written, going back to Solaris 2.5 and earlier, still run on Solaris 10 without any changes! And the same code runs on SPARC and Solaris x86 - with just a simple recompile. Time is money - and just think of the dollars saved by not having to constantly rewrite and retest Solaris based code.

On the flipside Linux has one thing going for it that Solaris does not have - a vibrant and active volunteer "army" of developers. But that's about to change when OpenSolaris goes live later this year. I'm a member of the OpenSolaris Pilot program and it's interesting and exciting to be perusing the crown jewels of Sun ... Solaris source code. Just think of it; you're looking at the fruits of the labors of hundreds of man years of effort from some of the most talented developers on the planet. Awesome.

So step aside Linus - or be run over by the OpenSolaris juggernaut.