Friday, June 17, 2005

How Genunix.Org got started (part 2 of 2)













Peter Losher, Al Hopper







Ben Rockwood

It's 16 hours before the public launch of OpenSolaris as I write this paragraph and I'm getting really excited, but I'm also really tired. I've been working furiously to try to get a community-run OpenSolaris site online in time to support launch. The actual hardware didn't arrive at Logical Approach until late afternoon on Monday the 6th of June. The site hardware consists of 4 Sun V20Z servers in a maxed out configuration - two 146 gigabyte drives, 8 gigabytes of memory and two AMD 252 (2.6GHz) Opteron processors. Three of the V20Zs and an N2120 Application Switch (aka Server Load Balancer (SLB)) were sponsored by Sun; the 4th V20Z was contributed by our company, Logical Approach.

Now if I didn't have a day job at Logical, having this hardware arrive on Monday afternoon, with a scheduled install in an internet co-location facility 1,700 miles away on the Friday of the same week, would be doable. A push, yes, but doable. Unfortunately we have our customers to look after and that week was pretty busy around here - aside from the ongoing Community Advisory Board (CAB) activity and trying to keep up with the OpenSolaris Pilot program and mailing lists that were becoming increasingly noisy as we approached launch.

We put off the planned Friday hardware install in Palo Alto until Saturday, shipped out 3 of the machines for overnight delivery on Thursday and then shipped out the 4th V20Z and the Application Switch on Friday for Saturday (next day) AM delivery. You don't want to know what our FedEx bill looked like!

But everything didn't go smoothly. In fact we got stymied by the Application switch configuration process. Now, you may already know this, but Load Balancers, as a class of tech toys, are complex devices. That is just the nature of the beast. Unfortunately, moving from one load balancer to another is like moving from one foreign country to another; almost everything you learned previously is instantly obsoleted - including the language. The terminology changes completely. The menu system changes; the order of the configuration setup steps changes. In short, you may even feel that your previous experience (with Foundry Networks ServerIron and Alteon Websystems (now subsumed by Nortel)) is more of a curse than a blessing. Again, that's the nature of the beast.

So I raised my hand for help (from Sun) on Friday around midday (Central Time). Yep .... that's a great time to ask for help! And finding the right person at Sun can be daunting, especially within such a large organization. It turns out that the N2120 Application switch is a product Sun acquired when they bought Nauticus Networks. It also happens that the right person to help configure the switch was off getting married. How inconsiderate of him (just kidding)!

So we shipped the switch, in a state of less than digital nirvana, overnight to Palo Alto and I was on the first flight on Saturday morning, departing from DFW (Dallas/Fort Worth) for San Francisco. The flight was great - the fun started after the plane landed. I had to check a roller-board type case, because it contained hand tools that would have been confiscated if I had tried to bring it onboard as carry-on baggage. It also contained a bunch of CAT-5 and CAT-6 ethernet cables - so it would very likely be given close scrutiny and hand checked.

After the flight landed at SFO, it took about 40 minutes for that bag to make it to the baggage area! Now you know why everyone uses carry-on baggage and all the storage space inside the cabin gets exhausted on most flights. Next up: Budget car rental. The first thing that was easy to see was that the entire car rental area at the airport was mobbed. I stood in line for about 30 minutes, got the paperwork done and then the lady helping me had an argument with someone responsible for getting cleaned cars ready for pickup in the nearby parking garage. She slammed down the phone angrily. Another woman cut across me and spoke sternly to her, asking "Why do I have to wait for my car?" The answer - because it had to be moved and cleaned. The woman looked at me and said "I don't know why I have to wait..." and hurried away still muttering. I guess she was in a rush to continue waiting. I asked if it would help if I could get a dirty car. "No" came the response, along with a look that told me to quit while I was still ahead. In the meantime I gave my ISC contact, Peter Losher, a heads-up message that I was going to be late at the CoLo. He had already begun his drive there from his office, after I told him I was in the car rental line. I was driving out of the car rental garage about 50 minutes after first getting in line! How is that for service! :(

So I got to the CoLo in record time. It's possible that I may have exceeded the legal speed limit on the drive there. Luckily I didn't have a law enforcement officer confirm whether I did or did not. So I'll admit to the possibility only! :) I arrived at the CoLo close to 1:30 PM - a far cry from the 11:00 AM planned time. Here I met up with Ben Rockwood, the other person crazy enough to try to make the community site happen on such a tight deadline.

So what happened with the App switch, I hear you ask? Well, help didn't arrive in time and we shipped it at the last possible moment on Friday evening. Luckily, and thanks to FedEx, everything made it to the CoLo and was waiting for us when we arrived. We immediately got down to work. The racks were too shallow for this current generation of 1U gear - which, as Peter rightly says, "grew in depth" to make up for what it lost in height! We had to mount the equipment on shelves. We had to move some shelves to get ones deep enough for the cabinet we were assigned. When it was time to mount the App switch, we could not find any more deep shelves, so we had no option but to mount it on an available shallow shelf in a neighbouring rack. Before we did that, I prevailed on everyone present to mount a shallow bracket in the same rack as the V20Zs and temporarily mount the App switch, with a human holding up the other end, so that we could get a group shot of all the equipment front panels. In the pictures you see, the 3rd person is actually holding up the rear of the App switch so that the other 2 people (person in front of the camera and person behind the camera) can snap a photo with all the equipment front panels visible.

We also had a funny incident where Ben, in his rush to get the job done, applied a mounting rail to the wrong side of a V20Z. This was before we figured out that we had to mount the servers on shelves. Since it was mounted backwards, its locking mechanism locked the rail in place and then became inaccessible. It proved very difficult to remove. This was one of those embarrassing moments that anyone would rather forget. Ben's ultimate solution, however, was ingenious. He removed the end tab from his tape measure and slid the tape down the narrow space between the rail and the computer case until he was able to poke at the locking mechanism and release it. Later that day, after he used the tape measure to measure rack spacing, the spring loaded retraction mechanism gobbled up the tape and it disappeared inside the case forever - since it no longer had the tab which would normally prevent that happening!

We ended up being practically thrown out of the CoLo by 4:00 PM - our work incomplete. This was unfortunate, because I had planned on spending considerable time at the CoLo, but Peter was supposed to be somewhere else at 3:00 PM and we had to leave.

The CoLo cage is a difficult environment to work in; it's noisy and you're constantly bombarded with hot and very dry air. We yelled at each other the entire time and I ended up dehydrated - because of the dry air and not drinking much water that day. We also missed a meal - and that took its toll on us all, especially since we'd all been working crazy hours that week. Peter was in the worst shape - he was still recovering from a really bad case of the flu.

Without the App switch, the original, carefully designed site architecture had to be discarded. We pretty much had to redesign it on the fly - and in such a way that if App switch config help arrived on Monday, we would still be able to make use of its load balancing capabilities. So some of the server ports were connected up to the App switch and others were connected to Peter's new HP 2824 gigabit switch. The IP addressing plan was obsoleted - since the App switch has NAT (Network Address Translation) facilities, and without it, we had no NAT capability. There were other features provided by the App switch which were also part of the design, which I won't go into. We barely had enough time to configure routable addresses on the SP (Service Processor) management ports and set them up. That was to be our point of entry into the system to make the other changes that were required without the App switch - since the servers had been configured on the bench per the original design (with the App switch present).
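To illustrate why losing NAT invalidated the addressing plan, here is a minimal sketch in Python. It simply flags which addresses in a plan are private (and so unreachable from the internet without something like the App switch translating for them). The host names and addresses are made up for illustration and are not the ones we actually used.

```python
import ipaddress

def needs_nat(addr):
    """Return True if addr is private address space, i.e. it cannot be
    reached from the public internet without address translation."""
    return ipaddress.ip_address(addr).is_private

# Hypothetical addressing plan: names and addresses are illustrative only.
plan = {
    "web1": "10.0.0.11",   # private: relies on NAT in front of it
    "web2": "10.0.0.12",   # private: relies on NAT in front of it
    "sp1": "8.8.8.8",      # a well-known public address standing in for a routable one
}

# Hosts that become unreachable from outside once the NAT device is gone.
stranded = sorted(name for name, addr in plan.items() if needs_nat(addr))
print("unreachable without NAT:", stranded)
```

Run against a real plan before leaving the bench, a check like this would show at a glance which machines need new, routable addresses if the NAT device drops out of the design.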

On Day 2 (Sunday) of the install I prevailed on Peter to allow me to work in the CoLo, but had to promise him it would only be an hour, maximum. I made the best use of the hour, doing a lot of tidying up work that should have been completed the day before. We also discussed the ISC network topology and peering and made plans to have ISC host our DNS records, at least initially. We also got the App switch console wired into a terminal server. But setting up remote access to the terminal server, and hence accessing the App switch console port, would be deferred until later that day. Populating the DNS data would also be left until Peter was in a much more hospitable work environment - his office.

Both Ben and I assumed we would have as many hours as we needed at the CoLo - but that was not to be, because the policies in place demanded that we be chaperoned. This was a big factor in the issues that plagued us later. We were simply too rushed and didn't have time to check, then double check, our work. After leaving the CoLo I jumped on an earlier flight and ended up back in the DFW area around 11:00 PM. Perfect. On the way home from the airport, I know I exceeded the speed limit. I had a police officer catch up with me about a mile after he first saw the car, and tell me my exact speed at the time: 82 MPH. The following morning I started the work of finishing up the machine configs and making the changes mandated by the lack of the App switch. I hit a "minor" (yeah right!) problem. I could connect to the V20Z SP (Service Processor), but I could not see a console login! I could not access the Operating System. Initially I took this in stride. I had only played with the SP facility on the V20Z previously, so it was just a case of reading the manual and figuring it out, or so I thought.

Meantime there were other fires to extinguish. I was still seeing DNS errors - our new domain, genunix.org, was not resolvable. Also I didn't know how to access the App switch console port remotely. A couple of calls/emails to Peter and he assured me that all would be well with DNS within 30 minutes and that we'd be able to get to the App switch console port soon. In the meantime I had an email from a gentleman at Sun offering support for the App switch. He emailed very early that morning (around 7:00 AM). But we didn't have console access to the App switch until around noon. In the meanwhile I had sent him a scaled back App switch specification and details of our (modified) addressing scheme. I told him that I was looking to gain access to the servers that were already (physically) connected to the App switch, but had been unable to figure out how to put the required switch ports in the correct VLAN and enable them. His first issue was not being able to resolve the name of the unix box that ultimately gains us console access to the switch serial port. Then, after he got the IP address, he could not connect to it (no route to host) from his office. Obviously their office internet access (DNS and routing) was totally foobarred. Last I heard he was working the issue (fire up a GPRS cell modem on his laptop or go home & use his home DSL), but it turns out he was unable to do either. So, I've been hacking on the App switch since about 4:00 PM (CDT) while Ben Rockwood took a fresh look at resolving the "no console access via the SP" issue - using any resources he could find online. We've probably hit an SP bug, in that the factory config (BIOS, SP code and OS, etc.) won't allow console access out of the box.
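In hindsight, the two failures described above (a name that won't resolve, then "no route to host") are easy to tell apart with a few lines of script. What follows is a minimal sketch in Python, not something we ran at the time; the function name check_host and the three status strings are my own invention for illustration.

```python
import socket

def check_host(name, port, timeout=3.0):
    """Classify reachability of a TCP service as 'dns-failure',
    'unreachable', or 'ok'."""
    try:
        # Step 1: can the name be resolved at all?
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    # Step 2: try each resolved address until one accepts a connection.
    for family, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "ok"
        except OSError:
            # Covers refused connections, timeouts and "no route to host".
            continue
    return "unreachable"
```

Calling something like check_host("genunix.org", 80) from a machine outside the CoLo would tell you whether you are waiting on DNS or on routing, which matters when you are deciding whom to phone.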

On the App switch, I got to the point where I could ping one of the private interfaces on the servers, but the switch does not appear to have an SSH client or a telnet client. The App switch specialist was able to confirm that it does not have SSH client capability. I came to the realization that it didn't have a telnet client all on my own, after I had achieved the ability to ping a server and went looking for the telnet command in the menu system. It does, however, have an SSH daemon and a telnet daemon. How strange. So by 6:00 PM Central, after 2 hours (wasted) hacking on the App switch config, I realized that access to the servers via the App switch had reached a dead end.

Meanwhile Ben is trying every trick he can think of to get around the SP to console login issue and making use of every resource he could think of, on the 'net. By about 8:15 PM or so this avenue was not yielding any results. So I talked with Ben and then reported the bad news to the OpenSolaris Pilot community and several people (via a CC list) that I had made promises to. It was bitterly disappointing.

Now it's 12:35 AM (CDT) on Tues the 14th: OpenSolaris Launch Day. Stories have already been posted and I got a call about 15 minutes ago from Ben, who says he has been successful in making arrangements to gain physical access to the CoLo. Pretty amazing. I was about to turn in ... but he'll need some help to test (now that's a novel idea, isn't it!) from the outside. So I'm working on this blog while waiting and then we'll see what we can get done.

1:00 AM (CDT): I get a call from Ben and we're in business thanks to his heroic efforts and the incredible co-operation of the ISC folks. So now we _start_ working the machines.

4:15 AM (CDT) and I just emailed Cyril Plisko and let him know that the SVN repository zone and logins are ready for his use - as promised. He has his own zone on the machine, called svn, and the fully qualified hostname is svn.genunix.org. He just logged in and I'm checking with Ben R to see if there is anything he needs help with.

4:30 AM: I get some sleep while Ben continues to finish up the "starter" site.

8:15 AM (CDT) - One hour and 45 minutes before launch: I send an email to Derek Cicero telling him that we're ready for content. The subject line reads: "Rabbit emerges out of hat". After receiving a reply with a URL on where to get the content, I wake up Ben (by phone) and send him Derek's URL. Minutes later, there's content available on genunix.org! :)

9:00 AM (CDT): I get on the Sun launch conference call and help out where possible. It was a great opportunity to "live" the launch event. I continued to watch our site (www.genunix.org) and do some cleanup & further testing on it. The press releases fly, the file downloads begin to fly, and everything goes very, very smoothly.

10:00 AM (CDT) Official Launch Time: Looking at /. (slashdot) I see that OpenSolaris gets a comparatively easy introduction to the world. Nothing much of a dreaded Linux jihad emerges - thankfully.

Later in the morning I settle into my "Day Job" and interact a little on IRC and the OpenSolaris mailing list. It was your typical Monday. BTW: I hate Mondays!

So let's summarize what went wrong
  • not setting realistic expectations and timelines
  • not anticipating air travel hassles
  • budgeting a 36 hour window of CoLo time; getting 3 1/2 hours
  • lack of testing
  • working a project while fatigued. Fatigue will allow you to make silly errors and not catch them. I know this from my pilot training.
  • delegating a task that looks like it can be done easily/quickly (DNS) and then failing to recognize that it's not a viable strategy.
  • using non "main stream" equipment that you're not familiar with.
  • assuming, that if you can "talk" to the SP, you can get a console login.
  • and did I mention lack of testing?

And let's summarize what went right
  • we pushed out 90 gigabytes of content on launch day.
  • we saw incredible cooperation from many, many exhausted individuals.
  • we saw Sun Nack, Ack and then deliver on hardware sponsorship in 18 days. That's pretty incredible for a company as large as Sun.
  • we got incredible co-operation from Peter and the folks at ISC.
In particular Peter worked really hard while still recovering from a bad dose of the 'flu and dealt with other disruptive events, like his laptop disk drive going on the blink and having to be replaced just before he went out of town on an important install.

There were many heroes in this tale. I've already mentioned some and I apologize to those I have left out. PS: send me email if you want something included and I'll update this blog.

25 comments:

Anonymous said...

Nice blog! Grats on getting the site up for the launch. Lots of spam comments in this blog - I don't read many blogs, so I guess that the spam is a normal evil.

Anyway, Good work!