Does the Linux community define success the way we do?

I can’t get my mouse (a Logitech Revolution) to work correctly in Ubuntu. There are a lot of non-working solutions to my problem on the net, and I’ve tried them all. I spent 4 hours working on it, which is probably the longest time I’ve ever worked on a home computer problem in my entire life. I never got it working, and still only use Ubuntu on my laptop and my PS3.  

I was prepared to actually take the jump and get rid of MS once and for all on my desktop, since my wife’s machine runs Vista and her laptop runs XP, my server/media center runs Vista, my iMac runs OS X, and my laptop runs Vista and SUSE. I have nearly every major OS category running in my home. It would have been nice to change my daily use machine to Linux in preparation for the dark and inevitable future of Trusted Computing. Trusted Computing is a set of software and hardware standards designed to uniquely identify PCs and other consumer electronics, in order to not only enforce restrictions but also discover uses of media and content that are not approved by the Trusted Computing Group and their shadowy media conglomerate clients. I am pretty concerned about Trusted Computing, and wanted to eventually move everything in the house to Linux to avoid what I perceive to be the shackling of my computer in servitude to a handful of companies that I don’t think are important enough to demand the crippling of hardware that I’ve spent my good money buying.

For now, moving 100% to Linux is something that I could do if I wanted to spend a lot of time on it, but it is my dream to install an OS, spend a few days configuring and tweaking (FUN! Troubleshooting: NOT FUN), and then sit back for a few weeks and enjoy my creation before installing something new. I’ve had this dream for a long time.

In 1996, you had to be a damn genius to get Slackware installed and working on a 486.

In 2000, you couldn’t get Linux working on your laptop to save your life even with the assistance of a local LUG, a Unix expert with a rack of equipment in his living room, and 3 or 4 professional developers.

In 2004, wifi was such a bitch to set up that I just gave up.

All these things have been greatly improved over time, but there is so much work left to do if anyone wants to deliver what the world really needs. The real goal of Linux, in my mind, should be to offer a powerful and free operating system that empowers not only developers and tech heads, but also regular people, low-income families that can’t afford to buy an OS, kids that don’t want to steal software for ethical reasons, and many others. Linux has the ability to bring all of these people together with just two changes…usability and design. I think that with changes to those key elements Linux could absolutely destroy Microsoft in the home. Small and medium business will not budge, not for a while, but there is real opportunity to reach people with Linux. It seems like the leaders of the Linux community understand and want this to happen. It is why they have chosen to devote so much time to Linux. The CEOs want to see their companies succeed and they want to be a positive force in the world for free software adoption and the openness it brings. In the end, though, these people do not direct the development of Linux. They advise, they shepherd, but they do not, and cannot, direct. The people working on Linux, in general, do not have to listen to anyone.

Linux is written by developers who often couldn’t care less whether stupid people get what they want or not…but those stupid people are the ones who will bring Linux into the spotlight. Those stupid people are also often computer enthusiasts who have no trouble in any other OS doing all sorts of things that make tech-phobic people say “Wow, I didn’t know computers could do that!!” In Linux, they can barely get a mouse working or share files with the other PCs in their home. If you know what you’re doing and have a little bit of luck and the right hardware, you can usually get it going even if you’re stupid, but is this really what it should take to do cool things with your PC?

The developers of Linux don’t understand usability and they don’t understand why it matters. Maybe they shouldn’t have to. They’re writing this stuff for themselves and they know they can handle all the idiosyncrasies that they gave birth to. They’re willing to sacrifice ease of use to get more functionality and faster development. Typical commercial development spends most of its time on the last 10% of the project, and this normally results in a little polish. Linux developers don’t care about polish and they don’t care about usability; they care about functionality. For functionality, Linux has everyone beat in a big way. This sounds great, but it comes at a cost. If I could rule the world as King and combine this power with usability, we could all have something quite amazing. The world could quite literally be a better place.

The people interested in the Linux community would do well to find usability experts to contribute to their projects, and graphic designers to help them design their UI, resulting in power, beauty, and the essential freedom and openness that come right along with GNU. Bringing these things together and giving them away for free would be one of the most important new developments in computing history. It would also force commercial developers to finally work on writing software for an open platform, and might even infect them with some ideals that would benefit society.

It seems like there is a real lack of participation from the artists of the world in the Linux community, and my suspicion is that the developers don’t really have a lot of respect for these roles. Really, why would they want to help some stupid person join their exclusive club? It is a badge of honor and distinction in the computing community when you’re running Linux at all, much less full time, and it has been that way for decades. I think that the Linux community as a whole, and the developers specifically, don’t want this to change. They’re proud of themselves, and rightfully so: it is HARD to become a Linux expert. CEOs like Shuttleworth should be commended for attempting to bring Linux to the masses, but I fear that the real soul of the community, the developers working in their bedrooms and home offices, will never let Linux become the kind of OS that could ever compete with Microsoft or Apple. They’re smart, we’re stupid, and they’ll damn sure not allow anyone to make their software easier for us to use, especially when it means that they might not be as special as they are now. They don’t want Linux to succeed by gaining market share and reaching people; they want it to succeed by remaining immensely powerful, customizable, functional, and CRYPTIC. Keeps the riff-raff out, and that is real success.

Full text indexing and RSS feeds in MySQL

Full text indexing is an amazing feature. Instead of relying on incredibly costly LIKE comparisons, the RDBMS builds an index over the words in the column, or columns, in question. For things like e-mail message bodies, biographies or other lengthy descriptions, or any other unstructured text, full text indexing is the only solution that makes sense. The LIKE method (for example: SELECT * FROM MyTable WHERE col1 LIKE '%MySQL';) cannot use an ordinary index when the pattern starts with a wildcard, so every row has to be scanned, and it is slow enough to bring a server to its knees if used often.

You enable full text indexing on MS SQL using the SSMS GUI (right click on a table and use the Full Text Index menu, if full text indexing has been enabled on that database) or through T-SQL. For more detailed instructions, check out this great article on developer.com. Once the full text index is created, searches on the indexed fields get a whole lot faster, provided you’re using the right T-SQL. I’ll assume that my MS SQL guru readers are already pretty familiar with that!

Creating an FTI in MySQL requires a little DDL, seen below:

mysql> CREATE TABLE articles (
-> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
-> title VARCHAR(200),
-> body TEXT,
-> FULLTEXT (title,body)
-> );

Once you’ve created and populated your MyISAM table (for a quick migration from another table with text values, try INSERT INTO myisam_table_name (x) SELECT x FROM innodb_table_name;) you can do some really cool things with the index, just like in MS SQL.
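As a rough sketch of what a first query might look like, here is the articles table from above with a few made-up rows (the sample data is purely an assumption for illustration). One thing to keep in mind on small test tables: in a MyISAM natural language search, a word that appears in half or more of the rows is treated like a stopword, so a tiny table can return nothing at all.

```sql
-- Hypothetical sample rows for the articles table defined above.
INSERT INTO articles (title, body) VALUES
  ('Backing up MySQL', 'Using mysqldump for nightly backups of a small database.'),
  ('Storage engines',  'MyISAM and InnoDB make different trade-offs.'),
  ('Offsite storage',  'Copying dump files to a remote server.'),
  ('Mouse trouble',    'Getting a Logitech mouse working under Ubuntu.');

-- The slow way: a leading wildcard forces a scan of every row.
SELECT id, title
FROM articles
WHERE body LIKE '%mysqldump%';

-- The full-text way: MATCH() lists the same columns as the FULLTEXT index.
-- Used in the select list, MATCH() returns a relevance score for each row.
SELECT id, title,
       MATCH (title, body) AGAINST ('mysqldump') AS relevance
FROM articles
WHERE MATCH (title, body) AGAINST ('mysqldump')
ORDER BY relevance DESC;
```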

On the documentation page from MySQL.com you can find instructions for ranking result sets by relevance using the MATCH() function, which compares nicely with the CONTAINSTABLE function in MS SQL, which returns both the key and the rank of each matching row in table form. You can also do simple boolean searches, much like the CONTAINS and FREETEXT predicates, using IN BOOLEAN MODE. Weighted results are supported using the syntax below:

(MATCH (title) AGAINST ('keywords' IN BOOLEAN MODE)*10)+
(MATCH (title) AGAINST ('keywords' IN BOOLEAN MODE)*50);
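The fragment above names the title column twice, so here is a fuller sketch under the assumption that the idea is to weight a hit in the title more heavily than a hit in the body. The index names, the search word, and the weights are all made up. On MyISAM, a boolean mode search can run without a matching FULLTEXT index, only slowly, so the sketch adds one index per column.

```sql
-- Assumption: one single-column index each, so each MATCH() below can use an index.
CREATE FULLTEXT INDEX ft_title ON articles (title);
CREATE FULLTEXT INDEX ft_body  ON articles (body);

-- Score title hits higher than body hits; the weights 50 and 10 are arbitrary.
SELECT id, title,
       (MATCH (title) AGAINST ('mysqldump' IN BOOLEAN MODE) * 50) +
       (MATCH (body)  AGAINST ('mysqldump' IN BOOLEAN MODE) * 10) AS score
FROM articles
WHERE MATCH (title, body) AGAINST ('mysqldump' IN BOOLEAN MODE)
ORDER BY score DESC;

-- Boolean operators: + means the word must be present, - means it must not be.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql -ubuntu' IN BOOLEAN MODE);
```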

Also, much like MS SQL, MySQL has some words that it doesn’t index…

  • Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is four characters.
  • Words in the stopword list are ignored. A stopword is a word such as “the” or “some” that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overridden by a user-defined list.
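Both limits are controlled by server variables (ft_min_word_len and ft_stopword_file for MyISAM). A minimal sketch of checking them, assuming the articles table from earlier:

```sql
-- Show the current full-text settings, including ft_min_word_len
-- and ft_stopword_file.
SHOW VARIABLES LIKE 'ft%';

-- These are set in the server configuration file and need a server restart.
-- After changing them, existing MyISAM full-text indexes must be rebuilt:
REPAIR TABLE articles QUICK;
```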

Some methods I haven’t found in MySQL are :

  • Inflectional search - For example, searching for swim but finding swimming, swam, etc.
  • Proximity based search - “tennis” within a few words of “racket” ranking much higher than either keyword by itself. Apparently, MySQL doesn’t store the position of the words in the index, so proximity searching may not arrive in the next few versions, if it ever comes at all!
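Neither of the following is a real equivalent, but as a hedged sketch, boolean mode does offer a few partial approximations: a truncation wildcard, exact phrase matching, and query expansion. The search terms are only examples against the articles table from earlier.

```sql
-- Truncation: matches words that begin with "swim" (swims, swimming),
-- but not irregular forms such as "swam", so this is not inflectional search.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('swim*' IN BOOLEAN MODE);

-- Phrase search: the words must appear together, in this order.
-- Stricter than proximity search, which would allow a few words in between.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('"tennis racket"' IN BOOLEAN MODE);

-- Query expansion: runs the search twice, adding words from the most
-- relevant rows of the first pass to the second pass.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('tennis' WITH QUERY EXPANSION);
```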

If any MySQL readers out there know how to do these in MySQL (obviously almost anything can be done in PHP, etc.), please drop me a line. You’ll receive full credit here and maybe even a nice PayPal thank you!

So as we can see, MySQL does full text indexing just like MS SQL for the most part, but there is a catch. A BIG catch. MySQL has multiple storage engines, and you can only full text index a MyISAM table!! This means that many of the databases out there are going to have a tough time with full text indexing if they plan to scale out and stay with InnoDB, unless we can find a solution. Quoted from Wikipedia, the following are some of the major differences between MyISAM and InnoDB:

  1. InnoDB recovers from a crash or other unexpected shutdown by replaying its logs. MyISAM must fully scan and repair or rebuild any indexes or possibly tables which had been updated but not fully flushed to disk. Since the InnoDB approach is approximately fixed time while the MyISAM time grows with the size of the data files, InnoDB offers greater perceived availability and reliability as database sizes grow.
  2. MyISAM relies on the operating system for caching reads and writes to the data rows while InnoDB does this within the engine itself, combining the row caches with the index caches. Dirty (changed) database pages are not immediately sent to the operating system to be written by InnoDB, which can make it substantially faster than MyISAM in some situations.
  3. InnoDB stores data rows physically in primary key order while MyISAM typically stores them mostly in the order in which they are added. This corresponds to the MS SQL Server feature of “Clustered Indexes” and the Oracle feature known as “index organized tables.” When the primary key is selected to match the needs of common queries this can give a substantial performance benefit. For example, customer bank records might be grouped by customer in InnoDB but by transaction date with MyISAM, so InnoDB would likely require fewer disk seeks and less RAM to retrieve and cache a customer account history. On the other hand, inserting data in orders that differ substantially from primary key (PK) order will presumably require that InnoDB do a lot of reordering of data in order to get it into PK order. This places InnoDB at a slight disadvantage in that it does not permit insertion order based table structuring.
  4. InnoDB currently does not provide the compression and terse row formats provided by MyISAM, so both the disk and cache RAM required may be larger. A lower overhead format is available for MySQL 5.0, reducing overhead by about 20% and use of page compression is planned for a future version.
  5. When operating in fully ACID-compliant modes, InnoDB must do a flush to disk at least once per transaction, though it will combine flushes for inserts from multiple connections. For typical hard drives or arrays, this will impose a limit of about 200 update transactions per second. If you require higher transaction rates, disk controllers with write caching and battery backup will be required in order to maintain transactional integrity. InnoDB also offers several modes which reduce this effect, naturally leading to a loss of transactional integrity. MyISAM has none of this overhead, but only because it does not support transactions.

Now that we’ve read through the details, let’s get to the point! InnoDB, MyISAM, and the other engines give administrators and developers flexibility in their decision making. MS SQL has only one storage engine. This brings us back to the Legos vs. furniture philosophy seen in the MySQL backups simplified post from last week. Unix is flexible, and Microsoft has a vision. You’ve got to work a little harder to make decisions in the Unix world, but it pays off when your tools match your problems. I encourage everyone interested in the MySQL storage engines and other database internals to check out Understanding MySQL Internals.

For those of you not yet familiar with the SQL Server storage engine, check out Delaney’s amazing work in Inside Microsoft SQL Server 2005: The Storage Engine.

My suggestion for this particular case, which requires searching based on a column of RSS text, is that we have both MyISAM and InnoDB tables in the same database. This won’t affect our backups, since we use mysqldump, and, for the time being, won’t affect our performance too negatively. Since we’ll only be storing the text data in the RSS text table, we don’t need to worry too much about foreign keys and transactions, which MyISAM does not support. A simple search and JOIN will solve all of our full text needs.
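A minimal sketch of that layout, with table and column names that are purely hypothetical (the final implementation may look quite different): the relational data lives in an InnoDB table, the searchable text lives in a MyISAM table keyed by the same id, and a JOIN brings them together.

```sql
-- Hypothetical schema: feed_items holds the relational data (InnoDB).
CREATE TABLE feed_items (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  feed_url VARCHAR(255) NOT NULL,
  published_at DATETIME NOT NULL
) ENGINE = INNODB;

-- feed_item_text holds only the searchable RSS text (MyISAM).
-- item_id carries the same value as feed_items.id; MyISAM enforces no foreign key.
CREATE TABLE feed_item_text (
  item_id INT UNSIGNED NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title, body)
) ENGINE = MYISAM;

-- Search the MyISAM table, then JOIN back to the InnoDB table for the rest.
SELECT i.id, i.feed_url, i.published_at, t.title
FROM feed_item_text AS t
JOIN feed_items AS i ON i.id = t.item_id
WHERE MATCH (t.title, t.body) AGAINST ('mysql backup')
ORDER BY i.published_at DESC;
```

Because nothing enforces the link between the two tables, the application has to insert and delete the matching rows itself.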

You can create a table with any storage engine your MySQL server supports using the following syntax:

CREATE TABLE t (i INT) ENGINE = MYISAM;

or

CREATE TABLE t (i INT) ENGINE = INNODB;

This is a really cool feature, and allows you to mix and match technologies for storage inside a single database, which is something quite different from MS SQL. Also, you could get really crazy and create an entirely new database for your MyISAM needs and use fully qualified names like SELECT * FROM databasename.tablename. Most people don’t recommend this.
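To see which engine each table currently uses, or to switch one over, something along these lines should work on MySQL 5.0 and later. The table t is the one from the example above; searchdb is a hypothetical database name.

```sql
-- List the storage engine of every table in the current database.
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE();

-- Convert an existing table to another engine.
-- This rebuilds the table, so it can take a while on large tables.
ALTER TABLE t ENGINE = MYISAM;

-- Query a table in another database by its fully qualified name.
SELECT * FROM searchdb.articles;
```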

More details to come with the final implementation, and look for an update to the backup post soon, as we document the FTP process for offsite backups!

WOW! A great new backup tool for MySQL using Amazon’s S3

For those of you that don’t know, Amazon’s S3 is a paid service for offsite storage that charges really competitive rates for data storage and transfer. I recently wrote a post about MySQL backups in general, and mysqldump specifically. Someone just took the solution I wrote about to a whole new level.

This link to Shanti Bradford’s blog YARB explains how the software works and gives examples for the very simple configuration. I am in love with this solution for smaller databases or funded projects with a few dollars to spare for a super clean offsite backup solution. Leave it to a Rails person to find the easiest and best solution to a problem that everyone has! Big ups Shanti!!

Easy file sharing using Foldershare

As an employee of an international corporation, my interests are always piqued when I hear the word replication.  When I learned of an acquisition by Microsoft of a product called Byte Taxi, I checked into what it had become within the walls of the largest computer company on earth.  I found my answer in Foldershare.

Foldershare is a new product in beta. It is a tiny application that sits in your taskbar and brokers connections for file sharing and replication between you and your friends, family, colleagues…whatever you want! It also allows some interesting file access to your own machines. If you had the desire, you could edit your hosts file from Bangladesh, all via the web. If you want to check it out with me, contact me on Google Talk at wharrislv.

It is a simple and free way to replicate files between your friends, work, home, etc. This is great for sharing photos amongst family, video files and software amongst friends, and the killer app: offsite backups! Set up replicated folders to your friends, get everyone a 1TB drive, and automate your offsite backups for free. The taskbar app is really lightweight, and almost all configuration is done on the web, which mirrors Vista’s UI style. User based access control allows share level permissions like read only or owner, so you can control what happens to your files. The client works on both Windows and OS X, and the service is based in the Live ecosystem. The developers keep a blog that could show some promise; I hope that they keep a real conversation going with the beta community.

The limit on files within a share is 10,000, which is too bad. I’d love to sync my music collection to work, but I’ll have to continue using the streaming functionality of Jukefly. The developers have said that they are considering an increase in the future, but for now we’re stuck with that limitation. Additionally, you can only sync files up to 2 GB. I don’t personally see the reason for these limitations, but I imagine they’re in place to get people ready for the inevitable switch to a payment-based model.

One glaring omission is the ability to control your bandwidth utilization. My measly 1 Mbps upstream can be saturated with replication at times, but overall the feature set makes it all worth it. You also cannot share network drives, which is a huge oversight in the modern world of network attached storage.

I encourage everyone to give this service a try; the fact that it is free for now makes it all the more attractive. Let me know what you think about the service in the comments below. I’m curious!

Becoming a great DBA, and how you should spend your time

 

How do you spend your time at work? If you’re a DBA, you should be spending it well. There are so many responsibilities that you will face as a DBA. Sometimes, you’ll need to put the time into tedious and difficult tasks while remaining interested and engaged, other times you’ll need to keep a certain mindset or attitude to maximize cooperation from coworkers. The information below is not a complete roadmap to success, but it is a great set of guidelines for someone looking to become a DBA in a small or midsized organization.

 

To begin with, and perhaps most importantly, you can spend much less time maintaining and tuning your systems if you understand the basics of relational theory and design principles. In order to gain the benefit of decades of research in the relational field, make sure that you are well aware of best practices, good relational design, and the fundamentals of databases in general. Skipping this step is the biggest mistake amongst DBAs. Know your craft at a fundamental level, or be forever forced to repeat the mistakes that we see every day in the MS SQL programming newsgroups.

Once you are sure that your system is well designed, you need to check on your systems, relentlessly and with no mercy. I spend at least an hour per day either checking on systems, or automating system maintenance and observation. This includes creating new alerts with various triggers, automating DMV reports to myself, and running maintenance via direct SQL or the SQL Agent. A lot of the time, a simple manual check on a machine can unearth problems that you haven’t imagined. When you find these, modify your maintenance plan and ensure that the problem doesn’t turn up on your other SQL instances. It is important that you visit your machines and have a look around occasionally.
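As one example of the kind of DMV check I mean, here is a rough sketch that lists what is running right now on SQL Server 2005 or later, longest-running first. Which columns matter will depend on your environment; a query like this could be scheduled with the SQL Agent and mailed out with Database Mail.

```sql
-- Currently executing requests, excluding this session, longest-running first.
SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.blocking_session_id,
       r.cpu_time,
       r.total_elapsed_time,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;
```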

Performance tuning your system is closely tied with the monitoring of your system. As you find problems and bottlenecks, solve them. Take care of your indexes and your code and take care of your performance in general. Ensuring that you can scale in the event of unexpected and unpredictable growth will allow your employers to make the kinds of leaps in growth that will keep them healthy. A healthy employer experiencing growth is much more likely to throw a poor DBA a bone when bonus time rolls around.
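For the index side of this, a minimal sketch of a fragmentation check in the current database follows. The thresholds (30 percent, 1000 pages) are assumptions to adjust for your own systems, not fixed rules.

```sql
-- Indexes in the current database that look fragmented enough to review.
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 30   -- assumed threshold
  AND ps.page_count > 1000                   -- ignore very small indexes
ORDER BY ps.avg_fragmentation_in_percent DESC;
```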

For further performance tuning, I spend time every day rewriting queries for performance. Going over my old code allows me to grow, and to use new methods that I learned through my research. Rewriting code is probably one of the best things you can do to improve your programming skills. Make sure that you know how to measure performance, so you can verify that your changes are actually improvements! Additionally, there are other people here at my office writing SQL. Developers, report writers, and management often roll their own pet projects that are eventually moved into production. I have to make sure that their code is either portable, or at least documented. Some of this code is really bad, and can take an inordinate amount of time to fix, depending on the talent of the original author.
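One simple way to measure, sketched here with a placeholder query, is to switch on the I/O and timing statistics, run the old and the new version of a query one after the other, and compare the logical reads and CPU time reported in the Messages output.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder: run the original query here, then the rewritten one,
-- and compare logical reads and CPU/elapsed time for each.
SELECT COUNT(*) FROM sys.objects;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```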

You must research new technology and read up on blogs and SQL newsgroups every day. Keeping up with the newest published material is also a necessity. It is very important to keep up with not only the newest developments in the industry, but also to go through older material to ensure that you haven’t missed anything important. A good DBA will be a voracious reader. Just last month I implemented some calendar tables after reading Celko’s newest book (featured on the sidebar to your right.) I try to read most of the new titles from major authors in the field like Celko, Delaney, Ben-Gan, and Date.

All that knowledge is worthless without execution. For a quick solution in an emergency, testing is very important! I spend a few days per month testing restores, disaster recovery, and basic administrative functions. Databases are mission critical in almost every case, and you need to make sure that your skills are sharp in case of a problem. Regularly practicing the scenarios listed above will put you miles above the rest when it really counts. I extend this to reviewing the names and functions of valuable troubleshooting DMVs, reviewing the location and purpose of my collection of SQL, and reviewing my ability to fix locking and performance issues. Do you know how to connect into the DAC and kill an out of control process if your CPU is pegged? Could you do it without looking up the processes and methods? If your DB is having major issues and people want it up right now, will you be distracted with your nose in a book, or fixing the problem?
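For practice, the DAC scenario might go something like the sketch below. The server name and the session id are placeholders, and the KILL is commented out on purpose.

```sql
-- Connect through the Dedicated Administrator Connection first, for example:
--   sqlcmd -S YourServerName -A
-- (or prefix the server name with ADMIN: when opening a query window).

-- Find the requests using the most CPU.
SELECT TOP (5)
       r.session_id,
       r.cpu_time,
       r.status,
       r.command,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50      -- by convention, skips most system sessions
ORDER BY r.cpu_time DESC;

-- Then end the offending session. 53 is a placeholder session id.
-- KILL 53;
```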

Paying attention to specifics is required in any job, but we lose our way without the big picture. As a database admin, or any IT professional, evaluating and implementing sweeping trends in IT is important if you want to stay on the cutting edge. Green IT and server virtualization are quite big right now (and somewhat related to each other). I can proudly say that my organization is on the bleeding edge here. You should examine these trends with great care to ensure that you are not missing out on the next jump in technology.

New technology inevitably leads to new projects. New projects are a major draw on my energy, and any rollout of a new system, especially when written in house, takes all my energy in design, planning, and implementation. If you’re like me, most of your time will be spent on supporting new databases and new code! Make sure that you spend some time each month planning for future growth and really putting hard numbers on paper. If you think you need to move to a new SAN in 6 months, tell your CFO and CEO now. They’ll work it into their projections, and you won’t be left caring for an inadequate system at the last moment. Not planning for the future always looks bad and inevitably ends up costing productivity for the entire organization, so even if you’re busy you need to do this.

In the absence of a major project being implemented, or perhaps due to requirements of an upcoming project, we all have to upgrade. You’ve got to keep up! Patching and upgrading don’t happen often, but when they do, they’re a huge drain on my time. Planning a rollout to 22 SQL servers littered all over the globe can be an enormous project. Planning and testing are key here, and should never be rushed. Although I’ve been working with SQL 2008 for months, I estimate that my half completed plan will take me another few months to design, and probably a week to implement. When I do implement my rollout, I’ll be forced to update nearly all of my documentation.

Documentation is a difficult process. Not only does it require you to spend what could very well be years on the inevitably poorly documented existing infrastructure, but any time a change is made the appropriate changes to documentation must be made. Using a change management system is very helpful in this respect, but will often inspire angry e-mails and hateful glances. Ignore these and do what you need to do to ensure that your system is properly documented. Brush up on your UML, or Visio if UML is too much, but never avoid documentation.

 

Complying with international standards like ISO, and local standards like a naming convention for objects, sounds very easy. It never is. Unless you are the conduit through which all changes in your databases are made, you will need to clean up other people’s messes. Stay steadfast in this, for it will make all the difference should anyone ever need to work on your system without you there. It will also ensure that your system is as portable as possible when the inevitable change happens, and it will make documentation much simpler, due to the standardized nature of your work.

One thing that most DBAs let slide until well after a production system is rolled out is security. Security is so important these days that I spend a few hours at least once or twice per week checking existing security, as well as really taking the time to think about how the system should be used, and making my security reflect that. The days of global dbo are over, and you’re inviting trouble if you let your security wane. Not only are you protecting yourself against malicious destruction, but you also need to protect against mistakes and well meaning but fatally flawed design changes. You should have a security plan for every database in your organization, and that plan should allow people to do ONLY what they need to do. Less is more, always, in access rights.
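A small sketch of what “only what they need” can look like in practice. The role, schema, and user names here are hypothetical, and your own security plan will dictate the real ones.

```sql
-- Hypothetical names: role report_readers, schema Sales, user ReportUser.
-- Give report writers read access to one schema and nothing else.
CREATE ROLE report_readers;
GRANT SELECT ON SCHEMA::Sales TO report_readers;
EXEC sp_addrolemember 'report_readers', 'ReportUser';

-- Review who holds which permissions in the current database.
SELECT pr.name AS principal_name,
       pr.type_desc,
       pe.class_desc,
       pe.permission_name,
       pe.state_desc
FROM sys.database_permissions AS pe
JOIN sys.database_principals AS pr
  ON pr.principal_id = pe.grantee_principal_id
ORDER BY pr.name, pe.permission_name;
```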

As for working relationships, be nice to your developers. Spend some time programming so that you can understand their side of the story. A good working relationship with your developers will ensure that they are willing to compromise when it counts. Sometimes, you need to compromise too, just make sure that you only compromise on things that don’t affect the integrity of your system. I find that delivering bad news to my developers is always easier after we’ve spent a recent Friday night at a concert or a bar.

To further your relationship with your management team, consider introducing them to Business Intelligence. Any organization with a large amount of data will want to analyze that information. Make sure that you are proficient with ETL processes and report writing. If you have a BI guy or girl, they will likely take care of this for you. If you don’t have one, spend some time on BI and show your boss what a database can do in the right hands. Don’t worry about overloading yourself; just be firm that if they want to continue on the BI path, they’ll need to hire someone else. They’ll evaluate the situation and decide if the additional cost is worth it. It almost always is, and you’ll be the new IT hero.

I think that if you follow the above to the letter, you will definitely be the best DBA in the world. Unfortunately no one person can hope to achieve it all at once. Hopefully you have some help, but if you don’t, don’t lose hope. Do your best and do as much as you can. Take pride in your work, as that is without a doubt the single most important thing that you can do.