Entries Tagged 'Instructional' ↓

Giving a quick intro to SQL Server class in Las Vegas, NV

I’m planning to give an “Intro to Microsoft SQL Server” class for a friend who is interested in learning SQL server for work.  I’d be covering basic administration via SSMS and SQL, commonly used SQL like INSERT/UPDATE/DELETE with JOINS, and maybe some basic performance tuning/evaluation/troubleshooting via DMVs, execution plans, logs, etc.  It might take about 2-3 hours, and would only be useful for those with little to no experience.  Would anyone be interested in coming along?  It would be nice to show a few people while he’s learning, and would provide more people for asking questions, talking about confusing parts of my lesson, etc.

If you’re interested, or know someone who is, just email me or have them contact me directly, or leave a comment here.  We haven’t set a date, yet, but it will probably be next weekend.

Storing UTC datetime in your database? Is that really how you want to see your dates in reporting services?

If you’ve got your dates and datetimes stored as UTC in your database, but want to show the local time value on your reports in SQL Reporting Services, there is a simple solution for this!

 

Change the expression for the filed to the code below :

=System.TimeZone.CurrentTimeZone.ToLocalTime(Fields!DateTime.Value)

 

This converts your UTC datetime field to the local time zone value.

In MySQL, Count(*) or LIMIT (1) to determine an empty set?

I want to find out if I’ve got an empty set or not.  In MS SQL I’d just use IF EXISTS and write a branch for either outcome, but I don’t have that luxury in MySQL.  So, if I want to find out if a set is empty, which one of the following will be faster?

 

SELECT * FROM `table` WHERE (`table`.app_id = 455) LIMIT 1

 

vs

 

SELECT count(*) AS count_all FROM `table` WHERE (`table`.app_id = 455)

 

The answer here is the LIMIT is probably going to be faster, and the key lies in the WHERE clause. 
If you run a count in MySQL on an indexed column, it is smart enough to use the cardinality of that column’s index to report to you the count.  This makes the count method quite inexpensive when we truly want a count, without qualifications.  As soon as we add that WHERE, we need to return every row that matches our criteria.  The trick with LIMIT, though it is doing the same thing, is that it stops after it gets a match.  Once that first row comes back, it’s done.  On an indexed column, this shouldn’t be much of a difference at all, probably imperceptible to the user and certainly close enough even for a timing test…but on an unindexed column, stopping the table scan in the middle, or if you’re lucky enough, the beginning, would be an enormous time savings.

 

I may revisit this with some real numbers, but a friend of mine brought the question to my attention and I thought it was a really great one.  I’m hoping to get some feedback. :)

Full text indexing and RSS feeds in MySQL

Full text indexing is an amazing feature. Instead of incredibly costly LIKE comparisons, an index can be created by the RDBMS which contains the full text of the column, or table, in question. For things like e-mail message bodies, biographies or other lengthy descriptions, or any other unstructured text, full text indexing is the only solution that makes sense. Using the LIKE method (for example : SELECT * FROM MyTable WHERE col1 LIKE ‘%MySQL’;) is so amazingly slow that it will bring a server to its knees if used often.

You enable full text indexing on MS SQL using the the SSMS gui(right click on a table and use the Full Text Index menu if full text indexing has been enabled on that database), or through T-SQL. For more detailed instructions, check out this great article on developer.com. Once the full text index is created, MS SQL just seems a whole lot faster when searching the indexed fields provided you’re using the right T-SQL. I’ll assume that my MS SQL guru readers are already pretty familiar with that!

Creating a FTI in MySQL requires a little DDL, seen below :

mysql> CREATE TABLE articles (
-> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
-> title VARCHAR(200),
-> body TEXT,
-> FULLTEXT (title,body)
-> );

Once you’ve created and populated your MyISAM table (try using INSERT INTO ISAM_table_name (x) SELECT x from innoDB_table_name for a quick migration from another table with text values.) you can do some really cool things with the index, just like in MS SQL.

On the documentation page from MySQL.com you can find instructions for ranking result sets by relevance using the MATCH() function, which compares nicely with the CONTAINSTABLE keyword in MSSQL which returns both the key and the rank of each key in result or table form. You can also do simple boolean searches, just like the CONTAINS and FREETEXT keywords using IN BOOLEAN MODE. Weighted results are supported using the syntax below :

(MATCH (title) AGAINST (‘keywords’ IN BOOLEAN MODE)*10)+

(MATCH (title) AGAINST (‘keywords’ IN BOOLEAN MODE)*50);

Also, much like MS SQL, MySQL has some words that it doesn’t index…

  • Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is four characters.
  • Words in the stopword list are ignored. A stopword is a word such as “the” or “some” that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overwritten by a user-defined list.

Some methods I haven’t found in MySQL are :

  • Inflectional search – For example, searching for swim but finding swimming, swam, etc.
  • Proximity based search – Tennis within a few words of raquet gaining a much higher ranking than either of the keywords by themselves. Apparently, MySQL doesn’t store the position of the words in the index, so the proximity searching may not be in the next few versions, if it ever comes at all!

If any MySQL readers out there know how to do these in MySQL (obviously almost anything can be done in PHP, etc.)please drop me a line. You’ll receive full credit here and maybe even a nice paypal thank you!

So as we can see, MySQL does full text indexing just like MS SQL for the most part, but there is a catch. A BIG catch. You can only full text index a MyISAM table. In MySQL, there are multiple storage engines!! This means that many of the databases out there are going to have a tough time with full text indexing if they plan to scale out and stay with innoDB, unless we can find a solution. Quoted from Wikipedia, the following are some of the major differences between MyISAM and innoDB :

  1. InnoDB recovers from a crash or other unexpected shutdown by replaying its logs. MyISAM must fully scan and repair or rebuild any indexes or possibly tables which had been updated but not fully flushed to disk. Since the InnoDB approach is approximately fixed time while the MyISAM time grows with the size of the data files, InnoDB offers greater perceived availability and reliability as database sizes grow.
  2. MyISAM relies on the operating system for caching reads and writes to the data rows while InnoDB does this within the engine itself, combining the row caches with the index caches. Dirty (changed) database pages are not immediately sent to the operating system to be written by InnoDB, which can make it substantially faster than MyISAM in some situations.
  3. InnoDB stores data rows physically in primary key order while MyISAM typically stores them mostly in the order in which they are added. This corresponds to the MS SQL Server feature of “Clustered Indexes” and the Oracle feature known as “index organized tables.” When the primary key is selected to match the needs of common queries this can give a substantial performance benefit. For example, customer bank records might be grouped by customer in InnoDB but by transaction date with MyISAM, so InnoDB would likely require fewer disk seeks and less RAM to retrieve and cache a customer account history. On the other hand, inserting data in orders that differ substantially from primary key (PK) order will presumably require that InnoDB do a lot of reordering of data in order to get it into PK order. This places InnoDB at a slight disadvantage in that it does not permit insertion order based table structuring.
  4. InnoDB currently does not provide the compression and terse row formats provided by MyISAM, so both the disk and cache RAM required may be larger. A lower overhead format is available for MySQL 5.0, reducing overhead by about 20% and use of page compression is planned for a future version.
  5. When operating in fully ACID-compliant modes, InnoDB must do a flush to disk at least once per transaction, though it will combine flushes for inserts from multiple connections. For typical hard drives or arrays, this will impose a limit of about 200 update transactions per second. If you require higher transaction rates, disk controllers with write caching and battery backup will be required in order to maintain transactional integrity. InnoDB also offers several modes which reduce this effect, naturally leading to a loss of transactional integrity. MyISAM has none of this overhead, but only because it does not support transactions.

Now that we’ve read through the details, lets get to the point! InnoDB, MyISAM, and other engines allow a flexibility in decision making for administrators and developers. MS SQL has only one storage engine. This brings us back to the legos vs. furniture philosophy seen in the MySQL backups simplified post from last week. Unix is flexible, and Microsoft has a vision. You’ve got to work a little harder to make decisions in the Unix world, but it pays off when your tools match your problems. I encourage everyone interested in the MySQL storage engines and other database internals to check out Understanding MySQL Internals (Understanding)

For those of you not yet familiar with the SQL Server storage engine, check out Delaney’s amazing work on Inside Microsoft (r) SQL Server (tm) 2005: The Storage Engine

My suggestion for this particular case, which requires searching based on a column of RSS text, is that we have both MyISAM and innoDB tables in the same database. This won’t affect our backups, since we use mysqldump and, for the time being, won’t affect our performance too negatively. Since we’ll only be storing the text data in the RSS text table, we don’t need to worry too much about foreign key issues and other things that apparently affect MyISAM. A simple search and JOIN will solve all of our full text needs.

You can create a table using any MySQL compatible engine using the following syntax :

CREATE TABLE t (i INT) ENGINE = MYISAM;

or

CREATE TABLE t (i INT) ENGINE = INNODB;

This is a really cool feature, and allows you to mix and match technologies for storage inside a single database, which is something quite different from MS SQL. Also, you could get really crazy and create an entirely new database for your ISAM needs and use fully qualified data element names like SELECT * FROM databasename.tablename. Most people don’t recommend this.

More details to come with the final implementation, and look for an update to the backup post soon, as we document the FTP process for offsite backups!

Becoming a great DBA, and how you should spend your time

 

How do you spend your time at work? If you’re a DBA, you should be spending it well. There are so many responsibilities that you will face as a DBA. Sometimes, you’ll need to put the time into tedious and difficult tasks while remaining interested and engaged, other times you’ll need to keep a certain mindset or attitude to maximize cooperation from coworkers. The information below is not a complete roadmap to success, but it is a great set of guidelines for someone looking to become a DBA in a small or midsized organization.

 

To begin with, and perhaps most importantly, you can spend much less time maintaining and tuning your systems if you understand the basics of relational theory and design principles. In order to gain the benefit of decades of research in the relational field, make sure that you are well aware of best practices, good relational design, and the basic fundamentals of databases in general. This is the biggest mistake amongst DBAs. Know your craft at a fundamental level, or be forever forced to repeat the mistakes that we see every day in the MS SQL programming newsgroups.

Once you are sure that you system is well designed, you need to check on your systems, relentlessly and with no mercy. I spend at least an hour per day either checking on systems, or automating system maintenance and observation. This includes creating new alerts with various triggers, automating DMV reports to myself, and running maintenance via direct SQL or the SQL Agent. A lot of the time, a simple manual check on a machine can unearth problems that you haven’t imagined. When you find these, modify your maintenance plan and ensure that the problem doesn’t find itself happening on other SQL instances. It is important that you visit your machines and have a look around occasionally.

Performance tuning your system is closely tied with the monitoring of your system. As you find problems and bottlenecks, solve them. Take care of your indexes and your code and take care of your performance in general. Ensuring that you can scale in the event of unexpected and unpredictable growth will allow your employers to make the kinds of leaps in growth that will keep them healthy. A healthy employer experiencing growth is much more likely to throw a poor DBA a bone when bonus time rolls around.

For further performance tuning I spend time every day rewriting queries for performance. Going over my old code allows me to grow, and utilize new methods that I learned through my research. Rewriting code is probably one of the best things you can do to improve your programming skills. Make sure that you know how to measure performance, so you can verify that your changes are actually improvements! Additionally, there are other people here at my office writing SQL. Developers, report writers, and management are often rolling their own pet projects that often are eventually moved into production. I have to make sure that their code is either portable, or at least documented. Some of this code is really bad, and can take an inordinate amount of time to fix it depending on the talent of the original author.

You must research new technology and read up on blogs and SQL newsgroups every day. Keeping up with the newest published material is also a necessity. It is very important to keep up with not only the newest developments in the industry, but also to go through older material to ensure that you haven’t missed anything important. A good DBA will be a voracious reader. Just last month I implemented some calendar tables after reading Celko’s newest book (featured on the sidebar to your right.) I try to read most of the new titles from major authors in the field like Celko, Delaney, Ben-Gan, and Date.

All that knowledge is worthless without execution. For a quick solution in an emergency, testing is very important! I spend a few days per month testing restores, disaster recovery, and basic administrative functions. Databases are mission critical in almost every case, and you need to make sure that your skills are sharp in case of a problem. Regularly practicing the scenarios listed above will put you miles above the rest when it really counts. I extend this to reviewing the names and functions of valuable troubleshooting DMVs, reviewing the location and purpose of my collection of SQL, and reviewing my ability to fix locking and performance issues. Do you know how to connect into the DAC and kill an out of control process if your CPU is pegged? Could you do it without looking up the processes and methods? If your DB is having major issues and people want it up right now, will you be distracted with your nose in a book, or fixing the problem?

Paying attention to specifics is required in any job, but we lose our way without the big picture. As a database admin, or any IT professional, evaluating and implementing sweeping trends in IT are important if you want to stay on the cutting edge. Green IT and server virtualization are quite big right now (and somewhat related to each other.) I can proudly say that my organization is on the bleeding edge here. You should examine these trends with great care to ensure that you are not missing out on the next jump in technology.

New technology inevitably leads to new projects. New projects are a major draw on my energy, and any rollout of a new system, especially when written in house, takes all my energy in design, planning, and implementation. If you’re like me, most of your time will be spent on supporting new databases and new code! Make sure that you spend some time each month planning for future growth and really putting hard numbers on paper. If you think you need to move to a new SAN in 6 months, tell your CFO and CEO now. They’ll work it into their projections, and you won’t be left caring for an inadequate system at the last moment. Not planning for the future always looks bad and inevitably ends up costing productivity for the entire organization, so even if you’re busy you need to do this.

In the absence of a major project being implemented, or perhaps due to requirements of an upcoming project, we all have to upgrade. You’ve got to keep up! Patching and upgrading don’t happen often, but when they do, they’re a huge drain on my time. Planning a rollout to 22 SQL servers littered all over the globe can be an enormous project. Planning and testing are key here, and should never be rushed. Although I’ve been working with SQL 2008 for months, I estimate that my half completed plan will take me another few months to design, and probably a week to implement. When I do implement my rollout, I’ll be forced to update nearly all of my documentation.

Documentation is a difficult process. Not only does it require you to spend what could very well be years on the inevitably poorly documented existing infrastructure, but any time a change is made the appropriate changes to documentation must be made. Using a change management system is very helpful in this respect, but will often inspire angry e-mails and hateful glances. Ignore these and do what you need to do to ensure that your system is properly documented. Brush up on your UML, or Visio if UML is too much, but never avoid documentation.

 

Complying to international standards like ISO, and local standards like a naming convention for objects, sounds very easy. It never is. Unless you are the conduit through which all changes in your databases are made, you will need to clean up other people’s messes. Stay steadfast in this, for it will make all the difference should anyone ever need to work on your system without you there. It will also ensure that your system is as portable as possible when the inevitable change happens. It will also make documentation much simpler, due to the standardized nature of your work.

One thing that most DBAs let slide until well after a production system is rolled out is : security. Security is so important these days that I spend a few hours at least once or twice per week checking existing security, as well as really taking the time to think about how the system should be used, and allowing my security to reflect that. The days of global dbo are over, and you’re inviting trouble if you let your security wane. Not only are you protecting yourself against malicious destruction, but you also need to protect against mistakes and well meaning but fatally flawed design changes. You should have a security plan for every database in your organization, and that plan should allow people to do ONLY what they need to do. Less is more, always, in access rights.

As for working relationships, be nice to your developers. Spend some time programming so that you can understand their side of the story. A good working relationship with your developers will ensure that they are willing to compromise when it counts. Sometimes, you need to compromise too, just make sure that you only compromise on things that don’t affect the integrity of your system. I find that delivering bad news to my developers is always easier after we’ve spent a recent Friday night at a concert or a bar.

To further your relationship with your management team, consider introducing them to Business Intelligence. Any organization with a large amount of data will want to analyze that information. Make sure that you are proficient with ETL processes, and report writing. If you have a BI guy or girl, they will likely take care of this for you. If you don’t have one, spend some time on BI and show your boss what a database can do in the right hands. Don’t worry about overloading yourself, be firm that if they want to continue on the BI path, they’ll need to hire someone else. They’ll evaluate the situation and decide if the additional cost is worth it. It almost always is, and you’ll be the new IT hero.

I think that if you follow the above to the letter, you will definitely be the best DBA in the world. Unfortunately no one person can hope to achieve it all at once. Hopefully you have some help, but if you don’t, don’t lose hope. Do your best and do as much as you can. Take pride in your work, as that is without a doubt the single most important thing that you can do.