SQL Tutorial

Previous | Home  | Next  

SQL Keywords



A structured approach to writing queries
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6

EXAMPLE 5 : Write a query which will indicate, for each city in table COMPANY, how many companies are based in that city. The result set should display the city codes, followed by the the number of companies.

1. Which table(s) contains the information that you need?

Answering this question in this case is relatively easy, as the name of table that we will be working with is provided literally in the exercise. Apart from COMPANY, we will not be needing another table. This is because we are asked to provide the city codes only. The situation would have been different if we were asked to provide the names of the cities as well. Fortunately, only the ids (the keys) need to be displayed, which means that the table COMPANY should provide us with all the information that we need. At this stage, we can simply write down: FROM COMPANY.

2. Are you interested in all the records in this/these table(s)?

No company should be ignored in this query. The operation that we want to perform should take into account all the companies in the table, so the WHERE command should not be used in this query.

3. Are you interested in the actual contents of the records or in statistical information about the records in the table? If so, do you want to receive one answer for the entire table, or would you like to receive different answers for different types of records?

As in the previous exercise, we are not so much interested in the actual values of the attributes rather than in statistical information about these records. The exercise asks us to provide the number of companies, and this suggests of course that we need to use the COUNT(*) function again. There is also a difference with the previous exercise. The result of the query should not be a single number which indicates the number of records in the entire table. In this situation, we want to produce different answers for different types of records. Namely, a different answer should be given for different cities. To be able to do this, we should divide the table into different groups first, on the basis of the values that are present in the column CITY. All the companies which have an identical value in the column CITY will be placed in the same group. The COUNT(*) function should then be applied to each group individually. We use GROUP BY followed by the name or names of one or more columns. The GROUP BY statement is useful only if dividing into groups is actually possible. GROUP BY [PRIMARY KEY] is not useful because this will never result in different groups consisting of more than one record.

If the COUNT(*) function is used in combination with GROUP BY, the number of rows will be counted in each group. In this case, the result will be "1" for "GB EGB", 4 for "GB LON", 1 for "GB OXF", and so on. The result of this query will be a table in which the number of records in each group is provided in a separate row.

4. Which columns do you want to see?

Remember that it is only possible to use GROUP BY if you also include an in-built SQL function in the SELECT list. If you mention only the name of a column, then this will produce an error. The number of expressions that you can use in the select list of a query which contains GROUP BY is highly restriced. If a table has been divided into groups, then the ONLY expressions that you can include after SELECT are
  1. The columns that you mention after GROUP BY. This is in fact a required item in the select list. If you mention, outside of the SQL function, the name of a column which is not also mentioned after GROUP BY, the DBMS will retrun an error. In this particular query, we have used GROUP BY CITY, which means that the select list must include miminally SELECT CITY
  2. An SQL function. This is in fact required. Inside the brackets, you are free to choose each column from the table that you want. Beware, however, that if you choose a non-key item, there might be the risk that one of the columns contains a NULL value. In the latter case, the result of the COUNT() function will be incorrect. In most cases, the safest option is to use COUNT(*).
In this query, we should write: SELECT CITY, COUNT(*) FROM COMPANY GROUP BY CITY.

5. Do you want to see duplicate values or are you only interested in different values? 6. Do the records in the result set need to be sorted in any particular way?

These two steps do not apply in the case of queries which contain GROUP BY. The final form of the query has already been established after step 5.

[ up  | next ]