EXAMPLE 5 : Write a query which will indicate, for each city in table COMPANY,
how many companies are based in that city. The result set should display the
city codes, followed by the the number of companies.
|1. Which table(s) contains the information that you need?
Answering this question in this case is relatively easy, as the name of
table that we will be working with is provided literally in the exercise.
Apart from COMPANY, we will not be needing another table. This is because
we are asked to provide the city codes only. The situation would have been
different if we were asked to provide the names of the cities as well. Fortunately,
only the ids (the keys) need to be displayed, which means that the table
COMPANY should provide us with all the information that we need. At this
stage, we can simply write down: FROM COMPANY.
2. Are you interested in all the records in this/these table(s)?
No company should be ignored in this query. The operation that we want
to perform should take into account all the companies in the table, so
the WHERE command should not be used in this query.
3. Are you interested in the actual contents of the records or in statistical
information about the records in the table? If so, do you want to receive
one answer for the entire table, or would you like to receive different
answers for different types of records?
As in the previous exercise, we are not so much interested in the actual
values of the attributes rather than in statistical information about
these records. The exercise asks us to provide the number of companies,
and this suggests of course that we need to use the COUNT(*) function
again. There is also a difference with the previous exercise. The result
of the query should not be a single number which indicates the number
of records in the entire table. In this situation, we want to produce
different answers for different types of records. Namely, a different
answer should be given for different cities. To be able to do this, we
should divide the table into different groups first, on the basis of the
values that are present in the column CITY. All the companies which have
an identical value in the column CITY will be placed in the same group.
The COUNT(*) function should then be applied to each group individually.
We use GROUP BY followed by the name or names of one or more columns.
The GROUP BY statement is useful only if dividing into groups is actually
possible. GROUP BY [PRIMARY KEY] is not useful because this will never
result in different groups consisting of more than one record.
If the COUNT(*) function is used in combination with GROUP
BY, the number of rows will be counted in each group. In this case, the
result will be "1" for "GB EGB", 4 for "GB LON",
1 for "GB OXF", and so on. The result of this query will be
a table in which the number of records in each group is provided in a
4. Which columns do you want to see?
Remember that it is only possible to use GROUP BY if you also include an
in-built SQL function in the SELECT list. If you mention only the name of
a column, then this will produce an error. The number of expressions that
you can use in the select list of a query which contains GROUP BY is highly
restriced. If a table has been divided into groups, then the ONLY expressions
that you can include after SELECT are
In this query, we should write: SELECT CITY, COUNT(*) FROM COMPANY GROUP
- The columns that you mention after GROUP BY. This is in fact a required
item in the select list. If you mention, outside of the SQL function,
the name of a column which is not also mentioned after GROUP BY, the
DBMS will retrun an error. In this particular query, we have used GROUP
BY CITY, which means that the select list must include miminally SELECT
- An SQL function. This is in fact required. Inside the brackets, you
are free to choose each column from the table that you want. Beware,
however, that if you choose a non-key item, there might be the risk
that one of the columns contains a NULL value. In the latter case, the
result of the COUNT() function will be incorrect. In most cases, the
safest option is to use COUNT(*).
5. Do you want to see duplicate values or are you only interested in different
values? 6. Do the records in the result set need to be sorted in any particular
These two steps do not apply in the case of queries which contain GROUP
BY. The final form of the query has already been established after step
[ up | next ]