SQL Tutorial

Previous | Home  | Next  

SQL Keywords



A structured approach to writing queries
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
DBWP - SQL Tutorial

EXAMPLE 4 : Write a query which returns the number of people born in the Netherlands after 1875.

1. Which table(s) contain(s) the information that you need?

The table PERSON contains information about persons. The table contains a column called YEAR_OF_BIRTH which will probably be useful for this query. Apparently, we also need to be able to filter the records in the table PERSON on the basis of the country name. We need to be able to select only the people who are born in the Netherlands after a certain year. It is true that the table person contains a column with information about the "COUNTRY_BORN". Nevertheless, you should beware that, in assignments such as these, you cannot be assumed to know the meaning of these codes automatically. As the booktrade database is a relatively small database, it might seem tempting to assume that the code "nl" represents the Netherlands. In the case of larger databases, however, finding the exact meaning of codes might be more difficult. If you were asked to select all the people who are born in, for instance, "Niger", then you can probably imagine that finding the code that you need is more problematic, especialy if the table contains codes such as "NI", "NE", "NG".


In this situation, you really need to consult the look-up table which contains the meaning of these codes. Additionally, the codes in this database happen to be based on an internationally accepted ISO standard, but it might also be the case that the database designer has chosen to use some locally defined codes, or perhaps only a meaningless, automatically generated number for each country in the look-up table. So the situation here is in fact very similar to that in the previous exercise. The foreign key in "COUNTRY_BORN" refers to one of the codes which is the primary key in the table COUNTRY, and it represents a certain country. So, to be able to select rows on the basis of the NAME of the country, the tables PERSON and COUNTRY need to be joined. We can start this query by writing down: FROM PERSON, COUNTRY. As the primary key in COUNTRY functions as a foreign key in PERSON, we shoulld write down the following WHERE command: WHERE COUNTRY_BORN = COUNTRY_CODE. Like ion the previous exercise, this join will result in expanded records where the names of the countries are also available in all the further actions that you want to perform. If there is a match between the country code that is present in COUNTRY_BORN in PERSON and COUNTRY_CODE in COUNTRY, then the record from COUNTRY will be appended to the record in PERSON. This will look as follows :

2. Are you interested in all the records in this/these table(s)?

We are not interested in all the records in these tables. From the listing above, we need to select only those persons who are born in the Netherlands and those persons who are born after 1875. This means that we need to include two criteria in our query, in addition to the criterium that was needed to link the two tabels. The two requirements that we need to add are COUNTRY_NAME = 'NETHERLANDS' and YEAR_OF_BIRTH > 1875.

Having completed stage 2, the query that we have arrived at is FROM PERSON, COUNTRY WHERE COUNTRY_BORN = COUNTRY_CODE AND COUNTRY_NAME = 'NETHERLANDS' AND YEAR_OF_BIRTH > 1875.

3. Are you interested in the actual contents of the records or in statistical information about the records in the table? If so, do you want to receive one single answer for the entire table, or would you like to receive different answers for different types of records?

The previous step should have resulted in a list of persons who are (1) born in the Netherland and (2) born after 1875. The execise states that we do not need to display the actual attributes of these records. In this exercise, we merely want to know the number of records in this set of records. We want to know how many people answer these two criteria that we have specified. The data themselves are not relevant in that case. By default, SQL considers each record separately. Nevertheless, it is also possible to force the SQL processor to consider the table as a whole. If you use one of the in-built functions of SQL, the scope of the operation will move from a single row to an entire table. The following aggregate functions are available in SQL:

COUNT([COLUMN]) counts the number of rows
SUM([COLUMN]) adds up all the values in a certain column
AVG([COLUMN]) calculates the simple average all the values in a certain column
MIN([COLUMN]) selects the record in which the value for the column that is specified is the lowest
MAX([COLUMN]) selects the record in which the value for the column that is specified is the highest

If you use one of these functions, the result will be that the entire table is reduced to a single row with a single column which contains the answer. For the current query, we obviously need to use COUNT function. The next question is perhaps what we should include in between the brackets. As is indicated above, you can enter the name of a column as a parameter of this function. It is important to realise that, when you choose a column for which it is allowed to be empty (NULL), there might be the risk that the count function returns a number which is lower than the actual number of records. This is because empty columns will not be counted. For example,. if you use COUNT(YEAR_OF_DEATH), there might be the problem that records for the people who are still alive will not be cunted. This might of course be your intention under certain circumstances, but in this query we want to make sure that each column is counted. In this situation, the best option is to use COUNT(*). Each record will have at least one column which is not empty. As you know, in well-defined relational database, each record will have at least a primary key. By including all the columns in the aggregate function by means of the asterisk, you know that no records will be skipped. Note that COUNT(P_ID) should produce the exact same result.

If you only want to count the different values, then you can of course use DISTINCT to get rid of duplicates first.

4. Do you want to display each column or only a few columns in these tables?

We only want to know how many people were born after 1875. The result of the query should be a single number.
5. Do you want to see duplicate values or are you only interested in different values? 6. Do the records in the result set need to be sorted in any particular way?

As this query should produce only one row and one column as a result, these two steps are not relevant. The query is now complete.


[ up  | next ]