EXAMPLE 4 : Write a query which returns the number of people born in the Netherlands after 1875.
1. Which table(s) contain(s) the information that you need?
The table PERSON contains information about persons. The
table contains a column called YEAR_OF_BIRTH which will probably be useful
for this query. Apparently, we also need to be able to filter the records
in the table PERSON on the basis of the country name. We need to be able
to select only the people who are born in the Netherlands after a certain
year. It is true that the table person contains a column with information
about the "COUNTRY_BORN". Nevertheless, you should beware that, in assignments
such as these, you cannot be assumed to know the meaning of these codes
automatically. As the booktrade database is a relatively small database,
it might seem tempting to assume that the code "nl" represents the Netherlands.
In the case of larger databases, however, finding the exact meaning of
codes might be more difficult. If you were asked to select all the people
who are born in, for instance, "Niger", then you can probably imagine
that finding the code that you need is more problematic, especialy if
the table contains codes such as "NI", "NE", "NG".
In this situation, you really need to consult the look-up table which
contains the meaning of these codes. Additionally, the codes in this database
happen to be based on an internationally accepted ISO standard, but it
might also be the case that the database designer has chosen to use some
locally defined codes, or perhaps only a meaningless, automatically generated
number for each country in the look-up table. So the situation here is
in fact very similar to that in the previous exercise. The foreign key
in "COUNTRY_BORN" refers to one of the codes which is the primary key
in the table COUNTRY, and it represents a certain country. So, to be able
to select rows on the basis of the NAME of the country, the tables PERSON
and COUNTRY need to be joined. We can start this query by writing down:
FROM PERSON, COUNTRY. As the primary key in COUNTRY functions as
a foreign key in PERSON, we shoulld write down the following WHERE command:
WHERE COUNTRY_BORN = COUNTRY_CODE. Like ion the previous exercise,
this join will result in expanded records where the names of the countries
are also available in all the further actions that you want to perform.
If there is a match between the country code that is present in COUNTRY_BORN
in PERSON and COUNTRY_CODE in COUNTRY, then the record from COUNTRY will
be appended to the record in PERSON. This will look as follows
2. Are you interested in all the records in this/these table(s)?
We are not interested in all the records in these tables. From the listing
above, we need to select only those persons who are born in the Netherlands
and those persons who are born after 1875. This means that we need to
include two criteria in our query, in addition to the criterium that was
needed to link the two tabels. The two requirements that we need to add
are COUNTRY_NAME = 'NETHERLANDS' and YEAR_OF_BIRTH > 1875.
Having completed stage 2, the query that we have arrived at is FROM
PERSON, COUNTRY WHERE COUNTRY_BORN = COUNTRY_CODE AND COUNTRY_NAME = 'NETHERLANDS'
AND YEAR_OF_BIRTH > 1875.
3. Are you interested in the actual contents of the records or in statistical
information about the records in the table? If so, do you want to receive
one single answer for the entire table, or would you like to receive different
answers for different types of records?
The previous step should have resulted in a list of persons who are
(1) born in the Netherland and (2) born after 1875. The execise states
that we do not need to display the actual attributes of these records.
In this exercise, we merely want to know the number of records in this
set of records. We want to know how many people answer these two criteria
that we have specified. The data themselves are not relevant in that case.
By default, SQL considers each record separately. Nevertheless, it is
also possible to force the SQL processor to consider the table as a whole.
If you use one of the in-built functions of SQL, the scope of the operation
will move from a single row to an entire table. The following aggregate
functions are available in SQL:
counts the number of rows
adds up all the values in a certain column
calculates the simple average all the values in a certain
selects the record in which the value for the column
that is specified is the lowest
selects the record in which the value for the column
that is specified is the highest
If you use one of these functions, the result will be that the entire
table is reduced to a single row with a single column which contains the
answer. For the current query, we obviously need to use COUNT function.
The next question is perhaps what we should include in between the brackets.
As is indicated above, you can enter the name of a column as a parameter
of this function. It is important to realise that, when you choose a column
for which it is allowed to be empty (NULL), there might be the risk that
the count function returns a number which is lower than the actual number
of records. This is because empty columns will not be counted. For example,.
if you use COUNT(YEAR_OF_DEATH), there might be the problem that records
for the people who are still alive will not be cunted. This might of course
be your intention under certain circumstances, but in this query we want
to make sure that each column is counted. In this situation, the best
option is to use COUNT(*). Each record will have at least one column which
is not empty. As you know, in well-defined relational database, each record
will have at least a primary key. By including all the columns in the
aggregate function by means of the asterisk, you know that no records
will be skipped. Note that COUNT(P_ID) should produce the exact same result.
If you only want to count the different values, then you can of course
use DISTINCT to get rid of duplicates first.
4. Do you want to display each column or only a few columns in these tables?
We only want to know how many people were born after 1875. The result
of the query should be a single number.
5. Do you want to see duplicate values or are you only
interested in different values? 6. Do the records in the result set need
to be sorted in any particular way?
As this query should produce only one row and one column as a result, these
two steps are not relevant. The query is now complete.