Tuesday, January 27, 2009

Learn 10% SQL That Accounts for 90% Query

I just finished reading Data Crunching: Solve Everyday Problems Using Java, Python, and More , that I borrowed from the National Library. In Chapter 6, the author showed his readers "... meet the 10% of SQL that accounts for 90% of common use ...".

Here I am writing that 10% down in my blog so that next time I can refer to it and I am using SQLite database to illustrate all the steps. BTW, SQLite (SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine) is an extremely powerful database and definitely worth your time in learning it.

Here I am going to create the database table and populate it with data in my Cygwin environment:

$ uname -a
CYGWIN_NT-6.0 user-PC 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin

$ ls -l test.db
ls: cannot access test.db: No such file or directory

$ sqlite3.exe test.db
SQLite version 3.6.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE Person (
   ...> EmpId  INTEGER NOT NULL PRIMARY KEY,
   ...> FirstName TEXT NOT NULL,
   ...> LastName TEXT NOT NULL,
   ...> Rate  DECIMAL
   ...> );
sqlite> CREATE TABLE Assigned (
   ...> EmpId  INTEGER NOT NULL,
   ...> ProjId INTEGER,
   ...> StartDate DATE,
   ...> EndDate DATE
   ...> );
sqlite> CREATE TABLE Customer (
   ...> CustId INTEGER NOT NULL PRIMARY KEY,
   ...> ContactInfo TEXT
   ...> );
sqlite> CREATE TABLE Project (
   ...> ProjId INTEGER NOT NULL PRIMARY KEY,
   ...> ProjName TEXT,
   ...> CustId INTEGER,
   ...> StartDate DATE,
   ...> EndDate DATE
   ...> );
sqlite>
sqlite> INSERT INTO Person VALUES (3001,'Dave','Thomas',400);
sqlite> INSERT INTO Person VALUES (3002,'Andy','Hunt',400);
sqlite> INSERT INTO Person VALUES (4001,'Greg','Wilson',320);
sqlite> INSERT INTO Person VALUES (4002,'Grace','Hopper',500);
sqlite> INSERT INTO Person VALUES (4003,'Alan','Turing',500);
sqlite> INSERT INTO Person VALUES (4004,'Chunk','Babbage',125);
sqlite>
sqlite> INSERT INTO Project VALUES (904,'RubyMath',70043,'2004-05-01','2004-10-30');
sqlite> INSERT INTO Project VALUES (905,'DBBridge',70047,'2004-05-01','2004-10-30');
sqlite>
sqlite> INSERT INTO Customer VALUES (70043,'MegaCorp Inc.');
sqlite> INSERT INTO Customer VALUES (70047,"Deadlines 'R' Us");
sqlite> INSERT INTO Customer VALUES (70101,'UNiversity of Euphoria');
sqlite>
sqlite> INSERT INTO Assigned VALUES (3001,904,'2005-02-01','2005-02-28');
sqlite> INSERT INTO Assigned VALUES (3002,904,'2005-02-01','2005-03-15');
sqlite> INSERT INTO Assigned VALUES (4001,904,'2005-02-01','2005-03-21');
sqlite> INSERT INTO Assigned VALUES (4001,905,'2005-01-10','2005-02-22');
sqlite> INSERT INTO Assigned VALUES (4002,905,'2005-01-20','2005-04-01');
sqlite> INSERT INTO Assigned VALUES (4004,905,'2005-02-10','2005-03-31');
sqlite>
sqlite> .q

$ ls -l test.db
-rwxrwxrwx 1 user None 5120 Jan 27 20:18 test.db

Here are some of the SQL statements in doing join, nesting and negation. Basically they are trying to answer the following questions:

  • Who is paying for the RubyMath project ?
  • Get forenames and surnames of employees on RubyMath project
  • Select people who are NOT assigned to the RubyMath project
  • Select people who are assigned to exactly one project
  • Find people in 904 or 905, but not both
  • Find the most expensive contractors
$ sqlite3.exe test.db
SQLite version 3.6.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> -- Who is paying for the RubyMath project ?
sqlite> SELECT Customer.ContactInfo
   ...> FROM   Customer, Project
   ...> WHERE (Customer.CustId = Project.CustId)
   ...>   AND (Project.ProjName = 'RubyMath');
MegaCorp Inc.
sqlite>
sqlite>
sqlite> SELECT Customer.ContactInfo
   ...> FROM   Customer INNER JOIN Project
   ...> ON     Customer.CustId = Project.CustId
   ...> WHERE  Project.ProjName = 'RubyMath';
MegaCorp Inc.
sqlite>
sqlite>
sqlite> -- Get forenames and surnames of employees on RubyMath project
sqlite> SELECT Person.FirstName, Person.LastName
   ...> FROM   Person, Project, Assigned
   ...> WHERE (Person.EmpId = Assigned.EmpId)
   ...>   AND (Project.ProjName = 'RubyMath')
   ...>   AND (Assigned.ProjId = Project.ProjId);
Dave|Thomas
Andy|Hunt
Greg|Wilson
sqlite>
sqlite>
sqlite> -- Select people who are NOT assigned to the RubyMath project
sqlite> SELECT Person.FirstName, Person.LastName
   ...> FROM   Person
   ...> WHERE  Person.EmpId NOT IN
   ...>        (SELECT Assigned.EmpId
   ...>         FROM   Assigned, Project
   ...>         WHERE (Assigned.ProjId = Project.ProjId)
   ...>         AND   (Project.ProjName = 'RubyMath'));
Grace|Hopper
Alan|Turing
Chunk|Babbage
sqlite>
sqlite>
sqlite> -- Select people who are assigned to exactly one project
sqlite> SELECT Person.FirstName, Person.LastName
   ...> FROM   Person, Assigned
   ...> WHERE  (Person.EmpId = Assigned.EmpId)
   ...> AND    (Assigned.EmpId NOT IN
   ...>         (SELECT A.EmpId
   ...>          FROM   Assigned A, Assigned B
   ...>          WHERE (A.EmpId = B.EmpId)
   ...>          AND   (A.ProjId < B.ProjId)));
Dave|Thomas
Andy|Hunt
Grace|Hopper
Chunk|Babbage
sqlite>
sqlite>
sqlite> -- Find people in 904 or 905, but not both
sqlite> SELECT Person.FirstName, Person.LastName
   ...> FROM   Person, Assigned
   ...> WHERE  (Person.EmpId = Assigned.EmpId)
   ...> AND    ((Assigned.ProjId = 904) OR (Assigned.ProjId = 905))
   ...> AND    (Assigned.ProjId NOT IN
   ...>         (SELECT A.ProjId
   ...>          FROM Assigned A, Assigned B
   ...>          WHERE (A.ProjId = 904) AND (B.ProjId = 905)));
Greg|Wilson
Grace|Hopper
Chunk|Babbage
sqlite>
sqlite>
sqlite> -- Find the most expensive contractors
sqlite> SELECT Person.FirstName, Person.LastName
   ...> FROM Person
   ...> WHERE (Person.Rate NOT IN
   ...>        (SELECT A.Rate
   ...>         FROM Person A, Person B
   ...>         WHERE A.Rate < B.Rate));
Grace|Hopper
Alan|Turing
sqlite> .q

$

Here are the SQL statements in aggregation and views answering the following questions:

  • Get the total rate for all consultants
  • Find consultant who rate is above average
$ sqlite3.exe test.db
SQLite version 3.6.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> -- Get the total rate for all consultants
sqlite> SELECT Project.ProjName, SUM(Person.Rate)
   ...> FROM   Person, Assigned, Project
   ...> WHERE  (Person.EmpId = Assigned.EmpId)
   ...> AND    (Project.ProjId = Assigned.ProjId)
   ...> GROUP BY Assigned.ProjId;
RubyMath|1120
DBBridge|945
sqlite>
sqlite>
sqlite> -- view
sqlite> CREATE VIEW ProjAveRate AS
   ...>   SELECT Project.ProjId AS ProjId,
   ...>          AVG(Person.Rate) AS AveRate
   ...>   FROM   Person, Assigned, Project
   ...>   WHERE  (Person.EmpId = Assigned.EmpId)
   ...>   AND    (Project.ProjId = Assigned.ProjId)
   ...>   GROUP BY Assigned.ProjId;
sqlite>
sqlite>
sqlite> SELECT ProjAveRate.ProjId, Person.FirstName, Person.LastName, Person.Rate, ProjAveRate.AveRate
   ...> FROM   Person, Assigned, ProjAveRate
   ...> WHERE  (Person.EmpId = Assigned.EmpId)
   ...> AND    (Assigned.ProjId = ProjAveRate.ProjId)
   ...> AND    (Person.Rate > ProjAveRate.AveRate);
904|Dave|Thomas|400|373.333333333333
904|Andy|Hunt|400|373.333333333333
905|Greg|Wilson|320|315.0
905|Grace|Hopper|500|315.0
sqlite> .q

$

If you cannot get hold of the book, you may want to read the author's article - Top Ten Data Crunching Tips and Tricks.

Labels:

1 Comments:

Blogger Unknown said...

Hi, thanks for posting this. I wanted to learn SQL with Cygwin and this blog is great.

One thing I noticed while working through the examples, there seems to be a mistake in the example to find people in project 904 or 905 but not both.


sqlite> -- Find people in 904 or 905, but not both
sqlite> SELECT Person.FirstName, Person.LastName
...> FROM Person, Assigned
...> WHERE (Person.EmpId = Assigned.EmpId)
...> AND ((Assigned.ProjId = 904) OR (Assigned.ProjId = 905))
...> AND (Assigned.ProjId NOT IN
...> (SELECT A.ProjId
...> FROM Assigned A, Assigned B
...> WHERE (A.ProjId = 904) AND (B.ProjId = 905)));
Greg|Wilson
Grace|Hopper
Chunk|Babbage

Now Greg Wilson is the only guy in both 904 and 905. So the result should be everyone else.
Instead we're getting Greg, Grace and Chunk. Who are all working project 905.

The last section of this, assuming it's independent of the previous linkages and constraints (is it?) is
...> (SELECT A.ProjId
...> FROM Assigned A, Assigned B
...> WHERE (A.ProjId = 904) AND (B.ProjId = 905))

This returns 904, because we're returning a ProjId. And the selection is trivial, because there's no linkage between the tables A and B. So from here we're returning 904, and the main loop is going to select people not in that project. Thus we're going to select everyone on project 904 or 905 but not in project 904 – so everyone on project 905. Which we do.

It looks like, in the inner check, we're wanting to identify an Employee that is involved in both projects, but you'd want to return a Assigned.EmpId for that. Also, you'd want to link the two tables with an AND (A.EmpId = B.EmpId) to make it meaningful.

...> (SELECT A.EmpId
...> FROM Assigned A, Assigned B
...> WHERE (A.EmpId = B.EmpId)
...> AND (A.ProjId = 904) AND (B.ProjId = 905)

So the total check would be

sqlite> SELECT Person.FirstName, Person.LastName
...> FROM Person, Assigned
...> WHERE (Person.EmpId = Assigned.EmpId)
...> AND ((Assigned.ProjId = 904) OR (Assigned.ProjId = 905))
...> AND (Assigned.Empid NOT IN
...> (SELECT A.EmpId
...> FROM Assigned A, Assigned B
...> WHERE (A.EmpId = B.EmpId)
...> AND (A.ProjId = 904) AND (B.ProjId = 905)
...> ));

Does that make sense?

11:01 AM  

Post a Comment

<< Home