Query


Query is the essential function provided by the VB ELF program, which all the other facilities are designed to support. We use "query" to mean the process of typing in a question about the information in your database, and its translation into the "native" data access language (SQL); it also includes the automatic formatting of the answer in any of several response styles including datasheet, form, worksheet or graph.


The Basics

In general, all query styles are requests for information of one form or another. They include asking direct questions, giving commands to list or display certain data, naming or describing the data set you're interested in, and using key phrases to trigger special formatting or graphing features. Syntax which is NOT supported by the program includes any statement designed to change the data in your tables, or to change the structure of a table within the database.

So for instance, VB ELF permits direct questions such as:
Which customers bought seafood last month?
or statements like:
Show the customers that bought seafood last month.
as well as descriptions of the required data:
Customers that bought seafood last month.
Finally, a simple example of a data formatting keyword is "graph", as in:
Graph the customers that bought seafood each month.


Updating data using queries

The answers returned may or may not be "updateable" recordsets; this depends on the complexity of the query and the relationships between the database tables. VB ELF does not enforce a "read-only" property on the datasets it returns, so you should be aware that writing over the entries in the returned datasheet will change the underlying data; unless the Microsoft Jet Engine itself prevents you, which it usually does by ignoring your keystrokes or beeping and displaying a "This Recordset is not updateable." message. For instance, if you type in Show the number of customers that bought seafood each month. and try to type an "8" over a "7" for the month of June, Jet will not permit this because of its data-integrity rules -- the sum must match up with the order details. Sometimes the reasons why Jet objects to a change are less clear, for instance if you try to update data from tables participating in a one-to-many relationship.

VB ELF does not directly support changing data by using English statements because we believe that -- while English query tools are coming into their own as useful, even essential data access instruments -- they are not yet reliable enough to trust with the task of mass changes to your database. On the other hand, we don't block its use as an update tool, for instance by enforcing a "snapshot" property on the datasets returned. In some cases, it might be useful to combine the reporting and updating features, for example by listing the records satisfying some condition (employees in Washington) then updating some field (eg. changing [Reports To] from 2 to 3, so that everyone who reported to Fuller now reports to Leverling).

Although VB ELF does not deliberately prohibit this, it often generates recordsets that aren't updateable, due to use of the DISTINCT keyword. While this guarantees that there won't be any redundant information reported, it also prevents updates. You can override this by switching a query into a design view (for instance by opening the SQL window, or using the Edit button or SQL panel on the Worksheet), and deleting the DISTINCT keyword from the SQL.

If you want to specify that the responses should NOT permit changes to the data, we have provided a "Read-Only" drop-down option from the Responses Tab of the Settings window. However, this works only for OLE DB data and does not appear to be currently supported by many of the (still quite-young) OLE DB Providers. In fact, so far it appears to be supported only by the OLE DB Providers for Jet.


Viewing Samples

One way to become familiar with the many permissible styles of query is to use the built-in Samples listing, which will show examples from Northwind and several other databases which match the query currently entered into the Query window. For instance, if we type Show Reports To for employees in Washington, then click
Edit Query (Eraser) on the Toolbar, and click Samples, we'll see a list including:

show photos of employees hired during 1991
show the ship date and subtotals for all orders since March of 1991
give the copyright dates for Ernest Hemingway's books
how many of Hemingway's books were published prior to 1960?
compare sum of subtotals showing customer name across and employee down
average together all the prices of Hemingway's books
graph the sum of subtotals for seafood against beverages
which customers have placed exactly 6 orders for seafood since 1991
show alphabetical listing of customers
Employees who live neither in the UK nor in Seattle

This lister uses a very simple word-counting method to try to find examples that are "like" the query you've typed in. It's not an exact science; for instance here the queries about "Hemingway" and including "against" probably result from the "in" part of "employees in Washington". The real purpose is simply to expose you to some of the possible constructions that you might use, without overwhelming you by "dumping" all of our samples. Of course, if you want to view a more complete list, you can click the Assorted Samples button, which shows the example list without applying a filter.


Editing a Query (fixing VB ELF's mistakes)

Before we talk about the results of a query, namely the response, let's go over what you can do if the translation from English to SQL doesn't generate the results you expect. Although it's quite hard to write SQL from scratch, it's often much easier to note and correct flaws in SQL that's generated for you. If you are posting these query translations to a large database, it may be prudent to do a "sanity-check" on the SQL, a quick scan that checks whether the code seems to include all the right elements in the right place.

We might type in, for example, Show the dates for workers in Washington, intending to view information about their dates of employment. Either by reviewing the SQL or by looking at the columns returned, we see that VB ELF has interpreted "date" to mean "Birth Date" rather than "Hire Date". By clicking on the Edit Query button, then selecting the "What's Wrong" drop-down box, we can get a look at some of the basic reasoning VB ELF applied to this translation. First, "worker=>Employee" -- this seems correct. "Employee=>Employee ID" -- well, OK. 'WA' as data -- sure! [Birth Date] as field -- um, no.

There are actually two ways to fix this mistake, aside from the obvious one of changing the query to read "Hire Date" (for instance, you may not immediately know the name of the field that provides hiring date information). Simply select [Birth Date] from the "What's Wrong" drop-down and click Try again -- ELF will retranslate the query using every option it can think of without interpreting "date" as [Birth Date]. In this case, the next best translation gives you what you actually want, the [Hire Date]. Another way to accomplish the same thing would be to click the Employees table in the "What's Missing" list, select [Hire Date] and click the [Add to "missing" list] button. In some cases, it may help to combine these two approaches.

In other cases the Edit Query window won't help much, if at all. (We hope that this is just a "patch" until the basic system recognizes most queries accurately, and until our customization wizards are in place to help you tailor interfaces to eliminate these mis-translations.) In these cases, you still have two choices before you revert back to the "roll-up-your-sleeves-and-write-SQL" alternative. If the error is straight-forward enough, you may be able to patch the SQL by writing the changes directly into the SQL window, then clicking Execute. In fact, you could do that here, by typing the word "Hire" over "Birth". But sometimes there's no simple, obvious change to get from where you are to where you want to be.

Let's say we've typed "Reports" instead of "Reports To" in the example above:
show the Reports field for employees in Washington
If we run this query, we'll see all kinds of information unrelated to the question. The first records starts out: "Soft drinks, coffees, teas, beer, and ale". Checking the SQL, we see references to Categories.Description and Categories.[Category Name]. What's going on?

In this case, VB ELF has changed "Report" to "Description", and "field" to "Province" then to "Region". In fact, both of these are pretty reasonable, given that the dictionary definition of "REPORT" is "history;statement;account;DESCRIPTION" etc. "FIELD" is defined as "world;area;department;subject;territory;PROVINCE", and "PROVINCE" is "world;AREA" etc. while one of the fields that Washington comes from, "Region", also has a very similar definition, "AREA;neighborhood" etc.

While this is some comfort to us as designers of the system, it doesn't help you -- the user -- get the information you're after. And there are just too many errors in the "What's Wrong" list; checking off any combination doesn't seem to help.

Still, you can use the information in the "What's Wrong" listing of possible candidates to help you modify the original query. It's not hard to see that "Report" is being interpreted in a different way than you meant it, in this case. Now you can be more specific, typing "Reports To", instead of just "Reports". If you're administering a natural language interface for others, you can use this kind of error to help suggest additions to your list of Phrase substitutions, or possibilities for the verbs included via the Verb Mapper. In this case, adding
When I Type: report      I Really Mean: report to   does the trick.


Response Styles

The style of response is selectable from the Query window by clicking on Datasheet, Form, Worksheet or Graph. If you choose Datasheet (from the Query window), you're actually selecting either Datasheet or "Best Guess", whichever was picked last from the Responses tab of the Selectable Settings window. ("Best Guess" works exactly like Datasheet unless your query contains a reference to an OLE object, such as a picture -- in that case, it displays the answer in a form.) In VB ELF, there's only one style of form response, currently; it shows each row of the dataset by itself, rather than in a grid view, like the datasheet.

The Worksheet style is the newest Response style; it also relates to the topic of verifying and correcting errant query translations.

While the basic "scan the SQL" method works for simpler types of queries, VB ELF has the ability to generate extremely complex sequences of SQL, to answer considerably complicated questions. In fact, VB ELF can generate up to seven sequential SQL statements in response to a single question. While we can't come up with any examples offhand that require the full seven, here's an example that takes six:
Show the percentage of customers that have ordered every Produce product.

To do this, VB ELF creates a list of the customers that have purchased products from the Product category, then counts the number of distinct Produce products each customer has ordered. It weeds out those that have purchased fewer than the total number of products in the Produce category, then returns a count of the remaining records, as a percentage of the total number of customers.

In this case, VB ELF performs flawlessly, but you can immediately see that along with the capability for such responses, VB ELF has the potential to create complex gibberish. In fact, if only a single symbol in the entire 42-line SQL sequence is out of place, the answer could be meaningless or incorrect. This is why it's important that the SQL generated be verifiable in a step-by-step manner. To this end we've created the Worksheet response style, which is sort of a QBE grid on steroids, for dealing with the multiple SQL statements that VB ELF can come up with.

The Worksheet shows you the intermediate datasets that chain together to form the eventual answer, as well as the individual statements of the SQL chain. An Explanation box shows a precis of the approximate logic that VB ELF is using to cobble together the answer to your query.

For this example, the Q1 explanation is not too illuminating -- this is just the source for the eventual count of all customers, for the denominator of the percent. (Note that VB ELF interprets percentage of customers to mean "percentage of customers who have placed at least one order", which is why it doesn't simply use the Customer table for this calculation.) Starting with the second SQL statement, Q2, the Explanation is more interesting:
Query assignment, show records associated with each and every one of a filtered set. Step 1: Select records meeting the filter conditions.
The "records associated" are the customers, and the "filtered set" is the set of products filtered by belonging to the Produce category.
The Q3 explanation reads:
Collect "associates" (associated with at least one of the Step 1 records).
In this step the customers that have ordered Produce are listed along with the Produce item.
The Q4 explanation reads:
Filter associates, retaining only those associated with EVERY record in Step 1; this is done by making sure that the count of distinct records for the associate is equal to the total count of distinct (unique) records in Step 1.
The corresponding SQL uses a clause: HAVING count ( [elfQ3].ProductName ) = elfUCount ( "ProductName" , "[elfQ2]" ); to effect this.
The Q5 explanation reads:
Show a complete answer including the fields used to filter the original records, the associate records, and any key information to help identify the reasons for the associations.
Actually, this is kind of wasteful, since in the case of a percentage query the only important data is the final, single number. But sometimes VB ELF can't determine what is strictly required until it reaches the end of its computation.
The Q6 explanation reads:
Display the ratio of the selected set (the previous query) to the full set (the first query) as a percentage. (Note that the ratios are calculated from the count of distinct entries in the selected field, ignoring duplicates.)

Other natural language systems sometimes use what's called an "echo" approach to verifying the correctness of generated SQL. For instance, English Wizard will generate a restatement in a kind of pidgin English, targeting something between the accuracy of SQL and the clarity of English. There are several problems with this approach. First, if you try to be as specific in the Echo as in the SQL itself, you may confuse less technical users -- the very user this feature is designed for. If you implement this feature inexactly -- as of course Linguistic Technology has done -- you may also wind up obscuring rather than clarifying: English Wizard often generates a Restatement that has no discernable relation to either the SQL or the English input, leaving the user completely baffled. Finally, even an accurate restatement of a complicated question doesn't guarantee that the SQL generated is correct.

English Wizard skirts this problem since it's unable to generate complex queries to begin with. Their "Evaluator's Guide to Natural Language Software" of course tries to portray this as a benefit: "Generating a single SQL statement with a sub-select may be executed faster by the DBMS than executing multiple SQL statements because of optimizations the DBMS can make." They are claiming that their product is superior to natural language products which can handle virtually unlimited complexity, because their software, able to generate only a single SQL statement at a time, is more efficient. This is like recommending that you use a unicycle to power your enterprise's trucking needs, since it gets better gas mileage than an eighteen-wheeler.

Like most of their Evaluator's Guide, this is not only misleading and dishonest, it's also wrong. Modern relational database systems, including Microsoft's Jet Engine, have sophisticated Query Execution Planners which optimize a sequence of interrelated SQL statements prior to executing them against the database. There is no performance cost to splitting queries into comprehensible units, and there is of course an enormous benefit to being able to view and verify the pieces independently. For example, using the Edit button on the Worksheet window, any single SQL component of the preceding example can be popped into a normal QBE grid, where the fields and conditions can be modified separately.

The implication that our software can't generate sub-selects when appropriate is also false; in fact, the Worksheet window features an Edit Subquery button that allows you to pop just the sub-select part of a query into a QBE grid. We might as well point out in reply that it's not just complex queries that English Wizard has trouble with. When you ask a question like "How many countries are our employees in?" and you receive an answer equal to the number of employees, English Wizard precedes this injury with an insult: "Warning: Due to a limitation of Microsoft Access, the count displayed may include duplicates." Believe your own eyes: Linguistic Technology is indeed claiming that one of the world's most popular database systems has a "limitation" that prevents it from counting up the number of different items in a column. Nowhere do they admit that it might possibly be a limitation of their own system that they can't simply list the distinct items in one SQL statement, and count them up in the next. Linguistic Technology claims to have "completely reversed the paradigm by putting the responsibility on the software to understand what the user wants". "Responsibility" means honestly admitting weaknesses in your software, not making yourself ridiculous by blaming Microsoft.

Worksheet mode has proved to be enormously popular in our beta test trials, probably because of all the other features which are not related to SQL verification. We now offer a split-screen view of both the datasheet and graph at the same time, your choice of either Microsoft Graph or the popular Pinnacle-BPS graph control. The sliding radio button at the top controls the dimensions of the two panes; either slide it with the mouse, or click to its left or right, to change the setting. In version 2.0. you can also right-click a tree display to split the grid half into both Grid and Tree halves, so you can actually have Grid, Tree and Graph displayed simultaneously.

Since the the Microsoft Graph 5.0 can't be double-clicked to edit its properties, we've added additional sort and filter properties to help you get exactly the image you're looking for, including titles and footnotes. The size and position of the graph image elements can be changed by dragging. Best of all for demos, using the Control key and the mouse (or using the "trackball" graphic) you can rotate the 3D Microsoft graphs in any direction.

Beginning in version 2.0, we've added the ability to further customize the Microsoft graphs by Shift-clicking on any label or legend, and entering text and/or format (Font/Size) information. You can now also get stand-alone Microsoft-style graphs in their own window, by choosing MS Graph style from Responses, then clicking Graph on the query window.

Version 2.0 also adds the ability to export Microsoft Graphs to other applications, by clicking on the graph image and pressing Ctrl-C. This copies the image to the Clipboard, and it can then be pasted into other applications such as Word or Excel with the Paste or Paste Special commands.

The Pinnacle-BPS graph also features attractive options for displaying sets with more than one numeric field in full 3-D (Z-clustered), especially useful for the results of crosstab queries. (Crosstabs are generated using the COMPARE or PIVOT keywords; eg. Compare the sum of subtotals showing the customer name across and month of order date down during 1994). In addition, the More Options selection gives you two additional customization features. One is the ability to graph only selected fields. To use this option, highlight a numeric column by clicking its column header. This adds it to the Selected list. You can add as many numeric columns as you like, then click Graph Selected Columns. In fact, you can combine numeric and text (or numeric key) entries into the Selected list, to explicitly determine both the values and labels for a given graph.

The other new option answers the question we frequently see posted to message boards, how do I get a list of the unique entries in a column? Simply select the column with duplicates (using the same highlight method), then click the Remove Dupes toggle button. The SQL will be rewritten to show only the first value of each given item for that column. Unclick the toggle button to recall the original, unfiltered set.

Let's look at a couple of examples of these powerful new features in action. One of the shortcomings of the current release is that we're still unable to handle multiple aggregates in a single query, such as "Show the total quantity and average discount for seafood products."

(* Note: as of version 2.0, we can now handle this query too, but we're leaving the example in since it illustrates the point about customizability! *)

But using these new features, we can adapt another query to our needs very quickly. With Worksheet mode selected, enter
Show the quantity and discount for each seafood product
Now click the Add button. By default, a new query is opened with the * representing all fields of the previous query (in this case elfQ1). Since we want to create an aggregate query, we'll need to delete the * column and add in all 3 columns from elfQ1, Quantity, Discount and ProductName. The SQL should now read: SELECT Quantity, Discount, ProductName FROM elfQ1; Now add the aggregate operators, to get: SELECT Avg(Discount) AS AvgDiscount, Sum(Quantity) AS SumQuantity, ProductName FROM elfQ1 GROUP BY ProductName; Now since Discount is a percentage, we'll need to multiply it by 100 to get an accurate graph. In fact, when compared to the Sum of the Quantities in the Northwind database, Discounts are rather small values, making for an unbalanced graph. So we can multiply it by another 100 just for the visual effect. So now the SQL should read: SELECT Avg(10000*Discount) AS AvgDiscount, Sum(Quantity) AS SumQuantity, ProductName FROM elfQ1 GROUP BY ProductName; Click Save to save this as elfQ2 (the default name).

Switch into Split View mode and voila, a graph of both the sum of quantities for each seafood product, and the average discount -- corrected for visual balance. As data mining goes, we're not sure this example says anything profound about the Northwind database, but it does illustrate the use of these customization features. By clicking on the More Options label, you can try out the use of the Graph Selected Columns button to get views of the individual aggregates; for instance, by highlighting the SumQuantity field only and showing its graph. Double-click an item to remove it from the list; when the list is empty VB ELF reverts back to its own decisions about what fields to graph. (Note that VB ELF will not attempt to graph numeric values which are used as keys to connect tables, also known as join keys.)

Notice that the "Drop" button is enabled only when you are the last query in any chain of worksheet queries.

While working with large recordsets in Worksheet mode, it can sometimes be useful to restrict the number of rows being processed until you're happy with the final result. The "Restrict rows to" drop-down box lets you choose (or type in) a value for the maximum number of records to display. This is especially useful when creating graphs, which can take some time to display. Once the graph accurately reflects the values you want for a representative sample, click "all" to view the whole set. Since this TOP N feature is individually selectable for each query in the chain, remember to reset this value to "all" for each query you've restricted.

The whole Worksheet is fully resizeable, and should give acceptable displays even on 640x480 resolution screens, although it is designed for use with 800x600 and higher. The remaining options are the Footer checkbox, which obviously hides or displays all the controls below the datasheet and graphs, allowing an expanded view of the information. The graphics controls will be discussed in the section devoted to the pop-up graph style of response. The drop-down box displaying the current query also lists the previous queries from this working session, and you can click one to recall it (or type in a new query), and then click Respond to run the query and return immediately to the Worksheet.

Incidentally, it is possible to have more than one graph active at a time, now that there are two styles of graph response, the Worksheet graph and the Form graph. To do this, first bring up the Form style graph, by selecting graph style and entering something like "show the number of employees in each city" (alternatively, selecting Datasheet and typing "graph the number of employees in each city"). Next click the Worksheet button on the toolbar. The Worksheet will come up in Datasheet mode, but you can switch to Graph or Split styles and make changes to the two graphs independently, for instance viewing one as a bar chart and one as a pie chart. You can also manage to have graphs of two different data sets displayed simultaneously; for instance, once you have a graph you're satisfied with in the Worksheet, you can open the Settings window, change to Graph style, pop open the SQL window, and enter (or cut-and-paste in) the SQL for another graph, then click Execute. Be aware that once you've done this, you've overwritten the SQL used to display the original (worksheet) graph so if you cause this graph to refresh (by making any changes to the Worksheet options, such as full/split view) the Worksheet graph will redraw from the same dataset as the pop-up graph.

Finally, there's that mysterious "extra" Close button at the top right. We've tested the software under Windows 95, Windows 98, Windows NT Server 4.0, and Windows NT Workstation 4.0. For some as yet undetermined reason, the Worksheet brings up an annoying error box when closed using the normal close methods -- under NT Workstation only. We hope to have this resolved in the near future, but we've provided this work-around Close button for NT Workstation users in the meantime.


Last Updated: November, 1999