In general, all query styles are requests for information of one form or another. They include
asking direct questions, giving commands to list or display certain data, naming or describing the
data set you're interested in, and using key phrases to trigger special formatting or graphing
features. Syntax which is NOT supported by the program includes any statement designed to change the
data in your tables, or to change the structure of a table within the database.
VB ELF does not directly support changing data by using English statements because we
believe that -- while English query tools are coming into their own as useful, even essential
data access instruments -- they are not yet reliable enough to trust with the task of mass
changes to your database. On the other hand, we don't block its use as an update tool, for
instance by enforcing a "snapshot" property on the datasets returned. In some cases, it might
be useful to combine the reporting and updating features, for example by listing the records
satisfying some condition (employees in Washington) then updating some field (eg. changing
[Reports To] from 2 to 3, so that everyone who reported to Fuller now reports to Leverling).
Although VB ELF does not deliberately prohibit this, it often generates recordsets that
aren't updateable, due to use of the DISTINCT keyword. While this guarantees that there won't
be any redundant information reported, it also prevents updates. You can override this by
switching a query into a design view (for instance by opening the SQL window, or using the
Edit button or SQL panel on the Worksheet), and deleting the DISTINCT keyword from the SQL.
If you want to specify that the responses should NOT permit changes to the data, we have
provided a "Read-Only" drop-down option from the Responses Tab of the Settings window. However,
this works only for OLE DB data and does not appear to be currently supported by many of the
(still quite-young) OLE DB Providers. In fact, so far it appears to be supported only by the
OLE DB Providers for Jet.
This lister uses a very simple word-counting method to try to find examples that are "like"
the query you've typed in. It's not an exact science; for instance here the queries about
"Hemingway" and including "against" probably result from the "in" part of "employees in Washington".
The real purpose is simply to expose you to some of the possible constructions that you might
use, without overwhelming you by "dumping" all of our samples. Of course, if you want to view
a more complete list, you can click the Assorted Samples button, which shows the
example list without applying a filter.
Editing a Query (fixing VB ELF's mistakes)
Before we talk about the results of a query, namely the response, let's go over what you can
do if the translation from English to SQL doesn't generate the results you expect. Although
it's quite hard to write SQL from scratch, it's often much easier to note and correct flaws
in SQL that's generated for you. If you are posting these query translations to a large
database, it may be prudent to do a "sanity-check" on the SQL, a quick scan that checks whether
the code seems to include all the right elements in the right place.
We might type in, for example, Show the dates for workers in Washington,
intending to view information about their dates of employment. Either by reviewing the SQL or
by looking at the columns returned, we see that VB ELF has interpreted "date" to mean
"Birth Date" rather than "Hire Date". By clicking on the Edit Query button, then selecting
the "What's Wrong" drop-down box, we can get a look at some of the basic reasoning VB ELF
applied to this translation. First, "worker=>Employee" -- this seems correct. "Employee=>Employee ID"
-- well, OK. 'WA' as data -- sure! [Birth Date] as field -- um, no.
There are actually two ways to fix this mistake, aside from the obvious one of changing the
query to read "Hire Date" (for instance, you may not immediately know the name of the field
that provides hiring date information). Simply select [Birth Date] from the "What's Wrong"
drop-down and click Try again -- ELF will retranslate the query using every option it can
think of without interpreting "date" as [Birth Date]. In this case, the next best
translation gives you what you actually want, the [Hire Date]. Another way to accomplish the
same thing would be to click the Employees table in the "What's Missing" list, select
[Hire Date] and click the [Add to "missing" list] button. In some cases, it may help to
combine these two approaches.
In other cases the Edit Query window won't help much, if at all. (We hope that
this is just a "patch" until the basic system recognizes most queries accurately, and
until our customization wizards are in place to help you tailor interfaces to eliminate these
mis-translations.) In these cases, you still have two choices before you revert back to the
"roll-up-your-sleeves-and-write-SQL" alternative. If the error is straight-forward enough, you
may be able to patch the SQL by writing the changes directly into the SQL window, then clicking
Execute. In fact, you could do that here, by typing the word "Hire" over "Birth". But sometimes
there's no simple, obvious change to get from where you are to where you want to be.
Let's say we've typed "Reports" instead of "Reports To" in the example above:
show the Reports field for employees in Washington
If we run this query, we'll see all kinds of information unrelated to the question.
The first records starts out: "Soft drinks, coffees, teas, beer, and ale". Checking the SQL,
we see references to Categories.Description and Categories.[Category Name]. What's going on?
In this case, VB ELF has changed "Report" to "Description", and "field" to "Province" then to
"Region". In fact, both of these are pretty reasonable, given that the dictionary definition of
"REPORT" is "history;statement;account;DESCRIPTION" etc. "FIELD" is defined as
"world;area;department;subject;territory;PROVINCE", and "PROVINCE" is "world;AREA" etc. while
one of the fields that Washington comes from, "Region", also has a very similar definition,
"AREA;neighborhood" etc.
While this is some comfort to us as designers of the system, it doesn't help you -- the user
-- get the information you're after. And there are just too many errors in the "What's Wrong"
list; checking off any combination doesn't seem to help.
Still, you can use the information in the "What's Wrong" listing of possible candidates to
help you modify the original query. It's not hard to see that "Report" is being interpreted in
a different way than you meant it, in this case. Now you can be more specific, typing "Reports
To", instead of just "Reports". If you're administering a natural language interface for others,
you can use this kind of error to help suggest additions to your list of Phrase substitutions,
or possibilities for the verbs included via the Verb Mapper. In this case, adding
When I Type: report
I Really Mean: report to does the trick.
Response Styles
The style of response is selectable from the Query window by clicking on Datasheet, Form,
Worksheet or Graph. If you choose Datasheet (from the Query window), you're actually selecting
either Datasheet or "Best Guess", whichever was picked last from the Responses
tab of the Selectable Settings window. ("Best Guess" works exactly like Datasheet unless your
query contains a reference to an OLE object, such as a picture -- in that case, it displays
the answer in a form.) In VB ELF, there's only one style of form response, currently; it
shows each row of the dataset by itself, rather than in a grid view, like the datasheet.
The Worksheet style is the newest Response style; it also
relates to the topic of verifying and correcting errant query translations.
While the basic "scan the SQL" method works for simpler types of queries, VB ELF has the
ability to generate extremely complex sequences of SQL, to answer considerably complicated
questions. In fact, VB ELF can generate up to seven sequential SQL statements in response
to a single question. While we can't come up with any examples offhand that require the full
seven, here's an example that takes six:
Show the percentage of customers that have ordered every Produce product.
To do this, VB ELF creates a list of the customers that have purchased products from the
Product category, then counts the number of distinct Produce products each customer has ordered.
It weeds out those that have purchased fewer than the total number of products in the Produce
category, then returns a count of the remaining records, as a percentage of the total number of
customers.
In this case, VB ELF performs flawlessly, but you can immediately see that along with the
capability for such responses, VB ELF has the potential to create complex gibberish. In
fact, if only a single symbol in the entire 42-line SQL sequence is out of place, the answer
could be meaningless or incorrect. This is why it's important that the SQL generated be
verifiable in a step-by-step manner. To this end we've created the Worksheet response style,
which is sort of a QBE grid on steroids, for dealing with the multiple SQL statements that
VB ELF can come up with.
The Worksheet shows you the intermediate datasets that chain together to form the eventual
answer, as well as the individual statements of the SQL chain. An Explanation box shows a precis
of the approximate logic that VB ELF is using to cobble together the answer to your query.
For this example, the Q1 explanation is not too illuminating -- this is just the source
for the eventual count of all customers, for the denominator of the percent. (Note that
VB ELF interprets percentage of customers to mean "percentage of customers who have placed
at least one order", which is why it doesn't simply use the Customer table for this calculation.)
Starting with the second SQL statement, Q2, the Explanation is more interesting:
Query assignment, show records associated with each and every one of a filtered set.
Step 1: Select records meeting the filter conditions.
The "records associated" are the customers, and the "filtered set" is the set of products
filtered by belonging to the Produce category.
The Q3 explanation reads:
Collect "associates" (associated with at least one of the Step 1 records).
In this step the customers that have ordered Produce are listed along with the Produce item.
The Q4 explanation reads:
Filter associates, retaining only those associated with EVERY record in Step 1; this is done by
making sure that the count of distinct records for the associate is equal to the total count of
distinct (unique) records in Step 1.
The corresponding SQL uses a clause:
HAVING count ( [elfQ3].ProductName ) = elfUCount ( "ProductName" , "[elfQ2]" );
to effect this.
The Q5 explanation reads:
Show a complete answer including the fields used to filter the original records, the associate records,
and any key information to help identify the reasons for the associations.
Actually, this is kind of wasteful, since in the case of a percentage query the only important
data is the final, single number. But sometimes VB ELF can't determine what is strictly required
until it reaches the end of its computation.
The Q6 explanation reads:
Display the ratio of the selected set (the previous query) to the full set (the first query) as a percentage.
(Note that the ratios are calculated from the count of distinct entries in the selected field, ignoring duplicates.)
Other natural language systems sometimes use what's called an "echo" approach to verifying the
correctness of generated SQL. For instance, English Wizard will generate a restatement in a kind
of pidgin English, targeting something between the accuracy of SQL and the clarity
of English. There are several problems with this approach. First, if you try to be as specific in
the Echo as in the SQL itself, you may confuse less technical users -- the very user this
feature is designed for. If you implement this feature inexactly -- as of course Linguistic
Technology has done -- you may also wind up obscuring rather than clarifying: English Wizard
often generates a Restatement that has no discernable relation to either the SQL or the English
input, leaving the user completely baffled. Finally, even an accurate restatement of a complicated
question doesn't guarantee that the SQL generated is correct.
English Wizard skirts this problem since it's unable to generate complex queries to begin with.
Their "Evaluator's Guide to Natural Language Software" of course tries to portray this as a benefit:
"Generating a single SQL statement with a sub-select may be executed faster by the DBMS than
executing multiple SQL statements because of optimizations the DBMS can make." They are claiming that
their product is superior to natural language products which can handle virtually unlimited
complexity, because their software, able to generate only a single SQL statement at a time, is more
efficient. This is like recommending that you use a unicycle to power your
enterprise's trucking needs, since it gets better gas mileage than an eighteen-wheeler.
Like most of their Evaluator's Guide, this is not only misleading and dishonest, it's also wrong.
Modern relational database systems, including Microsoft's Jet Engine, have sophisticated Query Execution
Planners which optimize a sequence of interrelated SQL statements prior to executing them against
the database. There is no performance cost to splitting queries into comprehensible units, and
there is of course an enormous benefit to being able to view and verify the pieces independently.
For example, using the Edit button on the Worksheet window, any single SQL component of the
preceding example can be popped into a normal QBE grid, where the fields and conditions can be
modified separately.
The implication that our software can't generate sub-selects when appropriate is also false;
in fact, the Worksheet window features an Edit Subquery button that allows you to pop just
the sub-select part of a query into a QBE grid. We might as well point out in reply that
it's not just complex queries that English Wizard has trouble with. When you ask a question like
"How many countries are our employees in?" and you receive an answer equal to the number of
employees, English Wizard precedes this injury with an insult: "Warning: Due to a limitation of
Microsoft Access, the count displayed may include duplicates." Believe your own eyes: Linguistic
Technology is indeed claiming that one of the world's most popular database systems has a
"limitation" that prevents it from counting up the number of different items in a column.
Nowhere do they admit that it might possibly be a limitation of their own system that they
can't simply list the distinct items in one SQL statement, and count them up in the next.
Linguistic Technology claims to have "completely reversed the paradigm by putting the
responsibility on the software to understand what the user wants". "Responsibility" means
honestly admitting weaknesses in your software, not making yourself ridiculous by blaming
Microsoft.
Worksheet mode has proved to be enormously popular in our beta test trials, probably because
of all the other features which are not related to SQL verification. We now offer a split-screen
view of both the datasheet and graph at the same time, your choice of either Microsoft Graph or
the popular Pinnacle-BPS graph control. The sliding radio button at the top controls the dimensions
of the two panes; either slide it with the mouse, or click to its left or right, to change the
setting. In version 2.0. you can also right-click a tree display to split the grid half into both
Grid and Tree halves, so you can actually have Grid, Tree and Graph displayed simultaneously.
Since the the Microsoft Graph 5.0 can't be double-clicked to edit its properties, we've added
additional sort and filter properties to help you get exactly the image you're looking for,
including titles and footnotes. The size and position of the graph image elements can
be changed by dragging. Best of all for demos, using the Control key and the mouse (or using
the "trackball" graphic) you can rotate the 3D Microsoft graphs in any direction.
Beginning in version 2.0, we've added the ability to further customize the Microsoft graphs
by Shift-clicking on any label or legend, and entering text and/or format (Font/Size) information.
You can now also get stand-alone Microsoft-style graphs in their own window, by choosing MS Graph
style from Responses, then clicking Graph on the query window.
Version 2.0 also adds the ability to export Microsoft Graphs to other applications, by clicking
on the graph image and pressing Ctrl-C. This copies the image to the Clipboard, and it can then
be pasted into other applications such as Word or Excel with the Paste or Paste Special commands.
The Pinnacle-BPS graph also features attractive options for displaying sets with more than one numeric field in full 3-D (Z-clustered), especially useful for the results of crosstab queries. (Crosstabs are generated using the COMPARE or PIVOT keywords; eg.
Compare the sum of subtotals showing the customer name across and month of order date down during 1994).
In addition, the More Options selection gives you two additional customization features. One is the
ability to graph only selected fields. To use this option, highlight a numeric column by clicking
its column header. This adds it to the Selected list. You can add as many numeric columns as you
like, then click Graph Selected Columns. In fact, you can combine numeric and text (or numeric key)
entries into the Selected list, to explicitly determine both the values and labels for a given graph.
The other new option answers the question we frequently see
posted to message boards, how do I get a list of the unique entries in a column? Simply select
the column with duplicates (using the same highlight method), then click the Remove Dupes toggle
button. The SQL will be rewritten to show only the first value of each given item for that column.
Unclick the toggle button to recall the original, unfiltered set.
Let's look at a couple of examples of these powerful new features in action. One of the shortcomings
of the current release is that we're still unable to handle multiple aggregates in a single query,
such as "Show the total quantity and average discount for seafood products."
(* Note: as of version 2.0, we can now handle this query too, but we're leaving the example in since
it illustrates the point about customizability! *)
But using these new
features, we can adapt another query to our needs very quickly. With Worksheet mode selected, enter
Show the quantity and discount for each seafood product
Now click the Add button. By default, a new query is opened with the * representing all fields of
the previous query (in this case elfQ1). Since we want to create an aggregate query, we'll need to
delete the * column and add in all 3 columns from elfQ1, Quantity, Discount and ProductName.
The SQL should now read: SELECT Quantity, Discount, ProductName FROM elfQ1;
Now add the aggregate operators, to get:
SELECT Avg(Discount) AS AvgDiscount, Sum(Quantity) AS SumQuantity, ProductName FROM elfQ1 GROUP BY ProductName;
Now since Discount is a percentage, we'll need
to multiply it by 100 to get an accurate graph. In fact, when compared to the Sum of the Quantities in the Northwind database, Discounts are rather small values, making for an unbalanced graph. So we can multiply it by another 100 just for the visual effect. So now the SQL should read:
SELECT Avg(10000*Discount) AS AvgDiscount, Sum(Quantity) AS SumQuantity, ProductName FROM elfQ1 GROUP BY ProductName;
Click Save to save this as elfQ2 (the default name).
Switch into Split View mode and voila, a graph of both the sum of quantities for each seafood
product, and the average discount -- corrected for visual balance. As data mining goes, we're not
sure this example says anything profound about the Northwind database, but it does illustrate the
use of these customization features. By clicking on the More Options label, you can try out
the use of the Graph Selected Columns button to get views of the individual aggregates; for
instance, by highlighting the SumQuantity field only and showing its graph. Double-click an
item to remove it from the list; when the list is empty VB ELF reverts back to its own
decisions about what fields to graph. (Note that VB ELF will not attempt to graph numeric
values which are used as keys to connect tables, also known as join keys.)
Notice that the "Drop" button is enabled only when you are the last query in any
chain of worksheet queries.
While working with large recordsets in Worksheet mode, it can sometimes be useful
to restrict the number of rows being processed until you're happy with the final result.
The "Restrict rows to" drop-down box lets you choose (or type in) a value for the maximum
number of records to display. This is especially useful when creating graphs, which can take
some time to display. Once the graph accurately reflects the values you want for a
representative sample, click "all" to view the whole set. Since this TOP N feature is
individually selectable for each query in the chain, remember to reset this value to "all" for
each query you've restricted.
The whole Worksheet is fully resizeable, and should give acceptable displays even on 640x480
resolution screens, although it is designed for use with 800x600 and higher. The remaining options
are the Footer checkbox, which obviously hides or displays all the controls below the datasheet and
graphs, allowing an expanded view of the information. The graphics controls will be discussed in
the section devoted to the pop-up graph style of response. The drop-down box displaying the
current query also lists the previous queries from this working session, and you can click one
to recall it (or type in a new query), and then click Respond to run the query and return
immediately to the Worksheet.
Incidentally, it is possible to have more than one graph active at a time, now that there
are two styles of graph response, the Worksheet graph and the Form graph. To do this, first bring
up the Form style graph, by selecting graph style and entering something like "show the number of
employees in each city" (alternatively, selecting Datasheet and typing "graph the number of
employees in each city"). Next click the Worksheet button on the toolbar. The Worksheet will
come up in Datasheet mode, but you can switch to Graph or Split styles and make changes to the
two graphs independently, for instance viewing one as a bar chart and one as a pie chart. You can also
manage to have graphs of two different data sets displayed simultaneously; for instance, once you have
a graph you're satisfied with in the Worksheet, you can open the Settings window, change to Graph
style, pop open the SQL window, and enter (or cut-and-paste in) the SQL for another graph, then
click Execute. Be aware that once you've done this, you've overwritten the SQL used to display the
original (worksheet) graph so if you cause this graph to refresh (by making any changes to the
Worksheet options, such as full/split view) the Worksheet graph will redraw from the same dataset
as the pop-up graph.
Finally, there's that mysterious
"extra" Close button at the top right. We've tested the software under Windows 95, Windows 98,
Windows NT Server 4.0, and Windows NT Workstation 4.0. For some as yet undetermined reason, the
Worksheet brings up an annoying error box when closed using the normal close methods -- under NT
Workstation only. We hope to have this resolved in the near future, but we've provided this
work-around Close button for NT Workstation users in the meantime.
Last Updated: November, 1999