{"id":250,"date":"2003-10-06T01:53:16","date_gmt":"2003-10-06T06:53:16","guid":{"rendered":"http:\/\/www.jasonmorrison.net\/content\/?p=250"},"modified":"2008-08-18T02:11:57","modified_gmt":"2008-08-18T07:11:57","slug":"the-pain-of-dialog-flat-file-vs-relational-databases","status":"publish","type":"post","link":"http:\/\/www.jasonmorrison.net\/content\/2003\/the-pain-of-dialog-flat-file-vs-relational-databases\/","title":{"rendered":"The pain of Dialog: Flat file vs. relational databases"},"content":{"rendered":"<p>After my first expose to the Dialog structured search system I wanted to put down some thoughts about relational databases.\u00a0 There may  very well be reasons why flat-file text databases are better for systems like Dialog or OhioLink, but I really don&#8217;t think they are the ones I&#8217;ve heard mentioned in  class.<\/p>\n<p>The first major point made in class was that in a flat file  database like those in dialog the creators could do something like this for a  record with multiple authors:<\/p>\n<p>TI = Title of this article<br \/>\nAU = Smith,  Bob B<br \/>\nAU = Jones, Joseph H<br \/>\nAU = Fakename, Robert  P<br \/>\netc&#8230;<\/p>\n<p>Whereas in a relational database the table would have to have  fields like this:<\/p>\n<p>Table Article<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\nTitle<br \/>\nAuthor1<br \/>\nAuthor2<br \/>\nAuthor3<br \/>\netc&#8230;<\/p>\n<p>Although  I have seen databases designed exactly as described, that  that design defeats the entire point of having a &#8220;relational&#8221;  database&#8211;relationships.\u00a0 A better design would be to break Articles and Authors  into two separate tables, since they are two separate entities, and because they  have a many-to-many relationship (any number of authors can write any number of  articles) a link table would be made as well:<\/p>\n<p>Table  Article<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\nArticle_id<br \/>\nArticle_title<\/p>\n<p>Table  Author<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\nAuthor_id<br \/>\nAuthor_name<\/p>\n<p>Table  Article_Author<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\nArticle_author_id<br \/>\nArticle_id<br \/>\nAuthor_id<\/p>\n<p>This  is a better approach than the flat file database as well, because it means an  author only needs to be entered once, and that the author record only exists in  one place.\u00a0 If a user is typing up hundreds of citations a day, it is likely  they will misspell an author name once and a while&#8211;with the relational  database, they would be picking from a dropdown or using some other method to  select the author record that already exists.\u00a0 Also, it allows for changes to be  made easily.\u00a0 Imagine if a prolific author has adopted a stage name and gained  notoriety&#8211;now the author record can be changed only once and the changes will  be reflect every time an article record&#8211;joined to the author table&#8211;is called  up.<\/p>\n<p>But what if some users will still search for the author&#8217;s old  name?\u00a0 There are a number of approaches the database designer could take, for  example creating a new Author_aliases table that links to the correct record in  the Author table, etc.\u00a0 Also, the tables above are highly simplified.\u00a0 It is  doubtful the author table would have a field for name&#8211;most likely it would have  fields for first, middle, and last name and any other pertinent information as  well.\u00a0 That, and proper construction of the interface, would eliminate such  silliness as having to type Lastname, First in one place and Lastname First  Initial in others.<\/p>\n<p>There are a number of good tutorials on this subject  online, for example:<br \/>\n<a href=\"http:\/\/www.phpbuilder.com\/columns\/barry20000731.php3?page=1\">http:\/\/www.phpbuilder.com\/columns\/barry20000731.php3?page=1<\/a><br \/>\n<a href=\"http:\/\/www.serverwatch.com\/tutorials\/article.php\/1549781\">http:\/\/www.serverwatch.com\/tutorials\/article.php\/1549781<\/a><br \/>\n<a href=\"http:\/\/builder.com.com\/5100-6388-1050841.html\">http:\/\/builder.com.com\/5100-6388-1050841.html<\/a><\/p>\n<p>The  second issue brought up in class was the need for unlimited field size.\u00a0 I  have also seen relational databases where the designer only allowed 5 characters  for a field that after a year really needed 10, but again this is poor design.\u00a0  Relational databases, at least for the last ten years or so, have been able to  handle more or less unlimited field sizes.\u00a0 MySQL, which is available for free,  is a good example.\u00a0 (<a href=\"http:\/\/www.mysql.com\/documentation\/mysql\/bychapter\/manual_Reference.html#Column_types\">http:\/\/www.mysql.com\/documentation\/mysql\/bychapter\/manual_Reference.html#Column_types<\/a>)  For numerical data, the bigint column type has a range of  -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 or 0 to  18,446,744,073,709,551,615 unsigned, and the varchar type supports up to 255  characters.\u00a0 If you need more room than that, the longtext type supports up to  4,294,967,295 characters.\u00a0 War and Peace in ASCII from the Gutenberg project, is  just over 3 million characters.\u00a0 How often do you need to store more than 1,000  copies of war and peace in one column of one record?\u00a0 Expensive commercial  databases like Oracle no doubt have even more impressive figures.<\/p>\n<p>There  are reasons why most of the largest web sites with the most traffic and largest  amount of information use relational database backends, and not CSV files or  even XML for storage and retrieval.\u00a0 XML is great at moving, exchanging, and  marking up information for display.\u00a0 But from what I&#8217;ve read most people seem to  agree it&#8217;s not great for storage of anything of real size, or anything that  needs to be accessed very often and very quickly.<\/p>\n<p>The more I use Dialog  the less impressed I am by it.\u00a0 It strikes me very much as a tool that was  amazing in its day, but severely limited by available hardware.\u00a0 Now that  hardware is ridiculously fast and cheap, its limitations are purely artificial  and indistinguishable from clunky design.\u00a0 I know many people who swear by  command-line interfaces, and I know there are studies showing CLI to be more  &#8220;efficient&#8221; for the most expert of users, but there has to be a reason why 99%  of the world uses Windows or MacOS (or Gnome or KDE even if they run Linux).\u00a0 If  it takes a year for most users to reach expert status and reap the efficiency  benefits, but users can master a 20 percent less efficient GUI in a week, which  is better?\u00a0 And I say that from the point of view of someone who used DOS for  years and is comfortable coding in notepad.\u00a0 And it&#8217;s not as if creating a GUI  means you have to abandon the CLI completely&#8211;both can happily coexist.<\/p>\n<p>There are some major structural problems.\u00a0 The fact that Author name  entry isn&#8217;t standard across Dialog is nonsensical.\u00a0 I understand where some  fields in chemistry databases will differ from fields in business databases, but  nearly everything will have an author, and all that do should conform to a  standard.\u00a0 The advantages of controlled vocabulary dwindle when nothing is well  controlled.\u00a0 Descriptors differ from one database to another, may or may not be  updated, etc.\u00a0 A well-designed relational database would help to eliminate these  sorts of problems.<\/p>\n<p>It would not be hard to make a system  like Dialog with a relational database.\u00a0 Give a decent programmer or dba  complete access to Dialog&#8217;s data and a year full-time, and I bet they could come  up with something.\u00a0 The biggest problem would be trying to reconcile all the  weirdness of the individual databases, like truncating hyphenated names and  such.\u00a0\u00a0 Designing the  tables and fields in MySQL for ERIC and a couple others would be a fun little project that would take less than a week.<\/p>\n<p>I wonder &#8211; has there been  any effort to bring Dialog into the current decade, or even the 1990s?\u00a0 How many  people still actually use it, with so many library and journal catalogs going  online?\u00a0 I know Medline is available elsewhere.\u00a0 I asked a friend of mine  majoring in LS at Pittsburgh and she said one professor showed it to them in one  class, but no one ever actually used it.\u00a0 I understand the difference between being able to search fields vs the web, controlled  vocab, etc., but surely there&#8217;s a less aggravating system out there that  includes these features?<\/p>\n<p>I mean, just the whole bluesheet thing&#8230;\u00a0  searching with the Find on this Page feature of your browser?\u00a0 You should never  rely on your visitors to have a specific browser feature, and search boxes  aren&#8217;t too hard to do.\u00a0 The Dialog Database Catalog is a series of randomly  chopped up PDFs?\u00a0 And none of this is integrated into any of their  telnet-workalike interfaces?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After my first expose to the Dialog structured search system I wanted to put down some thoughts about relational databases.\u00a0 There may very well be reasons why flat-file text databases are better for systems like Dialog or OhioLink, but I really don&#8217;t think they are the ones I&#8217;ve heard mentioned in class. The first major [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[19],"tags":[170,414,32,412,35,413,415,244,36,417,418,177,416],"class_list":["post-250","post","type-post","status-publish","format-standard","hentry","category-blog","tag-catalogs","tag-cli","tag-controlled-vocabulary","tag-dialog","tag-flat-file-databases","tag-gui","tag-library-science","tag-mysql","tag-relational-databases","tag-search","tag-structured-search","tag-user-interface-design","tag-xml"],"aioseo_notices":[],"_links":{"self":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/comments?post=250"}],"version-history":[{"count":1,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/250\/revisions"}],"predecessor-version":[{"id":251,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/250\/revisions\/251"}],"wp:attachment":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/media?parent=250"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/categories?post=250"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/tags?post=250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}