Hi,

I am Mike and I run a small law student site.
I was recently donated a dictionary, and I have a glossary system installed on my forum.

I am pretty bad with scripting but hope someone can help me along.

What I want is to take the dictionary, which is written like this:

A MENSA ET THORO, from bed and board. A divorce a mensa et thoro, is rather a separation of the parties by act of law, than a dissolution of the marriage. It may be granted for the causes of extreme cruelty or desertion of the wife by the hushand. 2 Eccl. Rep. 208.

This kind of divorce does not affect the legitimacy of children, nor authorize a second marriage. V. A vinculo matrimonii; Cruelty Divorce.

A PRENDRE, French, to take, to seize, in contracts, as profits a prendre. Ham. N. P. 184; or a right to take something out of the soil. 5 Ad. & Ell. 764; 1 N. & P. 172 it differs from a right of way,

which is simply an easement or interest which confers no interest in the land. 5 B. & C. 221.

A QUO, A Latin phrases which signifies from which; example, in the computation of time, the day a quo is not to be counted, but the day ad quem is always included. 13 Toull. n. 52 ; 2 Duv. n. 22. A court a quo, the court from which an appeal has been taken; a judge a quo is a judge of a court below. 6 Mart. Lo. R. 520; 1 Har. Cond. L. R. 501. See Ad quem.

A RENDRE, French, to render, to yield, contracts. Profits a rendre; under this term are comprehended rents and services. Ham N. P. 192.

A VINCULO MATRIMONII, from the bond of marriage. A marriage may be dissolved a vinculo, in many states, as in Pennsylvania, on the ground of canonical disabilities before marriage, as that one of the parties was legally married to a person who was then living; impotence, (q. v.,) and the like adultery cruelty and malicious desertion for two years or more. In New York a sentence of imprisonment for life is also a ground for a divorce a vinculo. When the marriage is dissolved a vinculo, the parties may marry again but when the cause is adultery, the guilty party cannot marry his or her paramour.

AB INITIO, from the beginning.

and so on, several thousand entries long.

I want to script this so that the word (in capitals) is placed into one column and the definition into another column.

I can then place this into my database by import.

Can someone help me out?

I'm not sure what you mean by in a column... in a text file?

Sorry yes, in a text file.
Then I can export my database to Excel, cut and paste the two columns into there, and import it back with phpMyAdmin. A total noob way of doing things, I am sure!


Mike

Ah, ok... so in the new text file, you want it Tab delimited... so the upper case words <tab> definition. K.
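
For reference, the tab-delimited layout described here could be produced with something like the following. It is only a minimal sketch: it assumes every entry sits on a single line and that the ALL CAPS term ends at the first comma, and the function name to_tsv is made up for illustration.

```shell
# Hypothetical helper: reads dictionary entries on stdin and writes
# "TERM<tab>definition" lines on stdout.
# Assumes one entry per line, with the term ending at the first comma.
to_tsv() {
  awk '
    /^[A-Z ]+,/ {
      comma = index($0, ",")
      term  = substr($0, 1, comma - 1)
      defn  = substr($0, comma + 1)
      sub(/^[ \t]+/, "", defn)      # drop the space that followed the comma
      printf("%s\t%s\n", term, defn)
    }
  '
}
```

It would be used as to_tsv < lawdict.txt > lawdict.tsv (the output file name is just an example). Entries that span several lines are not handled by this sketch.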

Yeah, I guess so. I will then just cut and paste it into Excel in 2 columns.

M

I think it would be easier to write the shell script to generate SQL output directly, so you can 'source' that SQL file right into your database via phpMyAdmin.

Are all of the CAPITALS terminated with a comma? If so, the task is easier. If many or most are, manually edit the text file to make it 'conformant'.
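
A quick way to see how much manual editing is needed is to list the lines that open with capitals but never reach a comma. The following is a rough sketch under my own assumptions: a title starts the line and contains at least two adjacent capital letters, and the function name nonconforming is hypothetical.

```shell
# Hypothetical helper: print "line number: text" for every line that looks
# like an ALL CAPS title but is not terminated by a comma.
# Heuristic only: a line counts as a title candidate when its leading run of
# capitals and spaces contains two adjacent capital letters.
nonconforming() {
  awk '
    /^[A-Z ]*[A-Z][A-Z]/ && $0 !~ /^[A-Z ]+,/ {
      printf("%d: %s\n", NR, $0)
    }
  '
}
```

Run it as nonconforming < lawdict.txt. Ordinary sentences such as "A court a quo ..." are not flagged, but a sentence that opens with an abbreviation in capitals would be, so the list still needs a human eye.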

Give me a little time; I'll work up something using awk and sed.

N

Hi,

I sent you a PM with the source and database structure.

Try the following two files, an awk script and a sh script. Put them both in the same working directory.

  1. Put your text into lawdict.txt and ensure it conforms to the one requirement: that all of the ALL CAPS TITLES end with a comma (,).
  2. Execute the command sh lawdict.sh .
  3. Edit lawdict.sql and change the last trailing comma to a semi-colon (;).
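
Step 3 can also be left to sed instead of a text editor. A small sketch, with finish_sql being a made-up name:

```shell
# Hypothetical helper: replace the trailing comma on the LAST line of the
# input with a semicolon; all other lines pass through unchanged.
finish_sql() {
  sed '$ s/,$/;/'
}
```

For example: finish_sql < lawdict.sql > lawdict_final.sql (the second file name is only an example).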

You now have an example of a shell-script-generated SQL command file.

It ain't perfect, and someone with a fresher brain will probably simplify this and make it about perfect. But the result (the example at the very end) is close to what you need.

lawdict.awk:

# Print one entry as a row, dropping a trailing paragraph break.
function emit() {
  if (lineout == "") { return; }
  if (substr(lineout, length(lineout)-6) == "</p><p>") {
    lineout = substr(lineout, 1, length(lineout)-7);
  }
  printf("('%s'),\n", lineout);
}
BEGIN { lineout = ""; printf("INSERT INTO law_dict_table VALUES\n"); }
{
  if ($0 ~ /^[A-Z ]+,/) {
    # A new ALL CAPS title: flush the previous entry, if there is one.
    emit();
    lineout = $0;
  } else {
    if ($0 ~ /^$/) {
      # A blank line inside an entry is a paragraph break.
      if (lineout != "") { lineout = lineout "</p><p>"; }
    } else {
      lineout = lineout $0;
    }
  }
}
END {
    # Flush the last entry; its trailing comma is fixed by hand (step 3).
    emit();
}

lawdict.sh:

#! /bin/sh

# Escape backslashes first, then single quotes, so the text is safe inside
# SQL strings; afterwards split each row into ('TITLE','definition').
(
  sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g" lawdict.txt \
    | awk -f lawdict.awk \
    | sed -e 's= </p>=</p>=g' -e "s/^\(('[A-Z ]*\),/\1','/"
) > lawdict.sql

Example lawdict.sql:

INSERT INTO law_dict_table VALUES
('A MENSA ET THORO',' from bed and board. A divorce a mensa et thoro, is rather a separation of the parties by act of law, than a dissolution of the marriage. It may be granted for the causes of extreme cruelty or desertion of the wife by the hushand. 2 Eccl. Rep. 208.</p><p>This kind of divorce does not affect the legitimacy of children, nor authorize a second marriage. V. A vinculo matrimonii; Cruelty Divorce.'),
('A PRENDRE',' French, to take, to seize, in contracts, as profits a prendre. Ham. N. P. 184; or a right to take something out of the soil. 5 Ad. & Ell. 764; 1 N. & P. 172 it differs from a right of way,</p><p>which is simply an easement or interest which confers no interest in the land. 5 B. & C. 221.'),
('A QUO',' A Latin phrases which signifies from which; example, in the computation of time, the day a quo is not to be counted, but the day ad quem is always included. 13 Toull. n. 52 ; 2 Duv. n. 22. A court a quo, the court from which an appeal has been taken; a judge a quo is a judge of a court below. 6 Mart. Lo. R. 520; 1 Har. Cond. L. R. 501. See Ad quem.'),
('A RENDRE',' French, to render, to yield, contracts. Profits a rendre; under this term are comprehended rents and services. Ham N. P. 192.'),
('A VINCULO MATRIMONII',' from the bond of marriage. A marriage may be dissolved a vinculo, in many states, as in Pennsylvania, on the ground of canonical disabilities before marriage, as that one of the parties was legally married to a person who was then living; impotence, (q. v.,) and the like adultery cruelty and malicious desertion for two years or more. In New York a sentence of imprisonment for life is also a ground for a divorce a vinculo. When the marriage is dissolved a vinculo, the parties may marry again but when the cause is adultery, the guilty party cannot marry his or her paramour.'),
('AB INITIO',' from the beginning.');

Ah. The data are not exactly normalized. That is, each entry does not necessarily meet the 'ALL CAPS' requirement. They're generally ALL CAPS, but some are terminated with space, some with comma, some with period, some with something else.

The script I posted, with a minor change, will probably correctly handle the majority of the entries. But proof reading will still be necessary to ensure the task was done correctly. Also, in skimming a little bit of the data, I noticed a few spots of grammatical nonsense; the data should receive a proper proof-reading anyway.
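
To illustrate the kind of change this would take (a guess at one possible approach, not the revision that was actually made): instead of insisting on a comma, treat the leading run of ALL CAPS words as the title, whatever punctuation or space ends it. The function name split_title and the two-character minimum are assumptions of this sketch.

```shell
# Hypothetical helper: split each line into "TITLE<tab>rest", where TITLE is
# the leading run of ALL CAPS words. The title ends at the first word that
# carries trailing punctuation (, . ; :) or at the first non-caps word.
split_title() {
  awk '
    {
      title = ""
      n = split($0, w, " ")
      for (i = 1; i <= n; i++) {
        word = w[i]
        core = word
        sub(/[,.;:]+$/, "", core)           # strip trailing punctuation
        if (core !~ /^[A-Z]+$/) break       # not an ALL CAPS word
        title = (title == "" ? core : title " " core)
        if (core != word) { i++; break }    # punctuation closes the title
      }
      if (length(title) < 2) next           # skip lines without a title
      rest = ""
      for (; i <= n; i++) rest = (rest == "" ? w[i] : rest " " w[i])
      printf("%s\t%s\n", title, rest)
    }
  '
}
```

A title ended only by a space remains ambiguous when the definition itself opens with a one-letter capital word (for example "A QUO A Latin ..."), and titles containing hyphens, ampersands or apostrophes are not recognized, so proofreading is still needed.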

Give me a few days; I'll see what I can hack together.

You're amazing! Thank you so much!
I will keep my eyes glued to this thread!

These two scripts use the HTML files from the web site, as they are better formatted than the text. The shell script fetches the .htm file if it hasn't already fetched it and filters the htm through awk.

I'm not sure why you want to have keywords separate from the title, or why you'd want to separate the title from the definition. It appears the title is almost always an integral part of the definition.

The awk script simply copies the bold-faced matter and puts it into the DB's lawTitle column. It also puts a copy of it into the lawKeywords column. It puts pert-near the whole thing in the lawContent column. It preserves the simple <P> and </P> tags so that your new display program can properly format the definitions.

Save the following two code blocks into files with their respective file names, then run sh lawdict.sh. Et voilà! You have a 4.5MB SQL command file just about ready for phpMyAdmin.

Again, proofreading is necessary to ensure that the script really gets all the entries. There are also errors in the data sources that *ought* to be corrected. Perhaps you can have students report them as they find them, as part of their payment for using the database.
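
As a starting point for that proofreading, the generated SQL can be scanned for titles that look mis-parsed. The sketch below assumes the lawTitle lines have exactly the layout the awk script prints (two leading spaces, then lawTitle='...',); the function name suspect_titles and the choice of what counts as suspicious (an empty title, or one containing lowercase letters) are my own assumptions.

```shell
# Hypothetical helper: list "line number: title" for lawTitle values that are
# empty or contain lowercase letters, which may indicate a mis-parsed entry.
suspect_titles() {
  awk -v q="'" '
    /^  lawTitle=/ {
      title = $0
      sub("^  lawTitle=" q, "", title)   # strip the column name and opening quote
      sub(q ",$", "", title)             # strip the closing quote and comma
      if (title == "" || title ~ /[a-z]/)
        printf("%d: %s\n", NR, title)
    }
  '
}
```

Run it as suspect_titles < lawdict.sql. A flagged title is not necessarily wrong; it is only a candidate for a closer look.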

Good luck!

lawdict.sh

#! /bin/sh

for i in "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "y"; do

  file="bouvier_${i}.htm"
  if [ ! -f "$file" ]; then
    wget -O "$file" "http://www.constitution.org/bouv/$file"
  fi

  sed -e 's/\r/\n/g' "$file" | awk -f lawdict.awk

done > lawdict.sql

lawdict.awk

BEGIN {
  lineout = "";
  firstline = 1;
  # Skip the page header: everything up to and including </CENTER>.
  while (getline > 0) {
    if (index($0, "</CENTER>") > 0) { break; }
  }
  getline;
}

# Skip <PRE> ... </PRE> blocks entirely.
index($0, "<PRE>") > 0 {
  while (getline > 0) {
    if (index($0, "</PRE>") > 0) { break; }
  }
}

{
  # A new entry starts with a bold title; "P ALIGN=" marks the page footer.
  if ((index($0, "\t <P> <B>") > 0) || (index($0, "P ALIGN=") > 0)) {
    if (firstline == 1) {
      firstline = 0;
      sub(/^[ \t]+/, "");
      lineout = $0;
    } else {
      # Get the CAPS
      bold_beg = index(lineout, "<B>") + 3;
      bold_len = index(lineout, "</B>") - bold_beg;
      field1 = substr(lineout, bold_beg, bold_len);

      # Prepare for MySQL: escape single quotes
      gsub("'", "\\'", field1);
      gsub("'", "\\'", lineout);

      # Chuck it in
      printf("\nINSERT INTO law_dict_table SET\n");
      printf("  lawTitle='%s',\n  lawKeywords='%s',\n  lawContent='%s';\n",
             field1, field1, lineout);

      # EOF?
      if (index($0, "P ALIGN=") > 0) {
        exit;
      }

      # And get the current line ready
      sub(/^[ \t]+/, "");
      lineout = $0;
    }

  } else if ($0 ~ /^$/) {
    # Naught to do!
  } else {
      # Continuation line: append it to the current entry.
      sub(/^[ \t]*/, "");
      gsub("<HR>", "");
      lineout = lineout " " $0;
  }
}

Fantastic!

Trying it now
