Sql Comparing two tables: documents

Showing posts with label documents. Show all posts

Tuesday, March 20, 2012

Compression

Hi everyone,
I am developing a web app and wanting to store documents (doc, xls,
pdf, ppt...etc.) within the DB (MSSQL). Is it advisable in general (I
don't know exactly how big the docs are going to be) to compress the
files before putting them into the DB? How does this affect Full Text
Searching? On a wider note what's the attitude of people in terms of
storing documents in the DB versus on the file system. There seems to
be a lot of differing opinions. Any links to resources would be most
appreciated.
Thanks for your help in advance.
Jose
Jose,
First of all, can I assume that you are using SQL Server 2000? If so, could
you post the full output of: SELECT @.@.version -- as this is very help info
in understanding SQL FTS issues.
Secondly, what exactly do you mean by "compress[ing] the files before
putting them into the DB?" Do you mean to store them as ZIP files in columns
defined with the IMAGE datatype? or are you thinking about placing the FT
Catalog folder (SQL00060005) & files on a compressed drive. As for the
latter, I've not done any testing with compressed drive, but Hilary Cotter
has and from past postings on this subject, he's indicated that the
performance gains is mimumal. As for the former, you will need to purchase a
3rd Party IFilter for the WinZIP ZIP file format and then assuming you are
using SQL Server 2000, you can store the zip files in an IMAGE datatype and
define a "file extension" column that will have the value of "zip" and the
ZIP IFilter will be called by the MSSearch and the mssdmn (Search Filter
Daemon) processes to FT Index the contents of the zip file that contains MS
Word documents. You can purchase the ZIP IFilter from either of the
following two vendors:
http://www.4-share.com/content/products.htm
http://www.ifiltershop.com/zipfilter.html
I've personally have been using the 4-Share ZIP IFilter for over a year and
find it works very well for me.
As for "storing documents in the DB versus on the file system", this is very
much a open and very actively discussed subject, below is one past posting
I've made on this subject with links included...
"There are many issues related to storing the files on disk (faster access,
easy copy/move, etc.) vs. storing the files in the database (transaction
control, change control, audit of changes, etc) than just efficiency and
scalability, although, those are important points... There is a ASPFAQ
website that has a number of references
(http://www.aspfaq.com/show.asp?id=2149) and links to KB articles (I've not
checked them all), but one major reason for storing the files in a SQL
Server table's column defined with an IMAGE datatype, is that in SQL Server
2000 you can take advantage of the new Full-Text Search (FTS) feature to FT
Index the contents of supported file types, primarily MS Office, HTML and
3rd party IFilter's like Adobe's PDF files.
As this is one of those FAQ type questions that have been known to start
religious wars or at the very least a flame war, and while Sharon didn't ask
about other application issues that this question often generates as it is
often an open-ended question, i.e., one that never seems to be answered to
everyone's satisfactions as it usually "depends" upon the application or
upon how one defines the word "best"...
As for the Terra Server, checkout the "about page"
(http://terraserver-usa.com/about.aspx?n=AboutBody) on the site that
explains how MS and others did it and yes, they used the IMAGE datatype,
specifically
http://terraserver-usa.com/about.asp...utTechDbschema (The Imagery table
contains the "blob" [image] field where the imagery data is stored.
Jose, depending upon how many files you have, how frequently they change,
how large they are, what the app is, etc. you may want to review the Terra
Server web site and consider the following rule of thumb that they used:
< 1 million images or big images (> 1MB) put them in the file system.
> 1 million and < 1 MB images, put them in SQL Server.
Note, you can also use "Text-in-Row" if the files are >7000 bytes on avg.
For everything in between, either way will work, depending upon if you need
transactional control over your files. Additionally, if you store the files
in a TEXT or IMAGE column, you can also store related metadata about that
file in SQL Server as well for increased searching capabilities. Also, and
obviously with SQL Server you get built-in support for validating the
consistency of the database, indices, backup, restore, etc. As for loading
&/or extracting the files from SQL Server there are now many methods of
doing this via BCP, BULK INSERT as well as ADO Stream DTS too, if you need
to transform your files in some manner.
Hope this helps, as the primary consideration should always be what is best
for your application...
Regards,
John
"Jose Perez" <jlpv@.totalise.co.uk> wrote in message
news:3724a9d9.0404141212.480e1630@.posting.google.c om...
> Hi everyone,
> I am developing a web app and wanting to store documents (doc, xls,
> pdf, ppt...etc.) within the DB (MSSQL). Is it advisable in general (I
> don't know exactly how big the docs are going to be) to compress the
> files before putting them into the DB? How does this affect Full Text
> Searching? On a wider note what's the attitude of people in terms of
> storing documents in the DB versus on the file system. There seems to
> be a lot of differing opinions. Any links to resources would be most
> appreciated.
> Thanks for your help in advance.
> Jose
|||John,
Thanks a lot for the detailed reply which you gave - it is most
appreciated. In answer to your question, yes I am running SQL Server
2000 (Version Information - Microsoft SQL Server 2000 - 8.00.760
(Intel X86) Dec 17 2002 14:22:05 Copyright (c) 1988-2003 Microsoft
Corporation Standard Edition on Windows NT 5.0 (Build 2195: Service
Pack 4)).
To clarify, I meant storing documents as "ZIP files in columns defined
with the IMAGE datatype". Thanks for the URLs regarding the ZIP
IFilter - I will investigate them. Your post was very helpful -
thanks.
Jose
"John Kane" <jt-kane@.comcast.net> wrote in message news:<#qtcv8pIEHA.964@.TK2MSFTNGP10.phx.gbl>...[vbcol=seagreen]
> Jose,
> First of all, can I assume that you are using SQL Server 2000? If so, could
> you post the full output of: SELECT @.@.version -- as this is very help info
> in understanding SQL FTS issues.
> Secondly, what exactly do you mean by "compress[ing] the files before
> putting them into the DB?" Do you mean to store them as ZIP files in columns
> defined with the IMAGE datatype? or are you thinking about placing the FT
> Catalog folder (SQL00060005) & files on a compressed drive. As for the
> latter, I've not done any testing with compressed drive, but Hilary Cotter
> has and from past postings on this subject, he's indicated that the
> performance gains is mimumal. As for the former, you will need to purchase a
> 3rd Party IFilter for the WinZIP ZIP file format and then assuming you are
> using SQL Server 2000, you can store the zip files in an IMAGE datatype and
> define a "file extension" column that will have the value of "zip" and the
> ZIP IFilter will be called by the MSSearch and the mssdmn (Search Filter
> Daemon) processes to FT Index the contents of the zip file that contains MS
> Word documents. You can purchase the ZIP IFilter from either of the
> following two vendors:
> http://www.4-share.com/content/products.htm
> http://www.ifiltershop.com/zipfilter.html
> I've personally have been using the 4-Share ZIP IFilter for over a year and
> find it works very well for me.
> As for "storing documents in the DB versus on the file system", this is very
> much a open and very actively discussed subject, below is one past posting
> I've made on this subject with links included...
> "There are many issues related to storing the files on disk (faster access,
> easy copy/move, etc.) vs. storing the files in the database (transaction
> control, change control, audit of changes, etc) than just efficiency and
> scalability, although, those are important points... There is a ASPFAQ
> website that has a number of references
> (http://www.aspfaq.com/show.asp?id=2149) and links to KB articles (I've not
> checked them all), but one major reason for storing the files in a SQL
> Server table's column defined with an IMAGE datatype, is that in SQL Server
> 2000 you can take advantage of the new Full-Text Search (FTS) feature to FT
> Index the contents of supported file types, primarily MS Office, HTML and
> 3rd party IFilter's like Adobe's PDF files.
> As this is one of those FAQ type questions that have been known to start
> religious wars or at the very least a flame war, and while Sharon didn't ask
> about other application issues that this question often generates as it is
> often an open-ended question, i.e., one that never seems to be answered to
> everyone's satisfactions as it usually "depends" upon the application or
> upon how one defines the word "best"...
> As for the Terra Server, checkout the "about page"
> (http://terraserver-usa.com/about.aspx?n=AboutBody) on the site that
> explains how MS and others did it and yes, they used the IMAGE datatype,
> specifically
> http://terraserver-usa.com/about.asp...utTechDbschema (The Imagery table
> contains the "blob" [image] field where the imagery data is stored.
> Jose, depending upon how many files you have, how frequently they change,
> how large they are, what the app is, etc. you may want to review the Terra
> Server web site and consider the following rule of thumb that they used:
> < 1 million images or big images (> 1MB) put them in the file system.
> Note, you can also use "Text-in-Row" if the files are >7000 bytes on avg.
> For everything in between, either way will work, depending upon if you need
> transactional control over your files. Additionally, if you store the files
> in a TEXT or IMAGE column, you can also store related metadata about that
> file in SQL Server as well for increased searching capabilities. Also, and
> obviously with SQL Server you get built-in support for validating the
> consistency of the database, indices, backup, restore, etc. As for loading
> &/or extracting the files from SQL Server there are now many methods of
> doing this via BCP, BULK INSERT as well as ADO Stream DTS too, if you need
> to transform your files in some manner.
> Hope this helps, as the primary consideration should always be what is best
> for your application...
> Regards,
> John
>
> "Jose Perez" <jlpv@.totalise.co.uk> wrote in message
> news:3724a9d9.0404141212.480e1630@.posting.google.c om...

Monday, March 19, 2012

composing a reference from fields located in mutiple tables

Consider a situation. There is a table of submitted 'documents'. They have some attributes. There are assignments to process the things, which have a date they were created. Finally there is a price list which specifies the price according to document features and date, so that the assignment to process a document created at different time will have a different cost. In other words, there is a relation
(assignment->document.attribute(s) + assignment.date) -> pricelist.price

Creating relations has the integrity advantages: it is not possible to create an assignment, which price is not defined in the pricelist; precludes the pricelist entry removal if it is referred by any assignments.

Should a view, which combines all the foreign fields into one virtual table, be created to make establishing the reference possible?

Not quite sure I understand the entire request. However, it seems you should be able to enforce referential integrity via Instead Of trigger.

Perhaps, you could give us some sample DDL and desired output to better describe your issue. We might be able to help further then.

|||Normally, you have all the tables interrelated. The reference (a foreign key) points to an object in another table specifying the "container" it belongs to. For instance, many books refer a single author.

Sometimes, you need to establish a complex reference consisting of multiple fields. For instance, a job refers to pricelist. The options in the print job specify a "service id", which has a unique price in the pricelist.

Suppose now that the pricelist can be updated. When job is created, it fixes the latest service cost in int field, the pricelist date. So the cost is uniquely identified by the job options (some fields) and the date. This is a complex key.

What I have faced is that nobody addresses the possibility of having the job attributes fixed in a separate table (say, documents to be processed always have the same settings). A job refers a document, from which the attributes are derived and coupled with the pricelist date identifies the job cost in the pricelist table. Effectively, the complex key is composed from fields located in different tables. A record contains only a part of complex key plus a reference to another entity, which has a rest of the key.

One way to create a relation would be to produce a view joining the key field tables. However, views are not enabled in diagrams. I suppose the reason is because the views are not allowed to participate in data relations.

Actually, I have decided that in my case I do not need to fix the job settings in the referred document, so I'll have all the key fields in one table. Yet, the topic is quite general to be interesting for me and others.

Wednesday, March 7, 2012

Complex SELECT pulling data from three tables

OK, everyone. I have hit a wall while trying to accomplish the
following goal. I have a table (tblDocInfo) that stores information
about documents. Among the information being stored, I have three
fields that store information about attorneys (ReviewingAtty,
DraftingAtty, ContactAtty) who are usually three different people. The
information I am storing in those three fields is of type INT. Those
numbers correspond to the Primary Key in another table (tblAtty) where
the attorney data is stored. For example:
tblDocInfo {
ID int
ReviewingAtty int
DraftingAtty int
ContactAtty int
docDate date
Comments varchar
}
I am trying to execute a query where I can pull the attorney names
along with some other document information, but I am not getting the
results that I want as the three attorneys are normally different and
only one name is coming accross.
For example, if in "tblDocInfo" I have this data (see format above):
23, 2, 4, 6, 23/09/05, Will
24, 1, 5, 3, 11/05/05, Contract
30, 2, 2, 6, 22/09/05, Letter
41, 4, 4, 5, 23/09/05, Warning
and in "tblAtty" I have this data:
1, John Doe
2, Jane Doe
3, Peter Somebody
4, Joe Blow
5, John Smith
6, Michael Cooper
I would like my query to display this:
23, Jane Doe, Joe Blow, Michael Cooper, 23/09/05, Will
24, John Doe, John Smith, Peter Somebody, 11/05/05, Contract
30, Jane Doe, Jane Doe, Michael Cooper, 22/09/05, Letter
41, Joe Blow, Joe Blow, John Smith, 23/09/05, Warning
Thank you.
MarcosSomething like this...
select a.id,at1.Name,at2.Name,at3.Name,a.docDate,a.comments
from tblDocInfo a
join tblAtty a1
on a1.id = a. ReviewingAtty
join tblAtty a2
on a2.id = a. DraftingAtty
join tblAtty a3
on a3.id = a. ContactAtty
group by a.id,at1.Name,at2.Name,at3.Name,a.docDate,a.comments
MJKulangara
http://sqladventures.blogspot.com|||I only see two tables here, and I think there is certainly a better and more
relational design waiting for you to discover it. But for now:
USE tempdb;
GO
CREATE TABLE dbo.Attorneys
(
AttorneyID INT PRIMARY KEY,
FullName VARCHAR(64)
);
GO
CREATE TABLE dbo.DocumentInfo
(
DocumentID INT PRIMARY KEY,
ReviewingAttorney INT NULL FOREIGN KEY REFERENCES
dbo.Attorneys(AttorneyID),
DraftingAttorney INT NULL FOREIGN KEY REFERENCES
dbo.Attorneys(AttorneyID),
ContactAttorney INT NULL FOREIGN KEY REFERENCES
dbo.Attorneys(AttorneyID),
docDate SMALLDATETIME,
Comments VARCHAR(255)
);
GO
SET NOCOUNT ON;
INSERT dbo.Attorneys SELECT 1, 'John Doe';
INSERT dbo.Attorneys SELECT 2, 'Jane Doe';
INSERT dbo.Attorneys SELECT 3, 'Peter Somebody';
INSERT dbo.Attorneys SELECT 4, 'Joe Blow';
INSERT dbo.Attorneys SELECT 5, 'John Smith';
INSERT dbo.Attorneys SELECT 6, 'Michael Cooper';
INSERT dbo.DocumentInfo SELECT 23, 2, 4, 6, '20050923', 'Will';
INSERT dbo.DocumentInfo SELECT 24, 1, 5, 3, '20050511', 'Contract';
INSERT dbo.DocumentInfo SELECT 30, 2, 2, 6, '20050922', 'Letter';
INSERT dbo.DocumentInfo SELECT 41, 4, 4, 5, '20050923', 'Warning';
SELECT d.DocumentID, a1.FullName, a2.FullName, a3.FullName, d.docDate,
d.Comments
FROM dbo.documentInfo d
LEFT OUTER JOIN dbo.Attorneys a1
ON d.ReviewingAttorney = a1.AttorneyID
LEFT OUTER JOIN dbo.Attorneys a2
ON d.DraftingAttorney = a2.AttorneyID
LEFT OUTER JOIN dbo.Attorneys a3
ON d.ContactAttorney = a3.AttorneyID
GO
DROP TABLE dbo.DocumentInfo;
DROP TABLE dbo.Attorneys;
GO
"bodhipooh" <marcos.velez@.ipm.com> wrote in message
news:1139244065.032852.306060@.o13g2000cwo.googlegroups.com...
> OK, everyone. I have hit a wall while trying to accomplish the
> following goal. I have a table (tblDocInfo) that stores information
> about documents. Among the information being stored, I have three
> fields that store information about attorneys (ReviewingAtty,
> DraftingAtty, ContactAtty) who are usually three different people. The
> information I am storing in those three fields is of type INT. Those
> numbers correspond to the Primary Key in another table (tblAtty) where
> the attorney data is stored. For example:
> tblDocInfo {
> ID int
> ReviewingAtty int
> DraftingAtty int
> ContactAtty int
> docDate date
> Comments varchar
> }
> I am trying to execute a query where I can pull the attorney names
> along with some other document information, but I am not getting the
> results that I want as the three attorneys are normally different and
> only one name is coming accross.
> For example, if in "tblDocInfo" I have this data (see format above):
> 23, 2, 4, 6, 23/09/05, Will
> 24, 1, 5, 3, 11/05/05, Contract
> 30, 2, 2, 6, 22/09/05, Letter
> 41, 4, 4, 5, 23/09/05, Warning
> and in "tblAtty" I have this data:
> 1, John Doe
> 2, Jane Doe
> 3, Peter Somebody
> 4, Joe Blow
> 5, John Smith
> 6, Michael Cooper
> I would like my query to display this:
> 23, Jane Doe, Joe Blow, Michael Cooper, 23/09/05, Will
> 24, John Doe, John Smith, Peter Somebody, 11/05/05, Contract
> 30, Jane Doe, Jane Doe, Michael Cooper, 22/09/05, Letter
> 41, Joe Blow, Joe Blow, John Smith, 23/09/05, Warning
>
> Thank you.
> Marcos
>|||Thank you all.
First of all, I got the problem solved. I was trying to do the JOIN
operation, but had a mistake in the order (syntax will always get
you!!) Your help is greatly appreciated. Second, I apologize for not
posting my entire table, or much more information. I realize that
information is often necessary to make it possible for others to be
able to help me, but I am working within some confidentiality
constraints and had to minimize the amount of information that was to
be posted. Still your replies have been VERY helpful.
Marcos|||This example may give you some idea on how to go about this:
http://milambda.blogspot.com/2005/0...s-as-array.html
For a better solution please post DDL and sample data.
ML
http://milambda.blogspot.com/|||SELECT t1.[ID], t2.[AttyName], t3.[AttyName], t4.[AttyName], t1.docdate,
t1.comments
FROM tblDocInfo t1
INNER JOIN tblAtty t2 ON t1.ReviewingAtty = t2.AttyID
INNER JOIN tblAtty t3 ON t1.DraftingAtty = t2.AttyID
INNER JOIN tblAtty t4 ON t1.ContactAtty = t2.AttyID
This would work as long as you don't allow any NULLs.
"bodhipooh" wrote:

> OK, everyone. I have hit a wall while trying to accomplish the
> following goal. I have a table (tblDocInfo) that stores information
> about documents. Among the information being stored, I have three
> fields that store information about attorneys (ReviewingAtty,
> DraftingAtty, ContactAtty) who are usually three different people. The
> information I am storing in those three fields is of type INT. Those
> numbers correspond to the Primary Key in another table (tblAtty) where
> the attorney data is stored. For example:
> tblDocInfo {
> ID int
> ReviewingAtty int
> DraftingAtty int
> ContactAtty int
> docDate date
> Comments varchar
> }
> I am trying to execute a query where I can pull the attorney names
> along with some other document information, but I am not getting the
> results that I want as the three attorneys are normally different and
> only one name is coming accross.
> For example, if in "tblDocInfo" I have this data (see format above):
> 23, 2, 4, 6, 23/09/05, Will
> 24, 1, 5, 3, 11/05/05, Contract
> 30, 2, 2, 6, 22/09/05, Letter
> 41, 4, 4, 5, 23/09/05, Warning
> and in "tblAtty" I have this data:
> 1, John Doe
> 2, Jane Doe
> 3, Peter Somebody
> 4, Joe Blow
> 5, John Smith
> 6, Michael Cooper
> I would like my query to display this:
> 23, Jane Doe, Joe Blow, Michael Cooper, 23/09/05, Will
> 24, John Doe, John Smith, Peter Somebody, 11/05/05, Contract
> 30, Jane Doe, Jane Doe, Michael Cooper, 22/09/05, Letter
> 41, Joe Blow, Joe Blow, John Smith, 23/09/05, Warning
>
> Thank you.
> Marcos
>

Tuesday, March 20, 2012

Compression

Monday, March 19, 2012

composing a reference from fields located in mutiple tables

Wednesday, March 7, 2012

Complex SELECT pulling data from three tables

Sql Comparing two tables

Blog Archive

About Me