Wednesday, March 7, 2012

complex 'query' using full text indexing

Hi people

i am having some difficulty with a quite advanced query (at least from my level).

select * from cities join areacodes on contains (city,'select area from areacodes right join cities on area not in (select city from cities)' )


Server: Msg 7631, Level 15, State 1, Line 1
Syntax error occurred near 'area'. Expected ''''' in search condition 'select area from areacodes right join cities on area not in (select city from cities)'.

what i am trying to do is find the areas in the postcodes table that contain words that are in the cities table. I need to match them up and modify the data in the postcodes table. the data is inconsistent as it comes from two unrelated sources. i have to be aware of spelling mistakes etc and have tried usind the soundex function to match them but it produces too many results. as you can see from the data in the postcodes table some areas (the second column) are not entirely upper case. these are the areas that have matched records in the cities table and the data is valid.

please can someone suggest the correct and easy way of doing this?

Thanks

Chris Morton

POSTCODES TABLE FIRST 30 ROWS

1

Aberdeen

49

3450

2

ABERDEEN FARM LINES

49212

3450

3

ABERFELDY

58652

3451

4

ACORNHOEK

13

3454

5

ACORNHOEK FARM LINES

137952

3454

6

Addo

42

3450

7

Adelaide

46

3450

8

Aggeneys

54

3456

9

AKASIA

12

3452

10

Albertinia

28

3458

11

Alberton

11

3452

12

ALETTASRUS

53922

3451

13

Alexander Bay

27

3456

14

Alexandria

46

3450

15

Alice

40

3450

16

ALICE FARM LINES

4049

3450

17

ALICEDALE

42

3450

18

Aliwal North

51

3450

19

Allanridge

57

3451

20

Alldays

15

3457

21

ALMA

14

3455

22

Amalia

53

3455

23

AMANDEBULT

14

3455

24

Amanzimtoti

31

3453

25

AMATIKULU

35

3453

26

Amersfoort

17

3454

27

Amsterdam

17

3454

28

ANERLEY

39

3453

29

APEL

15

3457

30

APEL FARM LINES

15482

3457

CITIES TABLE FIRST 30 ROWS

1

Aberdeen

3450

2

Addo

3450

3

Adelaide

3450

4

Alexandria

3450

5

Alice

3450

6

Aliwal North

3450

7

Balfour

3450

8

Barkly East

3450

9

Bathurst

3450

10

Bedford

3450

11

Bisho

3450

12

Burgersdorp

3450

13

Butterworth

3450

14

Cathcart

3450

15

Cintsa

3450

16

Coffee Bay

3450

17

Cookhouse

3450

18

Cradock

3450

19

Dordrecht

3450

20

East London

3450

21

Elliot

3450

22

Flagstaff

3450

23

Fort Beaufort

3450

24

Gonubie

3450

25

Graaff Reinet

3450

26

Grahamstown

3450

27

Haga-Haga

3450

28

Hamburg

3450

29

Hankey

3450

30

Herschel

3450

Hi Chris,

The only way I can think of that you can leverage CONTAINS/fulltext functionality in this case is unfortunately through a cursor - you'll need to fetch a row from the Cities table and run a CONTAINS query on your Postcodes table per every city you fetch.

If you have a requiement for matching large amounts of "fuzzy" data, you may want to check if Fuzzy Lookup Transform in SSIS could be useful.

Hope this helps.

Best regards,

No comments:

Post a Comment