
I would like your opinion on which would be better.

1/ I could simply add a full-text-indexed field called 'tags' alongside the rest of their details. I could then select any users whose keywords match those searched for.

However, if I had 100 million users, that full-text search would have to cover 100 million entries.

How slow is a full-text search when it has to cover 100 million entries, bearing in mind it is only searching one field, and that field would be fully indexed? (Each user is only allowed ten keywords.)

or

2/ I could create a table of keywords. This would contain two fields...

keyword | index-of-users-with-that-keyword.

When a keyword was searched, it would find the keyword, get the usernames, and then query my other table to get each user's details.
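A minimal sketch of suggestion #2, using SQLite purely for illustration (the question never names a DBMS, and all table and column names here are made up): a separate keyword table maps each keyword to the users tagged with it, with an ordinary index on the keyword column.

```python
import sqlite3

# In-memory database standing in for the real user store (assumption:
# the question's schema is not specified, so these names are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_keywords (
        keyword TEXT,
        user_id INTEGER REFERENCES users(id)
    );
    CREATE INDEX idx_keyword ON user_keywords(keyword);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
conn.executemany("INSERT INTO user_keywords VALUES (?, ?)",
                 [("guitar", 1), ("chess", 1), ("guitar", 2)])

# Look up the keyword, then join back to the users table for details,
# exactly as the question describes.
rows = conn.execute("""
    SELECT u.name FROM users u
    JOIN user_keywords k ON k.user_id = u.id
    WHERE k.keyword = ?
""", ("guitar",)).fetchall()
names = [r[0] for r in rows]
print(sorted(names))  # ['alice', 'bob']
```

The index on `keyword` means the lookup touches only the matching rows rather than all 100 million, at the cost of maintaining roughly one row per user per keyword.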


Basically what I'm asking is: would suggestion number 1 take a fraction of a second to search 100 million entries, or many minutes?

thanks in advance.

2006-12-06 19:04:33 · 4 answers · asked by jonnie b 1 in Computers & Internet Programming & Design

4 answers

No, suggestion #1 probably won't take a fraction of a second to search through 100 million entries... but it will definitely beat suggestion #2 (which will also be a maintenance nightmare, BTW).

It won't take "many minutes" either. Perhaps a few seconds, maybe a minute.

When you get to the point where you have 100 million users, you'll have what they call "a good problem to have". Your site will probably be generating enough revenue for you to throw in a bunch of high-resource, multi-CPU servers with ultra-fast storage on a gigabit fiber-optic network, split your index between them, and make it perform.

And until then, pretty much any solution you pick is going to be lightning fast. You could even throw your data into a plain file and use grep to search it (if you are on a POSIX-compliant system); even that would give you acceptable performance.

But the standard way to do this kind of thing is a full-text index. I would advise you to use that rather than reinvent the wheel with hand-rolled database tables, not for performance reasons (like I said, the full-text index does perform better, but I just don't think you'll be able to see the difference for many years to come), but from a technological and maintenance standpoint.
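To make the recommendation concrete, here is a sketch of suggestion #1 using SQLite's FTS5 module as a stand-in full-text engine (the answer names no specific engine; the table, column, and sample data are all invented for illustration): each user row carries a single 'tags' column of keywords, and MATCH consults the engine's inverted index instead of scanning every row.

```python
import sqlite3

# FTS5 builds and maintains the full-text index automatically;
# the schema below is an assumption, not the asker's actual one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE user_tags USING fts5(name, tags)")
conn.executemany("INSERT INTO user_tags VALUES (?, ?)", [
    ("alice", "guitar chess hiking"),
    ("bob",   "guitar cooking"),
    ("carol", "chess painting"),
])

# 'tags:chess' restricts the match to the tags column only.
names = [r[0] for r in conn.execute(
    "SELECT name FROM user_tags WHERE user_tags MATCH ?", ("tags:chess",))]
print(sorted(names))  # ['alice', 'carol']
```

The query cost is driven by the number of rows containing the searched keyword, not by the total row count, which is why the full-text index scales better than a table scan.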

Hope this helps...

2006-12-06 23:46:07 · answer #1 · answered by n0body 4 · 0 0

It depends how many keywords you expect.
If there are many common keywords, you might want to use the second solution.
100 million users with 10 keywords each means 1 billion keyword entries. There will be many duplicates, so it seems to me a waste of time and space.
If you're using a database management system, just create a table of keywords.
There is a wide range of self-made solutions, such as suffix trees, hash tables, B+-trees, etc., but to understand them you need to take your time and learn how they work. A DBMS already does this for you.
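Of the self-made structures mentioned above, the hash table is the simplest to sketch: a plain in-memory inverted index mapping each keyword to the set of users tagged with it (names and data invented for illustration; a DBMS maintains an equivalent on-disk structure, typically a B+-tree, for you).

```python
from collections import defaultdict

# keyword -> set of usernames carrying that keyword
index = defaultdict(set)

def add_user(name, keywords):
    """Register a user under each of their keywords."""
    for kw in keywords:
        index[kw].add(name)

add_user("alice", ["guitar", "chess"])
add_user("bob", ["guitar"])

# Lookup cost depends on the keyword, not on the total user count.
print(sorted(index["guitar"]))  # ['alice', 'bob']
```

This illustrates why the keyword-table idea works at all; the trade-off is that you must keep the index consistent yourself whenever users change their keywords, which is exactly the maintenance burden a DBMS absorbs.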

2006-12-06 19:33:28 · answer #2 · answered by Daniel F 2 · 0 1

If you want the best web site, then you're going to have to pay a small fee monthly to keep it up. That's not that bad, and you can really make a good web site. If you want a free one, then you're going to have to deal with the web host's name being in your URL and such.

2016-12-18 09:01:24 · answer #3 · answered by Anonymous · 0 0

you can find solutions on sf.net

2006-12-06 19:33:39 · answer #4 · answered by Nam Nguyen 2 · 0 0
