
Let's say I have a hash with 1 million keys. (no kidding... my system's RAM is pretty full!) I want to pick one key at random. And it needs to be efficient, so I can't just load all the keys into an array and pick a random array element. Any ideas?

2007-02-13 13:21:23 · 4 answers · asked by Wolf Harper 6 in Computers & Internet Programming & Design

I prefer to avoid modules because you never know how they work internally. They are likely to do operations that get expensive when done on 1 million keys. It's in RAM for performance; going disk-bound through ties or DBMs would bring it to its knees.

And please, no rude, smarty-pants answers from know-it-all wannabes who, sadly, don't know it all...

2007-02-13 16:45:11 · update #1

4 answers

it really all depends on where the data is!
if it's in a mysql db, you can just select a random one.

if it's in a tied hash, you can use rand on the number of keys.

rand is built in, so no need to load extra libs.

if the hash is in memory... the same will apply!
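
For the "select a random one" case, here is a minimal DBI sketch; the database name, the items table, and its name column are hypothetical stand-ins:

use strict;
use warnings;
use DBI;

# Hypothetical connection details, table and column names.
my $dbh = DBI->connect('dbi:mysql:dbname=mydb', 'user', 'password',
                       { RaiseError => 1 });

# Let MySQL pick the random row, so Perl never has to hold the full key list.
# (ORDER BY RAND() can itself be slow on very large tables.)
my ($random_key) = $dbh->selectrow_array(
    'SELECT name FROM items ORDER BY RAND() LIMIT 1'
);

print "$random_key\n";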

2007-02-13 14:09:42 · answer #1 · answered by jake cigar™ is retired 7 · 0 0

1,000,000 keys, huh? Pray tell, why are you stuffing this little lot into RAM, when you could use the built-in Tie::File module to read from a file? Sorry, doesn't make sense.

As for picking a random number/item, the same thing goes: there is a Perl module already written that does just that.

Some people like to re-invent the wheel, rather than refine it! ;o)


Link below:

Addendum:
--------------
The internal workings of certain modules are well documented, and, as you are probably aware, a good many of the better and tested ones are now part of the main Perl distribution, as happens to be the case with Tie::File (which was given as an example, nothing more).
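
For reference, a minimal sketch of the Tie::File approach being suggested here, assuming the million records live one per line in a hypothetical file named records.txt:

use strict;
use warnings;
use Tie::File;

# Tie the array to the file: elements are fetched from disk on demand,
# so only the chosen record (plus some bookkeeping) ends up in memory.
tie my @records, 'Tie::File', 'records.txt'
    or die "Cannot tie records.txt: $!";

# @records in scalar context is the record count; pick one index at random.
my $random_record = $records[ int rand @records ];
print "$random_record\n";

untie @records;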

You also seem to be under some misapprehension. Accordingly, I would heartily recommend that you look up *exactly* how Tie::File works before condemning it out of hand, just because it has the word 'module' attached. And tell me honestly, isn't that exactly what you are trying to write, a module, or function, albeit a tiny one?

For the record (pun intended), I made no mention of accessing databases... unless you consider a flat file a database? As it happens, I consider your mash of hashes to be a flat-file database! :o)

Finally, I happen to completely disagree with your assertion that being partly disk-bound '...would bring it to its knees'. IMHO, holding large amounts of data in memory as you suggest still does not make good, sound sense unless you are working directly from the chip. Why waste valuable system resources that would be better utilized elsewhere, for one random record out of one million? How can that possibly be considered beneficial, performance-wise?

2007-02-13 21:34:42 · answer #2 · answered by Chipz 3 · 0 0

Use the keys(%hash) function on your hash; this will give you an array of all of your keys, and then use the rand function to select a random key. That would be my approach...

Let me know if you need more details...
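
A minimal sketch of that approach (the small demo hash is just for illustration); note that keys %hash in list context does build the full key list, which is exactly the cost the asker wanted to avoid:

use strict;
use warnings;

# Small demo hash; in the real case %hash already holds the million keys.
my %hash = map { "key$_" => $_ } 1 .. 10;

# The slice picks one element of the key list at random;
# rand(keys %hash) sees the key count because rand's argument is scalar context.
my $random_key = ( keys %hash )[ int rand keys %hash ];

print "$random_key => $hash{$random_key}\n";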

2007-02-17 10:57:45 · answer #3 · answered by Nataliya P 1 · 0 0

The first thing that comes to mind is to call "each" a random number of times. But I do not know how efficient "each" is.

# Note: use scalar(keys %hash), not scalar(%hash); on older perls the
# latter reports hash bucket usage rather than the number of keys.
my $iTimes = int(rand(scalar(keys %hash)));
my ($key, $val);
while (($key, $val) = each %hash)
{
    last if (--$iTimes < 0);  # $iTimes of 0 keeps the first pair, so this is not off by one
} # while

2007-02-14 15:28:27 · answer #4 · answered by martinthurn 6 · 0 0
