Borgatti: Python slow on fetchone, hangs on fetchall

Tuesday, 20 August 2013

Python slow on fetchone, hangs on fetchall

Python slow on fetchone, hangs on fetchall

I'm writing a script to SELECT query a database and parse through ~33,000
records. Unfortunately I'm running into problems at the
cursor.fetchone()/cursor.fetchall() phase of things.
I first tried iterating through the cursor a record at a time like so:
# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
while True:
# Get the next row in the cursor
row = cursor.fetchone()
if row == None:
break
# TODO: Determine if there's any kanji in row[2]
weight = float((row[3] + row[4]))/2
printStatus("Weight: " + str(weight))
Based on the output of printStatus (it prints out a timestamp plus
whatever string is passed to it), the script took approximately 1 second
to process each row. This lead me to believe that the query was being
re-run each time the loop iterated (with a LIMIT 1 or something), as it
took ~1 second for the same query to run once in something like
SQLiteStudio [i]and[/i] return all 33,000 rows. I calculated that, at that
rate, it would take around 7 hours to get through all 33,000 records.
Instead of sitting through that, I tried to use cursor.fetchall() instead:
results = cursor.fetchall()
# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
for row in results:
# TODO: Determine if there's any kanji in row[2]
weight = float((row[3] + row[4]))/2
printStatus("Weight: " + str(weight))
Unfortunately, the Python executable locked up at 25% CPU and ~6MB of RAM
when it got to the cursor.fetchall() line. I left the script running for
~10 minutes, but nothing happened.
Is ~33,000 returned rows (about 5MB of data) too much for Python to grab
at once? Am I stuck iterating through one at a time? Or is there something
I can do to speed things up?
EDIT: Here's some console output
12:56:26.019000: Adding new column 'weight' and related index to r_ele
12:56:26.019000: Querying database
12:56:28.079000: Starting weight calculations
12:56:28.079000: Weight: 1.0
12:56:28.079000: Weight: 0.5
12:56:28.080000: Weight: 0.5
12:56:28.338000: Weight: 1.0
12:56:28.339000: Weight: 3.0
12:56:28.843000: Weight: 1.5
12:56:28.844000: Weight: 1.0
12:56:28.844000: Weight: 0.5
12:56:28.844000: Weight: 0.5
12:56:28.845000: Weight: 0.5
12:56:29.351000: Weight: 0.5
12:56:29.855000: Weight: 0.5
12:56:29.856000: Weight: 1.0
12:56:30.371000: Weight: 0.5
12:56:30.885000: Weight: 0.5
12:56:31.146000: Weight: 0.5
12:56:31.650000: Weight: 1.0
12:56:32.432000: Weight: 0.5
12:56:32.951000: Weight: 0.5
12:56:32.951000: Weight: 0.5
12:56:32.952000: Weight: 1.0
12:56:33.454000: Weight: 0.5
12:56:33.455000: Weight: 0.5
12:56:33.455000: Weight: 1.0
12:56:33.716000: Weight: 0.5
12:56:33.716000: Weight: 1.0
And here's the SQL query:
SELECT
re.id as _id,
re.fk,
(SELECT k_ele.value FROM k_ele WHERE k_ele.fk = re.fk) as kanji,
(SELECT COUNT(re_pri.value) FROM re_pri WHERE re_pri.fk = re.id) as
re_priority,
(SELECT COUNT(ke_pri.value) FROM ke_pri WHERE ke_pri.fk = (SELECT k_ele.id
FROM k_ele WHERE k_ele.value = (SELECT k_ele.value FROM k_ele WHERE
k_ele.fk = re.fk))) as ke_priority
FROM
r_ele as re
WHERE
(SELECT COUNT(re_pri.value) FROM re_pri WHERE re_pri.fk = re.id) +
(SELECT COUNT(ke_pri.value) FROM ke_pri WHERE ke_pri.fk = (SELECT k_ele.id
FROM k_ele WHERE k_ele.value = (SELECT k_ele.value FROM k_ele WHERE
k_ele.fk = re.fk))) > 0;

Borgatti

Tuesday, 20 August 2013

Python slow on fetchone, hangs on fetchall

No comments:

Post a Comment