twitter username checker - no json object could be found

peterdot
User
Posts: 3
Registered: Friday, 11 November 2016, 10:37

Hi there,

I am new to Python and trying to use a twitter name checker coded by matrixik.
It is working quite well, but after 5-10 iterations checking about 25 names at once, an exception happens:

Code: Select all

    Exception in thread Thread-18:
    Traceback (most recent call last):
      File "C:\Python27\lib\threading.py", line 8
        self.run()
      File "C:\Python27\lib\site-packages\workerp
        job.run()
      File "check_twitter_names.py", line 44, in
        name_info = json.loads(name_json.text)
      File "C:\Python27\lib\json\__init__.py", li
        return _default_decoder.decode(s)
      File "C:\Python27\lib\json\decoder.py", lin
        obj, end = self.raw_decode(s, idx=_w(s, 0
      File "C:\Python27\lib\json\decoder.py", lin
        raise ValueError("No JSON object could be
    ValueError: No JSON object could be decoded
The checker itself seems to be quite simple.

Code: Select all

"""
Twitter usernames checker
"""

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function, unicode_literals
from __future__ import absolute_import

import json
import requests
import sys
import workerpool

from threading import Lock

#-----------------------------------------------------------------------------#

TWITTER_NAMES_FILE = 'twitter_usernames.txt'

TWITTER_AVAILABLE_NAMES_FILE = 'names_available.txt'
TWITTER_TAKEN_NAMES_FILE = 'names_taken.txt'

NR_OF_THREADS = 20

#-----------------------------------------------------------------------------#

HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; '
                        'Trident/6.0)'}
MUTEX = Lock()


class CheckName(workerpool.Job):
   """Function will check if name is available"""
   def __init__(self, name):
       self.name = name

   def run(self):
       name_check_url = 'https://twitter.com/users/username_available'
       name_json = requests.get(name_check_url,
                                params={'username': self.name},
                                headers=HEADERS)
       name_info = json.loads(name_json.text)
       if name_info['valid']:
           MUTEX.acquire()
#            print('Name available: {}'.format(self.name,))
           with open(TWITTER_AVAILABLE_NAMES_FILE, 'a') as my_file:
               my_file.writelines('{}\n'.format(self.name,))
           MUTEX.release()
       else:
           MUTEX.acquire()
#            print('Name taken: {}'.format(self.name,))
           with open(TWITTER_TAKEN_NAMES_FILE, 'a') as my_file:
               my_file.writelines('{}\n'.format(self.name,))
           MUTEX.release()


def main():
   """\
   Main program
   """
   pool = workerpool.WorkerPool(size=NR_OF_THREADS)

   for name in open(TWITTER_NAMES_FILE):
       job = CheckName(name.strip())
       pool.put(job)

   pool.shutdown()
   pool.wait()
   return 0    # OK

if __name__ == '__main__':
   #Start Program
   STATUS = main()
   sys.exit(STATUS)

It seems the problem only happens with several threads. Any suggestions? Maybe the problem is not related to the code itself, but to Twitter blocking the mass requests?
I am very thankful for any help.
Best regards,
peterdot
Last edited by Anonymous on Friday, 11 November 2016, 13:25; edited 1 time in total.
Reason: source code placed in Python code box tags.
Sirius3
User
Posts: 17753
Registered: Sunday, 21 October 2012, 17:20

@peterdot: Why is `name_check_url` not a constant? `writelines` expects a list of lines, not a single string; use `write` instead. `write` is atomic, so there is no need for a lock; in general, use locks with the with-statement. Twitter probably doesn't like being flooded. Check the `status_code`.
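For illustration, here is a minimal sketch of the `write` plus with-statement form, assuming the lock is kept. The helper `append_name` and the `demo` function are hypothetical names, not part of the original script, and the demo writes to a temporary file instead of `names_available.txt`.

```python
import os
import tempfile
import threading

MUTEX = threading.Lock()


def append_name(path, name):
    """Append one name to a file while holding the lock (hypothetical helper)."""
    # The with-statement releases the lock even if open() or write() raises.
    with MUTEX:
        with open(path, 'a') as my_file:
            my_file.write('{}\n'.format(name))


def demo():
    """Append from several threads and return the sorted lines."""
    handle, path = tempfile.mkstemp(suffix='.txt')
    os.close(handle)
    try:
        threads = [
            threading.Thread(target=append_name, args=(path, 'name{}'.format(i)))
            for i in range(10)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        with open(path) as my_file:
            return sorted(line.strip() for line in my_file)
    finally:
        os.remove(path)
```

Compared with paired `acquire()`/`release()` calls, the lock cannot stay held by accident when an exception occurs in between.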
BlackJack

@Sirius3: Are you sure about `write()`? I'm not. It may be in CPython. And even if it is, the semantics of the 'a' file mode are not very clear, there is some freedom for implementations that really make a lock necessary here IMHO. That said, I simply would not write to one file from different threads.

@peterdot: What is `workerpool`, and what's the advantage of that module over `concurrent.futures`, which is part of Python 3's standard library and has a backport to Python 2? Or over `multiprocessing.dummy.Pool` from the standard library (both Python 2 and 3)?

Lines 5 and 6 must be at the very top of the file to make sense. The she-bang line has definitely no effect there and the coding comment most likely is useless that far into the file.
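As a rough sketch of the `concurrent.futures` variant, with the results collected in one thread so that no file or list is shared between workers: `check_name` below is a hypothetical stand-in that does no network access, and its "shorter than 5 characters means taken" rule is purely an assumption for the demo.

```python
from concurrent.futures import ThreadPoolExecutor


def check_name(name):
    """Hypothetical stand-in for the real HTTP check; no network access."""
    name = name.strip()
    # Assumption for the demo only: names shorter than 5 characters count as taken.
    return name, len(name) >= 5


def sort_names(names, workers=4):
    """Run the checks in worker threads; collect results in the calling thread."""
    available, taken = [], []
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # map() yields results in input order; only this thread touches the lists.
        for name, is_available in executor.map(check_name, names):
            (available if is_available else taken).append(name)
    return available, taken
```

With this layout the workers only return values, so the question of whether a lock is needed for writing does not come up.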
peterdot
User
Posts: 3
Registered: Friday, 11 November 2016, 10:37

Thanks for your replies!
"workerpool" is the module for multithreading, I think:
https://pypi.python.org/pypi/workerpool

The complete code of the name checker was released here:
https://bitbucket.org/matrixik/twitter- ... ss-checker

As I am new to Python and did not write this script myself, I really can't say much about the purpose of some parts. What I can say is: it works (under Python 2.7), at least for some checks. But after some checks, there seems to be some kind of "overflow" in the threading part, or maybe Twitter itself blocks the requests, which results in a very confusing error message about the "JSON object".

I would really appreciate it if some of you with good Python knowledge could dig into this and fix the code (I am simply not skilled enough, I think). Thank you so much for your help.
Regards!
peterdot
User
Posts: 3
Registered: Friday, 11 November 2016, 10:37

I am now pretty sure that it is Twitter blocking the requests. After an hour, I can check around 50 names. Then suddenly the script always fails. After waiting for a while, the script works again.

It would be so much easier if the code could provide a useful error message, like "Request blocked" or something similar. Any ideas how to check whether requests to Twitter are being blocked (after my IP gets banned)?
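One possible way to get a readable message is to look at the HTTP status code and the body before decoding. This is only a sketch: `RequestBlocked` and `parse_response` are hypothetical names, and whether Twitter answers a blocked client with status 429 ("Too Many Requests"), another status, or an HTML page is an assumption that would need to be checked against real responses.

```python
import json


class RequestBlocked(IOError):
    """Hypothetical exception for responses that are not usable JSON."""


def parse_response(status_code, text):
    """Return the decoded JSON, or raise RequestBlocked with a readable message."""
    # Assumption: a rate-limited client may see HTTP 429 (Too Many Requests).
    if status_code == 429:
        raise RequestBlocked('Request blocked: too many requests (HTTP 429)')
    if status_code != 200:
        raise RequestBlocked(
            'Request failed with HTTP status {}'.format(status_code))
    try:
        return json.loads(text)
    except ValueError:
        # Covers empty bodies and HTML error pages delivered with status 200.
        raise RequestBlocked(
            'Status 200, but the body is not JSON: {!r}'.format(text[:60]))
```

In the script this would be called as `parse_response(name_json.status_code, name_json.text)` in place of the bare `json.loads(name_json.text)`.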
Liffi
User
Posts: 153
Registered: Monday, 1 January 2007, 17:23

Maybe it helps if you read the official API rate limits.

Although I am not too sure if you are using that API.
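If a rate limit is the cause, spacing the requests out might help. The sketch below is a simple throttle shared by all threads; the class name `Throttle` and the interval are hypothetical, and the actual limit of this endpoint is not known from this thread.

```python
import threading
import time


class Throttle(object):
    """Allow at most one call per `interval` seconds across all threads."""

    def __init__(self, interval, clock=time.time, sleep=time.sleep):
        self.interval = interval
        self.clock = clock      # injectable so the class can be tested
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request slot is free; return the delay used."""
        with self.lock:
            now = self.clock()
            delay = max(0.0, self.next_allowed - now)
            # Reserve the following slot before releasing the lock.
            self.next_allowed = max(now, self.next_allowed) + self.interval
        if delay:
            self.sleep(delay)
        return delay
```

Each worker would call `throttle.wait()` on one shared instance right before its `requests.get(...)` call.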
snafu
User
Posts: 6740
Registered: Thursday, 21 February 2008, 17:31
Location: Gelsenkirchen

In line 44:

Code: Select all

name_info = json.loads(name_json.text)
`name_json.text` can be an empty string. Passing that to `json.loads()` produces the error message shown in your first post. To work around the problem, add a check like this:

Code: Select all

if not name_json.text:
    raise IOError('API returned empty JSON response')
Sirius3
User
Posts: 17753
Registered: Sunday, 21 October 2012, 17:20

@snafu: Not only empty results; any invalid JSON leads to that exception. You need to check the `status_code`:

Code: Select all

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Twitter usernames checker
"""
from multiprocessing.pool import ThreadPool
import requests
 
TWITTER_NAMES_FILE = 'twitter_usernames.txt'
TWITTER_AVAILABLE_NAMES_FILE = 'names_available.txt'
TWITTER_TAKEN_NAMES_FILE = 'names_taken.txt'
TWITTER_URL = 'https://twitter.com/users/username_available'
 
NR_OF_THREADS = 20
 
HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; '
                        'Trident/6.0)'}

def check_name(name):
    name_json = requests.get(TWITTER_URL,
        params={'username': self.name}, headers=HEADERS)
    if name_json.status_code == 200:
        name_info = name_json.json()
        return name_info['valid'], name
    else:
        raise IOError("twitter down")
    
def main():
    pool = ThreadPool(NR_OF_THREADS)
    with open(TWITTER_NAMES_FILE) as names, \
         open(TWITTER_AVAILABLE_NAMES_FILE, 'a') as available, \
         open(TWITTER_TAKEN_NAMES_FILE, 'a') as taken:
        for valid, name in pool.imap_unordered(check_name, names)
            (taken if valid else available).write(name + '\n')
 
if __name__ == '__main__':
   main()
snafu
User
Posts: 6740
Registered: Thursday, 21 February 2008, 17:31
Location: Gelsenkirchen

@Sirius3
Missing colon at the end of line 33. ;)
heiner88
User
Posts: 65
Registered: Thursday, 20 October 2016, 07:29

With the following changes, the above program works on Windows 7 with Python 3.5.2:
(line 21: self.name ==> name; line 19: name.strip(); line 34: swap taken and available)
(test names: barackobama, ...)

Code: Select all

from multiprocessing.pool import ThreadPool
import requests

TWITTER_NAMES_FILE = 'twitter_usernames.txt'
TWITTER_AVAILABLE_NAMES_FILE = 'names_available.txt'
TWITTER_TAKEN_NAMES_FILE = 'names_taken.txt'
TWITTER_URL = 'https://twitter.com/users/username_available'

NR_OF_THREADS = 20

HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)'}

def check_name(name):
    name = name.strip()
    name_json = requests.get(TWITTER_URL,
        params={'username': name}, headers=HEADERS)
    if name_json.status_code == 200:
        name_info = name_json.json()
        return name_info['valid'], name
    else:
        raise IOError("twitter down")

def main():
    pool = ThreadPool(NR_OF_THREADS)
    with open(TWITTER_NAMES_FILE) as names, \
         open(TWITTER_AVAILABLE_NAMES_FILE, 'a') as available, \
         open(TWITTER_TAKEN_NAMES_FILE, 'a') as taken:
        for valid, name in pool.imap_unordered(check_name, names):
            (available if valid else taken).write(name + '\n')

if __name__ == '__main__':
   main()
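One thing to keep in mind with this version: a single `IOError` raised inside a worker propagates out of `imap_unordered` and ends the whole loop. A possible variation, sketched here with a hypothetical stand-in `fake_check` instead of the real request (no network access, and its rules are made up for the demo), returns the error alongside the name so that failed names can be retried later.

```python
from multiprocessing.pool import ThreadPool


def fake_check(name):
    """Hypothetical stand-in for check_name(); fails for one name, no network."""
    if name == 'blocked':
        raise IOError('twitter down')
    return len(name) > 3, name


def safe_check(name):
    """Return (name, valid, error) instead of letting the exception escape."""
    name = name.strip()
    try:
        valid, _ = fake_check(name)
        return name, valid, None
    except IOError as error:
        return name, None, str(error)


def run(names, threads=4):
    """Check all names; failed ones are collected for a later retry."""
    pool = ThreadPool(threads)
    try:
        results = list(pool.imap_unordered(safe_check, names))
    finally:
        pool.close()
        pool.join()
    failed = sorted(name for name, _, error in results if error)
    checked = sorted((name, valid) for name, valid, error in results if not error)
    return checked, failed
```

The names in `failed` could be written to a third file and fed back into the script once the block has been lifted.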