
twitter username checker - no json object could be found

Posted: Friday, 11 November 2016, 10:39
by peterdot
Hi there,

I am new to Python and trying to use a Twitter name checker written by matrixik.
It works quite well, but after 5-10 iterations, checking about 25 names at once, an exception is raised:

Code:

    Exception in thread Thread-18:
    Traceback (most recent call last):
      File "C:\Python27\lib\threading.py", line 8
        self.run()
      File "C:\Python27\lib\site-packages\workerp
        job.run()
      File "check_twitter_names.py", line 44, in
        name_info = json.loads(name_json.text)
      File "C:\Python27\lib\json\__init__.py", li
        return _default_decoder.decode(s)
      File "C:\Python27\lib\json\decoder.py", lin
        obj, end = self.raw_decode(s, idx=_w(s, 0
      File "C:\Python27\lib\json\decoder.py", lin
        raise ValueError("No JSON object could be
    ValueError: No JSON object could be decoded
The checker itself seems to be quite simple.

Code:

"""
Twitter usernames checker
"""

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function, unicode_literals
from __future__ import absolute_import

import json
import requests
import sys
import workerpool

from threading import Lock

#-----------------------------------------------------------------------------#

TWITTER_NAMES_FILE = 'twitter_usernames.txt'

TWITTER_AVAILABLE_NAMES_FILE = 'names_available.txt'
TWITTER_TAKEN_NAMES_FILE = 'names_taken.txt'

NR_OF_THREADS = 20

#-----------------------------------------------------------------------------#

HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; '
                        'Trident/6.0)'}
MUTEX = Lock()


class CheckName(workerpool.Job):
   """Function will check if name is available"""
   def __init__(self, name):
       self.name = name

   def run(self):
       name_check_url = 'https://twitter.com/users/username_available'
       name_json = requests.get(name_check_url,
                                params={'username': self.name},
                                headers=HEADERS)
       name_info = json.loads(name_json.text)
       if name_info['valid']:
           MUTEX.acquire()
#            print('Name available: {}'.format(self.name,))
           with open(TWITTER_AVAILABLE_NAMES_FILE, 'a') as my_file:
               my_file.writelines('{}\n'.format(self.name,))
           MUTEX.release()
       else:
           MUTEX.acquire()
#            print('Name taken: {}'.format(self.name,))
           with open(TWITTER_TAKEN_NAMES_FILE, 'a') as my_file:
               my_file.writelines('{}\n'.format(self.name,))
           MUTEX.release()


def main():
   """\
   Main program
   """
   pool = workerpool.WorkerPool(size=NR_OF_THREADS)

   for name in open(TWITTER_NAMES_FILE):
       job = CheckName(name.strip())
       pool.put(job)

   pool.shutdown()
   pool.wait()
   return 0    # OK

if __name__ == '__main__':
   #Start Program
   STATUS = main()
   sys.exit(STATUS)

It seems the problem only happens with several threads. Any suggestions? Maybe the problem is not related to the code itself, but to Twitter blocking the mass requests?
I am very thankful for any help.
Best regards,
peterdot

Re: twitter username checker - no json object could be found

Posted: Friday, 11 November 2016, 11:02
by Sirius3
@peterdot: why is `name_check_url` not a constant? `writelines` expects a list, not a string; use `write` instead. `write` is atomic, so there is no need for a lock; in general, use locks with the with-statement. Twitter probably doesn't like being flooded. Check the `status_code`.
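A minimal sketch of the `write` and with-statement advice, for the case where the lock is kept anyway. The helper name `append_name` is made up for illustration. Note that `writelines` accepts any iterable of strings, so passing a single string only works because it is iterated character by character.

```python
import threading

MUTEX = threading.Lock()


def append_name(path, name):
    """Append one name per line to the file at `path`."""
    # The with-statement releases the lock even if open() or write() raises,
    # which paired acquire()/release() calls do not guarantee.
    with MUTEX:
        with open(path, 'a') as out:
            out.write('{}\n'.format(name))
```

Both branches of `run()` could then call the same helper with a different file name instead of repeating the locking code.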

Re: twitter username checker - no json object could be found

Posted: Friday, 11 November 2016, 13:38
by BlackJack
@Sirius3: Are you sure about `write()`? I'm not. It may be atomic in CPython. And even if it is, the semantics of the 'a' file mode are not very clear; implementations have some freedom there, which really makes a lock necessary here IMHO. That said, I simply would not write to one file from different threads.

@peterdot: What is `workerpool` and what's the advantage of that module over `concurrent.futures`, which is part of Python 3's standard library and has a backport to Python 2. Or `multiprocessing.dummy.Pool` from the standard library (both Python 2 and 3)?

Lines 5 and 6 must be at the very top of the file to make sense. The shebang line definitely has no effect there, and the coding comment is most likely useless that far into the file.
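A rough sketch of the `concurrent.futures` route, with all file writing done in the main thread so that no lock is needed. The names `check_all` and `write_results` and the `check` callable are placeholders for illustration; `check` stands in for whatever function performs the actual request and returns True for an available name.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(names, check, max_workers=20):
    """Run check(name) in worker threads and yield (name, result) pairs.

    Results come back in input order because Executor.map preserves it.
    """
    names = list(names)  # allow generators and file objects as input
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for name, result in zip(names, executor.map(check, names)):
            yield name, result


def write_results(results, available, taken):
    """Write each name to one of two open files, from a single thread."""
    for name, is_available in results:
        (available if is_available else taken).write(name + '\n')
```

Only the network calls run concurrently here; the worker threads never touch the output files.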

Re: twitter username checker - no json object could be found

Posted: Friday, 11 November 2016, 16:10
by peterdot
Thanks for your replies!
"workerpool" is the module for multithreading, I think:
https://pypi.python.org/pypi/workerpool

The complete code of the name checker was released here:
https://bitbucket.org/matrixik/twitter- ... ss-checker

As I am new to Python and did not write this script myself, I really can't say much about the purpose of some parts. What I can say is: it works (under Python 2.7), at least for some checks. But after some checks, there seems to be some kind of "overflow" in the threading part, or maybe Twitter itself blocks the requests, which results in a very confusing error message about the "JSON object".

I would really appreciate it if some of you with good Python knowledge could dig into this and fix the code (I am simply not skilled enough, I think). Thank you so much for your help.
Regards!

Re: twitter username checker - no json object could be found

Posted: Friday, 11 November 2016, 16:45
by peterdot
I am now pretty sure that it is Twitter blocking the requests. After an hour, I can check around 50 names. Then suddenly the script always fails. After waiting for a while, the script works again.

It would be so much easier if the code could provide a useful error message, like "Request blocked" or something similar. Any ideas how to check whether requests to Twitter are being blocked (after my IP gets banned)?

Re: twitter username checker - no json object could be found

Posted: Friday, 11 November 2016, 16:59
by Liffi
Maybe it helps if you read the official API rate limits.

Although I am not too sure if you are using that API.
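If rate limiting really is the cause, one common way to cope is to retry with growing pauses. The following is only a sketch under that assumption: `RateLimited`, `retry_with_backoff` and the `fetch` callable are made-up names, the delays are arbitrary, and nothing here reflects Twitter's actual limits.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the server refused the request for now."""


def retry_with_backoff(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited wait base_delay * 2**n seconds, then retry.

    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

`fetch` would have to raise `RateLimited` itself, for example when the response status is not 200. Using fewer threads than 20 may reduce how often that happens.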

Re: twitter username checker - no json object could be found

Posted: Saturday, 12 November 2016, 11:03
by snafu
In line 44:

Code:

name_info = json.loads(name_json.text)
`name_json.text` can be an empty string. Passing that to `json.loads()` produces the error message shown in your first post. To work around the problem, just add a check like this:

Code:

if not name_json.text:
    raise IOError('API returned empty JSON response')

Re: twitter username checker - no json object could be found

Posted: Saturday, 12 November 2016, 11:26
by Sirius3
@snafu: not only empty results, any invalid JSON leads to that exception. You need to check the `status_code`:

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Twitter usernames checker
"""
from multiprocessing.pool import ThreadPool
import requests
 
TWITTER_NAMES_FILE = 'twitter_usernames.txt'
TWITTER_AVAILABLE_NAMES_FILE = 'names_available.txt'
TWITTER_TAKEN_NAMES_FILE = 'names_taken.txt'
TWITTER_URL = 'https://twitter.com/users/username_available'
 
NR_OF_THREADS = 20
 
HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; '
                        'Trident/6.0)'}

def check_name(name):
    name_json = requests.get(TWITTER_URL,
        params={'username': self.name}, headers=HEADERS)
    if name_json.status_code == 200:
        name_info = name_json.json()
        return name_info['valid'], name
    else:
        raise IOError("twitter down")
    
def main():
    pool = ThreadPool(NR_OF_THREADS)
    with open(TWITTER_NAMES_FILE) as names, \
         open(TWITTER_AVAILABLE_NAMES_FILE, 'a') as available, \
         open(TWITTER_TAKEN_NAMES_FILE, 'a') as taken:
        for valid, name in pool.imap_unordered(check_name, names)
            (taken if valid else available).write(name + '\n')
 
if __name__ == '__main__':
   main()

Re: twitter username checker - no json object could be found

Posted: Saturday, 12 November 2016, 12:01
by snafu
@Sirius3
Missing colon at the end of line 33. ;)

Re: twitter username checker - no json object could be found

Posted: Sunday, 13 November 2016, 18:51
by heiner88
With the following changes, the program above works on Windows 7 with Python 3.5.2:
(line 21: self.name ==> name, line 19: name.strip(), line 34: swap taken and available)
(Test names: barackobama, ...)

Code:

from multiprocessing.pool import ThreadPool
import requests

TWITTER_NAMES_FILE = 'twitter_usernames.txt'
TWITTER_AVAILABLE_NAMES_FILE = 'names_available.txt'
TWITTER_TAKEN_NAMES_FILE = 'names_taken.txt'
TWITTER_URL = 'https://twitter.com/users/username_available'

NR_OF_THREADS = 20

HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)'}

def check_name(name):
    name = name.strip()
    name_json = requests.get(TWITTER_URL,
        params={'username': name}, headers=HEADERS)
    if name_json.status_code == 200:
        name_info = name_json.json()
        return name_info['valid'], name
    else:
        raise IOError("twitter down")

def main():
    pool = ThreadPool(NR_OF_THREADS)
    with open(TWITTER_NAMES_FILE) as names, \
         open(TWITTER_AVAILABLE_NAMES_FILE, 'a') as available, \
         open(TWITTER_TAKEN_NAMES_FILE, 'a') as taken:
        for valid, name in pool.imap_unordered(check_name, names):
            (available if valid else taken).write(name + '\n')

if __name__ == '__main__':
   main()
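
Regarding the wish for a more useful error message than "No JSON object could be decoded": the status check and the JSON parsing can be pulled into one small function that says what went wrong. This is only a sketch; `parse_availability` is a made-up name, and which status code Twitter actually sends when it blocks requests is not known from this thread.

```python
import json


def parse_availability(status_code, text):
    """Return True if the name is available, else False.

    Raises IOError with a descriptive message when the response is unusable.
    """
    if status_code != 200:
        raise IOError(
            'request rejected with HTTP status {}'.format(status_code))
    try:
        info = json.loads(text)
    except ValueError:
        # Covers Python 2.7 (plain ValueError) and Python 3.5+
        # (json.JSONDecodeError is a subclass of ValueError).
        raise IOError(
            'status 200 but body is not JSON: {!r}'.format(text[:80]))
    return bool(info['valid'])
```

In `check_name` this could be called as `parse_availability(name_json.status_code, name_json.text)`, so a blocked request shows up with its status code instead of a generic "twitter down".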