
twscrape

Twitter GraphQL and Search API implementation with SNScrape data models.

Install

pip install twscrape

Or development version:

pip install git+https://github.com/vladkens/twscrape.git

Features

  • Supports both the Search & GraphQL Twitter APIs
  • Async/await functions (multiple scrapers can run in parallel)
  • Login flow (including retrieving the verification code from email)
  • Saving/restoring account sessions
  • Raw Twitter API responses & SNScrape models
  • Automatic account switching to smooth out Twitter API rate limits
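The account switching in the last bullet can be pictured as rotating each request across the pool. A minimal round-robin sketch (a conceptual stand-in only, not twscrape's actual internals, which also track per-account rate-limit state):

```python
from itertools import cycle

class RoundRobinPool:
    """Conceptual sketch: hand out accounts in rotating order."""

    def __init__(self, accounts):
        self._accounts = cycle(accounts)

    def next_account(self):
        # twscrape additionally skips accounts that hit their rate limit;
        # this sketch only rotates in order.
        return next(self._accounts)

pool = RoundRobinPool(["user1", "user2"])
print([pool.next_account() for _ in range(4)])  # ['user1', 'user2', 'user1', 'user2']
```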

Usage

import asyncio
from twscrape import AccountsPool, API, gather
from twscrape.logger import set_log_level

async def main():
    pool = AccountsPool()  # or AccountsPool("path-to.db") - default is `accounts.db` 
    await pool.add_account("user1", "pass1", "user1@example.com", "email_pass1")
    await pool.add_account("user2", "pass2", "user2@example.com", "email_pass2")

    # log in to all new accounts
    await pool.login_all()

    api = API(pool)

    # search api (latest tab)
    await gather(api.search("elon musk", limit=20))  # list[Tweet]

    # graphql api
    tweet_id, user_id, user_login = 20, 2244994945, "twitterdev"

    await api.tweet_details(tweet_id)  # Tweet
    await gather(api.retweeters(tweet_id, limit=20))  # list[User]
    await gather(api.favoriters(tweet_id, limit=20))  # list[User]

    await api.user_by_id(user_id)  # User
    await api.user_by_login(user_login)  # User
    await gather(api.followers(user_id, limit=20))  # list[User]
    await gather(api.following(user_id, limit=20))  # list[User]
    await gather(api.user_tweets(user_id, limit=20))  # list[Tweet]
    await gather(api.user_tweets_and_replies(user_id, limit=20))  # list[Tweet]

    # note 1: limit is optional; the default is -1 (no limit)
    # note 2: all methods have a `raw` version, e.g.:

    async for tweet in api.search("elon musk"):
        print(tweet.id, tweet.user.username, tweet.rawContent)  # tweet is `Tweet` object

    async for rep in api.search_raw("elon musk"):
        print(rep.status_code, rep.json())  # rep is `httpx.Response` object

    # change log level, default info
    set_log_level("DEBUG")

    # Tweet & User model can be converted to regular dict or json, e.g.:
    doc = await api.user_by_id(user_id)  # User
    doc.dict()  # -> python dict
    doc.json()  # -> json string

if __name__ == "__main__":
    asyncio.run(main())
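For reference, the `gather` helper used above simply drains an async generator into a list. A minimal equivalent, with a fake generator standing in for `api.search` (which needs live accounts):

```python
import asyncio

async def gather(gen):
    # What twscrape.gather does conceptually: collect an async generator into a list
    return [item async for item in gen]

async def fake_search(query, limit=3):
    # Stand-in for api.search - yields documents one by one
    for i in range(limit):
        yield f"{query}-{i}"

tweets = asyncio.run(gather(fake_search("tweet")))
print(tweets)  # ['tweet-0', 'tweet-1', 'tweet-2']
```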

CLI

You can also use the CLI to make requests (the accounts must first be added and logged in through the Python interface, as shown above).

twscrape search "QUERY" --limit=20
twscrape tweet_details TWEET_ID
twscrape retweeters TWEET_ID --limit=20
twscrape favoriters TWEET_ID --limit=20
twscrape user_by_id USER_ID
twscrape user_by_login USERNAME
twscrape followers USER_ID --limit=20
twscrape following USER_ID --limit=20
twscrape user_tweets USER_ID --limit=20
twscrape user_tweets_and_replies USER_ID --limit=20

By default, output goes to the console (stdout), one document per line, so it can be redirected to a file.

twscrape search "elon musk lang:es" --limit=20 > data.txt
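Since each line of the redirected file is one document, it can be processed line by line. A small sketch, assuming the documents are JSON objects (the sample lines here are made up for illustration):

```python
import io
import json

# Simulate a redirected data.txt: one JSON document per line
sample = io.StringIO(
    '{"id": 1, "rawContent": "hola"}\n'
    '{"id": 2, "rawContent": "adios"}\n'
)

docs = [json.loads(line) for line in sample if line.strip()]
print([d["id"] for d in docs])  # [1, 2]
```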

By default, parsed data is returned. The original Twitter API responses can be retrieved with --raw:

twscrape search "elon musk lang:es" --limit=20 --raw

View a list of commands:

# show all commands
twscrape

# help for a specific command
twscrape search --help

Advanced usage

Get a list of added accounts and their statuses:

twscrape accounts

# Output:
# ───────────────────────────────────────────────────────────────────────────────────
# username  logged_in  active  last_used            total_req  error_msg
# ───────────────────────────────────────────────────────────────────────────────────
# user1     True       True    2023-05-20 03:20:40  100        None
# user2     True       True    2023-05-20 03:25:45  120        None
# user3     False      False   None                 120        Login error

Or from code:

pool = AccountsPool()
print(await pool.accounts_info())  # list

Limitations

API rate limits (per account):

  • Search API – 250 req / 15 min
  • GraphQL API – has individual rate limits per operation (in most cases this is 500 req / 15 min)

API data limits:

  • user_tweets & user_tweets_and_replies – can return ~3200 tweets maximum
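The per-account rate limits multiply across the pool, which is why account switching matters. A rough back-of-envelope estimate of search throughput (the 250 req / 15 min figure is from the list above; the helper name is ours):

```python
def search_requests_per_hour(accounts: int, per_window: int = 250, window_min: int = 15) -> int:
    # Each account gets `per_window` search requests per `window_min`-minute window,
    # so an hour holds 60 // window_min windows per account.
    return accounts * per_window * (60 // window_min)

print(search_requests_per_hour(4))  # 4000 search requests/hour with 4 accounts
```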

Models

See also