# twscrape

<div align="center">
  <a href="https://pypi.org/project/twscrape">
    <img src="https://badgen.net/pypi/v/twscrape" alt="version" />
  </a>
  <a href="https://pypi.org/project/twscrape">
    <img src="https://badgen.net/pypi/python/twscrape" alt="python versions" />
  </a>
  <a href="https://github.com/vladkens/twscrape/actions">
    <img src="https://github.com/vladkens/twscrape/workflows/test/badge.svg" alt="test status" />
  </a>
  <a href="https://github.com/vladkens/twscrape/blob/main/LICENSE">
    <img src="https://badgen.net/github/license/vladkens/twscrape" alt="license" />
  </a>
</div>

Twitter GraphQL and Search API implementation with [SNScrape](https://github.com/JustAnotherArchivist/snscrape) data models.

## Install

```bash
pip install twscrape
```

Or install the development version:

```bash
pip install git+https://github.com/vladkens/twscrape.git
```

## Features

- Supports both the Search and GraphQL Twitter APIs
- Async/await interface (multiple scrapers can run in parallel)
- Login flow, including receiving the verification code by email
- Saving and restoring account sessions
- Raw Twitter API responses as well as SNScrape models
- Automatic account switching to smooth out Twitter API rate limits

## Usage

```python
import asyncio

from twscrape import AccountsPool, API, gather
from twscrape.logger import set_log_level


async def main():
    pool = AccountsPool()  # or AccountsPool("path-to.db") – the default is `accounts.db`
    await pool.add_account("user1", "pass1", "user1@example.com", "email_pass1")
    await pool.add_account("user2", "pass2", "user2@example.com", "email_pass2")

    # log in to all new accounts
    await pool.login_all()

    api = API(pool)

    # search api (latest tab)
    await gather(api.search("elon musk", limit=20))  # list[Tweet]

    # graphql api
    tweet_id, user_id, user_login = 20, 2244994945, "twitterdev"

    await api.tweet_details(tweet_id)  # Tweet
    await gather(api.retweeters(tweet_id, limit=20))  # list[User]
    await gather(api.favoriters(tweet_id, limit=20))  # list[User]

    await api.user_by_id(user_id)  # User
    await api.user_by_login(user_login)  # User
    await gather(api.followers(user_id, limit=20))  # list[User]
    await gather(api.following(user_id, limit=20))  # list[User]
    await gather(api.user_tweets(user_id, limit=20))  # list[Tweet]
    await gather(api.user_tweets_and_replies(user_id, limit=20))  # list[Tweet]

    # note 1: limit is optional, the default is -1 (no limit)
    # note 2: all methods have a `raw` version, e.g.:

    async for tweet in api.search("elon musk"):
        print(tweet.id, tweet.user.username, tweet.rawContent)  # tweet is a `Tweet` object

    async for rep in api.search_raw("elon musk"):
        print(rep.status_code, rep.json())  # rep is an `httpx.Response` object

    # change the log level (the default is INFO)
    set_log_level("DEBUG")

    # Tweet & User models can be converted to a regular dict or a JSON string, e.g.:
    doc = await api.user_by_id(user_id)  # User
    doc.dict()  # -> python dict
    doc.json()  # -> json string


if __name__ == "__main__":
    asyncio.run(main())
```

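
The example above awaits each call in turn, but since every method is a coroutine, independent scrapers can also run side by side with `asyncio.gather`. A stdlib-only sketch of the pattern, where `fake_search` is a hypothetical stand-in for real `api.search(...)` calls:

```python
import asyncio


# `fake_search` is a stand-in for a real coroutine such as api.search(...)
async def fake_search(query: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for network I/O
    return [f"tweet about {query}"]


async def run_parallel() -> list[list[str]]:
    # Both "scrapers" are scheduled at once and awaited together
    return await asyncio.gather(
        fake_search("elon musk"),
        fake_search("jeff bezos"),
    )


results = asyncio.run(run_parallel())
print(results)  # [['tweet about elon musk'], ['tweet about jeff bezos']]
```

With a real `API` instance, each concurrent call simply draws a (possibly different) account from the pool.
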
## CLI

## CLI

You can also make requests from the CLI (first add and log in accounts through the Python interface).

```sh
twscrape search "QUERY" --limit=20
twscrape tweet_details TWEET_ID
twscrape retweeters TWEET_ID --limit=20
twscrape favoriters TWEET_ID --limit=20
twscrape user_by_id USER_ID
twscrape user_by_login USERNAME
twscrape followers USER_ID --limit=20
twscrape following USER_ID --limit=20
twscrape user_tweets USER_ID --limit=20
twscrape user_tweets_and_replies USER_ID --limit=20
```

By default, output goes to the console (stdout), one document per line, so it can be redirected to a file.

```sh
twscrape search "elon mask lang:es" --limit=20 > data.txt
```
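
Each line of that file is a standalone JSON document, so it can be read back as JSON Lines. A minimal sketch using an inline sample instead of a real `data.txt` (the two field names here are illustrative, borrowed from the `Tweet` model):

```python
import json

# Inline stand-in for the contents of `data.txt` (one JSON document per line)
sample = "\n".join([
    '{"id": 1, "rawContent": "hola"}',
    '{"id": 2, "rawContent": "adios"}',
])

tweets = [json.loads(line) for line in sample.splitlines() if line.strip()]
ids = [t["id"] for t in tweets]
print(ids)  # [1, 2]
```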

Parsed data is returned by default. The original tweet responses can be retrieved with `--raw`:

```sh
twscrape search "elon mask lang:es" --limit=20 --raw
```

View the list of commands:

```sh
# show all commands
twscrape

# help on a specific command
twscrape search --help
```

## Advanced usage

### Get a list of connected accounts and their statuses

```sh
twscrape accounts

# Output:
# ───────────────────────────────────────────────────────────────────────────────────
# username  logged_in  active  last_used            total_req  error_msg
# ───────────────────────────────────────────────────────────────────────────────────
# user1     True       True    2023-05-20 03:20:40  100        None
# user2     True       True    2023-05-20 03:25:45  120        None
# user3     False      False   None                 120        Login error
```

Or from code:

```python
pool = AccountsPool()
print(await pool.accounts_info())  # list
```
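
`accounts_info()` returns a plain list, so it can be filtered like any other Python data. A sketch assuming the entries are dicts with the fields shown in the table above (the sample data here is made up):

```python
# Made-up sample mimicking the table above; real entries come from accounts_info()
accounts = [
    {"username": "user1", "logged_in": True, "active": True, "error_msg": None},
    {"username": "user3", "logged_in": False, "active": False, "error_msg": "Login error"},
]

# Pick out accounts that failed to log in
broken = [a["username"] for a in accounts if not a["logged_in"]]
print(broken)  # ['user3']
```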

## Limitations

API rate limits (per account):

- Search API – 250 requests / 15 minutes
- GraphQL API – individual rate limits per operation (in most cases 500 requests / 15 minutes)

API data limits:

- `user_tweets` & `user_tweets_and_replies` – can return at most ~3200 tweets
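
These per-account limits add up across the pool, which is what makes the automatic account switching useful. A back-of-the-envelope throughput estimate (the pool size and page size here are assumptions, not documented values):

```python
accounts = 3            # hypothetical pool size
reqs_per_window = 250   # documented Search API limit per account (15-minute window)
tweets_per_req = 20     # assumed tweets per request

# Upper bound on tweets retrievable per 15-minute window with these assumptions
tweets_per_15min = accounts * reqs_per_window * tweets_per_req
print(tweets_per_15min)  # 15000
```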

## Models

- [Tweet](https://github.com/vladkens/twscrape/blob/main/twscrape/models.py#:~:text=class%20Tweet)
- [User](https://github.com/vladkens/twscrape/blob/main/twscrape/models.py#:~:text=class%20User)

## See also

- [twitter-advanced-search](https://github.com/igorbrigadir/twitter-advanced-search) – guide on search filters
- [twitter-api-client](https://github.com/trevorhobenshield/twitter-api-client) – implementation of Twitter's v1, v2, and GraphQL APIs
- [snscrape](https://github.com/JustAnotherArchivist/snscrape) – a scraper for social networking services (SNS)
- [twint](https://github.com/twintproject/twint) – Twitter Intelligence Tool