# twscrape
<div align="center">
<a href="https://pypi.org/project/twscrape">
<img src="https://badgen.net/pypi/v/twscrape" alt="version" />
</a>
<a href="https://pypi.org/project/twscrape">
<img src="https://badgen.net/pypi/python/twscrape" alt="python versions" />
</a>
<a href="https://github.com/vladkens/twscrape/actions">
<img src="https://github.com/vladkens/twscrape/workflows/test/badge.svg" alt="test status" />
</a>
<a href="https://github.com/vladkens/twscrape/blob/main/LICENSE">
<img src="https://badgen.net/github/license/vladkens/twscrape" alt="license" />
</a>
</div>
Twitter GraphQL and Search API implementation with [SNScrape](https://github.com/JustAnotherArchivist/snscrape) data models.
## Install
```bash
pip install twscrape
```
Or development version:
```bash
pip install git+https://github.com/vladkens/twscrape.git
```
## Features
- Supports both the Search and GraphQL Twitter APIs
- Async/await interface, so multiple scrapers can run in parallel (see the sketch after this list)
- Login flow (including retrieving the verification code from email)
- Saving/restoring account sessions
- Raw Twitter API responses & SNScrape models
- Automatic account switching to smooth out Twitter API rate limits
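Because every method is a coroutine, several requests can run concurrently with plain `asyncio`. A minimal sketch, using the same placeholder queries and IDs as the Usage section below and assuming accounts are already added and logged in:

```python
import asyncio
from twscrape import AccountsPool, API, gather

async def run_parallel():
    # assumes accounts were already added and logged in (see Usage below)
    api = API(AccountsPool())

    # three independent requests executed concurrently
    tweets, followers, user = await asyncio.gather(
        gather(api.search("elon musk", limit=20)),
        gather(api.followers(2244994945, limit=20)),
        api.user_by_login("twitterdev"),
    )
    print(len(tweets), len(followers), user.username)

asyncio.run(run_parallel())
```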
## Usage
```python
import asyncio
from twscrape import AccountsPool, API, gather
from twscrape.logger import set_log_level


async def main():
    pool = AccountsPool()  # or AccountsPool("path-to.db") - default is `accounts.db`
    await pool.add_account("user1", "pass1", "user1@example.com", "email_pass1")
    await pool.add_account("user2", "pass2", "user2@example.com", "email_pass2")

    # log in to all new accounts
    await pool.login_all()

    api = API(pool)

    # search api (latest tab)
    await gather(api.search("elon musk", limit=20))  # list[Tweet]

    # graphql api
    tweet_id, user_id, user_login = 20, 2244994945, "twitterdev"

    await api.tweet_details(tweet_id)  # Tweet
    await gather(api.retweeters(tweet_id, limit=20))  # list[User]
    await gather(api.favoriters(tweet_id, limit=20))  # list[User]

    await api.user_by_id(user_id)  # User
    await api.user_by_login(user_login)  # User
    await gather(api.followers(user_id, limit=20))  # list[User]
    await gather(api.following(user_id, limit=20))  # list[User]
    await gather(api.user_tweets(user_id, limit=20))  # list[Tweet]
    await gather(api.user_tweets_and_replies(user_id, limit=20))  # list[Tweet]

    # note 1: limit is optional, default is -1 (no limit)
    # note 2: all methods have a `_raw` version, e.g.:
    async for tweet in api.search("elon musk"):
        print(tweet.id, tweet.user.username, tweet.rawContent)  # tweet is `Tweet` object

    async for rep in api.search_raw("elon musk"):
        print(rep.status_code, rep.json())  # rep is `httpx.Response` object

    # change log level, default is INFO
    set_log_level("DEBUG")

    # Tweet & User models can be converted to a regular dict or json string, e.g.:
    doc = await api.user_by_id(user_id)  # User
    doc.dict()  # -> python dict
    doc.json()  # -> json string


if __name__ == "__main__":
    asyncio.run(main())
```
## CLI
You can also use the CLI to make requests (the accounts first need to be added and logged in through the programming interface).
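For example, a minimal one-time setup along the lines of the Usage section above (the credentials are placeholders):

```python
import asyncio
from twscrape import AccountsPool

async def setup():
    pool = AccountsPool()  # the CLI reads the same default `accounts.db`
    await pool.add_account("user1", "pass1", "user1@example.com", "email_pass1")
    await pool.login_all()

asyncio.run(setup())
```

Once accounts are logged in, the CLI commands mirror the API methods from the Usage section: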
```sh
twscrape search "QUERY" --limit=20
twscrape tweet_details TWEET_ID
twscrape retweeters TWEET_ID --limit=20
twscrape favoriters TWEET_ID --limit=20
twscrape user_by_id USER_ID
twscrape user_by_login USERNAME
twscrape followers USER_ID --limit=20
twscrape following USER_ID --limit=20
twscrape user_tweets USER_ID --limit=20
twscrape user_tweets_and_replies USER_ID --limit=20
```
The default output goes to the console (stdout), one document per line, so it can be redirected to a file.
```sh
twscrape search "elon mask lang:es" --limit=20 > data.txt
```
By default, parsed data is returned. The original tweet responses can be retrieved with `--raw`:
```sh
twscrape search "elon mask lang:es" --limit=20 --raw
```
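Since each line of the redirected output holds one document, the saved file can be processed line by line. A minimal sketch, assuming each line is a JSON document whose field names follow the `Tweet` model used above (`data.txt` is the file from the redirect example):

```python
import json

with open("data.txt") as fp:
    for line in fp:
        doc = json.loads(line)  # one document per line
        print(doc.get("id"), doc.get("rawContent"))
```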
View a list of commands:
```sh
# show all commands
twscrape
# help on a specific command
twscrape search --help
```
## Advanced usage
### Get list of connected accounts and their statuses
```sh
twscrape accounts
# Output:
# ─────────────────────────────────────────────────────────────────────────
# username  logged_in  active  last_used            total_req  error_msg
# ─────────────────────────────────────────────────────────────────────────
# user1     True       True    2023-05-20 03:20:40  100        None
# user2     True       True    2023-05-20 03:25:45  120        None
# user3     False      False   None                 120        Login error
```
Or from code:
```python
pool = AccountsPool()
print(await pool.accounts_info()) # list
```
## Limitations
API rate limits (per account):
- Search API – 250 req / 15 min
- GraphQL API – rate limits are set per operation (in most cases 500 req / 15 min)
API data limits:
- `user_tweets` & `user_tweets_and_replies` – can return ~3200 tweets maximum
## Models
- [Tweet](https://github.com/vladkens/twscrape/blob/main/twscrape/models.py#:~:text=class%20Tweet)
- [User](https://github.com/vladkens/twscrape/blob/main/twscrape/models.py#:~:text=class%20User)
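Both models expose `.dict()` and `.json()` (see Usage above), so scraped objects can be written out as one JSON document per line. A minimal sketch, assuming accounts are already set up (query and file name are placeholders):

```python
import asyncio
from twscrape import AccountsPool, API

async def dump_search(query: str, path: str):
    api = API(AccountsPool())  # assumes accounts are already added and logged in
    with open(path, "w") as fp:
        async for tweet in api.search(query, limit=100):
            fp.write(tweet.json() + "\n")  # Tweet.json() -> JSON string

asyncio.run(dump_search("elon musk lang:es", "tweets.jsonl"))
```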
## See also
- [twitter-advanced-search](https://github.com/igorbrigadir/twitter-advanced-search) – guide on search filters
- [twitter-api-client](https://github.com/trevorhobenshield/twitter-api-client) – implementation of Twitter's v1, v2, and GraphQL APIs
- [snscrape](https://github.com/JustAnotherArchivist/snscrape) – a scraper for social networking services (SNS)
- [twint](https://github.com/twintproject/twint) – Twitter Intelligence Tool