The promise and the reality
Sentiment analysis on financial news sounds like a shortcut to alpha. Scrape headlines, run them through a model, long the positive tickers, short the negative ones. If it were that simple, every hedge fund would be doing it (and some are, which is part of the problem).
The real question isn’t whether sentiment matters—it obviously does—but whether you can extract a tradable signal before it’s priced in. News moves fast. By the time you’ve fetched an article, parsed it, scored it, and decided to trade, the market’s already moved. And that’s assuming your sentiment model actually works, which brings us to the fun part: debugging why it doesn’t.

When VADER scores everything as neutral
I started with VADER (Valence Aware Dictionary and sEntiment Reasoner), the go-to lexicon-based sentiment analyzer. It’s fast, doesn’t require training, and works well on social media text. Financial news should be easy, right?
Here’s the setup. We’re pulling news from the News API (free tier gives you 100 requests/day, which is enough for testing). The code fetches articles for a given ticker and runs VADER on the headlines and descriptions:
import requests
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd
from datetime import datetime, timedelta

NEWS_API_KEY = "your_key_here"  # Get from newsapi.org

def fetch_news(ticker, days_back=7):
    """Fetch recent news for a given stock ticker."""
    end_date = datetime.now()
    start_date = end_date - timedelta(days=days_back)
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": ticker,
        "from": start_date.strftime("%Y-%m-%d"),
        "to": end_date.strftime("%Y-%m-%d"),
        "language": "en",
        "sortBy": "publishedAt",
        "apiKey": NEWS_API_KEY
    }
    response = requests.get(url, params=params)
    if response.status_code != 200:
        print(f"API error: {response.status_code}")
        return []
    articles = response.json().get("articles", [])
    return articles

def analyze_sentiment_vader(text):
    """Run VADER sentiment analysis on text."""
    analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores(text)
    return scores["compound"]  # Range: -1 (negative) to +1 (positive)

# Test on TSLA news
tsla_news = fetch_news("TSLA", days_back=7)
for article in tsla_news[:5]:
    headline = article["title"]
    description = article.get("description", "")
    combined_text = f"{headline}. {description}"
    sentiment = analyze_sentiment_vader(combined_text)
    print(f"Sentiment: {sentiment:.3f} | {headline}")
When I ran this on Tesla news during a week when the stock dropped 8%, here’s what I got:
Sentiment: 0.000 | Tesla recalls 200,000 vehicles over backup camera issue
Sentiment: 0.296 | Musk says Tesla will unveil new model in March
Sentiment: 0.000 | Tesla reports Q4 deliveries below expectations
Sentiment: -0.128 | Analysts downgrade Tesla on margin concerns
Sentiment: 0.000 | Tesla stock falls after earnings miss
Notice a pattern? Most scores cluster around zero. VADER treats financial news as neutral because the language is formal and lacks the emotional markers it was trained on (“love”, “hate”, “awful”, “amazing”). A headline like “Tesla stock falls after earnings miss” is objectively negative for shareholders, but VADER doesn’t know that “falls” and “miss” are bad in this context.
This isn’t VADER’s fault—it was designed for social media, not financial reporting. But it’s a reminder that general-purpose tools often fail in domain-specific settings.
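You can see the gap directly, along with the most common quick fix: patching VADER's lexicon with domain terms. The snippet below is a minimal sketch; the custom weights are made-up illustrations, not calibrated values.

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Emotional wording: the lexicon picks this up easily
print(analyzer.polarity_scores("Tesla's earnings were awful, investors are furious"))
# Financial wording: scores as neutral, as in the output above
print(analyzer.polarity_scores("Tesla stock falls after earnings miss"))

# Quick fix: add domain terms to the lexicon (weights are illustrative guesses)
analyzer.lexicon.update({
    "miss": -2.0,
    "downgrade": -2.5,
    "recall": -1.5,
    "beat": 2.0,
    "upgrade": 2.5,
})
print(analyzer.polarity_scores("Tesla stock falls after earnings miss"))

Hand-patching a lexicon gets tedious and brittle fast, which is the real argument for a model trained on financial text.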
FinBERT: a model that actually understands earnings misses
The solution is to use a model fine-tuned on financial text. FinBERT (based on BERT, trained on financial news and earnings call transcripts) knows that “miss”, “downgrade”, and “concern” are bearish signals. It outputs probabilities for three classes: positive, negative, neutral.
Here’s the same pipeline, but with FinBERT:
from transformers import BertTokenizer, BertForSequenceClassification
import torch
import numpy as np

# Load FinBERT model (ProsusAI/finbert on Hugging Face)
model_name = "ProsusAI/finbert"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

def analyze_sentiment_finbert(text, max_length=512):
    """Run FinBERT sentiment analysis on text."""
    # FinBERT has a 512-token limit, so truncate if needed
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=max_length, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Class order: [positive, negative, neutral]
    pos, neg, neu = probs[0].tolist()
    # Compute compound score: pos - neg (range -1 to +1)
    compound = pos - neg
    return compound, {"positive": pos, "negative": neg, "neutral": neu}

# Re-run on the same TSLA news
for article in tsla_news[:5]:
    headline = article["title"]
    description = article.get("description", "")
    combined_text = f"{headline}. {description}"
    compound, probs = analyze_sentiment_finbert(combined_text)
    print(f"Sentiment: {compound:.3f} | Pos: {probs['positive']:.2f} Neg: {probs['negative']:.2f} | {headline}")
Output:
Sentiment: -0.621 | Pos: 0.12 Neg: 0.74 | Tesla recalls 200,000 vehicles over backup camera issue
Sentiment: 0.503 | Pos: 0.68 Neg: 0.18 | Musk says Tesla will unveil new model in March
Sentiment: -0.558 | Pos: 0.15 Neg: 0.71 | Tesla reports Q4 deliveries below expectations
Sentiment: -0.712 | Pos: 0.09 Neg: 0.80 | Analysts downgrade Tesla on margin concerns
Sentiment: -0.634 | Pos: 0.11 Neg: 0.75 | Tesla stock falls after earnings miss
Now we’re getting somewhere. The model correctly identifies bearish headlines as negative and assigns high confidence. The “new model” announcement is positive (though note the probability isn’t 1.0—there’s still some uncertainty, which is realistic). This is the difference between a lexicon that knows “bad” is negative and a model that understands “miss” in the context of earnings expectations.
Aggregating sentiment into a tradable signal
Raw sentiment scores are noisy. A single article doesn’t move a $600B stock. What matters is the aggregate sentiment over time, weighted by recency and source credibility (which we’re ignoring for now—assume all sources are equal, though they aren’t).
Here’s a function that computes a rolling sentiment score:
def compute_rolling_sentiment(ticker, days_back=7, window_hours=24):
    """Compute a rolling sentiment score for a ticker."""
    articles = fetch_news(ticker, days_back=days_back)
    sentiment_data = []
    for article in articles:
        published_at = datetime.strptime(article["publishedAt"], "%Y-%m-%dT%H:%M:%SZ")
        headline = article["title"]
        description = article.get("description", "")
        combined_text = f"{headline}. {description}"
        compound, _ = analyze_sentiment_finbert(combined_text)
        sentiment_data.append({
            "timestamp": published_at,
            "sentiment": compound
        })
    df = pd.DataFrame(sentiment_data)
    df = df.sort_values("timestamp")
    # Compute rolling mean over window_hours
    df = df.set_index("timestamp")
    rolling_sentiment = df["sentiment"].rolling(window=f"{window_hours}H", min_periods=1).mean()
    return rolling_sentiment

# Example: TSLA sentiment over the past 7 days
tsla_sentiment = compute_rolling_sentiment("TSLA", days_back=7, window_hours=24)
print(tsla_sentiment.tail())
The rolling mean smooths out noise and gives you a time series you can compare against price movements. The next step (which I’ll skip here because it gets messy) is to align this with stock price data from Part 1 and check if sentiment leads price, lags it, or is just correlated noise.
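For reference, the skeleton of that alignment step looks something like this. It's a sketch, not the full analysis: it assumes price_series is a plain pandas Series of daily closes (e.g. the Close column from Part 1's pipeline) and that both series use tz-naive timestamps.

def sentiment_leadlag(rolling_sentiment, price_series, max_lag=3):
    """Correlate daily sentiment with daily returns at different lags.
    lag > 0 compares today's sentiment with the return `lag` days later."""
    daily_sentiment = rolling_sentiment.resample("D").mean()
    daily_returns = price_series.pct_change()
    aligned = pd.concat(
        [daily_sentiment.rename("sentiment"), daily_returns.rename("return")],
        axis=1
    ).dropna()
    correlations = {}
    for lag in range(-max_lag, max_lag + 1):
        correlations[lag] = aligned["sentiment"].corr(aligned["return"].shift(-lag))
    return pd.Series(correlations, name="correlation")

# Example (close_series would come from Part 1's price pipeline):
# print(sentiment_leadlag(tsla_sentiment, close_series))

If the correlation peaks at a positive lag, sentiment is leading price; if it peaks at lag 0 or at negative lags, you're mostly measuring news that describes moves that already happened.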
The lag problem (and why this is harder than it looks)
Here’s where things get tricky. Let’s say you find that negative sentiment predicts a 2% drop over the next 24 hours. Great! But:
- News is public. If you can fetch it, so can everyone else. High-frequency traders are reacting within milliseconds. By the time you’ve scored the sentiment, the edge is gone.
- Sentiment is endogenous. Did negative news cause the stock to drop, or did the drop cause more negative articles? Causality runs both ways. A stock falling 5% generates headlines like “Investors flee as [ticker] tumbles”, which scores negative but is just describing what already happened.
- Model drift. FinBERT was trained on data up to a certain date. Market language evolves. New buzzwords emerge (“rug pull”, “diamond hands”)—though those come from crypto and meme stocks, the point stands. You need to retrain periodically.
I’m not saying sentiment analysis is useless. But if you’re hoping to build a standalone trading strategy on it, you’ll be disappointed. The signal is weak and decays fast. Where it does help is as a feature in a larger model (e.g., combining sentiment with technical indicators and volume), or for risk management (“sentiment is extremely negative, maybe don’t add to this position”).
A quick comparison: FinBERT vs. GPT-based zero-shot classification
Out of curiosity, I tested whether a large language model like GPT-3.5 could do zero-shot sentiment analysis without fine-tuning. The idea: just prompt it with “Is this headline positive, negative, or neutral for the stock?” and parse the response.
Here’s a minimal implementation:
import openai

openai.api_key = "your_openai_api_key"

def analyze_sentiment_gpt(text):
    """Zero-shot sentiment analysis using GPT-3.5."""
    prompt = f"""Classify the sentiment of the following financial news headline as positive, negative, or neutral for the stock mentioned. Respond with only one word: positive, negative, or neutral.
Headline: {text}
Sentiment:"""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=5,
        temperature=0
    )
    sentiment_label = response["choices"][0]["message"]["content"].strip().lower()
    # Map the label to a numeric score
    score_map = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
    return score_map.get(sentiment_label, 0.0)

# Test on a few headlines
headlines = [
    "Tesla recalls 200,000 vehicles over backup camera issue",
    "Musk says Tesla will unveil new model in March",
    "Tesla reports Q4 deliveries below expectations"
]
for headline in headlines:
    score = analyze_sentiment_gpt(headline)
    print(f"Sentiment: {score:+.1f} | {headline}")
Results:
Sentiment: -1.0 | Tesla recalls 200,000 vehicles over backup camera issue
Sentiment: +1.0 | Musk says Tesla will unveil new model in March
Sentiment: -1.0 | Tesla reports Q4 deliveries below expectations
It works, and it’s arguably more flexible than FinBERT (you can adjust the prompt to handle edge cases). But there are downsides:
- Cost. FinBERT is free once you’ve downloaded the model. GPT-3.5 charges per token. At scale, this adds up.
- Latency. FinBERT runs locally in ~100ms. GPT-3.5 takes 500-1000ms per request (depending on API load).
- Consistency. Despite temperature=0, GPT sometimes hallucinates or responds with unexpected text. FinBERT always outputs the same probabilities for the same input.
For a production system, I’d pick FinBERT. For one-off analysis or handling weird edge cases (“What if the CEO tweets in Mandarin?”), GPT is more convenient.
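One practical note on the latency point: FinBERT's per-headline cost drops a lot if you batch. Here's a sketch that reuses the tokenizer and model loaded earlier and scores a list of headlines with one forward pass per batch; the batch size is an arbitrary choice, not a tuned value.

def analyze_sentiment_finbert_batch(texts, batch_size=16, max_length=512):
    """Score a list of texts with FinBERT, batching to amortize the forward pass."""
    compounds = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           max_length=max_length, padding=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = torch.nn.functional.softmax(logits, dim=-1)
        # Class order for ProsusAI/finbert: [positive, negative, neutral]
        compounds.extend((probs[:, 0] - probs[:, 1]).tolist())
    return compounds

# Usage:
# scores = analyze_sentiment_finbert_batch([a["title"] for a in tsla_news])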
Building a sentiment-augmented prediction model
Let’s tie this back to the stock data pipeline from Part 1. The goal is to predict tomorrow’s return using today’s sentiment alongside technical indicators. We’ll use a simple linear regression as a baseline (not because it’s the best model, but because it’s interpretable).
Here’s the setup. We have:
- Daily stock prices (from Part 1’s Yahoo Finance pipeline)
- Daily aggregated sentiment (mean of all FinBERT scores for that day)
- Technical indicators: RSI, MACD, Bollinger Bands (from Part 2)
The target variable is tomorrow's return: return_next(t) = (Close(t+1) - Close(t)) / Close(t).
Features: sentiment(t), RSI(t), BB_position(t), and return(t), where BB_position is the position of the close price within the Bollinger Bands (0 = lower band, 1 = upper band).
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score
import yfinance as yf

# Fetch TSLA price data (reusing Part 1 code)
ticker_symbol = "TSLA"
df_price = yf.download(ticker_symbol, start="2024-01-01", end="2025-01-01")
df_price = df_price[["Close"]].copy()
df_price["return"] = df_price["Close"].pct_change()
df_price["return_next"] = df_price["return"].shift(-1)  # Tomorrow's return

# Compute RSI (simplified, from Part 2)
def compute_rsi(series, period=14):
    delta = series.diff()
    gain = delta.where(delta > 0, 0).rolling(window=period).mean()
    loss = -delta.where(delta < 0, 0).rolling(window=period).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi

df_price["RSI"] = compute_rsi(df_price["Close"])

# Add Bollinger Band position (simplified)
df_price["MA20"] = df_price["Close"].rolling(window=20).mean()
df_price["BB_std"] = df_price["Close"].rolling(window=20).std()
df_price["BB_upper"] = df_price["MA20"] + 2 * df_price["BB_std"]
df_price["BB_lower"] = df_price["MA20"] - 2 * df_price["BB_std"]
df_price["BB_position"] = (df_price["Close"] - df_price["BB_lower"]) / (df_price["BB_upper"] - df_price["BB_lower"])

# Merge with sentiment data (assume we've fetched and aggregated daily sentiment)
# For demo purposes, generate synthetic sentiment correlated with returns
np.random.seed(42)
df_price["sentiment"] = df_price["return"].rolling(window=3).mean() + np.random.normal(0, 0.1, len(df_price))

# Drop NaN rows (from rolling windows and the shift)
df_model = df_price[["sentiment", "RSI", "BB_position", "return", "return_next"]].dropna()
X = df_model[["sentiment", "RSI", "BB_position", "return"]]
y = df_model["return_next"]

# Train/test split (80/20), no shuffling so the test set is the most recent data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"MAE: {mean_absolute_error(y_test, y_pred):.4f}")
print(f"R^2: {r2_score(y_test, y_pred):.4f}")
print("\nFeature coefficients:")
for feature, coef in zip(X.columns, model.coef_):
    print(f"  {feature}: {coef:.4f}")
On my synthetic data (which admittedly isn’t realistic—real sentiment isn’t this correlated), I get:
MAE: 0.0213
R^2: 0.412
Feature coefficients:
sentiment: 0.1847
RSI: -0.0003
BB_position: 0.0052
return: -0.2341
The sentiment coefficient is positive, meaning higher sentiment predicts higher returns (duh). The negative coefficient on return suggests mean reversion (yesterday’s gain predicts today’s loss), which is common in high-frequency data. RSI is nearly zero, suggesting it’s not predictive in this setup (or the linear model can’t capture its non-linear relationship).
An R² of 0.41 sounds decent, but don’t get excited—this is on synthetic data. On real data with real sentiment, you’ll be lucky to get an R² much above zero. Stock returns are mostly noise.
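A complementary sanity check is directional accuracy: how often the model gets the sign of tomorrow's return right, which for a trading signal often matters more than R². Here's a minimal check on the same test split (the coin-flip baseline is 50%, and on real data expect to hover uncomfortably close to it):

# Directional accuracy: fraction of days where predicted and actual returns share a sign
hit_rate = (np.sign(y_pred) == np.sign(y_test.values)).mean()
print(f"Directional accuracy: {hit_rate:.1%}")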
What I’d do differently if I were serious about this
If I were building a production sentiment analysis pipeline (I’m not, but let’s pretend), here’s what I’d change:
- Use multiple news sources. News API is fine for testing, but Bloomberg Terminal has better coverage and timestamps (to the second, not the minute). AlphaVantage and Finnhub are cheaper alternatives.
- Weight by source credibility. A Wall Street Journal article should count more than a random blog. Build a credibility score based on historical accuracy or use a pre-built dataset like RavenPack (there's a rough sketch of this after the list).
- Incorporate social media. Twitter (sorry, X) and Reddit can sometimes lead traditional news. StockTwits is noisy but useful for retail sentiment. Be careful with bots and spam.
- Fine-tune FinBERT on recent data. The pre-trained model is from 2020-ish. Language evolves. Collect labeled data (headlines + manual sentiment labels) and retrain every quarter.
- Use a non-linear model. Linear regression assumes sentiment affects returns additively. But maybe extreme sentiment (very positive or very negative) matters more than moderate sentiment. Try a tree-based model (XGBoost, LightGBM) or a neural network.
- Account for event windows. Sentiment matters more around earnings announcements, product launches, or regulatory news. Build a calendar of events and weight sentiment accordingly.
- Backtest with transaction costs. A model that trades every day based on sentiment shifts will get eaten alive by commissions and slippage. You need a strong signal to overcome friction.
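To make the credibility and recency weighting concrete, here's a rough sketch. The source weights and the 12-hour half-life are placeholders I made up for illustration (not values I've validated), and it assumes News API's source name field and timestamp format.

SOURCE_WEIGHTS = {  # illustrative placeholders, not calibrated
    "The Wall Street Journal": 1.0,
    "Reuters": 0.9,
    "Bloomberg": 0.9,
    "Yahoo Entertainment": 0.3,
}
DEFAULT_WEIGHT = 0.5

def weighted_sentiment(articles, now=None, half_life_hours=12.0):
    """Aggregate FinBERT scores, weighting each article by source credibility and recency."""
    now = now or datetime.utcnow()
    weighted_sum, weight_total = 0.0, 0.0
    for article in articles:
        published_at = datetime.strptime(article["publishedAt"], "%Y-%m-%dT%H:%M:%SZ")
        age_hours = (now - published_at).total_seconds() / 3600
        recency = 0.5 ** (age_hours / half_life_hours)  # exponential recency decay
        credibility = SOURCE_WEIGHTS.get(article["source"]["name"], DEFAULT_WEIGHT)
        score, _ = analyze_sentiment_finbert(article["title"])
        weight = recency * credibility
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0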
I haven’t tested any of this at scale, so take it with a grain of salt. But these are the obvious next steps if you’re not just playing around.
Does sentiment analysis actually work?
Honestly? It depends. Academic papers show weak but statistically significant correlations between sentiment and returns. Some hedge funds claim to use it profitably (though they won’t tell you how). My best guess is that sentiment does contain information, but it’s a weak signal buried in noise, and by the time you’ve extracted it, the market’s already moved.
Where it’s most useful is for human-in-the-loop trading: you’re deciding whether to enter a position, and you check sentiment as one data point among many. If fundamentals look good, technicals look good, and sentiment is positive, that’s a stronger case than any single factor alone. But building a fully automated sentiment-based trading bot? I’d be skeptical.
In Part 4, we’ll shift from prediction to optimization: given a set of stocks, how do you allocate capital to maximize return for a given level of risk? This is where Modern Portfolio Theory comes in, and unlike sentiment analysis, the math actually works (most of the time).