-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathzero_dl_up.qmd
148 lines (115 loc) · 4.31 KB
/
zero_dl_up.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
---
title: "EDA on 0/0 BSL"
date: last-modified
format:
html:
code-fold: true
engine: knitr
---
TODO: update the SQL query
```{r}
#| label: utility functions
source("R/table_with_options.R")
# very lazish function, col should be a string
agg_count <- function(dat, col) {
agg <- aggregate(cbind(count = dat$count),
list(name_col = dat[[col]]),
sum)
colnames(agg) <- c(col, "count")
return(agg)
}
```
The goals of this page is storing a quick EDA about broadband services locations with 0 MBps uploads and 0 MBps downloads. To be concise we are going to call them 0/0 speeds.
We have counted every services that have been declared with 0/0 speeds and associated with their States, ISP and technology. To clarify that does not mean a location have 0/0 speeds only but that one "ISP x technology" is provided with this kind of service in this location.
The data used to provide most of the analysis was done with this 2 SQL queries. They were saved and stored in `data/`
```{sql}
#| label: SQL query to get the data
#| eval: false
SELECT
state_abbr,
brand_name,
technology,
count(brand_name)
FROM
staging.june23
WHERE
(max_advertised_download_speed = 0 AND
max_advertised_upload_speed = 0) = true
GROUP BY brand_name, state_abbr, technology;
-- first get all 0/0 then get all the non 0/0
SELECT
state_abbr,
brand_name,
technology,
count(brand_name)
FROM
staging.june23
WHERE
(max_advertised_download_speed = 0 AND
max_advertised_upload_speed = 0) = false
GROUP BY brand_name, state_abbr, technology;
```
```{r}
#| label: Load data
zero_loc <- read.csv("data/zero_dl_up.csv")
not_zero <- read.csv("data/not_zero_dl.csv")
```
## Summary by technologies:
```{r}
#| label: 0/0 by tecnology
agg <- agg_count(zero_loc, "technology")
agg_not <- agg_count(not_zero, "technology")
technology <- merge(agg, agg_not, by.x = "technology",
by.y = "technology", all.x = TRUE, all.y = TRUE)
colnames(technology) <- c("technology", "cnt_zero_dl", "cnt_non_zero")
technology$rate_zero <- round(technology$cnt_zero_dl /
(technology$cnt_zero_dl + technology$cnt_non_zero), 4)
table_with_options(technology)
```
</br>
We do not mind too much `70` (Unlicensed Terrestrial Fixed Wireless) because we are filtering it out but we are keeping `71` (Licensed Terrestrial Fixed Wireless) , `72` (Licensed-by-Rule Terrestrial Fixed Wireless)and `10` (Copper Wire).
To take that into account I will filter out Unlicensed Terrestrial Fixed Wireless for the rest of this document. I also filtered out 60 and 61 to be consistant with our pipelines.
## Summary by ISP
```{r}
#| label: ISP with 0/0
filter_sat <- c(60, 61, 70)
zero_loc <- zero_loc[which(! zero_loc$technology %in% filter_sat), ]
not_zero <- not_zero[which(! not_zero$technology %in% filter_sat), ]
agg <- agg_count(zero_loc, "brand_name")
agg_not <- agg_count(not_zero, "brand_name")
rate_zero <- merge(agg, agg_not,
by.x = "brand_name", by.y = "brand_name"
, all.x = TRUE)
colnames(rate_zero) <- c("brand_name", "cnt_zero_dl", "cnt_non_zero")
rate_zero$rate_zero <- round(rate_zero$cnt_zero_dl /
(rate_zero$cnt_zero_dl + rate_zero$cnt_non_zero),
4)
table_with_options(rate_zero[
order(rate_zero$cnt_zero_dl, decreasing = TRUE),])
```
::: {.column-margin}
**402** ISPs are declaring services with 0/0 MBips (We have 2902 ISPs registered in FCC NBM)
:::
## Sumamry by States
```{r}
#| label: States
#| fig-column: margin
st_agg_zero <- agg_count(zero_loc, "state_abbr")
st_agg_not <- agg_count(not_zero, "state_abbr")
st_agg <- merge(st_agg_zero, st_agg_not,
by.x = "state_abbr", by.y = "state_abbr",
all.x = TRUE, all.y = TRUE)
colnames(st_agg) <- c("ST", "cnt_zero_dl", "cnt_non_zero")
st_agg$rate_zero <- round(st_agg$cnt_zero_dl /
(st_agg$cnt_zero_dl + st_agg$cnt_non_zero),
4)
library(ggplot2)
ggplot(st_agg[!is.na(st_agg$rate_zero),], aes(rate_zero)) +
geom_boxplot(orientation = "y",
fill='#A4A4A4', color="black") +
coord_flip() +
theme_bw()
table_with_options(st_agg[order(st_agg$rate_zero, decreasing = TRUE), ])
```
</br>
One point of concern is that services with 0/0 speeds could be generated for various reasons. One could be that some technology offer very low downloads/uploads and that is rounding to `0` an other could be that the location is not actually deserved but the ISP think it can do it.