Commit

Add files via upload
ritugala authored Sep 22, 2023
0 parents commit f455092
Showing 35 changed files with 84,709 additions and 0 deletions.
Empty file added Data/kafka_bad.txt
Empty file.
1 change: 1 addition & 0 deletions Data/kafka_mpg.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
time,user id,movie id
922 changes: 922 additions & 0 deletions Data/kafka_rate.csv

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions Data/kafka_recs.csv
@@ -0,0 +1 @@
time,user id,ip,recs,ms
29 changes: 29 additions & 0 deletions DataCleaning/compress_mpg_produce_rating.py
@@ -0,0 +1,29 @@
from pyspark.sql import SparkSession
from pyspark.sql.functions import max as spark_max, col, when

spark = SparkSession.builder.appName("mpg1").getOrCreate()

# Load the raw minutes-per-group watch data, then compress it to one
# row per (user, movie) pair and derive a rating from watch time.
mpg1 = spark.read.csv("/home/team14/Downloads/kafkaPipe/Data/mpg_from_raw_mpg.csv", header=True, inferSchema=True)

# For each (user id, movie id) pair, keep the largest watch time seen.
result = mpg1.groupBy("user id", "movie id").agg(spark_max("val").alias("max_timing"))
d = {(row["user id"], row["movie id"]): row["max_timing"] for row in result.collect()}
print(d)

# Rebuild the per-pair maxima as a DataFrame and keep only the rows of
# mpg1 whose watch time equals that maximum. (The original joined on
# timing == timing, which is always true; val == timing is the intent.)
d_list = [(key[0], key[1], value) for key, value in d.items()]
d_df = spark.createDataFrame(d_list, ["userid", "movieid", "timing"])
mpg1 = mpg1.join(d_df, (col("userid") == col("user id")) & (col("movieid") == col("movie id")) & (col("val") == col("timing")))

# Bucket watch time (minutes) into a 1-5 rating.
mpg1 = mpg1.withColumn("rating",
    when(col("timing") <= 20, 1)
    .when(col("timing") <= 50, 2)
    .when(col("timing") <= 100, 3)
    .when(col("timing") <= 150, 4)
    .otherwise(5))

# Save the derived ratings as a zipped CSV via pandas.
df = mpg1.select("timing", "user id", "movie id", "rating").toPandas()
compression_opts = dict(method="zip",
                        archive_name="spark_additional_rating_from_mpg_rating_score.csv")
df.to_csv("spark_additional_rating_from_mpg_rating_score.zip", index=False,
          compression=compression_opts)


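The rating buckets above can be sanity-checked without a Spark session; a minimal sketch re-expressing the same thresholds as a plain Python function (the function name is an illustration, not part of the repo):

```python
# Hedged sketch: the script's when/otherwise chain as a plain function,
# so the 1-5 bucket boundaries can be checked in isolation.
def minutes_to_rating(timing: int) -> int:
    """Map minutes watched (the 'timing' column) to a 1-5 rating."""
    if timing <= 20:
        return 1
    if timing <= 50:
        return 2
    if timing <= 100:
        return 3
    if timing <= 150:
        return 4
    return 5

print([minutes_to_rating(t) for t in (10, 20, 21, 100, 151)])
# → [1, 1, 2, 3, 5]
```

Note the boundaries are inclusive on the upper edge: 20 minutes still maps to rating 1, and anything over 150 maps to 5.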
5,281 changes: 5,281 additions & 0 deletions DataCleaning/data preprocess2 copy.ipynb

Large diffs are not rendered by default.

26 changes: 26 additions & 0 deletions Dockerfile
@@ -0,0 +1,26 @@
# Use an official Python runtime as a parent image
FROM python:3.9

# Set the working directory to /app
WORKDIR /app

# Copy the requirements file into the container and install the necessary packages
COPY requirements.txt ./
#RUN pip install scikit-learn
#RUN pip install scikit-surprise==1.1.1
RUN pip install --no-cache-dir -r requirements.txt

# Copy the Flask app files into the container
COPY movies/ ./movies/
# COPY movies/ /app/movies/

# Set the environment variable for Flask
# ENV FLASK_APP=movies/app.py
ENV FLASK_APP=movies/recommend.py


# Expose the Flask port
EXPOSE 5000

# Run the Flask app
CMD ["flask", "run", "--host=0.0.0.0"]
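The Dockerfile above can be built and run locally; a hedged sketch of the usual commands (the image tag `team14-movies` is an assumption for illustration, not a name from this repo):

```shell
# Build the Flask recommendation image from the repo root
# (tag name is hypothetical).
docker build -t team14-movies .

# Run it, mapping the exposed Flask port 5000 to the host.
docker run --rm -p 5000:5000 team14-movies
```

Because `FLASK_APP` points at `movies/recommend.py` and the CMD binds to `0.0.0.0`, the service should be reachable at `http://localhost:5000` once the container is up.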
Binary file added Final slides.pdf
Binary file not shown.