CMP7202 Web Social Media Analytics and Visualizations
BIRMINGHAM CITY UNIVERSITY
FACULTY OF COMPUTING, ENGINEERING AND THE BUILT ENVIRONMENT
COURSEWORK ASSIGNMENT BRIEF
Assessment – Postgraduate
Academic Year 2025-26
Module Title: Web Social Media Analytics and Visualizations
Module Code: CMP7202
Assessment Title: Report and code
Assessment Type: CWRK
Weight: 65 %
College: College of Computing
Module coordinator: OGERTA ELEZAJ
Hand in deadline date: Thursday 15th May 2026 03:00 pm
Support available for students required to submit a re-assessment: Timetabled revision sessions will be arranged for the period immediately preceding the hand in date.
NOTE: At the first assessment attempt, the full range of marks is available.
At the re-assessment attempt the mark is capped and the maximum mark that can be achieved is 50%.
Assessment Summary
This assessment is a group assessment undertaken in pairs (groups of two students).
Each group is required to submit a single final project codebase and a joint written report.
Students must collaboratively develop an end-to-end project in social media analytics, starting from data collection and progressing through data preprocessing, analysis, visualization, and interpretation, with the aim of extracting meaningful insights and drawing evidence-based conclusions.
The project covers the full social media analytics lifecycle, closely mimicking an industry-style project setup, where team-based collaboration, shared responsibility, and integration of technical components are essential.
IMPORTANT STATEMENTS
Standard Postgraduate Regulations
Your studies will be governed by the BCU Academic Regulations on Assessment, Progression and Awards. Copies of regulations can be found at https://www.bcu.ac.uk/student-info/student-contract
Cheating and Plagiarism
Both cheating and plagiarism are totally unacceptable and the University maintains a strict policy against them. It is YOUR responsibility to be aware of this policy and to act accordingly. Please refer to the Academic Registry Guidance at https://icity.bcu.ac.uk/Academic-Services/Information-forStudents/Assessment/Avoiding-Allegations-of-Cheating
The basic principles are:
Don’t pass off anyone else’s work as your own, including work from “essay banks”. This is plagiarism and is viewed extremely seriously by the University.
Don’t submit a piece of work in whole or in part that has already been submitted for assessment elsewhere. This is called duplication and, like plagiarism, is viewed extremely seriously by the University.
Always acknowledge all of the sources that you have used in your coursework assignment or project.
If you are using the exact words of another person, always put them in quotation marks.
Check that you know whether the coursework is to be produced individually or whether you can work with others.
If you are doing group work, be sure about what you are supposed to do on your own.
Never make up or falsify data to prove your point.
Never allow others to copy your work.
Never lend disks, memory sticks or copies of your coursework to any other student in the University; this may lead to you being accused of collusion.
AI tools cannot be used to write assignments as these have to be your own work. Please refer to: FAQ link
By submitting coursework, either physically or electronically, you are confirming that it is your own work (or, in the case of a group submission, that it is the result of joint work undertaken by members of the group that you represent) and that you have read and understand the University’s guidance on plagiarism and cheating.
You should be aware that coursework may be submitted to an electronic detection system in order to help ascertain if any plagiarised material is present. You may check your own work prior to submission using Turnitin at the Formative Moodle Site. If you have queries about what constitutes plagiarism, please speak to your module tutor or the Centre for Academic Success.
Electronic Submission of Work
It is your responsibility to ensure that work submitted in electronic format can be opened on a faculty computer and to check that any electronic submissions have been successfully uploaded. If it cannot be opened it will not be marked. Any required file formats will be specified in the assignment brief and failure to comply with these submission requirements will result in work not being marked. You must retain a copy of all electronic work you have submitted and re-submit if requested.
Re-Assessment Details:
Title: Assessment 2 - Final project
Style: Coursework and academic report
Rationale: This assessment provides a unique opportunity for students to develop an end-to-end project in social media analytics, starting from data collection and aiming to extract insights and draw conclusions. The project covers the social media analytics lifecycle and mimics an industry project setup.
Description: Assessment 2 is a group assessment undertaken in pairs (groups of two students) which tests students’ ability to analyze social media data using Natural Language Processing (NLP) techniques and statistical methods.
The deliverables for this assessment:
Final project code and report for both parts A and B. (65%)
Learning Outcomes to be Assessed:
Utilize various Application Programming Interface (API) services to collect data from different social media sources.
Conduct basic social network and statistical analysis to render network visualisations and to understand network characteristics.
Derive insights and discover patterns in structured social media data using methods such as correlation, regression, and classification.
Extrapolate and analyse trends in unstructured-text data using natural language processing methods such as sentiment analysis and topic classification.
Challenge Title
Understanding Sentiment, Topics, and Influence in Construction Discourse
Background and Motivation
Online social media platforms such as Reddit have become important spaces where professionals, practitioners, and the public discuss complex socio-technical issues. In the construction domain, Reddit users frequently discuss topics such as cost overruns, delays, regulation, sustainability, and the adoption of AI and digital technologies.
However, these discussions are:
Large-scale and unstructured
Emotionally charged
Socially influenced, where certain users shape narratives more than others
The challenge is to determine what people are talking about, how they feel, and who influences the conversation, using computational social media analytics techniques.
Solving this challenge helps organisations and researchers:
Understand public and professional sentiment
Identify emerging themes and concerns
Detect influential voices and communities
Reflect on the societal implications of digital discourse
Challenge Questions
Students should frame their analysis around the following core challenge questions:
What are the dominant topics discussed in construction-related Reddit communities?
What sentiments and emotions are associated with these topics, and how do they vary across time and communities?
Who are the most influential users shaping these discussions, and how are communities structured?
(Advanced / distinction level) How do LLM-based interpretations compare with traditional statistical and NLP-based findings?
This assessment adopts a Data Study Group-inspired approach, in which ethical framing, responsible data use, and reflective practice are treated as integral components of the analytical process rather than as post-hoc considerations.
Dataset
Data Provided
Students will work with a Reddit dataset containing approximately 10,000 comments, collected from construction-related subreddits.
The dataset includes (but is not limited to):
body – comment text
author – Reddit username
created_utc – timestamp
subreddit – community name
score – upvotes/downvotes
comment_id, parent_id, link_id – discussion structure
Data is collected programmatically using a provided Reddit API script, demonstrating ethical and reproducible data access.
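As a starting point for loading and inspecting the data, the following is a minimal sketch only. It assumes the provided script writes a CSV file with the columns listed above and that created_utc holds a Unix timestamp in seconds; the function name load_comments and the sample rows are hypothetical and should be adapted to the actual output of the script.

```python
import csv
import io
from datetime import datetime, timezone

# Columns named in the brief; the real export may contain more.
EXPECTED_COLUMNS = {
    "body", "author", "created_utc", "subreddit",
    "score", "comment_id", "parent_id", "link_id",
}


def load_comments(handle):
    """Read comments from an open CSV file and convert basic types."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    comments = []
    for row in reader:
        # Assumption: created_utc is a Unix timestamp in seconds.
        row["created"] = datetime.fromtimestamp(
            float(row["created_utc"]), tz=timezone.utc
        )
        row["score"] = int(row["score"])
        comments.append(row)
    return comments


# Tiny in-memory sample standing in for the real file (hypothetical rows).
SAMPLE = io.StringIO(
    "comment_id,parent_id,link_id,author,subreddit,score,created_utc,body\n"
    "c1,t3_p1,t3_p1,user_a,Construction,12,1700000000,Costs keep rising\n"
    "c2,t1_c1,t3_p1,user_b,Construction,3,1700003600,Same on our site\n"
)

comments = load_comments(SAMPLE)
print(len(comments), "comments loaded; first posted", comments[0]["created"].isoformat())
```

With a real export, the same function can be given a file opened with open(path, newline="", encoding="utf-8"). Checking the columns up front makes a mismatch between the script output and the analysis code visible early.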
Data Considerations
Data is publicly available
Usernames must not be deanonymized
Ethical use and limitations of social media data must be discussed
The Challenge Tasks (Assessment Structure)
The challenge is divided into two parts, aligned with CMP7202 learning outcomes.
Part A: Statistical analysis
Challenge Task A1: Data Collection and Exploration
Students must:
Run the provided Reddit data collection script
Load and inspect the dataset
Perform descriptive statistical analysis (volume, activity, scores, time trends)
Challenge Task A2: Network and Graph Analysis
Students must:
Construct a user interaction graph
Apply centrality measures to identify influential users
Detect and visualise communities
Interpret what network structure reveals about discourse dynamics
Challenge focus:
Influence is not measured by opinion alone, but by position in the network.
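One possible way to approach Task A2 is sketched below using only the Python standard library. It is an illustration under stated assumptions, not a required method: it assumes parent_id uses Reddit's "t1_" prefix when the parent is a comment and "t3_" when it is the original post, it uses in-degree centrality as one of several possible influence measures, and it uses weakly connected components as a deliberately simple stand-in for proper community detection. Libraries such as NetworkX offer ready-made centrality measures and modularity-based community detection that would normally be preferred. All sample rows and function names are hypothetical.

```python
# Hypothetical rows: (comment_id, parent_id, author).
# Assumption: parent_id is "t1_<comment id>" for replies to a comment
# and "t3_<post id>" for top-level comments on a post.
COMMENTS = [
    ("c1", "t3_p1", "user_a"),
    ("c2", "t1_c1", "user_b"),
    ("c3", "t1_c1", "user_c"),
    ("c4", "t1_c2", "user_a"),
    ("c5", "t3_p2", "user_d"),
    ("c6", "t1_c5", "user_e"),
]


def build_reply_graph(comments):
    """Return {user: set of users they replied to} as a directed graph."""
    author_of = {cid: author for cid, _, author in comments}
    graph = {}
    for _, parent_id, author in comments:
        graph.setdefault(author, set())
        if parent_id.startswith("t1_"):
            target = author_of.get(parent_id[len("t1_"):])
            # Skip parents outside the dataset and self-replies.
            if target is not None and target != author:
                graph[author].add(target)
                graph.setdefault(target, set())
    return graph


def in_degree_centrality(graph):
    """Fraction of the other users who replied to each user."""
    n = len(graph)
    incoming = {user: 0 for user in graph}
    for targets in graph.values():
        for target in targets:
            incoming[target] += 1
    return {u: (c / (n - 1) if n > 1 else 0.0) for u, c in incoming.items()}


def connected_groups(graph):
    """Weakly connected components (a crude stand-in for communities)."""
    neighbours = {user: set(targets) for user, targets in graph.items()}
    for user, targets in graph.items():
        for target in targets:
            neighbours[target].add(user)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


graph = build_reply_graph(COMMENTS)
centrality = in_degree_centrality(graph)
ranking = sorted(centrality, key=centrality.get, reverse=True)
groups = connected_groups(graph)
print(ranking[0], groups)
```

In this sketch an edge means "replied to", so a high in-degree marks a user whom many others respond to, regardless of what that user says. Other definitions of interaction and other centrality measures are equally valid choices and should be justified in the report.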
Students must support their statistical and network analysis with appropriate visualizations, such as temporal plots, distributions, and network graphs. All key analytical findings in Part A must be visually represented and clearly explained.
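The descriptive statistics named in Task A1 (volume, activity, scores, time trends) could be computed along the following lines before plotting. This is a minimal sketch with hypothetical sample rows and a hypothetical describe function, and it assumes created_utc is a Unix timestamp in seconds. The resulting counts and series are the kind of values a temporal plot or distribution chart would then display.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from statistics import mean, median

# Hypothetical rows: (author, subreddit, score, created_utc in seconds).
ROWS = [
    ("user_a", "Construction", 12, 1700000000),
    ("user_b", "Construction", 3, 1700003600),
    ("user_a", "civilengineering", -2, 1700090000),
    ("user_c", "civilengineering", 7, 1700180000),
]


def describe(rows):
    """Summarise volume, activity, scores and the daily trend."""
    scores = [score for _, _, score, _ in rows]
    per_day = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        for _, _, _, ts in rows
    )
    scores_by_sub = defaultdict(list)
    for _, subreddit, score, _ in rows:
        scores_by_sub[subreddit].append(score)
    return {
        "n_comments": len(rows),
        "comments_per_subreddit": Counter(sub for _, sub, _, _ in rows),
        "most_active_authors": Counter(a for a, _, _, _ in rows).most_common(3),
        "score_mean": mean(scores),
        "score_median": median(scores),
        "mean_score_by_subreddit": {s: mean(v) for s, v in scores_by_sub.items()},
        # Sorted by date so the series can be plotted directly as a time trend.
        "comments_per_day": dict(sorted(per_day.items())),
    }


summary = describe(ROWS)
for key, value in summary.items():
    print(key, value)
```

Grouping by calendar day in UTC is one choice among several; weekly or monthly bins may give a clearer trend for a dataset of around 10,000 comments.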
Part B: Understanding Meaning and Emotion
Challenge Task B1: Sentiment Analysis
Students must:
Apply sentiment analysis to Reddit comments
Analyse sentiment distribution: overall, by subreddit, and over time
Discuss limitations of automated sentiment detection
Challenge Task B2: Topic Modelling
Students must:
Apply topic modelling (LDA, NMF, or BERTopic)
Identify and label dominant themes
Analyse how topics evolve and co-exist
Challenge focus:
Topics are not fixed — they emerge, shift, and overlap.
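To make the idea behind Task B2 concrete, the sketch below implements a very small NMF (non-negative matrix factorisation) in pure Python, using the multiplicative update rules of Lee and Seung for squared error. It is for illustration only and rests on several assumptions: the documents are hypothetical and already cleaned, raw term counts are used instead of TF-IDF, and the number of topics k is fixed at 2. In a real project an established implementation (for example scikit-learn's NMF or LatentDirichletAllocation, or BERTopic) would normally be used instead.

```python
import random
from collections import Counter

# Hypothetical, already-cleaned comments (lowercase, no punctuation).
DOCS = [
    "budget cost overrun cost delay",
    "cost overrun budget contractor",
    "ai software model drone survey",
    "drone survey ai software adoption",
]


def term_document_matrix(docs):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([float(counts[word]) for word in vocab])
    return vocab, rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorise V (docs x terms) into W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    w = [[rng.random() + eps for _ in range(k)] for _ in v]
    h = [[rng.random() + eps for _ in v[0]] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(w))]
    return w, h


def reconstruction_error(v, w, h):
    """Squared error between V and its approximation W H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def top_terms(h, vocab, n=3):
    """Highest-weighted terms for each topic (row of H)."""
    return [
        [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
        for row in h
    ]


vocab, v = term_document_matrix(DOCS)
w, h = nmf(v, k=2)
topics = top_terms(h, vocab)
# Index of the strongest topic for each document (row of W).
dominant = [max(range(2), key=lambda t: row[t]) for row in w]
print(topics, dominant)
```

Each row of H gives term weights for one topic, and each row of W gives topic weights for one comment, which is what allows a comment to belong to several topics at once. On this toy input one would expect the cost-related and technology-related comments to load on different topics, although the result depends on the random initialisation. Aggregating the rows of W by month is one possible way to examine how topics evolve.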
Challenge Task B3: LLM-Assisted Interpretation (Advanced)
Students may use LLMs to:
Label topics
Summarise clusters of discussion
Reflect on framing, stance, or narrative patterns
LLM use must be:
Clearly documented
Critically evaluated
Used for analysis, not report writing
All analyses in Part B must be accompanied by clear and interpretable visualisations, including but not limited to sentiment distributions, topic frequency plots, topic evolution over time, and thematic representations. Visualisations should be used to support interpretation and discussion, not merely presented without explanation.
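For Task B1, the sketch below shows the general shape of a lexicon-based sentiment pipeline: score each comment, label it, then aggregate overall and by subreddit. It is a toy illustration, not a recommended tool. The word list, the negation rule, the sample rows and all function names are hypothetical, and a real analysis would normally rely on an established lexicon or model (NLTK's VADER is one commonly used option). The last sample row is included to show a typical limitation, since a sarcastic comment is scored as neutral.

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy lexicon for illustration only; not a validated resource.
LEXICON = {"good": 1, "great": 1, "safe": 1,
           "delay": -1, "overrun": -1, "unsafe": -1}
NEGATIONS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word polarities, flipping a word that directly follows a negation."""
    score, negate = 0, False
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")
        if word in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0)
        score += -polarity if negate else polarity
        negate = False
    return score


def label(score):
    """Map a numeric score to a coarse sentiment class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical (subreddit, month, body) rows.
ROWS = [
    ("Construction", "2025-01", "Great crew, site felt safe."),
    ("Construction", "2025-02", "Another delay and a cost overrun!"),
    ("civilengineering", "2025-01", "The new software is not good."),
    ("civilengineering", "2025-02", "Yeah, great, another delay..."),
]

# Overall distribution of sentiment labels.
overall = Counter(label(sentiment_score(body)) for _, _, body in ROWS)

# Mean score per subreddit; grouping by month works the same way.
by_subreddit = defaultdict(list)
for subreddit, _, body in ROWS:
    by_subreddit[subreddit].append(sentiment_score(body))
mean_by_subreddit = {s: mean(scores) for s, scores in by_subreddit.items()}

print(overall, mean_by_subreddit)
```

Replacing the first element of the grouping key with the month gives the over-time view, and the label counts map directly onto a sentiment distribution chart.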
Expected output:
Students must submit a single PDF report of a maximum of 2,000 words that presents the challenge background, methodology, results, discussion, limitations, and conclusion. In addition, students must submit an executable codebase or notebook that demonstrates the data collection process, analytical methods applied, and the visualisations used to support the findings.
For advice on writing style, referencing and academic skills, please make use of the Centre for Academic Success (bcu.ac.uk).
Workload: 30 hours for a 2,000-word report and a presentation of 1,000 words.
Transferable skills: Completing this assessment helps students develop both technical and transferable skills, which include:
Problem solving
Programming skills
Analytical skills
Time management
Project management
Verbal and written communication skills
Marking Criteria:
Table of Assessment Criteria and Associated Grading Criteria
Learning Outcomes
1. Utilize various Application Programming Interface (API) services to collect data from different social media sources.
2. Conduct basic social network and statistical analysis to render network visualisations and to understand network characteristics.
3. Derive insights and discover patterns in structured social media data using methods such as correlation, regression, and classification.
4. Extrapolate and analyse trends in unstructured-text data using natural language processing methods such as sentiment analysis and topic classification.
Assessment Criteria
Weighting: Learning Outcome 1: 20%; Learning Outcome 2: 20%; Learning Outcome 3: 30%; Learning Outcome 4: 30%
0 – 29% F
No social media source has been utilised for data collection.
No code has been provided.
No data cleaning has been attempted.
The report shows no attempt to explain the collected data.
No code is provided for network analysis.
No graphs are provided for network visualisation.
The report has no discussion of network analysis.
No attempts to apply statistical methods to extrapolate a meaningful understanding of the data.
Non-academic sources. No respect of time limits.
No or a superficial interpretation of the results.
No examination of social media analytics.
A superficial interpretation of the results.
No clear solution is provided for the underlying trends and topics.
No code has been provided.
No report submitted, or report shows little understanding of social data mining and the interpretation of the results. No articulation is provided to underpin the analysis.
30 – 39% E
Unsuccessful or no attempts for data collection from various social media sources.
Incomplete or no running code has been provided.
No or inadequate data processing has been attempted.
The report shows a superficial interpretation of the collected data.
Incomplete code is provided for network analysis.
Very few graphs are provided for network visualisation, and they are incorrect or misleading.
The report has a very limited discussion of network analysis.
The choice of networks and methods to analyse is very poor.
Very poor attempts to apply statistical methods to extrapolate a meaningful understanding of the data.
Very poor coverage of social media analytics.
A poor interpretation of the results.
Inaccurate or vague solution provided for the underlying trends and topics.
Less than satisfactory report with incomplete or insufficient explanation of the adopted method(s) or lack of interpretation of the results.
Very poor articulation to underpin the analysis and insights.
40 – 49% D
Poor attempts for data collection from various social media sources.
Incomplete and poor-quality code has been provided.
Incomplete data cleaning has been attempted.
The report shows limited and mostly incorrect interpretation of the collected data.
Code has some attempts for network analysis. However, wrong or incomplete output is presented.
Very few graphs are presented for network visualisation, and they are mostly incorrect or misleading.
The report has some discussion of network analysis; however, it is unclear and/or inaccurate.
The choice of networks and analysis methods is poor but can be justified.
Poor attempts to apply statistical methods to extrapolate a meaningful understanding of the data.
Few/poor academic sources.
Results are described with some lack of analysis and contain errors.
A superficial interpretation of the results.
A non-efficient solution provided for the underlying trends and topics.
Code has been provided but is incomplete and not clear with no comments and mostly no output.
An adequate report with a good explanation of some of the adopted method(s) and fair interpretation of some of the results. Poor articulation to underpin the analysis and insights.
50 – 59% C
Satisfactory but incomplete attempts have been demonstrated for data collection from various social media sources.
Code has been provided but is incomplete and poorly commented.
Valid data cleaning has been provided but with some errors.
The report shows satisfactory but incomplete interpretation of the data collected.
Code has some attempts for network analysis. However, the output is incomplete/not clear.
A good attempt to generate graphs and measures for network visualisation. However, more graphs and/or better visualisation methods are expected.
The report has adequate discussions of network analysis. However, more details of the insights are required.
The choice of networks and analysis methods is adequate, however better choices can be made.
Satisfactory attempts to apply statistical methods to extrapolate a meaningful understanding of the data.
Satisfactory coverage of social media and web analytics.
Results are described but may lack some analysis.
A non-efficient but valid solution provided for the underlying social media analytics.
Code has been provided but is incomplete and poorly commented.
The report may have elements of good explanation of the adopted methods or the interpretation of the results.
Satisfactory articulation to underpin the analysis and insights.
60 – 69% B
A successful but incomplete collection of data from various social media sources.
Code has been provided which is mostly complete but sometimes vague or inefficient.
Good data cleaning has been provided but can be improved.
The report shows a satisfactory interpretation of the data collected.
Code has very good attempts for network analysis. However, the output is missing some important information.
Several graphs and measures for network visualisation have been generated. However, some graphs are very complex to understand or incorrectly visualised.
The report has adequate discussions and comparison of network characteristics. However, some insights are incorrect and/or superficial.
The choice of networks and analysis methods is good, however better choices can be made.
Good attempts to apply statistical methods to extrapolate a meaningful understanding of the data.
Satisfactory academic sources are supporting insights.
Good coverage of social media analytics.
Sufficient interpretation of the results.
A valid solution provided for the underlying social media analytics.
Code has been provided which is mostly complete with satisfactory comments and structure.
The report has elements of very good explanation of some of the adopted methods or the interpretation of the results.
Good articulation to underpin the analysis and insights.
70 – 79% A
A successful collection of data from various social media sources with some minor errors.
Code has been provided which is complete but improvable. The code is clearly commented.
Good data cleaning has been provided.
The report shows a good interpretation of the data collected.
Code has complete elements for network analysis. However, the code can be improved for better efficiency. Some errors are produced.
Graphs and measures for network visualisation have been presented. However, a few graphs are very vague and/or not well discussed.
The report has very good discussions and comparison of network characteristics. However, some more insights are expected.
The choice of networks and analysis methods is very good; however, comparison is not comprehensive.
Very good attempts to apply statistical methods to extrapolate a meaningful understanding of the data.
Very good coverage of social media analytics.
Sufficient interpretation of the results.
Valid solution is provided for the underlying knowledge discovery problem.
Code has been provided and is mostly complete with clear comments and structure.
Very good report with a good explanation of most of the adopted method(s).
Excellent interpretation of the results.
Very good articulation to underpin most of the analysis and insights. More insights and discussions were expected.
80 – 89% A+
A successful collection of data from various social media sources.
The code is mostly complete, efficient and clearly commented.
Complete data cleaning has been provided; still some more minor improvements are required.
The report shows a very good interpretation of the data collected.
Code has complete elements for network analysis. A few parts of the code can be improved for better efficiency.
Most of the graphs and measures for network visualisation have been presented.
The report has mostly in-depth discussions and comparison of network characteristics.
Effective application of statistical methods to extrapolate a meaningful understanding of the data.
In-depth analysis of the strengths and weaknesses of the academic argument, with insightful conclusions.
Comprehensive coverage of social media analytics.
Excellent interpretation of the experiment and results.
An appropriate solution is provided that reflects a good understanding of different techniques.
Code has been provided to a very good standard that is mostly well structured with clear comments.
Very good report with mostly complete explanation of the adopted method(s).
Very good discussions of the interpreted results. Very good articulation to underpin the analysis and insights.
90 – 100% A*
Accurate and efficient collection of data from various social media sources.
The code is complete, efficient and clearly commented.
Complete data cleaning has been provided.
The report shows an excellent and comprehensive interpretation of the collected data.
Code for network analysis is complete, efficient and error-free.
All graphs and measures for network visualisation have been well presented.
The report has impressive in-depth discussions and comparison of network characteristics.
Excellent application of statistical methods to extrapolate a meaningful understanding of the data.
Exclusive focus on research papers.
In-depth analysis of the strengths and weaknesses of the academic argument, with insightful conclusions.
Excellent coverage of social media analytics.
Excellent interpretation of the experiment and results.
Very efficient solutions are provided that reflect a good understanding of different techniques.
Code has been provided of a high standard that is well structured with clear comments.
Excellent report with excellent explanation of all the adopted method(s) and excellent interpretation of the results.
Very good articulation to underpin the analysis and insights.
Submission Details:
Format: Submit the code and the report on Moodle.
Regulations:
Re-sit marks are capped at 50%.
Full academic regulations are available for download using the link provided above in the IMPORTANT STATEMENTS section.
Late Penalties
If you submit an assessment late at the first attempt, then you will be subject to one of the following penalties:
If the submission is made between 1 and 24 hours after the published deadline, the original mark awarded will be reduced by 5%. For example, a mark of 60% will be reduced by 3% so that the mark that the student will receive is 57%.
If the submission is made between 24 hours and one week (5 working days) after the published deadline, the original mark awarded will be reduced by 10%. For example, a mark of 60% will be reduced by 6% so that the mark the student will receive is 54%.
If the submission is made after 5 days following the deadline, your work will be deemed as a fail and returned to you unmarked.
The reduction in the mark will not be applied in the following two cases:
The mark is below the pass mark for the assessment. In this case the mark achieved by the student will stand.
Where a deduction will reduce the mark from a pass to a fail. In this case the mark awarded will be the threshold (i.e. 50%).
If you submit a re-assessment late, then it will be deemed as a fail and returned to you unmarked.
Feedback:
Marks and Feedback on your work will normally be provided within 20 working days of its submission deadline.
Where to get help:
Students can get additional support from the library support for searching for information and finding academic sources. See their iCity page for more information: http://libanswers.bcu.ac.uk/
The Centre for Academic Success offers 1:1 advice and feedback on academic writing, referencing, study skills and maths/statistics/computing. See their iCity page for more information: https://icity.bcu.ac.uk/celt/centre-for-academic-success
Additional assignment advice can be found here: https://libguides.bcu.ac.uk/MA